This means that anyone adding a new test to this file doesn't have to
remember to clear the cache in their test, and can't forget to and end
up with a hard-to-debug test failure.
Using `setup_function` means we don't have to convert this module to
class-based tests.
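For example (a minimal, self-contained illustration of the pattern; the
cache name here is made up):

```python
# test_provider_cache.py
provider_cache = {}  # stand-in for the real module-level cache

def setup_function(function):
    # pytest calls this before every test function in the module, so
    # each test starts with an empty cache without having to clear it
    provider_cache.clear()

def test_cache_starts_empty():
    assert provider_cache == {}
```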
There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files,
is if we allow an email address on our side that SES rejects.
However, other types of error can also cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError`, as in that case there is nothing wrong with the
email address.
Therefore, we introduce a new exception,
`EmailClientNonRetryableException`, which represents a non-retryable
error back from an email client, and which we can raise whenever we get
an `InvalidParameterValue` error.
Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` because the code that catches this
exception shouldn't be aware of which email client is being used; it
just needs to know that the error came from one of the email clients
(if in time we have more than one).
In time, we may wish to extend this approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions, as this should be
the most extensible pattern and a good abstraction.
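A rough sketch of the shape of this change (the `send_email` function
here is illustrative, not the exact SES client code):

```python
import botocore.exceptions

class EmailClientException(Exception):
    """Base class for errors from any email client."""

class EmailClientNonRetryableException(EmailClientException):
    """The email client rejected the request; retrying will not help."""

def send_email(ses_client, **kwargs):
    try:
        return ses_client.send_raw_email(**kwargs)
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "InvalidParameterValue":
            # covers bad email addresses as well as things like
            # "Header too long: 'Subject'"
            raise EmailClientNonRetryableException(
                e.response["Error"]["Message"]
            ) from e
        raise
```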
We shouldn't be logging PII, so we should not log email addresses. We
remove the email address and just log the normal exception message.
Note: previously you could see the email address, which made it easier
to track down the notification ID in the database. Now you will need to
search the DB for notifications that went into technical failure around
the time of the log message (as we still don't log the notification ID
alongside the failure).
There seems to be some kind of complication in this script that
prevents it from terminating properly.
This is being removed for now to allow deploying the rest of the fixes
in time for the holiday period.
This will make it impossible to create a new client without at least
having to define these properties, which should get someone thinking
about language support…
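As a sketch of how that enforcement works (the class and property names
here are illustrative, not the real proxy client code):

```python
from abc import ABC, abstractmethod

class CBCProxyClientBase(ABC):
    @property
    @abstractmethod
    def LANGUAGE_ENGLISH(self):
        """e.g. 'en-GB' in CAP, 'English' in IBAG"""

    @property
    @abstractmethod
    def LANGUAGE_WELSH(self):
        """e.g. 'cy-GB' in CAP, 'Welsh' in IBAG"""

# Instantiating a subclass that doesn't define both properties raises
# a TypeError straight away, rather than failing when we try to send.
```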
If we’re sending non-GSM characters, we need to mark the language in the
XML as Welsh (`cy-GB` in CAP, `Welsh` in IBAG).
Currently, the CBC proxy checks the content we’re sending, and then uses
an approximation based on ASCII to determine whether we’re sending any
non-GSM characters, and if so, sets the language appropriately.
Instead, we can use functionality from the notifications-utils repo to
determine the language: if any non-GSM characters are used, then we can
set the language to Welsh.
We’ll need to update the proxy to look at this new language flag.
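A sketch of the check (illustrative only: the character set and helper
should really come from notifications-utils rather than being
hand-rolled like this):

```python
# GSM 03.38 basic character set (approximate, for illustration)
GSM_CHARACTERS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def language_for_content(content):
    # any character outside the GSM set means the content is (probably)
    # Welsh, so mark the language accordingly
    if set(content) - GSM_CHARACTERS:
        return "cy-GB"  # "Welsh" in IBAG
    return "en-GB"
```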
On Monday, we had a build-up of emails in the email queue that weren't
getting picked up by the sender worker, causing delays.
After further investigation with Andy from the PaaS, we believe the
following happened.
We received a bunch of traffic at around 8:30 which consisted of some
very large emails in terms of their length and complexity. The amount
of memory used by the app instances got very high and a few apps
crashed due to OOM (5 crashes recorded in the cf app events). When new
app instances tried to spin up, they weren't able to, as they
potentially also ran out of memory immediately.
This left us in the position of having fewer app instances than we
needed, on top of which they were all using a very large amount of
CPU, which may have limited how quickly an individual app instance
could process tasks. This meant that overall we were processing fewer
tasks than we needed to, and our queue of emails started to build up.
So it appears our sender workers did not have the memory available that
they needed. Looking at a graph of memory usage on the sender workers
over the past 30 days, we can see that on several days it breached 90%
memory usage for long periods of time. This, in combination with the
hypothesis above about what happened, led us to decide to give the app
instances a bigger memory quota, so it has been upped from 3GB to 4GB.
Whilst doing this, I also looked at long-term memory usage graphs for
our other workers and saw that the letters worker was similarly close
to around 90% of memory used, so I have taken the opportunity to bump
that too.
The test was failing when run after 5:30pm, as this would cause the
letters to be in a different subfolder (for one day later). Solved by
freezing the time in the test.
Example build that failed: https://cd.gds-reliability.engineering/builds/1876957
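For reference, a minimal sketch of the fix using `freezegun` (the test
name and timestamp are made up):

```python
from datetime import datetime
from freezegun import freeze_time

@freeze_time("2020-02-17 16:00:00")  # safely before the 5:30pm cutoff
def test_letters_use_todays_subfolder():
    # the subfolder date no longer depends on when the suite runs
    assert datetime.utcnow().strftime("%Y-%m-%d") == "2020-02-17"
```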
We are using our custom logger to log to `NOTIFY_LOG_PATH`, so this
logging from celery is neither needed nor desired.
We also need to define the location of the pidfiles, because of what
appears to be a bug in celery where, if the pidfile location is not
defined, it uses the location of the logs to infer it, i.e. in this
case it was trying to find the pidfiles in `/dev/null/%N.pid`.
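So the worker start command ends up looking something like this (the
app module and paths here are illustrative):

```
celery multi start 3 -A run_celery.notify_celery \
    --logfile=/dev/null \
    --pidfile=/tmp/celery/%N.pid
```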
For every email or text message we send we have to work out which
provider to send it with. Every time we do this we go and load the list
of providers from the database.
For emails, the result will always be the same.
For text messages the result is randomly chosen to balance the load
between the providers.
For international text messages the result is always the same (we only
have one international text message provider).
This commit adds an in-memory cache with a 2 second TTL so that we’re
not fetching the providers from the database every time, which should
speed things up a bit.
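A minimal sketch of the idea (the real code differs;
`load_sms_providers` stands in for the existing database query):

```python
import time

class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being
    set."""

    def __init__(self, ttl=2):
        self.ttl = ttl
        self._store = {}

    def get(self, key, compute):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # still fresh, skip the database
        value = compute()
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

provider_cache = TTLCache(ttl=2)

# usage: providers = provider_cache.get("sms", load_sms_providers)
```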
This does mean that, for text messages, the random choice will ‘stick’
for two seconds on each instance, before being re-chosen. I think this
is OK because it will even out to the same distribution over time.
I really don’t like having to clear the cache in the tests, so would
welcome suggestions on a better way of doing this…
We already serialise it in the templates response. We should make sure
the field is also present in the history response, if we want to use
cached template versions when processing notifications.
At the moment the two are duplicative, and it’s not clear which takes
precedence.
We need to keep the `exclude` line, otherwise running the `flake8`
command takes ~3 times longer and may lint 3rd party files that we
don't control.
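For reference, the relevant config looks something like this (the
excluded paths are illustrative):

```
[flake8]
exclude = .git,venv*,node_modules,migrations
```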
Previously we'd see an error message in the logs:
`AttributeError: 'NoneType' object has no attribute 'status_code'`
because we were assuming the requests exception would always have a
response; it won't have a response if it wasn't able to create a
connection at all.
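A minimal sketch of the safer pattern (the function and names here are
illustrative, not the real client code):

```python
import logging
import requests

logger = logging.getLogger(__name__)

def post_to_provider(url, payload):
    try:
        response = requests.post(url, json=payload, timeout=30)
        response.raise_for_status()
        return response
    except requests.RequestException as e:
        # e.response is None when no connection could be made at all,
        # so guard before reading status_code
        status = e.response.status_code if e.response is not None else None
        logger.warning("POST to %s failed with status %s", url, status)
        raise
```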
We could add a request_id for tasks that are not spawned by an HTTP
request, for example scheduled or nightly tasks. That would mean you
could match up all the tasks spawned by a single parent task; for
example, create-nightly-billing spawns 4 tasks, and those would all
have the same ID. Not sure if that is helpful or not. Also, it might be
confusing to have a request_id for logs that were not started from a
request, so I have left it out.
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is having mutable default function arguments, which get
shared between every call to the function and therefore mutating a value
in one place can unexpectedly cause it to change in another.
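To make that concrete, a minimal example of the mutable-default
pitfall:

```python
def add_recipient(recipient, recipients=[]):  # flagged by Bugbear (B006)
    recipients.append(recipient)
    return recipients

add_recipient("a@example.com")  # ["a@example.com"]
add_recipient("b@example.com")  # ["a@example.com", "b@example.com"] (!)

# The fix: default to None and build a fresh list on each call
def add_recipient_fixed(recipient, recipients=None):
    if recipients is None:
        recipients = []
    recipients.append(recipient)
    return recipients
```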
This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
some SQLAlchemy decorators (eg `declared_attr`) make things that
aren’t formally class methods take a class not an instance as their
first argument
It disables:
- B306 `BaseException.message` is removed in Python 3, because I think
  our exceptions have a custom structure that means the `.message`
  attribute is still present
Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
The request_id is set if the task is created from an HTTP request; if
that task then calls through to another task, this will set the
request_id from the global context. We should then be able to follow
the creation of a notification all the way from the original HTTP
request to the sending task.
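As a hedged sketch of how the propagation might work (the base class
and helper here are hypothetical, not the real implementation):

```python
import uuid
from celery import Task

def current_request_id():
    # stand-in for however the app exposes the id of the current
    # request or task from the global context
    return str(uuid.uuid4())

class NotifyTask(Task):
    """Hypothetical base task: copies the caller's request_id into the
    headers of any task it spawns, so downstream logs share the id."""

    def apply_async(self, args=None, kwargs=None, **options):
        headers = options.get("headers") or {}
        headers.setdefault("request_id", current_request_id())
        options["headers"] = headers
        return super().apply_async(args, kwargs, **options)
```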
We have been getting log lines like the following:
`API POST request on
https://api.notifications.service.gov.uk/notifications/sms/mmg failed
with None`
It's not clear what error caused the request to fail because the value
of `api_error.response` is always `None`.
There appears to be something wrong with this logging.
`raise_for_status` will already raise an `HTTPError`, so there should
be no reason to pass that error into another `HTTPError` (which is what
is causing the response to be lost).
We can instead simply catch the `HTTPError` and log its status code.
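Something like this (illustrative, not the exact client code):

```python
import logging
import requests

logger = logging.getLogger(__name__)

def send_sms_to_provider(url, payload):
    response = requests.post(url, json=payload, timeout=30)
    try:
        response.raise_for_status()
    except requests.HTTPError as e:
        # the HTTPError already carries the response; re-wrapping it in
        # a new HTTPError is what was losing the status code
        logger.warning(
            "API POST request on %s failed with status %s",
            url,
            e.response.status_code,
        )
        raise
```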
This might not be perfect, but it's definitely an improvement and should
give us some more context about why these requests occasionally fail.