notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-24 09:21:39 -05:00

Author	SHA1	Message	Date
Chris Hill-Scott	0aa7cf1aaf	Tell Pyup to ignore outdated Eventlet version We already do this in the admin app: https://github.com/alphagov/notifications-admin/pull/3876/files Upgrading Eventlet is blocked until this change in Gunicorn is released: https://github.com/benoitc/gunicorn/pull/2581/files	2021-11-15 11:14:34 +00:00
Chris Hill-Scott	6c0bda0388	Bump Celery to latest version This brings in the version 5.2.1 of Kombu, which fixes a security vulnerability: > Celery 5.2.0 includes 'kombu' v5.2.1, which includes dependencies > updates that resolve security issues. — https://pyup.io/repos/github/alphagov/notifications-api/commits/?page=1#b654c27699a5164cbbe50e042d5d34141f560255 This is the commit from Kombu: `f3b04558fa` I believe the dependency of Kombu which has issues is urllib3, which has two open advisories for versions less than 1.26.5: - https://github.com/urllib3/urllib3/security/advisories/GHSA-q2q7-5pp4-w6pg - https://github.com/urllib3/urllib3/security/advisories/GHSA-5phf-pp7p-vc2r	2021-11-15 11:12:33 +00:00
David McDonald	608ef12573	Merge pull request #3367 from alphagov/better-log-message Improve log message searchability for duplicate receipts	2021-11-15 09:38:30 +00:00
David McDonald	c98996a461	Improve log message searchability for duplicate receipts There were two problems with the existing message. 1. There was no space between the new status and the time taken, which made reading and searching harder. 2. The key bits of information (before and after status) were separated by the time taken (which will always be unique), meaning you couldn't easily search for, say, a message in delivered that is being set to temporary-failure.	2021-11-12 14:06:38 +00:00
Ben Thorner	48e1482d90	Merge pull request #3366 from alphagov/celery-extend-request-id-180213914 Extend request tracing to cover Celery logs	2021-11-12 11:10:38 +00:00
Ben Thorner	d66c68d6d6	Merge pull request #3364 from alphagov/celery-headers-request-id-180213914 Move Celery task Request ID injection into headers	2021-11-12 11:10:29 +00:00
Ben Thorner	4a577eca62	Merge pull request #3359 from alphagov/improve-clarify-botocore-exception-180017131 Improve and clarify large task error handling	2021-11-12 11:10:17 +00:00
Ben Thorner	1872854a4e	Improve and clarify large task error handling Previously we were catching one type of exception if something went wrong adding a notification to the queue for high volume services. In reality there are two types of exception so this adds a second handler to cover both. For context, this is code we changed experimentally as part of the upgrade to Celery 5 [1]. At the time we didn't check how the new exception compared to the old one. It turns out they behaved the same and we were always vulnerable to the scenario now covered by the second exception, where the behaviour has changed in Celery 5 - testing with a large task invocation gives... Before (Celery 3, large-ish task): 'process_job.apply_async(["a" * 200000])'... boto.exception.SQSError: SQSError: 400 Bad Request <?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>InvalidParameterValue</Code><Message>One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes.</Message><Detail/></Error><RequestId>96162552-cd96-5a14-b3a5-7f503300a662</RequestId></ErrorResponse> Before (Celery 3, very large task): <hangs forever> After (Celery 5, large-ish task): botocore.exceptions.ClientError: An error occurred (InvalidParameterValue) when calling the SendMessage operation: One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes. After (Celery 5, very large task): botocore.parsers.ResponseParserError: Unable to parse response (syntax error: line 1, column 0), invalid XML received. Further retries may succeed: b'HTTP content length exceeded 1662976 bytes.' [1]: `29c92a9e54`	2021-11-11 17:37:50 +00:00
Leo Hemsted	850cdc16a0	Merge pull request #3363 from alphagov/py39 Python 3.9	2021-11-11 15:04:37 +00:00
Leo Hemsted	ccdba8a0d8	make pyup point at new files (requirements-dev hasn't existed for ages anyway)	2021-11-11 13:54:21 +00:00
Leo Hemsted	036bc92245	switch from freeze reqs script to pip-tools instead of alexey's home-grown script, pip-tools offers a quicker, more efficient and better supported way to freeze requirements. see prior art here: https://github.com/alphagov/notifications-admin/pull/3753 https://github.com/alphagov/notifications-ftp/pull/333	2021-11-11 13:54:21 +00:00
Leo Hemsted	6b5d7ca639	switch to python 3.9	2021-11-11 13:54:14 +00:00
Ben Thorner	ac06529128	Enable request tracing on Celery success/fail logs Previously these logs wouldn't have a Request ID attached since the Celery hooks run after the __call__ method where we enable request tracing for normal application logs. For the failure log especially it will be useful to have this feature.	2021-11-10 18:04:20 +00:00
Ben Thorner	369a9f7521	Refactor queue_name and request_id into properties This reduces the complexity of the original functions, which will go up a bit in the next commit.	2021-11-10 18:04:19 +00:00
Ben Thorner	89a8dd1a03	Move Celery task Request ID injection into headers Previously we passed along this piece of state via the kwargs for a task, but this runs the risk of the task accidentally receiving the extra kwarg unless we've covered all the code paths that could invoke it directly e.g. retries don't invoke __call__. This switches to using Celery "headers" to pass the extra state. It turns out that Celery has two "header" concepts, which leads to some confusion and even a bug with the framework [1]: - In older (pre v4.4) versions of Celery, the "headers" specified by apply_async() would become _the_ headers in the message that gets passed around workers, etc. These would be available later on via "self.request.headers". - Since Celery protocol v2, the meaning of "headers" in the message changed to become (basically) _all_ metadata about the task [2], with the "headers" option in apply_async() being merged [3] into the big dict of metadata. This makes using headers a bit confusing unfortunately, since the data structure we put in is subtly different to what comes out in the request context. Nonetheless, it still works. I've added some comments to try and clarify it. Note that one of the original tests is no longer necessary, since we don't need to worry about argument passing styles with headers. [1]: https://github.com/celery/celery/issues/4875 [2]: `663e4d3a0b (diff-07a65448b2db3252a9711766beec23372715cd7597c3e309bf53859eabc0107fR343)` [3]: `681a922220/celery/app/amqp.py (L495)`	2021-11-10 18:03:40 +00:00
Katie Smith	770d323274	Merge pull request #3341 from alphagov/rebuild-letters Add task to recreate the PDF file for a non-templated letter	2021-11-10 11:18:16 +00:00
Katie Smith	3cffba6d09	Add command to run `recreate_pdf_for_precompiled_or_uploaded_letter` We already had the `replay-create-pdf-for-templated-letter` command. This adds a new command, `recreate-pdf-for-precompiled-or-uploaded-letter` which does the same thing but for non-templated letters.	2021-11-10 09:51:31 +00:00
Katie Smith	3d4796c924	Add task to resanitise and replace a PDF for precompiled letter This adds a task which is designed to be used if we want to recreate the PDF for a precompiled letter (either one that has been created using the API or one that has been uploaded through the website). The task takes the `notification_id` of the letter and passes template preview the details it needs in order to sanitise the original file and then replace the version in the letters-pdf bucket with the freshly sanitised version.	2021-11-10 09:51:31 +00:00
Katie Smith	ec9c3cac5f	Rename `replay_create_pdf_letters` command This changes the name to make it clearer that this command is for templated letters only, and not for PDF letters.	2021-11-10 09:51:31 +00:00
Ben Thorner	ff78ea3232	Merge pull request #3360 from alphagov/test-limit-timeout-notifications Optimise query to get notifications to "time out"	2021-11-09 15:54:04 +00:00
Ben Thorner	cdb43fbaf6	Only loop timeout task if there's more work Previously this would repeat the task even if the current iteration of the loop had processed a non-full batch. This could cause the task to error incorrectly if one or two notifications breach the timeout threshold in between iterations.	2021-11-09 15:41:14 +00:00
Ben Thorner	77c8c0a501	Optimise query to get notifications to "time out" From experimenting in production we found a "!=" caused the engine to use a sequential scan, whereas explicitly listing all the types ensured an index scan was used. We also found that querying for many (over 100K) items leads to the task stalling - no logs, but no evidence of it running either - so we also add a limit to the query. Since the query now only returns a subset of notifications, we need to ensure the subsequent "update" query operates on the same batch. Also, as a temporary measure, we have a loop in the task code to ensure it operates on the total set of notifications to "time out", which we assume is less than 500K for the time being.	2021-11-09 13:50:32 +00:00
David McDonald	98b6c1d67d	Merge pull request #3354 from alphagov/bump-utils-to-fix-no-break-space Bump utils to 48.0.0	2021-11-05 16:37:27 +00:00
David McDonald	e4f523e3a0	Bump utils to 48.0.0 Brings in fixes to support for non breaking spaces See https://github.com/alphagov/notifications-utils/pull/908	2021-11-05 15:09:09 +00:00
Ben Thorner	e7fbd018d1	Merge pull request #3355 from alphagov/celery-5-180017131 Upgrade to Celery 5	2021-11-05 13:04:14 +00:00
Richard Baker	e10f45b3a7	Cast Celery worker_max_tasks_per_child to int or None We use this config option when running workers that process non-memory-safe tasks to restart the worker after n tasks. Celery 5 requires this to be passed as an int or None. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-11-05 11:09:09 +00:00
sakisv	9e9091e980	Reduce concurrency for other workers too for consistency Any worker that had `--concurrency` > 4 is now set to 4 for consistency with the high volume workers. See previous commit (Reduce concurrency on high volume workers) for details	2021-11-04 16:31:22 +02:00
sakisv	92086e2090	Reduce concurrency on high volume workers We noticed that having high concurrency led to significant memory usage. The hypothesis is that because of long polling, there are many connections being held open which seems to impact the memory usage. Initially the high concurrency was put in place as a way to get around the lack of long polling: We were spawning multiple processes and each one was doing many requests to SQS to check for and receive new tasks. Now with long polling enabled and reduced concurrency, the workers are much more efficient at their job (the tasks are being picked up so fast that the queues are practically empty) and much lighter on resource requirements. (This last bit will allow us to reduce the memory requirement for heavy workers like the sender and reduce our costs) The concurrency number was chosen semi-arbitrarily: Usually this is set to the number of CPUs available to the system. Because we're running on PaaS and that number is both abstracted and may be claimed by other processes, we went for a conservative one to also reduce the competition for CPU among the processes of the same worker instance.	2021-11-04 11:38:05 +02:00
Ben Thorner	3ecbdbb260	Temporarily disable task argument checking This was added in Celery 4 [1] and appears to be incompatible with our approach of injecting "request_id" into task arguments (example exception below). Although our other apps are on Celery 5 our logs don't show any similar issues, probably because all their tasks are invoked without request IDs. In the longterm we should decide if we want to enable argument checking and fix the tracing approach, or stop tracing request IDs in Celery tasks. [1]: https://docs.celeryproject.org/en/stable/userguide/tasks.html#argument-checking 2021-11-01T11:37:36 delivery delivery ERROR None "RETRY: Email notification f69a9305-686f-42eb-a2ee-61bc2ba1f5f3 failed" [in /Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py:68] Traceback (most recent call last): File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 53, in deliver_email raise TypeError("test retry") TypeError: test retry [2021-11-01 11:37:36,385: ERROR/ForkPoolWorker-1] RETRY: Email notification f69a9305-686f-42eb-a2ee-61bc2ba1f5f3 failed Traceback (most recent call last): File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 53, in deliver_email raise TypeError("test retry") TypeError: test retry [2021-11-01 11:37:36,394: WARNING/ForkPoolWorker-1] Task deliver_email[449cd221-173c-4e18-83ac-229e88c029a5] reject requeue=False: deliver_email() got an unexpected keyword argument 'request_id' Traceback (most recent call last): File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 53, in deliver_email raise TypeError("test retry") TypeError: test retry During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/task.py", line 731, in retry S.apply_async() File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/canvas.py", line 219, in apply_async return _apply(args, kwargs, **options) File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/task.py", line 537, in apply_async check_arguments(*(args or ()), **(kwargs or {})) TypeError: deliver_email() got an unexpected keyword argument 'request_id' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/trace.py", line 450, in trace_task R = retval = fun(*args, **kwargs) File "/Users/benthorner/Documents/Projects/api/app/celery/celery.py", line 74, in __call__ return super().__call__(*args, **kwargs) File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/trace.py", line 731, in __protected_call__ return self.run(*args, **kwargs) File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 71, in deliver_email self.retry(queue=QueueNames.RETRY) File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/task.py", line 733, in retry raise Reject(exc, requeue=False) celery.exceptions.Reject: (TypeError("deliver_email() got an unexpected keyword argument 'request_id'",), False)	2021-11-01 11:39:57 +00:00
Ben Thorner	29c92a9e54	Try removing boto package again	2021-11-01 09:54:10 +00:00
Ben Thorner	efe4c6f06e	Fix notify-api crashing in PaaS This is purely by elimination: I couldn't see anything in the logs to indicate the cause of the crashes, just that the processes were exiting. The crash seemed to happen immediately after the AWS logs part of the wrapper script, which was a small indicator it might be something AWS-related. Since this package is no longer included by other dependencies, we need to include it explicitly.	2021-11-01 09:54:09 +00:00
Ben Thorner	89e390a3fc	Run "make freeze-requirements" Most of these are due to dependency changes in celery / kombu: -boto==2.49.0 `9b2a172078` +cached-property==1.5.2 `560518287a` +click-didyoumean==0.3.0 +click-plugins==1.1.1 +click-repl==0.2.0 `f462a437e3/requirements/default.txt` +pycurl==7.43.0.5 `59d88326b8/requirements/extras/sqs.txt` +vine==5.0.0 `f6c3b3313f` I'm not sure about the following, but neither are critical so I don't think it's worth tracking down where they came from. +prompt-toolkit==3.0.21 +wcwidth==0.2.5	2021-11-01 09:54:08 +00:00
Ben Thorner	d0550533a7	Remove redundant polling_interval setting This appeared without explanation in [1], but it's the same as the default value [2] so we don't need to specify it - doing so gives the impression we made a decision, but that's not clear here. [1]: https://github.com/alphagov/notifications-api/pull/2142/files#diff-84f1a9419471e289c6b6e2b0209b329e20df6cef81d1f7f0a193ddc2fc6ad69dR153 [2]: https://docs.celeryproject.org/en/stable/getting-started/backends-and-brokers/sqs.html#polling-interval	2021-11-01 09:54:07 +00:00
Ben Thorner	60799399ab	Remove anyjson package This is no longer required by Celery [1] and now causes an error when deploying with the new versions of other packages: use_2to3 is invalid [1]: https://docs.celeryproject.org/en/stable/history/whatsnew-4.0.html#requirements	2021-11-01 09:54:06 +00:00
Ben Thorner	44b3b42aba	Rewrite config to fix deprecation warnings The new format was introduced in Celery 4 [1] and the old format is due for removal in Celery 6 [2], hence the warnings e.g. [2021-10-26 14:31:57,588: WARNING/MainProcess] /Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/utils.py:206: CDeprecationWarning: The 'CELERY_TIMEZONE' setting is deprecated and scheduled for removal in version 6.0.0. Use the timezone instead alternative=f'Use the {_TO_NEW_KEY[setting]} instead') This rewrites the config to match our other apps [3][4]. Some of the settings have been removed entirely: - "CELERY_ENABLE_UTC = True" - this has been enabled by default since Celery 3 [5]. - "CELERY_ACCEPT_CONTENT = ['json']", "CELERY_TASK_SERIALIZER = 'json'" - these are the default settings since Celery 4 [6][7]. Finally, this removes a redundant (and broken) bit of development config - NOTIFICATION_QUEUE_PREFIX - that should be set in environment.sh [8]. [1]: https://docs.celeryproject.org/en/stable/history/whatsnew-4.0.html#lowercase-setting-names [2]: https://docs.celeryproject.org/en/stable/history/whatsnew-5.0.html#step-2-update-your-configuration-with-the-new-setting-names [3]: `252ad01d39/app/config.py (L27)` [4]: `03df0d9252/app/__init__.py (L33)` [5]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-enable_utc [6]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-task_serializer [7]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-accept_content [8]: `2edbdec4ee/README.md (environmentsh)`	2021-11-01 09:54:05 +00:00
Leo Hemsted	19394ab9dd	construct celery queues once in the base config previously, we were confusing things by appending to CELERY_QUEUES in both dev and test configs - these are executed at import time, so the list contained all queues twice, regardless of what config you're actually using. Fortunately, the -Q command that we supply the workers with overrides this config option, so other environments weren't affected. Given that, we can tidy up this code by just declaring it in the base config every time	2021-11-01 09:54:04 +00:00
Ben Thorner	c2fe1b04bb	Fix test checking for nested exception Previously this type of exception was raised at the top level and the task did not retry [1]. Since Celery 4+ the behaviour changed so that a Retry exception will be raised unless we explicitly say we want to raise the original one [2]. It's unclear if we actually want to retry this task for any type of exception, but it's out-of-scope for this PR to decide on this, so here we just reraise the exception to make it compatible with the new version of Celery and the existing test. [1]: https://github.com/alphagov/notifications-api/pull/2832/files#diff-926badba91648d56a973e16bd92da3345b23bc60dc89360119b1df08de52723fL77 [2]: `32b52ca875 (diff-db604dd7cb51e386710260ff2eba378aac19ba11eec97904bbf097b68caeada6L625)`	2021-11-01 09:54:03 +00:00
Ben Thorner	3e49de5330	Upgrade to Celery 5.1.2 There are several other changes we need to make in order to install the new version. For more context, see: - `208e90e40f` - `e3d1993a58` - `7e93611fce` In the next commits we'll look at tidying up the config and other dependencies so the change is deployable.	2021-11-01 09:54:00 +00:00
Ben Thorner	4125ed3f10	Merge pull request #3352 from alphagov/remove-raw-request-stats-180016688 Revert "add raw request timings to provider send functions"	2021-10-29 15:04:25 +01:00
Ben Thorner	fd7373ed73	Merge pull request #3353 from alphagov/letter-sending-cc-dvla-180096891 CC DVLA in tickets about outstanding letters	2021-10-29 15:04:16 +01:00
Ben Thorner	d1586a8f81	CC DVLA in tickets about outstanding letters Previously we sent them emails about this manually. We also tried a Zendesk macro/trigger approach, but using a CC means: - We can control the behaviour ourselves (Zendesk triggers can only be edited by admins outside our team). - We keep the DVLA notification approach consistent and in one place, so notifications always go to the same people. - Any further (public) updates to the ticket will also trigger a notification to DVLA (previous trigger only notified on creation).	2021-10-29 11:46:29 +01:00
Ben Thorner	64327c10ae	Bump utils to 47.1.0 This includes the new email_ccs feature needed for the next commit, but also an upgrade to bleach [1]. [1]: https://github.com/alphagov/notifications-utils/pull/909	2021-10-29 11:46:28 +01:00
Ben Thorner	3eeba0266b	Revert "add raw request timings to provider send functions" This reverts commit `f2f2509c9b`. Raw request stats were added to investigate a hunch about a performance issue we were seeing [1], but turned out not to be relevant. We don't use them anymore so we can tidy up. [1]: https://github.com/alphagov/notifications-api/pull/2858	2021-10-28 11:12:18 +01:00
Ben Thorner	32873ef70f	Merge pull request #3349 from alphagov/freeze-requirements-180017131 Run "make freeze-requirements"	2021-10-28 10:46:01 +01:00
David McDonald	ffc7aec61c	Merge pull request #3350 from alphagov/normalised_to_update Bug fix: update normalised_to, not just `to` after letter sanitise	2021-10-27 12:12:06 +01:00
David McDonald	5a51ab6131	Bug fix: update normalised_to, not just `to` after letter sanitise When a precompiled letter is sent to us, we set the `to` field as 'Provided as PDF' in `1c1023a877/app/v2/notifications/post_notifications.py (L100-L104)` This then also sets `normalised_to` as `providedaspdf`. However, when template preview sanitises the letter, pulls out the address and gives it to the API, we were only setting `to` to be the new address and had forgotten to also amend `normalised_to` to be the normalised version. This meant that for all these letters we accidentally left `normalised_to` as `providedaspdf`. The impact of this was that we can not then search for these letters in the admin user interface as they rely on the `normalised_to` field containing the recipient address. This commit fixes that bug by also setting the `normalised_to` field	2021-10-27 11:56:25 +01:00
Ben Thorner	32e8f9cbc6	Run "make freeze-requirements" This is so we can clear the diff prior to upgrading to Celery 5, which has a number of transitive package changes associated with it. It makes sense for this to be a separate change in case it causes issues of its own. However, the only major difference in this commit is pyparsing [1]. [1]: https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst	2021-10-27 11:00:48 +01:00
Chris Hill-Scott	2edbdec4ee	Merge pull request #3344 from alphagov/broadcast-event-field Store the `event` field from CAP XML broadcasts	2021-10-26 12:00:17 +01:00
Chris Hill-Scott	54bcf618da	Store the `event` field from CAP XML broadcasts We don’t store everything that comes in the CAP XML when someone creates a broadcast via the API. One thing we do store is `<identifier>` (in a column called `reference`) which is a unique (to the external system) identifier for the broadcast. We show this in the front end instead of the template name, because broadcasts created from the API don’t use templates. However this ID isn’t very friendly – the Environment Agency just supply a UUID. The Environment Agency also populate the `<event>` field with some human readable text, for example: > 013 Issue Severe Flood Warning EA (013 is an area code which will be meaningful to the Flood Warning Service team) We should show this in the UI instead of the reference. The first step towards this is storing it in the database and returning it in the REST endpoints. Later we can have the admin app prefer `cap_event` over `reference`, where `cap_event` is present. We can’t backfill this data because we don’t keep a copy of the original XML. Seems like `<event>` is a mandatory property of `<info>`, so we don’t need to worry about the field being missing (`<info>` is optional in CAP but we require it because it contains stuff like the areas which we need in order to send out the broadcast). *** https://www.pivotaltracker.com/story/show/176927060	2021-10-26 11:12:27 +01:00
Ben Thorner	d703251b13	Merge pull request #3348 from alphagov/better-callback-stats-180016688 Include status in stats about delivery times	2021-10-22 11:59:24 +01:00
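
The batching behaviour described in commits 77c8c0a501 and cdb43fbaf6 (a limited query, an update restricted to the same batch, and a loop that only repeats after a full batch) can be sketched roughly as below. This is a minimal illustration, not the repository's code: the function names, the in-memory list standing in for the database, the batch size, the iteration cap, and the error raised when the cap is reached are all assumptions.

```python
def fetch_batch(pending, limit):
    """Hypothetical stand-in for the SELECT ... LIMIT query."""
    return pending[:limit]


def update_batch(pending, batch):
    """Hypothetical stand-in for the UPDATE, restricted to the fetched ids
    so it operates on exactly the same batch as the query."""
    ids = set(batch)
    pending[:] = [n for n in pending if n not in ids]


def timeout_notifications(pending, limit, max_iterations):
    """Process batches until a non-full batch shows there is no more work.

    Raising once max_iterations full batches have been processed is an
    assumption about how the task reports that work may remain.
    """
    processed = 0
    for _ in range(max_iterations):
        batch = fetch_batch(pending, limit)
        update_batch(pending, batch)
        processed += len(batch)
        if len(batch) < limit:
            # A non-full batch means nothing was left behind: stop looping.
            return processed
    raise RuntimeError("full batch on final iteration; more work may remain")
```

Stopping on the first non-full batch avoids a spurious error when a handful of notifications cross the timeout threshold between iterations, which is the scenario cdb43fbaf6 describes.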

8489 Commits
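
Commit e10f45b3a7 notes that Celery 5 requires worker_max_tasks_per_child to be an int or None. A minimal sketch of that kind of cast, assuming the value arrives as a string from the environment, might look like the following. The environment variable name and the helper function are hypothetical and do not come from the repository.

```python
import os

# Hypothetical variable name, used only for illustration.
ENV_VAR = "CELERY_WORKER_MAX_TASKS_PER_CHILD"


def max_tasks_per_child(env=None):
    """Return the setting as an int, or None when it is unset or empty."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    return int(raw) if raw else None
```

For example, `max_tasks_per_child({ENV_VAR: "20"})` returns the integer 20, while an empty mapping gives None, so workers that do not need periodic restarts are left unaffected.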