notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2026-01-18 16:41:56 -05:00

Author	SHA1	Message	Date
Leo Hemsted	49cc1b643f	split delete task up into per service we really don't gain anything by running each service delete in sequence - we get the services, and then just loop through them deleting per service. By deleting per service in separate tasks, we can take advantage of parallelism. the only thing we lose is some log lines but I don't think we're that interested in them. only set query limit at the move_notifications dao function - the task doesn't really care about the technical implementation of how it deletes the notifications	2021-12-14 15:24:34 +00:00
Ben Thorner	87cd40d00a	Scale timeout task to work on arbitrary volumes Previously this was limited to 500K notifications. While we don't expect to reach this limit, it's not impossible e.g. if we had a repeat of the incident where one of our providers stopped sending us status updates. Although that's not great, it's worse if our code can't cope with the unexpectedly high volume. This reuses the technique we have elsewhere [1] to keep processing in batches until there's nothing left. Specifying a cutoff point means the total amount of work to do can't keep growing. [1]: `2fb432adaf/app/dao/notifications_dao.py (L441)`	2021-12-13 17:14:28 +00:00
Ben Thorner	76aeab24ce	Rewrite DAO timeout method to take cutoff_time Previously we specified the period and calculated the cutoff time in the function. Passing it in means we can run the method multiple times and avoid getting "new" notifications to time out in the time it takes to process each batch.	2021-12-13 16:56:21 +00:00
Ben Thorner	ab4cb029df	Remove alert for email / sms in created In response to [1]. [1]: https://github.com/alphagov/notifications-api/pull/3383#discussion_r759379988 It turns out the code that inspired this new alert - in the old "timeout-sending-notifications" task - was actually redundant as we already have a task to "replay" notifications still in "created", which is much better than just alerting about them. It's possible the replayed notifications will also fail, but in both cases we should see some kind of error due to this, so I don't think we're losing anything by not having an alert.	2021-12-06 14:11:42 +00:00
Ben Thorner	9bd2a9b427	Extract tests for conditionally creating callback This will help ensure the function doesn't change arbitrarily, now that it's used in multiple other places.	2021-12-06 14:11:41 +00:00
Ben Thorner	04da017558	DRY-up conditionally creating callback tasks This removes 3 duplicate instances of the same code, which is still tested implicitly via test_process_ses_receipt_tasks [1]. In the next commit we'll make this test more explicit, to reflect that it's now being reused elsewhere and shouldn't change arbitrarily. We do lose the "print" statement from the command instance of the code, but I think that's a very tolerable loss. [1]: `16ec8ccb8a/tests/app/celery/test_process_ses_receipts_tasks.py (L94)`	2021-12-06 14:11:34 +00:00
Ben Thorner	aea555fce2	Make test for timeout with callbacks consistent This now matches the behaviour of the test above it: mocking out the DAO function in order to focus on the specific behaviour of the function under test.	2021-12-06 14:00:42 +00:00
Ben Thorner	5bf3fe6c0f	Clarify no callbacks are sent in timeout test This now complements the test below it, which we will refactor to be consistent in the next commit.	2021-12-06 14:00:41 +00:00
Ben Thorner	8b7e81958d	Delete duplicate 'timeout' tests for notifications These scenarios are already covered by the DAO tests. It's enough to just check the DAO function is called as expected. While sometimes it can be better to have more end-to-end tests, the convention across much of this app is to do unit tests.	2021-12-06 14:00:40 +00:00
Ben Thorner	c3e11d676f	Remove unnecessary test_request_context manager This doesn't affect how the tests run and just adds complexity.	2021-12-06 14:00:39 +00:00
Ben Thorner	05bd26d444	Fix names for a few tests in test_nightly_tasks.py I find it really difficult to visually parse test files unless we have a consistent convention for how we name our test functions. In most of our tests the name of the test function starts with the name of the function under test.	2021-12-06 14:00:38 +00:00
Ben Thorner	0318229216	Stop 'timing out' old 'created' notifications This is being replaced with a new alert and runbook [1]. It's not always appropriate to change the status to 'technical-failure', and the new alert means we'll act to fix the underlying issue promptly. We'll look at tidying up the remaining code in the next commits. [1]: https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-email-or-sms-still-in-created	2021-12-06 14:00:36 +00:00
Ben Thorner	f96ba5361a	Add new task to alert about created email / sms This will log an error when email or SMS notifications have been stuck in 'created' for too long - normally they should be 'sending' in seconds, noting that we have a goal of < 10s wait time for most notifications being processed by our platform. In the next commits we'll decouple similar functionality from the existing 'timeout-sending-notifications' task.	2021-12-06 14:00:31 +00:00
Ben Thorner	528223ed61	Use central NotifyCelery base class in utils Note that the new base class doesn't include a bespoke feature we had here: 'log_on_worker_shutdown'. We've agreed it's reasonable to remove it for now as it was introduced many years ago and its use case is unclear - we can always add it back if needed.	2021-11-16 13:58:12 +00:00
Chris Hill-Scott	0236318189	Republish gov.uk/alerts every night to clear down planned tests We have made it so that gov.uk/alerts shows a ‘1 planned test’ banner for the whole of the day when there has been an operator test on that day. We need to remove the banner when the day is over. The most straightforward way to do this is to republish the site at the start of every day. The gov.uk/alerts code[1] will work out if there are or aren’t any planned tests to show that day. 1. `5a274af6d0/app/models/alerts.py (L38-L44)`	2021-11-15 14:23:32 +00:00
Ben Thorner	d66c68d6d6	Merge pull request #3364 from alphagov/celery-headers-request-id-180213914 Move Celery task Request ID injection into headers	2021-11-12 11:10:29 +00:00
Ben Thorner	89a8dd1a03	Move Celery task Request ID injection into headers Previously we passed along this piece of state via the kwargs for a task, but this runs the risk of the task accidentally receiving the extra kwarg unless we've covered all the code paths that could invoke it directly e.g. retries don't invoke __call__. This switches to using Celery "headers" to pass the extra state. It turns out that Celery has two "header" concepts, which leads to some confusion and even a bug with the framework [1]: - In older (pre v4.4) versions of Celery, the "headers" specified by apply_async() would become _the_ headers in the message that gets passed around workers, etc. These would be available later on via "self.request.headers". - Since Celery protocol v2, the meaning of "headers" in the message changed to become (basically) _all_ metadata about the task [2], with the "headers" option in apply_async() being merged [3] into the big dict of metadata. This makes using headers a bit confusing unfortunately, since the data structure we put in is subtly different to what comes out in the request context. Nonetheless, it still works. I've added some comments to try and clarify it. Note that one of the original tests is no longer necessary, since we don't need to worry about argument passing styles with headers. [1]: https://github.com/celery/celery/issues/4875 [2]: `663e4d3a0b (diff-07a65448b2db3252a9711766beec23372715cd7597c3e309bf53859eabc0107fR343)` [3]: `681a922220/celery/app/amqp.py (L495)`	2021-11-10 18:03:40 +00:00
Katie Smith	3d4796c924	Add task to resanitise and replace a PDF for precompiled letter This adds a task which is designed to be used if we want to recreate the PDF for a precompiled letter (either one that has been created using the API or one that has been uploaded through the website). The task takes the `notification_id` of the letter and passes template preview the details it needs in order to sanitise the original file and then replace the version in the letters-pdf bucket with the freshly sanitised version.	2021-11-10 09:51:31 +00:00
Ben Thorner	d1586a8f81	CC DVLA in tickets about outstanding letters Previously we sent them emails about this manually. We also tried a Zendesk macro/trigger approach, but using a CC means: - We can control the behaviour ourselves (Zendesk triggers can only be edited by admins outside our team). - We keep the DVLA notification approach consistent and in one place, so notifications always go to the same people. - Any further (public) updates to the ticket will also trigger a notification to DVLA (previous trigger only notified on creation).	2021-10-29 11:46:29 +01:00
David McDonald	5a51ab6131	Bug fix: update normalised_to, not just `to` after letter sanitise When a precompiled letter is sent to us, we set the `to` field as 'Provided as PDF' in `1c1023a877/app/v2/notifications/post_notifications.py (L100-L104)` This then also sets `normalised_to` as `providedaspdf`. However, when template preview sanitises the letter, pulls out the address and gives it to the API, we were only setting `to` to be the new address and had forgotten to also amend `normalised_to` to be the normalised version. This meant that for all these letters we accidentally left `normalised_to` as `providedaspdf`. The impact of this was that we can not then search for these letters in the admin user interface as they rely on the `normalised_to` field containing the recipient address. This commit fixes that bug by also setting the `normalised_to` field	2021-10-27 11:56:25 +01:00
Ben Thorner	d703251b13	Merge pull request #3348 from alphagov/better-callback-stats-180016688 Include status in stats about delivery times	2021-10-22 11:59:24 +01:00
Ben Thorner	f974108934	Include status in stats about delivery times Previously these metrics weren't very useful because they could be skewed by long timings for failed notifications, which can take up to 72 hours to deliver. I'm intentionally not trying to have a dual running period (with the old and new names) because: - We don't use the current stats for anything (checking Grafana). - The current stats get turned into a "bucket" metric in Prometheus [1][2], which isn't very useful because it can only tell us the mean time to deliver, but we're actually interested in percentiles. Switching to a new naming is an opportunity to fix the raw data and the way it's aggregated, using the same kind of "summary" metric that we now use for stats about our Celery tasks [3]. [1]: `c330a8ac8a/paas/statsd/statsd-mapping.yml (L82)` [2]: https://prometheus.io/docs/practices/histograms/#quantiles [3]: https://github.com/alphagov/notifications-aws/pull/890	2021-10-20 17:22:59 +01:00
Leo Hemsted	0b8c6ef263	Merge pull request #3339 from alphagov/letter-runbook-link tweak zendesk message for no ack files alert	2021-10-20 15:23:33 +01:00
Pea Tyczynska	1b6f9505da	Call `publish-govuk-alerts` task when alert expires The `auto-expire-broadcast-messages` task checks for expired broadcasts at five minute intervals. This change now calls the `publish-govuk-alerts` task in govuk-alerts if there are expired broadcasts so that the site is updated. Co-authored-by: Katie Smith <katie.smith@digital.cabinet-office.gov.uk>	2021-10-18 08:41:25 +01:00
Katie Smith	04bfd6bfdb	Trigger task to publish alerts when sending or cancelling alert When we send or cancel a broadcast message, we now trigger a task in govuk-alerts repo that polls our API for alerts and publishes a fresh list of alerts. Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>	2021-10-18 08:41:24 +01:00
Leo Hemsted	b8c4e19072	tweak zendesk message for no ack files alert include a link to a runbook entry. also the list of acknowledgement files can be very long, so make that the last thing, and use new lines to space out the message.	2021-10-08 13:45:02 +01:00
Katie Smith	58597653df	Update how "sending to TV numbers" Zendesk tickets are created	2021-09-29 11:26:20 +01:00
Katie Smith	0c0c7f4478	Update how "letters still created status" Zendesk tickets are created	2021-09-29 11:23:28 +01:00
Katie Smith	2f66e38fb9	Update how "missing ackfile for letters" Zendesk tickets are created	2021-09-29 11:10:50 +01:00
Katie Smith	64c0a3fb9d	Update how 'letters still sending' Zendesk tickets are created These now use the new Zendesk form.	2021-09-29 11:07:37 +01:00
Katie Smith	b114dadcae	Update how pending virus check Zendesk tickets are created This updates the tickets that are created when the `check_if_letters_still_pending_virus_check` scheduled task detects letters in the `pending-virus-check` state.	2021-09-29 11:03:48 +01:00
Katie Smith	9ff0ca0363	Update how live broadcast Zendesk tickets are created These now use the Notify Form in Zendesk	2021-09-29 10:59:07 +01:00
sakisv	9faa3d34e1	Fix tests Specifically, no longer test for a p1 zendesk when sending an alert and drop misleading "p1" from test name when cancelling an alert. We're no longer creating a P1 from the code, but we _do_ create a zendesk ticket when sending out an alert. When cancelling, what we want to test is that we don't create a second ticket when the alert is cancelled.	2021-09-10 10:14:28 +03:00
Ben Thorner	bf0bf4e31c	Favour new "areas" format for PagerDuty alerts Broadcasts created via the API [1] and the Admin app [2] should both now have this field set. It's also more informative to show this, and broadcasts created via the API don't have IDs anyway. There's a small risk that an old broadcast that gets approved won't have this data, but it's for information only and we intend to backfill all old broadcasts in the near future. [1]: `023a06d5fb` [2]: `7dbe3afa19`	2021-08-27 14:22:12 +01:00
Ben Thorner	a7d92b9058	Replace / remove redundant uses of "areas" In one case ("areas=['manchester']") the format was even invalid, but in general the original value of the column is pretty much irrelevant for tests that involve updating it (it's highly unlikely the column would default to the same value as the test data).	2021-08-27 13:31:49 +01:00
Ben Thorner	312a895822	Merge pull request #3294 from alphagov/auto-expire-alerts-178926353 Auto expire old broadcast messages	2021-07-22 09:53:41 +01:00
Ben Thorner	5e9d8e5fa0	Auto expire old broadcast messages Since the expiry is sent as part of the message payload, we don't need to invoke the CBC proxies (and indeed there's no way to do so for an expired alert). In future we plan to extend this task so it triggers the regeneration of content on gov.uk/alerts. It's worth noting that 'finishes_at' can theoretically be None, in which case it's unclear when the alert should expire. While alerts from the Admin app should always have an expiry [1], we have many in the DB that don't, so it's worth checking for this scenario. [1]: `078ac10c8d/app/models/broadcast_message.py (L255)`	2021-07-21 13:05:11 +01:00
Ben Thorner	08f48379b4	Move ID generation into link test method Unlike the other IDs which are stored in the DB, this isn't relevant for the Celery task as it invokes a link test. Moving it into the proxy client will also enable us to generate a second ID in the next commits, where we start doing a link test for the failover lambda.	2021-07-19 16:00:55 +01:00
Ben Thorner	b6774bf0f7	Generate Vodafone link test sequence nos in proxy Previously the Celery task to trigger a link test had to know about the special case of a sequence number for Vodafone. Since we're about to change the client to perform multiple tests it makes sense to give it the knowledge of how to generate the number itself. Note that we have to import the db inline to avoid a circular import, since this module is itself imported by app/__init__.py. Other invocations of the Vodafone client use stored sequence numbers from the DB, which are called "message numbers" in that context. Since the two use cases are very different (even the names are different!), having them in two places shouldn't cause any confusion.	2021-07-19 15:43:36 +01:00
Leo Hemsted	2ad9a3a380	retry service callbacks on 429 if we're served a 429, put the item on the retry queue and retry the same as if the service returned a 5xx. 429 is commonly returned for rate limit exceeding, and retrying on a delay is a typical response to that.	2021-07-13 16:09:17 +01:00
Pea Tyczynska	c28e9451d4	Bump moto version to try solve dependencies version conflict Also update mock import statements in some test files as they stopped working with this dependency update.	2021-07-08 15:37:19 +01:00
Rebecca Law	18dd9050a4	- make sure when processing a job that we check the total_sent + job.notification_count against the service.message_limit.	2021-06-28 13:07:48 +01:00
Rebecca Law	fd7486d751	- Merge daily limit functions into one, refactor call for daily limit check from process_job - refactor tests to standardise test names - refactor some tests to be more clear - remove unnecessary tests - include missing test	2021-06-24 11:05:22 +01:00
Rebecca Law	35b20ba363	Correct the daily limits cache. Last year we had an issue with the daily limit cache and the query that was populating it. As a result we have not been checking the daily limit properly. This PR should correct all that. The daily limit cache is now being incremented in app.notifications.process_notifications.persist_notification, this method is and should always be the only method used to create a notification. We increment the daily limit cache if redis is enabled (and it is always enabled for production) and the key type for the notification is team or normal. We check if the daily limit is exceeded in many places: - app.celery.tasks.process_job - app.v2.notifications.post_notifications.post_notification - app.v2.notifications.post_notifications.post_precompiled_letter_notification - app.service.send_notification.send_one_off_notification - app.service.send_notification.send_pdf_letter_notification If the daily limits cache is not found, set the cache to 0 with an expiry of 24 hours. The daily limit cache key is service_id-yyy-mm-dd-count, so each day a new cache is created. The best thing about this PR is that the app.service_dao.fetch_todays_total_message_count query has been removed. This query was not performant and had been wrong for ages.	2021-06-22 16:15:36 +01:00
David McDonald	be035664c4	Add operator channel to broadcast settings route Looks identical to the government channel in terms of the interface	2021-06-09 13:49:06 +01:00
Rebecca Law	1bf5ce08b2	Add an error log for alert tasks. Many of the team members do not look at emails from zendesk, adding a current_app.logger.error message for things we care about to give developers a better chance of seeing them. I have purposely not added an error log for `check_for_services_with_high_failure_rates_or_sending_to_tv_numbers` because it's not something we need to look at immediately.	2021-05-26 11:06:21 +01:00
Katie Smith	829b646931	Allow "government" in broadcast_channel schema This will allow admin to pass through a value of "government" for the broadcast_channel. We don't have any logic around the value of service.broadcast_channel, so no updates are needed to the tasks etc.	2021-05-11 16:56:56 +01:00
Katie Smith	4624328c36	Make service_broadcast_settings.provider non-nullable We set all existing null values to "all", then make the column non-nullable. Admin is already passing through the value of "all".	2021-05-10 15:59:22 +01:00
Katie Smith	1767535def	Allow service.allowed_broadcast_provider to be "all" We want to replace the value `None` for service.allowed_broadcast_provider with the value of "all". As a first step, we need to allow both values. Once notifications-admin has been changed to pass through "all" and all the data in the database has been updated, we can update the code to stop supporting both values.	2021-05-06 15:32:02 +01:00
Katie Smith	8365c749e4	Change letter zip file names for Insolvency Service letters DVLA would like to be able to identify letters sent by the Insolvency Service, so we are changing the zipfile name. They need all zipfile names to have the same structure, so we can't just add a marker to files sent by that service - we have to change all filenames. The new format is like this: `{NOTIFY}.{DATE}.{SEQUENCE_ID}.{UNIQUE_ID}.{SERVICE_ID}.{ORG_NAME}.{EXTENSION}`	2021-05-06 09:18:44 +01:00
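
Commits 87cd40d00a and 76aeab24ce above describe processing timed-out notifications in batches until nothing is left, with the cutoff time computed once and passed in so that later batches cannot pick up "new" notifications. A minimal sketch of that idea follows. It is not the repository's code: `fetch_batch`, `timeout_notifications`, the in-memory `rows` list, the `timed-out` status, the batch size and the 72-hour period are all hypothetical stand-ins chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def fetch_batch(notifications, cutoff_time, limit):
    """Hypothetical stand-in for a DAO query: return up to `limit` rows
    that are still 'sending' and were sent before the fixed cutoff."""
    stale = [
        n for n in notifications
        if n["status"] == "sending" and n["sent_at"] < cutoff_time
    ]
    return stale[:limit]


def timeout_notifications(notifications, cutoff_time, batch_size=2):
    """Keep processing batches until none are left.

    The caller computes `cutoff_time` once, so the set of eligible rows
    can only shrink and the loop is guaranteed to finish.
    """
    total = 0
    while True:
        batch = fetch_batch(notifications, cutoff_time, batch_size)
        if not batch:
            return total
        for n in batch:
            n["status"] = "timed-out"  # illustrative status name
        total += len(batch)


# Assumed example data: five rows older than the cutoff, one newer.
now = datetime(2021, 12, 13, 17, 0, tzinfo=timezone.utc)
cutoff = now - timedelta(hours=72)  # period chosen arbitrarily here
rows = [
    {"id": i, "status": "sending", "sent_at": cutoff - timedelta(minutes=i + 1)}
    for i in range(5)
]
rows.append({"id": 5, "status": "sending", "sent_at": cutoff + timedelta(minutes=1)})

updated = timeout_notifications(rows, cutoff)
print(updated)
```

Had the cutoff been recalculated inside the loop from the current time, rows that aged past the threshold during processing would keep joining the work list, which is the situation the second commit message says it avoids.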

867 Commits