notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2026-01-07 19:27:58 -05:00

Author	SHA1	Message	Date
Pea Tyczynska	4a90cde701	Merge pull request #3429 from alphagov/cancel_alert_via_api Cancel broadcast via API	2022-01-21 14:04:35 +00:00
Pea Tyczynska	b6dd189462	Test cancel request via API returns 404 if service id does not match	2022-01-20 18:28:10 +00:00
Pea Tyczynska	52dbdb7518	Move validate_and_update_broadcast_message_status to a utils file This is because that function is used both when broadcast status is updated via API and via admin, so it's a shared resource. Also move and update tests for updating broadcast message status so things are tested at source and repetition is avoided.	2022-01-20 18:14:41 +00:00
Pea Tyczynska	c9afb2f038	Remove unnecessary error handling The context here should be enough for the users, custom error message is not needed.	2022-01-20 18:14:40 +00:00
Ben Thorner	0f6dea0deb	Revert running status aggregation in parallel The top-level task didn't run successfully after this was deployed due to the worker being killed due to heavy disk usage. While the more parallel version does log much more, it doesn't totally explain the disk behaviour. Nonetheless, reverting it is sensible to give us the time we need to investigate more.	2022-01-20 12:22:33 +00:00
Pea Tyczynska	a4c20e8ba6	Return 404 if reference from cancel message does not match If the reference from cancel CAP XML we received via API does not match with any existing broadcast, return 404. Do the same if service id doesn't match. Also refactor code to cancel broadcast out into separate function It should be a separate function that is only called by create_broadcast function. This will prevent create_broadcast from becoming too big and complex and doing too many things.	2022-01-19 15:42:27 +00:00
Pea Tyczynska	3b4a9d8942	Cancel broadcast via API When a service sends us an XML CAP broadcast message with Cancel status, and that broadcast is in broadcasting state, we cancel it.	2022-01-19 15:42:26 +00:00
Pea Tyczynska	940126abfb	Reject unapproved broadcast upon cancel API request When a service sends us a cancel broadcast XML via API, if that broadcast was not approved yet, reject it.	2022-01-19 15:41:38 +00:00
Ben Thorner	6be489daa7	Merge pull request #3425 from alphagov/parallelise-ft-status-180693991 Parallelise status aggregation by service and day	2022-01-19 13:49:28 +00:00
Ben Thorner	9182ebf4e5	Parallelise status aggregation by service and day This follows a similar approach as [1]. Recently we've seen lots of errors from this task, which we think are a consequence of it doing too much work and tripping Celery's visibility timeout. While we can optimise the query [2], it's likely the errors will return as the number of live services grows. Parallelising the aggregation now will make it more futureproof. [1]: https://github.com/alphagov/notifications-api/pull/3397 [2]: https://github.com/alphagov/notifications-api/pull/3417	2022-01-12 15:47:59 +00:00
Ben Thorner	9fc8b904c6	DRY up status aggregation tests (move DAO tests up) The previous DAO tests were also confusing because they were testing two functions at the same time, so moving the tests up to the task level seems very reasonable, and will make it easier to change how this code works in the next commits.	2022-01-11 16:11:36 +00:00
Katie Smith	ed725c1513	Add endpoint to allow org team members to be removed This is similar to the corresponding endpoint for services. However, it is a little simpler since we don't need to worry about always having at least one team member for an organisation. The new dao function added, `dao_remove_user_from_organisation`, is also simpler than `dao_remove_user_from_service` since we don't have any organisation permissions to deal with.	2022-01-11 15:20:48 +00:00
Ben Thorner	081e0cab88	Merge pull request #3417 from alphagov/optimise-status-query-180693991 Optimise query to populate notification statuses	2022-01-11 14:18:36 +00:00
Rebecca Law	2257cae398	Fix bug in organisation report for its services and usages. If a service has not sent any SMS for the financial year the free allowance was showing up as 0 rather than the number in annual billing. The query has been updated to use an outer join so that the free allowance will be returned when there is no ft_billing. There is a potential performance enhancement to only return the data for the services of the organisation in the `fetch_sms_free_allowance_remainder_until_date` subquery. I will investigate in a subsequent PR.	2022-01-11 10:04:36 +00:00
Ben Thorner	394bf9abd9	Extend test for updating fact statuses This covers that we only exclude test notifications and the key type is copied over correctly. In the next commits we're going to modify this part of the query, so it's important it's covered.	2022-01-05 16:49:30 +00:00
Katie Smith	13b6d1e490	Remove unused test function `set_up_get_all_from_hash` stopped being used in `52831813d8`	2022-01-04 14:04:03 +00:00
Katie Smith	3530d26ba3	Use `client` fixture everywhere There were a few tests which weren't using the `client` fixture but were using the code it contains. This simplifies them to use the fixture.	2022-01-04 14:04:03 +00:00
Katie Smith	0b7410818e	Allow `get_all_notifications_for_service` to accept POST requests We want admin to send a POST request to this route if the data contains a message recipient (a phone number or email address) so that this does not show in the logs. This changes the route to accept both GET and POST requests.	2022-01-04 14:04:03 +00:00
David McDonald	3a214da379	Merge pull request #3408 from alphagov/db-connection-close Close DB connection whilst making HTTP to SMS providers	2021-12-22 11:02:13 +00:00
David McDonald	2584946823	Close DB connection whilst making HTTP to SMS providers At the moment, when we are processing and sending an SMS we open a DB connection at the start of the celery task and then close it at the end of the celery task. Nice and simple. However, during that celery task we make an HTTP call out to our SMS providers. If our SMS providers have problems or response times start to slow then it means we have an open DB connection sat waiting for our SMS providers to respond which could take seconds. If our SMS providers grind to a halt, this would cause all of the celery tasks to hold on to their connections and we would run out of DB connections and Notify would fall over. We think we can solve this by closing the DB session which releases the DB connection back to the pool. Note, we've seen this happen in staging during load testing if our SMS provider stub has fallen over. We've never seen it in production and it may be less likely to happen as we are balancing traffic across two providers and they generally have very good uptime. One downside to be aware of is there could be a slight increase in time spent to send an SMS as we will now spend a bit of extra time closing the DB session and then reopening it again after the HTTP request is done. Note, there is no reason this approach couldn't be copied for our email provider too if it appears successful.	2021-12-21 17:45:53 +00:00
Pea Tyczynska	32cd7a0eb6	Merge pull request #3395 from alphagov/fix_org_usage_report Fix calculating remaining free allowance for SMS	2021-12-21 15:02:54 +00:00
Pea Tyczynska	d334e405c5	Refactor tests for sms remainder to make them easier to read	2021-12-21 14:43:56 +00:00
Ben Thorner	de9ae08ecc	Downgrade log for letter deletion exceptions If the S3 object is missing [1], then that's what we want, so we don't need such a severe log for it, but we still want to know as it's not expected. This is separate to more general "ClientError" exceptions, which could mean anything. There weren't any tests to cover missing S3 objects, so I've added one. I don't think we need a test for ClientErrors: - If there was no handler, the task would fail and we'd learn about it that way. - The scope of the calling task is now much smaller, so it matters less than it used to [2]. [1]: `81a79e56ce/app/letters/utils.py (L52)` [2]: `f965322f25`	2021-12-20 12:45:48 +00:00
Leo Hemsted	0dc0e184b9	clean up and rewrite notification_dao_delete_notifications a bunch of these tests are now covered in the task test, so got rid of some. Now that the "how long ago to delete" question is asked in the task rather than in the dao, and only one service is looked at at a time, we don't need to worry about data retention, etc. Hopefully made the tests simpler - there may still be some duplicates or overlaps between the various cases.	2021-12-14 15:24:35 +00:00
Leo Hemsted	49cc1b643f	split delete task up into per service we really don't gain anything by running each service delete in sequence - we get the services, and then just loop through them deleting per service. By deleting per service in separate tasks, we can take advantage of parallelism. the only thing we lose is some log lines but I don't think we're that interested in them. only set query limit at the move_notifications dao function - the task doesn't really care about the technical implementation of how it deletes the notifications	2021-12-14 15:24:34 +00:00
Ben Thorner	c1f0c24d82	Trim down tests for DAO timeout function a bit The first test is enough to cover that "created" and "delivered" notifications aren't affected by this function.	2021-12-13 17:17:41 +00:00
Ben Thorner	87cd40d00a	Scale timeout task to work on arbitrary volumes Previously this was limited to 500K notifications. While we don't expect to reach this limit, it's not impossible e.g. if we had a repeat of the incident where one of our providers stopped sending us status updates. Although that's not great, it's worse if our code can't cope with the unexpectedly high volume. This reuses the technique we have elsewhere [1] to keep processing in batches until there's nothing left. Specifying a cutoff point means the total amount of work to do can't keep growing. [1]: `2fb432adaf/app/dao/notifications_dao.py (L441)`	2021-12-13 17:14:28 +00:00
Ben Thorner	76aeab24ce	Rewrite DAO timeout method to take cutoff_time Previously we specified the period and calculated the cutoff time in the function. Passing it in means we can run the method multiple times and avoid getting "new" notifications to time out in the time it takes to process each batch.	2021-12-13 16:56:21 +00:00
Ben Thorner	b81a66da50	Fix assertions in tests for timeout DAO function Previously most of the assertions were being run before we had actually called the function. There was also a redundant block of assertions that just asserted the initial state of the test data.	2021-12-13 16:48:30 +00:00
Ben Thorner	2fb432adaf	Merge pull request #3383 from alphagov/email-sms-created-alert-180344153 Add new log / alert for 'created' email / SMS	2021-12-13 12:56:05 +00:00
David McDonald	7d8eed8228	Optimise queries run for creating pagination links We have been running into the problem in pallets/flask-sqlalchemy#518 where our page loads very slowly when viewing a single page of notifications for a service in the admin app. Tracing this back and using SQL explain analyze I can see that getting the notifications takes about a second but the second query to count how many notifications there are (to work out if there is a next page of pagination) can take up to 100 seconds. As suggested in that issue, we do the pagination ourselves. Our pagination doesn't need us to know exactly how many notifications there are, just whether there are any on the next page and that can be done without running the slow query to count how many notifications in total by using `count_pages=False`. This commit is analogous to `c68d1a2f23` The only difference is that in that case, the pagination links are used to show prev and/or next links in the admin app. In this case, the pagination links are only used to see if there is a page 2, and if there is, say that we are only showing the first 50 results.	2021-12-10 17:47:27 +00:00
David McDonald	edadeb9131	Use `get_prev_next_pagination_links` when searching by to field The only change in behaviour is that we are no longer including a `last` pagination link. This is OK because the frontend doesn't use it, just the prev and next links as per https://github.com/alphagov/notifications-admin/blob/master/app/main/views/jobs.py#L248	2021-12-10 12:29:55 +00:00
David McDonald	6ac4e67f78	Add test for pagination behaviour We already have a test case for over 50 results, but this adds one for 50 (ie a single page of results or less)	2021-12-10 12:29:12 +00:00
Pea Tyczynska	a74d1b026f	Fix calculating remaining free allowance for SMS The way it was done before, the remainder was incorrect in the billing report and in the org usage query - it was the sms remainder left at the start of the report period, not at the end of that period. This became apparent when we tried to show sms_remainder on the org usage report, where start date is always the start of the financial year. We saw that sms sent by services did not reduce their free allowance remainder according to the report. As a result of this, we had to temporarily remove the sms_remainder column from the report, until we fix the bug - it has been fixed now, yay! I think the bug has snuck in partially because our fixtures for testing this part of the code are quite complex, so it was harder to see that numbers don't add up. I have added comments to the tests to try and make it a bit clearer why the results are as they are. I also added comments to the code, and renamed some variables, to make it easier to understand, as there are quite a few moving parts in it - subqueries and the like. I also renamed the fetch_sms_free_allowance_remainder method to fetch_sms_free_allowance_remainder_until_date so it is clearer what it does.	2021-12-09 18:58:10 +00:00
Ben Thorner	a8edfeb941	Remove command to replay callbacks In response to [1]. I've already removed the runbook that referred to this. [1]: https://github.com/alphagov/notifications-api/pull/3383#discussion_r765644576	2021-12-09 10:46:19 +00:00
Ben Thorner	ab4cb029df	Remove alert for email / sms in created In response to [1]. [1]: https://github.com/alphagov/notifications-api/pull/3383#discussion_r759379988 It turns out the code that inspired this new alert - in the old "timeout-sending-notifications" task - was actually redundant as we already have a task to "replay" notifications still in "created", which is much better than just alerting about them. It's possible the replayed notifications will also fail, but in both cases we should see some kind of error due to this, so I don't think we're losing anything by not having an alert.	2021-12-06 14:11:42 +00:00
Ben Thorner	9bd2a9b427	Extract tests for conditionally creating callback This will help ensure the function doesn't change arbitrarily, now that it's used in multiple other places.	2021-12-06 14:11:41 +00:00
Ben Thorner	04da017558	DRY-up conditionally creating callback tasks This removes 3 duplicate instances of the same code, which is still tested implicitly via test_process_ses_receipt_tasks [1]. In the next commit we'll make this test more explicit, to reflect that it's now being reused elsewhere and shouldn't change arbitrarily. We do lose the "print" statement from the command instance of the code, but I think that's a very tolerable loss. [1]: `16ec8ccb8a/tests/app/celery/test_process_ses_receipts_tasks.py (L94)`	2021-12-06 14:11:34 +00:00
Ben Thorner	aea555fce2	Make test for timeout with callbacks consistent This now matches the behaviour of the test above it: mocking out the DAO function in order to focus on the specific behaviour of the function under test.	2021-12-06 14:00:42 +00:00
Ben Thorner	5bf3fe6c0f	Clarify no callbacks are sent in timeout test This now complements the test below it, which we will refactor to be consistent in the next commit.	2021-12-06 14:00:41 +00:00
Ben Thorner	8b7e81958d	Delete duplicate 'timeout' tests for notifications These scenarios are already covered by the DAO tests. It's enough to just check the DAO function is called as expected. While sometimes it can be better to have more end-to-end tests, the convention across much of this app is to do unit tests.	2021-12-06 14:00:40 +00:00
Ben Thorner	c3e11d676f	Remove unnecessary test_request_context manager This doesn't affect how the tests run and just adds complexity.	2021-12-06 14:00:39 +00:00
Ben Thorner	05bd26d444	Fix names for a few tests in test_nightly_tasks.py I find it really difficult to visually parse test files unless we have a consistent convention for how we name our test functions. In most of our tests the name of the test function starts with the name of the function under test.	2021-12-06 14:00:38 +00:00
Ben Thorner	0318229216	Stop 'timing out' old 'created' notifications This is being replaced with a new alert and runbook [1]. It's not always appropriate to change the status to 'technical-failure', and the new alert means we'll act to fix the underlying issue promptly. We'll look at tidying up the remaining code in the next commits. [1]: https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-email-or-sms-still-in-created	2021-12-06 14:00:36 +00:00
Ben Thorner	2acc4ee67d	Repurpose command to replay notification callbacks This is so we can use it to address issues highlighted by the new alert, if it's not possible to actually send the notifications e.g. if they are somehow 'invalid'. Previously this was added for a one-off use case [1]. This rewrites the task to operate on arbitrary notification IDs instead of client refs, which aren't always present for notifications we may want to send / replay callbacks for. Since the task may now need to work on notifications from more than one service, I had to restructure it to cope with multiple callback APIs. Note that, in the test, I've chosen to do a chain of invocations and assertions, rather than duplicate a load of boilerplate or introduce funky parametrize flags for a service with/out a callback API. We'll refactor this in a later commit. [1]: `e95740a6b5`	2021-12-06 14:00:35 +00:00
Ben Thorner	f96ba5361a	Add new task to alert about created email / sms This will log an error when email or SMS notifications have been stuck in 'created' for too long - normally they should be 'sending' in seconds, noting that we have a goal of < 10s wait time for most notifications being processed by our platform. In the next commits we'll decouple similar functionality from the existing 'timeout-sending-notifications' task.	2021-12-06 14:00:31 +00:00
David McDonald	989ef9c21a	Remove `last` and `total` keys from pagination links These don't appear to be used anywhere in the admin app and this route is only used by the admin app. Therefore it is safe to remove them. We remove them because calculating the total number of notifications or the final page number of results can be particularly slow for services with many, many notifications, for example 100 seconds for a service with 500k notifications sent in the past 7 days. Given neither are being used, this will give us the potential in the next commit to reduce the number of slow queries and improve page load times. Note, I've kept the scope small by only introducing the new pagination function for this one endpoint but there could be scope in future to get all pagination using the next function if appropriate.	2021-12-03 17:26:49 +00:00
David McDonald	a62e63fcef	Add tests for existing pagination behaviour No functionality change, just documenting what already exists	2021-12-03 17:21:14 +00:00
Ben Thorner	6435b57cd1	Merge pull request #3384 from alphagov/fix-cronitor-test-180344153 Fix flakey Cronitor test using caplog fixture	2021-12-01 12:44:27 +00:00
Ben Thorner	aea5d601f2	Fix flakey Cronitor test using caplog fixture This appears to not be thread safe: it started failing when run in parallel with other tests in this PR [1]. We don't get much out of using caplog over patching - it just proves our logging config isn't swallowing the error logs, which we shouldn't need to test here. [1]: https://github.com/alphagov/notifications-api/pull/3383	2021-11-26 17:17:45 +00:00
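
Commits 87cd40d00a and 76aeab24ce describe timing out notifications in batches until nothing is left, with a cutoff time passed in so the amount of work cannot keep growing. A minimal sketch of that idea follows. It is not the repository's code: the in-memory list, the `timed-out` status, and the names `fetch_batch` and `timeout_in_batches` are all hypothetical stand-ins for the real DAO and database.

```python
from datetime import datetime, timedelta


def fetch_batch(notifications, cutoff_time, batch_size):
    # Hypothetical stand-in for a DAO query: 'sending' rows older than the cutoff.
    matching = [
        n for n in notifications
        if n["status"] == "sending" and n["created_at"] < cutoff_time
    ]
    return matching[:batch_size]


def timeout_in_batches(notifications, cutoff_time, batch_size=2):
    # The cutoff is fixed by the caller, so rows created while we are
    # processing never become eligible and the loop is guaranteed to end.
    total = 0
    while True:
        batch = fetch_batch(notifications, cutoff_time, batch_size)
        if not batch:
            break
        for notification in batch:
            # 'timed-out' is an illustrative status, not the app's real one.
            notification["status"] = "timed-out"
        total += len(batch)
    return total
```

Computing the cutoff once, for example `datetime.utcnow() - timedelta(days=3)`, and passing it to every batch is what keeps "new" notifications out of later batches.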

4131 Commits
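
Commits 7d8eed8228 and 989ef9c21a note that pagination only needs to know whether a next page exists, not how many rows there are in total. One common way to get that without a count query is to fetch one row more than the page size. The sketch below illustrates this under assumptions: it is not the repository's `count_pages=False` implementation, and `get_page_with_next_flag` and the `fetch` callable are hypothetical names.

```python
def get_page_with_next_flag(fetch, page, page_size):
    # Ask for one extra row; if it comes back, a next page exists.
    offset = (page - 1) * page_size
    rows = fetch(offset=offset, limit=page_size + 1)
    has_next = len(rows) > page_size
    # Only return the rows that belong on this page.
    return rows[:page_size], has_next


def make_list_fetch(items):
    # Hypothetical stand-in for a database query with OFFSET / LIMIT.
    def fetch(offset, limit):
        return items[offset:offset + limit]
    return fetch
```

With this shape, a result set of exactly one page reports no next page, which matches the single-page case that commit 6ac4e67f78 adds a test for.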