This deletes a big ol' chunk of code related to letters. It's not everything—there are still a few things that might be tied to SMS/email—but it's the heart of the letters functionality. SMS and email functionality should be untouched by this.
Areas affected:
- Things obviously about letters
- PDF tasks, used for precompiling letters
- Virus scanning, used for those PDFs
- FTP, used to send letters to the printer
- Postage stuff
When we cloned the repository and started making modifications, we
didn't initially keep tests in step. This commit tries to get us to a
clean test run by skipping tests that are failing and removing some
that we no longer expect to use (MMG, Firetext), with the intention that
we will come back in future and update or remove them as appropriate.
To find all tests skipped, search for `@pytest.mark.skip(reason="Needs
updating for TTS:`. There will be a brief description of the work that
needs to be done to get them passing, if known. Delete that line to make
them run in a standard test run (`make test`).
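For example, a skipped test looks like this (the test name, fixture and
reason description below are illustrative, not taken from the codebase):

    import pytest

    @pytest.mark.skip(reason="Needs updating for TTS: relies on letter-specific fixtures")
    def test_persist_notification_for_letter(sample_letter_template):
        ...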
This fixes a bug where (letter) notifications left in sending would
temporarily get excluded from billing and status calculations once
the service retention period had elapsed, and then get included once
again when they finally got marked as delivered.*
Status and billing tasks shouldn't need to have knowledge about which
table their data is in and getting this wrong is the fundamental cause
of the bug here. Adding a view across both tables abstracts this away
while keeping the query complexity the same.
Using a view also has the added benefit that we no longer need to care
when the status / billing tasks run in comparison to the deletion task,
since we will retrieve the same data irrespective (see below for a more
detailed discussion on data integrity).
*Such a scenario is rare but has happened.
A New View
==========
I've included all the columns that are shared between the two tables,
even though only a subset are actually needed. Having extra columns
has no impact and may be useful in future.
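As a rough sketch, the migration that creates the view looks something
like the following (e.g. via Alembic; the column list is abbreviated and
the underlying table names are assumed to be `notifications` and
`notification_history`):

    from alembic import op

    def upgrade():
        # Plain UNION (not UNION ALL), so a row that momentarily exists in
        # both tables is only returned once - see the data integrity notes
        # below.
        op.execute("""
            CREATE VIEW notifications_all_time_view AS
            SELECT id, service_id, template_id, notification_type, key_type,
                   notification_status, created_at, billable_units
            FROM notifications
            UNION
            SELECT id, service_id, template_id, notification_type, key_type,
                   notification_status, created_at, billable_units
            FROM notification_history
        """)

    def downgrade():
        op.execute("DROP VIEW notifications_all_time_view")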
Although the view isn't actually a table, SQLAlchemy appears to wrap
it without any issues, even though the package doesn't have any direct
support for "view models". Because we're never inserting data, we don't
need most of the kwargs when defining columns.*
*Note that the "default" kwarg doesn't affect data that's retrieved,
only data that's written (if no value is set).
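For illustration, the wrapping can look roughly like this (plain
declarative SQLAlchemy rather than the app's actual model base, and only
a handful of the columns):

    from sqlalchemy import Column, DateTime, Integer, Text
    from sqlalchemy.dialects.postgresql import UUID
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class NotificationAllTimeView(Base):
        # SQLAlchemy maps the view as if it were a table; because we only
        # ever read from it, write-oriented kwargs (default, onupdate,
        # nullable, ...) are left off the columns.
        __tablename__ = "notifications_all_time_view"

        id = Column(UUID(as_uuid=True), primary_key=True)
        service_id = Column(UUID(as_uuid=True))
        template_id = Column(UUID(as_uuid=True))
        notification_type = Column(Text)
        key_type = Column(Text)
        notification_status = Column(Text)
        created_at = Column(DateTime)
        billable_units = Column(Integer)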
Data Integrity
==============
The (new) tests cover the main scenarios.
We need to be careful with how the view interacts with the deletion /
archiving task. There are two concerns here:
- Duplicates. The deletion task inserts into the history table before
it deletes [^1], so we could end up double counting. It turns out this
isn't a problem because a Postgres UNION (unlike UNION ALL) applies an
implicit "DISTINCT" [^2] (illustrated at the end of this section). I've
also verified this manually, just to be on the safe side.
- No data. It's conceivable that the query will check the history table
just before the insertion, then check the notifications table just after
the deletion. It turns out this isn't a problem either because the whole
query sees the same DB snapshot [^3][^4].*
*I can't think of a way to test this as it's a race condition, but I'm
confident the Postgres docs are accurate.
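To make the duplicates point concrete, here's a small illustration using
SQLAlchemy's lightweight table constructs (1.4+ select style; the history
table name is an assumption):

    from sqlalchemy import column, select, table, union, union_all

    notifications = table("notifications", column("id"))
    history = table("notification_history", column("id"))

    # UNION deduplicates across the two selects, so a notification that has
    # just been copied into the history table but not yet deleted from
    # notifications is only counted once.
    deduplicated = union(select(notifications.c.id), select(history.c.id))

    # UNION ALL would keep both copies - the variant the view must avoid.
    with_duplicates = union_all(select(notifications.c.id), select(history.c.id))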
Performance
===========
I copied the relevant (non-PII) columns from Production for data going
back to 2022-04-01. I then ran several tests.
Queries using the new view still make use of indices on a per-table basis,
as the following query plan illustrates:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=1130820.02..1135353.89 rows=46502 width=97) (actual time=629.863..756.703 rows=72 loops=1)
  Group Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international
  -> Sort (cost=1130820.02..1131401.28 rows=232506 width=85) (actual time=629.756..708.914 rows=217563 loops=1)
        Sort Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international
        Sort Method: external merge Disk: 9320kB
        -> Subquery Scan on notifications_all_time_view (cost=1088506.43..1098969.20 rows=232506 width=85) (actual time=416.118..541.669 rows=217563 loops=1)
              -> Unique (cost=1088506.43..1096644.14 rows=232506 width=725) (actual time=416.115..513.065 rows=217563 loops=1)
                    -> Sort (cost=1088506.43..1089087.70 rows=232506 width=725) (actual time=416.115..451.190 rows=217563 loops=1)
                          Sort Key: notifications_no_pii.id, notifications_no_pii.job_id, notifications_no_pii.service_id, notifications_no_pii.template_id, notifications_no_pii.key_type, notifications_no_pii.billable_units, notifications_no_pii.notification_type, notifications_no_pii.created_at, notifications_no_pii.sent_by, notifications_no_pii.notification_status, notifications_no_pii.international, notifications_no_pii.rate_multiplier, notifications_no_pii.postage
                          Sort Method: external merge Disk: 23936kB
                          -> Append (cost=114.42..918374.12 rows=232506 width=725) (actual time=2.051..298.229 rows=217563 loops=1)
                                -> Bitmap Heap Scan on notifications_no_pii (cost=114.42..8557.55 rows=2042 width=113) (actual time=1.405..1.442 rows=0 loops=1)
                                      Recheck Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                      Filter: ((key_type)::text = ANY ('{normal,team}'::text[]))
                                      -> Bitmap Index Scan on ix_notifications_no_piiservice_id_composite (cost=0.00..113.91 rows=2202 width=0) (actual time=1.402..1.439 rows=0 loops=1)
                                            Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                -> Index Scan using ix_notifications_history_no_pii_service_id_composite on notifications_history_no_pii (cost=0.70..906328.97 rows=230464 width=113) (actual time=0.645..281.612 rows=217563 loops=1)
                                      Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND ((key_type)::text = ANY ('{normal,team}'::text[])) AND (notification_type = 'sms'::notification_type) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                      Filter: (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[]))
Planning Time: 18.032 ms
Execution Time: 759.001 ms
(21 rows)
Queries using the new view appear to be slower than the equivalent
queries against a single table, but the differences I've seen are
minimal: the original queries already execute in seconds locally and in
Production, so it's not a big issue.
Notes: Performance
==================
I downloaded a minimal set of columns for testing:
\copy (
select
id, notification_type, key_type, created_at, service_id,
template_id, sent_by, rate_multiplier, international,
billable_units, postage, job_id, notification_status
from notifications
) to 'notifications.csv' delimiter ',' csv header;
CREATE TABLE notifications_no_pii (
id uuid NOT NULL,
notification_type public.notification_type NOT NULL,
key_type character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL,
service_id uuid,
template_id uuid,
sent_by character varying,
rate_multiplier numeric,
international boolean,
billable_units integer NOT NULL,
postage character varying,
job_id uuid,
notification_status text
);
copy notifications_no_pii from '/Users/ben.thorner/Desktop/notifications.csv' delimiter ',' csv header;
CREATE INDEX ix_notifications_no_piicreated_at ON notifications_no_pii USING btree (created_at);
CREATE INDEX ix_notifications_no_piijob_id ON notifications_no_pii USING btree (job_id);
CREATE INDEX ix_notifications_no_piinotification_type_composite ON notifications_no_pii USING btree (notification_type, notification_status, created_at);
CREATE INDEX ix_notifications_no_piiservice_created_at ON notifications_no_pii USING btree (service_id, created_at);
CREATE INDEX ix_notifications_no_piiservice_id_composite ON notifications_no_pii USING btree (service_id, notification_type, notification_status, created_at);
CREATE INDEX ix_notifications_no_piitemplate_id ON notifications_no_pii USING btree (template_id);
And similarly for the history table. I then created a separate view
across both of these temporary tables using just these columns.
To test performance I created some queries that reflect what is run
by the billing [^5] and status [^6] tasks e.g.
explain analyze select template_id, sent_by, rate_multiplier, international, sum(billable_units), count(*)
from notifications_all_time_view
where
notification_status in ('sending', 'sent', 'delivered', 'pending', 'temporary-failure', 'permanent-failure')
and key_type in ('normal', 'team')
and created_at >= '2022-05-01 23:00'
and created_at < '2022-05-02 23:00'
and notification_type = 'sms'
and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'
group by 1,2,3,4;
explain analyze select template_id, job_id, key_type, notification_status, count(*)
from notifications_all_time_view
where created_at >= '2022-05-01 23:00'
and created_at < '2022-05-02 23:00'
and notification_type = 'sms'
and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'
and key_type in ('normal', 'team')
group by 1,2,3,4;
Between running queries I restarted my local database and also ran
a command to purge disk caches [^7].
I tested on a few services:
- c5956607-20b1-48b4-8983-85d11404e61f on 2022-05-02 (high volume)
- 0cc696c6-b792-409d-99e9-64232f461b0f on 2022-04-06 (highest volume)
- 01135db6-7819-4121-8b97-4aa2d741e372 on 2022-04-14 (very low volume)
All execution times are of the same order of magnitude using the view
as in the worst case of either table on its own.
[^1]: 00a04ebf54/app/dao/notifications_dao.py (L389)
[^2]: https://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all
[^3]: https://www.postgresql.org/docs/current/transaction-iso.html
[^4]: https://dba.stackexchange.com/questions/210485/can-sub-selects-change-in-one-single-query-in-a-read-committed-transaction
[^5]: 00a04ebf54/app/dao/fact_billing_dao.py (L471)
[^6]: 00a04ebf54/app/dao/fact_notification_status_dao.py (L58)
[^7]: https://stackoverflow.com/questions/28845524/echo-3-proc-sys-vm-drop-caches-on-mac-osx
This effectively reverts [^1], which was only a temporary change.
I suspect the performance problem will go away with [^2].
While we've clearly been managing without this change, its absence
resulted in several rows being left incorrect when letter receipts were
delayed. It makes sense for us to run this task for the same period
as we do to aggregate statuses, since status affects billing.
[^1]: e5c76ffda7
[^2]: https://github.com/alphagov/notifications-api/pull/3542
The test was querying `FactNotificationStatus`, ordering the results
by bst_date and notification_type, and then checking the rows. However,
the bst_date and notification_type for each row are the same, so this
test could fail depending on the order the results came back in. By ordering
on the notification_status instead, we can be sure of the order of the
results.
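Roughly, the assertions now look something like this (illustrative, not
the exact test code):

    # notification_status differs between the expected rows, so ordering on
    # it makes the result order - and therefore the assertions - deterministic.
    results = (
        FactNotificationStatus.query
        .order_by(FactNotificationStatus.notification_status)
        .all()
    )
    assert results[0].notification_status == "created"
    assert results[1].notification_status == "delivered"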
This takes a similar approach to the nightly deletion task so that
we only create sub-tasks when there are actually notifications to
aggregate for a given type and day [1].
We're making this change to stop the duplication errors we're getting
at the moment and ensure the task can scale to more messages and more
services. There are two parts to this:
- Each subtask should now run within the 5-minute visibility timeout.
However, they may still be duplicated if the parent task overruns [2].
- The parent task creates a minimal number of subtasks, and the query
to determine this is very fast for a normal process day (milliseconds).
Since all tasks will run quickly, there should be no more duplication.
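As a sketch of the shape of the parent task (the task, helper and queue
names here are stand-ins rather than the real implementation):

    @notify_celery.task(name="create-nightly-notification-status")
    def create_nightly_notification_status():
        # Only spawn a subtask for the (day, notification type) pairs that
        # actually have notifications to aggregate.
        for process_day, notification_type in days_and_types_with_notifications():
            create_nightly_notification_status_for_day.apply_async(
                kwargs={
                    "process_day": process_day.isoformat(),
                    "notification_type": notification_type,
                },
                queue="reporting-tasks",
            )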
In order to test this more nuanced task, I rewrote the tests:
- One test checks the subtask is called correctly.
- One test checks we create all the right subtasks.
[1]: https://github.com/alphagov/notifications-api/pull/3381
[2]: https://docs.google.com/document/d/1MaP6Nyy3nJKkuh_4lP1wuDm19X8LZITOLRd9n3Ax-xg/edit#heading=h.q3intzwqhfzl
The top-level task didn't run successfully after this was deployed
because the worker was killed by heavy disk usage. While the
more parallel version does log much more, it doesn't totally explain
the disk behaviour. Nonetheless, reverting it is sensible to give us
the time we need to investigate more.
This follows a similar approach to [1]. Recently we've seen lots
of errors from this task, which we think are a consequence of it
doing too much work and tripping Celery's visibility timeout.
While we can optimise the query [2], it's likely the errors will
return as the number of live services grows. Parallelising the
aggregation now will make it more future-proof.
[1]: https://github.com/alphagov/notifications-api/pull/3397
[2]: https://github.com/alphagov/notifications-api/pull/3417
The previous DAO tests were also confusing because they were testing
two functions at the same time, so moving the tests up to the task
level seems very reasonable, and will make it easier to change how
this code works in the next commits.
for the service.
The letter rates for crown and non-crown are the same. It would be nice
to remove the need for crown but for now this is a quick fix.
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is having mutable default function arguments, which are
shared between every call to the function, so mutating the value in one
place can unexpectedly cause it to change in another.
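For example:

    def add_tag(tag, tags=[]):  # flagged by Bugbear (B006): mutable default argument
        tags.append(tag)
        return tags

    add_tag("first")   # ["first"]
    add_tag("second")  # ["first", "second"] - the same default list is reused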
This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
some SQLAlchemy decorators (eg `declared_attr`) make things that
aren’t formally class methods take a class not an instance as their
first argument
It also disables:
- _B306: BaseException.message is removed in Python 3_ because I think
our exceptions have a custom structure that means the `.message`
attribute is still present
Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
New rows giving the prices of letters with a post_class of `europe` and
`rest-of-world` have been added to the `letter_rates` table. All rates
are currently the same for international letters.
the queries all return lots of columns, but each query has columns it
doesn't care about: e.g. emails don't have billable units or an
international flag, letters don't have an international flag, sms don't
have a page count, etc. additionally, the query was grouping on things
that never change, like service id and notification type.
by making all of these literals (as in `select 1 as foo`) we see times
that are over 50% quicker for the gov.uk email service.
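the shape of the change, very roughly (SQLAlchemy; the column choices are
made up and `db` / `Notification` are stand-ins for the app's objects):

    from sqlalchemy import func, literal_column

    # For the email query, columns that can never vary for emails become
    # literals (rendered as e.g. "select false as international"), and
    # constants like notification_type drop out of the GROUP BY entirely.
    query = (
        db.session.query(
            Notification.template_id,
            literal_column("false").label("international"),
            literal_column("0").label("billable_units"),
            func.count().label("notification_count"),
        )
        .group_by(Notification.template_id)
    )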
Note: One of the tests changed because previously it involved emails and
sms with statuses that they could never have (e.g. returned-letter).
sms and emails have a very predictable 72-hour lifecycle. letters, on
the other hand, have ridiculously complex lifecycles - they might not
get sent because it's a weekend, they might not get sent because they're
second class and are only processed on alternate days, they might not
get sent because a different letter in the same batch had an error that
we didn't know about. Whatever the reason, it's apparent that four days
is definitely not enough time to guarantee that letters have gone from
sending to delivered.
Extend the number of days we process for letters to 10. Keep emails
and sms down at 4 to keep run-times shorter.
We're deliberately not thinking about returned letters here at all.
it makes less sense once we introduce different start dates for letters
and emails. Also, we never use it, since we just call the day tasks
ourselves from commands.py
the nightly task won't be affected; it'll just trigger three times more
sub-tasks.
this doesn't need to be a two-part deploy because we only trigger this
overnight, so as long as the deploy completes in the daytime we don't
need to worry about Celery task signatures.
the nightly tasks need to run after the create_nightly_notification_status
task - so that test notifications are still there to record stats
for, and to stop the risk of deleting notifications part-way through
recording stats for them.
the create_nightly_notification_status task runs at 00:30 UK time.
However, this means that in summer datetime.today() will return the
wrong date, as the server (which runs on UTC) will run the task at 23:30
(populating the wrong row in the table).
fix this to use nice tz-aware functions
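A minimal sketch of the fix using the standard library (the app may use
its own timezone helpers instead):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    # datetime.today() on a UTC server gives "yesterday" when the task fires
    # at 00:30 UK time during BST. Converting an aware UTC timestamp to
    # Europe/London before taking the date gives the row we actually want.
    process_day = (
        datetime.now(timezone.utc).astimezone(ZoneInfo("Europe/London")).date()
    )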