notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-20 15:31:15 -05:00

Author	SHA1	Message	Date
Kenneth Kehl	1ecb747c6d	reformat	2023-08-29 14:54:30 -07:00
Kenneth Kehl	5a350560d7	notify-api-433b remove research mode	2023-08-25 12:09:00 -07:00
Kenneth Kehl	15a70460bc	code review feedback	2023-06-01 07:07:10 -07:00
Kenneth Kehl	3b0d38ea39	more fix	2023-05-30 12:16:49 -07:00
Kenneth Kehl	08c1ad75c8	notify-260 remove server-side timezone handling	2023-05-10 08:39:50 -07:00
Steven Reilly	ff4190a8eb	Remove letters-related code (#175 ) This deletes a big ol' chunk of code related to letters. It's not everything—there are still a few things that might be tied to sms/email—but it's the the heart of letters function. SMS and email function should be untouched by this. Areas affected: - Things obviously about letters - PDF tasks, used for precompiling letters - Virus scanning, used for those PDFs - FTP, used to send letters to the printer - Postage stuff	2023-03-02 20:20:31 -05:00
stvnrlly	9e7ee1c0f8	migrate bst_date to local_date	2022-11-21 11:49:59 -05:00
stvnrlly	e6d30394ba	london → local	2022-11-16 14:11:52 -05:00
stvnrlly	b50cb4712f	tz utility swap and many test updates	2022-11-10 12:33:25 -05:00
Ben Thorner	8e837cf681	Use table class directly instead of "table" var In response to [^1]. [^1]: https://github.com/alphagov/notifications-api/pull/3546#discussion_r879541366	2022-05-24 10:16:28 +01:00
Ben Thorner	33645c7747	Use notification view for status / billing tasks This fixes a bug where (letter) notifications left in sending would temporarily get excluded from billing and status calculations once the service retention period had elapsed, and then get included once again when they finally get marked as delivered.* Status and billing tasks shouldn't need to have knowledge about which table their data is in and getting this wrong is the fundamental cause of the bug here. Adding a view across both tables abstracts this away while keeping the query complexity the same. Using a view also has the added benefit that we no longer need to care when the status / billing tasks run in comparison to the deletion task, since we will retrieve the same data irrespective (see below for a more detailed discussion on data integrity). Such a scenario is rare but has happened. A New View ========== I've included all the columns that are shared between the two tables, even though only a subset are actually needed. Having extra columns has no impact and may be useful in future. Although the view isn't actually a table, SQLAlchemy appears to wrap it without any issues, noting that the package doesn't have any direct support for "view models". Because we're never inserting data, we don't need most of the kwargs when defining columns. Note that the "default" kwarg doesn't affect data that's retrieved, only data that's written (if no value is set). Data Integrity ============== The (new) tests cover the main scenarios. We need to be careful with how the view interacts with the deletion / archiving task. There are two concerns here: - Duplicates. The deletion task inserts before it deletes [^1], so we could end up double counting. It turns out this isn't a problem because a Postgres UNION is an implicit "DISTINCT" [^2]. I've also verified this manually, just to be on the safe side. - No data. It's conceivable that the query will check the history table just before the insertion, then check the notifications table just after the deletion. It turns out this isn't a problem either because the whole query sees the same DB snapshot [^3][^4]. I can't think of a way to test this as it's a race condition, but I'm confident the Postgres docs are accurate. Performance =========== I copied the relevant (non-PII) columns from Production for data going back to 2022-04-01. I then ran several tests. Queries using the new view still make use of indices on a per-table basis, as the following query plan illustrates: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ GroupAggregate (cost=1130820.02..1135353.89 rows=46502 width=97) (actual time=629.863..756.703 rows=72 loops=1) Group Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international -> Sort (cost=1130820.02..1131401.28 rows=232506 width=85) (actual time=629.756..708.914 rows=217563 loops=1) Sort Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international Sort Method: external merge Disk: 9320kB -> Subquery Scan on notifications_all_time_view (cost=1088506.43..1098969.20 rows=232506 width=85) (actual time=416.118..541.669 rows=217563 loops=1) -> Unique (cost=1088506.43..1096644.14 rows=232506 width=725) (actual time=416.115..513.065 rows=217563 loops=1) -> Sort (cost=1088506.43..1089087.70 rows=232506 width=725) (actual time=416.115..451.190 rows=217563 loops=1) Sort Key: notifications_no_pii.id, notifications_no_pii.job_id, notifications_no_pii.service_id, notifications_no_pii.template_id, notifications_no_pii.key_type, notifications_no_pii.billable_units, notifications_no_pii.notification_type, notifications_no_pii.created_at, notifications_no_pii.sent_by, notifications_no_pii.notification_status, notifications_no_pii.international, notifications_no_pii.rate_multiplier, notifications_no_pii.postage Sort Method: external merge Disk: 23936kB -> Append (cost=114.42..918374.12 rows=232506 width=725) (actual time=2.051..298.229 rows=217563 loops=1) -> Bitmap Heap Scan on notifications_no_pii (cost=114.42..8557.55 rows=2042 width=113) (actual time=1.405..1.442 rows=0 loops=1) Recheck Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone)) Filter: ((key_type)::text = ANY ('{normal,team}'::text[])) -> Bitmap Index Scan on ix_notifications_no_piiservice_id_composite (cost=0.00..113.91 rows=2202 width=0) (actual time=1.402..1.439 rows=0 loops=1) Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone)) -> Index Scan using ix_notifications_history_no_pii_service_id_composite on notifications_history_no_pii (cost=0.70..906328.97 rows=230464 width=113) (actual time=0.645..281.612 rows=217563 loops=1) Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND ((key_type)::text = ANY ('{normal,team}'::text[])) AND (notification_type = 'sms'::notification_type) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone)) Filter: (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) Planning Time: 18.032 ms Execution Time: 759.001 ms (21 rows) Queries using the new view appear to be slower than without, but the differences I've seen are minimal: the original queries execute in seconds locally and in Production, so it's not a big issue. Notes: Performance ================== I downloaded a minimal set of columns for testing: \copy ( select id, notification_type, key_type, created_at, service_id, template_id, sent_by, rate_multiplier, international, billable_units, postage, job_id, notification_status from notifications ) to 'notifications.csv' delimiter ',' csv header; CREATE TABLE notifications_no_pii ( id uuid NOT NULL, notification_type public.notification_type NOT NULL, key_type character varying(255) NOT NULL, created_at timestamp without time zone NOT NULL, service_id uuid, template_id uuid, sent_by character varying, rate_multiplier numeric, international boolean, billable_units integer NOT NULL, postage character varying, job_id uuid, notification_status text ); copy notifications_no_pii from '/Users/ben.thorner/Desktop/notifications.csv' delimiter ',' csv header; CREATE INDEX ix_notifications_no_piicreated_at ON notifications_no_pii USING btree (created_at); CREATE INDEX ix_notifications_no_piijob_id ON notifications_no_pii USING btree (job_id); CREATE INDEX ix_notifications_no_piinotification_type_composite ON notifications_no_pii USING btree (notification_type, notification_status, created_at); CREATE INDEX ix_notifications_no_piiservice_created_at ON notifications_no_pii USING btree (service_id, created_at); CREATE INDEX ix_notifications_no_piiservice_id_composite ON notifications_no_pii USING btree (service_id, notification_type, notification_status, created_at); CREATE INDEX ix_notifications_no_piitemplate_id ON notifications_no_pii USING btree (template_id); And similarly for the history table. I then created a sepatate view across both of these temporary tables using just these columns. To test performance I created some queries that reflect what is run by the billing [^5] and status [^6] tasks e.g. explain analyze select template_id, sent_by, rate_multiplier, international, sum(billable_units), count() from notifications_all_time_view where notification_status in ('sending', 'sent', 'delivered', 'pending', 'temporary-failure', 'permanent-failure') and key_type in ('normal', 'team') and created_at >= '2022-05-01 23:00' and created_at < '2022-05-02 23:00' and notification_type = 'sms' and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f' group by 1,2,3,4; explain analyze select template_id, job_id, key_type, notification_status, count(*) from notifications_all_time_view where created_at >= '2022-05-01 23:00' and created_at < '2022-05-02 23:00' and notification_type = 'sms' and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f' and key_type in ('normal', 'team') group by 1,2,3,4; Between running queries I restarted my local database and also ran a command to purge disk caches [^7]. I tested on a few services: - c5956607-20b1-48b4-8983-85d11404e61f on 2022-05-02 (high volume) - 0cc696c6-b792-409d-99e9-64232f461b0f on 2022-04-06 (highest volume) - 01135db6-7819-4121-8b97-4aa2d741e372 on 2022-04-14 (very low volume) All execution results are of the same magnitude using the view compared to the worst case of either table on its own. [^1]: `00a04ebf54/app/dao/notifications_dao.py (L389)` [^2]: https://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all [^3]: https://www.postgresql.org/docs/current/transaction-iso.html [^4]: https://dba.stackexchange.com/questions/210485/can-sub-selects-change-in-one-single-query-in-a-read-committed-transaction [^5]: `00a04ebf54/app/dao/fact_billing_dao.py (L471)` [^6]: `00a04ebf54/app/dao/fact_notification_status_dao.py (L58)` [^7]: https://stackoverflow.com/questions/28845524/echo-3-proc-sys-vm-drop-caches-on-mac-osx	2022-05-19 15:14:32 +01:00
Ben Thorner	574b1d3c63	Fix not deleting dead rows in status table To address: https://github.com/alphagov/notifications-api/pull/3454#pullrequestreview-880302729 Since we now delete all rows before inserting fresh ones, we no longer need to worry about conflicts. I've also extended the old test to check all three kinds of overwrite: new, changed, gone.	2022-02-16 13:40:08 +00:00
Ben Thorner	a69d1635a1	Update FactStatus table in bulk for each service Previously we were looping over data from the Notifications/History table and then shovelling it into the status table, one row at a time - plus an extra delete to clean up any existing data. This replaces that with a batch insertion, similar to how we archive notifications [1], but using a simple subquery (via "from_select" [2]) instead of a temporary table. To make the select compatible with the insert, I've used "literal" to inject the constant pieces of data, so each row has everything it needs to go into the status table. [1]: `9ce6d2fe92/app/dao/notifications_dao.py (L295)` [2]: https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.from_select	2022-02-16 13:40:05 +00:00
Ben Thorner	efde271c5a	Rewrite FactStatus updates to be an upsert This is consistent with the way we do billing updates [1] and is a bit less clunky. Functionally it should be the same - note that the tests already cover the "overwriting" behaviour if a row exists. [1]: `9ce6d2fe92/app/dao/fact_billing_dao.py (L522)`	2022-02-16 13:39:08 +00:00
Ben Thorner	6e8f121548	Standardise how we query midnight-to-midnight Partially addresses [1] (lots more detail to read in the comment). I've also added some tests for the status DAO function to confirm it behaves as expected across timezones. [1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r802634913	2022-02-10 10:51:27 +00:00
Ben Thorner	018a253b6f	Revert "Revert running status aggregation in parallel" This reverts commit `0f6dea0deb`.	2022-02-09 17:39:00 +00:00
Ben Thorner	0f6dea0deb	Revert running status aggregation in parallel The top-level task didn't run successfully after this was deployed due to the worker being killed due to heavy disk usage. While the more parallel version does log much more, it doesn't totally explain the disk behaviour. Nonetheless, reverting it is sensible to give us the time we need to investigate more.	2022-01-20 12:22:33 +00:00
Ben Thorner	9686595fa8	Minor tweaks to address comments on the PR To address: - https://github.com/alphagov/notifications-api/pull/3425#discussion_r786867994 - https://github.com/alphagov/notifications-api/pull/3425#discussion_r786853329 - https://github.com/alphagov/notifications-api/pull/3425#discussion_r786848793 - https://github.com/alphagov/notifications-api/pull/3425#discussion_r786214794	2022-01-18 16:56:53 +00:00
Ben Thorner	086f0f50a6	Remove unnecessary extra method in status DAO This makes it easier to see what is being queried.	2022-01-12 15:48:00 +00:00
Ben Thorner	ddbf556486	Rewrite task to aggregate status by service This is a step towards parallelising the task by service and day.	2022-01-12 15:47:53 +00:00
Ben Thorner	63b5204fb0	Optimise query to populate notification statuses Investigation with EXPLAIN and EXPLAIN ANALYZE for the notification history table shows this is another instance of [1] but for the key type column. Swapping "!=" for "IN" solves the problem. [1]: https://github.com/alphagov/notifications-api/pull/3360	2022-01-11 13:22:04 +00:00
Rebecca Law	684a882cf3	Revert "Do not include today's totals"	2021-06-02 16:06:33 +01:00
Rebecca Law	a341536de0	- Add comment to test and new if statement - Update assert in test	2021-06-02 14:13:31 +01:00
Rebecca Law	b170b5ed80	This change is a temporary fix to allow users for high volume services to use the admin app. The trouble is the aggregate query to return the big blue numbers on the dashboard and /notifications/{notification_type} page is taking too long to return. I have some ideas on how to improve the query, but should take some time to do some more research and test. In the meantime, let's just ignore "todays" total numbers for the high volume services. There are only two services that this will affect.	2021-06-02 10:31:38 +01:00
Rebecca Law	d4009ffc52	Rename database management functions. Rename @transactional to @autocommit. Rename nested_transaction to tranaction.	2021-04-19 10:56:00 +01:00
David McDonald	41d95378ea	Remove everything for the performance platform We no longer will send them any stats so therefore don't need the code - the code to work out the nightly stats - the performance platform client - any configuration for the client - any nightly tasks that kick off the sending off the stats We will require a change in cronitor as we no longer will have this task run meaning we need to delete the cronitor check.	2021-03-15 12:04:53 +00:00
Ben Thorner	a91fde2fda	Run auto-correct on app/ and tests/	2021-03-12 11:45:45 +00:00
Rebecca Law	11d10d5293	Rename to performance_dashboard Fix totals to return totals for all time rather than for date range. Added more test data	2021-03-10 13:16:25 +00:00
Rebecca Law	b06850e611	Add an endpoint to return all the data required for the performance platform page.	2021-03-05 09:59:03 +00:00
Pea Tyczynska	9c4205c7c6	Remove statsd decorators from dao functions This done so that we do not use statsd on our http endpoint. We decided we do not need metrics that this gave us. If we change our minds, we will add Prometheus-friendly decorators instead in the future.	2020-07-07 18:02:24 +01:00
Katie Smith	5b8b0dfc6e	Include pending notifications in monthly data by service report The '/service/monthly-data-by-service` endpoint which is used for the 'Monthly notification statuses for live services' Platform Admin report did not including `pending` notifications. This updates the DAO function that the endpoint calls to group `pending` and `sending` notifications together.	2020-04-30 13:53:01 +01:00
Rebecca Law	291c6d6dc9	Add statsd annotations for the fact table queries.	2020-02-18 14:33:17 +00:00
Leo Hemsted	ca04ff4725	remove grouping from the fact status table as we now always filter on notification type we don't need to group as well	2019-12-05 15:08:03 +00:00
Leo Hemsted	0448bca542	make create_nightly_notification_status_for_day take notification_type the nightly task won't be affected, it'll just trigger three times more sub-tasks. this doesn't need to be a two-part deploy because we only trigger this overnight, so as long as the deploy completes in daytime we don't need to worry about celery task signatures	2019-12-05 14:43:33 +00:00
Leo Hemsted	8d160303a1	add transactional wrapper and add case to get_notification_table_to_use test	2019-12-04 15:26:26 +00:00
Leo Hemsted	d457db4164	make has_delete_task_run non-optional just to ensure people think about the value of it when using the function	2019-12-03 14:19:14 +00:00
Leo Hemsted	34ac7cb6c0	only commit once, rather than for every insert if we insert a valid row, that'll mean we properly roll back the delete of old data	2019-11-29 15:27:56 +00:00
Leo Hemsted	913cf5e12d	work out which table to get notification status data from previously we checked notifications table, and if the results were zero, checked the notification history table to see if there's data in there. When we know that data isn't in notifications, we're still checking. These queries take half a second per service, and we're doing at least ten for each of the five thousand services we have in notify. Most of these services have no data in either table for any given day, and we can reduce the amount of queries we do by only checking one table. Check the data retention for a service, and then if the date is older than the retention, get from history table. NOTE: This requires that the delete tasks haven't run yet for the day! If your retention is three days, this will look in the Notification table for data from three days ago - expecting that shortly after the task finishes, we'll delete that data.	2019-11-29 15:27:56 +00:00
Leo Hemsted	36dd750637	split up reporting tasks in to separate tasks per day to try and speed up overall time by parallelising	2019-08-19 16:06:25 +01:00
Katie Smith	81de71dc1f	Add dao function that gets the required data	2019-07-23 10:00:50 +01:00
Rebecca Law	a52c65ea29	Remove the delete query when updating the ft_billing. It's in the wrong place and we also should not need it.	2019-07-18 16:24:06 +01:00
Rebecca Law	ed611f982c	We found a problem with the report tasks that populate the fact tables (or statistic tables). It is possible that the notification status can change for notifications after 4 days. This PR updates those queries to look in either Notification or NotificationHistory. Since the data does not exist in both tables at the same time we can do with and not worry about the data retention. The query will iterate over each service, then each notification type and query the data if no results then try the history table.	2019-07-18 15:29:54 +01:00
Leo Hemsted	1dc084be54	fix nightly ft stats tables task to respect BST the create_nightly_notification_status task runs at 00:30am UK time, however this means that in summer datetime.today() will return the wrong date as the server (which runs on UTC) will run the task at 23:30 (populating the wrong row in the table). fix this to use nice tz aware functions	2019-04-02 15:15:07 +01:00
Rebecca Law	b9b81bca8f	Fix BST date bug for platform admin summary page. Added test for that.	2019-04-01 10:56:55 +01:00
Rebecca Law	1806f092f3	Update the nightly task that send performance platform statistics to use ft_notification_status rather than notification_history. The previous query was including all notifications regardless of notification_status. I don't think that's right, it shouldn't include things like technical-failure or validation-failed. Thoughts? I also need to remove the query that's no longer being used.	2019-03-29 14:21:05 +00:00
Alexey Bezhan	90dc69b6bc	Merge pull request #2303 from alphagov/ft-status-template-statistics Change template statistics endpoint to use fact_notification_status_dao	2019-01-17 15:04:20 +00:00
Rebecca Law	a4d89359c5	Adding a filter to exclude test keys for the template monthly usage query. Added a test.	2019-01-15 16:13:38 +00:00
Pea Tyczynska	52831813d8	Change template statistics endpoint to use fact_notification_status_dao	2019-01-15 11:55:45 +00:00
Alexey Bezhan	876346f469	Add an option to group notification stats for 7 days by template Currently, admin app requests service statistics (with notification counts grouped by status) and template statistics (with counts by template) in order to display the service dashboard. Service statistics are gathered from FactNotificationStatus table (counts for the last 7 days) combined with Notification (counts for today). Template statistics are currently gathered from redis cache, which contains a separate counter per template per day. It's hard for us to maintain consistency between redis and DB counts. Currently it doesn't update the count for cancelled letters, counter resets in the middle of the day might produce a wrong result for the rest of the week and cleared redis cache can't be repopulated for services with low data retention periods). Since FactNotificationStatus already contains separate counts for each template_id we can use the existing logic with some additional filters to get separate counts for each template and status combination, which would allow us to populate the service dashboard page from one query response.	2019-01-14 16:58:57 +00:00
Rebecca Law	b5a3ef9576	Added order by. Added more unit tests. Remove comments.	2019-01-14 15:28:26 +00:00

1 2

69 Commits