notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-20 23:41:17 -05:00

Author	SHA1	Message	Date
Ben Thorner	8e837cf681	Use table class directly instead of "table" var In response to [^1]. [^1]: https://github.com/alphagov/notifications-api/pull/3546#discussion_r879541366	2022-05-24 10:16:28 +01:00
Ben Thorner	33645c7747	Use notification view for status / billing tasks This fixes a bug where (letter) notifications left in sending would temporarily get excluded from billing and status calculations once the service retention period had elapsed, and then get included once again when they finally get marked as delivered.* Status and billing tasks shouldn't need to have knowledge about which table their data is in and getting this wrong is the fundamental cause of the bug here. Adding a view across both tables abstracts this away while keeping the query complexity the same. Using a view also has the added benefit that we no longer need to care when the status / billing tasks run in comparison to the deletion task, since we will retrieve the same data irrespective (see below for a more detailed discussion on data integrity). Such a scenario is rare but has happened. A New View ========== I've included all the columns that are shared between the two tables, even though only a subset are actually needed. Having extra columns has no impact and may be useful in future. Although the view isn't actually a table, SQLAlchemy appears to wrap it without any issues, noting that the package doesn't have any direct support for "view models". Because we're never inserting data, we don't need most of the kwargs when defining columns. Note that the "default" kwarg doesn't affect data that's retrieved, only data that's written (if no value is set). Data Integrity ============== The (new) tests cover the main scenarios. We need to be careful with how the view interacts with the deletion / archiving task. There are two concerns here: - Duplicates. The deletion task inserts before it deletes [^1], so we could end up double counting. It turns out this isn't a problem because a Postgres UNION is an implicit "DISTINCT" [^2]. I've also verified this manually, just to be on the safe side. - No data. 
It's conceivable that the query will check the history table just before the insertion, then check the notifications table just after the deletion. It turns out this isn't a problem either because the whole query sees the same DB snapshot [^3][^4]. I can't think of a way to test this as it's a race condition, but I'm confident the Postgres docs are accurate. Performance =========== I copied the relevant (non-PII) columns from Production for data going back to 2022-04-01. I then ran several tests. Queries using the new view still make use of indices on a per-table basis, as the following query plan illustrates: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ GroupAggregate (cost=1130820.02..1135353.89 rows=46502 width=97) (actual time=629.863..756.703 rows=72 loops=1) Group Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international -> Sort (cost=1130820.02..1131401.28 rows=232506 width=85) (actual time=629.756..708.914 rows=217563 loops=1) Sort Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international Sort Method: external merge Disk: 9320kB -> Subquery Scan on notifications_all_time_view (cost=1088506.43..1098969.20 rows=232506 width=85) (actual time=416.118..541.669 rows=217563 loops=1) -> Unique (cost=1088506.43..1096644.14 rows=232506 width=725) (actual time=416.115..513.065 rows=217563 loops=1) -> Sort 
(cost=1088506.43..1089087.70 rows=232506 width=725) (actual time=416.115..451.190 rows=217563 loops=1) Sort Key: notifications_no_pii.id, notifications_no_pii.job_id, notifications_no_pii.service_id, notifications_no_pii.template_id, notifications_no_pii.key_type, notifications_no_pii.billable_units, notifications_no_pii.notification_type, notifications_no_pii.created_at, notifications_no_pii.sent_by, notifications_no_pii.notification_status, notifications_no_pii.international, notifications_no_pii.rate_multiplier, notifications_no_pii.postage Sort Method: external merge Disk: 23936kB -> Append (cost=114.42..918374.12 rows=232506 width=725) (actual time=2.051..298.229 rows=217563 loops=1) -> Bitmap Heap Scan on notifications_no_pii (cost=114.42..8557.55 rows=2042 width=113) (actual time=1.405..1.442 rows=0 loops=1) Recheck Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone)) Filter: ((key_type)::text = ANY ('{normal,team}'::text[])) -> Bitmap Index Scan on ix_notifications_no_piiservice_id_composite (cost=0.00..113.91 rows=2202 width=0) (actual time=1.402..1.439 rows=0 loops=1) Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone)) -> Index Scan using ix_notifications_history_no_pii_service_id_composite on notifications_history_no_pii (cost=0.70..906328.97 rows=230464 width=113) (actual time=0.645..281.612 rows=217563 loops=1) Index 
Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND ((key_type)::text = ANY ('{normal,team}'::text[])) AND (notification_type = 'sms'::notification_type) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone)) Filter: (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) Planning Time: 18.032 ms Execution Time: 759.001 ms (21 rows) Queries using the new view appear to be slower than without, but the differences I've seen are minimal: the original queries execute in seconds locally and in Production, so it's not a big issue. Notes: Performance ================== I downloaded a minimal set of columns for testing: \copy ( select id, notification_type, key_type, created_at, service_id, template_id, sent_by, rate_multiplier, international, billable_units, postage, job_id, notification_status from notifications ) to 'notifications.csv' delimiter ',' csv header; CREATE TABLE notifications_no_pii ( id uuid NOT NULL, notification_type public.notification_type NOT NULL, key_type character varying(255) NOT NULL, created_at timestamp without time zone NOT NULL, service_id uuid, template_id uuid, sent_by character varying, rate_multiplier numeric, international boolean, billable_units integer NOT NULL, postage character varying, job_id uuid, notification_status text ); copy notifications_no_pii from '/Users/ben.thorner/Desktop/notifications.csv' delimiter ',' csv header; CREATE INDEX ix_notifications_no_piicreated_at ON notifications_no_pii USING btree (created_at); CREATE INDEX ix_notifications_no_piijob_id ON notifications_no_pii USING btree (job_id); CREATE INDEX ix_notifications_no_piinotification_type_composite ON notifications_no_pii USING btree (notification_type, notification_status, created_at); CREATE INDEX ix_notifications_no_piiservice_created_at ON notifications_no_pii USING btree (service_id, created_at); 
CREATE INDEX ix_notifications_no_piiservice_id_composite ON notifications_no_pii USING btree (service_id, notification_type, notification_status, created_at); CREATE INDEX ix_notifications_no_piitemplate_id ON notifications_no_pii USING btree (template_id); And similarly for the history table. I then created a separate view across both of these temporary tables using just these columns. To test performance I created some queries that reflect what is run by the billing [^5] and status [^6] tasks e.g. explain analyze select template_id, sent_by, rate_multiplier, international, sum(billable_units), count(*) from notifications_all_time_view where notification_status in ('sending', 'sent', 'delivered', 'pending', 'temporary-failure', 'permanent-failure') and key_type in ('normal', 'team') and created_at >= '2022-05-01 23:00' and created_at < '2022-05-02 23:00' and notification_type = 'sms' and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f' group by 1,2,3,4; explain analyze select template_id, job_id, key_type, notification_status, count(*) from notifications_all_time_view where created_at >= '2022-05-01 23:00' and created_at < '2022-05-02 23:00' and notification_type = 'sms' and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f' and key_type in ('normal', 'team') group by 1,2,3,4; Between running queries I restarted my local database and also ran a command to purge disk caches [^7]. I tested on a few services: - c5956607-20b1-48b4-8983-85d11404e61f on 2022-05-02 (high volume) - 0cc696c6-b792-409d-99e9-64232f461b0f on 2022-04-06 (highest volume) - 01135db6-7819-4121-8b97-4aa2d741e372 on 2022-04-14 (very low volume) All execution results are of the same magnitude using the view compared to the worst case of either table on its own. 
[^1]: `00a04ebf54/app/dao/notifications_dao.py (L389)` [^2]: https://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all [^3]: https://www.postgresql.org/docs/current/transaction-iso.html [^4]: https://dba.stackexchange.com/questions/210485/can-sub-selects-change-in-one-single-query-in-a-read-committed-transaction [^5]: `00a04ebf54/app/dao/fact_billing_dao.py (L471)` [^6]: `00a04ebf54/app/dao/fact_notification_status_dao.py (L58)` [^7]: https://stackoverflow.com/questions/28845524/echo-3-proc-sys-vm-drop-caches-on-mac-osx	2022-05-19 15:14:32 +01:00
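The "Duplicates" point in the commit above (a view built with UNION does not double count a row that is briefly present in both tables) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Postgres, purely so it runs anywhere; the two-column tables are simplified stand-ins, not the real schema. UNION removing duplicate rows is standard SQL behaviour, but note it only collapses rows that are identical in every selected column.

```python
import sqlite3


def count_via_view(union_keyword: str) -> int:
    """Count rows seen through a view built with the given set operator."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        f"""
        -- Simplified stand-ins for the real tables (assumption).
        CREATE TABLE notifications (id TEXT, notification_status TEXT);
        CREATE TABLE notification_history (id TEXT, notification_status TEXT);

        -- Simulate the archiving task mid-run: 'n1' has been inserted into
        -- history but not yet deleted from notifications.
        INSERT INTO notifications VALUES ('n1', 'delivered');
        INSERT INTO notification_history VALUES ('n1', 'delivered');
        INSERT INTO notification_history VALUES ('n2', 'delivered');

        CREATE VIEW notifications_all_time_view AS
            SELECT id, notification_status FROM notifications
            {union_keyword}
            SELECT id, notification_status FROM notification_history;
        """
    )
    (count,) = conn.execute(
        "SELECT count(*) FROM notifications_all_time_view"
    ).fetchone()
    conn.close()
    return count


union_count = count_via_view("UNION")          # identical rows collapsed
union_all_count = count_via_view("UNION ALL")  # identical rows kept
print(union_count, union_all_count)
```

With UNION the in-flight row is counted once; with UNION ALL it would be counted twice, which is the double counting the commit message is ruling out.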
Ben Thorner	648952eb38	Standardise tests for nightly billing sub-task	2022-05-19 13:40:00 +01:00
Ben Thorner	a9104a2ec3	Simplify test for status aggregation We don't benefit from testing a third service and simplifying the test will make it easier to incorporate another edge case later.	2022-05-19 13:39:46 +01:00
Ben Thorner	7e536d1c2b	Merge pull request #3542 from alphagov/optimise-historic-billing-query-182116071 Optimise billing query for notification history	2022-05-18 10:27:43 +01:00
Ben Thorner	4a520bce78	Optimise billing query for notification history This follows the same pattern as for status aggregations [^1]. We haven't seen this problem for a long time because of [^2], but now we're trying to re-run the aggregation for some incorrect rows it's becoming apparent we need to fix it. The following query currently fails in Production after the 30 min SQLAlchemy timeout: select template_id, rate_multiplier, international, sum(billable_units), count(*) from notification_history where notification_status in ('delivered', 'sending') and key_type != 'test' and notification_type = 'sms' and service_id = '539d63a1-701d-400d-ab11-f3ee2319d4d4' and created_at >= '2021-07-07 23:00' and created_at < '2021-07-08 23:00' group by 1,2,3,4; Running a quick "explain analyze" with this change applied returns near immediately, but hangs without it. This is enough evidence for me that this change will fix the issue. [^1]: https://github.com/alphagov/notifications-api/pull/3417 [^2]: `e5c76ffda7`	2022-05-17 17:29:50 +01:00
Katie Smith	1d4512e34b	Merge pull request #3540 from alphagov/pyup-scheduled-update-2022-05-11 Scheduled weekly dependency update for week 19	2022-05-17 11:26:52 +01:00
Katie Smith	fdee7b5ecb	Update cachetools from 5.0.0 to 5.1.0	2022-05-17 10:53:06 +01:00
Katie Smith	2f3da76e81	Update SQLAlchemy from 1.4.35 to 1.4.36	2022-05-17 10:51:49 +01:00
Katie Smith	76e683eeed	Update PyJWT from 2.3.0 to 2.4.0	2022-05-17 10:49:30 +01:00
Katie Smith	1d58f47dfe	Update jsonschema from 4.4.0 to 4.5.1	2022-05-17 10:45:28 +01:00
Katie Smith	ddec6de3c4	Update flask from 2.1.1 to 2.1.2	2022-05-17 10:36:16 +01:00
Katie Smith	c50eae11c3	Upgrade all test requirements	2022-05-17 10:33:49 +01:00
Ben Thorner	e4a45047b3	Merge pull request #3538 from alphagov/fix-out-of-date-status-182116071 Fix out-of-date rows in ft_notification_status	2022-05-17 10:26:22 +01:00
Ben Thorner	318cc1284a	Merge pull request #3534 from alphagov/remove-redundant-usage-fields-181935935 Remove redundant fields from service usage APIs	2022-05-11 15:12:31 +01:00
Ben Thorner	ed379a3724	Fix out-of-date rows in ft_notification_status This can happen in the following scenario (primarily for letters): 1. A service has a mixture of "delivered" and "sending" letters, which the status task aggregates into two rows: sending \| 123 delivered \| 456 2. After the 7 day retention has passed, only the "delivered" letters will be archived [^1]. 3. The status task now looks at the history table [^2], which means it only sees the "delivered" letters. 4. The "sending" letters are eventually "delivered" and archived (before the 10 day aggregation cutoff). 5. But the status aggregation task doesn't run. This commit fixes (5). [^1]: https://github.com/alphagov/notifications-api/pull/3063 [^2]: `f87ebb094d/app/dao/fact_notification_status_dao.py (L51)`	2022-05-11 11:04:56 +01:00
David McDonald	177b860865	Merge pull request #3539 from alphagov/adjust-resting-rates Adjust resting priorities for SMS providers	2022-05-11 09:48:24 +01:00
Ben Thorner	1d157836ad	Remove redundant fields from service usage APIs These are no longer used since [^1] and [^2]. [^1]: https://github.com/alphagov/notifications-admin/pull/4225 [^2]: https://github.com/alphagov/notifications-admin/pull/4229	2022-05-10 15:50:22 +01:00
Leo Hemsted	ce1cea90f0	Merge pull request #3535 from alphagov/notify-db-fixture-cleanup Notify db fixture cleanup	2022-05-10 15:43:07 +01:00
David McDonald	f000e5a52b	Adjust resting priorities for SMS providers	2022-05-10 15:07:25 +01:00
Katie Smith	f87ebb094d	Merge pull request #3537 from alphagov/existing-marshmallow-changes Fix existing marshmallow schemas and remove unused code	2022-05-10 10:29:38 +01:00
Katie Smith	dd213b8d55	Ensure UserUpdatePasswordSchema only loads _password We had the `only` defined in the Meta class, and this wasn't working - any extra fields were also being loaded. This moves it to the point where the class is instantiated, which now works.	2022-05-06 10:14:50 +01:00
Katie Smith	685959de00	Don't serialize broadcast_messages in TemplateHistorySchema In `0282a76` we excluded serializing `broadcast_messages` on some schemas. This also excludes it from the TemplateHistory schema.	2022-05-06 10:14:50 +01:00
Katie Smith	ab199b6b05	Remove unused excluded fields from Marshmallow schemas We have a lot of cases in the schemas where we're excluding a field that doesn't actually exist on the schema anyway. This is often because a model has been deleted, and the schema has not been updated. These excluded fields have no effect at the moment, but Marshmallow 3 does raise an error if you try and exclude non-existent fields. There should be no change to what gets (de)serialized after this change.	2022-05-06 10:14:50 +01:00
Katie Smith	bd42bded0a	Delete unused Marshmallow schema validation functions	2022-05-04 14:37:04 +01:00
Leo Hemsted	6b0d3860bd	make notify_db fixture private	2022-05-04 11:37:05 +01:00
Leo Hemsted	6181c60f75	remove usage of notify_db fixture in unit tests * notify_db fixture creates the database connection and ensures the test db exists and has migrations applied etc. It will run once per session (test run). * notify_db_session fixture runs after your test finishes and deletes all non static (eg type table) data. In unit tests that hit the database (ie: most of them), 99% of the time we will need to use notify_db_session to ensure everything is reset. The only time we don't need to use it is when we're querying things such as "ensure get X works when database is empty". This is such a low percentage of tests that it's easier for us to just use notify_db_session every time, and ensure that all our tests run much more consistently, at the cost of a small bit of performance when running tests. We used to use notify_db to access the session object for manually adding, committing, etc. To dissuade usage of that fixture I've moved that to the `notify_db_session`. I've then removed all uses of notify_db that I could find in the codebase. As a note, if you're writing a test that uses a `sample_x` fixture, all of those fixtures rely on notify_db_session so you'll get the teardown functionality for free. If you're just calling eg `create_x` db.py functions, then you'll need to make sure you add the notify_db_session fixture to your test, even if you aren't manually accessing the session.	2022-05-04 11:36:54 +01:00
Ben Thorner	867e8fbce3	Merge pull request #3528 from alphagov/tidy-up-usage-tests-181934027 Minor refactorings to usage API tests	2022-05-04 10:31:09 +01:00
Leo Hemsted	d8f14abde6	Merge pull request #3533 from alphagov/remove-provider-rates remove provider_rates table	2022-05-04 10:29:25 +01:00
Leo Hemsted	201367ec33	Merge pull request #3531 from alphagov/cleanup-user-tests clean up user tests	2022-05-04 10:29:19 +01:00
Leo Hemsted	51646af92e	remove provider_rates table this was added five years ago but never used. if we want to bring back variable rates per client we might as well get a fresh start since a lot has changed since then.	2022-05-03 14:42:59 +01:00
Leo Hemsted	091d255d56	clean up user tests an API build failed because one of the tests expected the database to be empty, but it actually wasn't. This is probably because another test run earlier on that worker did not clear down properly. I took the opportunity to refresh all of these tests to ensure they all correctly tear down, and also use the more modern admin_request fixture rather than the old client one that required us to specify headers and do json parsing etc.	2022-05-03 14:23:59 +01:00
Leo Hemsted	43206b1656	Merge pull request #3532 from alphagov/utils-bump bump utils	2022-05-03 13:54:19 +01:00
Leo Hemsted	cc3035a101	bump utils mostly to get rid of the security warning on pypdf2	2022-05-03 12:26:38 +01:00
Leo Hemsted	7493daff07	Merge pull request #3530 from alphagov/fix update migration script numbers	2022-05-03 11:05:12 +01:00
Leo Hemsted	cc285fa41e	update migration script numbers	2022-05-03 10:54:25 +01:00
Leo Hemsted	503c5e1eea	Merge pull request #3527 from alphagov/fix-apr-rate add revision to fix apr rate	2022-05-03 10:48:19 +01:00
Ben Thorner	bb62d22f25	Merge pull request #3526 from alphagov/remove-reach Remove support for Reach provider	2022-04-29 13:21:27 +01:00
Ben Thorner	c27107fa74	Remove support for Reach provider This provider was never active and support was never completed, so there's little value in keeping all this potentially confusing code.	2022-04-29 12:28:08 +01:00
Ben Thorner	52b2982b92	Rename test_ft_billing... to match file under test This tripped me up several times when modifying the DAO functions and then trying to search for the test file associated with them.	2022-04-29 11:33:20 +01:00
Ben Thorner	b8cd99d9fa	Move usage API test data inline with tests This makes it easier to compare assertions against the test data, especially since each function was only used in one test.	2022-04-29 11:33:19 +01:00
Ben Thorner	efae436c4a	Speed up usage API tests with minimal test data This slowed me down when making changes to the APIs. As well as being unnecessary given the structural focus of these tests, I found the way the test data was generated was quite confusing. Before: time pytest tests/app/billing/test_rest.py ... pytest tests/app/billing/test_rest.py 4.16s user 0.43s system 55% cpu 8.355 total After: time pytest tests/app/billing/test_rest.py ... pytest tests/app/billing/test_rest.py 2.16s user 0.25s system 70% cpu 3.413 total	2022-04-29 11:33:16 +01:00
Ben Thorner	2fdec58496	Remove unused line in annual usage API test data	2022-04-29 10:57:24 +01:00
Ben Thorner	f978a4fe45	Remove redundant test data for annual usage API This test should be focussed on the structural properties of the API. We can leave more detailed testing of rate multipliers, etc. to lower-level DAO tests.	2022-04-29 10:57:23 +01:00
Ben Thorner	1fab73ef9e	Speed up usage DAO tests with minimal test data This slowed me down when making changes to the DAO functions. It's really not necessary to do 3 * 367 DB insertions for both tests. Before: time pytest tests/app/dao/test_ft_billing_dao.py -k for_year ... pytest tests/app/dao/test_ft_billing_dao.py -k for_year 3.95s user 0.40s system 62% cpu 6.971 total After: time pytest tests/app/dao/test_ft_billing_dao.py -k for_year ... pytest tests/app/dao/test_ft_billing_dao.py -k for_year 1.84s user 0.25s system 69% cpu 3.006 total	2022-04-29 10:56:51 +01:00
Ben Thorner	c3bc1d0df9	Remove redundant test data for usage DAOs This is unnecessary since we now have separate "_variable_rates" tests that check the behaviour for multiple rates explicitly for the two types of notifications it affects.	2022-04-29 10:22:09 +01:00
Leo Hemsted	6d47d97bc0	add revision to fix apr rate this will update about 80-90k rows in ft billing	2022-04-28 15:54:22 +01:00
Ben Thorner	66bc11bb65	Merge pull request #3520 from alphagov/free-allowance-api-181934027 Extend service usage APIs with costs and free allowance	2022-04-28 13:37:52 +01:00
Ben Thorner	ebaef4b57b	Add "charged_units" to service usage APIs This can be calculated from the "free_allowance_used" field and the "chargeable_units" field, but having it included separately is more convenient as it can be used directly in Admin [^1]. [^1]: `417e7370bb/app/templates/views/usage.html (L38-L39)`	2022-04-27 15:57:35 +01:00
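The commit above says "charged_units" can be calculated from "free_allowance_used" and "chargeable_units". A minimal sketch of that arithmetic follows, assuming the relationship is a simple subtraction and that the free allowance used never exceeds the chargeable units. The function name and the example figures are hypothetical and are not taken from the repository.

```python
def charged_units(chargeable_units: int, free_allowance_used: int) -> int:
    """Units actually paid for (assumed: chargeable minus free allowance used)."""
    if not 0 <= free_allowance_used <= chargeable_units:
        # Assumption: the allowance used cannot exceed what was chargeable.
        raise ValueError("free_allowance_used must be between 0 and chargeable_units")
    return chargeable_units - free_allowance_used


# Hypothetical figures: 1000 chargeable units, 250 covered by the allowance.
example = charged_units(chargeable_units=1000, free_allowance_used=250)
print(example)
```

Returning the value from the API directly, as the commit does, saves each consumer from repeating this subtraction.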
Ben Thorner	555868c442	Add "free_allowance_units" to service usage APIs This represents the number of chargeable_units that were actually free due to the free allowance - they won't be included in "cost". Although the existing calculations in Admin [^1][^2] will still be correct with a change in SMS rates - it's cost that's the problem - it makes sense to have all the knowledge about calculating usage consistently in these two APIs. Note that the Integer casting is covered by the API-level tests in test_rest. [^1]: `474d7dfda8/app/main/views/dashboard.py (L490)` [^2]: `c63660d56d/app/main/views/dashboard.py (L350)`	2022-04-27 15:57:34 +01:00


8922 Commits