Commit Graph

8922 Commits

Author SHA1 Message Date
Ben Thorner
8e837cf681 Use table class directly instead of "table" var
In response to [^1].

[^1]: https://github.com/alphagov/notifications-api/pull/3546#discussion_r879541366
2022-05-24 10:16:28 +01:00
Ben Thorner
33645c7747 Use notification view for status / billing tasks
This fixes a bug where (letter) notifications left in sending would
temporarily get excluded from billing and status calculations once
the service retention period had elapsed, and then get included once
again when they finally get marked as delivered.*

Status and billing tasks shouldn't need to have knowledge about which
table their data is in and getting this wrong is the fundamental cause
of the bug here. Adding a view across both tables abstracts this away
while keeping the query complexity the same.

Using a view also has the added benefit that we no longer need to care
when the status / billing tasks run in comparison to the deletion task,
since we will retrieve the same data irrespective (see below for a more
detailed discussion on data integrity).

*Such a scenario is rare but has happened.

A New View
==========

I've included all the columns that are shared between the two tables,
even though only a subset are actually needed. Having extra columns
has no impact and may be useful in future.

Although the view isn't actually a table, SQLAlchemy appears to wrap
it without any issues, noting that the package doesn't have any direct
support for "view models". Because we're never inserting data, we don't
need most of the kwargs when defining columns.*

*Note that the "default" kwarg doesn't affect data that's retrieved,
only data that's written (if no value is set).

Data Integrity
==============

The (new) tests cover the main scenarios.

We need to be careful with how the view interacts with the deletion /
archiving task. There are two concerns here:

- Duplicates. The deletion task inserts before it deletes [^1], so we
could end up double counting. It turns out this isn't a problem because
a Postgres UNION is an implicit "DISTINCT" [^2]. I've also verified this
manually, just to be on the safe side.

- No data. It's conceivable that the query will check the history table
just before the insertion, then check the notifications table just after
the deletion. It turns out this isn't a problem either because the whole
query sees the same DB snapshot [^3][^4].*

*I can't think of a way to test this as it's a race condition, but I'm
confident the Postgres docs are accurate.

Performance
===========

I copied the relevant (non-PII) columns from Production for data going
back to 2022-04-01. I then ran several tests.

Queries using the new view still make use of indices on a per-table basis,
as the following query plan illustrates:

                                                                                          QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       GroupAggregate  (cost=1130820.02..1135353.89 rows=46502 width=97) (actual time=629.863..756.703 rows=72 loops=1)
         Group Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international
         ->  Sort  (cost=1130820.02..1131401.28 rows=232506 width=85) (actual time=629.756..708.914 rows=217563 loops=1)
               Sort Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international
               Sort Method: external merge  Disk: 9320kB
               ->  Subquery Scan on notifications_all_time_view  (cost=1088506.43..1098969.20 rows=232506 width=85) (actual time=416.118..541.669 rows=217563 loops=1)
                     ->  Unique  (cost=1088506.43..1096644.14 rows=232506 width=725) (actual time=416.115..513.065 rows=217563 loops=1)
                           ->  Sort  (cost=1088506.43..1089087.70 rows=232506 width=725) (actual time=416.115..451.190 rows=217563 loops=1)
                                 Sort Key: notifications_no_pii.id, notifications_no_pii.job_id, notifications_no_pii.service_id, notifications_no_pii.template_id, notifications_no_pii.key_type, notifications_no_pii.billable_units, notifications_no_pii.notification_type, notifications_no_pii.created_at, notifications_no_pii.sent_by, notifications_no_pii.notification_status, notifications_no_pii.international, notifications_no_pii.rate_multiplier, notifications_no_pii.postage
                                 Sort Method: external merge  Disk: 23936kB
                                 ->  Append  (cost=114.42..918374.12 rows=232506 width=725) (actual time=2.051..298.229 rows=217563 loops=1)
                                       ->  Bitmap Heap Scan on notifications_no_pii  (cost=114.42..8557.55 rows=2042 width=113) (actual time=1.405..1.442 rows=0 loops=1)
                                             Recheck Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                             Filter: ((key_type)::text = ANY ('{normal,team}'::text[]))
                                             ->  Bitmap Index Scan on ix_notifications_no_piiservice_id_composite  (cost=0.00..113.91 rows=2202 width=0) (actual time=1.402..1.439 rows=0 loops=1)
                                                   Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                       ->  Index Scan using ix_notifications_history_no_pii_service_id_composite on notifications_history_no_pii  (cost=0.70..906328.97 rows=230464 width=113) (actual time=0.645..281.612 rows=217563 loops=1)
                                             Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND ((key_type)::text = ANY ('{normal,team}'::text[])) AND (notification_type = 'sms'::notification_type) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                             Filter: (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[]))
       Planning Time: 18.032 ms
       Execution Time: 759.001 ms
      (21 rows)

Queries using the new view appear to be slower than without, but the
differences I've seen are minimal: the original queries execute in
seconds locally and in Production, so it's not a big issue.

Notes: Performance
==================

I downloaded a minimal set of columns for testing:

      \copy (
        select
          id, notification_type, key_type, created_at, service_id,
          template_id, sent_by, rate_multiplier, international,
          billable_units, postage, job_id, notification_status
        from notifications
      ) to 'notifications.csv' delimiter ',' csv header;

      CREATE TABLE notifications_no_pii (
          id uuid NOT NULL,
          notification_type public.notification_type NOT NULL,
          key_type character varying(255) NOT NULL,
          created_at timestamp without time zone NOT NULL,
          service_id uuid,
          template_id uuid,
          sent_by character varying,
          rate_multiplier numeric,
          international boolean,
          billable_units integer NOT NULL,
          postage character varying,
          job_id uuid,
          notification_status text
      );

      copy notifications_no_pii	 from '/Users/ben.thorner/Desktop/notifications.csv' delimiter ',' csv header;

      CREATE INDEX ix_notifications_no_piicreated_at ON notifications_no_pii USING btree (created_at);
      CREATE INDEX ix_notifications_no_piijob_id ON notifications_no_pii USING btree (job_id);
      CREATE INDEX ix_notifications_no_piinotification_type_composite ON notifications_no_pii USING btree (notification_type, notification_status, created_at);
      CREATE INDEX ix_notifications_no_piiservice_created_at ON notifications_no_pii USING btree (service_id, created_at);
      CREATE INDEX ix_notifications_no_piiservice_id_composite ON notifications_no_pii USING btree (service_id, notification_type, notification_status, created_at);
      CREATE INDEX ix_notifications_no_piitemplate_id ON notifications_no_pii USING btree (template_id);

And similarly for the history table. I then created a sepatate view
across both of these temporary tables using just these columns.

To test performance I created some queries that reflect what is run
by the billing [^5] and status [^6] tasks e.g.

      explain analyze select template_id, sent_by, rate_multiplier, international, sum(billable_units), count(*)
      from notifications_all_time_view
      where
      notification_status in ('sending', 'sent', 'delivered', 'pending', 'temporary-failure', 'permanent-failure')
      and key_type in ('normal', 'team')
      and created_at >= '2022-05-01 23:00'
      and created_at < '2022-05-02 23:00'
      and notification_type = 'sms'
      and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'
      group by 1,2,3,4;

      explain analyze select template_id, job_id, key_type, notification_status, count(*)
      from notifications_all_time_view
      where created_at >= '2022-05-01 23:00'
      and created_at < '2022-05-02 23:00'
      and notification_type = 'sms'
      and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'
      and key_type in ('normal', 'team')
      group by 1,2,3,4;

Between running queries I restarted my local database and also ran
a command to purge disk caches [^7].

I tested on a few services:

- c5956607-20b1-48b4-8983-85d11404e61f on 2022-05-02 (high volume)
- 0cc696c6-b792-409d-99e9-64232f461b0f on 2022-04-06 (highest volume)
- 01135db6-7819-4121-8b97-4aa2d741e372 on 2022-04-14 (very low volume)

All execution results are of the same magnitude using the view compared
to the worst case of either table on its own.

[^1]: 00a04ebf54/app/dao/notifications_dao.py (L389)
[^2]: https://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all
[^3]: https://www.postgresql.org/docs/current/transaction-iso.html
[^4]: https://dba.stackexchange.com/questions/210485/can-sub-selects-change-in-one-single-query-in-a-read-committed-transaction
[^5]: 00a04ebf54/app/dao/fact_billing_dao.py (L471)
[^6]: 00a04ebf54/app/dao/fact_notification_status_dao.py (L58)
[^7]: https://stackoverflow.com/questions/28845524/echo-3-proc-sys-vm-drop-caches-on-mac-osx
2022-05-19 15:14:32 +01:00
Ben Thorner
648952eb38 Standardise tests for nightly billing sub-task 2022-05-19 13:40:00 +01:00
Ben Thorner
a9104a2ec3 Simplify test for status aggregation
We don't benefit from testing a third service and simplifying the
test will make it easier to incorporate anothere edge case later.
2022-05-19 13:39:46 +01:00
Ben Thorner
7e536d1c2b Merge pull request #3542 from alphagov/optimise-historic-billing-query-182116071
Optimise billing query for notification history
2022-05-18 10:27:43 +01:00
Ben Thorner
4a520bce78 Optimise billing query for notification history
This follows the same pattern as for status aggregations [^1]. We
haven't seen this problem for a long time because of [^2], but now
we're trying to re-run the aggregation for some incorrect rows it's
becoming apparent we need to fix it.

The following query currently fails in Production after the 30 min
SQLAlchemy timeout:

      select template_id, rate_multiplier, international, sum(billable_units), count(*)
      from notification_history
      where notification_status in ('delivered', 'sending')
      and key_type != 'test'
      and notification_type = 'sms'
      and service_id = '539d63a1-701d-400d-ab11-f3ee2319d4d4'
      and created_at >= '2021-07-07 23:00'
      and created_at < '2021-07-08 23:00'
      group by 1,2,3,4;

Running a quick "explain analyze" with this change applied returns
near immediately, but hangs without it. This is enough evidence for
me that this change will fix the issue.

[^1]: https://github.com/alphagov/notifications-api/pull/3417
[^2]: e5c76ffda7
2022-05-17 17:29:50 +01:00
Katie Smith
1d4512e34b Merge pull request #3540 from alphagov/pyup-scheduled-update-2022-05-11
Scheduled weekly dependency update for week 19
2022-05-17 11:26:52 +01:00
Katie Smith
fdee7b5ecb Update cachetools from 5.0.0 to 5.1.0 2022-05-17 10:53:06 +01:00
Katie Smith
2f3da76e81 Update SQLAlchemy from 1.4.35 to 1.4.36 2022-05-17 10:51:49 +01:00
Katie Smith
76e683eeed Update PyJWT from 2.3.0 to 2.4.0 2022-05-17 10:49:30 +01:00
Katie Smith
1d58f47dfe Update jsonschema from 4.4.0 to 4.5.1 2022-05-17 10:45:28 +01:00
Katie Smith
ddec6de3c4 Update flask from 2.1.1 to 2.1.2 2022-05-17 10:36:16 +01:00
Katie Smith
c50eae11c3 Upgrade all test requirements 2022-05-17 10:33:49 +01:00
Ben Thorner
e4a45047b3 Merge pull request #3538 from alphagov/fix-out-of-date-status-182116071
Fix out-of-date rows in ft_notification_status
2022-05-17 10:26:22 +01:00
Ben Thorner
318cc1284a Merge pull request #3534 from alphagov/remove-redundant-usage-fields-181935935
Remove redundant fields from service usage APIs
2022-05-11 15:12:31 +01:00
Ben Thorner
ed379a3724 Fix out-of-date rows in ft_notification_status
This can happen in the following scenario (primarily for letters):

1. A service has a mixture of "delivered" and "sending" letters,
which the status task aggregates into two rows:

  sending | 123
  delivered | 456

2. After the 7 day retention has passed, only the "delivered" letters
will be archived [^1].

3. The status task now looks at the history table [^2], which means
it only sees the "delivered" letters.

4. The "sending" letters are eventually "delivered" and archived (before
the 10 day aggregation cutoff).

5. But the status aggregation task doesn't run.

This commit fixes (5).

[^1]: https://github.com/alphagov/notifications-api/pull/3063
[^2]: f87ebb094d/app/dao/fact_notification_status_dao.py (L51)
2022-05-11 11:04:56 +01:00
David McDonald
177b860865 Merge pull request #3539 from alphagov/adjust-resting-rates
Adjust resting priorities for SMS providers
2022-05-11 09:48:24 +01:00
Ben Thorner
1d157836ad Remove redundant fields from service usage APIs
These are no longer used since [^1] and [^2].

[^1]: https://github.com/alphagov/notifications-admin/pull/4225
[^2]: https://github.com/alphagov/notifications-admin/pull/4229
2022-05-10 15:50:22 +01:00
Leo Hemsted
ce1cea90f0 Merge pull request #3535 from alphagov/notify-db-fixture-cleanup
Notify db fixture cleanup
2022-05-10 15:43:07 +01:00
David McDonald
f000e5a52b Adjust resting priorities for SMS providers 2022-05-10 15:07:25 +01:00
Katie Smith
f87ebb094d Merge pull request #3537 from alphagov/existing-marshmallow-changes
Fix existing marshmallow schemas and remove unused code
2022-05-10 10:29:38 +01:00
Katie Smith
dd213b8d55 Ensure UserUpdatePasswordSchema only loads _password
We had the `only` defined in the Meta class, and this wasn't working -
any extra fields were also being loaded. This moves it to the point
where the class is instantiated, which now works.
2022-05-06 10:14:50 +01:00
Katie Smith
685959de00 Don't serialize broadcast_messages in TemplateHistorySchema
In 0282a76 we excluded serializing `broadcast_messages` on some schemas.
This also excludes it from the TemplateHistory schema.
2022-05-06 10:14:50 +01:00
Katie Smith
ab199b6b05 Remove unused excluded fields from Marshmallow schemas
We have a lot of cases in the schemas where we're excluding a field that
doesn't actually exist on the schema anyway. This is often because a
model has been deleted, and the schema has not been updated. These
excluded fields have no effect at the moment, but Marshmallow 3 does
raise an error if you try and exclude non-existent fields.

There should be no change to what gets (de)serialized after this change.
2022-05-06 10:14:50 +01:00
Katie Smith
bd42bded0a Delete unused Marshmallow schema validation functions 2022-05-04 14:37:04 +01:00
Leo Hemsted
6b0d3860bd make notify_db fixture private 2022-05-04 11:37:05 +01:00
Leo Hemsted
6181c60f75 remove usage of notify_db fixture in unit tests
* notify_db fixture creates the database connection and ensures the test
  db exists and has migrations applied etc. It will run once per session
  (test run).
* notify_db_session fixture runs after your test finishes and deletes
  all non static (eg type table) data.

In unit tests that hit the database (ie: most of them), 99% of the time
we will need to use notify_db_session to ensure everything is reset. The
only time we don't need to use it is when we're querying things such as
"ensure get X works when database is empty". This is such a low
percentage of tests that it's easier for us to just use
notify_db_session every time, and ensure that all our tests run much
more consistently, at the cost of a small bit of performance when
running tests.

We used to use notify_db to access the session object for manually
adding, committing, etc. To dissuade usage of that fixture I've moved
that to the `notify_db_session`. I've then removed all uses of notify_db
that I could find in the codebase.

As a note, if you're writing a test that uses a `sample_x` fixture, all
of those fixtures rely on notify_db_session so you'll get the teardown
functionality for free. If you're just calling eg `create_x` db.py
functions, then you'll need to make you add notify_db_session fixture to
your test, even if you aren't manually accessing the session.
2022-05-04 11:36:54 +01:00
Ben Thorner
867e8fbce3 Merge pull request #3528 from alphagov/tidy-up-usage-tests-181934027
Minor refactorings to usage API tests
2022-05-04 10:31:09 +01:00
Leo Hemsted
d8f14abde6 Merge pull request #3533 from alphagov/remove-provider-rates
remove provider_rates table
2022-05-04 10:29:25 +01:00
Leo Hemsted
201367ec33 Merge pull request #3531 from alphagov/cleanup-user-tests
clean up user tests
2022-05-04 10:29:19 +01:00
Leo Hemsted
51646af92e remove provider_rates table
this was added five years ago but never used. if we want to bring back
variable rates per client we might as well get a fresh start since a lot
has changed since then.
2022-05-03 14:42:59 +01:00
Leo Hemsted
091d255d56 clean up user tests
an API build failed because one of the tests expected the database to be
empty, but it actually wasn't. This is probably because another test run
earlier on that worker did not clear down properly.

I took the opportunity to refresh all of these tests to ensure they all
correctly tear down, and also use the more modern admin_request fixture
ratehr than the old client one that required us to specify headers and
do json parsing etc.
2022-05-03 14:23:59 +01:00
Leo Hemsted
43206b1656 Merge pull request #3532 from alphagov/utils-bump
bump utils
2022-05-03 13:54:19 +01:00
Leo Hemsted
cc3035a101 bump utils
mostly to get rid of the security warning on pypdf2
2022-05-03 12:26:38 +01:00
Leo Hemsted
7493daff07 Merge pull request #3530 from alphagov/fix
update migration script numbers
2022-05-03 11:05:12 +01:00
Leo Hemsted
cc285fa41e update migration script numbers 2022-05-03 10:54:25 +01:00
Leo Hemsted
503c5e1eea Merge pull request #3527 from alphagov/fix-apr-rate
add revision to fix apr rate
2022-05-03 10:48:19 +01:00
Ben Thorner
bb62d22f25 Merge pull request #3526 from alphagov/remove-reach
Remove support for Reach provider
2022-04-29 13:21:27 +01:00
Ben Thorner
c27107fa74 Remove support for Reach provider
This provider was never active and support was never completed, so
there's little value in keeping all this potentially confusing code.
2022-04-29 12:28:08 +01:00
Ben Thorner
52b2982b92 Rename test_ft_billing... to match file under test
This tripped me up several times when modifying the DAO functions
and then trying to search for the test file associated with them.
2022-04-29 11:33:20 +01:00
Ben Thorner
b8cd99d9fa Move usage API test data inline with tests
This makes it easier to compare assertions against the test data,
especially since each function was only used in one test.
2022-04-29 11:33:19 +01:00
Ben Thorner
efae436c4a Speed up usage API tests with minimal test data
This slowed me down when making changes to the APIs. As well as
being unnecessary given the structural focus of these tests, I
found the way the test data was generated was quite confusing.

Before:

    time pytest tests/app/billing/test_rest.py
    ...
    pytest tests/app/billing/test_rest.py  4.16s user 0.43s system 55% cpu 8.355 total

After:

    time pytest tests/app/billing/test_rest.py
    ...
    pytest tests/app/billing/test_rest.py  2.16s user 0.25s system 70% cpu 3.413 total
2022-04-29 11:33:16 +01:00
Ben Thorner
2fdec58496 Remove unused line in annual usage API test data 2022-04-29 10:57:24 +01:00
Ben Thorner
f978a4fe45 Remove redundant test data for annual usage API
This test should be focussed on the structural properties of the
API. We can leave more detailed testing of rate multipliers, etc.
to lower-level DAO tests.
2022-04-29 10:57:23 +01:00
Ben Thorner
1fab73ef9e Speed up usage DAO tests with minimal test data
This slowed me down when making changes to the DAO functions. It's
really not necessary to do 3 * 367 DB insertions for both tests.

Before:

    time pytest tests/app/dao/test_ft_billing_dao.py -k for_year
    ...
    pytest tests/app/dao/test_ft_billing_dao.py -k for_year  3.95s user 0.40s system 62% cpu 6.971 total

After:

    time pytest tests/app/dao/test_ft_billing_dao.py -k for_year
    ...
    pytest tests/app/dao/test_ft_billing_dao.py -k for_year  1.84s user 0.25s system 69% cpu 3.006 total
2022-04-29 10:56:51 +01:00
Ben Thorner
c3bc1d0df9 Remove redundant test data for usage DAOs
This is unnecessary since we now have separate "_variable_rates"
tests that check the behaviour for multiple rates explicitly for
the two types of notifications it affects.
2022-04-29 10:22:09 +01:00
Leo Hemsted
6d47d97bc0 add revision to fix apr rate
this will update about 80-90k rows in ft billing
2022-04-28 15:54:22 +01:00
Ben Thorner
66bc11bb65 Merge pull request #3520 from alphagov/free-allowance-api-181934027
Extend service usage APIs with costs and free allowance
2022-04-28 13:37:52 +01:00
Ben Thorner
ebaef4b57b Add "charged_units" to service usage APIs
This can be calculated from the "free_allowance_used" field and the
"chargeable_units" field, but having it included separately is more
convenient as it can be used directly in Admin [^1].

[^1]: 417e7370bb/app/templates/views/usage.html (L38-L39)
2022-04-27 15:57:35 +01:00
Ben Thorner
555868c442 Add "free_allowance_units" to service usage APIs
This represents the number of chargeable_units that were actually
free due to the free allowance - they won't be included in "cost".

Although the existing calculations in Admin [^1][^2] will still be
correct with a change in SMS rates - it's cost that's the problem
- it makes sense to have all the knowledge about calculating usage
consistently in these two APIs.

Note that the Integer casting is covered by the API-level tests in
test_rest.

[^1]: 474d7dfda8/app/main/views/dashboard.py (L490)
[^2]: c63660d56d/app/main/views/dashboard.py (L350)
2022-04-27 15:57:34 +01:00