Commit Graph

52 Commits

Author SHA1 Message Date
stvnrlly
57f4df8ed1 remove broadcast-related code, except migrations 2022-10-04 15:28:27 +00:00
Ben Thorner
33645c7747 Use notification view for status / billing tasks
This fixes a bug where (letter) notifications left in sending would
temporarily get excluded from billing and status calculations once
the service retention period had elapsed, and then get included once
again when they finally get marked as delivered.*

Status and billing tasks shouldn't need to have knowledge about which
table their data is in and getting this wrong is the fundamental cause
of the bug here. Adding a view across both tables abstracts this away
while keeping the query complexity the same.

Using a view also has the added benefit that we no longer need to care
when the status / billing tasks run in comparison to the deletion task,
since we will retrieve the same data irrespective (see below for a more
detailed discussion on data integrity).

*Such a scenario is rare but has happened.

A New View
==========

I've included all the columns that are shared between the two tables,
even though only a subset are actually needed. Having extra columns
has no impact and may be useful in future.

Although the view isn't actually a table, SQLAlchemy appears to wrap
it without any issues, noting that the package doesn't have any direct
support for "view models". Because we're never inserting data, we don't
need most of the kwargs when defining columns.*

*Note that the "default" kwarg doesn't affect data that's retrieved,
only data that's written (if no value is set).

Data Integrity
==============

The (new) tests cover the main scenarios.

We need to be careful with how the view interacts with the deletion /
archiving task. There are two concerns here:

- Duplicates. The deletion task inserts before it deletes [^1], so we
could end up double counting. It turns out this isn't a problem because
a Postgres UNION is an implicit "DISTINCT" [^2]. I've also verified this
manually, just to be on the safe side.

- No data. It's conceivable that the query will check the history table
just before the insertion, then check the notifications table just after
the deletion. It turns out this isn't a problem either because the whole
query sees the same DB snapshot [^3][^4].*

*I can't think of a way to test this as it's a race condition, but I'm
confident the Postgres docs are accurate.

Performance
===========

I copied the relevant (non-PII) columns from Production for data going
back to 2022-04-01. I then ran several tests.

Queries using the new view still make use of indices on a per-table basis,
as the following query plan illustrates:

                                                                                          QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       GroupAggregate  (cost=1130820.02..1135353.89 rows=46502 width=97) (actual time=629.863..756.703 rows=72 loops=1)
         Group Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international
         ->  Sort  (cost=1130820.02..1131401.28 rows=232506 width=85) (actual time=629.756..708.914 rows=217563 loops=1)
               Sort Key: notifications_all_time_view.template_id, notifications_all_time_view.sent_by, notifications_all_time_view.rate_multiplier, notifications_all_time_view.international
               Sort Method: external merge  Disk: 9320kB
               ->  Subquery Scan on notifications_all_time_view  (cost=1088506.43..1098969.20 rows=232506 width=85) (actual time=416.118..541.669 rows=217563 loops=1)
                     ->  Unique  (cost=1088506.43..1096644.14 rows=232506 width=725) (actual time=416.115..513.065 rows=217563 loops=1)
                           ->  Sort  (cost=1088506.43..1089087.70 rows=232506 width=725) (actual time=416.115..451.190 rows=217563 loops=1)
                                 Sort Key: notifications_no_pii.id, notifications_no_pii.job_id, notifications_no_pii.service_id, notifications_no_pii.template_id, notifications_no_pii.key_type, notifications_no_pii.billable_units, notifications_no_pii.notification_type, notifications_no_pii.created_at, notifications_no_pii.sent_by, notifications_no_pii.notification_status, notifications_no_pii.international, notifications_no_pii.rate_multiplier, notifications_no_pii.postage
                                 Sort Method: external merge  Disk: 23936kB
                                 ->  Append  (cost=114.42..918374.12 rows=232506 width=725) (actual time=2.051..298.229 rows=217563 loops=1)
                                       ->  Bitmap Heap Scan on notifications_no_pii  (cost=114.42..8557.55 rows=2042 width=113) (actual time=1.405..1.442 rows=0 loops=1)
                                             Recheck Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                             Filter: ((key_type)::text = ANY ('{normal,team}'::text[]))
                                             ->  Bitmap Index Scan on ix_notifications_no_piiservice_id_composite  (cost=0.00..113.91 rows=2202 width=0) (actual time=1.402..1.439 rows=0 loops=1)
                                                   Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND (notification_type = 'sms'::notification_type) AND (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[])) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                       ->  Index Scan using ix_notifications_history_no_pii_service_id_composite on notifications_history_no_pii  (cost=0.70..906328.97 rows=230464 width=113) (actual time=0.645..281.612 rows=217563 loops=1)
                                             Index Cond: ((service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'::uuid) AND ((key_type)::text = ANY ('{normal,team}'::text[])) AND (notification_type = 'sms'::notification_type) AND (created_at >= '2022-05-01 23:00:00'::timestamp without time zone) AND (created_at < '2022-05-02 23:00:00'::timestamp without time zone))
                                             Filter: (notification_status = ANY ('{sending,sent,delivered,pending,temporary-failure,permanent-failure}'::text[]))
       Planning Time: 18.032 ms
       Execution Time: 759.001 ms
      (21 rows)

Queries using the new view appear to be slower than without, but the
differences I've seen are minimal: the original queries execute in
seconds locally and in Production, so it's not a big issue.

Notes: Performance
==================

I downloaded a minimal set of columns for testing:

      \copy (
        select
          id, notification_type, key_type, created_at, service_id,
          template_id, sent_by, rate_multiplier, international,
          billable_units, postage, job_id, notification_status
        from notifications
      ) to 'notifications.csv' delimiter ',' csv header;

      CREATE TABLE notifications_no_pii (
          id uuid NOT NULL,
          notification_type public.notification_type NOT NULL,
          key_type character varying(255) NOT NULL,
          created_at timestamp without time zone NOT NULL,
          service_id uuid,
          template_id uuid,
          sent_by character varying,
          rate_multiplier numeric,
          international boolean,
          billable_units integer NOT NULL,
          postage character varying,
          job_id uuid,
          notification_status text
      );

      copy notifications_no_pii	 from '/Users/ben.thorner/Desktop/notifications.csv' delimiter ',' csv header;

      CREATE INDEX ix_notifications_no_piicreated_at ON notifications_no_pii USING btree (created_at);
      CREATE INDEX ix_notifications_no_piijob_id ON notifications_no_pii USING btree (job_id);
      CREATE INDEX ix_notifications_no_piinotification_type_composite ON notifications_no_pii USING btree (notification_type, notification_status, created_at);
      CREATE INDEX ix_notifications_no_piiservice_created_at ON notifications_no_pii USING btree (service_id, created_at);
      CREATE INDEX ix_notifications_no_piiservice_id_composite ON notifications_no_pii USING btree (service_id, notification_type, notification_status, created_at);
      CREATE INDEX ix_notifications_no_piitemplate_id ON notifications_no_pii USING btree (template_id);

And similarly for the history table. I then created a sepatate view
across both of these temporary tables using just these columns.

To test performance I created some queries that reflect what is run
by the billing [^5] and status [^6] tasks e.g.

      explain analyze select template_id, sent_by, rate_multiplier, international, sum(billable_units), count(*)
      from notifications_all_time_view
      where
      notification_status in ('sending', 'sent', 'delivered', 'pending', 'temporary-failure', 'permanent-failure')
      and key_type in ('normal', 'team')
      and created_at >= '2022-05-01 23:00'
      and created_at < '2022-05-02 23:00'
      and notification_type = 'sms'
      and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'
      group by 1,2,3,4;

      explain analyze select template_id, job_id, key_type, notification_status, count(*)
      from notifications_all_time_view
      where created_at >= '2022-05-01 23:00'
      and created_at < '2022-05-02 23:00'
      and notification_type = 'sms'
      and service_id = 'c5956607-20b1-48b4-8983-85d11404e61f'
      and key_type in ('normal', 'team')
      group by 1,2,3,4;

Between running queries I restarted my local database and also ran
a command to purge disk caches [^7].

I tested on a few services:

- c5956607-20b1-48b4-8983-85d11404e61f on 2022-05-02 (high volume)
- 0cc696c6-b792-409d-99e9-64232f461b0f on 2022-04-06 (highest volume)
- 01135db6-7819-4121-8b97-4aa2d741e372 on 2022-04-14 (very low volume)

All execution results are of the same magnitude using the view compared
to the worst case of either table on its own.

[^1]: 00a04ebf54/app/dao/notifications_dao.py (L389)
[^2]: https://stackoverflow.com/questions/49925/what-is-the-difference-between-union-and-union-all
[^3]: https://www.postgresql.org/docs/current/transaction-iso.html
[^4]: https://dba.stackexchange.com/questions/210485/can-sub-selects-change-in-one-single-query-in-a-read-committed-transaction
[^5]: 00a04ebf54/app/dao/fact_billing_dao.py (L471)
[^6]: 00a04ebf54/app/dao/fact_notification_status_dao.py (L58)
[^7]: https://stackoverflow.com/questions/28845524/echo-3-proc-sys-vm-drop-caches-on-mac-osx
2022-05-19 15:14:32 +01:00
Ben Thorner
6e8f121548 Standardise how we query midnight-to-midnight
Partially addresses [1] (lots more detail to read in the comment).
I've also added some tests for the status DAO function to confirm
it behaves as expected across timezones.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r802634913
2022-02-10 10:51:27 +00:00
David McDonald
ec6ed3958c Move get_prev_next_pagination_links to utils
This will mean it can later be reused whereever we want
2021-12-10 12:26:57 +00:00
Pea Tyczynska
52c529ab3a Use personalisation to set client_reference for letters
which were sent through Notify interface only. This is done
to avoid performance dip from additional operation for
other notification types.
2021-03-24 14:55:10 +00:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Katie Smith
6b8ebb3421 Fix linting errors 2021-02-16 09:03:38 +00:00
Chris Hill-Scott
78e87857e3 Don’t serialize nullable UUID columns to 'None'
We should return a proper `None` instead, so it gets JSONified as `null`
and returns what you’d expect when doing `bool(model.field)`
2021-01-15 13:15:00 +00:00
Pea Tyczynska
95deb5a52f Move DATETIME_FORMAT from app to app.utils
To avoid cyclical import issues
2020-12-18 17:39:35 +00:00
Pea Tyczynska
a186d2d296 Format sequential number into an 8 char long hex
As per Vodafone spec for ibag format message number
2020-12-07 13:13:11 +00:00
Leo Hemsted
3dd15841a5 create get_dt_string_or_none string
mostly to quash flakle8 warnings
2020-07-28 12:10:18 +01:00
Leo Hemsted
7ecd7341b0 add broadcast to template_types and add broadcast_data
had to go through the code and change a few places where we filter on
template types. i specifically didn't worry about jobs or notifications.

Also, add braodcast_data - a json column that might contain arbitrary
broadcast data that we'll figure out as we go. We don't know what it'll
look like, but it should be returned by the API
2020-07-06 15:47:13 +01:00
Katie Smith
13f7fecd5b Move function to get archived email address value
This function will be used when archiving services too, so it has been
renamed and moved to `app/utils.py`.
2020-05-22 09:36:07 +01:00
Chris Hill-Scott
5ddb5a75da Use new properties of utils Templates
We’ve added some new properties to the templates in utils that we can
use instead of doing weird things like
`WithSubjectTemplate.__str__(another_instance)`
2020-04-15 16:40:42 +01:00
Leo Hemsted
d457db4164 make has_delete_task_run non-optional
just to ensure people think about the value of it when using the function
2019-12-03 14:19:14 +00:00
Leo Hemsted
913cf5e12d work out which table to get notification status data from
previously we checked notifications table, and if the results were
zero, checked the notification history table to see if there's data
in there. When we know that data isn't in notifications, we're still
checking. These queries take half a second per service, and we're
doing at least ten for each of the five thousand services we have in
notify. Most of these services have no data in either table for any
given day, and we can reduce the amount of queries we do by only
checking one table.

Check the data retention for a service, and then if the date is older
than the retention, get from history table.

NOTE: This requires that the delete tasks haven't run yet for the day!
If your retention is three days, this will look in the Notification
table for data from three days ago - expecting that shortly after the
task finishes, we'll delete that data.
2019-11-29 15:27:56 +00:00
Rebecca Law
6ccb242107 Merge pull request #2458 from alphagov/remove-unused-method
Remove unused method.
2019-04-15 12:11:00 +01:00
Rebecca Law
1c68e0f565 Remove unused method.
last_n_days was only being used in a test.
2019-04-12 10:26:46 +01:00
Chris Hill-Scott
5a7de22f55 Set default branding for NHS services
The NHS is a special case because it’s not one organisation, but it does
have one consistent brand. So anyone working for an NHS organisation
should have their default branding set when they create a service, even
if we know nothing about their specific organisation.
2019-04-08 10:27:26 +01:00
Chris Hill-Scott
f185dbecbe Return rendered HTML when previewing a template
If you’re trying to show what a Notify email will look like in your
caseworking system all the API gives you at the moment is raw markdown
(with the placeholders replaced).

This isn’t that useful if your caseworkers have no idea what markdown
is. If we also give teams the HTML then they can embed this in their
systems, and the people using those systems will be able to see how
headings, bulleted lists, etc. look.
2019-02-07 17:43:46 +00:00
Pea Tyczynska
ac3832a918 Remove old redis template cache 2019-01-15 14:46:40 +00:00
Pea Tyczynska
d36c4d8a78 Remove now unused methods that populated template usage redis cache 2019-01-15 14:38:45 +00:00
Katie Smith
ff06d120e8 Bump notifications-utils to 3.7.0
Bumped notifications-utils to 3.7.0. Version 3.7.0 includes the
`convert_utc_to_bst` and `convert_bst_to_utc` functions and the
`LETTER_PROCESSING_DEADLINE` constant, so these have been removed from
this repo and anywhere using these has now been updated to get these
from `notifications-utils`.

Also bumped pytest by a patch version to bring in a bug fix.
2018-11-26 12:53:39 +00:00
Rebecca Law
d0cbdce6c4 Update the error message when a service does not have permission to send a notification type. 2018-11-20 11:01:48 +00:00
Leo Hemsted
267c4fc07b bump requirements, fix pyflake8 things, unpin botocore/awscli 2018-11-07 13:39:08 +00:00
Pea Tyczynska
a69dee5e6d Move code that escapes special chars to helper function and use it
in query get_users_by_partial_email
2018-07-13 15:47:21 +01:00
Leo Hemsted
0efa223fb2 rename days_ago to midnight_n_days_ago
also add some more timezone boundary tests and minor code cleanup
2018-04-30 11:50:56 +01:00
Leo Hemsted
85fd7c3869 add new tests for template statistics 2018-04-30 11:13:21 +01:00
Leo Hemsted
9e8b6fd00d refactor template stats endpoint to read from new redis keys
New redis keys are partitioned per service per day. New process is as
follows:

* require a count of days to filter by. Currently admin always gives 7.
* for each day, check and see if there's anything in redis. There won't
  be if either a) redis is/was down or b) the service didn't send any
  notifications that day
  - if there isn't, go to the database and get a count out.
* combine all these stats together
* get the names/template types etc out of the DB at the end.
2018-04-30 11:13:21 +01:00
Leo Hemsted
5e702449cb move days_ago to utils and make it tz aware
it's used in a few places - it should definitely know what timezones
are and return datetimes rather than dates, which are hard to work with
in terms of figuring out how tz aware they are.
2018-04-30 11:13:21 +01:00
Chris Hill-Scott
c9882e2f9c Bump utils to improve plain text email formatting
Brings in:
- [x] https://github.com/alphagov/notifications-utils/pull/438
- [x] https://github.com/alphagov/notifications-utils/pull/450
- [x] https://github.com/alphagov/notifications-utils/pull/454

Changes:
- https://github.com/alphagov/notifications-utils/compare/25.3.0...26.2.0
2018-04-10 11:14:48 +01:00
Leo Hemsted
6e554188bd add command to backfill template usage
The command takes a service id and a day, grabs the historical data for
that day (potentially out of notification_history), and pops it in
redis (for eight days, same as if it were written to manually).

also, prefix template usage key with "service" to make clear that it's
a service id, and not an individual template id.
2018-04-03 16:14:47 +01:00
Leo Hemsted
8e73961f65 add new redis template usage per day key
We've run into issues with redis expiring keys while we try and write
to them - short lived redis TTLs aren't really sustainable for keys
where we mutate the state. Template usage is a hash contained in redis
where we increment a count keyed by template_id each time a message is
sent for that template. But if the key expires, hincrby (redis command
for incrementing a value in a hash) will re-create an empty hash.

This is no good, as we need the hash to be populated with the last
seven days worth of data, which we then increment further. We can't
tell whether the hincrby created the key, so a different approach
entirely was needed:

* New redis key: <service_id>-template-usage-<YYYY-MM-DD>. Note: This
  YYYY-MM-DD is BTC time so it lines up nicely with ft_billing table
* Incremented to from process_notification - if it doesn't exist yet,
  it'll be created then.
* Expiry set to 8 days every time it's incremented to.

Then, at read time, we'll just read the last eight days of keys from
Redis, and sum them up. This works because we're only ever incrementing
from that one place - never setting wholesale, never recreating the
data from scratch. So we know that if the data is in redis, then it is
good and accurate data.

One thing we *don't* know and *cannot* reason about is what no key in
redis means. It could be either of:

* This is the first message that the service has sent today.
* The key was deleted from redis for some reason.

Since we set the TTL to so long, we'll never be writing to a key that
previously expired. But if there is a redis (or operator) error and the
key is deleted, then we'll have bad data - after any data loss we'll
have to rebuild the data.
2018-04-03 16:12:54 +01:00
Chris Hill-Scott
c0e2a478f6 Allow admin to specify domain for email auth links
Similar to https://github.com/alphagov/notifications-api/pull/1515

This lets the admin app pass in a domain to use for email auth links,
so that when it’s running on a different URL users who try to sign in
will get an email auth link for the domain they sign in on, not the
default admin domain for the environment in which the API is running.
2018-02-09 14:19:17 +00:00
Rebecca Law
19f964a90b Added a check that the call is not using a test api key.
Removed the tests for trial mode service for the scheduled tasks and the process job.
Having the validation in the POST notification and create job endpoint is enough.
Updated the test_service_whitelist test because the order of the array is not gaurenteed.
2017-09-04 17:24:41 +01:00
Ken Tsang
d391919677 Refactored to check trial when running scheduled job 2017-08-30 22:30:05 +01:00
Imdad Ahad
19b09f2a27 Rename method to convert from utc to bst for consistency 2017-08-11 16:56:46 +01:00
Leo Hemsted
2ab105aaf4 add tests for letter api notifications 2017-07-27 16:43:55 +01:00
Ken Tsang
0b3277b8a4 Refactored to make code clearer 2017-07-06 12:27:57 +01:00
Rebecca Law
973cc2c4c9 Changed the scheduled_for datetime to only send and hour of a day to send.
Also expect the date being passed in is BST. The date is converted to UTC before saving. And converted to BST when returning a notification.
2017-05-17 15:06:15 +01:00
Ken Tsang
eef8aceac5 Refactored get utc in bst to use localize 2017-04-03 16:20:54 +01:00
Ken Tsang
2b53e14a73 Add fix for bst 2017-04-03 15:49:23 +01:00
Ken Tsang
ecbe87a0d6 Added letter preview 2017-03-22 14:22:26 +00:00
Rebecca Law
53b7ad0961 Moved the cache key to the utils module.
Renamed the dao method.
2017-02-14 14:22:52 +00:00
Chris Hill-Scott
45b505f379 Split out handling of BST queries for reuse 2017-01-31 14:16:35 +00:00
Imdad Ahad
cc7bf45766 Move notification retrieval logic into perf platform client and dont filter by status 2017-01-27 16:39:01 +00:00
Rebecca Law
c87d0a37f3 Refactor the get_midnight functions to return the date in UTC but for the localised time.
So in June when we are in BST June 16, 00:00 BST => June 15 23:00 UTC
2017-01-27 15:57:25 +00:00
Imdad Ahad
781bef557a Add date helper methods to be used as date range filters for notification counts 2017-01-27 12:21:28 +00:00
Chris Hill-Scott
59af44d7ab Update utils to 12.0.0
Includes:

- [x] https://github.com/alphagov/notifications-utils/pull/94 (breaking
      changes which are responsible for all the changes to the API in
      this PR)

The test for `get_sms_fragment_count` has been removed because this
method is already tested in utils here:

ac20f7e99e/tests/test_base_template.py (L140-L159)
2016-12-13 10:57:01 +00:00
Rebecca Law
9ffdf66c49 Rename the endpoints.
Increase test coverage to include the encrypted message sent to the task.
2016-10-13 11:59:47 +01:00