Commit Graph

1536 Commits

Author SHA1 Message Date
Ben Thorner
ee4da698fe Standardise timezones for service usage APIs
We want to query for service usage in the BST financial year:

    2022-04-01T00:00:00+01:00 to 2023-03-31T23:59:59+01:00 =>
    2022-04-01 to 2023-03-31  # bst_date

Previously we were only doing this explicitly for the monthly API
and it seemed like the yearly usage API was incorrectly querying:

    2022-03-31T23:00:00+00:00 to 2023-03-30T23:00:00+00:00 =>
    2022-03-31 to 2023-03-30  # "bst_date"

However, it turns out this isn't a problem for two reasons:

1. We've been lucky that none of our rates have changed since 2017,
which is long ago enough that no one would care.

2. There's a quirk somewhere in Sqlalchemy / Postgres that has been
compensating for the lack of explicit BST conversion.

To help ensure we do this consistently in future I've DRYed-up the
BST conversion into a new utility. I could have just hard-coded the
dates but it seemed strange to have the knowledge twice.

I've also adjusted the tests so they detect if we accidentally use
data from a different financial year. (2) is why none of the test
assertions actually need changing and users won't be affected.

Sqlalchemy / Postgres quirk
===========================

The following queries were run on the same data but results differ:

    FactBilling.query.filter(FactBilling.bst_date >= datetime(2021,3,31,23,0), FactBilling.bst_date <= '2021-04-05').order_by(FactBilling.bst_date).first().bst_date
    datetime.date(2021, 4, 1)

    FactBilling.query.filter(FactBilling.bst_date >= '2021-03-31 23:00:00', FactBilling.bst_date <= '2021-04-05').order_by(FactBilling.bst_date).first().bst_date
    datetime.date(2021, 3, 31)

Looking at the actual query for the first item above still suggests
the results should be the same, but for the use of "timestamp".

    SELECT ...
    FROM ft_billing
    WHERE ft_billing.service_id = '16b60315-9dab-45d3-a609-e871fbbf5345'::uuid AND ft_billing.bst_date >= '2016-03-31T23:00:00'::timestamp AND ft_billing.bst_date <= '2017-03-31T22:59:59.999999'::timestamp AND ft_billing.notification_type IN ('email', 'letter') GROUP BY ft_billing.rate, ft_billing.notification_type UNION ALL SELECT sum(ft_billing.notifications_sent) AS notifications_sent, sum(ft_billing.billable_units * ft_billing.rate_multiplier) AS billable_units, ft_billing.rate AS ft_billing_rate, ft_billing.notification_type AS ft_billing_notification_type
    FROM ft_billing
    WHERE ft_billing.service_id = '16b60315-9dab-45d3-a609-e871fbbf5345'::uuid AND ft_billing.bst_date >= '2016-03-31T23:00:00'::timestamp AND ft_billing.bst_date <= '2017-03-31T22:59:59.999999'::timestamp AND ft_billing.notification_type = 'sms' GROUP BY ft_billing.rate, ft_billing.notification_type) AS anon_1 ORDER BY anon_1.notification_type, anon_1.rate

If we try some manual queries with and without '::timestamp' we get:

    select distinct(bst_date) from ft_billing where bst_date >= '2022-04-20T23:00:00' order by bst_date desc;
      bst_date
    ------------
     2022-04-21
     2022-04-20

    select distinct(bst_date) from ft_billing where bst_date >= '2022-04-20T23:00:00'::timestamp order by bst_date desc;
      bst_date
    ------------
     2022-04-21
     2022-04-20

It looks like this is happening because all client connections are
aware of the local timezone, and naive datetimes are interpreted as
being in UTC - not necessarily true, but saves us here!

The monthly API datetimes were pre-converted to dates, so none of
this was relevant for deciding exactly which date to use.
2022-04-26 13:11:34 +01:00
Leo Hemsted
259d4a0569 add new daily sms provider volume report
code generally lifted almost exactly from the daily_volumes_report, but
per provider and only for SMS.
2022-04-11 13:42:40 +01:00
Ben Thorner
13b01579c8 Limit provider history to 100 results
This stops the pages being slow to load and defers the need to add
any pagination. Only the most recent data is likely to be relevant.
2022-04-06 11:53:01 +01:00
Ben Thorner
b145a29935 Merge pull request #3488 from alphagov/fix-provider-adjustment-bug-181574489
Fix SMS priority adjustment if only 1 provider
2022-03-22 16:21:53 +00:00
Ben Thorner
7bffe9ee50 Log Service ID when we get a duplicate receipt
This will make it easier to group these logs if a service complains
about the issue.
2022-03-21 15:43:08 +00:00
Ben Thorner
b4d4133b1f Fix SMS priority adjustment if only 1 provider
Fixes:

    >       reduced_provider = providers[identifier]
    E       KeyError: 'firetext'

Note that the mock return value in the other test was wrong [^1].

[^1]: bff97f0bbe/app/dao/provider_details_dao.py (L73)
2022-03-17 17:09:13 +00:00
Rebecca Law
8402e7c97b Free allowance rates for 2022
Update the map for determine the free allowance (aka: free sms fragments) for services.
2022-03-15 12:35:39 +00:00
Rebecca Law
29ebf0eb9a Move the casting the column as an int to the the endpoint, we need to
convert the Decimal data type to datatype json can work with.
2022-03-09 12:29:58 +00:00
Rebecca Law
466b7fa341 Report for total notifications sent per day for each channel.
Daily volumes report: total volumes across the platform aggregated by whole business day (bst_date)
Volumes by service report: total volumes per service aggregated by the date range given.

NB: start and end dates are inclusive
2022-03-07 10:44:49 +00:00
Katie Smith
807db037eb Update flake8 from 3.8.4 to 4.0.1
And adds `noqa` to some non-errors which are being flagged.
2022-03-02 15:52:18 +00:00
Ben Thorner
d406829b6c Tweak SMS alert to make it worth the effort
Currently we alert if a service wastes £16 of SMS. It may cost us
around that amount just to deal with the alert, especially if the
service refuses to clean up their data.

This bumps the threshold to something more alarming, which should
make it more reasonable to suspend the service if we can show that
they've already wasted public money. £160 seems like a reasonable
compromise between have wasted vs could waste.

Note: we previously compromised on 1000 [1] down from 63K [2]. I
think we can afford to go a little bit higher.

[1]: https://github.com/alphagov/notifications-api/pull/3234
[2]: https://github.com/alphagov/notifications-api/pull/3221
2022-02-25 10:37:56 +00:00
Chris Hill-Scott
4b4122a773 Merge pull request #3461 from alphagov/be-more-robust-around-references-to-cancel
Be more robust in handling ambiguous references to cancel an alert
2022-02-21 10:39:48 +00:00
Chris Hill-Scott
f691bc2a92 Only lookup broadcasts which can be cancelled
It is possible that, among the references Environment Agency give us for
which broadcast to cancel, there could be references for older, already
expired broadcasts.

This would be the case if someone cancelled a broadcast in Notify, then
issued and try to re-cancel another broadcast to the same area. The
Flood Warning Service has no way of knowing that the first broadcast has
been cancelled in Notify already, so it would add the reference to the
list of things to be cancelled.

We can avoid this from happening by filtering-out already-cancelled and
expired broadcasts before looking up which one should be cancelled.
2022-02-17 15:23:13 +00:00
Ben Thorner
574b1d3c63 Fix not deleting dead rows in status table
To address: https://github.com/alphagov/notifications-api/pull/3454#pullrequestreview-880302729

Since we now delete all rows before inserting fresh ones, we no
longer need to worry about conflicts. I've also extended the old
test to check all three kinds of overwrite: new, changed, gone.
2022-02-16 13:40:08 +00:00
Ben Thorner
a69d1635a1 Update FactStatus table in bulk for each service
Previously we were looping over data from the Notifications/History
table and then shovelling it into the status table, one row at a time
- plus an extra delete to clean up any existing data.

This replaces that with a batch insertion, similar to how we archive
notifications [1], but using a simple subquery (via "from_select" [2])
instead of a temporary table.

To make the select compatible with the insert, I've used "literal"
to inject the constant pieces of data, so each row has everything it
needs to go into the status table.

[1]: 9ce6d2fe92/app/dao/notifications_dao.py (L295)
[2]: https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.from_select
2022-02-16 13:40:05 +00:00
Ben Thorner
efde271c5a Rewrite FactStatus updates to be an upsert
This is consistent with the way we do billing updates [1] and is a
bit less clunky. Functionally it should be the same - note that the
tests already cover the "overwriting" behaviour if a row exists.

[1]: 9ce6d2fe92/app/dao/fact_billing_dao.py (L522)
2022-02-16 13:39:08 +00:00
Ben Thorner
966c4db8c6 Fix getting service IDs for status aggregation
Addresses [1].

Previously the query would always use UTC midnight, even after we
had switched to BST (+1h). We store timestamps as naive UTC in our
DB - without a timezone - but we want the query to work in terms
of GMT / BST so we adjust for that - BST midnight is 11PM in UTC.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r791998690
2022-02-10 10:51:45 +00:00
Ben Thorner
6e8f121548 Standardise how we query midnight-to-midnight
Partially addresses [1] (lots more detail to read in the comment).
I've also added some tests for the status DAO function to confirm
it behaves as expected across timezones.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r802634913
2022-02-10 10:51:27 +00:00
Ben Thorner
7f4b140f97 Rename function to make it consistent
This is consistent with the new "on_date" function. It was going
off the edge of my screen before in some parts of the code.
2022-02-09 17:39:08 +00:00
Ben Thorner
1213463b8e Only aggregate status when necessary for a service
This takes a similar approach to the nightly deletion task so that
we only create sub-tasks when there are actually notifications to
aggregate for a given type and day [1].

We're making this change to stop the duplication errors we're getting
at the moment and ensure the task can scale to more messages and more
services. There are two parts to this:

- Each subtask should now run within the 5 minute visibility timeout.
However, they may still be duplicated if the parent task overruns [2].

- The parent task creates a mininal number of subtasks, and the query
to determine this is very fast for a normal process day (milliseconds).

Since all tasks will run quickly, there should be no more duplication.

In order to test this more nuanced task, I rewrote the tests:

- One test checks the subtask is called correctly.
- One test checks we create all the right subtasks.

[1]: https://github.com/alphagov/notifications-api/pull/3381
[2]: https://docs.google.com/document/d/1MaP6Nyy3nJKkuh_4lP1wuDm19X8LZITOLRd9n3Ax-xg/edit#heading=h.q3intzwqhfzl
2022-02-09 17:39:07 +00:00
Ben Thorner
018a253b6f Revert "Revert running status aggregation in parallel"
This reverts commit 0f6dea0deb.
2022-02-09 17:39:00 +00:00
David McDonald
1d8fafcdf4 Remove unused functions
Can't see these being used anywhere so lets get rid of them
2022-02-07 15:58:04 +00:00
Rebecca Law
09c8fbe982 Merge pull request #3418 from alphagov/letters-too-long
Mark letters as validation-failed if the templated letter is too long.
2022-02-02 08:30:50 +00:00
Pea Tyczynska
4a90cde701 Merge pull request #3429 from alphagov/cancel_alert_via_api
Cancel broadcast via API
2022-01-21 14:04:35 +00:00
Ben Thorner
0f6dea0deb Revert running status aggregation in parallel
The top-level task didn't run successfully after this was deployed
due to the worker being killed due to heavy disk usage. While the
more parallel version does log much more, it doesn't totally explain
the disk behaviour. Nonetheless, reverting it is sensible to give us
the time we need to investigate more.
2022-01-20 12:22:33 +00:00
Pea Tyczynska
a4c20e8ba6 Return 404 if reference from cancel message does not match
If the reference from cancel CAP XML we received via API does not
match with any existing broadcast, return 404.

Do the same if service id doesn't match.

Also refactor code to cancel broadcast out into separate function

It should be a separate function that is only called by create_broadcast
function. This will prevent create_broadcast from becoming too
big and complex and doing too many things.
2022-01-19 15:42:27 +00:00
Pea Tyczynska
940126abfb Reject unapproved broadcast upon cancel API request
When a service sends us a cancel broadcast XML via API, if that
broadcast was not approved yet, reject it.
2022-01-19 15:41:38 +00:00
Ben Thorner
6be489daa7 Merge pull request #3425 from alphagov/parallelise-ft-status-180693991
Parallelise status aggregation by service and day
2022-01-19 13:49:28 +00:00
Rebecca Law
6cd7a23d3c If there is an invalid letter that has not been updated to validation-failed because the update-validation-failed-for-templated-letter has not been picked up off the letter-tasks queue and the collate-letter-pdfs-to-be-sent has started.
1. The number of letters that we send to DVLA will be not be correct (see 20ead82463/app/celery/letters_pdf_tasks.py (L136))
This may raise an alert with DVLA when they find we have sent them fewer letter than we have reported.
2. When we get the PDF from S3 we will get a file not found 20ead82463/app/celery/letters_pdf_tasks.py (L244)
The error will not prevent the collate task from completing but we will see an alert email for the exception and raise questions.

Although this situation is very unlikely because we have a 15 minute window between the last letter deadline date and the time we kick off the collate task we should still mitigate these issues. I updated the queries to only return letters with billable_units > 0, all valid letters should have at least 1 billable unit.
2022-01-19 08:31:19 +00:00
Ben Thorner
9686595fa8 Minor tweaks to address comments on the PR
To address:

- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786867994
- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786853329
- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786848793
- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786214794
2022-01-18 16:56:53 +00:00
Ben Thorner
086f0f50a6 Remove unnecessary extra method in status DAO
This makes it easier to see what is being queried.
2022-01-12 15:48:00 +00:00
Ben Thorner
ddbf556486 Rewrite task to aggregate status by service
This is a step towards parallelising the task by service and day.
2022-01-12 15:47:53 +00:00
Katie Smith
ed725c1513 Add endpoint to allow org team members to be removed
This is similar to the corresponding endpoint for services. However,
it is a little simpler since we don't need to worry about always having
at least one team member for an organisation.

The new dao function added, `dao_remove_user_from_organisation`, is also
simpler than `dao_remove_user_from_service` since we don't have any
organisation permissions to deal with.
2022-01-11 15:20:48 +00:00
Ben Thorner
081e0cab88 Merge pull request #3417 from alphagov/optimise-status-query-180693991
Optimise query to populate notification statuses
2022-01-11 14:18:36 +00:00
Ben Thorner
63b5204fb0 Optimise query to populate notification statuses
Investigation with EXPLAIN and EXPLAIN ANALYZE for the notification
history table shows this is another instance of [1] but for the key
type column. Swapping "!=" for "IN" solves the problem.

[1]: https://github.com/alphagov/notifications-api/pull/3360
2022-01-11 13:22:04 +00:00
Rebecca Law
2257cae398 Fix bug in organisation report for its services and usages.
If a service has not sent any SMS for the financial year the free allowance was showing up as 0 rather than the number in annual billing. The query has been updated to use an outer join so that the free allow will be returned when there is no ft_billing.

There is a potential performance enhancement to only return the data for the services of the organisation in the `fetch_sms_free_allowance_remainder_until_date` subquery. I will investigate in a subsequent PR.
2022-01-11 10:04:36 +00:00
Pea Tyczynska
32cd7a0eb6 Merge pull request #3395 from alphagov/fix_org_usage_report
Fix calculating remaining free allowance for SMS
2021-12-21 15:02:54 +00:00
Ben Thorner
f65fb519c7 Merge pull request #3404 from alphagov/remove-redundant-conditional-180477467
Remove redundant conditional for letter branding
2021-12-21 13:35:59 +00:00
Ben Thorner
de9ae08ecc Downgrade log for letter deletion exceptions
If the S3 object is missing [1], then that's what we want, so we
don't need such a severe log for it, but we still want to know as
it's not expected. This is separate to more general "ClientError"
exceptions, which could mean anything.

There weren't any tests to cover missing S3 objects, so I've added
one. I don't think we need a test for ClientErrors:

- If there was no handler, the task would fail and we'd learn about
it that way.

- The scope of the calling task is now much smaller, so it matters
less than it used to [2].

[1]: 81a79e56ce/app/letters/utils.py (L52)
[2]: f965322f25
2021-12-20 12:45:48 +00:00
Ben Thorner
76da31c32a Remove redundant conditional for letter branding
This is no longer used when creating a service [1]. It was likely
added at a migration point when Admin _did_ specify branding.

[1]: 50c3c3e10c/app/main/views/add_service.py (L15-L22)
2021-12-16 17:54:33 +00:00
Pea Tyczynska
6c04deaec2 Get rid of unnecessary coalesce 2021-12-14 17:36:03 +00:00
Leo Hemsted
49cc1b643f split delete task up into per service
we really don't gain anything by running each service delete in sequence
- we get the services, and then just loop through them deleting per
service. By deleting per service in separate tasks, we can take
advantage of parallelism. the only thing we lose is some log lines but I
don't think we're that interested in them.

only set query limit at the move_notifications dao function - the task
doesn't really care about the technical implementation of how it deletes
the notifications
2021-12-14 15:24:34 +00:00
Ben Thorner
11278c47f5 Replace log with StatsD gauge for slow delivery
A gauge is more useful as we can visualise it and combine it with
other stats - we already have other stats for the total number of
notifications sent by provider, and we can extrapolate the number
of slow notifications using this, if needed.

We also still have logs to say the task is running, as well as a
log in the calling code when we actually make a switch [1], so
we're not losing anything by removing the log here.

[1]: a9306c4557/app/celery/scheduled_tasks.py (L117)
2021-12-14 13:03:43 +00:00
Ben Thorner
2adaaac3ae Remove redundant conditions for update query
Filtering by ID is enough, noting the other conditions were the
same between both queries.
2021-12-13 17:03:07 +00:00
Ben Thorner
c8ebb365d4 Make limit of DAO timeout function more obvious
We're going to iterate how we use the function with a limit, so we
shouldn't say it's "temporary" anymore. We don't need to change the
default, but having it in the function parameters makes it easier to
see the funtion doesn't time out all notifications, just some.
2021-12-13 17:01:41 +00:00
Ben Thorner
76aeab24ce Rewrite DAO timeout method to take cutoff_time
Previously we specified the period and calculated the cutoff time
in the function. Passing it in means we can run the method multiple
times and avoid getting "new" notifications to time out in the time
it takes to process each batch.
2021-12-13 16:56:21 +00:00
Ben Thorner
3bcaf8330e Simplify comment for DAO timeout function 2021-12-13 16:39:55 +00:00
Ben Thorner
2fb432adaf Merge pull request #3383 from alphagov/email-sms-created-alert-180344153
Add new log / alert for 'created' email / SMS
2021-12-13 12:56:05 +00:00
David McDonald
7d8eed8228 Optimise queries run for creating pagination links
We have been running in to the problem in
pallets/flask-sqlalchemy#518 where
our page loads very slow when viewing a single page of notifications
for a service in the admin app. Tracing this back and using SQL
explain analyze I can see that getting the notifications takes about
a second but the second query to count how many notifications there
are (to work out if there is a next page of pagination) can take up
to 100 seconds.

As suggested in that issue, we do the pagination ourselves.
Our pagination doesn't need us to know exactly how many notifications
there are, just whether there are any on the next page and that can
be done without running the slow query to count how many
notifications in total by using `count_pages=False`.

This commit is analagous to
c68d1a2f23

The only difference is that in that case, the pagination links are
used to show prev and/or next links in the admin app. In this case,
the pagination links are only used to see if there is a page 2, and
if there is, say that we are only showing the first 50 results.
2021-12-10 17:47:27 +00:00
Pea Tyczynska
a74d1b026f Fix calculating remaining free allowance for SMS
The way it was done before, the remainder was incorrect in the
billing report and in the org usage query - it was the sms remainder
left at the start of the report period, not at the end of that period.

This became apparent when we tried to show sms_remainder on the org
usage report, where start date is always the start of the financial year.
We saw that sms sent by services did not reduce their free allowance
remainder according to the report. As a result of this, we had to
temporarily remove of sms_remainder column from the report, until
we fix the bug - it has been fixed now, yay!

I think the bug has snuck in partially because our fixtures for testing
this part of the code are quite complex, so it was
harder to see that numbers don't add up. I have added comments
to the tests to try and make it a bit clearer why the results are
as they are.

I also added comments to the code, and renamed some variables,
to make it easier to understand, as there are quite a few
moving parts in it - subqueries and the like.
I also renamed the fetch_sms_free_allowance_remainder method to
fetch_sms_free_allowance_remainder_until_date so it is clearer
what it does.
2021-12-09 18:58:10 +00:00