Commit Graph

4961 Commits

Author SHA1 Message Date
Rebecca Law
4b6de79dae Merge pull request #3484 from alphagov/update-free-allowance-2022
Free allowance rates for 2022
2022-03-15 14:00:09 +00:00
Rebecca Law
8402e7c97b Free allowance rates for 2022
Update the map for determine the free allowance (aka: free sms fragments) for services.
2022-03-15 12:35:39 +00:00
Ben Thorner
6cd3650b3f Merge pull request #3482 from alphagov/fix-docker-host-181498273
Fix letter functional tests to work in Docker
2022-03-15 12:32:54 +00:00
Leo Hemsted
2fbe9e85ac Merge pull request #3479 from alphagov/auto-retry-stuck-av-letters
automatically retry letters stuck in pending-virus-scan
2022-03-15 11:43:42 +00:00
Leo Hemsted
9e8df8b623 remove "letters stuck pending av" runbook
there's not anything we know we need to do now that we resolve stuck
letters automatically. Letters couuld still get into this state, so it's
worth alerting us. However, we don't have anything concrete that we know
how to fix these letters, so we should just remove the runbook entirely.
2022-03-10 14:10:01 +00:00
Rebecca Law
cd78361be1 Merge pull request #3477 from alphagov/volume-stat-report
Report for total notifications sent per day for each channel.
2022-03-09 13:42:00 +00:00
Rebecca Law
29ebf0eb9a Move the casting the column as an int to the the endpoint, we need to
convert the Decimal data type to datatype json can work with.
2022-03-09 12:29:58 +00:00
Ben Thorner
3fab7a0ca9 Fix letter functional tests to work in Docker
Currently "test_send_letter_notification_via_api" fails at the final
stage in create-fake-letter-response-file [^1]:

        requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6011): Max retries exceeded with url: /notifications/letter/dvla (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff95ffc460>: Failed to establish a new connection: [Errno 111] Connection refused'))

This only applies when running in Docker so the default should still
be "localhost" for the Flask app itself.

[^1]: 5093064533/app/celery/research_mode_tasks.py (L57)
2022-03-09 11:07:50 +00:00
David McDonald
0d952b4d8c Reduce timeout for service callback attempt to 5 seconds
It is currently 60 seconds but we have had two incidents in the
past week where there is a connection error talking to a service
and the request takes up to 60 seconds before failing. When this
happens, if there are a few of these callbacks then all of them
will completely hog the service callback worker and build up a big
queue of all the other service callbacks.

5 seconds has been chosen as that is still a pretty decent length
time for a simple web request that should just be giving them a
little bit of information for them to store. 5 seconds should be a
sufficient enough reduction that we dramatically reduce this problem
for the moment.

Open to this number
being changed in the future based on how we see it perform.
2022-03-08 13:05:32 +00:00
Leo Hemsted
00259893f1 automatically retry letters stuck in pending-virus-scan
Since sept 2019 we've had to log on to production around once every
twenty days to restart the virus scan task for a letter. Most of the
time this is just a case of making sure the file is in the scan bucket,
and then triggering the task. If the file isn't in the scan bucket we'd
need to do some more manual investigation to find out exactly where the
file got stuck, but I can only remember times when it's been in the scan
bucket.

So if the file is in the scan bucket, we can just check that with code
and kick the task off automatically.
2022-03-07 18:31:46 +00:00
Chris Hill-Scott
c2b6a9df80 Allow admin app to specify domain for registration email
This follows the pattern for invite emails where the admin app tells the
API which domain to use when generating the link.

This will starting working once the admin change is merged:
- [ ] TBC

It won’t break anything if it’s merged before the admin change.
2022-03-07 15:03:46 +00:00
Rebecca Law
466b7fa341 Report for total notifications sent per day for each channel.
Daily volumes report: total volumes across the platform aggregated by whole business day (bst_date)
Volumes by service report: total volumes per service aggregated by the date range given.

NB: start and end dates are inclusive
2022-03-07 10:44:49 +00:00
Katie Smith
514bd48614 Update flake8-bugbear from 20.11.1 to 22.1.11
And ignore a warning, since I did not think that in this case "Using
.strip() with multi-character strings is misleading the reader".
2022-03-02 16:51:09 +00:00
Katie Smith
807db037eb Update flake8 from 3.8.4 to 4.0.1
And adds `noqa` to some non-errors which are being flagged.
2022-03-02 15:52:18 +00:00
Leo Hemsted
e497cbbec6 Merge pull request #3474 from alphagov/fix-returned-letters
split returned letters tasks into a max count of returned letters
2022-03-02 11:49:12 +00:00
Leo Hemsted
b1636b7a1a split returned letters tasks into a max count of returned letters
if we have too many returned letters, we'll exceed SQS's max task size
of 256kb. Cap it to 5000 - this is probably a bit conservative but
follows the initial values we used when implementing this for the
collate-letters-task[^1]. Also follow the pattern of compressing the
sqs payload just to reduce it a little more.

[^1]: https://github.com/alphagov/notifications-api/pull/1536
2022-03-02 10:51:08 +00:00
Katie Smith
ead909d2ce Merge pull request #3472 from alphagov/inbound-no-command
Stop blank strings being inserted as inbound numbers
2022-03-02 08:30:41 +00:00
Ben Thorner
d61133c508 Merge pull request #3468 from alphagov/bump-sms-alert-threshold
Tweak SMS alert to make it worth the effort
2022-03-01 16:10:33 +00:00
Katie Smith
67d1b3719e Stop blank strings being inserted as inbound numbers
We had an inbound number in the database with a value of ''. This
could happen if there are blank lines in the inbound numbers file
we use for the `insert-inbound-numbers` command. To avoid this
happening again, the command now calls `.strip()` on each line of the
file and only inserts a row if the result is truthy (i.e. not '').
2022-03-01 15:31:54 +00:00
Katie Smith
d572c7228d Allow the free SMS fragment limit to be 0
This updates the schema so that the free allowance has a minimum value
of 0 instead of 1.
2022-02-28 12:45:25 +00:00
Katie Smith
d53ef27b7f Make letters still sending check later
This changes the scheduled task to raise an alert if letters are still
sending from 1530 to 1700. DVLA have reported that our "monitoring is
executing just before we actually mark them as ‘despatched’ and send
you the feedback files." and asked us to make the check a little later.

We don't actually contact DVLA until the morning after the alert anyway,
so this won't affect the process of getting in touch with them.

This change will require Cronitor to be updated for the new time.
2022-02-28 11:50:55 +00:00
Ben Thorner
d406829b6c Tweak SMS alert to make it worth the effort
Currently we alert if a service wastes £16 of SMS. It may cost us
around that amount just to deal with the alert, especially if the
service refuses to clean up their data.

This bumps the threshold to something more alarming, which should
make it more reasonable to suspend the service if we can show that
they've already wasted public money. £160 seems like a reasonable
compromise between have wasted vs could waste.

Note: we previously compromised on 1000 [1] down from 63K [2]. I
think we can afford to go a little bit higher.

[1]: https://github.com/alphagov/notifications-api/pull/3234
[2]: https://github.com/alphagov/notifications-api/pull/3221
2022-02-25 10:37:56 +00:00
Ben Thorner
c9a9640a4b Iterate local development with Docker
This makes a few changes to:

- Make local development consistent with our other apps. It's now
faster to start Celery locally since we don't try to build the
image each time - this is usually quick, but unnecessary.

- Add support for connecting to a local Redis instance. Note that
the previous suggestion of "REDIS = True" was incorrect as this
would be turned into the literal string "True".

I've also co-located and extended the recipes in the Makefile to
make them a bit more visible.
2022-02-24 17:15:41 +00:00
Chris Hill-Scott
9c2f0ce9db Clear cache when cancelling broadcast via the API
Before we implemented ‘cancel’ any updates to a broadcast went through
the admin app. This meant the admin app could deal with clearing the
cache any time a broadcast was updated by a user performing an action.

Now that a broadcast can be updated without the admin app being involved
we have another place we need to clear the cache from.

If we don’t do this then the broadcast can look like it’s still going
even though it’s successfully been cancelled.
2022-02-22 16:26:05 +00:00
Chris Hill-Scott
4b4122a773 Merge pull request #3461 from alphagov/be-more-robust-around-references-to-cancel
Be more robust in handling ambiguous references to cancel an alert
2022-02-21 10:39:48 +00:00
Chris Hill-Scott
cc207ac11f Raise error if multiple broadcasts found for reference
Because the `<reference>` field of a `cancel` message can contain an
arbitrary number of items it’s possible for it to reference more than
one current alert.

In this case it is ambiguous which alert should be cancelled, so we
should raise a custom error.

This will help people know that they have to manually go into Notify and
figure out which alert(s) to cancel there.
2022-02-17 15:23:13 +00:00
Chris Hill-Scott
f691bc2a92 Only lookup broadcasts which can be cancelled
It is possible that, among the references Environment Agency give us for
which broadcast to cancel, there could be references for older, already
expired broadcasts.

This would be the case if someone cancelled a broadcast in Notify, then
issued and try to re-cancel another broadcast to the same area. The
Flood Warning Service has no way of knowing that the first broadcast has
been cancelled in Notify already, so it would add the reference to the
list of things to be cancelled.

We can avoid this from happening by filtering-out already-cancelled and
expired broadcasts before looking up which one should be cancelled.
2022-02-17 15:23:13 +00:00
Ben Thorner
8fac5c72db Merge pull request #3454 from alphagov/upsert-status-180693991
Rewrite status aggregation to be a bulk upsert
2022-02-17 13:21:50 +00:00
Chris Hill-Scott
d73131bbec Allow cancel of alert via API with no description
The XML for an alerts requires a `<description>` field. The XML for
a `<cancel>` may have a `<description>` field populated (although we
ignore the contents) but it may also be empty.

This commit updates the schema to leave the all the validation to the
view layer, which can decide when or when not to validate the content of
the `<description>` field.
2022-02-16 15:31:50 +00:00
Ben Thorner
574b1d3c63 Fix not deleting dead rows in status table
To address: https://github.com/alphagov/notifications-api/pull/3454#pullrequestreview-880302729

Since we now delete all rows before inserting fresh ones, we no
longer need to worry about conflicts. I've also extended the old
test to check all three kinds of overwrite: new, changed, gone.
2022-02-16 13:40:08 +00:00
Ben Thorner
a69d1635a1 Update FactStatus table in bulk for each service
Previously we were looping over data from the Notifications/History
table and then shovelling it into the status table, one row at a time
- plus an extra delete to clean up any existing data.

This replaces that with a batch insertion, similar to how we archive
notifications [1], but using a simple subquery (via "from_select" [2])
instead of a temporary table.

To make the select compatible with the insert, I've used "literal"
to inject the constant pieces of data, so each row has everything it
needs to go into the status table.

[1]: 9ce6d2fe92/app/dao/notifications_dao.py (L295)
[2]: https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.from_select
2022-02-16 13:40:05 +00:00
Ben Thorner
efde271c5a Rewrite FactStatus updates to be an upsert
This is consistent with the way we do billing updates [1] and is a
bit less clunky. Functionally it should be the same - note that the
tests already cover the "overwriting" behaviour if a row exists.

[1]: 9ce6d2fe92/app/dao/fact_billing_dao.py (L522)
2022-02-16 13:39:08 +00:00
Ben Thorner
7bc037f1ce Merge pull request #3459 from alphagov/fix-status-task-logs-180693991
Fix task name and action in status task logs
2022-02-16 13:38:38 +00:00
Ben Thorner
5206844a95 Merge pull request #3438 from alphagov/lower-query-timeout-180693991
Revert increased timeout for reporting worker
2022-02-16 13:38:29 +00:00
Ben Thorner
ef231d5de7 Fix task name and action in status task logs 2022-02-16 11:45:45 +00:00
Chris Hill-Scott
bbc444699a Return 400 if references missing from cancel broadcast
If someone tries to cancel a broadcast but the references don’t match
and existing broadcast we correctly return a 404.

If they don’t provide any references then we get an exception. This
commit catches the missing references and returns a 400. I think this
is more appropriate because it’s malformed request, rather than a
well-formed request that doesn’t match our data. It also lets us write a
more specific and helpful error message.
2022-02-14 12:34:09 +00:00
Ben Thorner
966c4db8c6 Fix getting service IDs for status aggregation
Addresses [1].

Previously the query would always use UTC midnight, even after we
had switched to BST (+1h). We store timestamps as naive UTC in our
DB - without a timezone - but we want the query to work in terms
of GMT / BST so we adjust for that - BST midnight is 11PM in UTC.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r791998690
2022-02-10 10:51:45 +00:00
Ben Thorner
6e8f121548 Standardise how we query midnight-to-midnight
Partially addresses [1] (lots more detail to read in the comment).
I've also added some tests for the status DAO function to confirm
it behaves as expected across timezones.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r802634913
2022-02-10 10:51:27 +00:00
Ben Thorner
7f4b140f97 Rename function to make it consistent
This is consistent with the new "on_date" function. It was going
off the edge of my screen before in some parts of the code.
2022-02-09 17:39:08 +00:00
Ben Thorner
1213463b8e Only aggregate status when necessary for a service
This takes a similar approach to the nightly deletion task so that
we only create sub-tasks when there are actually notifications to
aggregate for a given type and day [1].

We're making this change to stop the duplication errors we're getting
at the moment and ensure the task can scale to more messages and more
services. There are two parts to this:

- Each subtask should now run within the 5 minute visibility timeout.
However, they may still be duplicated if the parent task overruns [2].

- The parent task creates a mininal number of subtasks, and the query
to determine this is very fast for a normal process day (milliseconds).

Since all tasks will run quickly, there should be no more duplication.

In order to test this more nuanced task, I rewrote the tests:

- One test checks the subtask is called correctly.
- One test checks we create all the right subtasks.

[1]: https://github.com/alphagov/notifications-api/pull/3381
[2]: https://docs.google.com/document/d/1MaP6Nyy3nJKkuh_4lP1wuDm19X8LZITOLRd9n3Ax-xg/edit#heading=h.q3intzwqhfzl
2022-02-09 17:39:07 +00:00
Ben Thorner
c8db58d0e8 Reorder loops for creation status agg sub tasks
This will help tailor the innermost loop on services.
2022-02-09 17:39:06 +00:00
Ben Thorner
d6678b6a70 Remove unnecessary logs from status aggreagtion
These can be inferred elsewhere:

- Task creation is obvious from task execution. If we're concerned
about a specific service, we can check the updated times on the DB
records, since all records are recreated each time this runs.

- Task starting is already logged.

- Task completion is already logged. The number of rows updated can
also be inferred from the DB.

The log I've found useful is the one about fetching the data, and
I've also added another to time how long it takes to insert the data,
as both could be sources of poor performance.

Arguably we should use metrics for this sort of thing, but logs are
easier in practice for the metric systems we have.
2022-02-09 17:39:05 +00:00
Ben Thorner
018a253b6f Revert "Revert running status aggregation in parallel"
This reverts commit 0f6dea0deb.
2022-02-09 17:39:00 +00:00
Pea Tyczynska
d05bff9efc Merge pull request #3440 from alphagov/audit-api-key-id-when-cancelling-broadcast-via-api
Audit api key id when cancelling broadcast via api
2022-02-09 10:15:03 +00:00
Chris Hill-Scott
0614a70764 Merge pull request #3445 from alphagov/bump-utils-53.0.0
Bump utils to 53.0.0
2022-02-08 09:56:18 +00:00
Chris Hill-Scott
7f72d3a60f Bump utils to 53.0.0
Changes:

53.0.0
---

* `notifications_utils.columns.Columns` has moved to
  `notifications_utils.insensitive_dict.InsensitiveDict`
* `notifications_utils.columns.Rows` has moved to
  `notifications_utils.recipients.Rows`
* `notifications_utils.columns.Cell` has moved to
  `notifications_utils.recipients.Cell`

52.0.0
---

* Deprecate the following unused `redis_client` functions:
  - `redis_client.increment_hash_value`
  - `redis_client.decrement_hash_value`
  - `redis_client.get_all_from_hash`
  - `redis_client.set_hash_and_expire`
  - `redis_client.expire`

51.3.1
---

* Bump govuk-bank-holidays to cache holidays for next year.
2022-02-08 09:45:10 +00:00
David McDonald
1d8fafcdf4 Remove unused functions
Can't see these being used anywhere so lets get rid of them
2022-02-07 15:58:04 +00:00
Leo Hemsted
4018fcba28 Merge pull request #3442 from alphagov/celery-docker
add script to run celery from within docker
2022-02-03 12:51:01 +00:00
Pea Tyczynska
0737eceddb Include check constraint in migration script
Add check constraint that created_by_id should not be null, unless
created_by_api_key_id is not null to the migration script. It is
already in the models file.

Also remove check constraint for cancelled_by_id from models, as
this field would only be filled for broadcasts with cancelled
status.

Also add some spacing in that migration script so it is easier
to read.
2022-02-02 17:21:58 +00:00
Chris Hill-Scott
07f584e1d5 Allow admin app to specify domain for password reset
This follows the pattern for invite emails where the admin app tells the
API which domain to use when generating the link.

This will starting working once this admin change is merged:
- [ ] https://github.com/alphagov/notifications-admin/pull/4150/files

It won’t break anything if it’s merged before the admin change.
2022-02-02 17:15:09 +00:00