Commit Graph

4070 Commits

Author SHA1 Message Date
Ben Thorner
d66c68d6d6 Merge pull request #3364 from alphagov/celery-headers-request-id-180213914
Move Celery task Request ID injection into headers
2021-11-12 11:10:29 +00:00
Ben Thorner
4a577eca62 Merge pull request #3359 from alphagov/improve-clarify-botocore-exception-180017131
Improve and clarify large task error handling
2021-11-12 11:10:17 +00:00
Ben Thorner
1872854a4e Improve and clarify large task error handling
Previously we were catching one type of exception if something went
wrong adding a notification to the queue for high volume services.
In reality there are two types of exception so this adds a second
handler to cover both.

For context, this is code we changed experimentally as part of the
upgrade to Celery 5 [1]. At the time we didn't check how the new
exception compared to the old one. It turns out they behaved the
same and we were always vulnerable to the scenario now covered by
the second exception, where the behaviour has changed in Celery 5 -
testing with a large task invocation gives...

Before (Celery 3, large-ish task):

    'process_job.apply_async(["a" * 200000])'...

    boto.exception.SQSError: SQSError: 400 Bad Request
    <?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>InvalidParameterValue</Code><Message>One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes.</Message><Detail/></Error><RequestId>96162552-cd96-5a14-b3a5-7f503300a662</RequestId></ErrorResponse>

Before (Celery 3, very large task):

    <hangs forever>

After (Celery 5, large-ish task):

    botocore.exceptions.ClientError: An error occurred (InvalidParameterValue) when calling the SendMessage operation: One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes.

After (Celery 5, very large task):

    botocore.parsers.ResponseParserError: Unable to parse response (syntax error: line 1, column 0), invalid XML received. Further retries may succeed:
    b'HTTP content length exceeded 1662976 bytes.'

[1]: 29c92a9e54
2021-11-11 17:37:50 +00:00
Ben Thorner
89a8dd1a03 Move Celery task Request ID injection into headers
Previously we passed along this piece of state via the kwargs for
a task, but this runs the risk of the task accidentally receiving
the extra kwarg unless we've covered all the code paths that could
invoke it directly e.g. retries don't invoke __call__.

This switches to using Celery "headers" to pass the extra state. It
turns out that a Celery has two "header" concepts, which leads to
some confusion and even a bug with the framework [1]:

- In older (pre v4.4) versions of Celery, the "headers" specified
by apply_async() would become _the_ headers in the message that
gets passed around workers, etc. These would be available later on
via "self.request.headers".

- Since Celery protocol v2, the meaning of "headers" in the message
changed to become (basically) _all_ metadata about the task [2],
with the "headers" option in apply_async() being merged [3] into
the big dict of metadata.

This makes using headers a bit confusing unfortunately, since the
data structure we put in is subtly different to what comes out in
the request context. Nonetheless, it still works. I've added some
comments to try and clarify it.

Note that one of the original tests is no longer necessary, since we
don't need to worry about argument passing styles with headers.

[1]: https://github.com/celery/celery/issues/4875
[2]: 663e4d3a0b (diff-07a65448b2db3252a9711766beec23372715cd7597c3e309bf53859eabc0107fR343)
[3]: 681a922220/celery/app/amqp.py (L495)
2021-11-10 18:03:40 +00:00
Katie Smith
3d4796c924 Add task to resanitise and replace a PDF for precompiled letter
This adds a task which is designed to be used if we want to recreate the
PDF for a precompiled letter (either one that has been created using the
API or one that has been uploaded through the website).

The task takes the `notification_id` of the letter and passes template
preview the details it needs in order to sanitise the original file and
then replace the version in the letters-pdf bucket with the freshly
sanitised version.
2021-11-10 09:51:31 +00:00
Ben Thorner
29c92a9e54 Try removing boto package again 2021-11-01 09:54:10 +00:00
Ben Thorner
d1586a8f81 CC DVLA in tickets about outstanding letters
Previously we sent them emails about this manually. We also tried
a Zendesk macro/trigger approach, but using a CC means:

- We can control the behaviour ourselves (Zendesk triggers can only
be edited by admins outside our team).

- We keep the DVLA notification approach consistent and in one place,
so notifications always go to the same people.

- Any further (public) updates to the ticket will also trigger a
notification to DVLA (previous trigger only notified on creation).
2021-10-29 11:46:29 +01:00
David McDonald
5a51ab6131 Bug fix: update normalised_to, not just to after letter sanitise
When a precompiled letter is sent to us, we set the `to` field as
'Provided as PDF' in
1c1023a877/app/v2/notifications/post_notifications.py (L100-L104)

This then also sets `normalised_to` as `providedaspdf`.

However, when template preview sanitises the letter, pulls out the
address and gives it to the API, we were only setting `to` to be
the new address and had forgotten to also amend `normalised_to` to
be the normalised version. This meant that for all these letters
we accidentally left `normalised_to` as `providedaspdf`. The impact
of this was that we can not then search for these letters in the
admin user interface as they rely on the `normalised_to` field
containing the recipient address.

This commit fixes that bug by also setting the `normalised_to`
field
2021-10-27 11:56:25 +01:00
Chris Hill-Scott
54bcf618da Store the event field from CAP XML broadcasts
We don’t store everything that comes in the CAP XML when someone creates
a broadcast via the API.

One thing we do store is `<identifier>` (in a column called `reference`)
which is a unique (to the external system) identifier for the broadcast.
We show this in the front end instead of the template name, because
broadcasts created from the API don’t use templates.

However this ID isn’t very friendly – the Environment Agency just supply
a UUID.

The Environment Agency also populate the `<event>` field with some human
readable text, for example:
> 013 Issue Severe Flood Warning EA

(013 is an area code which will be meaningful to the Flood Warning
Service team)

We should show this in the UI instead of the reference. The first step
towards this is storing it in the database and returning it in the REST
endpoints.

Later we can have the admin app prefer `cap_event` over `reference`,
where `cap_event` is present.

We can’t backfill this data because we don’t keep a copy of the original
XML.

Seems like `<event>` is a mandatory property of `<info>`, so we don’t
need to worry about the field being missing (`<info>` is optional in
CAP but we require it because it contains stuff like the areas which
we need in order to send out the broadcast`).

***

https://www.pivotaltracker.com/story/show/176927060
2021-10-26 11:12:27 +01:00
Ben Thorner
d703251b13 Merge pull request #3348 from alphagov/better-callback-stats-180016688
Include status in stats about delivery times
2021-10-22 11:59:24 +01:00
Ben Thorner
f974108934 Include status in stats about delivery times
Previously these metrics weren't very useful because they could be
skewed by long timings for failed notifications, which can take up
to 72 hours to deliver. I'm intentionally not trying to have a dual
running period (with the old and new names) because:

- We don't use the current stats for anything (checking Grafana).

- The current stats get turned into a "bucket" metric in Prometheus
[1][2], which isn't very useful because it can only tell us the mean
time to deliver, but we're actually interested in percentiles.

Switching to a new naming is an opportunity to fix the raw data and
the way it's aggregated, using the same kind of "summary" metric that
we now use for stats about our Celery tasks [3].

[1]: c330a8ac8a/paas/statsd/statsd-mapping.yml (L82)
[2]: https://prometheus.io/docs/practices/histograms/#quantiles
[3]: https://github.com/alphagov/notifications-aws/pull/890
2021-10-20 17:22:59 +01:00
Leo Hemsted
0b8c6ef263 Merge pull request #3339 from alphagov/letter-runbook-link
tweak zendesk message for no ack files alert
2021-10-20 15:23:33 +01:00
Pea Tyczynska
1b6f9505da Call publish-govuk-alerts task when alert expires
The `auto-expire-broadcast-messages` task checks for expired broadcasts
at five minute intervals. This change now calls the
`publish-govuk-alerts` task in govuk-alerts if there are expired
broadcasts so that the site is updated.

Co-authored-by: Katie Smith <katie.smith@digital.cabinet-office.gov.uk>
2021-10-18 08:41:25 +01:00
Katie Smith
04bfd6bfdb Trigger task to publish alerts when sending or cancelling alert
When we send or cancel a broadcast message, we now trigger a task
in govuk-alerts repo that polls our API for alerts and
publishes a fresh list of alerts.

Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>
2021-10-18 08:41:24 +01:00
Leo Hemsted
b8c4e19072 tweak zendesk message for no ack files alert
include a link to a runbook entry.

also the list of acknowledgement files can be very long, so make that
the last thing, and use new lines to space out the message.
2021-10-08 13:45:02 +01:00
Chris Hill-Scott
544bfbf569 Add separate config item for failed login count
It’s confusing that changing `MAX_VERIFY_CODE_COUNT` also limits the
number of failed login attempts that a user of text messages 2FA can
make.

This makes the parameters independent, and adds a test to make sure any
future changes which affect the limit of failed login attempts are
covered.
2021-10-04 10:45:07 +01:00
Chris Hill-Scott
786893d920 Reduce max concurrent 2 factor codes
I was doing some analysis and saw that in the last 24 hours the most
codes that anyone had was in a 15 minute window was 3.

So I think we can safely reduce this to 5 to get a bit more security
with enough headroom to not have any negative impact to the user.
2021-10-04 10:45:06 +01:00
Chris Hill-Scott
19ad11e383 Don’t repeat digits in security codes
People with dyslexia and dyscalculia find it difficult to transpose
codes which have consecutive, repeated digits[1].

This commits enhances the algorithm for generating codes to not repeat
the previous digit in a code.

This reduces the key space for our codes from 100,000 possibilities to
65,610 possibilities.

1. https://twitter.com/annaecook/status/1442567679710150662
2021-09-30 10:24:17 +01:00
Katie Smith
58597653df Update how "sending to TV numbers" Zendesk tickets are created 2021-09-29 11:26:20 +01:00
Katie Smith
0c0c7f4478 Update how "letters still created status" Zendesk tickets are created 2021-09-29 11:23:28 +01:00
Katie Smith
2f66e38fb9 Update how "missing ackfile for letters" Zendesk tickets are created 2021-09-29 11:10:50 +01:00
Katie Smith
64c0a3fb9d Update how 'letters still sending' Zendesk tickets are created
These now use the new Zendesk form.
2021-09-29 11:07:37 +01:00
Katie Smith
b114dadcae Update how pending virus check Zendesk tickets are created
This updates the tickets that are created when the
`check_if_letters_still_pending_virus_check` scheduled task detects
letters in the `pending-virus-check` state.
2021-09-29 11:03:48 +01:00
Katie Smith
9ff0ca0363 Update how live broadcast Zendesk tickets are created
These now use the Notify Form in Zendesk
2021-09-29 10:59:07 +01:00
Ben Thorner
68eeb1defa Merge pull request #3325 from alphagov/prevent-empty-areas-178986763
Add validation to prevent blank area names
2021-09-17 15:20:15 +01:00
Ben Thorner
d8a0967ec0 Add validation to prevent blank area names
Now that these are used for display on gov.uk/alerts we need to
make sure the data is being set properly. We've already found an
example where it wasn't [1]. We validate external broadcasts in
two stages: with the official CAP XML schema [2] and then again
with our own, more specific schema for the converted JSON. Since
this validation is a custom requirement I've made it part of the
JSON schema. Note that jsonschema recommends avoiding metachars
like "\w" since they're not supported by all implementations [3].

I've tested the new validation manually and it works as expected
by disallowing e.g. "  " but still alowing "foo" and "foo bar".

[1]: https://www.notifications.service.gov.uk/services/120107d0-d99a-4c42-8b70-f37d2f28879b/rejected-alerts/d6e0c70e-60f6-4422-8589-2a2d159c63f2
[2]: 81a25ff1ef/app/xml_schemas/CAP-v1.2.xsd
[3]: http://json-schema.org/understanding-json-schema/reference/regular_expressions.html
2021-09-17 13:33:52 +01:00
Ben Thorner
6a53871455 Restructure govuk-alerts endpoint to be internal
In response to: https://github.com/alphagov/notifications-api/pull/3305#pullrequestreview-726672421

Previously this was added among the public /v2 endpoints, but it's
only meant for internal use. While only the govuk-alerts app would
be able to access it, the location and /v2 URL suggested otherwise.
This restructures the endpoint so it resembles other internal ones.
2021-09-15 15:36:17 +01:00
Ben Thorner
12640e5380 Fix misleading test name for error scenario 2021-09-15 11:02:51 +01:00
Leo Hemsted
33ca817e17 return contact list created_at in UTC, not BST
make sure timestamps returned from the api are always consistent.

The only place in models where we're serializing a BST timestamp is on
the Notification.serialize_for_csv method now, which at least is a bit
different as this is user-facing (it also returns a formatted
human-readable notification_status for example).
2021-09-14 12:41:52 +01:00
Ben Thorner
81a25ff1ef Merge pull request #3327 from alphagov/bump-utils-46-0-0
Bump utils to 46.0.0
2021-09-14 11:22:10 +01:00
Ben Thorner
c53ee6de94 Merge pull request #3323 from alphagov/add-permission-local-dev
Make it easy to develop with broadcast services
2021-09-14 11:22:00 +01:00
Ben Thorner
4d05b064fd Remove redundant DB fixtures in conftest.py
These were only there to ensure a DB session existed for the test
and are now included implicitly as one of the dependencies of the
"sample_user" fixture.
2021-09-14 09:37:32 +01:00
Ben Thorner
c9ea85938a Refactor "create_user" to actually create a user
This switches a number of fixtures to use "sample_user", which is
equivalent to calling the previous "create_user" function when it
used to default the email to "notify@...".
2021-09-14 09:29:28 +01:00
Ben Thorner
922fd2f333 Support testing commands and add first test
We have a lot of commands and it's important we test the ones that
are meant to be used in the future to ensure they work when they're
needed. Testing Flask commands is usually easy as written in their
docs [1], but I had to make some changes to the way we decorate the
command functions so they can work with test DB objects - I couldn't
find any example of someone else encountering the same problem.

[1]: https://flask.palletsprojects.com/en/2.0.x/testing/#testing-cli-commands
2021-09-14 09:29:23 +01:00
sakisv
9faa3d34e1 Fix tests
Specifically, no longer test for a p1 zendesk when sending an alert
and drop misleading "p1" from test name when cancelling an alert.

We're no longer creating a P1 from the code, but we _do_ create a
zendesk ticket when sending out an alert.

When cancelling, what we want to test is that we don't create a second
ticket when the alert is cancelled.
2021-09-10 10:14:28 +03:00
Ben Thorner
11e1a597da Bump utils to 46.0.0
This brings in some new polygon simplication code [1] so we need to
tweak any tests that rely on the exact number of polygons after this
operation.

[1]: https://github.com/alphagov/notifications-utils/pull/890
2021-09-08 14:36:13 +01:00
Ben Thorner
54808104a6 Stop defaulting simple_polygons to empty array
This is now done by the Admin app [1].

[1]: baf20e0075
2021-09-07 17:16:24 +01:00
Ben Thorner
59d0ab4f65 Stop defaulting "ids" to an empty array
This is so we can distinguish custom broadcasts in the Admin app
[1]. I've also extended the POST test for custom broadcasts to
check we're correctly reading data for "names", as this wasn't
being tested previously.

[1]: 411fda81c0
2021-09-07 17:16:22 +01:00
Ben Thorner
dd41cf854c Remove support for old "areas" sub-field
All broadcasts with this field have now been migrated to use "ids".

This also removes a few lines that were missed in previous PRs:

- Added by mistake: fd7ebbebb0 (diff-045554136e1462693a6cbb6328b2e056a81e8b348e94575edd8f72b78c5da96eR115)
- Missed removal: ec1171f85c (diff-045554136e1462693a6cbb6328b2e056a81e8b348e94575edd8f72b78c5da96eR110)
2021-09-07 17:14:57 +01:00
Ben Thorner
6af39c4d3b Remove redundant force_overrride feature
This is no longer needed as all areas data has now been migrated.
2021-09-06 09:53:39 +01:00
Ben Thorner
d50c563f08 Remove support for "areas_2" field
The Admin app was only using this temporarily and is now using the
"areas" field instead [1], so we can delete this one.

[1]: https://github.com/alphagov/notifications-admin/pull/4006
2021-09-01 17:42:15 +01:00
Ben Thorner
be7272b44f Merge pull request #3314 from alphagov/next-stage-areas-migration-178986763
Switch "areas" field to "areas_2" format
2021-09-01 16:16:30 +01:00
Chris Hill-Scott
2c7e4657ce Don’t update email_access_validated_at on password reset
As of https://github.com/alphagov/notifications-admin/pull/4000/files
the admin app is doing this, so we don’t need to do it here as well.
2021-09-01 09:54:54 +01:00
Chris Hill-Scott
6900505b05 Don’t call random.choices with zero weighting
As of 041d8b48a2
it’s not valid to call `random.choices` without giving at least one of
the options a positive weighting.

This makes sense, because giving a zero weighting is effectively saying
‘theres’s only one choice, but don’t choose it’.

In our codebase this is applicable where there’s only one international
provider, which we want to use even when it’s been de-prioritised for
domestic SMS.

This doesn’t cause a problem now, but will if we upgrade to Python
versions greater than 3.9.0.
2021-08-31 11:08:27 +01:00
Ben Thorner
bf0bf4e31c Favour new "areas" format for PagerDuty alerts
Broadcasts created via the API [1] and the Admin app [2] should
both now have this field set. It's also more informative to show
this, and broadcasts created via the API don't have IDs anyway.

There's a small risk that an old broadcast that gets approved won't
have this data, but it's for information only and we intend to
backfill all old broadcasts in the near future.

[1]: 023a06d5fb
[2]: 7dbe3afa19
2021-08-27 14:22:12 +01:00
Ben Thorner
ec1171f85c Switch "areas" field to "areas_2" format
The Admin app is now temporarily using the "areas_2" field while
we migrate "areas" to the new format [1].

[1]: https://github.com/alphagov/notifications-admin/pull/4004
2021-08-27 14:22:11 +01:00
Ben Thorner
a7d92b9058 Replace / remove redundant uses of "areas"
In one case ("areas=['manchester']") the format was even invalid,
but in general the original value of the column is pretty much
irrelevant for tests that involve updating it (it's highly unlikely
the column would default to the same value as the test data).
2021-08-27 13:31:49 +01:00
Ben Thorner
194f54c38f Add missing tests for old format areas
This was (sort of) missed in [1], but it hasn't caused a problem
because the code to create/update broadcasts will populate areas
in the old ("areas") and the new ("ids") formats anyway [2].

However, we're about to remove create/update support code, so we
need to have something in place to cope with old and new format
data until we backfill old broadcasts with the new format [3].

[1]: https://github.com/alphagov/notifications-api/pull/3312
[2]: https://github.com/alphagov/notifications-api/pull/3312/files#diff-6be38b59e387630a0c6b8fc60312b7ba53ba9f36c54594fa5690646f286dd2b9R141
[3]: 9667433b7e
2021-08-27 13:18:12 +01:00
Ben Thorner
023a06d5fb Start dual running with "areas" and "names"
For the public API we actually receive a "name" instead of an ID,
which we also want to start sending from the Admin app.

Unlike IDs, which aren't really used anywhere, we want the names
to display the alerts on gov.uk/alerts.
2021-08-26 15:34:25 +01:00
Ben Thorner
8f39d476bd Start dual running with "areas" and (area) "ids"
This is necessary until:

- The Admin app is using the new "areas(_2)" format to store and
retrieve data.

- We've migrated all existing broadcast messages to use the new
format.

Note that "areas" / "ids" isn't actually used for anything except
printing out the PagerDuty message - it's not sent to the proxy [1].

[1]: 6edc6c70aa/app/celery/broadcast_message_tasks.py (L190-L193)
2021-08-26 15:34:24 +01:00