This deletes a big ol' chunk of code related to letters. It's not everything (there are still a few things that might be tied to SMS/email), but it's the heart of the letters functionality. SMS and email functionality should be untouched by this.
Areas affected:
- Things obviously about letters
- PDF tasks, used for precompiling letters
- Virus scanning, used for those PDFs
- FTP, used to send letters to the printer
- Postage stuff
We were using the Draft4Validator in one place, so this updates it to
the Draft7Validator instead.
The schemas were mostly using draft 4 of the JSON schema, though there
were a couple of schemas that were already of version 7. This updates
them all to version 7, which is the latest version fully supported by
the jsonschema Python package. There are some breaking changes in the
newer version of the schema, but I could not see anywhere these would
affect us. Some of these schemas were not valid in version 4, but are
now valid in version 7 because `"required": []` was not valid in earlier
versions.
Before we implemented ‘cancel’ any updates to a broadcast went through
the admin app. This meant the admin app could deal with clearing the
cache any time a broadcast was updated by a user performing an action.
Now that a broadcast can be updated without the admin app being involved
we have another place we need to clear the cache from.
If we don’t do this then the broadcast can look like it’s still going
even though it’s successfully been cancelled.
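A minimal sketch of the idea, assuming a read-through cache keyed by broadcast ID. The `database` and `cache` dicts and all function names here are hypothetical stand-ins, not the real code:

```python
# Hypothetical in-memory stand-ins for the database and the cache.
database = {"b1": {"status": "broadcasting"}}
cache = {"broadcast-b1": {"status": "broadcasting"}}


def cache_key(broadcast_id):
    return f"broadcast-{broadcast_id}"


def get_broadcast(broadcast_id):
    """Read through the cache, falling back to the database."""
    key = cache_key(broadcast_id)
    if key not in cache:
        cache[key] = dict(database[broadcast_id])
    return cache[key]


def update_broadcast_status(broadcast_id, new_status):
    """Update the stored status and drop the cached copy.

    Clearing the cache here, rather than in each caller, covers updates
    made both with and without the admin app being involved.
    """
    database[broadcast_id]["status"] = new_status
    cache.pop(cache_key(broadcast_id), None)


update_broadcast_status("b1", "cancelled")
print(get_broadcast("b1")["status"])
```

Without the `cache.pop` call the last line would still report the stale `broadcasting` status.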
Because the `<reference>` field of a `cancel` message can contain an
arbitrary number of items it’s possible for it to reference more than
one current alert.
In this case it is ambiguous which alert should be cancelled, so we
should raise a custom error.
This will help people know that they have to manually go into Notify and
figure out which alert(s) to cancel there.
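Roughly, the check could look like the sketch below. `AmbiguousCancelError`, `find_alert_to_cancel` and the shape of `current_alerts` are illustrative assumptions, not the names used in the codebase:

```python
class AmbiguousCancelError(Exception):
    """Hypothetical error: a cancel message matches more than one current alert."""


def find_alert_to_cancel(references, current_alerts):
    """Return the single current alert matched by `references`, or None.

    `current_alerts` is assumed to map a reference to an alert.
    """
    matches = [current_alerts[ref] for ref in references if ref in current_alerts]
    if len(matches) > 1:
        raise AmbiguousCancelError(
            f"Cancel message references {len(matches)} current alerts; "
            "cancel the right one(s) manually in Notify"
        )
    return matches[0] if matches else None
```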
The XML for an alert requires a `<description>` field. The XML for
a `<cancel>` may have a `<description>` field populated (although we
ignore the contents) but it may also be empty.
This commit updates the schema to leave all the validation to the
view layer, which can decide when or when not to validate the content of
the `<description>` field.
If someone tries to cancel a broadcast but the references don’t match
an existing broadcast we correctly return a 404.
If they don’t provide any references then we get an exception. This
commit catches the missing references and returns a 400. I think this
is more appropriate because it’s a malformed request, rather than a
well-formed request that doesn’t match our data. It also lets us write a
more specific and helpful error message.
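As a sketch of the distinction between the two cases (the function name, argument shapes and error messages are made up for illustration):

```python
def check_cancel_references(references, existing_references):
    """Return an (HTTP status, message) error pair, or None if the request is fine."""
    if not references:
        # Malformed request: there is nothing to look up at all.
        return 400, "Cancel message must contain at least one reference"
    if not any(ref in existing_references for ref in references):
        # Well-formed request that doesn't match our data.
        return 404, "No broadcast found matching the given references"
    return None
```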
This caught me out the other day when I was testing. I thought antivirus
was enabled, and seeing this message didn't help with that assumption. Plus
there is no point in showing the log if we don't actually call the task.
This is because that function is used both when broadcast status
is updated via API and via admin, so it's a shared resource.
Also move and update tests for updating broadcast message status
so things are tested at source and repetition is avoided.
If the reference from cancel CAP XML we received via API does not
match with any existing broadcast, return 404.
Do the same if service id doesn't match.
Also refactor the code to cancel a broadcast out into a separate function.
It should be a separate function that is only called by create_broadcast
function. This will prevent create_broadcast from becoming too
big and complex and doing too many things.
1. The number of letters that we send to DVLA will not be correct (see 20ead82463/app/celery/letters_pdf_tasks.py (L136))
This may raise an alert with DVLA when they find we have sent them fewer letters than we have reported.
2. When we get the PDF from S3 we will get a file not found error (see 20ead82463/app/celery/letters_pdf_tasks.py (L244))
The error will not prevent the collate task from completing, but we will see an alert email for the exception, which will raise questions.
This situation is very unlikely because we have a 15 minute window between the last letter deadline date and the time we kick off the collate task, but we should still mitigate these issues. I updated the queries to only return letters with billable_units > 0, since all valid letters should have at least 1 billable unit.
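The real change is in the database queries, but the filter amounts to something like this in-memory sketch (the `Letter` class and function name are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Letter:
    reference: str
    billable_units: int


def letters_ready_to_collate(letters):
    """Keep only letters whose PDF has been created and counted.

    A letter with 0 billable units has not had its billable units
    updated yet, so it is left out of the collate step.
    """
    return [letter for letter in letters if letter.billable_units > 0]
```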
It is possible that the personalisation for a templated letter can make the letter exceed 10 pages or 5 sheets. We are not validating the letters posted via the API for this validation error. It is only possible to validate the letter once we create the PDF in notifications-template-preview. This means that the letter can only get a validation-failed status after the client has received a 201 from the POST to /v2/notifications.
NOTE: we only validate the preview row of a CSV for this validation error, so this change means it is possible for a letter to be marked as validation-failed after a successful file upload.
A new task to update the notification to `validation-failed` has been added to the API. If we find that the letter is too long once we have created the PDF we call the `update-validation-failed-for-templated-letter` task rather than `update-billable-units-for-letter` task.
New work flow for a letter in brief:
API - receives POST /v2/notifications
:: save to db
:: put CREATE_LETTERS_PDF task on queue for template preview to consume
TEMPLATE-PREVIEW - consumes task CREATE_LETTERS_PDF
:: create PDF
:: count pages of PDF
:: IF page count exceeds 10 pages
put in the letters-invalid-pdf S3 bucket with metadata (similar to the precompiled letters)
put `update-validation-failed-for-templated-letter` task on the queue for the API to consume
ELSE
put PDF in the `letters-pdf` bucket
put `update-billable-units-for-letter` task on the queue
API - consumes `update-billable-units-for-letter` OR `update-validation-failed-for-templated-letter` task
:: IF `update-billable-units-for-letter` task:
update billable units for notification as usual
:: ELSE `update-validation-failed-for-templated-letter`:
update notification_status = `validation-failed`
ADMIN - view notification page for letter
:: show validation letter for templated letter
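The branching in the work flow above can be sketched like this. The bucket and task names come from the description; the function names and the dict-based notification are assumptions made for illustration:

```python
MAX_PAGES = 10  # page limit described above

VALIDATION_FAILED_TASK = "update-validation-failed-for-templated-letter"
BILLABLE_UNITS_TASK = "update-billable-units-for-letter"


def route_letter_pdf(page_count):
    """Template-preview side: choose the (bucket, task) for a created PDF."""
    if page_count > MAX_PAGES:
        return "letters-invalid-pdf", VALIDATION_FAILED_TASK
    return "letters-pdf", BILLABLE_UNITS_TASK


def apply_task(notification, task_name, billable_units=0):
    """API side: update the notification according to the task consumed."""
    if task_name == VALIDATION_FAILED_TASK:
        notification["notification_status"] = "validation-failed"
    else:
        notification["billable_units"] = billable_units
    return notification
```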
There will be 3 PRs to make this change: one each for the API, template-preview and the admin app.
Deployment plan
Deploy Admin first
Deploy API
Deploy template-preview
Related PRs:
alphagov/notifications-template-preview#619
alphagov/notifications-admin#4107
https://www.pivotaltracker.com/story/show/169209742
This is a similar PR to https://github.com/alphagov/notifications-api/pull/2284.
When using flask-sqlalchemy to get a `Pagination` object, by default
it will run two queries
1. Get the page of results that you are asking for
2. If the number of results is equal to the page size, then it will
issue a second query that will count the total number of results
Getting the total number of results is only useful if
- you need to show how many results there are
- you need to know if there is a next page of results (flask-sqlalchemy
uses the total to work out how many pages there are altogether,
which may not be the most efficient way of working out if there
is a next page or not but that is what it currently does).
Looking at the `get_notifications` route, it does
not use `paginated_notifications.total` or
`paginated_notifications.has_next` and therefore we have no use
for the second query to get the total number of results.
We can stop this additional query by setting `count_pages=False`
which will hopefully give us some performance improvements, in
particular for services which send a lot of notifications.
Flask sqlalchemy references:
818c947b66/src/flask_sqlalchemy/__init__.py (L478)
818c947b66/src/flask_sqlalchemy/__init__.py (L399)
Note, I have checked the other uses of `get_notifications_for_service`
and the other cases are currently using the total or next page so
this approach is not something we can take with them.
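To illustrate the saving, here is a self-contained sketch using `sqlite3` rather than flask-sqlalchemy. The `paginate` helper and its `count` flag are stand-ins written for this example, not the library's API:

```python
import sqlite3


def paginate(conn, page, per_page, count=True):
    """Return (items, total); `total` is None when the count is skipped."""
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT id FROM notifications ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    total = None
    if count:
        # The second query, only worth running if the caller uses the total.
        total = conn.execute("SELECT COUNT(*) FROM notifications").fetchone()[0]
    return [row[0] for row in rows], total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notifications (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO notifications (id) VALUES (?)", [(i,) for i in range(1, 8)]
)
```

With `count=False` only the page query runs, which is all a caller needs if it never reads the total or asks whether there is a next page.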
If a polygon is smaller than the largest polygon in our dataset of
simplified polygons then we’re only throwing away useful detail by
simplifying it.
We should still simplify larger polygons as a fallback, to avoid sending
anything to the CBC that we’re not sure it will like.
The thresholds here are low: we can raise them as we test and experiment
more.
Here’s some data about the Flood Warning Service polygons
Percentile | 80% | 90% | 95% | 98% | 99% | 99.9%
-----------|-----|-------|--------|---------|---------|---------
Point count| 226 | 401.9 | 640.45 | 1015.38 | 1389.07 | 3008.609
Percentile    | 80% | 90% | 95% | 98% | 99% | 99.9%
--------------|-----|-----|-----|-----|-----|-------
Polygon count | 2   | 3   | 5   | 8   | 10  | 40.469
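The "only simplify when it's too big" rule described above might look like the sketch below. The threshold value and the `simplify` callable are placeholders; the real thresholds and simplification live in the code and in utils:

```python
def maybe_simplify(polygon, max_points, simplify):
    """Return `polygon` untouched if it is small enough, else simplify it.

    `polygon` is a list of points; `max_points` is an assumed threshold
    and `simplify` is whatever simplification function is in use.
    """
    if len(polygon) <= max_points:
        return polygon
    return simplify(polygon)


def drop_every_other_point(polygon):
    """Crude stand-in for a real simplification algorithm, for demonstration only."""
    return polygon[::2]


small = [(0, 0), (0, 1), (1, 1), (1, 0)]
large = [(i, i % 3) for i in range(20)]
```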
This new version of utils implements the transformation of our polygons
to a Cartesian plane. In other words, it converts them from being
defined in spherical degrees to metres.
For the API this means our simplification will be slightly more
accurate.
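For a feel of what converting degrees to metres involves, here is a rough sketch using a simple equirectangular approximation around an origin point. This is an assumption chosen for brevity; utils may well use a different projection:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def to_metres(points, origin):
    """Project (longitude, latitude) pairs in degrees to (x, y) in metres.

    Uses an equirectangular approximation centred on `origin`, which is
    only reasonable for areas that are small relative to the globe.
    """
    lon0, lat0 = origin
    cos_lat0 = math.cos(math.radians(lat0))
    return [
        (
            EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0,
            EARTH_RADIUS_M * math.radians(lat - lat0),
        )
        for lon, lat in points
    ]
```

Working in metres means a simplification tolerance covers the same distance in every direction, whereas a degree of longitude shrinks as you move away from the equator.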
Previously we were catching one type of exception if something went
wrong adding a notification to the queue for high volume services.
In reality there are two types of exception so this adds a second
handler to cover both.
For context, this is code we changed experimentally as part of the
upgrade to Celery 5 [1]. At the time we didn't check how the new
exception compared to the old one. It turns out they behaved the
same and we were always vulnerable to the scenario now covered by
the second exception, where the behaviour has changed in Celery 5 -
testing with a large task invocation gives...
Before (Celery 3, large-ish task):
'process_job.apply_async(["a" * 200000])'...
boto.exception.SQSError: SQSError: 400 Bad Request
<?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>InvalidParameterValue</Code><Message>One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes.</Message><Detail/></Error><RequestId>96162552-cd96-5a14-b3a5-7f503300a662</RequestId></ErrorResponse>
Before (Celery 3, very large task):
<hangs forever>
After (Celery 5, large-ish task):
botocore.exceptions.ClientError: An error occurred (InvalidParameterValue) when calling the SendMessage operation: One or more parameters are invalid. Reason: Message must be shorter than 262144 bytes.
After (Celery 5, very large task):
botocore.parsers.ResponseParserError: Unable to parse response (syntax error: line 1, column 0), invalid XML received. Further retries may succeed:
b'HTTP content length exceeded 1662976 bytes.'
[1]: 29c92a9e54
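A sketch of the two handlers. The exception classes below are local stand-ins for `botocore.exceptions.ClientError` and `botocore.parsers.ResponseParserError` so the example runs without botocore; `send_to_queue` only simulates the size limits seen in the output above, and the `fallback` argument is hypothetical:

```python
class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError."""


class ResponseParserError(Exception):
    """Stand-in for botocore.parsers.ResponseParserError."""


def send_to_queue(payload):
    """Simulate the queue rejecting large and very large messages."""
    size = len(payload)
    if size > 1_662_976:
        raise ResponseParserError("HTTP content length exceeded 1662976 bytes.")
    if size >= 262_144:
        raise ClientError("Message must be shorter than 262144 bytes.")
    return "queued"


def queue_notification(payload, fallback):
    """Try the queue; hand over to `fallback` for either kind of failure."""
    try:
        return send_to_queue(payload)
    except ClientError:
        return fallback(payload)
    except ResponseParserError:
        # The second handler: very large tasks fail differently.
        return fallback(payload)
```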
We don’t store everything that comes in the CAP XML when someone creates
a broadcast via the API.
One thing we do store is `<identifier>` (in a column called `reference`)
which is a unique (to the external system) identifier for the broadcast.
We show this in the front end instead of the template name, because
broadcasts created from the API don’t use templates.
However this ID isn’t very friendly – the Environment Agency just supply
a UUID.
The Environment Agency also populate the `<event>` field with some human
readable text, for example:
> 013 Issue Severe Flood Warning EA
(013 is an area code which will be meaningful to the Flood Warning
Service team)
We should show this in the UI instead of the reference. The first step
towards this is storing it in the database and returning it in the REST
endpoints.
Later we can have the admin app prefer `cap_event` over `reference`,
where `cap_event` is present.
We can’t backfill this data because we don’t keep a copy of the original
XML.
Seems like `<event>` is a mandatory property of `<info>`, so we don’t
need to worry about the field being missing (`<info>` is optional in
CAP but we require it because it contains stuff like the areas which
we need in order to send out the broadcast).
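For illustration, pulling both values out of a CAP document could look like this. It assumes the CAP 1.2 namespace; the sample XML is made up and trimmed to the two fields discussed, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Assumption: the incoming XML uses the CAP 1.2 namespace.
CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.2"}

# Made-up sample, trimmed to the fields discussed above.
SAMPLE_CAP_XML = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>0f1d2c3b-sample-identifier</identifier>
  <info>
    <event>013 Issue Severe Flood Warning EA</event>
  </info>
</alert>"""


def extract_reference_and_event(cap_xml):
    """Return the values we would store as `reference` and `cap_event`."""
    root = ET.fromstring(cap_xml)
    return {
        "reference": root.findtext("cap:identifier", namespaces=CAP_NS),
        "cap_event": root.findtext("cap:info/cap:event", namespaces=CAP_NS),
    }
```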
***
https://www.pivotaltracker.com/story/show/176927060
In response to: https://github.com/alphagov/notifications-api/pull/3305#pullrequestreview-726672421
Previously this was added among the public /v2 endpoints, but it's
only meant for internal use. While only the govuk-alerts app would
be able to access it, the location and /v2 URL suggested otherwise.
This restructures the endpoint so it resembles other internal ones.
For the public API we actually receive a "name" instead of an ID,
which we also want to start sending from the Admin app.
Unlike IDs, which aren't really used anywhere, we want the names
to display the alerts on gov.uk/alerts.
We can define the API properly in future work. I've used a separate
blueprint from "broadcasts" since this API is purely internal, and
it's helpful to make it clear it's specific to govuk-alerts.
Just looks a bit tidier and less repetitive.
I’ve only done this for the serialised service because:
- we’re only checking this in places where we’re already using the
serialised service
- if we want to check this elsewhere there’s a good chance that new code
should be using the serialised service, since it’ll itself be doing
some kind of performance optimisation
We haven't bumped the test version for a while.
Also bumped the version of Flask and itsdangerous.
In order to fix Flask warnings I needed to change how the blueprints were registered.