Commit Graph

7764 Commits

Author SHA1 Message Date
David McDonald
57f5bd76de Merge pull request #3081 from alphagov/ses-error-logs
SES error logs
2020-12-31 13:13:20 +00:00
Leo Hemsted
d470c928cd Merge pull request #3072 from alphagov/doc-dl-exc
handle doc dl connection errors correctly
2020-12-31 11:24:00 +00:00
David McDonald
56879d0d22 Make sure error message is logged as part of the exception 2020-12-31 11:08:09 +00:00
Chris Hill-Scott
8834377a5d Merge pull request #3074 from alphagov/serialise-process-type
Serialise process_type for template history
2020-12-31 09:54:00 +00:00
David McDonald
977554781f Add better logging message for tech failure
So we can easily identify which notification ID failed
2020-12-30 17:28:21 +00:00
David McDonald
2480f91667 Raise better exception on InvalidParameterValue error
There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files
is if we allow an email address on our side that SES rejects.

However, there are other types of errors that could cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError` as that is not appropriate.

Therefore, we introduce a new exception
`EmailClientNonRetryableException`, that represents any exception back
from an email client that we can use whenever we get a
`InvalidParameterValue` error.

Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` as our code needs to catch this
exception and it shouldn't be aware of what email client is being used,
it just needs to know that it came from one of the email clients (if in
time we have more than one).

In time, we may wish to extend the approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions as this should be
the most extendable pattern and a good abstraction.
2020-12-30 17:18:16 +00:00
David McDonald
2079202160 Stop logging email addresses for SES errors
We shouldn't be logging PII so we should not log email addresses. We
remove the email address and just log the normal exception message.

Note, this meant before that you could see the email address and more
easily track down the notification ID in the database. Now instead, you
will need to search in the DB for notifications that have gone into
technical failure at the time of the log message (as we still don't
log the notification ID alongside the failure).
2020-12-30 17:18:15 +00:00
David McDonald
6a95925897 Merge pull request #3078 from alphagov/up-memory
Add more memory for the sender and letter workers
2020-12-29 15:50:49 +00:00
Sakis
5e08cc7bc6 Merge pull request #3076 from alphagov/fix-app-logging
Fix app logging
2020-12-24 18:55:58 +02:00
sakisv
1bfdac8417 Temporarily remove disk space check from multi_worker script
There seems to be some kind of complication in this script that doesn't
allow it to terminate properly.

This is being removed for now to allow deploying the rest of the fixes
in time for the holiday period.
2020-12-24 18:44:26 +02:00
Chris Hill-Scott
c64e935168 Merge pull request #3079 from alphagov/pass-language-to-lambda
Pass language through to lambda
2020-12-24 16:06:40 +00:00
Chris Hill-Scott
9825469613 Make language attributes abstract properties
This will make it impossible to create a new client without at least
having to define these properties. Which should get someone thinking
about language support…
2020-12-24 15:19:46 +00:00
Chris Hill-Scott
c3a1d5c506 Pass language through to lambda
If we’re sending non-GSM characters, we need to mark the language in the
XML as Welsh (`cy-GB` in CAP, `Welsh` in IBAG).

Currently, the CBC proxy checks the content we’re sending, and then uses
an approximation based on ASCII to determine whether we’re sending any
non-GSM characters, and if so, sets the language appropriately.

Instead, we should can functionality from the notifications-utils repo
to determine the language. If any non-GSM characters are used, then the
we can set the language to Welsh.

We’ll need to update the proxy to look at this new language flag.
2020-12-24 15:15:32 +00:00
David McDonald
1ac3ca250c Add more memory for the sender and letter workers
On monday, we had a build of emails in the email queue that weren't
getting picked up by the sender worker and causing delays.

After further investigation with Andy from the PaaS, we believe the
following happened.

We received a bunch of traffic at 8:30ish which consisted of some
very large emails in terms of their length and complexity. The amount
of memory used by the app instances got very high and a few apps
crashed due to OOM (recorded by 5 cf app event crashes). When new
app instances tried to spin up, they weren't able to as they
potentially also ran out of memory immediately.

This left us in the position of having fewer app instances than we
needed, on top of which they were all using a very large amount of
CPU and may have been limited how quickly an individual app
instance would process tasks. This meant that we were overall
processing fewer tasks then we needed to and our queue of emails
started to build up.

So it appears our sender workers did not have the memory available that
they needed. By looking at a graph for the past 30 days of memory usage
on the sender workers, we see that it on several days breached 90%
memory usage for long periods of time. This in combination of the
hypothesis above of what happened leads us to decide that we want to
give the app instances a bigger memory quota so it has been upped from
3GB to 4GB.

Whilst doing, I also looked at long term memory usage graphs for our
other workers and saw that the letters worker was similarly close to
around 90% of memory used so have taken the opportunity to bump that
too.
2020-12-24 15:03:39 +00:00
David McDonald
701c7ba80d Merge pull request #3077 from alphagov/freezetime
Fix test that fails after 5:30pm
2020-12-24 10:35:53 +00:00
David McDonald
9aba3d758b Fix test that fails after 5:30pm
Was failing when ran after 5:30pm as this would cause the letters to be
in a different subfolder (for one day later). Solved by freezetiming it

Example build that failed: https://cd.gds-reliability.engineering/builds/1876957
2020-12-24 09:57:52 +00:00
sakisv
a6ecfd66b6 Terminate instance if it's running out of disk space 2020-12-23 19:40:04 +02:00
sakisv
2108498eb1 Send worker-sender celery logs to /dev/null
We are using our custom logger to log to `NOTIFY_LOG_PATH`, so this
logging from celery is neither needed nor desired.

We also need to define the location of the pidfiles, because of what
appears to be a bug in celery where it uses the location of logs to
infer the location of the pidfiles if it is not defined, i.e. in this
case it was trying to find the pidfiles in `/dev/null/%N.pid`.
2020-12-23 19:39:56 +02:00
Chris Hill-Scott
b8a191c3e3 Merge pull request #3071 from alphagov/add-flake8-bugbear
Do extra code style checks with flake8-bugbear
2020-12-23 16:20:48 +00:00
Chris Hill-Scott
b1dc8cc758 Serialise process_type for template history
We already serialise it in the templates response. We should make sure
the field is also present in the history response, if we want to use
cached template versions when processing notifications.
2020-12-23 13:57:38 +00:00
Rebecca Law
a2bb775b6f Merge pull request #3069 from alphagov/add-request-id-if-in-context
Pass request_id onto the task if called from a task
2020-12-23 12:26:04 +00:00
Chris Hill-Scott
0ed1f32972 Move flake8 config into setup.cfg
At the moment the two are duplicative, and it’s not clear which takes
precedence.

We need to keep the `excludes` line otherwise running the `flake8`
command takes ~3 times longer and may lint 3rd party files that we don’t
control.
2020-12-23 12:23:15 +00:00
Leo Hemsted
325f271e25 handle doc dl connection errors correctly
previously we'd see an error message in the logs:
`AttributeError: 'NoneType' object has no attribute 'status_code'`
because we were assuming the requests exception would always have a
response - it won't have a response if it wasn't able to create a
connection at all.
2020-12-23 12:21:24 +00:00
Rebecca Law
a1b31a6c20 Check for app_context and request in g to prevent Attribute Errors.
We can add a request_id for tasks that are not spawned by an HTTP request, for example scheduled or nightly tasks. That means you can match up all the tasks spawned by a single task, for example, create-night-billing spawns 4 tasks, those would all have the same idea. Not sure if that is helpful or not. Also it might be confusing to have a request_id for logs that were not started from a request so I have left it out.
2020-12-23 09:47:47 +00:00
Chris Hill-Scott
3b0b96834d Do extra code style checks with flake8-bugbear
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is having mutable default function arguments, which get
shared between every call to the function and therefore mutating a value
in one place can unexpectedly cause it to change in another.

This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
  this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
  some SQLAlchemy decorators (eg `declared_attr`) make things that
  aren’t formally class methods take a class not an instance as their
  first argument

It disables:
- _B306: BaseException.message is removed in Python 3_ because I think
  our exceptions have a custom structure that means the `.message`
  attribute is still present

Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
2020-12-22 16:26:45 +00:00
Rebecca Law
025b51c801 If the request_id exists in the Flask global context, add it to the kwargs for the task.
The request_id is set is the task is created from a http request, if that task then calls through to another task this will set the request_id from the global context. We should then be able to follow the creation of a notification all the way from the original http request to the sending task.
2020-12-22 15:21:32 +00:00
David McDonald
7152ac7cba Merge pull request #3068 from alphagov/bad-logging
Fix logging line to include response context
2020-12-22 11:26:24 +00:00
David McDonald
fae7e917b5 Fix logging line to include response context
We have been getting log lines of the following:

`API POST request on
https://api.notifications.service.gov.uk/notifications/sms/mmg failed
with None`

It's not clear what error caused the request to fail because the value
of `api_error.response` is always `None`.

There appears to be something wrong with this logging.
`raise_for_status` will raise an `HTTPError`, so then there should be no
reason to then pass that error into another `HTTPError` (which is
causing the response to be lost).

We can instead simply catch the `HTTPError` and log it's status
code.

This might not be perfect, but it's definitely an improvement and should
give us some more context about why these requests occasionally fail.
2020-12-21 14:39:10 +00:00
Pea M. Tyczynska
2749a707f2 Merge pull request #3067 from alphagov/fix-cancel-broadcast
Fix cancel broadcast by converting reference date to string
2020-12-21 13:56:19 +00:00
David McDonald
b40a9c0e83 Merge pull request #3063 from alphagov/letter-retention-fixes-third-approach
Do not let us delete letters that have not reached a final state
2020-12-21 12:51:29 +00:00
Chris Hill-Scott
f76ede6bc4 Merge pull request #3066 from alphagov/bump-utils-43.5.8
Bump utils to 43.5.9
2020-12-21 12:10:04 +00:00
Pea Tyczynska
95deb5a52f Move DATETIME_FORMAT from app to app.utils
To avoid cyclical import issues
2020-12-18 17:39:35 +00:00
Pea Tyczynska
ee833bd65b Fix cancel broadcast by converting reference date to string
Datetime oobject is not json serializable, we have to convert
it to string for the created_at field of previous broadcast
provider messages.
2020-12-18 17:22:21 +00:00
Chris Hill-Scott
b6734d25d0 Bump utils to 43.5.9
Changes:
https://github.com/alphagov/notifications-utils/compare/43.5.8...43.5.9
2020-12-18 15:37:15 +00:00
Chris Hill-Scott
4e009af2f7 Bump utils to 43.5.8
Changes:
https://github.com/alphagov/notifications-utils/compare/43.5.7...43.5.8
2020-12-18 14:35:16 +00:00
Pea M. Tyczynska
519568970c Merge pull request #3059 from alphagov/cancel_broadcast_cbc
Add cancel routes to cbc proxy clients
2020-12-18 12:09:49 +00:00
Pea M. Tyczynska
bb41cabeb6 Merge pull request #3065 from alphagov/up-email-size-limit
Increase email size limit to 2MBby pulling in new utils
2020-12-16 16:14:58 +00:00
Pea Tyczynska
4fc3f95c41 Increase email size limit to 2MBby pulling in new utils
This is because GOV.UK has hit the email size limit with their
weekly digest email.
2020-12-16 15:59:49 +00:00
David McDonald
e35ea57ba2 Do not delete letters if not in final state
A few weeks ago, we deleted some pdf letters that had reached their
retention period. However, these letters were in the 'created' state so
it's very arguable that we should not have deleted them because we were
expecting to resend them and were unable to. Part of the reason for this
is that we marked the letters back to `created` as the status but we did
not nullify the `sent_at` timestamp, meaning the check on
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch it. Regardless of that check, which controls whether the
files were removed from S3, they were also archived into the
`notification_history` table as by default.

This commit does changes our code such that letters that are not in
their final state do not go through our retention process. This could
mean they violate their retention policy but that is likely the lesser
of two evils (the other being we delete them and are unable to resend
them).

Note, `sending` letters have been included in those not to be removed
because there is a risk that we give the letter to DVLA and put it in
`sending` but then they come back to us later telling us they've had
problems and require us to resend.
2020-12-16 10:50:11 +00:00
David McDonald
47e146f010 Move variable out of loop that didn't need to be 2020-12-16 10:40:30 +00:00
David McDonald
1bf9b29905 Document behaviour of s3 letter deleting
The behaviour was a bit of opaque so I have added tests around it so
it's clear what it is doing and why. No functionality has changed
2020-12-16 10:39:31 +00:00
David McDonald
219023f4c6 Fix test that was passing unintentionally
This test was added in
ebb43082d5

However, there are a few problems with it

1. The test name doesn't seem to match what the code is doing. It looks
   like instead that it is NOT trying to delete from s3 when the letter
   has not been sent and therefore I've updated the test name as such.

2. `delete_notifications_older_than_retention_by_type` was being called
   with `email` as it's argument which doesn't match. This is a test for
   letters. It definitely wouldn't do any looking in s3 for emails.

3. `created_at` needed bumping back into the past, past the default 7
   days retention so these letters would be considered old enough to
   delete

4. For the letter to not be sent, it needs to be in `created`, not in
   `sending` so I have updated the status. Note, there could be other
   statuses which class as 'not sent' but this is the most obvious one
   to test with
2020-12-15 09:48:08 +00:00
Pea Tyczynska
4758d8c4cb Format message_number for references
In IBAG format for broadcasts, we need to give sequential number
of previous message, and it needs to be formatted as a hex padded
with zeroes to be 8 character long.

This commit adds the necessary formatting.
2020-12-14 18:21:28 +00:00
Pea Tyczynska
45b806f6db Remove unused args from cancel broadcast call in tasks 2020-12-14 11:31:05 +00:00
Pea Tyczynska
35a212d907 Add cancel routes to cbc proxy clients
Also clean the code up a bit.
2020-12-11 18:52:54 +00:00
Chris Hill-Scott
34e8da7285 Merge pull request #3058 from alphagov/bump-utils-43.5.6
Bump utils to 43.5.6
2020-12-11 16:11:51 +00:00
Chris Hill-Scott
2165cbaf76 Bump utils to 43.5.6
Changes:
https://github.com/alphagov/notifications-utils/compare/43.5.4...43.5.6
2020-12-11 10:56:06 +00:00
Richard Baker
8a3e90ae39 Merge pull request #3057 from alphagov/fix_cap_message_format
Set cbc proxy message_format to "cap"
2020-12-09 17:29:24 +00:00
Richard Baker
4dd37acecb Set cbc proxy message_format to cap
The CBC proxy lambda expects the message_format parameter to be one of `cap` or `ibag`.

Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
2020-12-09 17:10:40 +00:00
Pea M. Tyczynska
a70b7c521e Merge pull request #3053 from alphagov/ibag-message-number
Add sequential message number to broadcast provider messages
2020-12-09 13:02:25 +00:00