Was failing when run after 5:30pm, as this would cause the letters to
be in a different subfolder (for one day later). Solved by freezing the
time in the test.
Example build that failed: https://cd.gds-reliability.engineering/builds/1876957
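Freezing the time might look something like this (a sketch assuming the
freezegun library and a hypothetical helper name; the real test will
differ):

```python
from datetime import datetime

from freezegun import freeze_time

@freeze_time('2018-01-10 18:00:00')
def test_letters_go_into_the_next_days_folder_after_5_30pm():
    # the frozen clock says it's after 5:30pm, so the expected
    # subfolder is the following day's, whenever the test is run
    # (letter_subfolder_for is a hypothetical helper)
    assert letter_subfolder_for(datetime.utcnow()) == '2018-01-11'
```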
We are using our custom logger to log to `NOTIFY_LOG_PATH`, so this
logging from celery is neither needed nor desired.
We also need to define the location of the pidfiles, because of what
appears to be a bug in celery: if the pidfile location is not defined,
celery infers it from the location of the logs, so in this case it was
trying to find the pidfiles in `/dev/null/%N.pid`.
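One way to suppress celery's own logging setup (a sketch of a known
celery mechanism, not necessarily how this codebase does it) is to
connect a receiver to the `setup_logging` signal, which stops celery
configuring loggers itself:

```python
from celery.signals import setup_logging

@setup_logging.connect
def on_celery_setup_logging(**kwargs):
    # connecting any receiver to this signal tells celery not to
    # configure logging itself; our custom logger writing to
    # NOTIFY_LOG_PATH is already set up elsewhere
    pass
```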
For every email or text message we send we have to work out which
provider to send it with. Every time we do this we go and load the list
of providers from the database.
For emails, the result will always be the same.
For text messages the result is randomly chosen to balance the load
between the providers.
For international text messages the result is always the same (we only
have one international text message provider).
This commit adds an in-memory cache with a 2 second TTL so that we’re
not fetching the providers from the database every time, which should
speed things up a bit.
This does mean that, for text messages, the random choice will ‘stick’
for two seconds on each instance, before being re-chosen. I think this
is OK because it will even out to the same distribution over time.
I really don’t like having to clear the cache in the tests, so would
welcome suggestions on a better way of doing this…
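For illustration, the sort of cache described might look like this
minimal sketch (hypothetical names throughout; the real implementation
will differ):

```python
import time

class TTLCache:
    """In-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=2):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, load):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            value = load()  # e.g. fetch the providers from the database
            self._store[key] = (time.monotonic() + self.ttl, value)
            return value
        return entry[1]

    def clear(self):
        # what the tests currently have to call to reset state
        self._store.clear()
```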
We already serialise it in the templates response. We should make sure
the field is also present in the history response, if we want to use
cached template versions when processing notifications.
At the moment the two are duplicative, and it’s not clear which takes
precedence.
We need to keep the `excludes` line otherwise running the `flake8`
command takes ~3 times longer and may lint 3rd party files that we don’t
control.
Previously we'd see an error message in the logs:
`AttributeError: 'NoneType' object has no attribute 'status_code'`
because we were assuming the requests exception would always have a
response. It won't have a response if it wasn't able to create a
connection at all.
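The fix amounts to guarding the attribute access, roughly like this
(a sketch with hypothetical names, not the real client code):

```python
import logging

import requests

logger = logging.getLogger(__name__)

def post_with_logging(url, payload):
    try:
        response = requests.post(url, data=payload)
        response.raise_for_status()
        return response
    except requests.RequestException as e:
        # e.response is None when we never got a connection, so guard
        # before reading the status code off it
        status_code = e.response.status_code if e.response is not None else None
        logger.warning('request to %s failed, status code: %s', url, status_code)
        raise
```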
We could add a request_id for tasks that are not spawned by an HTTP
request, for example scheduled or nightly tasks. That would let you
match up all the tasks spawned by a single task: for example,
create-nightly-billing spawns 4 tasks, which would all have the same
id. I'm not sure whether that is helpful. It might also be confusing to
have a request_id on logs that were not started from a request, so I
have left it out.
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is mutable default function arguments, which are shared
between every call to the function, so mutating the value in one place
can unexpectedly cause it to change in another.
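For instance (an illustrative snippet, not code from this repo), the
mutable-default pattern that Bugbear's B006 check catches:

```python
def add_recipient(recipient, recipients=[]):  # flagged by B006
    recipients.append(recipient)
    return recipients

add_recipient('a@example.com')  # returns ['a@example.com']
add_recipient('b@example.com')  # returns ['a@example.com', 'b@example.com']!

def add_recipient_fixed(recipient, recipients=None):
    if recipients is None:
        recipients = []  # a fresh list per call
    recipients.append(recipient)
    return recipients
```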
This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
some SQLAlchemy decorators (eg `declared_attr`) make things that
aren’t formally class methods take a class not an instance as their
first argument
It also disables one check that is on by default:
- _B306: BaseException.message is removed in Python 3_ because I think
our exceptions have a custom structure that means the `.message`
attribute is still present
Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
The request_id is set if the task is created from an HTTP request; if
that task then calls through to another task, this will set the
request_id from the global context. We should then be able to follow
the creation of a notification all the way from the original HTTP
request to the sending task.
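A sketch of the idea, with hypothetical helper and task names (the real
plumbing in this codebase differs):

```python
from flask import g, has_request_context

def current_request_id():
    # hypothetical helper: pick up the id set by the originating
    # HTTP request, if we're inside one
    if has_request_context():
        return getattr(g, 'request_id', None)
    return None

# when queueing a follow-on task (deliver_sms is a hypothetical task),
# pass the id through so the whole chain shares it:
#
#   deliver_sms.apply_async(
#       args=[notification_id],
#       headers={'request_id': current_request_id()},
#   )
```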
We have been getting log lines of the following:
`API POST request on
https://api.notifications.service.gov.uk/notifications/sms/mmg failed
with None`
It's not clear what error caused the request to fail because the value
of `api_error.response` is always `None`.
There appears to be something wrong with this logging.
`raise_for_status` will raise an `HTTPError`, so there should be no
reason to then pass that error into another `HTTPError` (which is
causing the response to be lost).
We can instead simply catch the `HTTPError` and log its status
code.
This might not be perfect, but it's definitely an improvement and should
give us some more context about why these requests occasionally fail.
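Roughly, the improved handling (a sketch with hypothetical names, not
the real provider client):

```python
import logging

import requests

logger = logging.getLogger(__name__)

def post_to_provider(url, payload):
    response = requests.post(url, data=payload)
    try:
        response.raise_for_status()
    except requests.HTTPError as e:
        # the HTTPError from raise_for_status already carries the
        # failed response; re-wrapping it in a new HTTPError is what
        # was losing it and logging 'failed with None'
        logger.error(
            'API POST request on %s failed with status %s',
            url,
            e.response.status_code,
        )
        raise
    return response
```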
A few weeks ago, we deleted some pdf letters that had reached their
retention period. However, these letters were in the 'created' state,
so it's very arguable that we should not have deleted them, because we
were expecting to resend them and were unable to. Part of the reason
for this is that we set the letters' status back to `created` but did
not nullify the `sent_at` timestamp, meaning the check at
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch them. Regardless of that check, which controls whether
the files were removed from S3, they were also archived into the
`notification_history` table by default.
This commit changes our code so that letters that are not in their
final state do not go through our retention process. This could mean
they violate their retention policy, but that is likely the lesser of
two evils (the other being that we delete them and are unable to resend
them).
Note, `sending` letters have been included in those not to be removed
because there is a risk that we give the letter to DVLA and put it in
`sending` but then they come back to us later telling us they've had
problems and require us to resend.
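The shape of the change might look like this (a hedged sketch assuming
the app's `Notification` model and a precomputed `retention_cutoff`;
the real query lives in `app/dao/notifications_dao.py`):

```python
# letters still in flight are skipped by the retention delete
NON_FINAL_LETTER_STATUSES = ('created', 'sending')

letters_to_delete = Notification.query.filter(
    Notification.notification_type == 'letter',
    Notification.created_at < retention_cutoff,
    Notification.status.notin_(NON_FINAL_LETTER_STATUSES),
)
```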
This test was added in
ebb43082d5
However, there are a few problems with it (a sketch of the repaired
test follows the list):
1. The test name doesn't seem to match what the code is doing. It looks
like it is instead checking that we do NOT try to delete from S3 when
the letter has not been sent, so I've updated the test name
accordingly.
2. `delete_notifications_older_than_retention_by_type` was being called
with `email` as its argument, which doesn't match: this is a test for
letters, and it definitely wouldn't go looking in S3 for emails.
3. `created_at` needed bumping back into the past, beyond the default
7-day retention, so these letters would be considered old enough to
delete
4. For the letter to not be sent, it needs to be in `created`, not in
`sending` so I have updated the status. Note, there could be other
statuses which class as 'not sent' but this is the most obvious one
to test with
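Putting the four fixes together, the repaired test might look roughly
like this (a hypothetical reconstruction; the fixture, helper names,
and patch target are assumptions):

```python
from datetime import datetime, timedelta
from unittest import mock

@mock.patch('app.dao.notifications_dao.s3')  # assumed patch target
def test_does_not_delete_letter_from_s3_if_letter_not_sent(mock_s3, sample_letter_template):
    create_notification(  # hypothetical test helper
        template=sample_letter_template,
        status='created',  # point 4: 'created' counts as not sent
        created_at=datetime.utcnow() - timedelta(days=8),  # point 3: past 7-day retention
    )

    delete_notifications_older_than_retention_by_type('letter')  # point 2: letters, not email

    mock_s3.assert_not_called()  # point 1: S3 should not be touched
```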
In IBAG format for broadcasts, we need to give the sequential number
of the previous message, and it needs to be formatted as hex, padded
with zeroes to be 8 characters long.
This commit adds the necessary formatting.
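The formatting itself is small enough to sketch (the helper name is
hypothetical, and lowercase hex is an assumption):

```python
def format_sequential_number(sequential_number):
    # hex-encode, then zero-pad to 8 characters
    return format(sequential_number, 'x').zfill(8)

assert format_sequential_number(1) == '00000001'
assert format_sequential_number(26) == '0000001a'
```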
For reasons unknown, using a file descriptor no longer works on
concourse (or in that docker container generally). It might be a change
between cf-cli v7 and v6.
Either way, it doesn't work, so use a temporary file. Clean up the
temporary file afterwards.
If the command fails, the temporary file will still stick around, so
I've added the file to the /tmp/ folder instead. It's full of secret
keys and things, so if you do have a deployment error while running
locally, you should make sure you clean up the file (`make cf-rollback`
and `make clean` will both do this for you).