Since we moved to Dockerised Celery the pycurl guidance should no
longer be necessary and doesn't work on newer Mac M1 machines.
This also adds new guidance for psycopg2, which will hopefully be
temporary (see issue comment).
there's not anything we know we need to do now that we resolve stuck
letters automatically. Letters couuld still get into this state, so it's
worth alerting us. However, we don't have anything concrete that we know
how to fix these letters, so we should just remove the runbook entirely.
Currently "test_send_letter_notification_via_api" fails at the final
stage in create-fake-letter-response-file [^1]:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6011): Max retries exceeded with url: /notifications/letter/dvla (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff95ffc460>: Failed to establish a new connection: [Errno 111] Connection refused'))
This only applies when running in Docker so the default should still
be "localhost" for the Flask app itself.
[^1]: 5093064533/app/celery/research_mode_tasks.py (L57)
It is currently 60 seconds but we have had two incidents in the
past week where there is a connection error talking to a service
and the request takes up to 60 seconds before failing. When this
happens, if there are a few of these callbacks then all of them
will completely hog the service callback worker and build up a big
queue of all the other service callbacks.
5 seconds has been chosen as that is still a pretty decent length
time for a simple web request that should just be giving them a
little bit of information for them to store. 5 seconds should be a
sufficient enough reduction that we dramatically reduce this problem
for the moment.
Open to this number
being changed in the future based on how we see it perform.
Since sept 2019 we've had to log on to production around once every
twenty days to restart the virus scan task for a letter. Most of the
time this is just a case of making sure the file is in the scan bucket,
and then triggering the task. If the file isn't in the scan bucket we'd
need to do some more manual investigation to find out exactly where the
file got stuck, but I can only remember times when it's been in the scan
bucket.
So if the file is in the scan bucket, we can just check that with code
and kick the task off automatically.
This follows the pattern for invite emails where the admin app tells the
API which domain to use when generating the link.
This will starting working once the admin change is merged:
- [ ] TBC
It won’t break anything if it’s merged before the admin change.
Daily volumes report: total volumes across the platform aggregated by whole business day (bst_date)
Volumes by service report: total volumes per service aggregated by the date range given.
NB: start and end dates are inclusive
if we have too many returned letters, we'll exceed SQS's max task size
of 256kb. Cap it to 5000 - this is probably a bit conservative but
follows the initial values we used when implementing this for the
collate-letters-task[^1]. Also follow the pattern of compressing the
sqs payload just to reduce it a little more.
[^1]: https://github.com/alphagov/notifications-api/pull/1536
We had an inbound number in the database with a value of ''. This
could happen if there are blank lines in the inbound numbers file
we use for the `insert-inbound-numbers` command. To avoid this
happening again, the command now calls `.strip()` on each line of the
file and only inserts a row if the result is truthy (i.e. not '').
- second class postage will go up by 2 pence, plus VAT
- international postage will go up by 7 pence, plus VAT
- first class postage will go down by 6 pence, plus VAT
The test was querying `FactNotificationStatus` and ordering the results
by bst_date and notification_type then checking the rows. However, the
bst_date and notification_type for each row is the same, so this test
could fail based on the order that the results came back in. By ordering
on the notification_status instead, we can be sure of the order of the
results.
This changes the scheduled task to raise an alert if letters are still
sending from 1530 to 1700. DVLA have reported that our "monitoring is
executing just before we actually mark them as ‘despatched’ and send
you the feedback files." and asked us to make the check a little later.
We don't actually contact DVLA until the morning after the alert anyway,
so this won't affect the process of getting in touch with them.
This change will require Cronitor to be updated for the new time.