This was introduced in #1811 as a way to avoid sending traffic to newly
created apps where gunicorn had not started yet, as is the case during
a scaling event. These days we depend mostly on scheduled scaling and we
rarely need to scale above the scheduled values.
Yesterday we had an event where (during a traffic spike) the healthcheck
failed, causing the instance to be killed and a 5XX response code to be
sent to all the connections that this instance was handling at the time.
However, this instance was not unhealthy and was serving traffic. The
problem stems from a combination of using async workers, having to limit
the number of database connections and a thread holding onto a db
connection for the entire duration of the request.
Specifically, we end up having requests queued up in gunicorn waiting
for other requests to finish and release the db connection. Some pages
such as the dashboard generate queries that can take >5s.
If a healthcheck request is sent during a traffic spike and the instance in
question was "unfortunate" enough to be handed a few of these long-running
queries, the healthcheck request will be queued up behind these
slow requests and will fail to receive a response within 1s [docs].
Ideally we should be able to configure the healthcheck timeout to a
value of our choosing, since we can end up in this situation again in
the future.
docs: https://docs.cloudfoundry.org/devguide/deploy-apps/healthchecks.html#types
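The queueing effect described above can be illustrated with a small simulation. This is only a sketch: the pool size of 2, the 5s query durations and the `start_times` helper are illustrative assumptions, not values or code from the app; only the 1s timeout comes from the linked docs.

```python
import heapq

HEALTHCHECK_TIMEOUT = 1.0  # seconds, per the linked docs


def start_times(durations, pool_size):
    """Return when each FIFO-queued request obtains a DB connection.

    durations: seconds each request holds its connection, in arrival order.
    All requests are assumed to arrive at t=0 (a simplification).
    """
    free_at = [0.0] * pool_size  # time at which each connection is next free
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        t = heapq.heappop(free_at)  # earliest available connection
        starts.append(t)
        heapq.heappush(free_at, t + duration)
    return starts


# Assumed scenario: two connections, both taken by slow dashboard queries,
# with the healthcheck request arriving last.
requests = [5.0, 5.0, 0.01]
healthcheck_wait = start_times(requests, pool_size=2)[-1]
healthcheck_fails = healthcheck_wait + requests[-1] > HEALTHCHECK_TIMEOUT
```

Under these assumptions the healthcheck only gets a connection once a slow request finishes, well past the timeout, even though the instance is serving traffic normally.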
To start with, this will be an attribute on the service: when the notification is created, it will look at Service.letter_class to decide what class to use for the letter.
This PR adds Service.letter_class as a nullable column.
It also updates the create_service and update_service methods to default the value to second.
Subsequent PRs will add the check constraint to ensure we only get first or second in the letter_class column and make that column non-nullable.
This can't be done all at once because it will cause an error if someone inserts or updates a service during the deploy.
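A rough illustration of why the change is staged. SQLite is used here only so the sketch is self-contained, and the table names `services_step1` and `services_strict` are hypothetical; the real migration is not shown in this PR description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staged approach: a nullable column, so an insert from code that does not
# know about letter_class yet still succeeds during the deploy.
conn.execute("CREATE TABLE services_step1 (name TEXT, letter_class TEXT)")
conn.execute("INSERT INTO services_step1 (name) VALUES ('written by old code')")

# All-at-once alternative: NOT NULL plus the check constraint from the start.
conn.execute(
    "CREATE TABLE services_strict ("
    " name TEXT,"
    " letter_class TEXT NOT NULL"
    " CHECK (letter_class IN ('first', 'second')))"
)
try:
    conn.execute("INSERT INTO services_strict (name) VALUES ('written by old code')")
    old_code_insert_failed = False
except sqlite3.IntegrityError:
    # The same insert is rejected because letter_class is missing.
    old_code_insert_failed = True
```

Once every code path writes a value, the stricter constraints can be added without rejecting in-flight inserts or updates.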
Sets updated_at to the current time when the NotificationHistory
is modified (which happens occasionally, for example to set the status
to returned letter).
There was a datetime bug in the query which resulted in files not being sent to the postal provider.
The trigger-letter-pdfs-for-day task is no longer needed, so rather than fix the query just call collate_letter_pdfs_for_day directly.
Less code is always better.
Deployment considerations: I realized this is not strictly backwards compatible if the scheduled job is in progress and a task that no longer exists is on the queue. This is ok since we will deploy this well before 17:50.
The concern about performance degrading has been thought through. We do not believe there will be an adverse effect since the high volume users do not send off messages.
Adds an API endpoint `/letters/returned` that accepts a list of
notification references and creates a task to update their status.
Adds a new task that uses the list of references to update the status
of notifications to 'returned-letter'.
The update is currently done using a single query and logs the
number of changed records (including notification history records).
This could potentially be done within the `/letters/returned` endpoint,
but creating a job right away allows us to extend this more easily
in the future (e.g. logging missing notifications or adding callbacks).
The job is using the database tasks queue.
For returned letter updates most notifications won't exist in the
notifications table, so in order to find out whether the reference
matches any known letters we need to check the count of updated
history records.
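A minimal sketch of counting updated rows in both tables. The table and column names (`notifications`, `notification_history`, `reference`, `status`) and the helper itself are assumptions for illustration, and it issues one UPDATE per table, which may differ from the single query the task actually uses.

```python
import sqlite3


def update_returned_letters(conn, references):
    """Mark matching rows as 'returned-letter'; return rows changed per table."""
    if not references:
        return 0, 0
    placeholders = ",".join("?" for _ in references)
    counts = []
    for table in ("notifications", "notification_history"):
        cursor = conn.execute(
            f"UPDATE {table} SET status = 'returned-letter' "
            f"WHERE reference IN ({placeholders})",
            list(references),
        )
        counts.append(cursor.rowcount)  # number of rows this UPDATE modified
    conn.commit()
    return tuple(counts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notifications (reference TEXT, status TEXT)")
conn.execute("CREATE TABLE notification_history (reference TEXT, status TEXT)")
# An older letter that now exists only in the history table.
conn.execute("INSERT INTO notification_history VALUES ('REF1', 'delivered')")

updated, updated_history = update_returned_letters(conn, ["REF1", "UNKNOWN"])
```

Here the notifications count is zero even though `REF1` is a known letter, which is why the history count is the one to check.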
We need to update letter notifications with a new status when DVLA
gives us a list of references for returned letters.
This adds the new status to the models and the DB.
DVLA call this 'returned mail', so I'm using it as the status name
since it seems less ambiguous than 'returned'.
Does two things:
1. Revert "Revert "Add unique constraint to email branding domain""
This reverts commit af9cb30ef3.
2. Don’t allow empty string in email branding domain
Columns with multiple `null`s can have a uniqueness constraint. Columns
with multiple empty string values are not considered unique.
This commit:
- removes any duplicate empty string values
- casts empty strings to null any time these columns are updated
---
Squashed into a single commit because these two things are not
atomic as individual commits.
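The null versus empty string behaviour under a uniqueness constraint can be checked with a short sketch. SQLite is used only to keep it self-contained; the `email_branding` table layout and the `empty_to_none` helper are assumptions, not the code from this commit.

```python
import sqlite3


def empty_to_none(value):
    """Cast an empty string to None so it is stored as NULL."""
    return None if value == "" else value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_branding (name TEXT, domain TEXT UNIQUE)")

# Empty domains cast to NULL: several rows can coexist under UNIQUE.
conn.execute("INSERT INTO email_branding VALUES ('a', ?)", (empty_to_none(""),))
conn.execute("INSERT INTO email_branding VALUES ('b', ?)", (empty_to_none(""),))

# Raw empty strings: the second one violates the constraint.
conn.execute("INSERT INTO email_branding VALUES ('c', '')")
try:
    conn.execute("INSERT INTO email_branding VALUES ('d', '')")
    duplicate_empty_rejected = False
except sqlite3.IntegrityError:
    duplicate_empty_rejected = True
```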