Instead of saving the email notification to the db, add it to a queue to save later.
This is an attempt to alleviate pressure on the db from the api requests.
This initial PR is to trial it and see if we get an improvement in api performance and a reduction in queue pool errors. If we are happy with this we could remove the hard coding of the service id.
In a nutshell:
- If POST /v2/notification/email is from our high volume service (hard coded for now), create the notification and send it to a queue so it can be persisted to the db later.
- Create a save_api_email task to persist the notification
- Return the notification
- New worker app to process the save_api_email tasks.
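Roughly the shape this takes (a sketch only; the service id value, the queue name and the `build_notification` / `persist_and_send` helpers are illustrative, not the real code):

```python
# Sketch only - the service id value, queue name and helpers are illustrative.
import uuid
from datetime import datetime

HIGH_VOLUME_SERVICE_ID = "11111111-1111-1111-1111-111111111111"  # hard coded for now


def post_email_notification(service, form):
    if str(service.id) == HIGH_VOLUME_SERVICE_ID:
        # Build the notification in memory rather than writing it to the db here.
        notification = build_notification(  # hypothetical helper
            service, form, notification_id=uuid.uuid4(), created_at=datetime.utcnow()
        )
        # Hand persistence off to the new worker app via the queue.
        save_api_email.apply_async(
            [notification.serialize()], queue="save-api-email-tasks"  # queue name assumed
        )
        return notification
    # Existing path: persist the notification synchronously as before.
    return persist_and_send(service, form)  # hypothetical helper
```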
We are formatting the postcode here, because if we did it in template
preview, that could break flows like API and admin one-off, since
we are not validating the postcode there yet, and format_postcode
needs a validated postcode.
We are not doing it in admin, as then we would have to either
rewrite the CSV file or pass data differently to the API. The first
would be nasty, the second would add a lot of overhead.
In the long run we might want to move postcode formatting to
template preview so that the postcode in the letter preview looks the same
before and after the user sends it, but for now, to get it out quickly,
it's better to format the postcode here in the task.
The function no longer makes sense now that we send both through at
the same time. It's mostly just used in old tests that we'll end up
rewriting shortly anyway.
Sometimes a job finishes but has missed a row in the middle. It is a mystery why this is happening; it could be that the task to save the notifications was dropped.
So until we solve the mystery, let's find the missing rows and process them.
A new scheduled task has been added to find any "finished" jobs that do not have enough notifications created. If there are missing notifications the job processes those rows for the job.
Adding the new task to beat schedule will be done in the next commit.
A unique key constraint has been added to Notifications to ensure that the row is not added twice. Any index or constraint can affect performance, but this unique constraint should not affect it enough for us to notice.
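A sketch of what the scheduled check could look like (the task name, model fields such as `notification_count` / `job_row_number`, and the `process_rows_for_job` helper are assumptions, not the real code):

```python
# Sketch only - model and field names are assumptions.
from app import db, notify_celery
from app.models import Job, Notification


@notify_celery.task(name="check-for-missing-rows-in-completed-jobs")
def check_for_missing_rows_in_completed_jobs():
    finished_jobs = Job.query.filter(Job.job_status == "finished").all()
    for job in finished_jobs:
        saved_rows = {
            row_number
            for (row_number,) in db.session.query(Notification.job_row_number).filter(
                Notification.job_id == job.id
            )
        }
        missing = [i for i in range(job.notification_count) if i not in saved_rows]
        if missing:
            # Only re-process the rows that never made it into the notifications
            # table; the new unique key stops a row from being saved twice.
            process_rows_for_job(job, missing)  # hypothetical helper
```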
- Change the NotificationTechnicalFailureException so that it only inherits from Exception.
- The notify_celery task should create the logging message on failure.
- Fix unit tests
- Remove named parameter when raising exception.
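Roughly the shape after the change (a sketch; the base task class and the log message are illustrative):

```python
# Sketch only - illustrative shape, not the exact code.
from celery import Task
from flask import current_app


class NotificationTechnicalFailureException(Exception):
    """Now only inherits from Exception; it no longer does any logging itself."""


class NotifyTask(Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # The celery task creates the logging message on failure, so callers just
        # raise NotificationTechnicalFailureException without a named parameter.
        current_app.logger.exception("Celery task %s failed", self.name)
        super().on_failure(exc, task_id, args, kwargs, einfo)
```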
Bumped utils to version 31.2.5, which changes when the rows of a
RecipientCSV get created. Switched to using `.get_rows()` from
RecipientCSV (a generator) instead of the `.rows` property (which builds
a list of the rows in memory).
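The change in usage looks something like this (a sketch; `process_row` is just a stand-in):

```python
# Sketch only - assumes `recipient_csv` is an existing notifications-utils RecipientCSV.
def process_recipient_csv(recipient_csv):
    # Before: recipient_csv.rows built the full list of rows in memory.
    # Now: get_rows() is a generator, so rows are created one at a time as we consume them.
    for row in recipient_csv.get_rows():
        process_row(row)  # hypothetical per-row handling
```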
We need to backfill the daily_sorted_count tables, so we need to iterate through all the files. There is no need to update the notification status, so this task has been separated out.
Bumped notifications-utils to 3.7.0. Version 3.7.0 includes the
`convert_utc_to_bst` and `convert_bst_to_utc` functions and the
`LETTER_PROCESSING_DEADLINE` constant, so these have been removed from
this repo and anywhere using these has now been updated to get these
from `notifications-utils`.
Also bumped pytest by a patch version to bring in a bug fix.
The `save_email` and `save_sms` jobs were updated previously to take an
optional `sender_id` and to use this if it was available. This commit
now gets the `sender_id` from the S3 metadata if it exists and passes it
through to the tasks which save the job notifications. This means SMS
and emails sent through jobs can use a specified `sender_id` instead of
the default.
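A sketch of the idea (the bucket/key layout, helper names and the exact task signature are assumptions):

```python
# Sketch only - bucket/key layout, helper names and task signature are assumptions.
import boto3


def get_job_metadata(bucket_name, job_key):
    s3 = boto3.client("s3")
    return s3.head_object(Bucket=bucket_name, Key=job_key).get("Metadata", {})


def process_job_rows(job, bucket_name, job_key):
    metadata = get_job_metadata(bucket_name, job_key)
    sender_id = metadata.get("sender_id")  # None if the job has no sender_id set
    for row in get_job_rows(job):  # hypothetical helper
        save_email.apply_async(
            [str(job.service_id), row], {"sender_id": sender_id}, queue="database-tasks"
        )
```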
Started passing `sender_id` to the `save_email`, `save_sms` and
`process_job` tasks, with a default value of `None`.
If `sender_id` is provided, the `save_email` and `save_sms` tasks will
use it to determine the reply-to email address or the SMS sender for the
notifications in the job. The `process_job` task will start using the
value in another commit.
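Simplified sketch of how `sender_id` is used (the dao calls and the signature here are illustrative, not the real task):

```python
# Sketch only - simplified signature and dao calls; the real tasks do more than this.
from app import notify_celery


@notify_celery.task(name="save-email")
def save_email(service_id, notification_data, sender_id=None):
    if sender_id:
        # Use the reply-to address (or SMS sender, in save_sms) the job asked for.
        reply_to = dao_get_reply_to_by_id(service_id, sender_id)  # hypothetical dao call
    else:
        # Fall back to the service's default.
        reply_to = dao_get_default_reply_to(service_id)  # hypothetical dao call
    persist_notification(service_id, notification_data, reply_to=reply_to)  # hypothetical
```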
Adds an API endpoint `/letters/returned` that accepts a list of
notification references and creates a task to update their status.
Adds a new task that uses the list of references to update the status
of notifications to 'returned-letter'.
The update is currently done using a single query and logs the
number of changed records (including notification history records).
This could potentially be done within the `/letters/returned` endpoint,
but creating a job right away allows us to extend this more easily
in the future (e.g. logging missing notifications or adding callbacks).
The job is using the database tasks queue.
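A minimal sketch of the endpoint (only the route path comes from the description above; the blueprint, task name and response are illustrative):

```python
# Sketch only - route path as described above; everything else is illustrative.
from flask import Blueprint, request

letter_blueprint = Blueprint("letters", __name__)


@letter_blueprint.route("/letters/returned", methods=["POST"])
def create_returned_letters():
    references = request.get_json()["references"]
    # Hand the work straight off to a task on the database tasks queue, so the
    # endpoint stays quick and the task can be extended later (logging missing
    # notifications, callbacks, etc).
    process_returned_letters_list.apply_async([references], queue="database-tasks")  # hypothetical task
    return "", 204
```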
For returned letter updates most notifications won't exist in the
notifications table, so in order to find out whether the reference
matches any known letters we need to check the count of updated
history records.
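A sketch of the update and the count check (model names and the exact query are assumptions):

```python
# Sketch only - model names and the exact query are assumptions.
from flask import current_app

from app.models import Notification, NotificationHistory


def update_returned_letters(references):
    updated = Notification.query.filter(
        Notification.reference.in_(references)
    ).update({"status": "returned-letter"}, synchronize_session=False)

    updated_history = NotificationHistory.query.filter(
        NotificationHistory.reference.in_(references)
    ).update({"status": "returned-letter"}, synchronize_session=False)

    # Most of these letters will already have been purged from the notifications
    # table, so the history count is what tells us how many references matched
    # letters we know about.
    current_app.logger.info(
        "Updated %s notifications and %s notification history rows to returned-letter",
        updated,
        updated_history,
    )
```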
We need to deal with this: it's ok when updating a notification status from delivered to delivered, but the DailySortedLetter counts are being doubled.
Adding the file_name to the table as a unique key gets around this issue. It will mean we have multiple rows for each billing_day, but that's ok, we can aggregate them.
This will also give us a way to see which file created which count.
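A sketch of the table shape with the new unique key (column names and types are assumptions):

```python
# Sketch only - column names and types are assumptions about the table shape.
from app import db


class DailySortedLetter(db.Model):
    __tablename__ = "daily_sorted_letter"
    __table_args__ = (
        # One row per response file rather than one per day; reports aggregate over
        # billing_day, and file_name shows which file produced which count.
        db.UniqueConstraint("billing_day", "file_name", name="uix_daily_sorted_letter_file_name"),
    )

    id = db.Column(db.Integer, primary_key=True)
    billing_day = db.Column(db.Date, nullable=False)
    file_name = db.Column(db.String, nullable=True)
    sorted_count = db.Column(db.Integer, nullable=False, default=0)
    unsorted_count = db.Column(db.Integer, nullable=False, default=0)
```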
We were previously expecting the letter response files to be in the
format of 'NOTIFY.<datetime>.RSP.TXT' but the response files we receive
use '-' in the filenames instead of '.', which was causing an error when
we tried to get the date from the filename.
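One tolerant way to pull the date out of either filename style (a sketch; the exact datetime format in the filename is an assumption):

```python
# Sketch only - the exact datetime format in the filename is an assumption.
from datetime import datetime


def get_datetime_from_response_filename(filename):
    # Accepts both 'NOTIFY.20180131131415.RSP.TXT' and 'NOTIFY-20180131131415-RSP.TXT'.
    datetime_string = filename.replace("-", ".").split(".")[1]
    return datetime.strptime(datetime_string, "%Y%m%d%H%M%S")
```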
In the 'update-letter-notifications-statuses' task we want to ensure
that temporary failures are always handled, regardless of whether the
response file we receive contains unknown Sorted statuses or not.
process_incomplete_jobs loops through jobs processing them in a single
task. This means that if the job statuses are all 'error', and then the
process_incomplete_jobs task fails, the later jobs in the list that
never got picked up won't have their status set back to in progress or
their processing_started time updated, so they will be stuck in 'error' forever.
Instead, we set the job statuses to in progress and the start time to
now before we process any - so if the incomplete_jobs task fails later,
the jobs will be picked up (again) by the check_job_statuses task in
half an hour's time.
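The ordering is the important part; a sketch with illustrative dao helpers:

```python
# Sketch only - dao helpers and status values are illustrative.
from datetime import datetime


def process_incomplete_jobs(job_ids):
    jobs = [dao_get_job_by_id(job_id) for job_id in job_ids]  # hypothetical dao call

    # Mark every job as in progress (and stamp processing_started) up front, so
    # that if this task dies part-way through, check_job_status will still pick
    # up the jobs we never reached.
    for job in jobs:
        job.job_status = "in progress"
        job.processing_started = datetime.utcnow()
        dao_update_job(job)  # hypothetical dao call

    for job in jobs:
        process_incomplete_job(job.id)  # hypothetical per-job re-processing
```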
The process_incomplete_jobs task runs through all incomplete jobs in
a loop, so it might not get a chance to update the processing_started
time of the last job before check_job_status runs again (every minute).
So before we even trigger the process_incomplete_jobs task, let's set
the status of the jobs to error, so that we don't identify them for
re-processing again.
We might stop processing a job mid-way through if, for example, a
deploy or downscale kills the box working on it. We have a scheduled
task that identifies any job that we started processing more than half
an hour ago that is still processing.
However, we encountered a bug where we triggered the
process_incomplete_job task multiple times, because the processing_started
of the job was still set to half an hour ago. If we reset the
processing_started to the current time, then it won't get picked up by
future runs of the check_job_status scheduled task.
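Roughly what the scheduled check looks for (a sketch; fields and the exact filter are assumptions):

```python
# Sketch only - the model fields and exact filter are assumptions.
from datetime import datetime, timedelta

from app.models import Job


def find_jobs_to_resume():
    thirty_minutes_ago = datetime.utcnow() - timedelta(minutes=30)
    # Jobs we started processing over half an hour ago and never finished.
    # Setting the status to error before triggering process_incomplete_jobs, and
    # resetting processing_started to now when we re-process, keeps the same job
    # from being picked up by this check twice.
    return Job.query.filter(
        Job.job_status == "in progress",
        Job.processing_started < thirty_minutes_ago,
    ).all()
```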
In the update_letter_notifications_statuses task we now check whether
each row in the response file that we receive from the DVLA has the
value of 'Sorted' or 'Unsorted' in the postcode validation field. We
then calculate the number of Sorted and Unsorted rows for each day and
save each day as a row in the daily_sorted_letter table.
The data in daily_sorted_letter table should be in local time, so we
convert the datetime before saving.
This will continue to update the notification history for letter notifications.
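A sketch of the counting and saving (field names and the shape of a response row are assumptions; the dao call and time-conversion helper are stand-ins referenced without imports):

```python
# Sketch only - row fields, dao call and time-conversion helper are assumptions.
from collections import defaultdict


def count_sorted_letters(response_file_rows):
    counts = defaultdict(lambda: {"Sorted": 0, "Unsorted": 0})
    for row in response_file_rows:
        # Only count rows with a recognised postcode validation value.
        if row.postcode_validation in ("Sorted", "Unsorted"):
            counts[row.date][row.postcode_validation] += 1
    return counts


def persist_daily_sorted_letter_counts(counts):
    for day, day_counts in counts.items():
        upsert_daily_sorted_letter(  # hypothetical dao call
            billing_day=convert_utc_to_local(day),  # data is stored in local time
            sorted_count=day_counts["Sorted"],
            unsorted_count=day_counts["Unsorted"],
        )
```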
We currently have an issue where the responses to letters from the provider are taking a long time.
This is due to the manual nature of their process.
Updating the status of the letter will still work if the notification has been purged.
Also turned back on the purge letter notification scheduled task.
The Notify team needs to investigate when a notification is marked as failed.
We will process the whole file and mark the notifications with the appropriate status; if any have failed, an exception is raised.
The exception will trigger a CloudWatch error for the team to investigate.
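A sketch of the "process everything, then raise" shape (status values and helpers are illustrative):

```python
# Sketch only - status values and helper names are illustrative.
def update_statuses_from_response_file(rows):
    failed_references = []
    for row in rows:
        status = status_for(row)  # hypothetical mapping from the provider's value
        update_notification_status(row.reference, status)  # hypothetical dao call
        if status == "technical-failure":
            failed_references.append(row.reference)

    if failed_references:
        # Raise only after the whole file has been processed, so no rows are
        # skipped; the error then surfaces in CloudWatch for the team to look at.
        raise NotificationTechnicalFailureException(
            "Letter references marked as failed: {}".format(failed_references)
        )
```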