SQS fails if the message body is over 256 KB. Normally our messages are
quite small, but if we're using the new save-api-email task with an
email that has a large body, we can exceed that limit. If so, handle
the exception and fall back to the existing code path (saving to the
database and sending a deliver-email task, which only carries a
notification id).
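A minimal sketch of that fallback, assuming the oversize message surfaces as a `ClientError` from the underlying SQS client, and with hypothetical helper names (`persist_notification`, and task objects `save_api_email` / `deliver_email`):

```python
from botocore.exceptions import ClientError

def queue_email_notification(notification_data):
    try:
        # Fast path: the task payload carries the whole email, body included.
        save_api_email.apply_async([notification_data], queue="save-api-email")
    except ClientError:
        # The body pushed the message over SQS's 256 KB limit; fall back to
        # persisting now and sending a task that only carries the id.
        notification = persist_notification(notification_data)
        deliver_email.apply_async([str(notification.id)], queue="send-email")
```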
It is possible a service has a data retention period that is shorter than the time it takes to get a delivery receipt.
This PR refactors process_ses_receipt to update NotificationHistory if the Notification has already been purged.
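Roughly the shape of the refactor, as a sketch; the lookup-by-reference helpers and the exact column update are assumptions:

```python
def process_ses_receipt(reference, new_status):
    # Try the live table first.
    notification = Notification.query.filter_by(reference=reference).one_or_none()
    if notification is None:
        # The service's retention period was shorter than the time it took
        # to get the receipt, so the row has been purged to the history table.
        notification = NotificationHistory.query.filter_by(reference=reference).one_or_none()
    if notification is not None:
        notification.status = new_status
        db.session.commit()
```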
So that we keep a record of who first uploaded a list, it's better to
archive a list than to delete it completely.
The list in the database doesn't contain any recipient info, so this
isn't a change to what data we're retaining.
This means updating the endpoints that get contact lists to exclude ones
that are archived.
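For instance, the dao query behind those endpoints might gain a filter like this (model and column names assumed):

```python
def dao_get_contact_lists_for_service(service_id):
    # Archived lists are kept for the audit trail but hidden from the API.
    return ContactList.query.filter_by(
        service_id=service_id,
        archived=False,
    ).order_by(ContactList.created_at.desc()).all()
```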
Once a contact list is gone from the database there's no way to
reference it again. Any jobs that used it have made their own copy.
So we can clean it up, meaning we’re not storing personal data longer
than we need to.
This was one of the things we de-scoped when we first shipped this feature.
In order to safely delete a list, we first need to make sure no jobs
are still referencing it.
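A sketch of that guard, with the `Job.contact_list_id` relationship assumed:

```python
def can_remove_contact_list(contact_list_id):
    # Only clean the list up once nothing points at it any more; jobs keep
    # their own copy of the recipients, so they don't need it after that.
    return Job.query.filter_by(contact_list_id=contact_list_id).count() == 0
```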
Instead of saving the email notification to the db, add it to a queue to be saved later.
This is an attempt to alleviate pressure on the db from the api requests.
This initial PR is to trial it and see if we get an improvement in api performance and a reduction in queue pool errors. If we are happy with this we could remove the hard-coding of the service id.
In a nutshell (see the sketch after this list):
- If POST /v2/notification/email is from our high volume service (hard coded for now), create a notification and send it to a queue to persist it to the db.
- create a save_api_email task to persist the notification
- return the notification
- New worker app to process the save_api_email tasks.
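Sketched out, with the service id, helper names, and queue names all stand-ins:

```python
HIGH_VOLUME_SERVICE_ID = "11111111-1111-1111-1111-111111111111"  # placeholder

def post_email_notification(service, form):
    notification = build_notification(service, form)
    if str(service.id) == HIGH_VOLUME_SERVICE_ID:
        # Skip the db write on the request path; the new worker app picks up
        # save_api_email tasks and persists the notification later.
        save_api_email.apply_async(
            [notification_to_dict(notification)], queue="save-api-email"
        )
    else:
        persist_notification(notification)
        deliver_email.apply_async([str(notification.id)], queue="send-email")
    return notification
```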
- fix test name
- make query filters consistent with each other
- add comment for clarity
- add inner loop to continue inserting and deleting notifications while the delete count is greater than 0 (sketched below)
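The inner loop from the last bullet, sketched; the dao function name is from this PR, but its signature here is assumed:

```python
def move_notifications_for_service(service_id, older_than):
    # Keep moving batches until a pass deletes nothing; if the task is
    # killed part-way through, re-running resumes with the remaining rows.
    deleted = 1
    while deleted > 0:
        deleted = insert_notification_history_delete_notifications(
            service_id=service_id, older_than=older_than
        )
```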
The insert_notification_history_delete_notifications function uses a temp table to store the data to insert and delete. This saves extra queries while performing the insert and delete operations.
The function is written in such a way that if the task is stopped while processing, it will just pick up where it left off when it's started up again.
I've made a decision to delete all test data in one query; I don't anticipate a problem with that.
The performance of this might also be better than last night's test, because we insert everything we need for the NotificationHistory insert, so we don't need the join to Notifications to perform the insert.
We need to test the performance of this function, then we can call it from the task.
- Create a temporary table and insert the ids of the desired rows, limited to 10K (might be too low).
- Insert into NotificationHistory select from Notifications where id is in the temp table.
- Delete from Notifications where id is in the temp table.
- Drop the temp table.
We should be able to iterate on this. The query stats for the query that creates the temp table are very good: 17ms.
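Put together, the function could look something like this raw-SQL sketch (Postgres temp table; it assumes the two tables share a column layout, and the real code's conflict handling may differ):

```python
from sqlalchemy import text

def insert_notification_history_delete_notifications(service_id, older_than):
    # 1. Stage the ids of up to 10K rows to move.
    db.session.execute(
        text("""
            CREATE TEMP TABLE notifications_to_move AS
            SELECT id FROM notifications
            WHERE service_id = :service_id AND created_at < :older_than
            ORDER BY created_at ASC
            LIMIT 10000
        """),
        {"service_id": service_id, "older_than": older_than},
    )
    # 2. Copy those rows into the history table; on conflict do nothing so
    #    a re-run after an interruption is safe.
    db.session.execute(text("""
        INSERT INTO notification_history
        SELECT * FROM notifications
        WHERE id IN (SELECT id FROM notifications_to_move)
        ON CONFLICT (id) DO NOTHING
    """))
    # 3. Delete the originals, then drop the temp table.
    result = db.session.execute(text(
        "DELETE FROM notifications WHERE id IN (SELECT id FROM notifications_to_move)"
    ))
    db.session.execute(text("DROP TABLE notifications_to_move"))
    db.session.commit()
    return result.rowcount
```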
You should always order by when doing a limit/offset, to guarantee that
the second time you run that query the order hasn't changed and you
aren't just repeating the task with an overlap of notifications.
Luckily, in this case we haven't lost any data, because:
* we have an on conflict do update statement, so when we returned
duplicate rows it would just do an update
* when we delete, we cross-reference with the notification history, so if
a row ever got missed, we won't delete it.
This resulted in, for example, govuk email still having a handful of
notifications in the table from the 9th, despite the task running
successfully every day until the 18th of March.
order by created_at ascending so that we start with the oldest
notifications first, in case the task is interrupted part way through.
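Illustrative shape of the corrected query (names assumed):

```python
from sqlalchemy import text

# Deterministic ordering means repeated runs page through the same sequence,
# and starting from the oldest rows limits the damage of an interruption.
stmt = text("""
    SELECT id FROM notifications
    WHERE service_id = :service_id AND created_at < :older_than
    ORDER BY created_at ASC
    LIMIT :limit OFFSET :offset
""")
```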
insert/update, and then delete notifications in hourly batches. This
means that if the task gets interrupted part-way through, we'll have at
least something to show for it. Previously we would insert and update
into the history table but might not delete from the notification table
properly.
Keeping the offsets and limits for confidence around reliability and to
guard against queries timing out.
Keeping the join to notification_history to ensure we don't delete
anything prematurely while our DB is in a bit of a weird state with lots
of these tasks failing over the last week.
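The batching could be driven by a loop like this (helper names hypothetical):

```python
from datetime import timedelta

def move_notifications_in_hourly_batches(service_id, start, end):
    # Each hour's insert/update completes before its delete runs, so an
    # interrupted task still leaves behind complete, consistent batches.
    batch_start = start
    while batch_start < end:
        batch_end = min(batch_start + timedelta(hours=1), end)
        insert_update_notification_history(service_id, batch_start, batch_end)
        delete_notifications(service_id, batch_start, batch_end)
        batch_start = batch_end
```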
This runs on the new `sms-callbacks` queue. The function
`process_sms_client_response` has been replaced with a task called
`process_sms_client_response`. This involved some reorganisation of the
existing code and tests.
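The reorganisation amounts to something like this (decorator and app names per a typical Celery setup here; the helper in the body is assumed):

```python
@notify_celery.task(name="process-sms-client-response")
def process_sms_client_response(status, reference, client_name):
    # Same logic as the old synchronous function, now run off the request path.
    update_notification_status_from_client_response(status, reference, client_name)
```

Callers then enqueue it with `process_sms_client_response.apply_async(args, queue="sms-callbacks")` rather than calling the function inline.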
This should add a maximum of 400 extra connections, which will not tip us over the allowable 5,000 db connections. And it may help with the queue pool connection errors.