Commit Graph

7089 Commits

Author SHA1 Message Date
Rebecca Law
ac8fc49539 Merge pull request #2770 from alphagov/increase-qry-limit
Increase default query limit to 50,000 from 10,000
2020-03-31 17:15:12 +01:00
Rebecca Law
a7a158fe69 Merge pull request #2784 from alphagov/add-team-key-to-query
Add team key to query
2020-03-31 17:13:42 +01:00
Rebecca Law
ee45a2096c Merge pull request #2785 from alphagov/index-optimisation-1
Optimise indexes
2020-03-31 16:00:03 +01:00
Katie Smith
113ab4805f Merge pull request #2775 from alphagov/task-retries
Add retries to process_sanitised_letter task
2020-03-31 15:39:31 +01:00
Katie Smith
6d7bf5eef8 Merge pull request #2783 from alphagov/bump-utils
Bump utils to 36.9.3 to bring in new version of Bleach
2020-03-31 15:39:14 +01:00
Rebecca Law
5f8ea5d1f5 Remove duplicates 2020-03-31 15:17:27 +01:00
Pea M. Tyczynska
401c8e41d6 Merge pull request #2779 from alphagov/change-error-code
Change error code for duplicate reply-to address to 409 meaning conflict
2020-03-31 14:48:56 +01:00
Rebecca Law
7489cda6f0 Optimise indexes
Dropping unnecessary indexes on Notification and Notification_History
Create new composite indexes that should give queries better performance
Drop and create new indexes.

We are not running this migration on production because we want to run each one at a time.
Run this on staging, then start running some load tests
2020-03-31 14:12:55 +01:00
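
For illustration, a migration along these lines might look like the sketch below; the index names and column choices here are assumptions, not the actual migration in this PR.

```python
# Illustrative only: index and column names are guesses at the kind of
# change described, not the real Alembic migration.
from alembic import op


def upgrade():
    # Drop a narrower index that the new composite index makes redundant.
    op.drop_index('ix_notifications_service_id', table_name='notifications')

    # Composite indexes matching the common filter pattern
    # (service_id, notification_type, created_at), so the planner can use a
    # single index scan instead of combining several narrower indexes.
    op.create_index(
        'ix_notifications_service_type_created_at',
        'notifications',
        ['service_id', 'notification_type', 'created_at'],
    )
    op.create_index(
        'ix_notification_history_service_type_created_at',
        'notification_history',
        ['service_id', 'notification_type', 'created_at'],
    )


def downgrade():
    op.drop_index('ix_notification_history_service_type_created_at', table_name='notification_history')
    op.drop_index('ix_notifications_service_type_created_at', table_name='notifications')
    op.create_index('ix_notifications_service_id', 'notifications', ['service_id'])
```
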
David McDonald
2a528f03b8 Merge pull request #2781 from alphagov/fix-slow-org-user-query
fix the sql
2020-03-31 09:32:10 +01:00
Rebecca Law
1444150d48 Add team key to the notifications to delete/insert into notifications/notification_history 2020-03-31 09:23:41 +01:00
Katie Smith
7218e5887e Bump utils to 36.9.3 to bring in new version of Bleach 2020-03-31 09:13:24 +01:00
Leo Hemsted
8faeefc6cb Merge pull request #2780 from alphagov/fix-purge-task
fix purge functional test data task
2020-03-30 18:02:53 +01:00
Leo Hemsted
53b80b931f fix the sql
"if you want something done right, then do it yourself"

The ORM was building a weird and inefficient query to get all the users
for a given organisation_id:

old query:

```sql
SELECT users.*
FROM users
WHERE (
    EXISTS (
        SELECT 1
        FROM user_to_organisation, organisation
        WHERE users.id = user_to_organisation.user_id AND
        organisation.id = user_to_organisation.organisation_id AND
        organisation.id = '63b9557c-22ea-42ac-bcba-edaa50e3ae51'
    )
) AND
users.state = 'active'
ORDER BY users.created_at
```

Note the cross join on user_to_organisation and organisation. There are a
lot of users in organisations on preview, as one is added every functional
test run. Even though they're all filtered out and the query returns the
correct data, it still does the nested loop under the hood.

new query:

```sql
SELECT users.*
FROM users
JOIN user_to_organisation AS user_to_organisation_1 ON users.id = user_to_organisation_1.user_id
JOIN organisation ON organisation.id = user_to_organisation_1.organisation_id
WHERE organisation.id = '63b9557c-22ea-42ac-bcba-edaa50e3ae51' AND users.state = 'active' ORDER BY users.created_at;
```

Much more straightforward.
2020-03-30 17:47:15 +01:00
Leo Hemsted
885f3122bd fix purge functional test data task
* it doesn't delete service email reply-tos, letter contacts, or
  contact lists. I don't think the contact lists will ever be an issue,
  but it doesn't hurt to add them to the list of things to remove.
* it doesn't remove users from organisations before deleting the users

there may be more tables that link to Service that should be deleted,
but for now just add these ones that I could spot.
2020-03-30 17:45:29 +01:00
Katie Smith
8b8e736e49 Add retries to process_sanitised_letter task
This task didn't have retries before, based on the assumption that if
the task failed it was likely to be due to a Boto error, so retrying
would not help because a file was probably not in the expected bucket.
During an incident with the database, we had some letters that were
stuck in the `pending-virus-check` state because this task failed.

This change adds retries to the task for any Exception that is not a
`BotoClientError`.
2020-03-30 17:28:01 +01:00
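
A minimal sketch of that retry rule, assuming a Celery task and using botocore's ClientError as a stand-in for BotoClientError; the task options and helper below are assumptions, not the real Notify code.

```python
# Sketch only: retry on anything except a Boto error, which won't fix itself.
from botocore.exceptions import ClientError  # stand-in for BotoClientError
from celery import Celery

celery_app = Celery('notifications')


def _process_letter(filename):
    """Placeholder for the real sanitised-letter processing."""


@celery_app.task(bind=True, max_retries=5, default_retry_delay=300)
def process_sanitised_letter(self, filename):
    try:
        _process_letter(filename)
    except ClientError:
        # A file missing from the expected bucket won't appear on retry.
        raise
    except Exception as exc:
        # Anything else (e.g. a transient database problem) is retried so
        # letters don't get stuck in pending-virus-check.
        raise self.retry(exc=exc)
```
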
Pea Tyczynska
0250bee1a0 Change error code for duplicate reply-to address to 409 meaning conflict 2020-03-30 17:16:55 +01:00
David McDonald
c1ba3fb1be Merge pull request #2777 from alphagov/add-runbook-link
Add runbook link to resolve letters pending virus scan
2020-03-30 13:50:30 +01:00
David McDonald
87cb02fa1d Add runbook link to resolve letters pending virus scan 2020-03-30 12:07:05 +01:00
Rebecca Law
eaf38dd6a7 Merge pull request #2776 from alphagov/save-to-queue-gracefully-abort
if message too big to put on high volume queue then save to queue
2020-03-28 10:06:42 +00:00
Leo Hemsted
0b3d711652 if message too big to put on high volume queue then save to queue
SQS fails if the message body is over 256KB. Normally our messages are
quite small, but if we're using the new save-api-email task with an
email that has a large body, we can go over that limit. If so, handle
the exception and fall back to the existing code path (saving to the
database and sending a deliver-email task, which only has a notification
id).
2020-03-28 09:55:31 +00:00
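
A rough sketch of that fall-back, with placeholder helpers; the real code enqueues through Celery rather than calling SQS directly, but the shape is the same.

```python
# Sketch only: if the payload won't fit on SQS, save to the db and send a
# deliver-email task that carries just the notification id.
import json

from botocore.exceptions import ClientError

SQS_MAX_MESSAGE_BYTES = 256 * 1024


def save_notification_to_db(payload):
    """Placeholder: persist the notification and return its id."""


def send_deliver_email_task(notification_id):
    """Placeholder: queue the existing deliver-email task (id only)."""


def enqueue_or_persist(payload, sqs_client, queue_url):
    body = json.dumps(payload)
    try:
        if len(body.encode('utf-8')) > SQS_MAX_MESSAGE_BYTES:
            raise ValueError('message body too large for SQS')
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    except (ValueError, ClientError):
        # Fall back to the existing code path.
        notification_id = save_notification_to_db(payload)
        send_deliver_email_task(notification_id)
```
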
Rebecca Law
c81388d004 Merge pull request #2774 from alphagov/fix-report-and-delete-tasks
Fix report and delete tasks
2020-03-27 16:30:16 +00:00
Rebecca Law
5d0830d7f7 Rename function for clarity 2020-03-27 15:48:54 +00:00
Chris Hill-Scott
4aed012d92 Merge pull request #2759 from alphagov/delete-contact-list
Add an endpoint to delete a contact list
2020-03-27 14:50:51 +00:00
Rebecca Law
46b72a6bbb Refactor process_ses_receipt
It is possible a service has data retention that is shorter than the time it takes to get a delivery receipt.
This PR refactors process_ses_receipt to update NotificationHistory if the Notification has already been purged.
2020-03-27 14:12:39 +00:00
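
The rough shape of the refactor as a sketch; the `dao` argument here is a stand-in for the real Notify lookup functions.

```python
# Sketch only: fall back to notification_history when the notification row
# has already been moved out of the notifications table.
from sqlalchemy.orm.exc import NoResultFound


def process_ses_receipt(reference, new_status, dao):
    try:
        # Normal case: the notification is still within its retention period.
        notification = dao.get_notification_by_reference(reference)
        notification.status = new_status
    except NoResultFound:
        # The service's retention is shorter than the time it took to get the
        # delivery receipt, so the row has already moved to history.
        history = dao.get_notification_history_by_reference(reference)
        history.status = new_status
    dao.commit()
```
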
Chris Hill-Scott
14605764bd Add comment explaining test data 2020-03-27 13:41:32 +00:00
Rebecca Law
8da73510c2 Delete all notification_status types.
This will ensure the whole day has been deleted. The stats table could get the wrong updates if there is partial data for a day.
2020-03-27 12:14:30 +00:00
David McDonald
814fb6043c Merge pull request #2773 from alphagov/move-high-volume-key-to-creds
Move high volume config
2020-03-27 11:08:04 +00:00
Rebecca Law
1d7c3466b0 Add cred to manifest 2020-03-27 10:08:30 +00:00
Chris Hill-Scott
5fe0fafadf Archive, don’t delete contact lists
So that we keep a record of who first uploaded a list, it's better to
archive a list than to delete it completely.

The list in the database doesn’t contain any recipient info so this
isn’t a change to what data we’re retaining.

This means updating the endpoints that get contact lists to exclude ones
that are archived.
2020-03-27 09:51:54 +00:00
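
A minimal sketch of archiving instead of deleting, assuming a boolean `archived` column; the model and column names are guesses rather than the real Notify schema.

```python
# Sketch only: model and column names are assumptions.
from sqlalchemy import Boolean, Column, DateTime, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ContactList(Base):
    __tablename__ = 'service_contact_list'
    id = Column(String, primary_key=True)
    created_by_id = Column(String)  # the record of who uploaded the list
    archived = Column(Boolean, default=False, nullable=False)
    created_at = Column(DateTime)


def dao_get_contact_lists(session: Session):
    # Endpoints that fetch contact lists now exclude archived ones.
    return session.query(ContactList).filter(ContactList.archived.is_(False)).all()


def dao_archive_contact_list(session: Session, contact_list: ContactList):
    # The row holds no recipient data, so keeping it isn't a retention change.
    contact_list.archived = True
    session.commit()
```
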
Rebecca Law
dc44cb29d1 To make the deployment and testing a little easier, move the high volume service ids to the credentials repo.
This way we only add the ids when we are ready and all the infrastructure for the new service has been applied.
2020-03-27 08:02:51 +00:00
Chris Hill-Scott
4a6143aeb1 Remove the list from S3 once we don’t need it
Once a contact list is gone from the database there's no way to
reference it again. Any jobs will have made their own copy.

So we can clean it up, meaning we’re not storing personal data longer
than we need to.
2020-03-26 17:42:38 +00:00
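
A sketch of the S3 clean-up; the bucket name and key layout below are assumptions, not the real ones.

```python
# Sketch only: bucket name and key format are assumptions.
import boto3


def remove_contact_list_from_s3(service_id, contact_list_id, bucket_name):
    s3 = boto3.client('s3')
    # Jobs have taken their own copy of the rows, so nothing reads this
    # object once the database row is gone.
    s3.delete_object(
        Bucket=bucket_name,
        Key=f'service-{service_id}/{contact_list_id}.csv',
    )
```
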
Chris Hill-Scott
b50dbab8fd Add an endpoint to delete a contact list
This was one of the things we de-scoped when we first shipped this feature.

In order to safely delete a list, we first need to make sure any jobs
aren’t referencing it.
2020-03-26 17:42:38 +00:00
Rebecca Law
e4df43e412 Merge pull request #2771 from alphagov/take-pressure-off-api-db
Reduce the pressure on the db for API post email requests.
2020-03-26 12:03:30 +00:00
Rebecca Law
a801230fc4 Added a test for whether the notification exists or not 2020-03-26 11:18:59 +00:00
Rebecca Law
d04bb1caca Remove argument from retry 2020-03-26 08:44:28 +00:00
Rebecca Law
d0a2e0f3ce Move high volume service id to config 2020-03-25 15:25:45 +00:00
Rebecca Law
db4b4d929d - If the task runs twice and the notification already exists ignore the primary key constraint.
- Remove prints
- Add some more tests
- Only allow the new method to run for emails
2020-03-25 12:39:15 +00:00
Rebecca Law
e5b6fa2143 Reduce log messages. 2020-03-25 08:08:33 +00:00
Rebecca Law
a13bcc6697 Reduce the pressure on the db for API post email requests.
Instead of saving the email notification to the db, add it to a queue to be saved later.
This is an attempt to alleviate pressure on the db from the api requests.
This initial PR is to trial it and see if we see an improvement in the api performance and a reduction in queue pool errors. If we are happy with this we could remove the hard-coding of the service id.

In a nutshell:
 - If POST /v2/notification/email is from our high volume service (hard-coded for now), create the notification and send it to a queue so it can be persisted to the db later.
 - create a save_api_email task to persist the notification
 - return the notification
 - New worker app to process the save_api_email tasks.
2020-03-25 07:59:05 +00:00
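
In code terms, a very rough sketch of that flow; anything beyond what the commit message says (function names, queue name, config shape) is an assumption.

```python
# Sketch only: the request handler hands the whole notification to a queue
# instead of writing to the database, and a new worker app persists it.
import uuid
from datetime import datetime

from celery import Celery

celery_app = Celery('notifications')

HIGH_VOLUME_SERVICE_IDS = []  # hard-coded / credentials-driven for the trial


def post_email_notification(service_id, request_json):
    notification = build_notification(service_id, request_json)
    if str(service_id) in HIGH_VOLUME_SERVICE_IDS:
        # Skip the db write in the request path; a dedicated worker app
        # consumes save-api-email tasks and persists the notification.
        save_api_email.apply_async((notification,), queue='save-api-email-tasks')
    else:
        persist_notification(notification)  # existing path
    return notification


@celery_app.task(name='save-api-email')
def save_api_email(notification):
    persist_notification(notification)


def build_notification(service_id, request_json):
    """Placeholder: build the serialisable notification payload."""
    return {
        'id': str(uuid.uuid4()),
        'service_id': str(service_id),
        'created_at': datetime.utcnow().isoformat(),
        'to': request_json.get('email_address'),
    }


def persist_notification(notification):
    """Placeholder for the existing save-to-database code."""
```
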
Rebecca Law
31f83bb9c4 Increase default query limit to 50,000 from 10,000 2020-03-24 17:04:38 +00:00
Rebecca Law
544537791d Merge pull request #2768 from alphagov/optimise-delete-notifications-task
Refactor delete_notifications_older_than_retention_by_type
2020-03-24 15:04:48 +00:00
Rebecca Law
69102db4cb Remove the hourly loop and replace it with a while loop, which checks if we still have things to delete for the day.
We are already limiting the query by a query limit.
2020-03-24 14:54:06 +00:00
Rebecca Law
3ca73a35ed Change as per code review
- fix test name
 - make the query filters consistent with each other
 - add comment for clarity
 - add inner loop to continue to insert and delete notifications while the delete count is greater than 0
2020-03-24 14:17:47 +00:00
Rebecca Law
f04210ba7d Refactor delete_notifications_older_than_retention_by_type to use a new strategy.
The insert_notification_history_delete_notifications function uses a temp table to store the data to insert and delete. This will save extra queries while performing the insert and delete operations.
The function is written in such a way that if the task is stopped while processing, it will just pick up where it left off when it's started up again.

I've made a decision to delete all test data in one query; I don't anticipate a problem with that.

The performance of this might also be better than last night's test because we are inserting everything we need for the NotificationHistory insert, so we don't need the join to Notifications to perform the insert.
2020-03-24 12:21:28 +00:00
Rebecca Law
b243257906 Merge pull request #2767 from alphagov/optimise-delete-high-volume
Transaction to insert history and delete notifications.
2020-03-23 16:50:45 +00:00
Rebecca Law
80845e1abb Transaction to insert history and delete notifications.
Need to test the performance of this function, then we can call it from the task.
- Create a temporary table to insert ids of the desired rows, limit by 10K (might be too low).
- Insert into NotificationHistory select from notification where id in temp table
- Delete from Notifications where id in temp table
- drop temp table

We should be able to iterate on this. The query stats for the query to create the temp table are very good: 17ms.
2020-03-23 16:33:06 +00:00
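
A sketch of that strategy against a SQLAlchemy engine; the temp table name, abbreviated column list and conflict handling are assumptions, and the real dao function may phrase the SQL differently.

```python
# Sketch only: one transaction that stashes ids in a temp table, copies the
# rows into notification_history, then deletes them from notifications.
from sqlalchemy import text


def insert_history_delete_notifications(engine, notification_type, older_than, query_limit=10_000):
    with engine.begin() as conn:
        # 1. Stash the ids of the rows to move (the cheap ~17ms query).
        conn.execute(
            text("""
                CREATE TEMPORARY TABLE notifications_to_move ON COMMIT DROP AS
                SELECT id FROM notifications
                WHERE notification_type = :notification_type
                  AND created_at < :older_than
                ORDER BY created_at
                LIMIT :query_limit
            """),
            {'notification_type': notification_type,
             'older_than': older_than,
             'query_limit': query_limit},
        )
        # 2. Copy them into notification_history (the real column list is longer).
        conn.execute(text("""
            INSERT INTO notification_history (id, service_id, notification_type, notification_status, created_at)
            SELECT id, service_id, notification_type, notification_status, created_at
            FROM notifications
            WHERE id IN (SELECT id FROM notifications_to_move)
            ON CONFLICT (id) DO NOTHING
        """))
        # 3. Delete the originals; the temp table drops itself at commit.
        result = conn.execute(text(
            'DELETE FROM notifications WHERE id IN (SELECT id FROM notifications_to_move)'
        ))
        return result.rowcount
```
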
Rebecca Law
3a95fba9b0 Merge pull request #2766 from alphagov/delete-stmt-order-by-and-hour-separate
Delete stmt order by and hour separate
2020-03-20 19:25:00 +00:00
Leo Hemsted
8ea09be1ff order the insert/update by created_at
you should always use an ORDER BY when doing a limit/offset, to guarantee
that the second time you run that query the order hasn't changed and you
aren't just repeating the task with an overlap of notifications.
Luckily, in this case we haven't lost any data, because:

* we have an on conflict do update statement so when we returned
  duplicate rows it would just do an update
* when we delete, we cross-reference with the notification history so if
  a row got missed, we won't delete it.

This resulted in, for example, govuk email still having a handful of
notifications in the table from the 9th despite the task running successfully
every day until the 18th of March.

order by created_at ascending so that we start with oldest notifications
first, in case it's interrupted part way through.
2020-03-20 19:11:24 +00:00
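
The gist of the fix as a parameterised query; the column names are assumed.

```python
# Sketch only: the batch select now has a stable order, oldest rows first.
from sqlalchemy import text

select_batch = text("""
    SELECT id
    FROM notifications
    WHERE notification_type = :notification_type
      AND created_at < :older_than
    ORDER BY created_at       -- stable order across runs, oldest rows first
    LIMIT :limit OFFSET :offset
""")
```
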
Leo Hemsted
dc5b56ff78 Change sql to chunk by hour to remove old notifications
insert/update, and then delete notifications in hourly batches. This
means that if the task gets interrupted part-way through, we'll have at
least something to show for it. Previously we would insert and update
into the history table but might not delete from the notification table
properly.

Keeping the offsets and limits for confidence around reliability and to
avoid queries timing out.

Keeping the join to notification_history to ensure we don't delete
anything prematurely while our DB is in a bit of a weird state with lots
of these tasks failing over the last week.
2020-03-20 19:07:08 +00:00
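
A sketch of the hourly batching; the two callables stand in for the real insert/update and delete queries.

```python
# Sketch only: process one hour at a time so an interrupted run still leaves
# each completed hour fully moved into history and deleted.
from datetime import timedelta


def move_notifications_in_hourly_batches(start_of_day, end_of_day, insert_batch, delete_batch):
    window_start = start_of_day
    while window_start < end_of_day:
        window_end = window_start + timedelta(hours=1)
        insert_batch(window_start, window_end)   # insert/update history first
        delete_batch(window_start, window_end)   # then delete from notifications
        window_start = window_end
```
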
David McDonald
3a61a84ca3 Merge pull request #2753 from alphagov/increase-pool-size
Increase the queue pool size to 30.
2020-03-19 15:51:15 +00:00