Commit Graph

332 Commits

Author SHA1 Message Date
Rebecca Law
ac8fc49539 Merge pull request #2770 from alphagov/increase-qry-limit
Increase default query limit to 50,000 from 10,000
2020-03-31 17:15:12 +01:00
Rebecca Law
1444150d48 Add team key to the notifications to delete/insert into notifications/notification_history 2020-03-31 09:23:41 +01:00
Rebecca Law
5d0830d7f7 Rename function for clarity 2020-03-27 15:48:54 +00:00
Rebecca Law
8da73510c2 Delete all notification_status types.
This will ensure the whole day has been deleted. The stats table could get the wrong updates if there is partial data for a day.
2020-03-27 12:14:30 +00:00
Rebecca Law
e5b6fa2143 Reduce log messages. 2020-03-25 08:08:33 +00:00
Rebecca Law
31f83bb9c4 Increase default query limit to 50,000 from 10,000 2020-03-24 17:04:38 +00:00
Rebecca Law
69102db4cb Remove the hourly loop and replace it with a while loop, which check if we still have things to delete for the day.
We are already limiting the query by a query limit.
2020-03-24 14:54:06 +00:00
Rebecca Law
3ca73a35ed Change as per code review
- fix test name
 - make query filter consistent with each other
 - add comment for clarity
 - add inner loop to continue to insert and delete notifications while the delete count is greater than 0
2020-03-24 14:17:47 +00:00
Rebecca Law
f04210ba7d Refactor delete_notifications_older_than_retention_by_type to use a new strategy.
The insert_notification_history_delete_notifications function uses a temp table to store the data to insert and delete. This will save extra queries while performing the insert and delete operations.
The function is written in such a way that if the task is stop while processing when it's started up again it will just pick up where it left off.

I've made a decision to delete all test data in one query, I don't anticipate a problem with that.

The performance of this might also be better than last nights test because we are inserting everything we need for the NotificationHistory insert, so we don't need the join to Notifications to perform the insert.
2020-03-24 12:21:28 +00:00
Rebecca Law
80845e1abb Transaction to insert history and delete notifications.
Need to test the performance of this function, then we can call it from the task.
- Create a temporary table to insert ids of the desired rows, limit by 10K (might be too low).
- Insert into NotifcationHistory select from notification where id in temp table
- Delete from Notifications where id in temp table
- drop temp table

We should be able to iterate of this. The query stats for the query to create the temp table are very good, 17ms.
2020-03-23 16:33:06 +00:00
Leo Hemsted
8ea09be1ff order the insert/update by created_at
you should always order by when doing a limit/offset, to guarantee that
the second time you run that query, the order hasn't changed and you
aren't just repeating the task with an overlap of notifications.
Luckily, in this case we haven't lost any data, because:

* we have an on conflict do update statement so when we returned
  duplicate rows it would just do an update
* when we delete, we cross-reference with the notification history so if
  a row always got missed, we won't delete it.

This resulted in, for example, govuk email still having a handful of
notifications in the table from 9th despite the task running succesfully
every day until the 18th of march.

order by created_at ascending so that we start with oldest notifications
first, in case it's interrupted part way through.
2020-03-20 19:11:24 +00:00
Leo Hemsted
dc5b56ff78 Change sql to chunk by hour to remove old notifications
insert/update, and then delete notifications in hourly batches. This
means that if the task gets interrupted part-way through, we'll have at
least something to show for it. Previously we would insert and update
into the history table but might not delete from the notification table
properly.

Keeping the offsets and limits for confidence around reliability and
queries timing out.

Keeping the join to notification_history to ensure we don't delete
anything prematurely while our DB is in a bit of a weird state with lots
of these tasks failing over the last week.
2020-03-20 19:07:08 +00:00
David McDonald
148a5ab456 Refactor dates being passed around
I believe this way is nicer to read, we don't have to change between
datetimes and strings and back.
2020-02-21 15:01:19 +00:00
David McDonald
6226d9e122 Don't send test letters to dvla to print 2020-02-21 15:01:19 +00:00
David McDonald
dc9bf757a8 Change which letters we want to be sent to look at all days
Previously, when running the `collate_letter_pdfs_for_day` task, we
would only send letters that were created between 5:30pm yesterday and
5:30 today.

Now we send letters that were created before 5:30pm today and that are
still waiting to be sent. This will help us automatically attempt to
send letters that may have fallen through the gaps and not been sent the
previous day when they should have been.

Previously we solved the problem of letters that had fallen the gap by
having to run the task with a date parameter for example
`collate_letter_pdfs_for_day('2020-02-18'). We no longer need this date
parameter as we will always look back across previous days too for
letters that still need sending.

Note, we have to change from using the pagination `list_objects_v2` to
instead getting each individual notification from s3. We reduce load by
using `HEAD` rather than `GET` but this will still greatly increase the
number of API calls. We acknowledge there will be a small cost to this,
say 50p for 5000 letters and think this is tolerable. Boto3 also handles
retries itself so if when making one of the many HEAD requests, there is
a networking blip then it should be retried automatically for us.
2020-02-21 15:01:19 +00:00
David McDonald
3dcac18849 Use correct exception for boto3
We use boto3 for our interaction with s3. Therefore if an expection is
thrown it will be thrown from the botocore library (which boto3 is built
on top of).

I have copied
app/aws/s3.py::file_exists for an example of this exception catching.
2020-02-12 15:28:46 +00:00
Rebecca Law
8445775be0 Remove unused methods.
A new endpoint to return the last date a template was used which means the old endpoint can be removed.
2020-02-07 15:50:54 +00:00
Rebecca Law
dec42b06cc Simplify the code in the query.
The date in the notifications table should always be the most recent date for the template.
Removed the template_type param for the query as well.
Simplified the tests.
2020-02-05 16:43:17 +00:00
Rebecca Law
3a32c35dd2 Added a new endpoint to return the last used date for a template.
The existing endpoint returned a whole notification for the last time the template was used. But this only takes into account data in the last week. This new methods allows us to be specific about when the template was last used if ever but looking into the ft_notification_status table as well.
2020-02-05 13:03:54 +00:00
Chris Hill-Scott
c573209d7e Stop guessing notification type
Before the search term was either:
- an email address (or partial email address)
- a phone number (or partial phone number)

Now it can also be:
- a reference (or partial reference)

We can take a pretty good guess, by looking at the search term, whether
the thing the user is searching by email address or phone number. This
helps us:
- only show relevant notifications
- normalise the search term to give the best chance of matching what we
  store in the `normalised_to` field

However we can’t look at a search term and guess whether it’s a
reference, because a reference could take any format. Therefore if the
user hasn’t told us what kind of thing their search term is, we should
stop trying to guess.
2019-12-16 13:43:38 +00:00
Chris Hill-Scott
8cb6907828 Allow searching by reference as well as recipient
We have a team who want to find emails that might have been sent to an
incorrect address. Therefore they can’t search by the correct address,
because it won’t match.

What they do have is the reference number of the user’s application,
which is also stored in the `client_reference` field on the
notification.

So when a user is searching we should also look at the client reference,
as well as the recipient, allowing the user to enter either in the
search box.
2019-12-16 11:02:07 +00:00
Leo Hemsted
e29546cb65 flake8 2019-11-28 13:29:02 +00:00
Leo Hemsted
6f38cbbcf1 randomly choose from providers based on priority
todo: make sure if they don't add up to 100 we do something sensible,
especially if they're both 0.
2019-11-28 13:29:01 +00:00
Rebecca Law
ac4f0e8027 After a comment from @idavidmcdonald, I asked myself why are not creating the task to upload the pdf and update the notification.
The assumption was that S3 would throw an exception if the object was uploaded twice. That's not the case the default behaviour is that if a file already exists it will be overwritten. So it is completely safe to run the task from the alert.

It can also mean that we don't need to wait 4hours 15 minutes. Shall I decease the amount of time before restarting the task?
2019-11-19 16:04:21 +00:00
Rebecca Law
c42420c329 Add an alert when a letter is created but doesn't have a file in S3 for sending. We can tell this is the case because there is no updated_at and billable units are still 0.
At this point we are just creating a zendesk ticket - perhaps we can just call the create_letter_pdf task.
2019-11-13 16:39:59 +00:00
Katie Smith
fb80a9c92e Update insert_update_notification_history to take a query limit
The nightly job to delete email notifications was failing because it was
timing out (`psycopg2.errors.QueryCanceled: canceling statement due to statement timeout`).

This adds a query limit to the query which inserts or updates
notification history so that it only updates a maximum of 10000 rows at
a time.
2019-10-14 16:51:46 +01:00
Rebecca Law
2c41d6130c Merge pull request #2617 from alphagov/notification-count-for-job
Return count of notifications in the database for a job
2019-10-03 16:54:30 +01:00
Rebecca Law
7fc7d99dac Update the new endpoint to return a 404 if the job or service id are not found.
All our endpoint should perform a check that the params are valid - this is an easy whay to check that and is standard for our endpoints.
I reverted the query to just filter by job id.
2019-10-03 14:58:49 +01:00
Pea Tyczynska
c48aa77dd5 Use service_id in the query to make it safer, also use named parameters 2019-09-25 16:32:27 +01:00
Rebecca Law
f234e94572 Change the variable name to make a little more sense. 2019-09-25 13:56:10 +01:00
Rebecca Law
702d8fa85f Refactor the code that figures out what folder and filename to use for the letter pdf files.
Now we consistently use the created_at date, so we can always get the right file location and name.

The previous updates to this code were trying to solve the problem if a pdf being created at 17:29, but not ready to upload until 17:31 after the antivirus and validation check.
But in those cases we would have trouble finding the file.
2019-09-25 13:56:10 +01:00
Pea Tyczynska
8cf8d24e37 Return count of notifications in the database for a job
When we cancel a job, we need to check if all notifications are
already in the database. So far, we were querying for all
notification objects in the database and counting them in
admin app, which runs into pagination problems for large jobs,
and could time out for very large jobs.
2019-09-24 16:56:03 +01:00
Rebecca Law
f097abe82b Change the query to get the notifications for the check_templated_letter_state.
Now looking at the updated_at date, we are getting the alert if the notification was created_at:17:29 updated to created status at 17:30, so the letter is in the next days bucket.

Not sure if I want to make this change, there isn't an index on updated_at, so the query might be slow.
2019-08-16 10:37:51 +01:00
Katie Smith
cec87a9de0 Delete unused code
* The `_should_record_notification_in_history_table` function stopped being
used in this commit: c23ae15f32
* `NOTIFICATIONS_ALERT` stopped being used in this commit: 5aa37f09b6
2019-07-12 16:43:37 +01:00
Katie Smith
c518f6ca76 Add scheduled task to find old letters which still have 'created' status
Added a scheduled task to run once a day and check if there were any
letters from before 17.30 that still have a status of 'created'. This
logs an exception instead of trying to fix the error because the fix
will be different depending on which bucket the letter is in.
2019-06-18 10:58:58 +01:00
Katie Smith
a2f324ad7e Add scheduled task to find precompiled letters in wrong state
Added a task which runs twice a day on weekdays and checks for letters that have
been in the state of `pending-virus-check` for over 90 minutes. This is
just logging an exception for now, not trying to fix things, since we
will need to manually check where the issue was.
2019-06-18 10:58:58 +01:00
Leo Hemsted
2ac2cbbd37 don't pass running totals in to functions
or you can easily end up double-counting things. (the test written
previously returned 6)
2019-06-03 17:47:42 +01:00
Rebecca Law
cfd42a2eb9 Update subquery to be more efficient.
Update subquery to run again but for test keys. Test data is never inserted in Notifications so they need to be deleted separately now given the join to NotificationHistory.
2019-06-03 15:16:46 +01:00
Rebecca Law
b8399b8b9b Add a where clause to join to NotificationHistory, this is some extra assurance that the Notification will not be deleted unless the history exists. 2019-06-03 11:47:02 +01:00
Rebecca Law
c23ae15f32 Remove insert to NotificationHistory
Fix all test failures
2019-05-31 16:52:22 +01:00
Rebecca Law
4154251970 Addd missing reference to the update statement. 2019-05-30 10:54:47 +01:00
Rebecca Law
3374e03ce9 Prepare to stop inserting NotificationHistory at the time of inserting a notificaiton.
Need to remove foreign key to complaints.
Make sure if getting Notification.id we look to both tables.
2019-05-21 16:08:18 +01:00
Rebecca Law
198fd21f7e Update Notification history if there is a mismatch in the number of notifications to be updated and the number actually updated. 2019-05-15 15:30:15 +01:00
Rebecca Law
43334d63f3 Stop updating NotificationHistory
Doing my bit to remove imports of fixtures.
2019-05-15 10:58:39 +01:00
Rebecca Law
d5d2b3d2a6 Update insert to use select_from - this allows the insert query to run as a single bulk insert and should be more efficient. 2019-05-02 13:46:15 +01:00
Rebecca Law
c9265aab68 Don't do anything if the query doesn't yield results. 2019-05-01 15:07:59 +01:00
Rebecca Law
0def0b7fd0 We want to staop inserting and updating NotificationHistory each time we insert/update Notification.
This PR adds a function to upsert (insert or update if exists) NotificationHistory all the rows from Notification that we are about to delete in the nightly task. This will happen just before the delete function. Since it is a upsert query the function can be called more than once.
This should allow us remove all the insert/updates to NotificationHistory.

However, there is a consern that this will double the length of time the tasks take. So do we do these upserts in a separate task or in the same one?
2019-05-01 14:26:11 +01:00
Rebecca Law
a53340b4d7 Update the query that gets the number of notifications that have been sent under 10 seconds to use Notifications rather than NotificationHistory.
Also removed a test that is not useful
2019-04-10 10:06:27 +01:00
Rebecca Law
e9607f227d Remove query that's no longer needed. 2019-03-29 15:38:48 +00:00
Leo Hemsted
38f0ea6cca remove functions to not talk about 7 days
remind us that data retention is flexible
2019-02-26 17:57:35 +00:00