Commit Graph

675 Commits

Author SHA1 Message Date
David McDonald
b28fffc330 Refactor raise_alert_if_letter_notifications_still_sending
Move finding of letter logic into a separate method so it can be unit
tested rather than having to test it by checking the contents of a
zendesk ticket api call.

This will enable us to change the zendesk api ticket call message
without needing to edit lots of tests.
2020-04-02 09:36:56 +01:00
Katie Smith
8b8e736e49 Add retries to process_sanitised_letter task
This task didn't have retries before, based on the assumption that if
the task failed it was likely to be due to a Boto error, so retrying
would not help because a file was probably not in the expected bucket.
During an incident with the database, we had some letters that were
stuck in the `pending-virus-check` state because this task failed.

This change adds retries to the task if there was an Exception that was
not a `BotoClientError`.
2020-03-30 17:28:01 +01:00
David McDonald
c1ba3fb1be Merge pull request #2777 from alphagov/add-runbook-link
Add runbook link to resolve letters pending virus scan
2020-03-30 13:50:30 +01:00
David McDonald
87cb02fa1d Add runbook link to resolve letters pending virus scan 2020-03-30 12:07:05 +01:00
Rebecca Law
46b72a6bbb Refactor process_ses_receipt
It is possible a service has data rention that is smaller than the time it takes to get a delivery receipt.
This PR refactors process_ses_receipt to update NotificationHistory if the Notifcation has already been purged.
2020-03-27 14:12:39 +00:00
Rebecca Law
db4b4d929d - If the task runs twice and the notification already exists ignore the primary key constraint.
- Remove prints
- Add some more tests
- Only allow the new method to run for emails
2020-03-25 12:39:15 +00:00
Rebecca Law
a13bcc6697 Reduce the pressure on the db for API post email requests.
Instead of saving the email notification to the db add it to a queue to save later.
This is an attempt to alleviate pressure on the db from the api requests.
This initial PR is to trial it see if we see improvement in the api performance an a reduction in queue pool errors. If we are happy with this we could remove the hard coding of the service id.

In a nutshell:
 - If POST /v2/notification/email is from our high volume service (hard coded for now) then create a notification to send to a queue to persist the notification to the db.
 - create a save_api_email task to persist the notification
 - return the notification
 - New worker app to process the save_api_email tasks.
2020-03-25 07:59:05 +00:00
Pea M. Tyczynska
900664c172 Merge pull request #2748 from alphagov/format-postcode-for-csvs
Format postcode for CSV letter rows
2020-03-19 11:07:08 +00:00
Rebecca Law
7459a4f6f6 Add a try/except around the code to get the files.
The idea is to log the exception but keep going. That way the "good" files still get sent and we can investigate why a file failed.
2020-03-19 09:15:38 +00:00
Rebecca Law
51f43563d3 Fix serialisation error when creating the create_letters_pdf in resend_created_notifications_older_than 2020-03-17 08:48:02 +00:00
Pea Tyczynska
1fb040dc61 Format postcode for CSV letter rows
We are formatting the postcode here, because if we did it in template
preview, that could break flows like API and admin one-off, since
we are not validating postcode there yet, and format_postcode
needs a nice validated postcode.

We are not doing it in admin, as then we would have to either
rewrite the CSV file or pass data differently to API. First would
be nasty, second is a lot of overhead.

In the long run we might want to move postcode formatting to
template preview so that the postcode in letter preview looks the same
before and after user sends it, but now to get it out quickly it's better
to format the postcode here in the task.
2020-03-13 17:35:36 +00:00
David McDonald
42f02c8c24 Merge pull request #2717 from alphagov/collate-pdf
Look at all previous days when sending letters
2020-02-24 10:16:18 +00:00
David McDonald
2c41b21ddf Remove unnecessary code and unrelevant comment 2020-02-21 16:42:37 +00:00
David McDonald
2cd05dea89 Make test DRY 2020-02-21 16:27:15 +00:00
David McDonald
6f663268e1 remove unneeded mock 2020-02-21 16:19:47 +00:00
David McDonald
e6767590d4 Change function and task name to be more accurate
Will require us to change a cronitor set up
2020-02-21 15:01:19 +00:00
David McDonald
148a5ab456 Refactor dates being passed around
I believe this way is nicer to read, we don't have to change between
datetimes and strings and back.
2020-02-21 15:01:19 +00:00
David McDonald
6226d9e122 Don't send test letters to dvla to print 2020-02-21 15:01:19 +00:00
David McDonald
7578e01b3b Test cases explicitely
In previous tests we check that we can deal with files that end in `pdf`
in various forms of upper and lowercase. I've moved the testing of this
behaviour into it's own test so that's explicit and not just implied
that we care about behaviour on the casing of filenames.

Note however that s3 is case sensitive and we upload all our files in
upper case so technically we'd never expect to see a file ending in
`pdf`. I've updated some of our test data to reflect this but didn't
bother doing it everywhere. I've left the test in anyway but it could be
argued that is is redundant as we don't ever expect to see that case
anyway.
2020-02-21 15:01:19 +00:00
David McDonald
5c5eb8a96a Remove unneeded check that notification is in created state
We instead rely on the fact that only files being passed into this
function we already know are in the created state
2020-02-21 15:01:19 +00:00
David McDonald
dc9bf757a8 Change which letters we want to be sent to look at all days
Previously, when running the `collate_letter_pdfs_for_day` task, we
would only send letters that were created between 5:30pm yesterday and
5:30 today.

Now we send letters that were created before 5:30pm today and that are
still waiting to be sent. This will help us automatically attempt to
send letters that may have fallen through the gaps and not been sent the
previous day when they should have been.

Previously we solved the problem of letters that had fallen the gap by
having to run the task with a date parameter for example
`collate_letter_pdfs_for_day('2020-02-18'). We no longer need this date
parameter as we will always look back across previous days too for
letters that still need sending.

Note, we have to change from using the pagination `list_objects_v2` to
instead getting each individual notification from s3. We reduce load by
using `HEAD` rather than `GET` but this will still greatly increase the
number of API calls. We acknowledge there will be a small cost to this,
say 50p for 5000 letters and think this is tolerable. Boto3 also handles
retries itself so if when making one of the many HEAD requests, there is
a networking blip then it should be retried automatically for us.
2020-02-21 15:01:19 +00:00
Leo Hemsted
0f6f2f1b91 split up _query_for_billing_data into three separate queries
the queries all return lots of columns, but each query has columns it
doesn't care about. eg emails don't have billable units or international
flag, letters don't have international flag, sms don't have a page count
etc. additionally, the query was grouping on things that never change,
like service id and notification type.

by making all of these literals (as in `select 1 as foo`) we see times
that are over 50% quicker for gov.uk email service.

Note: One of the tests changed because previously it involved emails and
sms with statuses that they could never be (eg returned-letter)
2020-02-19 13:12:01 +00:00
David McDonald
a14d5f0225 Remove task that no longer runs
We no longer puts files in these s3 buckets (and have in fact deleted
the buckets) therefore this task is redundant and can be removed.
2020-02-06 10:57:43 +00:00
Katie Smith
35e39bcfa8 Save recipient address in process_sanitised_letter task
If the letter passed sanitisation, the recipient address will be
returned from template preview, so we want to save this as the `to`
field of the notification.
2020-01-24 13:52:12 +00:00
Katie Smith
adf9906a96 Change process_sanitised_letter to take a single encrypted arg
Template preview will now send an encrypted dict containing all the args
to the `process_sanitised_letter` task, so this updates the task to
handle data in the new format.
2020-01-24 13:18:37 +00:00
David McDonald
3a0aece6a1 Up threshold for sms to telephone numbers
We were just ignoring the errors and our users were not fixing things.

Given that 500 texts cost approx £8 it's not the end of the world.

In the long run we may decide to just stop letting people try and send
messages to TV numbers but this is a quick fix to stop emails coming in
which we ignore.
2020-01-17 13:26:20 +00:00
Rebecca Law
5ebd9a473c Add the recipient address in the "to" field for precompiled letters. 2020-01-07 14:35:48 +00:00
Katie Smith
897ed66348 Merge pull request #2675 from alphagov/new-letter-tasks
Add new tasks for processing letters which have passed virus scan
2019-12-17 16:16:29 +00:00
David McDonald
61f1469a20 Improve developer experience for zendesk ticket
We don't need to log this as an exception. It's not an exception, it's
behaviour that is not ideal but is still expected so therefore I've
changed it to warn. Also it removes the email we get for the exception
which is not needed as we get the zendesk ticket instead.

I've also fixed the multiline string meaning the link to the runbook is
included in the zendesk ticket.
2019-12-17 11:47:23 +00:00
Katie Smith
cc2191c19f Add new tasks for sanitising precompiled letters
Added a task, `sanitise-letter`, that will be called from antivirus when
a letter has passed virus scan. This calls a new task in
template-preview which will sanitise the PDF.

A second new task, `process-sanitised-letter`, will be called from the
template preview task and deals with updating the notification and
moving it to the relevant bucket.
2019-12-16 11:55:09 +00:00
Rebecca Law
555e660a13 Merge pull request #2676 from alphagov/add-returned-letters-table
Add returned letters table
2019-12-13 14:13:28 +00:00
Pea Tyczynska
c00f82b81b Co-Authored-By: Chris Hill-Scott <me@quis.cc>
Use .format instead of concatenation to avoid type issues

Trying to concatenate uuid onto a string was throwing an error.

Also it is not possible to use uuid in parametrize statements
it seems as it messes up with running tests on multiple threads
2019-12-11 11:18:42 +00:00
Rebecca Law
c8368d908b Update process_returned_letters task to insert or update the returned_letter table. 2019-12-09 16:23:09 +00:00
Leo Hemsted
6ac4595224 process letters for 10 days when updating ft_notification_status
sms and emails have a very predictable 72 hour lifecycle. letters, on
the other hand, have ridiculously complex lifecycles - they might not
get sent because it's a weekend, they might not get sent because they're
second class and are only processed on alternate days, they might not
get sent because a different letter in the same batch had an error that
we didn't know about. Either way, it's apparent that four days is
definitely not enough time to guarantee that letters have gone from
sending to delivered.

Extend the amount of days we process for letters to 10 days. Keep emails
and sms down at 4 to keep run-times shorter

We're deliberately not thinking about returned letters here at all.
2019-12-09 16:02:43 +00:00
Leo Hemsted
884cb24bfa remove day_start from create nightly notification status
it makes less sense once we introduce different start dates for letters
and emails. Also, we never use it, since we just call the day tasks
ourselves from commands.py
2019-12-09 16:02:21 +00:00
Pea M. Tyczynska
2019070536 Merge pull request #2667 from alphagov/warn-team-about-high-failure-rates
Warn team about high failure rates
2019-12-09 11:28:25 +00:00
Pea Tyczynska
87bc86efa7 Reference dev runbook for instructions in the zendesk ticket 2019-12-06 17:05:43 +00:00
Pea Tyczynska
1b7b26bf24 Query directly for services with high failure rate 2019-12-06 16:57:56 +00:00
Pea Tyczynska
b8de67ae54 Update error message to include a url to offending service 2019-12-06 16:57:54 +00:00
Pea Tyczynska
339b6c0ec7 Refactor a test so it doesn't do query that's tested elsewhere 2019-12-06 16:57:54 +00:00
Pea Tyczynska
cfbb080f57 Simplify failure rate by building separate query 2019-12-06 16:57:44 +00:00
Pea Tyczynska
53efd87e28 Check for services sending sms messages to tv numbers 2019-12-06 16:57:34 +00:00
Pea Tyczynska
d72ab4f4a6 Send zendesk ticket when services found with high failure rates 2019-12-06 16:57:04 +00:00
Leo Hemsted
0448bca542 make create_nightly_notification_status_for_day take notification_type
the nightly task won't be affected, it'll just trigger three times more
sub-tasks.

this doesn't need to be a two-part deploy because we only trigger this
overnight, so as long as the deploy completes in daytime we don't need
to worry about celery task signatures
2019-12-05 14:43:33 +00:00
Leo Hemsted
f7fbd6de5b make 500s change priorities quicker
it's not acceptable for a constantly failing provider to take 50 minutes
to drain (5x reducing priority by 10). But similarly, we need _some_
delay, or a handful of concurrent failures will completely turn off a
provider, rendering the whole excercise kinda pointless. Setting the
delay before it tries to reduce priority again to one minute is nice
because it means that if one request times out and returns 502, then any
other requests that are in flight at that time will time out before the
one minute is up and not switch, but any requests made after the switch
that take sixty seconds to time out will affect it.
2019-11-28 13:29:39 +00:00
Leo Hemsted
cfe82f8f4a make 500 error provider switches also check for recent changes
moving the logic and the test from switch provider on slow delivery to
dao reduce sms provider priority
2019-11-28 13:29:39 +00:00
Leo Hemsted
2a392e7137 update switch provider scheduled task
it now looks at both providers and works out whether to deprioritise
one, rather than binary switching from one to the other. If anything
has altered the priorities in the last ten minutes it won't take any
action. If both providers are slow it also won't take any action.
2019-11-28 13:29:38 +00:00
Leo Hemsted
3c63ccb159 move from dao_toggle_sms_provider to dao_reduce_sms_provider_priority 2019-11-28 13:29:02 +00:00
Leo Hemsted
e29546cb65 flake8 2019-11-28 13:29:02 +00:00
Leo Hemsted
28da190a1c remove get_current_provider
the function no longer makes sense now that we send through both at
the same time. mostly just used in old tests that we'll end up rewriting
shortly anyway
2019-11-28 13:29:02 +00:00