The assumption was that S3 would throw an exception if the object was uploaded twice. That's not the case: the default behaviour is that if a file already exists it will be overwritten. So it is completely safe to run the task from the alert.
It can also mean that we don't need to wait 4 hours 15 minutes. Shall I decrease the amount of time before restarting the task?
When we upload a CSV for a job, we add the sender_id as metadata to the file that is uploaded on S3.
There is more than one place where we process rows from that CSV.
- process_job
- scheduled_job
- check_for_missing_rows_in_completed_jobs
- check_job_status
All of these places need to use the sender_id, and now the sender_id is always read from the file metadata.
In a subsequent PR we can remove the optional sender_id parameter from the process_job task.
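A minimal sketch of the metadata round trip, assuming boto3; the bucket, key, and function names here are hypothetical, not the real Notify code:

```python
import boto3

s3 = boto3.client("s3")

def upload_job_csv(bucket, key, csv_bytes, sender_id):
    # S3 user-defined metadata travels as "x-amz-meta-*" headers;
    # boto3 exposes it as a plain dict of strings.
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_bytes,
        Metadata={"sender_id": sender_id},
    )

def get_job_sender_id(bucket, key):
    # head_object returns the metadata without downloading the body.
    return s3.head_object(Bucket=bucket, Key=key)["Metadata"].get("sender_id")
```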
Sometimes a job finishes but has missed a row in the middle. It is a mystery why this is happening; it could be that the task to save the notifications was dropped.
So until we solve the mystery, let's find missing rows and process them.
A new scheduled task has been added to find any "finished" jobs that do not have enough notifications created. If there are missing notifications, the task processes those rows for the job.
Adding the new task to beat schedule will be done in the next commit.
A unique key constraint has been added to Notifications to ensure that the row is not added twice. Any index or constraint can affect performance, but this unique constraint should not affect it enough for us to notice.
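As a hypothetical Alembic sketch (the real constraint name and columns may differ):

```python
from alembic import op

def upgrade():
    # One notification per (job, row): inserting the same row twice
    # now fails instead of silently duplicating.
    op.create_unique_constraint(
        "uq_notifications_job_id_job_row_number",
        "notifications",
        ["job_id", "job_row_number"],
    )

def downgrade():
    op.drop_constraint(
        "uq_notifications_job_id_job_row_number",
        "notifications",
        type_="unique",
    )
```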
We want to pass the `request_id` to Celery tasks if the task is called
from an HTTP request, so that we can add the `request_id` to the logs.
This change overrides `apply_async` to add the `request_id` to the
kwargs if available. When the task runs, we then add the `request_id`
to `g` on Flask's application context.
Tasks called from `send_task` won't have a `request_id` for now, and
this change only affects tasks called from HTTP requests (not from other
tasks or from Celery Beat).
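A minimal sketch of the idea, assuming the worker runs each task inside a Flask application context; the class name and header used here are illustrative:

```python
from celery import Task
from flask import g, has_request_context, request

class RequestIdTask(Task):
    def apply_async(self, args=None, kwargs=None, **options):
        kwargs = dict(kwargs or {})
        # Only HTTP requests have a request context, so tasks called
        # from other tasks or from Celery Beat are left alone.
        if has_request_context():
            kwargs.setdefault("request_id", request.headers.get("X-Request-Id"))
        return super().apply_async(args, kwargs, **options)

    def __call__(self, *args, **kwargs):
        # Pop the request_id off the kwargs and store it on g so the
        # log formatter can include it.
        g.request_id = kwargs.pop("request_id", None)
        return super().__call__(*args, **kwargs)
```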
This has been moved to the letters utils file since it will be used in
more than one place. The notification parameter has been removed so that
the function can be used when we don't have a notification id.
Previously, we didn't create templated letters, and just marked them as
delivered straight away. However, we may need to return PDFs for these
letters, so we should create them in the same way as live letters, then
update the functions so that they know where to look for these letters.
The nightly tasks need to run after the create nightly notification
status task - so that test notifications are still there to record
stats for, and to stop the risk of deleting notifications part-way
through recording stats for them.
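For illustration, the ordering in the beat schedule might look like this (task names and times are hypothetical):

```python
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    # Record nightly stats first, while test notifications still exist...
    "create-nightly-notification-status": {
        "task": "create-nightly-notification-status",
        "schedule": crontab(hour=0, minute=30),
    },
    # ...and only delete notifications well after that has finished.
    "delete-notifications": {
        "task": "delete-notifications",
        "schedule": crontab(hour=4, minute=0),
    },
}
```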
- Change the NotificationTechnicalFailureException so that it only inherits from Exception.
- The notify_celery task should create the logging message on failure.
- Fix unit tests
- Remove named parameter when raising exception.
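A sketch of moving the logging into the shared task base class; the class name and message are assumptions:

```python
import logging

from celery import Task

logger = logging.getLogger(__name__)

class NotifyTask(Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # Log the failure once here, instead of building the message
        # inside NotificationTechnicalFailureException itself.
        logger.error("Celery task %s failed", self.name, exc_info=exc)
        super().on_failure(exc, task_id, args, kwargs, einfo)
```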
If SES raised an `InvalidParameterValue` error (because an email address
was wrong) we were logging an exception and setting the email status to
`technical-failure`. We now set it to `permanent-failure` instead and
change the log level to `info` - setting it to `permanent-failure` means
that people will know not to retry the message.
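A sketch of the new handling, assuming a boto3 SES client; send_email_via_ses and update_notification_status are hypothetical helpers standing in for the real code:

```python
import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

def deliver_email(notification):
    try:
        send_email_via_ses(notification)  # hypothetical helper
    except ClientError as e:
        if e.response["Error"]["Code"] == "InvalidParameterValue":
            # The address is malformed, so retrying can never succeed:
            # mark it permanent-failure and log at info, not exception.
            update_notification_status(notification, "permanent-failure")
            logger.info("Email %s rejected by SES: invalid address", notification.id)
        else:
            raise
```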
If the `deliver_sms` catches an exception when trying to send an SMS, we
want the first retry to happen immediately (because we will have
switched providers), then every retry after that to happen at the
standard intervals.
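A minimal sketch of that retry policy, assuming Celery's bound tasks; the helper and interval values are illustrative:

```python
from celery import shared_task

@shared_task(bind=True, max_retries=48)
def deliver_sms(self, notification_id):
    try:
        send_to_provider(notification_id)  # hypothetical helper
    except Exception as e:
        # Retry immediately the first time (we will have switched
        # provider), then fall back to the standard interval.
        countdown = 0 if self.request.retries == 0 else 300
        raise self.retry(exc=e, countdown=countdown)
```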
This is because that error is caused by our providers and we cannot do
anything about it, but it can make our logs hard to read and actionable
errors harder to spot.
Added a scheduled task to run once a day and check if there were any
letters from before 17.30 that still have a status of 'created'. This
logs an exception instead of trying to fix the error because the fix
will be different depending on which bucket the letter is in.
Added a task which runs twice a day on weekdays and checks for letters that have
been in the state of `pending-virus-check` for over 90 minutes. This is
just logging an exception for now, not trying to fix things, since we
will need to manually check where the issue was.
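A hypothetical sketch of this kind of check; the model, columns, and query are illustrative, not the real code:

```python
import logging
from datetime import datetime, timedelta

logger = logging.getLogger(__name__)

def check_for_letters_stuck_in_pending_virus_check(session, Notification):
    cutoff = datetime.utcnow() - timedelta(minutes=90)
    stuck = (
        session.query(Notification)
        .filter(
            Notification.notification_type == "letter",
            Notification.status == "pending-virus-check",
            Notification.created_at < cutoff,
        )
        .count()
    )
    if stuck:
        # Only log for now - the fix needs manual investigation.
        logger.error(
            "%s letters stuck in pending-virus-check for over 90 minutes", stuck
        )
```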
The `process_virus_scan_passed` task now catches S3 errors - if these
occur, it logs an exception and puts the letter in a `technical-failure`
state. We don't retry the task, because the most common reason for
failure would be the letter not being in the expected S3 bucket, in
which case retrying would make no difference.
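Sketching that error handling; the helper names and task wiring here are assumptions:

```python
from botocore.exceptions import ClientError
from flask import current_app

@notify_celery.task(bind=True, name="process-virus-scan-passed")
def process_virus_scan_passed(self, filename):
    try:
        move_scanned_letter(filename)  # hypothetical S3 copy/delete helper
    except ClientError:
        # The letter is probably not in the bucket we expect, so a
        # retry would not help: record the failure instead.
        current_app.logger.exception("Failed to process letter %s", filename)
        update_letter_status(filename, "technical-failure")  # hypothetical
```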
The celery decorator should always be on the outside so that all other
decorators are captured within the celery task. We had problems with
cronitor not reporting, and only for this task.
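For example (reusing the notify_celery and cronitor names from this repo, otherwise illustrative):

```python
# @notify_celery.task must be the outermost decorator - otherwise Celery
# registers the bare function and cronitor never runs inside the task.
@notify_celery.task(name="some-nightly-task")
@cronitor("some-nightly-task")
def some_nightly_task():
    ...
```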
Bumped utils to version 31.2.5, which changes when the rows of a
RecipientCSV get created. Switched to using `.get_rows()` from
RecipientCSV (a generator) instead of the `.rows` property (which builds
a list of the rows in memory).
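The change in usage, roughly (the import path and constructor arguments shown here are illustrative):

```python
from notifications_utils.recipients import RecipientCSV

recipient_csv = RecipientCSV(file_data, template_type="sms")

# Before: .rows built a full list of every row in memory.
for row in recipient_csv.rows:
    process_row(row)  # hypothetical

# After: .get_rows() is a generator, so rows are created lazily.
for row in recipient_csv.get_rows():
    process_row(row)
```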
This celery task was not decorated with the cronitor decorator so never
checked in with cronitor.
Adding the decorator will ensure this task is monitored.
The requisite cronitor key is in the credentials repository already.
Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
When we send a zip file of letters to DVLA we expect them to send back an acknowledgement of those files.
Previously they named the files like NOTIFY.20180202091254.ACK.TXT, and the contents would contain the name of the zip file we sent along with the date they received it.
They have updated this format to mirror the format of the zip file, because there was an instance where they sent 2 files with the same name and the later file overwrote the first.
Since the acknowledgement file name now matches the name of the zip file we sent, there is no need to get the file from S3; we can just compare file names.
The create_nightly_notification_status task runs at 00:30 UK time.
However, this means that in summer datetime.today() will return the
wrong date, as the server (which runs on UTC) will run the task at
23:30 the previous day (populating the wrong row in the table).
Fix this to use nice tz-aware functions.
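A minimal sketch of the fix, assuming pytz; the helper name is illustrative:

```python
from datetime import datetime

import pytz

def uk_today():
    # datetime.today() on a UTC server gives the previous day's date
    # when the task fires at 23:30 UTC in summer; convert to UK time
    # before taking the date instead.
    utc_now = datetime.utcnow().replace(tzinfo=pytz.utc)
    return utc_now.astimezone(pytz.timezone("Europe/London")).date()
```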