If Monday or Tuesday check for letters still sending after 4 days.
If Saturday or Sunday do nothing
If Wed, Thurs, Fri check for letters still sending after 2 days
Added test for Tuesday, corrected tests after the correction to query.
be known. Added the notification id to the logging message so that
the notification can be traced through the logging system by knowing
the notification id, making it easier to debug. Also changed to raise an
exception so that alerts are generated. This way we should get an email
to say that there has been an error.
application. If the Anti-virus app fails due to s3 errors or ClamAV
so does not scan (even after retries) the file at all an error needs
to be raised and the notification set to technical-failure.
Files should be moved to a 'folder' a separate one for ERROR and FAILURE.
* Added new letter task to process the error
* Added a new method to letter utils.py to move a file into an error or
failure folder based on the input
* Added tests to test the task and the utils.py method
- precompiled PDFs sent by test key uploaded to scan bucket
- set status to VIRUS-SCAN-FAILED for pdfs failing virus scan rather than PERMANENT-FAILURE
- Make call to AV app for precompiled letters sent via a test key, and set notification status to PENDING-VIRUS-SCAN
Which means the sent_at date for the notification could be empty causing the service callback to fail.
- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
We need to deal with this, it's ok when updating a notification status from delivered to delivered. But the DailySortedLetter counts are being doubled.
Adding the file_name to the table as a unique key to get around this issue. It will mean we have multiple rows for each billing_day, but that's ok we can aggregate that.
This will also give us a way to see which file created which count.
We were previously expecting the letter response files to be in the
format of 'NOTIFY.<datetime>.RSP.TXT' but the response files we receive
use '-' in the filenames instead of '.' which was causing an error when
we tried to get the date from the filename.
In the 'update-letter-notifications-statuses' task we want to ensure
that temporary failures are always handled, regardless of whether the
response file we receive contains unknown Sorted statuses or not.
When the notification is timedout by the scheduled task if the service is expecting a status update, that update to the service would fail.
A test has been added.
The tasks are no longer being used, so can be deleted safely:
* record_initial_job_statistics
* record_outcome_job_statistics
* timeout-job-statistics
The test file for the statistics tasks was deleted in a previous commit.
process_incomplete_jobs loops through jobs processing them in a single
task. This means that if the job statuses are all 'error', and then the
process_incomplete_jobs task fails, the later jobs in the list that
never got picked up won't have their status set back to in progress, or
their processing_started time - so will be stuck in 'error' forever.
Instead, we set the job statuses to in progress and the start time to
now before we process any - so if the incomplete_jobs task fails later,
the jobs will be picked up (again) by the check_job_statuses task in
half an hour's time
the process_incomplete_jobs task runs through all incomplete jobs in
a loop, so it might not get a chance to update the processing_started
time of the last job before check_job_status runs again (every minute).
So before we even trigger the process_incomplete_jobs task, lets set
the status of the jobs to error, so that we don't identify them for
re-processing again.
we might stop processing jobs mid-way through if, for example, a
deploy or downscale kills the box working on it. We have a scheduled
task that identifies any job that we started processing more than half
an hour ago that is still processing.
However, we encountered a bug where we triggered the
process_incomplete_job multiple times, because the processing_started
of the job was still set to half an hour ago. If we reset the
processing_started to the current time, then it won't get picked up by
future runs of the check_job_status scheduled task.