We were previously expecting the letter response files to be in the
format of 'NOTIFY.<datetime>.RSP.TXT' but the response files we receive
use '-' in the filenames instead of '.' which was causing an error when
we tried to get the date from the filename.
In the 'update-letter-notifications-statuses' task we want to ensure
that temporary failures are always handled, regardless of whether the
response file we receive contains unknown Sorted statuses or not.
process_incomplete_jobs loops through jobs processing them in a single
task. This means that if the job statuses are all 'error', and then the
process_incomplete_jobs task fails, the later jobs in the list that
never got picked up won't have their status set back to in progress, or
their processing_started time - so will be stuck in 'error' forever.
Instead, we set the job statuses to in progress and the start time to
now before we process any - so if the incomplete_jobs task fails later,
the jobs will be picked up (again) by the check_job_statuses task in
half an hour's time
the process_incomplete_jobs task runs through all incomplete jobs in
a loop, so it might not get a chance to update the processing_started
time of the last job before check_job_status runs again (every minute).
So before we even trigger the process_incomplete_jobs task, lets set
the status of the jobs to error, so that we don't identify them for
re-processing again.
we might stop processing jobs mid-way through if, for example, a
deploy or downscale kills the box working on it. We have a scheduled
task that identifies any job that we started processing more than half
an hour ago that is still processing.
However, we encountered a bug where we triggered the
process_incomplete_job multiple times, because the processing_started
of the job was still set to half an hour ago. If we reset the
processing_started to the current time, then it won't get picked up by
future runs of the check_job_status scheduled task.
In the update_letter_notifications_statuses task we now check whether
each row in the response file that we receive from the DVLA has the
value of 'Sorted' or 'Unsorted' in the postcode validation field. We
then calculate the number of Sorted and Unsorted rows for each day and
save each day as a row in the daily_sorted_letter table.
The data in daily_sorted_letter table should be in local time, so we
convert the datetime before saving.
This will continue to update the notification history for letter notifications.
We currently have an issue where the responses to letters from the provider is taking a long time.
This is due to the manual nature of their process.
Updating the status of the letter will still work if the notification has been purged.
Also turned back on the purge letter notification scheduled task.
The Notify team needs to investigate when a notification is marked as failed.
We will process the whole file and mark the notifications with the appropriate status, if any are failed an exception is raised.
The exception will trigger a cloud watch error for the team to investigate.
This PR is a proposal to reduce the average messages we see for a single notification from about 7 messages to 2.
Messaging would change to something like this:
February 2nd 2018, 15:39:05.885 Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
{'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']}
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:57.727 SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814
February 2nd 2018, 15:38:56.727 Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181
February 2nd 2018, 15:38:56.727 Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery
To somthing like this:
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
- call create fake response file task if in research mode only on preview and development environments to not impact response files on staging and live
- Changed the notification status of letters for letters that DVLA marks
as 'failed' from NOTIFICATION_TECHNICAL_FAILURE to
NOTIFICATION_TEMPORARY_FAILURE.
- Added has_permission helper in models.py to check permission in service
- Moved letters pdf tasks to separate file
- Moved letters pdf tests to own file
- check if service_callback_api exist before putting tasks on queue
- create_service_callback_api in tests before asserting if send_delivery_status_to_service has been called.
this reduces the amount of error messages we log (we'll no longer log
at error level when build-dvla-file-for-job retries while waiting for
the task to finish), and make sure we retry in those cases above - db
or s3 having temporary troubl
This causes an issue when it hits the max retry limit, and tries to
throw your exception to let you deal with it - at this point it was
moaning because we pass in a string
if it's not defined, and we're inside an exception block celery uses
that instead.
this involved:
* moving that task to callback_tasks to prevent circular imports
* updating the dummy research mode callbacks (with actual SNS messages from the
ses simulator emails)
* refactoring tests
also, downgrade client returning bad status codes from `exception` to
error - it's not a problem with our system so we don't want it to trip
any cloudwatch alerts or anything.
Instead of retrying if there are genuine errors, only retry if there are
errors which are unexpected as otherwise the retries will happen and
fail for the same reason e.g. that the message has changed format and
will require a code update.
- Updated process_ses_results to only retry if there in an unknown
exception
- Update test and assert that there is a retry there is a unknown
exception
- Updated the retry and max_retries of the process_ses_results celery
task to be the same as other retry strategies in that file
- Provided a message with the 200 to be similar to how other responses
are handled