Commit Graph

148 Commits

Author SHA1 Message Date
Leo Hemsted
2a392e7137 update switch provider scheduled task
it now looks at both providers and works out whether to deprioritise
one, rather than binary switching from one to the other. If anything
has altered the priorities in the last ten minutes it won't take any
action. If both providers are slow it also won't take any action.
2019-11-28 13:29:38 +00:00
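The decision described in commit 2a392e7137 could be sketched roughly as below. This is a minimal illustration, not the task's actual code: the function name, the `slow_by_provider` mapping and the `last_changed_at` argument are all hypothetical.

```python
from datetime import datetime, timedelta


def provider_to_deprioritise(slow_by_provider, last_changed_at, now,
                             window=timedelta(minutes=10)):
    """Return the name of the provider to deprioritise, or None for no action.

    slow_by_provider: hypothetical dict of provider name -> bool (is it slow?)
    last_changed_at:  when priorities were last altered, or None if unknown
    """
    # Something altered the priorities recently: leave things alone.
    if last_changed_at is not None and now - last_changed_at < window:
        return None

    slow = [name for name, is_slow in slow_by_provider.items() if is_slow]

    # No provider is slow, or both are slow: take no action.
    if len(slow) != 1:
        return None
    return slow[0]
```

Returning `None` for every "do nothing" case keeps the caller simple: it only adjusts priorities when it gets a provider name back.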
Leo Hemsted
fa7e0a1e84 add dao_reduce_sms_provider_priority function
retrieve the sms providers from the DB, and decrease the chosen
provider's priority by 10, while increasing the other by 10.

add a check in to ensure we never decrease below 0 or increase above 100
- this is per provider, we don't check that the two add up to 100 or
  anything. If the values are outside of this range (eg: set via the UI)
then they'll probably* fix themselves at some point - we've added tests
to document these cases.

Use with_for_update to ensure that the method can only run once at a
time - other invocations of the function will be held on that line until
the currently running one ends and commits the transaction. This doesn't
affect anyone doing things from the UI.
2019-11-28 13:29:01 +00:00
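The arithmetic described in commit fa7e0a1e84 might look something like the sketch below. It only illustrates the step of 10 and the 0 to 100 bounds. The function name and the dict shape are hypothetical, and the database read and the `with_for_update` row locking are deliberately left out.

```python
def adjust_priorities(priorities, reduce_name, step=10):
    """Return new priorities with one provider moved down and the rest moved up.

    priorities: hypothetical dict of provider name -> integer priority.
    Each provider is bounded on its own; the totals are not forced to 100.
    """
    adjusted = {}
    for name, priority in priorities.items():
        if name == reduce_name:
            # Never decrease below 0.
            adjusted[name] = max(0, priority - step)
        else:
            # Never increase above 100.
            adjusted[name] = min(100, priority + step)
    return adjusted
```

For example, `adjust_priorities({"a": 50, "b": 50}, "a")` gives `{"a": 40, "b": 60}`. How the real function treats values that start outside the range is not stated beyond "they'll probably fix themselves", so the behaviour here for such inputs is an assumption.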
Leo Hemsted
6f38cbbcf1 randomly choose from providers based on priority
todo: make sure if they don't add up to 100 we do something sensible,
especially if they're both 0.
2019-11-28 13:29:01 +00:00
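One way to picture the priority-weighted choice from commit 6f38cbbcf1 is the sketch below. The function name and the dict of priorities are hypothetical, and the uniform fallback for the "both 0" case raised in the todo is an assumption, not something the commit confirms.

```python
import random


def choose_provider(priorities, rng=random):
    """Pick a provider name with probability proportional to its priority.

    priorities: hypothetical dict of provider name -> integer priority.
    """
    names = list(priorities)
    # Treat negative priorities as 0 so they can never be selected by weight.
    weights = [max(0, priorities[name]) for name in names]

    if sum(weights) == 0:
        # Assumption: with no usable weights, fall back to a uniform choice.
        return rng.choice(names)

    # random.choices only uses relative weights, so they need not sum to 100.
    return rng.choices(names, weights=weights, k=1)[0]
```

Because only the relative sizes of the weights matter, priorities of 30 and 10 give the same 3:1 split as 75 and 25.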
Rebecca Law
4fd6f33af2 Merge pull request #2658 from alphagov/fix-letters-in-created-status
Alert if a letter doesn't make it past created status
2019-11-27 13:38:51 +00:00
Rebecca Law
853df6fbfb Fix reference to old time frame for task. 2019-11-27 13:26:53 +00:00
Rebecca Law
e0b4b258aa Shortened the length of time to check for messages with the wrong state.
There is a chance that there is an outstanding retry task that has yet to run, but the tasks that are replayed here protect against the task running twice. So this just means it might get sent sooner rather than later.
2019-11-21 15:51:27 +00:00
Rebecca Law
ac4f0e8027 After a comment from @idavidmcdonald, I asked myself why we are not creating the task to upload the pdf and update the notification.
The assumption was that S3 would throw an exception if the object was uploaded twice. That's not the case: the default behaviour is that if a file already exists it will be overwritten. So it is completely safe to run the task from the alert.

It can also mean that we don't need to wait 4 hours 15 minutes. Shall I decrease the amount of time before restarting the task?
2019-11-19 16:04:21 +00:00
Rebecca Law
918975b0a6 Use sender_id from CSV metadata.
When we upload a CSV for a job, we add the sender_id as metadata to the file that is uploaded on S3.
There is more than one place where we process rows from that CSV.
 - process_job
 - scheduled_job
 - check_for_missing_rows_in_completed_jobs
 - check_job_status

All of these places need to use the sender_id. Now the sender_id is always read from the file metadata.
In a subsequent PR we can remove the optional sender_id parameter from process_job task.
2019-11-15 15:42:29 +00:00
Rebecca Law
6155f7666e Testing with latest 2019-11-15 15:42:24 +00:00
Rebecca Law
516190262a [WIP] 2019-11-15 15:41:27 +00:00
Rebecca Law
c42420c329 Add an alert when a letter is created but doesn't have a file in S3 for sending. We can tell this is the case because there is no updated_at and billable units are still 0.
At this point we are just creating a zendesk ticket - perhaps we can just call the create_letter_pdf task.
2019-11-13 16:39:59 +00:00
Rebecca Law
5aaf5cd588 Add the missing format for the log message when a missing row is processed. 2019-11-07 15:01:23 +00:00
Rebecca Law
559faf3034 Fix the query.
Missing the where clause to join the two tables.... OOPS
2019-11-07 10:57:31 +00:00
Rebecca Law
db5a50c5a7 Adding a scheduled task to process missing rows from a job
Sometimes a job finishes but has missed a row in the middle. It is a mystery why this is happening; it could be that the task to save the notifications has been dropped.
So until we solve the missing rows problem, let's find missing rows and process them.

A new scheduled task has been added to find any "finished" jobs that do not have enough notifications created. If there are missing notifications the job processes those rows for the job.
Adding the new task to beat schedule will be done in the next commit.

A unique key constraint has been added to Notifications to ensure that the row is not added twice. Any index or constraint can affect performance, but this unique constraint should not affect it enough for us to notice.
2019-11-06 10:49:46 +00:00
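The missing-row check in commit db5a50c5a7 boils down to comparing the rows a job should have produced with the rows that actually have notifications. A minimal sketch, assuming zero-based row numbers and a hypothetical helper name (the real task queries the database for this):

```python
def find_missing_rows(notification_count, created_row_numbers):
    """Return the row numbers of a finished job that have no notification yet.

    notification_count:  how many rows the job's CSV contained
    created_row_numbers: row numbers that already have a notification
    Row numbers are assumed to be zero-based for this illustration.
    """
    expected = set(range(notification_count))
    return sorted(expected - set(created_row_numbers))
```

Any row numbers returned would then be processed again. The unique constraint mentioned above is what stops a row from being saved twice if the original task turns up late.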
Leo Hemsted
975af113e4 Merge pull request #2639 from alphagov/remove-loadtesting-db-migration
remove loadtesting from the database
2019-11-06 10:49:46 +00:00
Katie Smith
a790acc091 Create a Zendesk ticket for letters in the wrong state
This creates a Zendesk ticket if either the
`check_precompiled_letter_state` or `check_templated_letter_state` tasks
fail.
2019-06-18 10:58:58 +01:00
Katie Smith
c518f6ca76 Add scheduled task to find old letters which still have 'created' status
Added a scheduled task to run once a day and check if there were any
letters from before 17.30 that still have a status of 'created'. This
logs an exception instead of trying to fix the error because the fix
will be different depending on which bucket the letter is in.
2019-06-18 10:58:58 +01:00
Katie Smith
a2f324ad7e Add scheduled task to find precompiled letters in wrong state
Added a task which runs twice a day on weekdays and checks for letters that have
been in the state of `pending-virus-check` for over 90 minutes. This is
just logging an exception for now, not trying to fix things, since we
will need to manually check where the issue was.
2019-06-18 10:58:58 +01:00
Leo Hemsted
2f94e1d9bc raise provider switch threshold from 20% to 30%
make it less likely to switch on slow messages to allow more manual
control of provider balance
2019-03-14 16:11:59 +00:00
Leo Hemsted
f00bfdfe85 move slow sms provider threshold from 10% to 20%
provider switching is a process that can happen as often as we like
without disrupting the flow of the system - however, there are some
reasons why we might not want to switch. One problem we've seen is
when a provider is having an issue, we might switch away from them
manually only for the app to automatically switch back to them again
and again.

Long term we'd like to have a system better suited for sharing the load
equally between our two sms providers, but short term, by increasing
the threshold for switching from 10% (of messages sent are slow) to
20%, we hope to make switching happen less often.

A notification is considered slow if it was sent in the last ten
minutes, on the current provider, and is either

* still in sending or pending after 4 minutes
* in delivered, but took at least 4 minutes to send
2019-02-25 14:29:39 +00:00
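The definition of a slow notification in commit f00bfdfe85 can be written as a small predicate. The sketch below is illustrative only: the function names and the tuple shape are hypothetical, and it assumes the caller has already narrowed the list to notifications sent in the last ten minutes on the current provider.

```python
from datetime import datetime, timedelta

SLOW_AFTER = timedelta(minutes=4)


def is_slow(status, sent_at, updated_at, now, threshold=SLOW_AFTER):
    """Return True if a notification counts as slow."""
    if status in ("sending", "pending"):
        # Still waiting for a delivery receipt after the threshold.
        return now - sent_at > threshold
    if status == "delivered":
        # Delivered, but took at least the threshold to get there.
        return updated_at - sent_at >= threshold
    return False


def slow_ratio(notifications, now):
    """Fraction of (status, sent_at, updated_at) tuples that are slow."""
    if not notifications:
        return 0.0
    slow = sum(1 for status, sent_at, updated_at in notifications
               if is_slow(status, sent_at, updated_at, now))
    return slow / len(notifications)
```

The resulting ratio is what would be compared against the switching threshold, for example `slow_ratio(recent, now) > 0.2` for the 20% threshold this commit introduces.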
Leo Hemsted
a617ccca9d allow pending notifications to influence switchover.
Currently we switch if:

* status = delivered and updated_at - sent_at > threshold
* status = sending and now - sent_at > threshold

firetext can leave notifications in the pending state, which is
equivalent to sending in terms of how we should handle it, so this
commit changes the second case to allow pending as well as sending.
2019-02-21 16:30:42 +00:00
Leo Hemsted
d3d56a3224 separate nightly tasks and other scheduled tasks.
other tasks is anything that is run on a different frequency than
nightly
2019-01-18 15:36:53 +00:00
Rebecca Law
efad58edd8 There is no need to have a separate table to store template monthly statistics. It's easy enough to aggregate the stats from ft_notification_status.
This removes the nightly task, and all the dao methods.
The next PR will remove the table.
2019-01-14 16:30:36 +00:00
Rebecca Law
62a8076161 Commit the deletes every 10,000 rows. 2018-12-21 13:57:35 +00:00
Katie Smith
a4f2880721 Fix log messages when emails and letters don't get deleted 2018-12-20 10:57:14 +00:00
Pea Tyczynska
af185adf4c Log the ratio of slow notifications 2018-12-11 15:28:38 +00:00
Pea Tyczynska
abe01c0bc0 Revert "Switch providers on slow delivery only produces logs"
This reverts commit 6938600ab8.
2018-12-11 15:14:08 +00:00
Pea Tyczynska
6938600ab8 Switch providers on slow delivery only produces logs 2018-12-05 15:56:16 +00:00
Pea Tyczynska
418060fbdb Update switch provider on slow delivery task to change max once every 10 minutes 2018-12-05 15:56:16 +00:00
Pea Tyczynska
50811c3b8e Archive job after corresponding file deleted from s3 2018-11-28 14:38:59 +00:00
Pea Tyczynska
e5fd027192 Move nightly tasks before introduction of archived flag on jobs 2018-11-28 14:38:59 +00:00
Pea Tyczynska
be6f37069b Change job selection dao to take flexible retention into account
Also test deleting jobs with flexible data retention

Also update tests for default data retention following logic
change: dao_get_jobs_older_than_data_retention now counts
today at the start of the day, not at a time when function runs
and updated tests reflect that
2018-11-28 14:37:43 +00:00
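The "counts today at the start of the day" change in commit be6f37069b can be illustrated with a small date calculation. This is a sketch under assumptions: the helper name is hypothetical, and time zones are ignored for brevity.

```python
from datetime import datetime, time, timedelta


def retention_cutoff(now, days_of_retention):
    """Return the moment before which jobs are older than the retention period.

    The cutoff is measured from midnight at the start of today, so the result
    does not depend on what time of day the function runs.
    """
    start_of_today = datetime.combine(now.date(), time.min)
    return start_of_today - timedelta(days=days_of_retention)
```

With a 7 day retention period, a run at 14:37 on 2018-11-28 and a run at 23:59 the same day both give a cutoff of 2018-11-21 00:00.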
Rebecca Law
7a16ac35bd Remove letter-jobs api
When we first built letters you could only send them via a CSV upload. Initially we needed a way to send those files to dvla per job.
We have since stopped using this page. So let's delete it!
2018-11-15 17:24:37 +00:00
Rebecca Law
f1b04193ca In this PR we remove trigger-letter-pdfs-for-day scheduled task and just call collate_letter_pdfs_for_day instead.
There was a datetime bug in the query which resulted in files not being sent to the postal provider.
The trigger-letter-pdfs-for-day task is no longer needed, so rather than fix the query just call collate_letter_pdfs_for_day directly.
Less code is always better.

Deployment considerations: I realized this is not strictly backwards compatible if the scheduled job is in progress and a task is on the queue that no longer exists. This is ok since we will deploy this well before 17:50.
2018-09-12 17:16:34 +01:00
Pea Tyczynska
ca2b350a99 Remove references to monthly_billing table from api 2018-07-30 11:07:42 +01:00
Pea Tyczynska
c0f309a2a6 Delete scheduled task to populate monthly_billing 2018-07-30 11:06:04 +01:00
Pea (Malgorzata Tyczynska)
1b21a12b83 Merge pull request #1971 from alphagov/add_task_to_send_complaints_on
Add and call task to send complaints on to service callback APIs
2018-07-24 15:07:09 +01:00
Leo Hemsted
6e87b36303 remove duplication shutdown loggers
also add **kwargs to make it celery4 compatible
2018-07-20 12:09:00 +01:00
Pea Tyczynska
812f4d20dd Send complaints on to service callback APIs using an async task 2018-07-19 16:59:39 +01:00
Pea Tyczynska
86978c225a Filter 'get_service_callback_api_for_service' to only get status updates
Also rename it to 'get_service_delivery_status_callback_api_for_service'
2018-07-18 11:36:39 +01:00
Leo Hemsted
897ab93148 zendesk instead of deskpro 2018-04-27 16:36:39 +01:00
Rebecca Law
167f7a18e3 Fix the query that raises the alert for letters still in sending.
If Monday or Tuesday check for letters still sending after 4 days.
If Saturday or Sunday do nothing
If Wed, Thurs, Fri check for letters still sending after 2 days

Added test for Tuesday, corrected tests after the correction to query.
2018-04-25 10:10:25 +01:00
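The weekday rules in commit 167f7a18e3 map onto a short lookup. The sketch below uses a hypothetical function name and returns `None` to mean "do nothing"; it is not the query the commit fixes.

```python
from datetime import date


def days_before_alert(today):
    """Return how many days a letter may stay in sending before alerting.

    Returns None at the weekend, when no check is made.
    """
    weekday = today.weekday()  # Monday is 0, Sunday is 6
    if weekday >= 5:
        # Saturday or Sunday: do nothing.
        return None
    if weekday <= 1:
        # Monday or Tuesday: the window has to span the weekend.
        return 4
    # Wednesday, Thursday or Friday.
    return 2
```

For example, `days_before_alert(date(2018, 4, 24))`, a Tuesday, returns 4, while the following Wednesday returns 2.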
Katie Smith
417d382d1b Add extra day before raising letter still sending alert
We now want to wait an extra day before sending the alert that letters
are still sending.
2018-04-10 09:29:29 +01:00
Rebecca Law
9549ada200 Run task every 15 minutes.
Move variable to task from config.
2018-03-26 10:26:24 +01:00
Rebecca Law
f596d17bf2 If a sms or email has not been sent after 4 hours and 15 minutes then put it on the delivery queue. 2018-03-23 15:38:35 +00:00
Rebecca Law
0dc50190b2 Throw an exception whenever we update a notification to technical failure.
If this is happening we want to know about it.
2018-03-16 17:18:44 +00:00
Rebecca Law
c9477a7400 When a notification is timed out in the scheduled task, that may be because the notification has not been sent.
This means the sent_at date for the notification could be empty, causing the service callback to fail.

- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
2018-03-16 14:47:56 +00:00
Rebecca Law
68d658086b Merge pull request #1759 from alphagov/fix-bug-timeout-notifications
Fix the bug in timeout notifications.
2018-03-13 09:34:33 +00:00
Rebecca Law
144356f096 Fix the bug in timeout notifications.
When the notification is timed out by the scheduled task, if the service is expecting a status update, that update to the service would fail.
A test has been added.
2018-03-12 12:15:03 +00:00
Katie Smith
4f7dd1d258 Delete job statistics tasks
The tasks are no longer being used, so can be deleted safely:
* record_initial_job_statistics
* record_outcome_job_statistics
* timeout-job-statistics

The test file for the statistics tasks was deleted in a previous commit.
2018-03-12 10:48:46 +00:00