Commit Graph

654 Commits

Author SHA1 Message Date
Pea Tyczynska
e033f3300b Degrade MaxRetriesExceededError to warning status in logger
This is because that error is caused by our providers and we
cannot do anything about it but it can make our logs hard to read
and actionable errors harder to spot
2019-06-27 14:55:10 +01:00
Katie Smith
a790acc091 Create a Zendesk ticket for letters in the wrong state
This creates a Zendesk ticket if either the
`check_precompiled_letter_state` or `check_templated_letter_state` tasks
fail.
2019-06-18 10:58:58 +01:00
Katie Smith
c518f6ca76 Add scheduled task to find old letters which still have 'created' status
Added a scheduled task to run once a day and check if there were any
letters from before 17.30 that still have a status of 'created'. This
logs an exception instead of trying to fix the error because the fix
will be different depending on which bucket the letter is in.
2019-06-18 10:58:58 +01:00
Katie Smith
a2f324ad7e Add scheduled task to find precompiled letters in wrong state
Added a task which runs twice a day on weekdays and checks for letters that have
been in the state of `pending-virus-check` for over 90 minutes. This is
just logging an exception for now, not trying to fix things, since we
will need to manually check where the issue was.
2019-06-18 10:58:58 +01:00
Katie Smith
3d01276ce2 Log exception and set precompiled letter to tech-failure if S3 errors
The `process_virus_scan_passed` task now catches S3 errors - if these
occur, it logs an exception and puts the letter in a `technical-failure`
state. We don't retry the task, because the most common reason for
failure would be the letter not being in the expected S3 bucket, in
which case retrying would make no difference.
2019-06-18 10:58:58 +01:00
Leo Hemsted
5045590d75 allow you to pass in date to send perf stats
make it easier to replay sending data for a day if it failed the first
time round
2019-06-11 13:57:17 +01:00
Leo Hemsted
09888f7479 ensure cronitor decorator is inside the notify_task wrapper
the celery decorator should always be on the outside so that all other
decorators will be captured within the celery task. We had problems
with cronitor not reporting, and only for this task.
2019-06-03 11:46:07 +01:00
Rebecca Law
3374e03ce9 Prepare to stop inserting NotificationHistory at the time of inserting a notificaiton.
Need to remove foreign key to complaints.
Make sure if getting Notification.id we look to both tables.
2019-05-21 16:08:18 +01:00
Rebecca Law
e3ee99e70d Reduce the number of days to recalculate billing. It's not necessary to calculate longer than 4 days. 2019-05-15 14:40:53 +01:00
Katie Smith
c02b7edb92 Bump utils to bring in changes to RecipientCSV rows
Bumped utils to version 31.2.5, which changes when the rows of a
RecipientCSV get created. Switched to using `.get_rows()` from
RecipientCSV (a generator) instead of the `.rows` property (which builds
a list of the rows in memory).
2019-04-25 10:58:19 +01:00
Rebecca Law
1c68e0f565 Remove unused method.
last_n_days was only being used in a test.
2019-04-12 10:26:46 +01:00
Rebecca Law
4ce2b9eaba The rstrip was not working for all file names so this changes it to a replace. 2019-04-08 12:04:14 +01:00
Toby Lorne
0022923bd0 Add Cronitor decorator collate-letter-pdfs-for-day
This celery task was not decorated with the cronitor decorator so never
checked in with cronitor.

Adding the decorator will ensure this task is monitored.

The requisite cronitor key is in the credentials repository already.

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
2019-04-05 10:26:18 +01:00
Rebecca Law
dc8159104e Update letter_raise_alert_if_no_ack_file_for_zip for new DVLA file format
When we send a zip file of letters to DVLA we expect them to send back an acknowledgement of those files.
Previously they named the files like NOTIFY.20180202091254.ACK.TXT and the contents would contain the name of the zip file we sent with a date of when they got it.
They have updated this format to mirror the format of the zip file because there was an instance where they sent 2 files of the same name so the later overwrote the first.
Since the name matches our name, there is no need to get the file from S3 but just compare file names.
2019-04-03 11:03:42 +01:00
Leo Hemsted
1dc084be54 fix nightly ft stats tables task to respect BST
the create_nightly_notification_status task runs at 00:30am UK time,
however this means that in summer datetime.today() will return the
wrong date as the server (which runs on UTC) will run the task at
23:30 (populating the wrong row in the table).

fix this to use nice tz aware functions
2019-04-02 15:15:07 +01:00
Leo Hemsted
3739d9055d clean up usage of dates/datetimes in performance platform tasks
* call variables unambiguous things like `start_time` or `bst_date` to
  reduce risk of passing in the wrong thing
* simplify the count_dict object - remove nested dict and start_date
  fields as superfluous
* use static datetime objects in tests rather than calculating them
  each time
2019-04-02 11:49:20 +01:00
Rebecca Law
1456aa7789 Fix for performance platform updates.
Changed the query to get the performance platform stats from ft_notification_status. But the date used for the query needed to be a date, not datetime so the equality worked.
2019-04-01 12:03:57 +01:00
Rebecca Law
4105f6638e Split the update letter statuses from counting the daily sorted/unsorted numbers.
We need to back fill the daily_sorted_count tables, so we need to iterate through all the files. No need to update the notification status. So this task has been separated out.
2019-03-25 15:30:48 +00:00
Leo Hemsted
9da9968028 downgrade error to info 2019-03-22 14:06:45 +00:00
Leo Hemsted
6fa7f0290d ignore case in the cost_threshold in dvla response files
we failed when we received UNSORTED instead of Unsorted
2019-03-22 12:07:08 +00:00
Leo Hemsted
b288031adb add a hash of letter filenames to the dvla zip file name
if we partially retry a day, we would create new zip files, containing
different letters (if some were processed succesfully). We need these
files to have different filenames to earlier zip files so that we can
avoid overwriting log data in zips_sent.

Hashing the filename means that we'll only overwrite if it was the same
file containing the same content.
2019-03-21 15:40:24 +00:00
Leo Hemsted
334eb473ed separate batch num from date
DVLA don't care about the naming conventions of zip files, other than
it must start with `NOTIFY.` and end with `.ZIP`. So lets format the
date in a more readable way, and separate it from the batch number
2019-03-20 12:15:25 +00:00
Leo Hemsted
1a4baf4283 pass upload filename to notify-ftp
previously ftp would name the files itself by giving them a timestamp
when uploading. we ran into issues with tasks being picked up multiple
times and as such, uploading duplicate files. By naming the file before
creating the task, we can avoid this issue.

Files are now named `NOTIFY.YYYYMMDD######.ZIP` where the number is a
counter that increments with each task we've issued in that run of
collate-letter-pdfs-for-day
2019-03-19 13:48:17 +00:00
Leo Hemsted
2f94e1d9bc lower provider switch threshold from 20% to 30%
make it less likely to switch on slow messages to allow more manual
control of provider balance
2019-03-14 16:11:59 +00:00
Rebecca Law
1625371106 Merge pull request #2381 from alphagov/inbound-sms-retention
Inbound sms now deletes according to data retention
2019-03-08 10:58:01 +00:00
Alexey Bezhan
6f5822ae5b Downgrade log level for missing notifications in SES receipt
The timestamps available in the SES receipt don't always correspond
to the time the notification has been sent. We've seen callbacks with
a current timestamp in both 'mail' and 'bounce' objects that referenced
a notification sent a week ago, which means we can't rely on it to skip
archived notifications.

One possible approach would be to look up the notification reference in
the notification_history table, but this goes against our plans to stop
relying on it in the future.

This changes the SES receipts logic to retry missing notifications once
(if the callback timestamp is within the last 5 minutes the task will
retry after a 5 minute delay) to capture callbacks arriving before the
notification reference has been persisted to the DB. Otherwise, we log
the missing notification as a warning instead of error.
2019-03-06 11:35:32 +00:00
Leo Hemsted
653f1ab6b9 stub out antivirus in dev
antivirus is sometimes tough to get running locally - now in dev
antivirus is skipped unless `ANTIVIRUS_ENABLED=1` is set on the command
line. on all other environments it is always enabled.
2019-02-27 10:59:31 +00:00
Leo Hemsted
38f0ea6cca remove functions to not talk about 7 days
remind us that data retention is flexible
2019-02-26 17:57:35 +00:00
Leo Hemsted
f00bfdfe85 move slow sms provider threshold from 10% to 20%
provider switching is a process that can happen as often as we like
without disrupting the flow of the system - however, there are some
reasons why we might not want to switch. One problem we've seen is
when a provider is having an issue, we might switch away from them
manually only for the app to automatically switch back to them again
and again.

Long term we'd like to have a system better suited for sharing the load
equally between our two sms providers, but short term, by increasing
the threshold for switching from 10% (of messages sent are slow) to
20%, we hope to make switching happen less often.

A notification is considered slow if it was sent in the last ten
minutes, on the current provider, and is either

* still in sending or pending after 4 minutes
* in delivered, but took at least 4 minutes to send
2019-02-25 14:29:39 +00:00
Alexey Bezhan
c2e15d4ee2 Allow retry exception to propagate from ses callback task
Celery `self.retry` raises an exception to communicate that the task
needs to be retried. Since our ses task is wrapped in a catch-all
except block it logs that exception as an error before retrying.

Handling Retry class separately allows us to raise it without logging
the traceback.
2019-02-25 13:25:50 +00:00
Alexey Bezhan
2932b44eb8 Add retries for SES callbacks for recent notifications
We've seen errors caused by what we suspect is a race condition when
SES callback processing tries to look up the notification before the
sender worker has saved notification reference from the SES POST
response to the database.

This adds a retry for SES callback task if the notification was not
found and the message is less than 10 minutes old and removes the
error log message for notifications older than 3 days (since they
might no longer exist in the notifications table and would've been
marked as failure by then either way).

In order to be able to call retry and silence the error log based on
notification time this change inlines `process_ses_response` and
`update_notification_by_reference` functions into the celery task.
It also removes a lot of defensive error-handling that doesn't appear
to have been triggered in the last few months (for things like missing
keys in SES callback data).
2019-02-25 10:36:37 +00:00
Leo Hemsted
a617ccca9d allow pending notifications to influence switchover.
Currently we switch if:

* status = delivered and updated_at - sent_at > threshold
* status = sending and now - sent_at > threshold

firetext can leave notifications in the pending state, which is
equivalent to sending in terms of how we should handle it, so this
commit changes the second case to allow pending as well as sending.
2019-02-21 16:30:42 +00:00
Leo Hemsted
afc5c96927 Don't fallback to dvla_organisation if letter branding unset
The template preview app now accepts a null value for the `filename` 
parameter. If a service doesn't have a letter branding option set, 
previously we defaulted to their dvla_organisation (probably HM 
Government). Now, we pass through None, so that we generate letters 
without any logo or branding.
2019-02-13 11:58:54 +00:00
Rebecca Law
0b7fca4167 Merge branch 'master' into letter-branding 2019-01-24 16:39:30 +00:00
Rebecca Law
e4ea208d06 Use the letter_branding logo if it exists otherwise fall back to the dvla_organisation logo. 2019-01-23 12:51:09 +00:00
Leo Hemsted
f5198bf71d remove unnecessary job_types arg from remove_csv_files celery tasks 2019-01-22 10:31:37 +00:00
Leo Hemsted
754c65a6a2 create cronitor decorator that alerts if tasks fail
make a decorator that pings cronitor before and after each task run.
Designed for use with nightly tasks, so we have visibility if they
fail. We have a bunch of cronitor monitors set up - 5 character keys
that go into a URL that we then make a GET to with a self-explanatory
url path (run/fail/complete).

the cronitor URLs are defined in the credentials repo as a dictionary
of celery task names to URL slugs. If the name passed in to the
decorator  isn't in that dict, it won't run.

to use it, all you need to do is call `@cronitor(my_task_name)`
instead of `@notify_celery.task`, and make sure that the task name and
the matching slug are included in the credentials repo (or locally,
json dumped and stored in the CRONITOR_KEYS environment variable)
2019-01-18 15:36:53 +00:00
Leo Hemsted
d3d56a3224 separate nightly tasks and other scheduled tasks.
other tasks is anything that is run on a different frequency than
nightly
2019-01-18 15:36:53 +00:00
Pea (Malgorzata Tyczynska)
276a9a3828 Merge pull request #2293 from alphagov/choose_postage_for_precompiled
Choose postage on POST request for precompiled letters
2019-01-16 14:13:26 +00:00
Pea Tyczynska
5ebeb9937a Avoid call to database to get template in persist_notifications 2019-01-14 17:53:06 +00:00
Rebecca Law
efad58edd8 There is no need to have a separate table to store template monthly statistics. It's easy enough to aggregate the stats from ft_notification_status.
This removes the nightly task, and all the dao methods.
The next PR will remove the table.
2019-01-14 16:30:36 +00:00
Katie Smith
a9b755b08c Move letters which can't be opened to invalid PDF bucket
If a precompiled letter can't be opened (e.g. because it isn't a valid
PDF) we were setting its billable units to 0, but not moving it to the
invalid PDF bucket. If a precompiled letter failed sanitisation, we were
moving it to the invalid PDF bucket but not setting its billable units
to 0.

This commit makes sure that we always set the billable units to 0
and move the PDF to the right bucket if it fails sanitisation or can't be
opened.
2019-01-11 16:59:07 +00:00
Rebecca Law
62a8076161 Commit the deletes every 10,000 rows. 2018-12-21 13:57:35 +00:00
Katie Smith
a4f2880721 Fix log messages when emails and letters don't get deleted 2018-12-20 10:57:14 +00:00
Katie Smith
e9fb60f05c Send extra headers to Template Preview /precompiled/sanitise endpoint
We want to send two new headers, ServiceId and NotificationId to the
template preview /precompiled/sanitise endpoint. This is to allow us to log
errors from this endpoint in template preview with all the information needed,
instead of needing to pass the information back to notifications-api and
to log it there.
2018-12-19 13:49:27 +00:00
Pea Tyczynska
af185adf4c Log the ratio of slow notifications 2018-12-11 15:28:38 +00:00
Pea Tyczynska
abe01c0bc0 Revert "Switch providers on slow delivery only produces logs"
This reverts commit 6938600ab8.
2018-12-11 15:14:08 +00:00
Pea Tyczynska
6938600ab8 Switch providers on slow delivery only produces logs 2018-12-05 15:56:16 +00:00
Pea Tyczynska
418060fbdb Update switch provider on slow delivery task to change max once evey 10 minutes 2018-12-05 15:56:16 +00:00
Pea Tyczynska
50811c3b8e Archive job after corresponding file deleted from s3 2018-11-28 14:38:59 +00:00