Commit Graph

709 Commits

Author SHA1 Message Date
Rebecca Law
22c296b0ef Merge pull request #1780 from alphagov/send-service-callback-if-sent_at_is-None
Send service callback if sent_at is None
2018-03-19 16:32:15 +00:00
Rebecca Law
fdfd6838a6 Fix error message.
The id in the message refers to a notification, not a service.
2018-03-19 15:24:59 +00:00
Rebecca Law
cd2d85f2a3 Updates after code review.
- Remove print
- Update exception message.
2018-03-19 14:08:38 +00:00
Rebecca Law
0dc50190b2 Throw an exception whenever we update a notification to technical failure.
If this is happening, we want to know about it.
2018-03-16 17:18:44 +00:00
Rebecca Law
c9477a7400 When a notification is timed out by the scheduled task, it may be because the notification has not been sent.
This means the sent_at date for the notification could be empty, causing the service callback to fail.

- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
2018-03-16 14:47:56 +00:00
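A minimal sketch of the None-tolerant payload this implies; the key names and helper are illustrative, not the project's actual API. The caller would encrypt this dict and pass it to send_delivery_status_to_service so the task needs no db round trip.

```python
def build_delivery_status_payload(notification):
    """Build the callback payload, tolerating notifications that were
    timed out before ever being sent."""
    def fmt(dt):
        # sent_at (and sometimes updated_at) can legitimately be None.
        return dt.strftime("%Y-%m-%d %H:%M:%S") if dt else None

    return {
        "notification_id": str(notification.id),
        "notification_status": notification.status,
        "notification_created_at": fmt(notification.created_at),
        "notification_sent_at": fmt(notification.sent_at),        # may be None
        "notification_updated_at": fmt(notification.updated_at),  # may be None
    }
```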
venusbb
7e2947790f merged master and bumped migration version 2018-03-16 10:57:23 +00:00
venusbb
bb95a2784f Create scheduled job, fix tests 2018-03-16 09:22:34 +00:00
venusbb
ea70c6454a Fine-tune DB model and create tasks for data migration
- Remove unused columns in ft_billing
- Create tasks for nightly data migration
2018-03-14 17:07:33 +00:00
Rebecca Law
82cc6d6bef As it turns out, the s3ftp used to mount the s3 bucket to our ftp server puts the file on s3 more than once, so the SNS topic is triggered more than once.
We need to deal with this: it's fine when updating a notification status from delivered to delivered, but the DailySortedLetter counts are being doubled.
Adding the file_name to the table as a unique key gets around this issue. It means we will have multiple rows for each billing_day, but that's fine because we can aggregate them.
This will also give us a way to see which file created which count.
2018-03-14 17:04:58 +00:00
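With file_name in the unique key, the per-day figure becomes an aggregate. A sketch of that query, assuming a SQLAlchemy model with illustrative column names:

```python
from sqlalchemy import func


# One row per (billing_day, file_name): a duplicate SNS delivery of the
# same file now violates the unique key instead of doubling the counts.
def daily_letter_totals(session, DailySortedLetter):
    return (
        session.query(
            DailySortedLetter.billing_day,
            func.sum(DailySortedLetter.sorted_count).label("sorted"),
            func.sum(DailySortedLetter.unsorted_count).label("unsorted"),
        )
        .group_by(DailySortedLetter.billing_day)
        .all()
    )
```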
Katie Smith
41a9f5a06e Change format of the letter response file we expect
We were previously expecting the letter response files to be in the
format 'NOTIFY.<datetime>.RSP.TXT', but the response files we receive
use '-' in the filenames instead of '.', which was causing an error when
we tried to get the date from the filename.
2018-03-14 09:25:51 +00:00
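The fix amounts to splitting on the hyphen; a sketch, with the exact datetime format an assumption:

```python
from datetime import datetime


def get_date_from_filename(filename):
    # e.g. 'NOTIFY-20180314120000-RSP.TXT': the provider's files use '-'
    # as the separator, not the '.' we originally expected.
    datetime_string = filename.split("-")[1]
    return datetime.strptime(datetime_string, "%Y%m%d%H%M%S").date()
```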
Katie Smith
79b5a735e2 Ensure update letter notification status always handles temporary failure
In the 'update-letter-notifications-statuses' task we want to ensure
that temporary failures are always handled, regardless of whether the
response file we receive contains unknown Sorted statuses or not.
2018-03-14 09:25:51 +00:00
Rebecca Law
68d658086b Merge pull request #1759 from alphagov/fix-bug-timeout-notifications
Fix the bug in timeout notifications.
2018-03-13 09:34:33 +00:00
Rebecca Law
144356f096 Fix the bug in timeout notifications.
When a notification is timed out by the scheduled task and the service is expecting a status update, that update to the service would fail.
A test has been added.
2018-03-12 12:15:03 +00:00
Katie Smith
4f7dd1d258 Delete job statistics tasks
The tasks are no longer being used, so can be deleted safely:
* record_initial_job_statistics
* record_outcome_job_statistics
* timeout-job-statistics

The test file for the statistics tasks was deleted in a previous commit.
2018-03-12 10:48:46 +00:00
Leo Hemsted
ea2b0dfbc9 set processing_started before jobs are processed
process_incomplete_jobs loops through jobs, processing them in a single
task. This means that if the job statuses are all 'error' and the
process_incomplete_jobs task then fails, the later jobs in the list that
never got picked up won't have their status set back to in progress, or
their processing_started time set - so they will be stuck in 'error' forever.

Instead, we set the job statuses to in progress and the start time to
now before we process any - so if the incomplete_jobs task fails later,
the jobs will be picked up (again) by the check_job_statuses task in
half an hour's time
2018-03-09 17:20:26 +00:00
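Roughly the reordering described, sketched with the dao helpers passed in (names are illustrative):

```python
from datetime import datetime


def process_incomplete_jobs(jobs, dao_update_job, process_incomplete_job):
    # Mark every job up front: if this task dies part-way through, the
    # later jobs still read 'in progress' with a fresh start time, so the
    # check_job_status sweep will pick them up again in half an hour.
    for job in jobs:
        job.job_status = "in progress"
        job.processing_started = datetime.utcnow()
        dao_update_job(job)

    # Only now do the (potentially slow) per-job processing.
    for job in jobs:
        process_incomplete_job(job.id)
```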
Leo Hemsted
64bb94af9e set job status to error in the check_job_status scheduled task
the process_incomplete_jobs task runs through all incomplete jobs in
a loop, so it might not get a chance to update the processing_started
time of the last job before check_job_status runs again (every minute).
So before we even trigger the process_incomplete_jobs task, let's set
the status of the jobs to error, so that we don't identify them for
re-processing again.
2018-03-09 16:42:58 +00:00
Leo Hemsted
f0ca3d40de reset job processing time when re-processing incomplete jobs
we might stop processing jobs mid-way through if, for example, a
deploy or downscale kills the box working on it. We have a scheduled
task that identifies any job that we started processing more than half
an hour ago that is still processing.

However, we encountered a bug where we triggered the
process_incomplete_job multiple times, because the processing_started
of the job was still set to half an hour ago. If we reset the
processing_started to the current time, then it won't get picked up by
future runs of the check_job_status scheduled task.
2018-03-09 16:30:50 +00:00
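Taken together, these three commits shape the sweep roughly as follows (a sketch with illustrative names, not the project's actual query):

```python
from datetime import datetime, timedelta


def find_and_flag_stuck_jobs(session, Job):
    thirty_minutes_ago = datetime.utcnow() - timedelta(minutes=30)

    stuck = (
        session.query(Job)
        .filter(
            Job.job_status == "in progress",
            Job.processing_started < thirty_minutes_ago,
        )
        .all()
    )
    # Flip to 'error' before re-queueing, so the next run of this
    # every-minute task doesn't select the same jobs a second time.
    for job in stuck:
        job.job_status = "error"
    session.commit()
    return [str(job.id) for job in stuck]  # handed to process_incomplete_jobs
```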
Chris Hill-Scott
c7cc7822f7 Merge pull request #1738 from alphagov/refactored-csv-processing
Bring in refactored CSV processing
2018-03-09 14:28:48 +00:00
Rebecca Law
a3d04ca672 Improve log message 2018-03-09 12:01:08 +00:00
Rebecca Law
00b17b5ad7 When we send the service the status callback for a notification, we have all the information we need,
which means we can remove the need to request the data from the database.
In order for the PR to be backwards compatible, I have added an optional parameter "encrypted_status_update".
If this is not None then the new code is called.

The next PR will send the encrypted data to this task.

A final PR will remove the code that uses the database to get the notification and service callback api.
2018-03-08 16:17:41 +00:00
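The backwards-compatibility shim might look like this sketch; the decrypt and db helpers are stand-ins, not the real ones:

```python
import json


def decrypt(blob):  # stand-in for the real encryption helper
    return json.loads(blob)


def fetch_status_update_from_db(notification_id):
    raise NotImplementedError("legacy path: load notification + callback api")


def send_delivery_status_to_service(notification_id, encrypted_status_update=None):
    # Backwards compatible: until every caller sends the encrypted payload,
    # None falls back to the old database-driven path.
    if encrypted_status_update is None:
        data = fetch_status_update_from_db(notification_id)
    else:
        data = decrypt(encrypted_status_update)
    return data
```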
Leo Hemsted
651c3062b9 retry service callbacks if the db queries fail
we don't expect them to fail, but they might if we accidentally
exhaust our connection pool. Just in case, let's retry.
2018-03-08 14:08:56 +00:00
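A sketch of the retry guard, assuming a Celery task along these lines (broker, helpers and retry settings are illustrative):

```python
from celery import Celery
from sqlalchemy.exc import SQLAlchemyError

celery_app = Celery("notify", broker="memory://")  # illustrative broker


def load_callback_data(notification_id):  # stand-in for the db queries
    return {"id": notification_id}


def post_to_service(data):  # stand-in for the HTTP callback
    pass


@celery_app.task(bind=True, max_retries=5, default_retry_delay=300)
def send_delivery_status_to_service(self, notification_id):
    try:
        data = load_callback_data(notification_id)
    except SQLAlchemyError as exc:
        # e.g. connection pool exhausted: back off and retry rather than
        # dropping the callback.
        raise self.retry(exc=exc)
    post_to_service(data)
```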
Chris Hill-Scott
f902ba476c Bring in refactored CSV processing
Shouldn’t be any functional changes here, just things being named more
clearly.
2018-03-08 13:12:00 +00:00
Katie Smith
7f2e9f507e Delete functions which call the job statistics tasks
The JobStatistics table is going to be deleted. There are currently
3 tasks which use the JobStatistics model via the Statistics DAO, so we
need to make sure that these tasks aren't being used before they are
deleted in a separate PR.

This commit deletes:
* The `create_initial_notification_statistic_tasks` function which gets
used to call the `record_initial_job_statistics` task.
* The `create_outcome_notification_statistic_tasks` function which gets
used to call the `record_outcome_job_statistics` task.
* And the scheduling of the `timeout-job-statistics` scheduled task.
2018-03-07 09:23:29 +00:00
Katie Smith
0c1d5c12a3 Merge pull request #1699 from alphagov/persist-sorted-in-letter-response-file
Create and populate DailySortedLetter table
2018-03-07 09:06:33 +00:00
Katie Smith
136d89e2f1 Persist daily sorted letter counts from response file
In the update_letter_notifications_statuses task we now check whether
each row in the response file that we receive from the DVLA has the
value of 'Sorted' or 'Unsorted' in the postcode validation field. We
then calculate the number of Sorted and Unsorted rows for each day and
save each day as a row in the daily_sorted_letter table.

The data in daily_sorted_letter table should be in local time, so we
convert the datetime before saving.
2018-03-06 09:09:02 +00:00
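The per-day tally might look like this sketch, assuming each parsed row carries a naive UTC datetime and the postcode-validation value, with Europe/London as local time:

```python
from collections import Counter

import pytz

LONDON = pytz.timezone("Europe/London")


def tally_sorted_letters(rows):
    """rows: iterable of (utc_datetime, validation) pairs, where
    validation is 'Sorted' or 'Unsorted'."""
    per_day = {}
    for when_utc, validation in rows:
        # daily_sorted_letter stores local time, so convert before
        # bucketing by day.
        local_day = pytz.utc.localize(when_utc).astimezone(LONDON).date()
        per_day.setdefault(local_day, Counter())[validation.lower()] += 1
    return per_day
```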
Chris Hill-Scott
ca167206d5 Add command to backfill Performance Platform totals
We don’t have any way of playing back the totals we send to performance
platform.

This commit copies the command used to backfill the processing time and
adapts it to backfill the totals instead. Under the hood it uses the
same code that we use in the scheduled tasks to update performance
platform on a daily basis. I had to modify this code to take a `day`
argument because it was hardcoded to only work for ‘yesterday’.
2018-03-05 17:02:33 +00:00
Chris Hill-Scott
7e1aa03371 Send count of sent letters to performance platform
Now we’ve been sending real letters for quite a while it would be nice
to show how many.
2018-03-02 16:54:48 +00:00
Rebecca Law
8aea452df9 Because research mode and test keys do not create notification history, a try/except block has been added. 2018-03-02 11:37:15 +00:00
Rebecca Law
c474b2312b Process responses for letters even after the notification has been deleted.
This will continue to update the notification history for letter notifications.
We currently have an issue where the responses to letters from the provider are taking a long time.
This is due to the manual nature of their process.
Updating the status of the letter will still work if the notification has been purged.

Also turned back on the purge letter notification scheduled task.
2018-03-02 11:29:22 +00:00
Rebecca Law
d6c929d127 Another unused method to delete 2018-03-02 11:05:06 +00:00
Rebecca Law
bffc4863db Remove all methods no longer used now that we only send pdf files to DVLA. 2018-03-02 11:05:05 +00:00
Rebecca Law
304f4d5c67 Remove unused methods 2018-03-02 11:05:05 +00:00
Rebecca Law
891a80addf Added notification id to the log message for upload pdf to make it easier to search for the letter notification in the logs. 2018-03-01 10:37:07 +00:00
Rebecca Law
7faf375375 Merge pull request #1695 from alphagov/org-user-endpoints
Organisation user endpoints
2018-02-26 16:27:01 +00:00
Alexey Bezhan
8971a5adce Upload pre-compiled letter PDF to S3
Pre-compiled letter endpoint uploads the PDF contents to S3 directly
instead of creating a letter task to generate the PDF using template
preview.

This moves some of the utility functions used by existing letter
celery tasks to app.letters.utils, so that they can be reused by
the API endpoint.
2018-02-23 17:52:25 +00:00
Leo Hemsted
5b71d2f36e add org invite template to db 2018-02-23 10:45:18 +00:00
Leo Hemsted
a2a1c5e9af add organisation invite rest and dao 2018-02-23 10:45:18 +00:00
Rebecca Law
385653af44 Added a new exception type, DVLAException.
The Notify team needs to investigate when a notification is marked as failed.
We will process the whole file and mark the notifications with the appropriate status; if any have failed, an exception is raised.
The exception will trigger a CloudWatch error for the team to investigate.
2018-02-22 15:05:37 +00:00
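The 'process everything, then raise' shape, sketched with an illustrative row handler:

```python
class DVLAException(Exception):
    """Raised once the response file is fully processed if any rows
    failed, so a CloudWatch alarm fires without abandoning the file."""


def process_response_file(rows, update_notification_from_row):
    failed_references = []
    for row in rows:
        # Every row gets its status applied, failures included.
        status = update_notification_from_row(row)
        if status == "failed":
            failed_references.append(row)
    if failed_references:
        raise DVLAException(
            "{} notification(s) failed: {}".format(
                len(failed_references), failed_references
            )
        )
```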
Alexey Bezhan
8a67b714e2 Merge pull request #1651 from alphagov/eventlet-celery-workers-service-callbacks
Switch service callback workers to use eventlet pool implementation
2018-02-15 11:08:54 +00:00
Rebecca Law
e736c90d00 Switch to using the pdf letter flow.
When sending letters, always use the pdf letter flow regardless of service permissions.
2018-02-13 18:38:32 +00:00
Alexey Bezhan
9eada23392 Release DB connection before executing service API callback
Flask-SQLAlchemy sets up a connection pool with 5 connections and will
create up to 10 additional connections if all the pool ones are in use.

If all connections in the pool and all overflow connections are in
use, SQLAlchemy will block new DB sessions until a connection becomes
available. If a session can't acquire a connection for a specified
time (we set it to 30s) then a TimeoutError is raised.

By default db.session is deleted with the related context object
(so when the request is finished or the app context is discarded).

This effectively limits the number of concurrent requests/tasks with
multithreaded gunicorn/celery workers to the maximum DB connection pool
size. Most of the time these limits are fine since the API requests are
relatively quick and are mainly interacting with the database anyway.

Service callbacks, however, have to make an HTTP request to a third party.
If these requests start taking a long time and the number of threads is
larger than the number of DB connections, then the remaining threads will
start blocking and potentially failing if it takes more than 30s to
acquire a connection. For example, if 100 threads start running tasks
that take 20s each with a max DB connection pool size of 10, then the first 10
threads will acquire a connection right away, the next 10 tasks will block for
20 seconds before the initial connections are released, and all other tasks
will raise a TimeoutError after 30 seconds.

To avoid this, we perform all database operations at the beginning of
the task and then explicitly close the DB session before sending the
HTTP request to the service callback URL. Closing the session ends
the transaction and frees up the connection, making it available for
other tasks. Making calls to the DB after calling `close` will acquire
a new connection. This means that tasks are still limited to running
at most 15 queries at the same time, but can have a lot more concurrent
HTTP requests in progress.
2018-02-13 16:44:30 +00:00
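The core of the pattern, as a sketch (the payload builder is a stand-in for the real db work):

```python
import requests


def send_callback(db, notification_id, build_payload):
    # All database work happens first.
    url, data = build_payload(db.session, notification_id)

    # Hand the connection back to the pool *before* the slow third-party
    # HTTP request; any later query transparently acquires a new one.
    db.session.close()

    requests.post(url, json=data, timeout=30)
```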
Alexey Bezhan
599e611278 Create an app context for each celery task run
Celery tasks require an active app context in order to use app logger,
database or app configuration values. Since there's no builtin support
we create this context by using a custom celery task class NotifyTask.

However, NotifyTask was using `current_app`, which itself is only
available within an app context, so the code was pushing an initial
context in run_celery.py.

This works with the default prefork pool implementation, but raises
a "working outside of application context" error with an eventlet
pool, since that initial application context is local to a thread
and the eventlet celery worker pool creates multiple green threads.

To avoid this, we bind NotifyTask to the app variable with a closure
and use that variable instead of `current_app` to create the context
for executing the task. This avoids any issues caused by shared
initial app context being lost when spawning additional worker threads.

We still need to keep the context push in run_celery.py for prefork
workers since it's required for logging events outside of tasks
(e.g. logging during `worker_process_shutdown` signal processing).
2018-02-13 16:44:30 +00:00
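The closure-based fix might look roughly like this; `make_task_class` is an illustrative name, and `task_cls` is Celery's hook for a custom base task:

```python
from celery import Celery, Task
from flask import Flask


def make_task_class(app):
    # Bind the Flask app in a closure rather than relying on current_app,
    # which is only valid inside an app context and so breaks when
    # eventlet spawns green threads that never saw the initial push.
    class NotifyTask(Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return super().__call__(*args, **kwargs)

    return NotifyTask


flask_app = Flask(__name__)
celery_app = Celery("notify", task_cls=make_task_class(flask_app))
```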
Leo Hemsted
1ef6718918 Merge pull request #1646 from alphagov/add-tags-to-pdf-letters
upload letter pdfs with retention tag
2018-02-12 15:03:53 +00:00
Leo Hemsted
093e8083e0 upload letter pdfs with retention tag
so we can delete them automatically with s3's lifecycle policy
2018-02-09 17:13:37 +00:00
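With boto3 this amounts to tagging at upload time, as in this sketch (tag key and value are illustrative; the matching expiry rule lives in the bucket's lifecycle configuration):

```python
import urllib.parse

import boto3


def upload_letter_pdf(bucket_name, filename, pdf_bytes):
    s3 = boto3.resource("s3")
    s3.Object(bucket_name, filename).put(
        Body=pdf_bytes,
        # The bucket's lifecycle rule matches on this tag and expires
        # the object automatically once the retention period is up.
        Tagging=urllib.parse.urlencode({"delete-after-days": "7"}),
    )
```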
Richard Chapman
632364633b Fixed typo in call: should be self.name, not Task.name.
Task.name always returned None.
2018-02-08 17:47:59 +00:00
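For reference, the distinction being fixed, in an illustrative handler:

```python
from celery import Task


class NotifyTask(Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # Task.name is the base-class attribute and is always None here;
        # self.name is the registered name of the concrete task.
        print("task {} failed".format(self.name))
```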
Rebecca Law
8aa829e93a Merge pull request #1622 from alphagov/reduce-logging
Reducing the logging for the life cycle of a notification
2018-02-08 16:57:21 +00:00
Richard Chapman
d855b4e4ec Removed statsd from the api and use the statsd in the utils library.
The statsd code was added to the utils library a while ago; using the
statsd from the utils library consolidates the code into one place.
2018-02-06 09:52:15 +00:00
Richard Chapman
2d670e8cf0 Updated utils to the latest version. This version of utils has less
logging at info level and as such no longer prints out the celery task
timings, which were useful for finding out whether a task had been called
and also how long the task took. Added an extra timing message for
celery tasks so that it can be determined whether these are less frequent
than the API calls, and to provide more useful information.
2018-02-05 14:58:02 +00:00
kentsanggds
4773e1b51b Merge pull request #1617 from alphagov/ken-set-page-count-0-fake-dvla-response
Update the dvla response data to 0 page count
2018-02-05 10:48:28 +00:00
Rebecca Law
dce79832ff As Notify matures we probably need less logging, especially for happy-path events.
This PR is a proposal to reduce the average number of messages we see for a single notification from about 7 to 2.

Messaging would change to something like this:
February 2nd 2018, 15:39:05.885	Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
{'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']}
February 2nd 2018, 15:39:05.885	Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:57.727	SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814
February 2nd 2018, 15:38:56.727	Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181
February 2nd 2018, 15:38:56.727	Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827
February 2nd 2018, 15:38:49.449	sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
February 2nd 2018, 15:38:49.449	sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery

To something like this:
February 2nd 2018, 15:39:05.885	Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:49.449	sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
2018-02-02 15:55:25 +00:00