notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-23 17:01:35 -05:00

Author	SHA1	Message	Date
Rebecca Law	22c296b0ef	Merge pull request #1780 from alphagov/send-service-callback-if-sent_at_is-None Send service callback if sent at is none	2018-03-19 16:32:15 +00:00
Rebecca Law	fdfd6838a6	Fix error message. The id in the message is referring to a notification not a service	2018-03-19 15:24:59 +00:00
Rebecca Law	cd2d85f2a3	Updates after code review. - Remove print - Update exception message.	2018-03-19 14:08:38 +00:00
Rebecca Law	0dc50190b2	Throw an exception whenever we update a notification to technical failure. If this is happening, we want to know about it.	2018-03-16 17:18:44 +00:00
Rebecca Law	c9477a7400	A notification may be timed out by the scheduled task because it has not been sent, which means its sent_at date can be empty, causing the service callback to fail. - Allow the code to work if notification.sent_at or updated_at is None - Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.	2018-03-16 14:47:56 +00:00
venusbb	7e2947790f	merged master and bumped migration version	2018-03-16 10:57:23 +00:00
venusbb	bb95a2784f	Create scheduled job, fixed tests	2018-03-16 09:22:34 +00:00
venusbb	ea70c6454a	Fine-tuning DB model and create tasks for data migration - Removed unused columns in ft_billing - Create tasks for nightly data migration	2018-03-14 17:07:33 +00:00
Rebecca Law	82cc6d6bef	It turns out that the s3ftp used to mount the S3 bucket on our FTP server puts each file on S3 more than once, so the SNS topic is triggered more than once. We need to deal with this: it's fine when a notification status is updated from delivered to delivered, but the DailySortedLetter counts are being doubled. Add file_name to the table as a unique key to get around this issue. This means we will have multiple rows for each billing_day, but that's OK because we can aggregate them. It will also give us a way to see which file created which count.	2018-03-14 17:04:58 +00:00
Katie Smith	41a9f5a06e	Change format of the letter response file we expect. We were previously expecting the letter response files to be in the format 'NOTIFY.<datetime>.RSP.TXT', but the response files we receive use '-' instead of '.' in the filenames, which was causing an error when we tried to get the date from the filename.	2018-03-14 09:25:51 +00:00
Katie Smith	79b5a735e2	Ensure update letter notification status always handles temporary failure. In the 'update-letter-notifications-statuses' task we want to ensure that temporary failures are always handled, regardless of whether the response file we receive contains unknown Sorted statuses.	2018-03-14 09:25:51 +00:00
Rebecca Law	68d658086b	Merge pull request #1759 from alphagov/fix-bug-timeout-notifications Fix the bug in timeout notifications.	2018-03-13 09:34:33 +00:00
Rebecca Law	144356f096	Fix the bug in timeout notifications. When a notification was timed out by the scheduled task and the service was expecting a status update, that update to the service would fail. A test has been added.	2018-03-12 12:15:03 +00:00
Katie Smith	4f7dd1d258	Delete job statistics tasks. The tasks are no longer being used, so can be deleted safely: * record_initial_job_statistics * record_outcome_job_statistics * timeout-job-statistics The test file for the statistics tasks was deleted in a previous commit.	2018-03-12 10:48:46 +00:00
Leo Hemsted	ea2b0dfbc9	set processing_started before earlier jobs are processed. process_incomplete_jobs loops through jobs, processing them in a single task. This means that if the job statuses are all 'error' and the process_incomplete_jobs task then fails, the later jobs in the list that never got picked up won't have their status set back to in progress or their processing_started time updated, so they will be stuck in 'error' forever. Instead, we set the job statuses to in progress and the start time to now before we process any, so if the incomplete_jobs task fails later, the jobs will be picked up (again) by the check_job_statuses task in half an hour's time.	2018-03-09 17:20:26 +00:00
Leo Hemsted	64bb94af9e	set job status to error in the check_job_status scheduled task. The process_incomplete_jobs task runs through all incomplete jobs in a loop, so it might not get a chance to update the processing_started time of the last job before check_job_status runs again (every minute). So before we even trigger the process_incomplete_jobs task, let's set the status of the jobs to error, so that we don't identify them for re-processing again.	2018-03-09 16:42:58 +00:00
Leo Hemsted	f0ca3d40de	reset job processing time when re-processing incomplete jobs. We might stop processing jobs midway through if, for example, a deploy or downscale kills the box working on them. We have a scheduled task that identifies any job that we started processing more than half an hour ago that is still processing. However, we encountered a bug where we triggered the process_incomplete_job task multiple times, because the job's processing_started was still set to half an hour ago. If we reset processing_started to the current time, the job won't get picked up by future runs of the check_job_status scheduled task.	2018-03-09 16:30:50 +00:00
Chris Hill-Scott	c7cc7822f7	Merge pull request #1738 from alphagov/refactored-csv-processing Bring in refactored CSV processing	2018-03-09 14:28:48 +00:00
Rebecca Law	a3d04ca672	Improve log message	2018-03-09 12:01:08 +00:00
Rebecca Law	00b17b5ad7	When we send the service the status callback for a notification, we already have all the information we need, which means we can remove the need to request the data from the database. To keep the PR backwards compatible, I have added an optional parameter "encrypted_status_update". If this is not None, the new code is called. The next PR will send the encrypted data to this task. A final PR will remove the code that uses the database to get the notification and service callback API.	2018-03-08 16:17:41 +00:00
Leo Hemsted	651c3062b9	retry service callbacks if the db queries fail. We don't expect them to fail, but they might if we accidentally exhaust our connection pool. Just in case, let's retry.	2018-03-08 14:08:56 +00:00
Chris Hill-Scott	f902ba476c	Bring in refactored CSV processing. There shouldn’t be any functional changes here, just things being named more clearly.	2018-03-08 13:12:00 +00:00
Katie Smith	7f2e9f507e	Delete functions which call the job statistics tasks. The JobStatistics table is going to be deleted. There are currently 3 tasks which use the JobStatistics model via the Statistics DAO, so we need to make sure that these tasks aren't being used before they are deleted in a separate PR. This commit deletes: * The `create_initial_notification_statistic_tasks` function which gets used to call the `record_initial_job_statistics` task. * The `create_outcome_notification_statistic_tasks` function which gets used to call the `record_outcome_job_statistics` task. * And the scheduling of the `timeout-job-statistics` scheduled task.	2018-03-07 09:23:29 +00:00
Katie Smith	0c1d5c12a3	Merge pull request #1699 from alphagov/persist-sorted-in-letter-response-file Create and populate DailySortedLetter table	2018-03-07 09:06:33 +00:00
Katie Smith	136d89e2f1	Persist daily sorted letter counts from response file. In the update_letter_notifications_statuses task we now check whether each row in the response file that we receive from the DVLA has the value 'Sorted' or 'Unsorted' in the postcode validation field. We then calculate the number of Sorted and Unsorted rows for each day and save each day as a row in the daily_sorted_letter table. The data in the daily_sorted_letter table should be in local time, so we convert the datetime before saving.	2018-03-06 09:09:02 +00:00
Chris Hill-Scott	ca167206d5	Add command to backfill Performance Platform totals. We don’t have any way of playing back the totals we send to Performance Platform. This commit copies the command used to backfill the processing time and adapts it to backfill the totals instead. Under the hood it uses the same code that we use in the scheduled tasks to update Performance Platform on a daily basis. I had to modify this code to take a `day` argument because it was hardcoded to only work for ‘yesterday’.	2018-03-05 17:02:33 +00:00
Chris Hill-Scott	7e1aa03371	Send count of sent letters to Performance Platform. Now that we’ve been sending real letters for quite a while, it would be nice to show how many.	2018-03-02 16:54:48 +00:00
Rebecca Law	8aea452df9	Because research mode and test keys do not create notification history, a try/except block has been added.	2018-03-02 11:37:15 +00:00
Rebecca Law	c474b2312b	Process responses for letters even after the notification has been deleted. This will continue to update the notification history for letter notifications. We currently have an issue where responses to letters from the provider are taking a long time, due to the manual nature of their process. Updating the status of a letter will still work if the notification has been purged. Also turned the purge letter notification scheduled task back on.	2018-03-02 11:29:22 +00:00
Rebecca Law	d6c929d127	Another unused method to delete	2018-03-02 11:05:06 +00:00
Rebecca Law	bffc4863db	Remove all methods no longer used now that we only send pdf files to DVLA.	2018-03-02 11:05:05 +00:00
Rebecca Law	304f4d5c67	Remove unused methods	2018-03-02 11:05:05 +00:00
Rebecca Law	891a80addf	Added notification id to the log message for upload pdf to make it easier to search for the letter notification in the logs.	2018-03-01 10:37:07 +00:00
Rebecca Law	7faf375375	Merge pull request #1695 from alphagov/org-user-endpoints Organisation user endpoints	2018-02-26 16:27:01 +00:00
Alexey Bezhan	8971a5adce	Upload pre-compiled letter PDF to S3. The pre-compiled letter endpoint uploads PDF contents to S3 directly instead of creating a letter task to generate the PDF using template preview. This moves some of the utility functions used by existing letter celery tasks to app.letters.utils, so that they can be reused by the API endpoint.	2018-02-23 17:52:25 +00:00
Leo Hemsted	5b71d2f36e	add org invite template to db	2018-02-23 10:45:18 +00:00
Leo Hemsted	a2a1c5e9af	add organisation invite rest and dao	2018-02-23 10:45:18 +00:00
Rebecca Law	385653af44	Added a new exception type, DVLAException. The Notify team needs to investigate when a notification is marked as failed. We will process the whole file and mark the notifications with the appropriate status; if any have failed, an exception is raised. The exception will trigger a CloudWatch error for the team to investigate.	2018-02-22 15:05:37 +00:00
Alexey Bezhan	8a67b714e2	Merge pull request #1651 from alphagov/eventlet-celery-workers-service-callbacks Switch service callback workers to use eventlet pool implementation	2018-02-15 11:08:54 +00:00
Rebecca Law	e736c90d00	Switch to using the pdf letter flow. When sending letters, always use the pdf letter flow regardless of service permissions.	2018-02-13 18:38:32 +00:00
Alexey Bezhan	9eada23392	Release DB connection before executing service API callback. Flask-SQLAlchemy sets up a connection pool with 5 connections and will create up to 10 additional connections if all the pool ones are in use. If all connections in the pool and all overflow connections are in use, SQLAlchemy will block new DB sessions until a connection becomes available. If a session can't acquire a connection within a specified time (we set it to 30s), a TimeoutError is raised. By default db.session is deleted with the related context object (so when the request is finished or the app context is discarded). This effectively limits the number of concurrent requests/tasks with multithreaded gunicorn/celery workers to the maximum DB connection pool size. Most of the time these limits are fine, since the API requests are relatively quick and mainly interact with the database anyway. Service callbacks, however, have to make an HTTP request to a third party. If these requests start taking a long time and the number of threads is larger than the number of DB connections, the remaining threads will start blocking and potentially failing if it takes more than 30s to acquire a connection. For example, if 100 threads start running tasks that take 20s each with a maximum DB connection pool size of 10, the first 10 threads will acquire a connection right away, the next 10 tasks will block for 20 seconds until the initial connections are released, and all other tasks will raise a TimeoutError after 30 seconds. To avoid this, we perform all database operations at the beginning of the task and then explicitly close the DB session before sending the HTTP request to the service callback URL. Closing the session ends the transaction and frees up the connection, making it available for other tasks. Making calls to the DB after calling `close` will acquire a new connection. 
This means that tasks are still limited to running at most 15 queries at the same time, but can have a lot more concurrent HTTP requests in progress.	2018-02-13 16:44:30 +00:00
Alexey Bezhan	599e611278	Create an app context for each celery task run. Celery tasks require an active app context in order to use app logger, database or app configuration values. Since there's no builtin support we create this context by using a custom celery task class NotifyTask. However, NotifyTask was using `current_app`, which itself is only available within an app context so the code was pushing an initial context in run_celery.py. This works with the default prefork pool implementation, but raises a "working outside of application context" error with an eventlet pool since that initial application context is local to a thread and eventlet celery worker pool will create multiple green threads. To avoid this, we bind NotifyTask to the app variable with a closure and use that variable instead of `current_app` to create the context for executing the task. This avoids any issues caused by shared initial app context being lost when spawning additional worker threads. We still need to keep the context push in run_celery.py for prefork workers since it's required for logging events outside of tasks (eg logging during `worker_process_shutdown` signal processing).	2018-02-13 16:44:30 +00:00
Leo Hemsted	1ef6718918	Merge pull request #1646 from alphagov/add-tags-to-pdf-letters upload letter pdfs with retention tag	2018-02-12 15:03:53 +00:00
Leo Hemsted	093e8083e0	upload letter pdfs with retention tag so we can delete them automatically with s3's lifecycle policy	2018-02-09 17:13:37 +00:00
Richard Chapman	632364633b	Fixed typo in call, should be self.name not Task.name. Task.name always returned None.	2018-02-08 17:47:59 +00:00
Rebecca Law	8aa829e93a	Merge pull request #1622 from alphagov/reduce-logging Reducing the logging for the life cycle of a notification	2018-02-08 16:57:21 +00:00
Richard Chapman	d855b4e4ec	Removed statsd from the api and use the statsd in the utils library. The statsd code was added to the utils library a while ago; using the statsd from the utils library consolidates the code into one place.	2018-02-06 09:52:15 +00:00
Richard Chapman	2d670e8cf0	Updated utils to the latest version. This version of utils has less logging at info level and so no longer prints out the celery task timings, which were found to be useful both to find out whether a task has been called and for the timing of the task. Added an extra timing message for celery tasks so that it can be determined whether these are less frequent than the API calls and provide more useful information.	2018-02-05 14:58:02 +00:00
kentsanggds	4773e1b51b	Merge pull request #1617 from alphagov/ken-set-page-count-0-fake-dvla-response Update the dvla response data to 0 page count	2018-02-05 10:48:28 +00:00
Rebecca Law	dce79832ff	As Notify matures we probably need less logging, especially to report happy path events. This PR is a proposal to reduce the average number of messages we see for a single notification from about 7 to 2. Messaging would change from something like this: February 2nd 2018, 15:39:05.885 Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126 {'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']} February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126 February 2nd 2018, 15:38:57.727 SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814 February 2nd 2018, 15:38:56.727 Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181 February 2nd 2018, 15:38:56.727 Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827 February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113 February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery To something like this: February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126 February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113	2018-02-02 15:55:25 +00:00
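
The connection-pool arithmetic in commit 9eada23392 (100 threads, 20s tasks, a pool of 10, a 30s acquire timeout) can be checked with a small simulation. This is a hypothetical sketch, not code from notifications-api: `simulate_pool` is an illustrative name, and it assumes every task starts at t=0, holds one connection for its whole duration, and that waiting tasks are served first come, first served.

```python
def simulate_pool(num_tasks, pool_size, task_seconds, timeout_seconds):
    """Return (succeeded, timed_out) for tasks competing for a DB pool.

    Assumes all tasks start at t=0, each holds one connection for
    task_seconds, and waiting tasks are served first come, first served.
    """
    # free_at[i] is the time at which connection i next becomes available.
    free_at = [0.0] * pool_size
    succeeded = 0
    timed_out = 0
    for _ in range(num_tasks):
        # The next waiting task gets whichever connection frees up first.
        i = min(range(pool_size), key=lambda k: free_at[k])
        wait = free_at[i]
        if wait > timeout_seconds:
            # Waited longer than the pool timeout (a TimeoutError in SQLAlchemy);
            # the task never gets a connection, so free_at is unchanged.
            timed_out += 1
            continue
        free_at[i] = wait + task_seconds
        succeeded += 1
    return succeeded, timed_out


print(simulate_pool(100, 10, 20, 30))  # (20, 80)
```

With the commit's figures, 20 tasks get a connection (10 immediately, 10 after waiting 20s) and the other 80 time out. If each task instead held its connection only for a short burst of DB work, say 0.1s, before closing the session, `simulate_pool(100, 10, 0.1, 30)` returns `(100, 0)`, which is the effect the commit aims for by closing the session before the HTTP callback.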

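Commit 82cc6d6bef describes adding file_name to the daily sorted letter table as a unique key so that a response file delivered twice does not double the counts, then aggregating the per-file rows for each billing_day. A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for the real database; the column names sorted_count and unsorted_count, the (billing_day, file_name) key, the helper names and the file names are all hypothetical, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (billing_day, file_name) pair.
conn.execute(
    """
    CREATE TABLE daily_sorted_letter (
        billing_day TEXT NOT NULL,
        file_name TEXT NOT NULL,
        sorted_count INTEGER NOT NULL,
        unsorted_count INTEGER NOT NULL,
        UNIQUE (billing_day, file_name)
    )
    """
)


def record_counts(billing_day, file_name, sorted_count, unsorted_count):
    # A second delivery of the same file replaces its row instead of adding one.
    conn.execute(
        "INSERT OR REPLACE INTO daily_sorted_letter VALUES (?, ?, ?, ?)",
        (billing_day, file_name, sorted_count, unsorted_count),
    )


def totals_for(billing_day):
    # Several files may contribute to one day, so sum across their rows.
    return conn.execute(
        "SELECT COALESCE(SUM(sorted_count), 0), COALESCE(SUM(unsorted_count), 0) "
        "FROM daily_sorted_letter WHERE billing_day = ?",
        (billing_day,),
    ).fetchone()


record_counts("2018-03-13", "file-1.txt", 5, 1)
record_counts("2018-03-13", "file-1.txt", 5, 1)  # same file arrives again
record_counts("2018-03-13", "file-2.txt", 2, 0)
print(totals_for("2018-03-13"))  # (7, 1)
```

The duplicate delivery of file-1.txt leaves a single row, so the day's totals count it once, while the per-file rows still show which file produced which count.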

709 Commits