notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-13 16:52:23 -05:00

Author	SHA1	Message	Date
Kenneth Kehl	46c5cb24fe	lazy init encryption	2025-10-06 12:16:34 -07:00
Kenneth Kehl	ebdb78e52e	reformat up to latest version of black	2024-04-01 15:12:33 -07:00
Kenneth Kehl	1ecb747c6d	reformat	2023-08-29 14:54:30 -07:00
stvnrlly	e9fdfd59f4	clean flake8 except provider code	2022-10-19 16:16:26 +00:00
jimmoffet	434b7b2d08	clean up and remove redundancy	2022-10-04 16:01:30 -07:00
David McDonald	0d952b4d8c	Reduce timeout for service callback attempt to 5 seconds It is currently 60 seconds but we have had two incidents in the past week where there is a connection error talking to a service and the request takes up to 60 seconds before failing. When this happens, if there are a few of these callbacks then all of them will completely hog the service callback worker and build up a big queue of all the other service callbacks. 5 seconds has been chosen as that is still a pretty decent length time for a simple web request that should just be giving them a little bit of information for them to store. 5 seconds should be a sufficient enough reduction that we dramatically reduce this problem for the moment. Open to this number being changed in the future based on how we see it perform.	2022-03-08 13:05:32 +00:00
Leo Hemsted	2ad9a3a380	retry service callbacks on 429. If we're served a 429, put the item on the retry queue and retry the same as if the service returned a 5xx. 429 is commonly returned for rate limit exceeding, and retrying on a delay is a typical response to that.	2021-07-13 16:09:17 +01:00
Ben Thorner	e3e067c795	Remove redundant @statsd timing decorators These are superseded by timing task execution generically in the NotifyTask superclass [1]. Note that we need to wait until we've gathered enough data under the new metrics before removing these. [1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376	2021-04-12 15:19:18 +01:00
Katie Smith	3bfd084a77	Simplify send_delivery_status_to_service code Now that https://github.com/alphagov/notifications-api/pull/3184 has been deployed for a while, the `send_delivery_status_to_service` task will always have `template_id` and `template_version` being passed in. This means we don't need to check if those fields are there.	2021-03-30 08:50:42 +01:00
Katie Smith	27b3cece7d	Send template id and version with delivery status callback This adds the `template_id` and `template_version` fields to the data sent to services from the `send_delivery_status_to_service` task. We need to account for the task not being passed these fields at first since there might be tasks retrying which don't have that data. Once all tasks have been called with the new fields we can then update the code to assume they are always there. Since we only send delivery status callbacks for SMS and emails, I've removed the tests where we call that task with letters.	2021-03-24 10:55:45 +00:00
Ben Thorner	a91fde2fda	Run auto-correct on app/ and tests/	2021-03-12 11:45:45 +00:00
Katie Smith	5eebcf6452	Put service callback retries on a different queue At the moment, if a service callback fails, it will get put on the retry queue. This causes a potential problem though: If a service's callback server goes down, we may generate a lot of retries and this may then put a lot of items on the retry queue. The retry queue is also responsible for other important parts of Notify such as retrying message delivery and we don't want a service's callback server going down to have an impact on the rest of Notify. Putting the retries on a different queue means that tasks get processed faster than if they were put back on the same 'service-callbacks' queue.	2021-02-09 13:31:16 +00:00
Pea Tyczynska	95deb5a52f	Move DATETIME_FORMAT from app to app.utils To avoid cyclical import issues	2020-12-18 17:39:35 +00:00
David McDonald	224d9bf35a	Log when we don't retry a callback We don't retry any callbacks when it receives a 4xx status. We should probably be aware of this happening and at the moment there is nothing in our logs to easily identify whether the request failed and is being retried or if it failed and is not being retried. This will enable us to search our logs easily and figure out how much it's happening. It's quite likely that we should in the future allow callbacks to retry if they get a 429 http response (rate limiting) but we should do this in a smart way (exponential backoff) and so this is a first step to being aware of how big a problem it is in case we want to do something about it.	2020-11-17 11:26:32 +00:00
Pea Tyczynska	e033f3300b	Degrade MaxRetriesExceededError to warning status in logger This is because that error is caused by our providers and we cannot do anything about it but it can make our logs hard to read and actionable errors harder to spot	2019-06-27 14:55:10 +01:00
Rebecca Law	00f04c33c8	Some minor refactoring. - Updated notifications_dao.update_notification_status_by_id with an optional parameter to set the sent_by; this will eliminate a separate update to notifications. - Added the callback url to the log message, that way we can see if it's the same url failing. - Stop sending the status callbacks for PENDING status.	2018-10-24 11:24:53 +01:00
Leo Hemsted	bc3fab09d0	don't log exception info for retries. It includes task args, which might contain PII. And we don't need to know where the retry exception came from - it came from the line above	2018-10-22 11:33:16 +01:00
Pea Tyczynska	3048c05850	Revert notification_id name change in delivery_status service_callback_task data	2018-07-20 11:22:38 +01:00
Pea Tyczynska	812f4d20dd	Send complaints on to service callback APIs using an async task	2018-07-19 16:59:39 +01:00
Rebecca Law	ee46803a12	The send_delivery_status_to_service task was refactored to take the details of the notification and service api callback such that the task no longer needed to go to the database to provide the status update. This PR removes the code that is no longer used. This extra step was necessary to keep the tasks backward compatible.	2018-03-19 17:38:20 +00:00
Rebecca Law	fdfd6838a6	Fix error message. The id in the message is referring to a notification not a service	2018-03-19 15:24:59 +00:00
Rebecca Law	c9477a7400	When a notification is timed out in the scheduled task that may happen because the notification has not been sent. Which means the sent_at date for the notification could be empty causing the service callback to fail. - Allow code to work if notification.sent_at or updated_at is None - Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.	2018-03-16 14:47:56 +00:00
Rebecca Law	a3d04ca672	Improve log message	2018-03-09 12:01:08 +00:00
Rebecca Law	00b17b5ad7	When we sent the service the status callback for a notification, we have all the information we need. Which means we can remove the need to request the data from the database. In order for the PR to be backwards compatible I have added an optional parameter "encrypted_status_update". If this is not None then the new code is called. The next PR will send the encrypted data to this task. A final PR will remove the code that uses the database to get the notification and service callback api.	2018-03-08 16:17:41 +00:00
Leo Hemsted	651c3062b9	retry service callbacks if the db queries fail. We don't expect them to fail, but they might if we accidentally exhaust our connection pool. Just in case, let's retry.	2018-03-08 14:08:56 +00:00
Rebecca Law	891a80addf	Added notification id to the log message for upload pdf to make it easier to search for the letter notification in the logs.	2018-03-01 10:37:07 +00:00
Alexey Bezhan	9eada23392	Release DB connection before executing service API callback. Flask-SQLAlchemy sets up a connection pool with 5 connections and will create up to 10 additional connections if all the pool ones are in use. If all connections in the pool and all overflow connections are in use, SQLAlchemy will block new DB sessions until a connection becomes available. If a session can't acquire a connection for a specified time (we set it to 30s) then a TimeoutError is raised. By default db.session is deleted with the related context object (so when the request is finished or app context is discarded). This effectively limits the number of concurrent requests/tasks with multithreaded gunicorn/celery workers to the maximum DB connection pool size. Most of the time these limits are fine since the API requests are relatively quick and are mainly interacting with the database anyway. Service callbacks however have to make an HTTP request to a third party. If these requests start taking a long time and the number of threads is larger than the number of DB connections then remaining threads will start blocking and potentially failing if it takes more than 30s to acquire a connection. For example if 100 threads start running tasks that take 20s each with a max DB connection pool size of 10 then the first 10 threads will acquire a connection right away, the next 10 tasks will block for 20 seconds before the initial connections are released and all other tasks will raise a TimeoutError after 30 seconds. To avoid this, we perform all database operations at the beginning of the task and then explicitly close the DB session before sending the HTTP request to the service callback URL. Closing the session ends the transaction and frees up the connection, making it available for other tasks. Making calls to the DB after calling `close` will acquire a new connection. This means that tasks are still limited to running at most 15 queries at the same time, but can have a lot more concurrent HTTP requests in progress.	2018-02-13 16:44:30 +00:00
Richard Chapman	d855b4e4ec	Removed statsd from the api and use the statsd in the utils library. The statsd code was added to the utils library a while ago; this change uses the statsd from the utils library and therefore consolidates the code into one place.	2018-02-06 09:52:15 +00:00
Rebecca Law	a26588decd	Update the json in the service callback task to read completed_at rather than updated_at	2017-12-11 17:14:36 +00:00
venusbb	81ead8b246	code style fix	2017-12-05 11:23:21 +00:00
venusbb	5482ee4fe7	- wrap apply_async parameter notification_id in a str() argument - check if service_callback_api exist before putting tasks on queue - create_service_callback_api in tests before asserting if send_delivery_status_to_service has been called.	2017-12-04 17:58:38 +00:00
venusbb	489f43a2c9	rename callback_tasks.py to process_ses_receipts.py create service_callback_tasks.py for tasks to send delivery statuses to services	2017-12-01 16:15:21 +00:00
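
The retry rule described in commits 2ad9a3a380 and 224d9bf35a (retry on 5xx and on 429, do not retry on other 4xx) can be sketched as a small predicate. This is only an illustration of the rule as the commit messages state it, not the repository's code; the function name `should_retry_callback` is hypothetical.

```python
from http import HTTPStatus


def should_retry_callback(status_code: int) -> bool:
    """Decide whether a failed service callback goes back on the retry queue.

    Hypothetical helper: the name and shape are assumptions made for
    illustration, based only on the commit messages above.
    """
    # 429 usually signals rate limiting, so retrying after a delay is reasonable.
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:
        return True
    # 5xx responses are treated as transient failures on the service's side.
    if status_code >= 500:
        return True
    # Any other status (including the remaining 4xx codes) is not retried.
    return False
```

Under this sketch a 503 or a 429 is retried, while a 404 is only logged and dropped.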

32 Commits
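
The worked example in commit 9eada23392 (100 threads, 20s tasks, a pool of 10 connections, a 30s acquire timeout) can be checked with a small simulation. This is a simplified model under stated assumptions: every task starts at t=0, holds its connection for a fixed time, and a waiter gives up once its wait would exceed the timeout. The function `simulate_pool` and its parameters are hypothetical and are not part of the repository or of SQLAlchemy.

```python
import heapq


def simulate_pool(num_tasks, pool_size, hold_seconds, acquire_timeout):
    """Return (acquired, timed_out) for tasks that all start waiting at t=0.

    Simplified model (an assumption, not SQLAlchemy's implementation): each
    task holds one connection for hold_seconds, and a task times out if no
    connection becomes free within acquire_timeout seconds.
    """
    # Min-heap of the times at which each connection next becomes free.
    free_at = [0.0] * pool_size
    heapq.heapify(free_at)

    acquired = 0
    timed_out = 0
    for _ in range(num_tasks):
        earliest = free_at[0]  # the wait this task would have to endure
        if earliest <= acquire_timeout:
            # Take the connection and hold it for the duration of the task.
            heapq.heapreplace(free_at, earliest + hold_seconds)
            acquired += 1
        else:
            timed_out += 1
    return acquired, timed_out


# Connection held for the whole 20s task, as before the change.
held_for_whole_task = simulate_pool(100, 10, 20, 30)

# Connection released before the HTTP call; 0.1s of DB work is an assumed figure.
released_early = simulate_pool(100, 10, 0.1, 30)
```

In the first case 20 tasks get a connection (10 immediately, 10 after waiting 20s) and the other 80 time out, which matches the commit message. In the second case all 100 tasks get a connection, because each one is held only briefly.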