Commit Graph

32 Commits

Author SHA1 Message Date
Kenneth Kehl
46c5cb24fe lazy init encryption 2025-10-06 12:16:34 -07:00
Kenneth Kehl
ebdb78e52e reformat up to latest version of black 2024-04-01 15:12:33 -07:00
Kenneth Kehl
1ecb747c6d reformat 2023-08-29 14:54:30 -07:00
stvnrlly
e9fdfd59f4 clean flake8 except provider code 2022-10-19 16:16:26 +00:00
jimmoffet
434b7b2d08 clean up and remove redundancy 2022-10-04 16:01:30 -07:00
David McDonald
0d952b4d8c Reduce timeout for service callback attempt to 5 seconds
It is currently 60 seconds but we have had two incidents in the
past week where there is a connection error talking to a service
and the request takes up to 60 seconds before failing. When this
happens, if there are a few of these callbacks then all of them
will completely hog the service callback worker and build up a big
queue of all the other service callbacks.

5 seconds has been chosen as that is still a pretty decent length
time for a simple web request that should just be giving them a
little bit of information for them to store. 5 seconds should be a
sufficient enough reduction that we dramatically reduce this problem
for the moment.

Open to this number
being changed in the future based on how we see it perform.
2022-03-08 13:05:32 +00:00
Leo Hemsted
2ad9a3a380 retry service callbacks on 429
if we're served a 429, put the item on the retry queue and retry the
same as if the service returned a 5xx. 429 is commonly returned for rate
limit exceeding, and retrying on a delay is a typical response to that.
2021-07-13 16:09:17 +01:00
Ben Thorner
e3e067c795 Remove redundant @statsd timing decorators
These are superseded by timing task execution generically in the
NotifyTask superclass [1]. Note that we need to wait until we've
gathered enough data under the new metrics before removing these.

[1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376
2021-04-12 15:19:18 +01:00
Katie Smith
3bfd084a77 Simplify send_delivery_status_to_service code
Now that https://github.com/alphagov/notifications-api/pull/3184 has
been deployed for a while, the `send_delivery_status_to_service` task will
always have `template_id` and `template_version` being passed in. This
means we don't need to check if those fields are there.
2021-03-30 08:50:42 +01:00
Katie Smith
27b3cece7d Send template id and version with delivery status callback
This adds the `template_id` and `template_version` fields to the data
sent to services from the `send_delivery_status_to_service` task.

We need to account for the task not being passed these fields at first
since there might be tasks retrying which don't have that data. Once all
tasks have been called with the new fields we can then update the code
to assume they are always there.

Since we only send delivery status callbacks for SMS and emails, I've
removed the tests where we call that task with letters.
2021-03-24 10:55:45 +00:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Katie Smith
5eebcf6452 Put service callback retries on a different queue
At the moment, if a service callback fails, it will get put on the retry queue.
This causes a potential problem though:

If a service's callback server goes down, we may generate a lot of retries and
this may then put a lot of items on the retry queue. The retry queue is also
responsible for other important parts of Notify such as retrying message
delivery and we don't want a service's callback server going down to have an
impact on the rest of Notify.

Putting the retries on a different queue means that tasks get processed
faster than if they were put back on the same 'service-callbacks' queue.
2021-02-09 13:31:16 +00:00
Pea Tyczynska
95deb5a52f Move DATETIME_FORMAT from app to app.utils
To avoid cyclical import issues
2020-12-18 17:39:35 +00:00
David McDonald
224d9bf35a Log when we don't retry a callback
We don't retry any callbacks when it receives a 4xx status. We should
probably be aware of this happening and at the moment there is nothing
in our logs to easily identify whether the request failed and is being
retried or if it failed and is not being retried. This will enable us to
search our logs easily and figure out how much it's happening.

It's quite likely that we should in the future allow callbacks to retry
if they get a 429 http response (rate limiting) but we should do this in
a smart way (exponential backoff) and so this is a first step to being
aware of how big a problem it is in case we want to do something about
it.
2020-11-17 11:26:32 +00:00
Pea Tyczynska
e033f3300b Degrade MaxRetriesExceededError to warning status in logger
This is because that error is caused by our providers and we
cannot do anything about it but it can make our logs hard to read
and actionable errors harder to spot
2019-06-27 14:55:10 +01:00
Rebecca Law
00f04c33c8 Some minor refactoring.
- Updated notifications_dao.update_notification_status_by_id with an optional parameter to set the sent_by, this will eliminate a separate update to notifcaitons.
- Added the callback url to the log message, that way we can see if it's the same url failing.
- Stop sending the status callbacks for PENDING status.
2018-10-24 11:24:53 +01:00
Leo Hemsted
bc3fab09d0 don't log exception info for retries
it includes task args, which might contain PII. And we don't need to
know where the retry exception came from - it came from the line above
2018-10-22 11:33:16 +01:00
Pea Tyczynska
3048c05850 Revert notification_id name change in delivery_status service_callback_task data 2018-07-20 11:22:38 +01:00
Pea Tyczynska
812f4d20dd Send complaints on to service callback APIs using an async task 2018-07-19 16:59:39 +01:00
Rebecca Law
ee46803a12 The send_delivery_status_to_service task was refactor to take the details of the notification and service api callback such that the task no longer needed to go to the database to provide the status update.
This PR removes the code that is no longer used. This extra step was necessary to keep the tasks backward compatible.
2018-03-19 17:38:20 +00:00
Rebecca Law
fdfd6838a6 Fix error message.
The id in the message is referring to a notification not a service
2018-03-19 15:24:59 +00:00
Rebecca Law
c9477a7400 When a notification is timed out in the scheduled task that may happen because the notification has not been sent.
Which means the sent_at date for the notification could be empty causing the service callback to fail.

- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
2018-03-16 14:47:56 +00:00
Rebecca Law
a3d04ca672 Improve log message 2018-03-09 12:01:08 +00:00
Rebecca Law
00b17b5ad7 When we sent the service the status callback for a notification, we have all the information we need.
Which means we can remove the need to request the data from the database.
In order for the PR to be backwards compatible I have added an optional parameter "encrypted_status_update".
If this is not None then the new code is called.

The next PR will send the encrypted data to this task.

A final PR will remove the code that uses the database to get the notification and service callback api.
2018-03-08 16:17:41 +00:00
Leo Hemsted
651c3062b9 retry service callbacks if the db queries fail
we don't expect them to fail, but they might if we accidentally
exhaust our connection pool. Just in case, lets retry.
2018-03-08 14:08:56 +00:00
Rebecca Law
891a80addf Added notification id to the log message for upload pdf to make it easier to search for the letter notification in the logs. 2018-03-01 10:37:07 +00:00
Alexey Bezhan
9eada23392 Release DB connection before executing service API callback
Flask-SQLAlchemy sets up a connection pool with 5 connections and will
create up to 10 additional connections if all the pool ones are in use.

If all connections in the pool and all overflow connections are in
use, SQLAlchemy will block new DB sessions until a connection becomes
available. If a session can't acquire a connections for a specified
time (we set it to 30s) then a TimeoutError is raised.

By default db.session is deleted with the related context object
(so when the request is finished or app context is discarded).

This effectively limits the number of concurrent requests/tasks with
multithreaded gunicorn/celery workers to the maximum DB connection pool
size. Most of the time these limits are fine since the API requests are
relatively quick and are mainly interacting with the database anyway.

Service callbacks however have to make an HTTP request to a third party.
If these requests start taking a long time and the number of threads is
larger than the number of DB connections then remaining threads will
start blocking and potentially failing if it takes more than 30s to
acquire a connection. For example if a 100 threads start running tasks
that take 20s each with a max DB connection pool size of 10 then first 10
threads will acquire a connection right away, next 10 tasks will block for
20 seconds before the initial connections are released and all other tasks
will raise a TimeoutError after 30 seconds.

To avoid this, we perform all database operations at the beginning of
the task and then explicitly close the DB session before sending the
HTTP request to the service callback URL. Closing the session ends
the transaction and frees up the connection, making it available for
other tasks. Making calls to the DB after calling `close` will acquire
a new connection. This means that tasks are still limited to running
at most 15 queries at the same time, but can have a lot more concurrent
HTTP requests in progress.
2018-02-13 16:44:30 +00:00
Richard Chapman
d855b4e4ec Removed statsd from the api and use the statsd in the utils library.
The statsd code was added to the utils library a while ago, uses the
statsd from the util library and therefore consolidates the code into
once place.
2018-02-06 09:52:15 +00:00
Rebecca Law
a26588decd Update the json in the service callback task to read completed_at rather than updated_at 2017-12-11 17:14:36 +00:00
venusbb
81ead8b246 code style fix 2017-12-05 11:23:21 +00:00
venusbb
5482ee4fe7 - wrap apply_async parameter notification_id in a str() argument
- check if service_callback_api exist before putting tasks on queue
- create_service_callback_api in tests before asserting if send_delivery_status_to_service has been called.
2017-12-04 17:58:38 +00:00
venusbb
489f43a2c9 rename callback_tasks.py to process_ses_receipts.py
create service_callback_tasks.py for tasks to send delivery statuses to services
2017-12-01 16:15:21 +00:00