Commit Graph

19 Commits

Author SHA1 Message Date
David McDonald
224d9bf35a Log when we don't retry a callback
We don't retry any callbacks when it receives a 4xx status. We should
probably be aware of this happening and at the moment there is nothing
in our logs to easily identify whether the request failed and is being
retried or if it failed and is not being retried. This will enable us to
search our logs easily and figure out how much it's happening.

It's quite likely that we should in the future allow callbacks to retry
if they get a 429 http response (rate limiting) but we should do this in
a smart way (exponential backoff) and so this is a first step to being
aware of how big a problem it is in case we want to do something about
it.
2020-11-17 11:26:32 +00:00
Pea Tyczynska
e033f3300b Degrade MaxRetriesExceededError to warning status in logger
This is because that error is caused by our providers and we
cannot do anything about it but it can make our logs hard to read
and actionable errors harder to spot
2019-06-27 14:55:10 +01:00
Rebecca Law
00f04c33c8 Some minor refactoring.
- Updated notifications_dao.update_notification_status_by_id with an optional parameter to set the sent_by, this will eliminate a separate update to notifcaitons.
- Added the callback url to the log message, that way we can see if it's the same url failing.
- Stop sending the status callbacks for PENDING status.
2018-10-24 11:24:53 +01:00
Leo Hemsted
bc3fab09d0 don't log exception info for retries
it includes task args, which might contain PII. And we don't need to
know where the retry exception came from - it came from the line above
2018-10-22 11:33:16 +01:00
Pea Tyczynska
3048c05850 Revert notification_id name change in delivery_status service_callback_task data 2018-07-20 11:22:38 +01:00
Pea Tyczynska
812f4d20dd Send complaints on to service callback APIs using an async task 2018-07-19 16:59:39 +01:00
Rebecca Law
ee46803a12 The send_delivery_status_to_service task was refactor to take the details of the notification and service api callback such that the task no longer needed to go to the database to provide the status update.
This PR removes the code that is no longer used. This extra step was necessary to keep the tasks backward compatible.
2018-03-19 17:38:20 +00:00
Rebecca Law
fdfd6838a6 Fix error message.
The id in the message is referring to a notification not a service
2018-03-19 15:24:59 +00:00
Rebecca Law
c9477a7400 When a notification is timed out in the scheduled task that may happen because the notification has not been sent.
Which means the sent_at date for the notification could be empty causing the service callback to fail.

- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
2018-03-16 14:47:56 +00:00
Rebecca Law
a3d04ca672 Improve log message 2018-03-09 12:01:08 +00:00
Rebecca Law
00b17b5ad7 When we sent the service the status callback for a notification, we have all the information we need.
Which means we can remove the need to request the data from the database.
In order for the PR to be backwards compatible I have added an optional parameter "encrypted_status_update".
If this is not None then the new code is called.

The next PR will send the encrypted data to this task.

A final PR will remove the code that uses the database to get the notification and service callback api.
2018-03-08 16:17:41 +00:00
Leo Hemsted
651c3062b9 retry service callbacks if the db queries fail
we don't expect them to fail, but they might if we accidentally
exhaust our connection pool. Just in case, lets retry.
2018-03-08 14:08:56 +00:00
Rebecca Law
891a80addf Added notification id to the log message for upload pdf to make it easier to search for the letter notification in the logs. 2018-03-01 10:37:07 +00:00
Alexey Bezhan
9eada23392 Release DB connection before executing service API callback
Flask-SQLAlchemy sets up a connection pool with 5 connections and will
create up to 10 additional connections if all the pool ones are in use.

If all connections in the pool and all overflow connections are in
use, SQLAlchemy will block new DB sessions until a connection becomes
available. If a session can't acquire a connections for a specified
time (we set it to 30s) then a TimeoutError is raised.

By default db.session is deleted with the related context object
(so when the request is finished or app context is discarded).

This effectively limits the number of concurrent requests/tasks with
multithreaded gunicorn/celery workers to the maximum DB connection pool
size. Most of the time these limits are fine since the API requests are
relatively quick and are mainly interacting with the database anyway.

Service callbacks however have to make an HTTP request to a third party.
If these requests start taking a long time and the number of threads is
larger than the number of DB connections then remaining threads will
start blocking and potentially failing if it takes more than 30s to
acquire a connection. For example if a 100 threads start running tasks
that take 20s each with a max DB connection pool size of 10 then first 10
threads will acquire a connection right away, next 10 tasks will block for
20 seconds before the initial connections are released and all other tasks
will raise a TimeoutError after 30 seconds.

To avoid this, we perform all database operations at the beginning of
the task and then explicitly close the DB session before sending the
HTTP request to the service callback URL. Closing the session ends
the transaction and frees up the connection, making it available for
other tasks. Making calls to the DB after calling `close` will acquire
a new connection. This means that tasks are still limited to running
at most 15 queries at the same time, but can have a lot more concurrent
HTTP requests in progress.
2018-02-13 16:44:30 +00:00
Richard Chapman
d855b4e4ec Removed statsd from the api and use the statsd in the utils library.
The statsd code was added to the utils library a while ago, uses the
statsd from the util library and therefore consolidates the code into
once place.
2018-02-06 09:52:15 +00:00
Rebecca Law
a26588decd Update the json in the service callback task to read completed_at rather than updated_at 2017-12-11 17:14:36 +00:00
venusbb
81ead8b246 code style fix 2017-12-05 11:23:21 +00:00
venusbb
5482ee4fe7 - wrap apply_async parameter notification_id in a str() argument
- check if service_callback_api exist before putting tasks on queue
- create_service_callback_api in tests before asserting if send_delivery_status_to_service has been called.
2017-12-04 17:58:38 +00:00
venusbb
489f43a2c9 rename callback_tasks.py to process_ses_receipts.py
create service_callback_tasks.py for tasks to send delivery statuses to services
2017-12-01 16:15:21 +00:00