Removed all existing statsd logging and replaced with:
- statsd decorator. Infers the stat name from the decorated function call. Delegates statsd call to statsd client. Calls incr and timing for each decorated method. This is applied to all tasks and all dao methods that touch the notifications/notification_history tables
- statsd client changed to prefix all stats with "notification.api."
- Relies on https://github.com/alphagov/notifications-utils/pull/61 for request logging. Once integrated we pass the statsd client to the logger, allowing us to statsd all API calls. This passes in the start time and the method to be called (NOT the url) onto the global flask object. We then construct statsd counters and timers in the following way
notifications.api.POST.notifications.send_notification.200
This should allow us to aggregate to the level of
- API or ADMIN
- POST or GET etc
- modules
- methods
- status codes
Finally we count the callbacks received from 3rd parties to mapped status.
dont send reply_to_addresses around from process_job and send_email -
take it from the service in send_email_to_provider. also clean up
the kwarg in aws_ses.send_email to more accurately reflect what we
might pass in
If the notification ends up in the retry queue and the delivery app is restarted the notification will get sent twice.
This is because when the app is restarted another message will be in the retry queue as message available which is a
duplicate of the one in the queue that is a message in flight.
This should complete https://www.pivotaltracker.com/story/show/121844427
10 seconds, 1 minute, 5 minutes, 1 hour and 4 hours.
Total elapsed wait is max 5 hours 6 minutes and 10 seconds.
Changed visibility window of SQS to be 4 hours 10 seconds, longer the max retry period.
- 2 separate tasks - DB and Provider
- DB to persist notification
- Provider to contact provider
- Each piece has separate retries
- Provider retries have configured back-off