This allows us to prefix metrics with the environment to allow stats from staging and live to go to the same statsd, and alls us to filter in the dashboard by environment.
Removed all existing statsd logging and replaced with:
- statsd decorator. Infers the stat name from the decorated function call. Delegates statsd call to statsd client. Calls incr and timing for each decorated method. This is applied to all tasks and all dao methods that touch the notifications/notification_history tables
- statsd client changed to prefix all stats with "notification.api."
- Relies on https://github.com/alphagov/notifications-utils/pull/61 for request logging. Once integrated we pass the statsd client to the logger, allowing us to statsd all API calls. This passes in the start time and the method to be called (NOT the url) onto the global flask object. We then construct statsd counters and timers in the following way
notifications.api.POST.notifications.send_notification.200
This should allow us to aggregate to the level of
- API or ADMIN
- POST or GET etc
- modules
- methods
- status codes
Finally we count the callbacks received from 3rd parties to mapped status.
dont send reply_to_addresses around from process_job and send_email -
take it from the service in send_email_to_provider. also clean up
the kwarg in aws_ses.send_email to more accurately reflect what we
might pass in
In order to set a message as temporary-failure, we check if it is in pending status first.
Otherwise a delivery receipt for failure is set to permanent failure.
If the notification has a status == sending then update the status otherwise do not update the status.
In other words do not change the status more than once.
- new client for statsd, follows conventions used elsewhere for configuration
- client wraps underlying library so we can use a config property to send/not send statsd
Added statsd metrics for:
- count of API successful calls SMS/Email
- count of successful task execution for SMS/Email
- count of errors from Client libraries
- timing of API calls to third party clients
- timing of how long messages live on the SQS queue
Status codes from mmg are:
2: UNDELIVERABLE Message is undeliverable.
3: DELIVERED Message is delivered.
4: EXPIRED Message is expired.
5: REJECTED Message is rejected.
Refactor process_firetext_responses
Removed the abstract ClientResponses for firetext and mmg. There is a map for each response to handle the status codes sent by each client.
Since MMG has about 20 different status code, none of which seem to be a pending state (unlike firetext that has 3 status one for pending - network delay).
For MMG status codes, look for 00 as successful, everything else is assumed to be a failure.
- when a provider callback occurs and we update the status of the notification, also update the statistics table
Adds:
- Mapping object to the clients to handle mapping to various states from the response codes, this replaces the map.
- query lookup in the DAO to get the query based on response type / template type
Tests around rest class and dao to check correct updating of stats
Missing:
- multiple client callbacks will keep incrementing the counts of success/failure. This edge case needs to be handle in a future story.
- client updated to raise errors with fire text error codes/messages
New endpoint
- /notifications/sms/firetext
For delivery notifications to be sent to.