specifically, all of the performance platform specific data layout now
happens in performance_platform_client.py - stuff like setting the
_timestamp, period etc, and the perf platform-specific nomenclature is
all handled there.
so that it doesn't appear generic when it's actually specific to
sending the daily notification totals. To do this, split it out into a
separate performance_platform directory, containing the business logic,
and make the performance_platform_client incredibly thin - all it
handles is adding ids to payloads, and sending stats.
Also, some changes to the config (not all done yet) since there is one
token per endpoint, not one for the whole platform as we'd previously
coded
- This is a CONNECT and a READ timeout.
- Gets wrapped in the standard client exception, with a status code of 504, message Gateway timeout.
- Is quiet noisy in logs to allow us to see it
- Ensures we flick across the provider.
To test change the timeout to 0 and it will timeout.
1) It's incr not inc on the redis client, so renamed the calls everywhere
2) Redis returns bytes/string rather than an int if the value stored is an int. Cast the result to an int before use. Not you can set up the GET to do this transparently but I've not done this as we *may * use GETS for non-int and the callback sets up the cast for the connection not the call.
- Uses Redis cache to check for current count
- If not present then sets the value based on the database state
- Any Redis errors are swallowed. Cache failures should NOT fail the request.
we shouldn't try and use statsd to log an error if they fail, for example
[we also shouldn't retry sending the message but that's a problem for another time]
This allows us to prefix metrics with the environment to allow stats from staging and live to go to the same statsd, and alls us to filter in the dashboard by environment.
Removed all existing statsd logging and replaced with:
- statsd decorator. Infers the stat name from the decorated function call. Delegates statsd call to statsd client. Calls incr and timing for each decorated method. This is applied to all tasks and all dao methods that touch the notifications/notification_history tables
- statsd client changed to prefix all stats with "notification.api."
- Relies on https://github.com/alphagov/notifications-utils/pull/61 for request logging. Once integrated we pass the statsd client to the logger, allowing us to statsd all API calls. This passes in the start time and the method to be called (NOT the url) onto the global flask object. We then construct statsd counters and timers in the following way
notifications.api.POST.notifications.send_notification.200
This should allow us to aggregate to the level of
- API or ADMIN
- POST or GET etc
- modules
- methods
- status codes
Finally we count the callbacks received from 3rd parties to mapped status.
dont send reply_to_addresses around from process_job and send_email -
take it from the service in send_email_to_provider. also clean up
the kwarg in aws_ses.send_email to more accurately reflect what we
might pass in
In order to set a message as temporary-failure, we check if it is in pending status first.
Otherwise a delivery receipt for failure is set to permanent failure.