Removed all existing statsd logging and replaced with:
- statsd decorator. Infers the stat name from the decorated function call. Delegates statsd call to statsd client. Calls incr and timing for each decorated method. This is applied to all tasks and all dao methods that touch the notifications/notification_history tables
- statsd client changed to prefix all stats with "notification.api."
- Relies on https://github.com/alphagov/notifications-utils/pull/61 for request logging. Once integrated we pass the statsd client to the logger, allowing us to statsd all API calls. This passes in the start time and the method to be called (NOT the url) onto the global flask object. We then construct statsd counters and timers in the following way
notifications.api.POST.notifications.send_notification.200
This should allow us to aggregate to the level of
- API or ADMIN
- POST or GET etc
- modules
- methods
- status codes
Finally we count the callbacks received from 3rd parties to mapped status.
please ensure that any changes to notifications table happen through either dao_create_notification or dao_update_notification.
changed the notification status update triggered by the provider callbacks to ensure that sets updated_by and can update the history table.
also re-added the character_count so we can reconstruct billing data if needed.
triggered via calls in dao_create_notification and dao_update_notification - if you don't use those functions (eg update in bulk) you'll have to update the history table yourself!
This happens because more than one beat process was creating the timeout task, resulting in multiple workers running the same queries at the same time.
if api_key used to access endpoint is type team, endpoints only return that type -
will 404 if you provide a different ID. Same applies for normal (normal api keys
cannot see team notifications)
also, for convenience, set sample_notification to supply key_type of KEY_TYPE_NORMAL
by default
- 2 separate tasks - DB and Provider
- DB to persist notification
- Provider to contact provider
- Each piece has separate retries
- Provider retries have configured back-off
In order to set a message as temporary-failure, we check if it is in pending status first.
Otherwise a delivery receipt for failure is set to permanent failure.
If the notification has a status == sending then update the status otherwise do not update the status.
In other words do not change the status more than once.
We were using two different queries to filter template stats to the past
7 days, plus today.
Since we’re storing both as short dates, we can now use the same query
for both.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
-------|---------|-----------|----------|--------|----------|--------|-------
Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | Monday
So if we are on Monday, the stats should include today, plus everything
back to last Monday.
Previously the template stats query was only going back to the Tuesday.
This should mean the numbers on the dashboard always line up.
This means we will retain notifications for a full week and not
delete records that are 7 x 24 hours older than the time of the run of
the deletion task.
Also the task only needs to run once a day now, so I have changed
the celery config for the deletion tasks.
adapted to recording inserts and updates.
This removes need to manually create history tables.
Our code still remains in control of when history records are
created.
https://www.pivotaltracker.com/story/show/117920839
On the dashboard we want to show counts of notifications sent in the
last 7 days, plus today. So the API enpoint needs to accept an argument
to limit how many days worth of statistics it will return.
This is a bit fiddly at the DAO level because the date is just stored as
a string.