The command takes a service id and a day, grabs the historical data for
that day (potentially out of notification_history), and pops it in
redis with an eight-day expiry, the same as if it had been written by
the normal increment path.
Also, prefix the template usage key with "service" to make it clear that
it holds a service id, and not an individual template id.
We've run into issues with redis expiring keys while we try to write
to them - short-lived redis TTLs aren't really sustainable for keys
whose state we mutate. Template usage is a hash in redis where we
increment a count keyed by template_id each time a message is sent for
that template. But if the key expires, hincrby (the redis command for
incrementing a value in a hash) will re-create it as an empty hash.
This is no good, as we need the hash to be populated with the last
seven days' worth of data, which we then increment further. We can't
tell whether the hincrby created the key, so a different approach
entirely was needed:
* New redis key: <service_id>-template-usage-<YYYY-MM-DD>. Note: this
YYYY-MM-DD is BST so it lines up nicely with the ft_billing table
* Incremented from process_notification - if the key doesn't exist yet,
it'll be created then.
* Expiry set to 8 days every time it's incremented.
Then, at read time, we'll just read the last eight days of keys from
Redis, and sum them up. This works because we're only ever incrementing
from that one place - never setting wholesale, never recreating the
data from scratch. So we know that if the data is in redis, then it is
good and accurate data.
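A minimal sketch of both paths, assuming a redis-py client (the helper
names and date handling are illustrative; the real key dates are BST,
per the note above):

```python
import datetime

from redis import Redis

redis_client = Redis()
EXPIRE_SECONDS = 8 * 24 * 60 * 60  # 8 days, matching the expiry above


def increment_template_usage(service_id, template_id, day=None):
    # called from process_notification; hincrby creates the key if it's missing
    day = day or datetime.date.today()
    key = f"{service_id}-template-usage-{day.isoformat()}"
    redis_client.hincrby(key, str(template_id), 1)
    redis_client.expire(key, EXPIRE_SECONDS)  # reset the expiry on every write


def get_template_usage(service_id, days=8):
    # read the last eight days of keys and sum the per-template counts
    totals = {}
    today = datetime.date.today()
    for offset in range(days):
        day = today - datetime.timedelta(days=offset)
        key = f"{service_id}-template-usage-{day.isoformat()}"
        for template_id, count in redis_client.hgetall(key).items():
            totals[template_id] = totals.get(template_id, 0) + int(count)
    return totals
```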
One thing we *don't* know and *cannot* reason about is what the absence
of a key in redis means. It could be either of:
* This is the first message that the service has sent today.
* The key was deleted from redis for some reason.
Since we set the TTL to be so long, we'll never be writing to a key that
previously expired. But if there is a redis (or operator) error and the
key is deleted, then we'll have bad data - after any data loss we'll
have to rebuild it.
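That rebuild is what the command described at the top does. A rough
sketch, reusing redis_client and EXPIRE_SECONDS from the snippet above,
and assuming a hypothetical dao helper that counts per-template sends
out of notification_history:

```python
def rebuild_template_usage(service_id, day):
    # hypothetical dao call: per-template counts for one service and one day
    counts = dao_get_template_usage_for_day(service_id, day)
    key = f"{service_id}-template-usage-{day.isoformat()}"
    for template_id, count in counts.items():
        redis_client.hset(key, str(template_id), count)
    redis_client.expire(key, EXPIRE_SECONDS)  # same 8 days as the normal write path
```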
When functions get as big as that, it's confusing to try and work out
which arguments are which. By including a * as the first arg, we require
that anyone calling the function uses keyword arguments to reference the
parameters.
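For example, a hypothetical signature like the one below makes Python
reject positional calls outright:

```python
def persist_notification(*, template_id, recipient, service, notification_type):
    print(template_id, recipient, service, notification_type)


# persist_notification("t1", "07700900123", None, "sms")  # TypeError: no positional args
persist_notification(
    template_id="t1",
    recipient="07700900123",
    service=None,
    notification_type="sms",
)
```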
* Alter config so an error will be raised if you forget to mock out a
celery call in one of your tests
* Remove an unneeded exception type that was masking errors
- uses the reference field on the notifications table to store a 16-char random string used to cross-reference DVLA letters back to the notification (sketched below)
- used because the letter barcode does not have space for a UUID notification id
Depends on https://github.com/alphagov/notifications-utils/pull/149
Renamed numeric_id to notification_reference in utils and changed the validation rules to match.
Note also that the persist_notification method set "reference" to "client_reference", which is confusing as they are different things, so fixed that too.
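A minimal sketch of generating such a reference (the alphabet and helper
name are assumptions, not the actual implementation):

```python
import random
import string


def create_random_reference(length=16):
    # 16 characters keeps the value short enough for the letter barcode
    return "".join(random.choice(string.ascii_uppercase + string.digits) for _ in range(length))
```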
This is being done for the PaaS migration, to allow us to keep traffic coming in whilst we migrate the database.
Uses the same tasks as the CSV-uploaded notifications, with simple changes to not persist the notification and to call into a different task.
> If a user makes an API request with additional personalisation fields,
> we should simply discard any fields that the template doesn't have.
>
> This gives a couple of related advantages:
>
> - modifying template parameters no longer requires downtime for
> clients - as they can pass in extra new parameters before a template
> change, or continue passing in old unused parameters after removing
> them from a template
>
> - services can pass in large user objects, for example, and then play
> around with templates adding and removing fields at will
>
> we should make sure we still return an error if a user doesn't pass in
> a required parameter.
– https://www.pivotaltracker.com/story/show/140774195
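A sketch of that behaviour, assuming the template exposes a set of
placeholder names (all names here are illustrative):

```python
def clean_personalisation(personalisation, placeholders):
    # discard any fields the template doesn't have
    cleaned = {key: value for key, value in personalisation.items() if key in placeholders}
    # but still return an error if a required parameter is missing
    missing = placeholders - cleaned.keys()
    if missing:
        raise ValueError(f"Missing personalisation: {', '.join(sorted(missing))}")
    return cleaned


clean_personalisation({"name": "Alice", "favourite_colour": "red"}, {"name"})
# -> {'name': 'Alice'}: the unused favourite_colour field is silently dropped
```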
The cache expires every 10 minutes, but it will help with the query that runs every 2 seconds, especially when a job is running.
There is still some clean-up and QA to do for this.
- Added the `simulate` notification logic to version 2. We have 3 email addresses and phone numbers that are used
to simulate a successful post to /notifications. This was missed out of the version 2 endpoint.
- Added a test to template_dao to check that new templates default to a process_type of normal
- In v2 get_notifications, cast the path param to a UUID; if it isn't a valid UUID, abort(404)
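The cast is roughly this, assuming a Flask-style handler:

```python
import uuid

from flask import abort


def get_notification_by_id(notification_id):
    try:
        notification_id = uuid.UUID(notification_id)
    except ValueError:
        abort(404)  # an invalid UUID can never match a notification id
    ...
```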
If the template is marked as priority, the notification will be sent using the `notify` queue.
The `notify` queue is a low-volume queue: messages here will not be queued behind a large job and should be delivered within a more consistent time frame (see the sketch after the list below).
- Added templates.process_type and templates_history.process_type columns.
- Added a template_process_type table to handle the enum for templates.process_type; the initial values are normal and priority
https://www.pivotaltracker.com/story/show/135429147
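A sketch of the routing decision (the non-priority queue name is an
assumption):

```python
def queue_for_template(template):
    # priority templates skip the bulk queue so they aren't stuck behind a big job
    if template.process_type == "priority":
        return "notify"
    return "send-tasks"  # assumed name for the normal, high-volume queue


# e.g. when dispatching the celery task:
# deliver_sms.apply_async([str(notification.id)], queue=queue_for_template(template))
```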
- Note this is an unexpectedly big change.
- When we create a notification we pass the service id to the persist method. This means that we don't have the service available to check whether it's in research mode.
- All calling methods (excepting the one where we use the notify service) have the service available. So rather than reload it, I changed the method signature to pass the service, not the id, to persist.
- Touches a few places.
Note this means that the update or create methods will fall over on a null service. But this seems correct.
Goes back to the story, which we still need to play, that makes the service available as the API user, so that the need to load and pass services around is minimised.
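The shape of the change is roughly this (names are illustrative, not the
actual diff):

```python
# before: persist only gets the id, so it has to re-load the service
def persist_notification(service_id, template_id, recipient):
    service = dao_fetch_service_by_id(service_id)  # extra query just to check research mode
    if service.research_mode:
        pass  # short-circuit actually sending


# after: callers already hold the service, so pass the object straight through
def persist_notification(service, template_id, recipient):
    if service is None:
        raise ValueError("service is required")  # falling over on a null service is correct
    if service.research_mode:
        pass  # short-circuit actually sending
```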
- The problem was that on notification creation we pass the template id, not the template, onto the new notification object
- We then set the history object from this notification object by copying all the fields. This is OK at this point.
- We then set the relationship on the history object based on the template, which we haven't passed in - we only passed the id. This means that SQLAlchemy nulls the relationship, removing the template_id.
- Later, we update the history row when we send the message, which fixes the data. BUT if we ever have a send error, this never happens and the template is never set on the history table.
Fix:
Set only the template id when creating the history object.
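In SQLAlchemy terms the fix is roughly this (model and attribute names
are assumed):

```python
from sqlalchemy import Column, ForeignKey, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(String, primary_key=True)


class NotificationHistory(Base):
    __tablename__ = "notification_history"
    id = Column(String, primary_key=True)
    template_id = Column(String, ForeignKey("templates.id"))
    template = relationship(Template)


history = NotificationHistory(id="n1", template_id="t1")

# buggy version: the template was never passed in, only its id, so this assigns None -
# on flush SQLAlchemy treats the relationship as authoritative and nulls template_id
# history.template = None

# fix: set only the foreign key and leave the relationship alone
history.template_id = "t1"
```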
This makes this test a couple of seconds faster - 0.7s instead of 2.5s for me
locally. sample_notification also creates a service, template, user and
permissions, but we don't need any of these objects to exist in the database
for this test. It's particularly helpful for this test because there are so
many parameterized cases. Thanks @leohemsted for suggesting doing this here.
In this PR the id for the notification is passed in and used to create the notification, which causes an integrity error.
Normally when we get a SQLAlchemy error here we send the message to the retry queue, but if the notification already exists
we just ignore it.
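Roughly this shape, assuming a celery task and hypothetical dao helpers:

```python
from sqlalchemy.exc import IntegrityError, SQLAlchemyError


def save_notification(task, notification):
    try:
        dao_create_notification(notification)  # hypothetical dao helper
    except IntegrityError:
        return  # the caller supplied the id and the row already exists - ignore it
    except SQLAlchemyError as exc:
        raise task.retry(exc=exc)  # any other database error still goes to the retry queue
```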