We've run into issues with redis expiring keys while we try and write
to them - short lived redis TTLs aren't really sustainable for keys
where we mutate the state. Template usage is a hash contained in redis
where we increment a count keyed by template_id each time a message is
sent for that template. But if the key expires, hincrby (redis command
for incrementing a value in a hash) will re-create an empty hash.
This is no good, as we need the hash to be populated with the last
seven days worth of data, which we then increment further. We can't
tell whether the hincrby created the key, so a different approach
entirely was needed:
* New redis key: <service_id>-template-usage-<YYYY-MM-DD>. Note: This
YYYY-MM-DD is BTC time so it lines up nicely with ft_billing table
* Incremented to from process_notification - if it doesn't exist yet,
it'll be created then.
* Expiry set to 8 days every time it's incremented to.
Then, at read time, we'll just read the last eight days of keys from
Redis, and sum them up. This works because we're only ever incrementing
from that one place - never setting wholesale, never recreating the
data from scratch. So we know that if the data is in redis, then it is
good and accurate data.
One thing we *don't* know and *cannot* reason about is what no key in
redis means. It could be either of:
* This is the first message that the service has sent today.
* The key was deleted from redis for some reason.
Since we set the TTL to so long, we'll never be writing to a key that
previously expired. But if there is a redis (or operator) error and the
key is deleted, then we'll have bad data - after any data loss we'll
have to rebuild the data.
Which means the sent_at date for the notification could be empty causing the service callback to fail.
- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
The JobStatistics table is going to be deleted. There are currently
3 tasks which use the JobStatistics model via the Statistics DAO, so we
need to make sure that these tasks aren't being used before they are
deleted in a separate PR.
This commit deletes:
* The `create_initial_notification_statistic_tasks` function which gets
used to call the `record_initial_job_statistics` task.
* The `create_outcome_notification_statistic_tasks` function which gets
used to call the `record_outcome_job_statistics` task.
* And the scheduling of the `timeout-job-statistics` scheduled task.
- If the SMS client sends a status code that we do not recognize raise a ClientException and set the notification status to technical-failure
- Simplified the code in process_client_response, using a simple map.
This PR is a proposal to reduce the average messages we see for a single notification from about 7 messages to 2.
Messaging would change to something like this:
February 2nd 2018, 15:39:05.885 Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
{'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']}
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:57.727 SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814
February 2nd 2018, 15:38:56.727 Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181
February 2nd 2018, 15:38:56.727 Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery
To somthing like this:
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
needed for monitoring the performance of the v2 endpoints. They were put
in as a temporary measure whilst sustained performance testing was
taking place.
The whitelist was built to help developers and designers making
prototypes to do realistic usability testing of them, without having to
go through the whole go live process.
These users are sending messages using the API. The whitelist wasn’t
made available to users uploading spreadsheets. The users sending one
off messages are similar to those uploading spreadsheets, not those
using the API. Therefore they shouldn’t be able to use the whitelist to
expand the range of recipients they can send to.
Passing the argument through three methods doesn’t feel that great, but
can’t think of a better way without major refactoring…
Previously, if the SMS recipient was None there would be a 500 error
with no message displayed to the user. We now check if the recipient is
None and raise a BadRequestError if this is the case.
PR #1550 added the rate_limit column to the Service table.
This PR removes the rate limits from the config and uses rate_limit from
the Service model instead. Rate limits are still separated into 'team',
'normal' and 'test', but these values are the same for a service.
Pivotal story https://www.pivotaltracker.com/story/show/153992529
Validators check that service_letter_contact_id belongs to the
same service as the notification/template.
Generic reply_to validator calls the correct function for the given
type (for either notification or template). It can be used by the
template API endpoints to verify that given reply_to ID has the same
service_id as the template itself.
The original approach was to create a DB foreign key constraint,
but this caused issues with the `version_class` decorator saving
related Service objects without creating a history record.
Checks authentication header value on inbound SMS requests from
MMG against a list of allowed API keys set in the application
config.
At the moment, we're only logging the attempts without aborting the
requests. Once this is rolled out to production and we've checked
the logs we'll switch on the aborts and add the tests for 401 and 403
responses.
This work has already been done for Firetext in a previous PR:
https://github.com/alphagov/notifications-api/pull/1409
- check if service_callback_api exist before putting tasks on queue
- create_service_callback_api in tests before asserting if send_delivery_status_to_service has been called.
- Disable Redis as there is a current connection limit of 256 which
could slow down the request if they are all used
- Added statd to methods in the post to help spot any bottlenecks
Switches on authentication checks for Firetext inbound SMS callbacks.
This should only be released once Firetext callback URLs have been
updated with authentication details.
Checks authentication header value on inbound SMS requests from
Firetext against a list of allowed API keys set in the application
config.
At the moment, we're only logging the attempts without aborting the
requests. Once this is rolled out to production and we've checked
the logs we'll switch on the aborts and add the tests for 401 and 403
responses.