Previously, we logged a warning containing the notification reference
and new status. However, it wasn't a very useful message; this new one
includes the notification id, the old status, the time difference and
more.
This separates out the logs for callbacks for notifications we don't
know about (error level) from those for duplicate callbacks (info level).
Services reaching rate limits trigger our alerts and make it hard to
find actual exceptions in the logs.
As far as the API is concerned this is not an exceptional state,
so we shouldn't log it as such.
Bumped notifications-utils to 3.7.0. Version 3.7.0 includes the
`convert_utc_to_bst` and `convert_bst_to_utc` functions and the
`LETTER_PROCESSING_DEADLINE` constant, so these have been removed from
this repo, and everywhere that used them has been updated to import
them from `notifications-utils`.
Also bumped pytest by a patch version to bring in a bug fix.
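For reference, the imports now come from the shared library; the module paths below are an assumption about how notifications-utils 3.7.0 lays these out:

    # Assumed module paths in notifications-utils 3.7.0 - adjust if the
    # library exposes these elsewhere.
    from notifications_utils.timezones import convert_utc_to_bst, convert_bst_to_utc
    from notifications_utils.letter_timings import LETTER_PROCESSING_DEADLINE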
This commit modifies the code paths the admin app uses to send one-off
emails and text messages so that they also accept letters.
This mostly worked already; the two changes, sketched below, were:
- making sure that one-off letters are processed by the correct task,
from the correct queue
- one-off letters sent from a service in research mode don’t get put on
a queue and go straight to `delivered` (because we don’t want to send
them for real)
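A rough sketch of the routing; the task and queue names here are illustrative, not the repo's actual identifiers:

    # Illustrative sketch only - real task/queue names may differ.
    def send_one_off_letter(service, notification):
        if service.research_mode:
            # research mode: never send for real, mark delivered straight away
            update_notification_status_by_id(notification.id, 'delivered')
        else:
            # real letters go to the letter-processing task on its own queue
            create_letters_pdf.apply_async(
                [str(notification.id)], queue='create-letters-pdf-tasks'
            )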
- Updated notifications_dao.update_notification_status_by_id with an optional parameter to set the sent_by; this eliminates a separate update to notifications (see the sketch below).
- Added the callback url to the log message so that we can see if it's the same url failing.
- Stopped sending status callbacks for the PENDING status.
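For example, a status update that previously needed a second write to set sent_by becomes one call (the keyword name is taken from the description above, everything else is illustrative):

    # One call now sets both the status and sent_by:
    update_notification_status_by_id(notification.id, 'delivered', sent_by='firetext')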
Admin, API and utils were all defining a value for SMS_CHAR_COUNT_LIMIT.
The value in notifications-utils has been updated to allow text
messages to be 4 fragments long, and notifications-api now gets
SMS_CHAR_COUNT_LIMIT from notifications-utils instead of defining it in
config.
Also updated some tests to check for the higher limit.
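The constant is now imported rather than defined locally; the top-level import path is an assumption:

    # Shared constant from notifications-utils instead of local config.
    from notifications_utils import SMS_CHAR_COUNT_LIMIT

    # e.g. in validation (illustrative check, not the repo's actual code):
    if len(sms_content) > SMS_CHAR_COUNT_LIMIT:
        raise ValueError('Content is over the {} character limit'.format(SMS_CHAR_COUNT_LIMIT))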
If there is a bounce we update the email to failed.
However, there is more than one reason a message can fail; adding this logging will give us more detail about the failure message.
The JSON we were getting from SES was not quite as expected; the test data now reflects what we actually get.
Added a new test and fixed a test that was passing regardless.
When handling the complaint we don't want to throw an exception if the message is missing fields; we only log an exception if we are unable to tie a complaint to a notification (sketched below).
If someone gets an email from one of our services and then complains about it (marks it as spam or otherwise), we get a callback from SES.
The service needs to know about these complaints so they can remove that email from their mailing list.
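A sketch of the tolerant handling described above; the helper names are assumptions, and the field names follow SES's complaint notification format:

    from flask import current_app

    # Illustrative sketch - find_notification_by_reference and
    # store_complaint stand in for the repo's real helpers.
    def handle_ses_complaint(ses_message):
        complaint = ses_message.get('complaint', {})
        reference = ses_message.get('mail', {}).get('messageId')

        # .get() throughout: a missing field should not raise an exception
        notification = find_notification_by_reference(reference) if reference else None
        if notification is None:
            # the only case we log at exception level
            current_app.logger.exception(
                'Complaint %s could not be tied to a notification',
                complaint.get('feedbackId')
            )
            return
        store_complaint(notification, complaint)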
We've run into issues with redis expiring keys while we try to write
to them - short-lived redis TTLs aren't really sustainable for keys
where we mutate the state. Template usage is a hash in redis where we
increment a count keyed by template_id each time a message is sent for
that template. But if the key expires, hincrby (the redis command for
incrementing a value in a hash) will re-create it as an empty hash.
This is no good, as we need the hash to be populated with the last
seven days worth of data, which we then increment further. We can't
tell whether the hincrby created the key, so a different approach
entirely was needed:
* New redis key: <service_id>-template-usage-<YYYY-MM-DD>. Note: this
YYYY-MM-DD is BST time so it lines up nicely with the ft_billing table.
* Incremented from process_notification - if the key doesn't exist yet,
it'll be created then.
* Expiry reset to 8 days every time it's incremented.
Then, at read time, we'll just read the last eight days of keys from
Redis, and sum them up. This works because we're only ever incrementing
from that one place - never setting wholesale, never recreating the
data from scratch. So we know that if the data is in redis, then it is
good and accurate data.
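A minimal sketch of the write and read paths; the key format comes from the description above, while the client setup and function names are illustrative:

    from datetime import timedelta
    import redis

    client = redis.StrictRedis(decode_responses=True)

    def usage_key(service_id, day):
        return '{}-template-usage-{}'.format(service_id, day.strftime('%Y-%m-%d'))

    def increment_template_usage(service_id, template_id, today):
        key = usage_key(service_id, today)
        pipe = client.pipeline()
        pipe.hincrby(key, str(template_id), 1)  # creates the hash if absent
        pipe.expire(key, int(timedelta(days=8).total_seconds()))  # reset TTL on every write
        pipe.execute()

    def template_usage(service_id, today):
        # read the last eight days of keys and sum the per-template counts
        totals = {}
        for days_ago in range(8):
            key = usage_key(service_id, today - timedelta(days=days_ago))
            for template_id, count in client.hgetall(key).items():
                totals[template_id] = totals.get(template_id, 0) + int(count)
        return totals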
One thing we *don't* know and *cannot* reason about is what no key in
redis means. It could be either of:
* This is the first message that the service has sent today.
* The key was deleted from redis for some reason.
Since we set the TTL so long, we'll never be writing to a key that
previously expired. But if there is a redis (or operator) error and the
key is deleted, then we'll have bad data; after any data loss we'll
have to rebuild it.
This means the sent_at date for the notification could be empty, causing the service callback to fail.
- Allow code to work if notification.sent_at or updated_at is None
- Update calls to send_delivery_status_to_service to send the data encrypted so that the task does not need to use the db.
The JobStatistics table is going to be deleted. There are currently
3 tasks which use the JobStatistics model via the Statistics DAO, so we
need to make sure that these tasks aren't being used before they are
deleted in a separate PR.
This commit deletes:
* The `create_initial_notification_statistic_tasks` function, which is
used to call the `record_initial_job_statistics` task.
* The `create_outcome_notification_statistic_tasks` function, which is
used to call the `record_outcome_job_statistics` task.
* And the scheduling of the `timeout-job-statistics` scheduled task.
- If the SMS client sends a status code that we do not recognize, raise a ClientException and set the notification status to technical-failure.
- Simplified the code in process_client_response by using a simple map (sketched below).
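Something like this; the codes and statuses are illustrative (loosely based on the Firetext example below), not the real mapping in process_client_response:

    class ClientException(Exception):  # stand-in for the real exception class
        pass

    FIRETEXT_RESPONSES = {
        '0': 'delivered',
        '1': 'permanent-failure',
        '2': 'pending',
    }

    def map_firetext_status(status_code):
        try:
            return FIRETEXT_RESPONSES[status_code]
        except KeyError:
            # unknown code: raise so the caller can set technical-failure
            raise ClientException('Unknown Firetext status code: {}'.format(status_code))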
This PR is a proposal to reduce the average number of log messages we see for a single notification from about 7 to 2.
Messaging would change from something like this:
February 2nd 2018, 15:39:05.885 Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
{'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']}
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:57.727 SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814
February 2nd 2018, 15:38:56.727 Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181
February 2nd 2018, 15:38:56.727 Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery
To something like this:
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
These were needed for monitoring the performance of the v2 endpoints;
they were put in as a temporary measure whilst sustained performance
testing was taking place.
The whitelist was built to help developers and designers making
prototypes do realistic usability testing of them, without having to
go through the whole go-live process.
These users are sending messages using the API. The whitelist wasn’t
made available to users uploading spreadsheets. The users sending one
off messages are similar to those uploading spreadsheets, not those
using the API. Therefore they shouldn’t be able to use the whitelist to
expand the range of recipients they can send to.
Passing the argument through three methods doesn’t feel that great, but
I can’t think of a better way without major refactoring…
Previously, if the SMS recipient was None there would be a 500 error
with no message displayed to the user. We now check if the recipient is
None and raise a BadRequestError if this is the case.
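Roughly this; the field name and BadRequestError's keyword arguments are assumptions:

    # Guard sketch - return a 400 instead of letting a None recipient cause a 500.
    recipient = post_data.get('to')
    if recipient is None:
        raise BadRequestError(message='Recipient cannot be empty', status_code=400)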
PR #1550 added the rate_limit column to the Service table.
This PR removes the rate limits from the config and uses rate_limit from
the Service model instead. Rate limits are still separated into 'team',
'normal' and 'test', but these values are the same for a service.
Pivotal story https://www.pivotaltracker.com/story/show/153992529
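A sketch of the lookup moving from config to the model; the cache-key shape and rate-limit helper are illustrative:

    # The limit now comes from the Service row rather than app config.
    cache_key = '{}-{}'.format(service.id, api_key.key_type)
    # 'team', 'normal' and 'test' keys currently share the same value
    if redis_store.exceeded_rate_limit(cache_key, service.rate_limit, 60):
        raise TooManyRequestsError(service.rate_limit)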
Validators check that service_letter_contact_id belongs to the
same service as the notification/template.
Generic reply_to validator calls the correct function for the given
type (for either a notification or a template). It can be used by the
template API endpoints to verify that a given reply_to ID has the same
service_id as the template itself, as sketched below.
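The dispatch looks something like this; the function names are illustrative:

    # Generic validator delegating by reply_to type - sketch only.
    def check_reply_to(service_id, reply_to_id, type_):
        if type_ == 'email':
            return check_service_email_reply_to_id(service_id, reply_to_id)
        if type_ == 'sms':
            return check_service_sms_sender_id(service_id, reply_to_id)
        if type_ == 'letter':
            # verifies the letter contact belongs to the same service
            return check_service_letter_contact_id(service_id, reply_to_id)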
The original approach was to create a DB foreign key constraint,
but this caused issues with the `version_class` decorator saving
related Service objects without creating a history record.