When we get a callback from SES, we identify the notification by the
SES reference that we set on the notification after sending. When we
wrote the log message, we assumed that we'd always have a notification
for every callback, so if one couldn't be found we would log an error.
That assumption doesn't hold, for a few reasons:
* We might receive a callback before the sender worker has persisted
the reference to the database.
* We might have deleted the notification, especially if the service has
a short data retention period.
* We sometimes receive callbacks for references that we have no record
of whatsoever (this is quite alarming but we have no way of knowing
why this happens).
The error logs were happening pretty frequently, and we don't have a
real way to solve them at the moment, so let's cut down on the noise and
downgrade them to info level for now.
Flask-SQLAlchemy's paginate function issues a separate query to get
the total count of rows for a given filter. This query (with the
filters used by the API integration Message log page) is slow for
services with a large number of notifications.
Since the Message log page doesn't actually allow users to paginate
through the response (it only shows the last 50 messages), we can
use limit instead of paginate, which requires passing another
flag from admin down to the DAO method.
A `count` flag was added to `paginate` in March 2018; however, there
has been no release of flask-sqlalchemy since then, so we need
to pull the dev version of the package from GitHub.
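Roughly what the DAO change could look like - the function shape and the
count_pages flag name here are illustrative, not the actual implementation:

    from app.models import Notification

    def get_notifications_for_service(service_id, page=1, page_size=50, count_pages=True):
        query = Notification.query.filter(
            Notification.service_id == service_id
        ).order_by(Notification.created_at.desc())

        if count_pages:
            # paginate() issues a separate COUNT(*) query for the total
            return query.paginate(page=page, per_page=page_size)

        # the Message log only ever shows the most recent messages, so skip
        # the count and just take one page (the unreleased flask-sqlalchemy
        # count=False flag on paginate achieves the same saving)
        return query.limit(page_size).all()

Note the two branches return different things (a Pagination object vs a
plain list), so the caller has to handle both.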
Previously, we logged a warning containing the notification reference
and new status. However, it wasn't a very informative message - the
new one includes the notification id, the old status, the time
difference and more.
This separates out the logs for callbacks for notifications we don't
know about (error level) and for duplicates (info level).
We are adding an index to the notifications table to optimize get_notifications_for_service. The index needs to be built concurrently, which cannot be done inside a transaction block, so it will need to be created on the database directly:
CREATE INDEX CONCURRENTLY ix_notifications_service_created_at ON notifications (service_id, created_at);
DROP INDEX CONCURRENTLY ix_notifications_service_created_at;
Delivery for a provider is considered slow if more than a threshold
proportion of notifications (currently we pass in a threshold of 10%)
either took x minutes (for now, 4) to deliver, or are still sending
after that time. We look at all notifications for the current provider
which are delivered or sending, and were not sent under a test key,
over the last 10 minutes.
We are using created_at to establish whether notifications are from the
last 10 minutes because we have an index on it, so the query is faster.
Also write tests for the new is_delivery_slow_for_provider query.
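A rough sketch of the logic, assuming the Notification model has sent_by,
key_type, status, created_at and updated_at columns and a KEY_TYPE_TEST
constant; the real query presumably does the filtering and counting in SQL
rather than in Python:

    from datetime import datetime, timedelta

    from app.models import Notification, KEY_TYPE_TEST

    def is_delivery_slow_for_provider(provider, threshold=0.1, delivery_time=timedelta(minutes=4)):
        since = datetime.utcnow() - timedelta(minutes=10)

        # created_at is indexed, so filtering on it keeps the query fast
        notifications = Notification.query.filter(
            Notification.created_at >= since,
            Notification.sent_by == provider,
            Notification.status.in_(['delivered', 'sending']),
            Notification.key_type != KEY_TYPE_TEST,
        ).all()

        if not notifications:
            return False

        now = datetime.utcnow()
        slow = [
            n for n in notifications
            if (n.status == 'delivered' and n.updated_at - n.created_at >= delivery_time)
            or (n.status == 'sending' and now - n.created_at >= delivery_time)
        ]
        return float(len(slow)) / len(notifications) > threshold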
It can be useful to get a notification by id while checking that the
notification belongs to a given service. This changes the
get_notification_by_id DAO function to optionally also filter by
service_id so that we can check this.
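Something along these lines (a minimal sketch, not the exact implementation):

    from app.models import Notification

    def get_notification_by_id(notification_id, service_id=None):
        filters = [Notification.id == notification_id]

        if service_id:
            # only match if the notification belongs to the given service
            filters.append(Notification.service_id == service_id)

        return Notification.query.filter(*filters).first()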
Bumped notifications-utils to 3.7.0. Version 3.7.0 includes the
`convert_utc_to_bst` and `convert_bst_to_utc` functions and the
`LETTER_PROCESSING_DEADLINE` constant, so they have been removed from
this repo and anything using them has been updated to import them from
`notifications-utils`.
Also bumped pytest by a patch version to bring in a bug fix.
- Updated notifications_dao.update_notification_status_by_id with an optional parameter to set the sent_by; this eliminates a separate update to notifications (see the sketch below).
- Added the callback URL to the log message, so that we can see whether it's the same URL failing.
- Stop sending the status callbacks for PENDING status.
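A minimal sketch of the optional sent_by parameter - status-transition checks
and the exact session handling are elided:

    from app import db
    from app.models import Notification

    def update_notification_status_by_id(notification_id, status, sent_by=None):
        notification = Notification.query.with_for_update().filter(
            Notification.id == notification_id
        ).first()

        if notification is None:
            return None

        notification.status = status
        if sent_by and not notification.sent_by:
            # record the provider in the same UPDATE rather than issuing a
            # second one after the status change
            notification.sent_by = sent_by

        db.session.add(notification)
        db.session.commit()
        return notification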
There was a datetime bug in the query which resulted in files not being sent to the postal provider.
The trigger-letter-pdfs-for-day task is no longer needed, so rather than fix the query just call collate_letter_pdfs_for_day directly.
Less code is always better.
Deployment considerations: I realized this is, strictly speaking, not backwards compatible if the scheduled job is in progress and a task that no longer exists is on the queue. This is OK since we will deploy this well before 17:50.
For returned letter updates, most notifications won't exist in the
notifications table, so in order to find out whether the reference
matches any known letters we need to check the count of updated
history records.
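A sketch of the idea, assuming a 'returned-letter' status value and a bulk
update on NotificationHistory (the helper name is illustrative):

    from app import db
    from app.models import NotificationHistory

    def update_returned_letters(references):
        # the bulk UPDATE reports how many rows matched, which tells us
        # whether the references correspond to letters we know about
        updated_count = NotificationHistory.query.filter(
            NotificationHistory.reference.in_(references)
        ).update(
            {NotificationHistory.status: 'returned-letter'},
            synchronize_session=False,
        )
        db.session.commit()
        return updated_count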
If there are no files to delete we won't get an exception.
Wrap the file deletion in a try/except to avoid stopping the entire task.
Fix the missing slash in the file name.
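Roughly what the guarded delete might look like, assuming the files live in S3
and are removed with boto3 (the helper, bucket and key names are illustrative):

    import boto3
    from botocore.exceptions import ClientError
    from flask import current_app

    def delete_letter_file(bucket_name, folder, filename):
        # note the slash between the folder and the file name
        key = '{}/{}'.format(folder, filename)
        try:
            boto3.resource('s3').Object(bucket_name, key).delete()
        except ClientError:
            # a failed delete shouldn't stop the rest of the task
            current_app.logger.exception('Could not delete {} from {}'.format(key, bucket_name))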
Added an option to include or exclude one-off messages to the DAO function
`get_notifications_for_service`. Previously, one-off notifications
were not returned - this has changed so that the default is for
one-off notifications to be returned. Also simplified the `include_jobs`
filter for this function.
The DAO function gets used in 3 places: the V1 and V2 API endpoints,
which will now start to return one-off messages, and the admin app,
which needs to pass `include_one_off=False` to
`get_all_notifications_for_service` on pages where we don't want
one-off notifications to show, such as the API message log page.
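A sketch of how the filters might fit together, assuming job notifications
can be identified by job_id and one-off messages by created_by_id being set
(both assumptions, not confirmed by this change):

    from app.models import Notification

    def get_notifications_for_service(service_id, include_jobs=True, include_one_off=True):
        filters = [Notification.service_id == service_id]

        if not include_jobs:
            # exclude notifications created from an uploaded job
            filters.append(Notification.job_id == None)  # noqa

        if not include_one_off:
            # one-off messages are sent by a user directly from admin
            filters.append(Notification.created_by_id == None)  # noqa

        return Notification.query.filter(*filters).order_by(Notification.created_at.desc())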
By adding the service id, the query performance has improved greatly. It went from 6200ms to 0.04ms.
This should stop the 500s when a template is deleted.
Test notifications are only stored in the notification table, not the
notification_history table. We were deciding which table to query for
the platform admin stats based on the start date passed to the DAO
function. This meant that if the start date given was more than 7 days
ago and the end date was within the last 7 days, test notifications were
not being returned. We now use the end date to choose which table to
query which means that test notifications always get returned when they
should.
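In other words, something like this (an illustrative helper, not the actual code):

    from datetime import datetime, timedelta

    from app.models import Notification, NotificationHistory

    def table_for_platform_stats(end_date):
        # test notifications only exist in the notification table, so base the
        # choice on the end date: only use history if the whole range is old
        seven_days_ago = datetime.utcnow().date() - timedelta(days=7)
        return NotificationHistory if end_date < seven_days_ago else Notification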
Created a platform-stats blueprint and moved the new platform stats
endpoint to the new blueprint (it was previously in the service
blueprint). Since the original platform stats route and the new platform
stats route are now in different blueprints, their view functions can
have the same name without any issues.
Moved the `fetch_new_aggregate_stats_by_date_range_for_all_services`
DAO function from the services DAO to the Notifications DAO since this
function queries the `notification` and `notification_history` tables.
Also added a test to check that the data returned from the function
takes BST into account.
The caseworking view is going to have a page which displays emails and
text messages combined.
In order for the search to work on this page the user needs to be able
to search for an email or a text message. This commit makes it guess
what to search for when the `notification_type` isn’t known (basically
by saying ‘if the search term is only digits, they’re probably
searching by phone number’).
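The guess itself could be as simple as this (hypothetical helper):

    def guess_notification_type(search_term):
        # digits (allowing spaces and a leading +) probably mean a phone number
        stripped = search_term.replace(' ', '').lstrip('+')
        return 'sms' if stripped.isdigit() else 'email'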
Getting the data for a day can be reasonably slow (a few hundred
milliseconds), and if someone's viewing a service with no activity we
don't want to do that query seven times every two seconds. So if there
is no data in redis, when we get the data out of the database we
should put it in redis so we can just grab it from there next time.
This'll happen in two cases:
* redis data is deleted
* the service sent no messages that day
Additionally, make sure that we convert nicely from redis' return
values (ascii strings) to unicode keys and integer counts.
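A sketch of the read-through caching and the conversion, using redis-py
directly - the key format and the fetch_from_db callable are illustrative
stand-ins:

    import redis

    redis_client = redis.StrictRedis()

    def get_template_counts_for_day(service_id, day, fetch_from_db):
        key = 'service-{}-template-counts-{}'.format(service_id, day.isoformat())

        cached = redis_client.hgetall(key)
        if not cached:
            # nothing in redis - either the data was deleted or the service
            # sent nothing that day - so hit the database and cache the result
            counts = fetch_from_db(service_id, day) or {}
            if counts:
                redis_client.hmset(key, counts)
                redis_client.expire(key, 86400)
            return counts

        # redis returns byte strings; convert to unicode keys and integer counts
        return {
            template_id.decode('utf-8'): int(count)
            for template_id, count in cached.items()
        }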
New redis keys are partitioned per service per day. The new process is as
follows (see the sketch after this list):
* require a count of days to filter by. Currently admin always gives 7.
* for each day, check and see if there's anything in redis. There won't
be if either a) redis is/was down or b) the service didn't send any
notifications that day
- if there isn't, go to the database and get a count out.
* combine all these stats together
* get the names/template types etc out of the DB at the end.
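Putting the process together, building on the sketch above - fetch_from_db
and attach_template_details are stand-ins for the real DAO lookups, and each
per-day result is assumed to be a dict of template id to count:

    from datetime import date, timedelta

    def get_template_statistics(service_id, fetch_from_db, attach_template_details, limit_days=7):
        today = date.today()
        combined = {}

        for offset in range(limit_days):
            day = today - timedelta(days=offset)
            # per-service, per-day counts; falls back to the database on a miss
            counts = get_template_counts_for_day(service_id, day, fetch_from_db)
            for template_id, count in counts.items():
                combined[template_id] = combined.get(template_id, 0) + count

        # one query at the end to pull template names and types out of the DB
        return attach_template_details(combined)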
It's used in a few places - it should definitely know what timezones
are and return datetimes rather than dates, which are hard to work with
when figuring out how tz-aware they are.
We should be very careful about when we get data from
NotificationHistory - this should probably only be from scheduled
tasks. `dao_get_template_usage` is only called from the template
statistics rest endpoint, so shouldn't ever hit template history.
Also, moved the tests out to a new file to break up the 2k-line test file a bit.