When we're bulk updating, make sure we call `synchronize_session=False`,
and make sure that we then commit before we try and access any ORM
objects in the session that might be deleted.
Tutorial time:
when you delete, sqlalchemy needs to work out what to do with objects in
the session. "evaluate", "fetch", or False (do nothing). It wants to
remove items from the session if you're about to delete them, so that
you don't get confused about the state of objects
* evaluate compares the delete query to each item in the session in
turn. this is the default. if you have lots in the session this could
be super slow I guess but that's rarely the case for us. This can lead
to errors if the column names are different to the model names, like
on our notification and history models.
* fetch runs the delete query as if it's a select, and then checks the
results of that vs the session. This could be slow.
* False doesn't do anything. This means the session will be stale and
potentially will contain now-deleted items, until we call `commit` or
`expire_all` on the session.
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=query.update#sqlalchemy.orm.query.Query.delete
So since removing the subquery in the previous commit, we now are doing
all of the inserts using offsets and limits to group by 10,000 and deleting
all records in a single query (which could be up to as many as 10
million rows).
We want to avoid doing this, because both of these ways we think are
going to result in expensive queries. Therefore we have introduced our
own chunking of the notifications by hour periods meaning we do not need
to use offset and limits.
We estimate that GOV.UK email will at most send 600,000 notifications
per hour (175 per second * 60 * 60).
- Table to store meta data for the emergency contact list for a service.
- Endpoint for fetching contact lists for service
- Endpoint for saving contact list for service.
The list will be stored in S3. The service will then be able to send emergency announcements to staff.
Currently the dashboard in the admin app pull the entire returned letter
summary for a service to calculate how many letters have been returned
in the last seven days.
Adding a separate endpoint for this purpose is better because:
- it’s a more efficient query
- it’s less data to send down the pipe
- it gives us a place to return the complete datetime, so the dashboard
can be more precise about when the most recent report was
If your caseworking system always spits out files with the same name it
will be hard to differentiate them when looking at the uploads page.
Seeing who the letter was sent to will help you differentiate them.
We can’t do this until the API returns the recipient.
This will switch on this feature for new services.
After this we will:
- give existing services this permission with a database migration
- remove this permission from the codebase entirely so that everyone has
this feature and can’t switch it off
This endpoint may need to change, but we'd like to see how this performs, so we'll test this with a real data set. Then come back to make sure the format is correct and check for missing tests for the endpoint,
Previously, when running the `collate_letter_pdfs_for_day` task, we
would only send letters that were created between 5:30pm yesterday and
5:30 today.
Now we send letters that were created before 5:30pm today and that are
still waiting to be sent. This will help us automatically attempt to
send letters that may have fallen through the gaps and not been sent the
previous day when they should have been.
Previously we solved the problem of letters that had fallen the gap by
having to run the task with a date parameter for example
`collate_letter_pdfs_for_day('2020-02-18'). We no longer need this date
parameter as we will always look back across previous days too for
letters that still need sending.
Note, we have to change from using the pagination `list_objects_v2` to
instead getting each individual notification from s3. We reduce load by
using `HEAD` rather than `GET` but this will still greatly increase the
number of API calls. We acknowledge there will be a small cost to this,
say 50p for 5000 letters and think this is tolerable. Boto3 also handles
retries itself so if when making one of the many HEAD requests, there is
a networking blip then it should be retried automatically for us.
we don't need it here - as exceptions are re-raised, they will be logged
additionally by error handlers further up. All this exception logger
tells us is that service names are already in use, which isn't something
we're really interested in.
the queries all return lots of columns, but each query has columns it
doesn't care about. eg emails don't have billable units or international
flag, letters don't have international flag, sms don't have a page count
etc. additionally, the query was grouping on things that never change,
like service id and notification type.
by making all of these literals (as in `select 1 as foo`) we see times
that are over 50% quicker for gov.uk email service.
Note: One of the tests changed because previously it involved emails and
sms with statuses that they could never be (eg returned-letter)
We use boto3 for our interaction with s3. Therefore if an expection is
thrown it will be thrown from the botocore library (which boto3 is built
on top of).
I have copied
app/aws/s3.py::file_exists for an example of this exception catching.
The date in the notifications table should always be the most recent date for the template.
Removed the template_type param for the query as well.
Simplified the tests.
The existing endpoint returned a whole notification for the last time the template was used. But this only takes into account data in the last week. This new methods allows us to be specific about when the template was last used if ever but looking into the ft_notification_status table as well.
and update it when users have to use their email to interact with
Notify service.
Initial population:
If user has email_auth, set last_validated_at to logged_in_at.
If user has sms_auth, set it to created_at.
Then:
Update email_access_valdiated_at date when:
- user with email_auth logs in
- new user is created
- user resets password when logged out, meaning we send them an
email with a link they have to click to reset their password.
We were just ignoring the errors and our users were not fixing things.
Given that 500 texts cost approx £8 it's not the end of the world.
In the long run we may decide to just stop letting people try and send
messages to TV numbers but this is a quick fix to stop emails coming in
which we ignore.
Currently if you visit the job page and the job is older than the data retention the totals on the page are all wrong because this query gets the counts from the notification table. With this change the data should always be correct. It also eliminates the need for looking at data retention. If the job is new and nothing has been created yet (i.e. the job hasn't started yet) then the page should show the correctly because the outcomes are empty (as expected), once the notifications for the jobs are created the numbers will start going up.
- Do not show "hidden" or precompiled templates, users don't know about them.
- Remove the client reference if it is the file name of an uploaded file.
- Format the date for created_at
- Added a test for all the different types of letters.
1) One off templated letter
2) Letter created by a csv upload or job.
3) Uploaded letter
4) Templated letter sent by the API
5) Precompiled letter sent by the API
For notification and notification_history we do an upsert. Here, as the
inbound_sms table is never updated, only inserted to once (signified by
lack of updated_at field), an upsert would be unnecessary.
Therefore, if for some reason the delete statement failed as part of
moving data into the inbound_sms_history table, we can simply just
ignore any db conflicts raised by a rerun of
`delete_inbound_sms_older_than_retention`.
- Check if right keys in new history rows
- Improve model and get rid of old revision version
- Add updated migration file
- Test data when inserting into inbound sms history