Was failing when run after 5:30pm, because the letters would then be
in a different subfolder (the one for a day later). Solved by freezing
time in the test.
Example build that failed: https://cd.gds-reliability.engineering/builds/1876957
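A minimal sketch of why the test depended on the time of day. The
`letter_folder_name` helper and the folder format are hypothetical; only
the 5:30pm cutoff comes from the description above.

```python
from datetime import datetime, time, timedelta

# Assumed cutoff: letters created after 17:30 go into the next day's folder.
CUTOFF = time(17, 30)


def letter_folder_name(now):
    """Return the dated subfolder for a letter created at `now`."""
    folder_date = now.date()
    if now.time() > CUTOFF:
        folder_date += timedelta(days=1)
    return folder_date.isoformat()


before_cutoff = letter_folder_name(datetime(2020, 3, 2, 12, 0))
after_cutoff = letter_folder_name(datetime(2020, 3, 2, 18, 0))
```

A test that hard-codes the expected folder but lets the code read the
real clock passes before the cutoff and fails after it. Pinning the
clock makes both sides agree.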
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is having mutable default function arguments, which get
shared between every call to the function and therefore mutating a value
in one place can unexpectedly cause it to change in another.
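As an illustration of the mutable default problem (the function names
here are made up for the example, not taken from our code):

```python
def append_buggy(item, items=[]):
    # The default list is created once, when the function is defined,
    # and then shared by every call that does not pass `items`.
    items.append(item)
    return items


def append_fixed(item, items=None):
    # Create a fresh list on every call instead of sharing one default.
    if items is None:
        items = []
    items.append(item)
    return items


first = append_buggy("a")
second = append_buggy("b")  # the same list object as `first`

fixed_first = append_fixed("a")
fixed_second = append_fixed("b")
```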
This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
some SQLAlchemy decorators (eg `declared_attr`) make things that
aren’t formally class methods take a class not an instance as their
first argument
It disables:
- _B306: BaseException.message is removed in Python 3_ because I think
our exceptions have a custom structure that means the `.message`
attribute is still present
Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
A few weeks ago, we deleted some pdf letters that had reached their
retention period. However, these letters were in the 'created' state so
it's very arguable that we should not have deleted them because we were
expecting to resend them and were unable to. Part of the reason for this
is that we marked the letters back to `created` as the status but we did
not nullify the `sent_at` timestamp, meaning the check on
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch it. Regardless of that check, which controls whether the
files were removed from S3, they were also archived into the
`notification_history` table, as happens by default.
This commit changes our code such that letters that are not in
their final state do not go through our retention process. This could
mean they violate their retention policy but that is likely the lesser
of two evils (the other being we delete them and are unable to resend
them).
Note, `sending` letters have been included in those not to be removed
because there is a risk that we give the letter to DVLA and put it in
`sending` but then they come back to us later telling us they've had
problems and require us to resend.
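A rough sketch of the rule, in plain Python rather than the real query.
The `Letter` class, the helper name and the `delivered` status are
assumptions for illustration; `created`, `sending` and the 7 day default
come from the commit messages here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Statuses treated as "not final" in this sketch. The real list may
# contain more than these two.
NOT_FINAL_LETTER_STATUSES = {"created", "sending"}


@dataclass
class Letter:
    reference: str
    status: str
    created_at: datetime


def letters_eligible_for_retention(letters, now, retention_days=7):
    """Return letters past retention that are also in a final state."""
    cutoff = now - timedelta(days=retention_days)
    return [
        letter
        for letter in letters
        if letter.created_at < cutoff
        and letter.status not in NOT_FINAL_LETTER_STATUSES
    ]


now = datetime(2020, 4, 1)
old = now - timedelta(days=30)
letters = [
    Letter("a", "delivered", old),
    Letter("b", "created", old),
    Letter("c", "sending", old),
    Letter("d", "delivered", now),
]
eligible = [
    letter.reference
    for letter in letters_eligible_for_retention(letters, now)
]
```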
This test was added in
ebb43082d5
However, there are a few problems with it:
1. The test name doesn't seem to match what the code is doing. It looks
like instead that it is NOT trying to delete from s3 when the letter
has not been sent and therefore I've updated the test name as such.
2. `delete_notifications_older_than_retention_by_type` was being called
with `email` as its argument, which doesn't match. This is a test for
letters. It definitely wouldn't do any looking in s3 for emails.
3. `created_at` needed bumping back into the past, past the default 7
days retention so these letters would be considered old enough to
delete
4. For the letter to not be sent, it needs to be in `created`, not in
`sending` so I have updated the status. Note, there could be other
statuses which class as 'not sent' but this is the most obvious one
to test with
Now that we’re grouping jobs sent from contact lists within their
parent, they shouldn’t also be listed on the jobs page at the top level.
The jobs page uses the uploads API, not the jobs API, so this commit
makes sure that filtering is happening in the proper place.
replacing get_earlier_provider_messages. The old function returned the
previous references for earlier events for a broadcast_message. However,
these depend on the message sent to a specific provider, so the function
needs to change. It now takes in a provider, and only returns
broadcast_provider_messages sent to that provider. If there are earlier
broadcast_events without a provider_message for the chosen provider, it
raises an exception - you cannot cancel a message if all the previous
events have not been created properly (as we wouldn't know what
references to cancel).
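The behaviour described above could be sketched like this. The class,
field and function names are hypothetical and the data is held in plain
Python objects rather than database models.

```python
from dataclasses import dataclass, field


@dataclass
class BroadcastEvent:
    # Maps provider name -> reference of the message sent to that provider.
    provider_message_ids: dict = field(default_factory=dict)


def get_earlier_provider_message_ids(earlier_events, provider):
    """Return the references sent to `provider` for each earlier event."""
    references = []
    for event in earlier_events:
        if provider not in event.provider_message_ids:
            # Without a reference for this provider we cannot cancel.
            raise ValueError(
                f"No message for provider {provider!r} on an earlier event"
            )
        references.append(event.provider_message_ids[provider])
    return references


events = [
    BroadcastEvent({"provider-a": "a-1", "provider-b": "b-1"}),
    BroadcastEvent({"provider-a": "a-2"}),
]
references = get_earlier_provider_message_ids(events, "provider-a")
```

Asking for `provider-b` raises here, because the second event has no
message for that provider.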
(instead of using the id from broadcast_event)
we need every XML blob we send to have a different ID. if we're sending
different XML blobs for each provider, then each one should have a
different identifier. So, instead of taking the identifier from the
broadcast_event, take it from the broadcast_provider_message instead.
Note: We're still going to the broadcast_event for most fields, to
ensure they stay consistent between different providers. The last thing
we want is for different phone networks to get different content
i think it's causing havoc with my attempts to mock stuff in the
`app.clients` directory because it's also accessible at that path. the
name's super vague and doesn't explain what it is anyway
limit means we only return 50k letters, if there are more than that for
a service we'll skip them and they won't be picked up until the next
day.
If you remove the limit, sqlalchemy prefetches query results so it can
build up ORM results, for example collapsing joined rows into single
objects with children. SQLAlchemy streams the data into a buffer, and
normally will still prefetch the entire resultset so it can ensure
integrity of the session (so that if you modify one result that is
duplicated further down in the results, both rows are updated in the
session for example). However, we don't care about that, but we do care
about preventing the result set taking up too much memory. We can use
`yield_per` to yield from sqlalchemy to the iterator (in this case the
`for letter in letters_awaiting_sending` loop in letters_pdf_tasks.py) -
this means every time we hit 10000 rows, we go back to the database to
get the next 10k. This way, we only ever need 10k rows in memory at a
time.
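The chunking idea can be illustrated without SQLAlchemy. This sketch
uses the standard library `sqlite3` module and `cursor.fetchmany` as a
stand-in for `yield_per`. The table, the chunk size and the
`stream_rows` helper are made up for the example, and it says nothing
about how SQLAlchemy does this internally.

```python
import sqlite3


def stream_rows(cursor, chunk_size):
    """Yield rows one at a time, fetching one chunk from the cursor at a time."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE letters (id INTEGER PRIMARY KEY, reference TEXT)"
)
connection.executemany(
    "INSERT INTO letters (reference) VALUES (?)",
    [(f"ref-{n}",) for n in range(25)],
)

cursor = connection.execute("SELECT id, reference FROM letters ORDER BY id")
references = [
    reference for _id, reference in stream_rows(cursor, chunk_size=10)
]
connection.close()
```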
This has some caveats, mostly around how we handle the data the query
returns. They're a bit hard to parse but I'm pretty sure the notable
limitations are:
* It's dangerous to modify ORM objects returned by yield_per queries
* It's dangerous to join in a yield_per query if you think there will be
more than one row per item (for example, if you join from notification
to service, there'll be multiple result rows containing the same
service, and if these are split over different yield chunks, then we
may experience undefined behaviour).
These two limitations are focused around there being no guarantee of
having one unique row per item.
For more reading:
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
https://www.mail-archive.com/sqlalchemy@googlegroups.com/msg12443.html
previously we were returning the entire ORM object. Returning columns
has a couple of benefits:
* Means we can join on to services there and then, avoiding second
queries to get the crown status of the service later in the collate
flow.
* Massively reduces the amount of data we return - particularly free
text fields like personalisation that could be potentially quite big.
5 columns rather than 26 columns.
* Minor thing, but will skip some CPU cycles as sqlalchemy will no
longer construct an ORM object and try and keep track of changes. We
know this function doesn't change any of the values to persist them
back, so this is an unnecessary step from sqlalchemy.
Disadvantages are:
* The dao_get_letters_to_be_printed return interface is now much more
tightly coupled to the get_key_and_size_of_letters_to_be_sent_to_print
function that calls it.
we had issues where we had 150k 2nd class notifications, and the collate
task never ran properly, presumably because the volume of data being
returned was too big.
to try and help with this, we can switch to streaming rather than using
`.all` and building up lists of data. This should help, though the
initial query may be a problem still.
We have had a few instances where letters have caused problems. Particularly for precompiled letters, often the issue comes from the same service.
The hope is that by adding a sort order this will help the print provider narrow down the problem.
There is a small degradation of the performance of the query, but it's not enough to concern me.
Since we’ve doubled the number of rows in a job, jobs can take twice as
long to insert all the notifications. We don’t check for missing rows
until we’re pretty confident that the original tasks have finished
processing. This means we need to double the time we wait to still be
as sure.
Our code was assuming that any notifications with `international` set to
`True` were text messages. It was then trying to look up delivery
information for a notification which wasn’t sent to a phone number,
causing an exception.
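The shape of the fix, as a sketch. The function name is hypothetical,
and the `sms` and `letter` type values are assumptions about how
notification types are stored.

```python
def needs_phone_number_lookup(notification_type, international):
    """Only international text messages have phone number details to look up."""
    return notification_type == "sms" and bool(international)


sms_lookup = needs_phone_number_lookup("sms", True)
letter_lookup = needs_phone_number_lookup("letter", True)
```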
`international` for letters in `ft_billing` was always False. Now that
letters can be international, this changes the column value to the value
of `international` for the notification.
Reflects the new name of the feature.
Note that the name of the underlying table hasn’t changed because it’s
explicitly set to `service_whitelist`. Changing this will be a more
involved process.
Usage for all services is a platform admin report that groups letters by
postage. We want it to show `europe` and `rest-of-world` letters under a
single category of `international`, so this updates the query to do
that and to order appropriately.
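The real change is in the SQL query, but the grouping can be sketched in
plain Python. The function name, the `first` and `second` postage values
and the display order are assumptions for illustration.

```python
from collections import Counter

INTERNATIONAL_POSTAGE = {"europe", "rest-of-world"}
# Assumed display order for the report.
POSTAGE_ORDER = ["first", "second", "international"]


def group_letters_by_postage(rows):
    """Sum letter counts per postage, folding international types together."""
    totals = Counter()
    for postage, count in rows:
        key = "international" if postage in INTERNATIONAL_POSTAGE else postage
        totals[key] += count
    return [
        (postage, totals[postage])
        for postage in POSTAGE_ORDER
        if postage in totals
    ]


rows = [("second", 10), ("europe", 2), ("first", 5), ("rest-of-world", 3)]
grouped = group_letters_by_postage(rows)
```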
New rows giving the prices of letters with a post_class of `europe` and
`rest-of-world` have been added to the `letter_rates` table. All rates
are currently the same for international letters.
This is done so that we do not use statsd on our http endpoint.
We decided we do not need the metrics that this gave us. If we
change our minds, we will add Prometheus-friendly decorators
instead in the future.
Years ago we started to implement a way to schedule a notification. We hit a problem but we never came up with a good solution and the feature never made it back to the top of the priority list.
This PR removes the code for scheduled_for. There will be another PR to drop the scheduled_notifications table and remove the schedule_notifications service permission.
Unfortunately, I don't think we can remove the `scheduled_for` attribute from the notification.serialized method because our clients might fail if something is missing. For now I have left it in but defaulted the value to None.
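Roughly, the serialized output keeps its shape. This is a sketch with a hypothetical function and a cut-down set of fields, not the real method:

```python
def serialize_notification(notification):
    """Build the API response, keeping `scheduled_for` for existing clients."""
    return {
        "id": notification["id"],
        "status": notification["status"],
        # The feature is gone, but the key stays so that clients which
        # expect it do not break. It is always None.
        "scheduled_for": None,
    }


serialized = serialize_notification({"id": "abc", "status": "delivered"})
```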
The constraints on notifications and notification_history have already
been dropped in production, but still exist in staging and in dev
environments.
The constraints on templates and templates_history exist in all
environments.
Rather than showing all jobs that have been ‘copied’ from a contact list
I think it makes more sense to group them under the contact list. This
way it’s easier to see what messages have been sent to a given group of
contacts over time.
Part of this work means the API needs to return only jobs that have been
created from a given contact list, when asked.
Because we won’t be showing uploaded letters individually on the uploads
page any more we need a way of listing them. This should be by printing
day, to match how we’re grouping them on the uploads page.
The response matches a normal `get_notifications` response so we can
reuse the same code in the admin app.
Some teams have started uploading quite a lot of letters (in the
hundreds per week). They’re also uploading CSVs of emails. This means
the uploads page ends up quite jumbled.
This is because:
- there’s just a lot of items to scan through
- conceptually it’s a bit odd to have batches of things displayed
alongside individual things on the same page
So instead this commit starts grouping together uploaded letters. It
does this by the date on which we ‘start’ printing them, or in other
words the time at which they can no longer be cancelled.
This feels like a natural grouping, and it matches what we know about
people’s mental models of ‘batches’ and ‘runs’ when talking about
printing.
The code for this is a bit gnarly because:
- timezones
- the print cutoff doesn’t align with the end of a day
- we have to do this in SQL because it wouldn’t be efficient to query
thousands of letters and then do the timezone calculations on them in
Python
The standard way that we indicate that there are more results than can
be returned is by paginating. So even though we don’t intend to paginate
the search results in the admin app, it can still use the presence or
absence of a ‘next’ link to determine whether or not to show a message
about only showing the first 50 results.
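A sketch of the idea, with hypothetical function names and a
placeholder URL for the link:

```python
def paginate(results, page_size=50):
    """Return the first page, plus a 'next' link only if more results exist."""
    links = {}
    if len(results) > page_size:
        links["next"] = "?page=2"  # placeholder, not a real URL
    return {"data": results[:page_size], "links": links}


def should_show_truncation_message(response):
    """The admin app only needs to know whether a 'next' link is present."""
    return "next" in response["links"]


many = paginate(list(range(60)))
few = paginate(list(range(10)))
```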