Commit Graph

374 Commits

Ben Thorner
ff7eebc90a Simplify deleting old letters
Previously we made a call to S3 to list objects for a letter, even
though we already had the precise key of the single object to hand.
This removes the one usage of "get_s3_bucket_objects" and uses the
filename directly in the call to remove the object.
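
A minimal sketch of the simplified delete (boto3; the function and argument names here are hypothetical):

```python
import boto3


def remove_letter_pdf(bucket_name, filename):
    # Delete by the exact key: no need to list the bucket's objects
    # first when the key is already known
    boto3.resource("s3").Object(bucket_name, filename).delete()
```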
2021-03-15 17:18:20 +00:00
Leo Hemsted
6784ae62a6 Raise Exception if letter PDF not in S3
Previously, the function would just return a presumed filename. Now that
it actually checks s3, if the file doesn't exist it'll raise an
exception. By default that's a StopIteration at the end of the bucket
iterator, which isn't ideal, as a StopIteration will be silently
suppressed if the function is called from within a generator further up
the stack.

There are a couple of places where we expect the file may not exist, so
we define a custom exception to rescue specifically here. I did consider
subclassing boto's ClientError, but this wasn't straightforward as the
constructor expects to know the operation that failed, which for me is a
signal that it's not an appropriate (re-)use of the class.
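
A minimal sketch of the idea (the exception and helper names are hypothetical; S3 access is via boto3):

```python
import boto3


class LetterPDFNotFound(Exception):
    """Raised when no PDF object exists for a letter."""


def find_letter_pdf(bucket_name, prefix):
    bucket = boto3.resource("s3").Bucket(bucket_name)
    try:
        # take the first matching object, if there is one
        return next(iter(bucket.objects.filter(Prefix=prefix)))
    except StopIteration:
        # convert the StopIteration into something that can't be
        # silently swallowed by a generator further up the stack
        raise LetterPDFNotFound(f"No object found with prefix {prefix}")
```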
2021-03-15 17:18:11 +00:00
Ben Thorner
b43a367d5f Relax lookup of letter PDFs in S3 buckets
Previously we generated the filename we expected a letter PDF to be
stored at in S3, and used that to retrieve it. However, the generated
filename can change over the course of a notification's lifetime e.g.
if the service changes from crown ('.C.') to non-crown ('.N.').

The prefix of the filename is stable: it's based on properties of the
notification - its reference and creation time - that don't change. This commit
changes the way we interact with letter PDFs in S3:

- Uploading uses the original method to generate the full file name.
The method is renamed to 'generate_' to distinguish it from the new one.

- Downloading uses a new 'find_' method to get the filename using just
its prefix, which makes it agnostic to changes in the filename suffix.

Making this change helps to decouple our code from the requirements DVLA
have on the filenames. While it means more traffic to S3, we rely on S3
in any case to download the files. From experience, we know S3 is highly
reliable and performant, so don't anticipate any issues.

In the tests we favour using moto to mock S3, so that the behaviour is
realistic. There are a couple of places where we just mock the method,
since what it returns isn't important for the test.

Note that, since the new method requires a notification object, we need
to change a query in one place, the columns of which were only selected
to appease the original method to generate a filename.
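
A sketch of the split (the names and the exact filename format here are illustrative, not the real ones; `bucket` is assumed to be a boto3 Bucket resource):

```python
def generate_letter_pdf_filename(reference, created_at, crown):
    # full filename, including the crown marker ('C'/'N') that can
    # change over the notification's lifetime
    crown_marker = "C" if crown else "N"
    return f"{created_at:%Y-%m-%d}/{reference}.{crown_marker}.pdf"


def find_letter_pdf_filename(bucket, reference, created_at):
    # look up by the stable prefix only, so a suffix change
    # (e.g. crown to non-crown) doesn't break retrieval
    prefix = f"{created_at:%Y-%m-%d}/{reference}."
    return next(iter(bucket.objects.filter(Prefix=prefix))).key
```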
2021-03-15 13:55:44 +00:00
David McDonald
41d95378ea Remove everything for the performance platform
We will no longer send them any stats, so we don't need:
- the code to work out the nightly stats
- the performance platform client
- any configuration for the client
- any nightly tasks that kick off the sending of the stats

We will also require a change in cronitor: as this task will no longer
run, we need to delete the cronitor check.
2021-03-15 12:04:53 +00:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Rebecca Law
19f7a6ce38 Refactor method for deciding the failure type 2021-03-10 14:39:55 +00:00
Rebecca Law
21edf7bfdd Persist the processing time statistics to the database.
The performance platform is going away soon. The only statistic we don't have in our database that we can query efficiently is the processing time: any queries on notification_history are too inefficient to use on a web page.
Processing time is stored as two counts per whole day: the total number of normal/team emails and text messages, and the number of those that went from created to sending within 10 seconds. From these we can easily calculate the percentage of messages that were marked as sending within 10 seconds.
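
As a sketch, the figure we can then derive from those two daily counts (the function name is hypothetical):

```python
def processing_time_percentage(messages_total, messages_within_10_secs):
    # percentage of the day's messages that went from created to
    # sending within 10 seconds
    if messages_total == 0:
        return 0.0
    return 100 * messages_within_10_secs / messages_total
```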
2021-02-26 07:49:49 +00:00
Pea Tyczynska
e0c73ac342 Send daily email with letter and sheet volumes to DVLA 2021-02-23 15:13:19 +00:00
Pea Tyczynska
c8ffebcce8 Query to get letter and sheet volumes
So we can send a daily email with these volumes to DVLA.
2021-02-23 15:13:18 +00:00
David McDonald
e35ea57ba2 Do not delete letters if not in final state
A few weeks ago, we deleted some PDF letters that had reached their
retention period. However, these letters were in the 'created' state, so
it's very arguable that we should not have deleted them, because we were
expecting to resend them and were unable to. Part of the reason for this
is that we set the letters' status back to `created` but did not nullify
the `sent_at` timestamp, meaning the check on
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch it. Regardless of that check, which controls whether the
files were removed from S3, they were also archived into the
`notification_history` table, as is the default.

This commit changes our code so that letters that are not in their
final state do not go through our retention process. This could mean
they violate their retention policy, but that is likely the lesser of
two evils (the other being that we delete them and are unable to resend
them).

Note that `sending` letters have been included in those not to be
removed, because there is a risk that we give the letter to DVLA and put
it in `sending`, but then they come back to us later telling us they've
had problems and require us to resend.
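
A sketch of the guard (assuming a Flask-SQLAlchemy `Notification` model and a computed `retention_cutoff`; the list of final states is illustrative):

```python
FINAL_LETTER_STATES = [
    "delivered", "returned-letter", "technical-failure",
    "permanent-failure", "cancelled",
]

letters_to_delete = Notification.query.filter(
    Notification.notification_type == "letter",
    Notification.created_at < retention_cutoff,
    # skip anything still in flight, including 'sending', in case
    # DVLA come back to us and we need to resend
    Notification.status.in_(FINAL_LETTER_STATES),
)
```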
2020-12-16 10:50:11 +00:00
David McDonald
47e146f010 Move variable out of loop that didn't need to be 2020-12-16 10:40:30 +00:00
David McDonald
1bf9b29905 Document behaviour of s3 letter deleting
The behaviour was a bit opaque, so I have added tests around it to make
it clear what it is doing and why. No functionality has changed.
2020-12-16 10:39:31 +00:00
Leo Hemsted
3bc3ed88b3 use yield_per instead of limit
limit means we only return 50k letters; if there are more than that for
a service, we'll skip them and they won't be picked up until the next
day.

If you remove the limit, sqlalchemy prefetches query results so it can
build up ORM results, for example collapsing joined rows into single
objects with children. SQLAlchemy streams the data into a buffer, and
normally will still prefetch the entire result set so it can ensure the
integrity of the session (so that, for example, if you modify one result
that is duplicated further down in the results, both rows are updated in
the session). However, we don't care about that; we do care about
preventing the result set taking up too much memory. We can use
`yield_per` to yield from sqlalchemy to the iterator (in this case the
`for letter in letters_awaiting_sending` loop in letters_pdf_tasks.py) -
this means every time we hit 10,000 rows, we go back to the database to
get the next 10k. This way, we only ever need 10k rows in memory at a
time.

This has some caveats, mostly around how we handle the data the query
returns. They're a bit hard to parse but I'm pretty sure the notable
limitations are:

* It's dangerous to modify ORM objects returned by yield_per queries
* It's dangerous to join in a yield_per query if you think there will be
  more than one row per item (for example, if you join from notification
  to service, there'll be multiple result rows containing the same
  service, and if these are split over different yield chunks, then we
  may experience undefined behaviour)

These two limitations come down to there being no guarantee of having
one unique row per item.
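
A minimal sketch of the pattern (assuming a SQLAlchemy `db.session`, a `Notification` model and a `process` function; all names here are illustrative):

```python
letters_awaiting_sending = (
    db.session.query(Notification)
    .filter(Notification.status == "created")
    .yield_per(10000)  # fetch 10k rows per round trip, not the whole set
)

for letter in letters_awaiting_sending:
    # only ~10k rows are held in memory at a time; per the caveats
    # above, don't modify these ORM objects
    process(letter)
```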

For more reading:
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
https://www.mail-archive.com/sqlalchemy@googlegroups.com/msg12443.html
2020-10-26 13:01:34 +00:00
Leo Hemsted
ed182c2a22 return just the columns we need for collating letters
previously we were returning the entire ORM object. Returning columns
has a couple of benefits:

* Means we can join on to services there and then, avoiding second
  queries to get the crown status of the service later in the collate
  flow.
* Massively reduces the amount of data we return - particularly free
  text fields like personalisation that could be potentially quite big.
  5 columns rather than 26 columns.
* Minor thing, but it will skip some CPU cycles, as sqlalchemy will no
  longer construct an ORM object and try to keep track of changes. We
  know this function doesn't change any of the values to persist them
  back, so this is an unnecessary step from sqlalchemy.

Disadvantages are:

* The dao_get_letters_to_be_printed return interface is now much more
  tightly coupled to the get_key_and_size_of_letters_to_be_sent_to_print
  function that calls it.
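
A sketch of the shape of the change (assuming `db.session` plus `Notification` and `Service` models; the exact columns are illustrative):

```python
letters = (
    db.session.query(
        Notification.id,
        Notification.created_at,
        Notification.reference,
        Notification.postage,
        Service.crown,
    )
    # join services up front, avoiding a second query for crown
    # status later in the collate flow
    .join(Service, Notification.service_id == Service.id)
    .all()
)
```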
2020-10-23 20:01:18 +01:00
Leo Hemsted
f8fbaba77d limit collate task to 50,000 messages
we've seen issues where tasks mysteriously hang and do not process
large volumes of letters - in this case >150k letters in created state.

to try and get at least some letters out of the door, limit the query to
only return 50k letters per postage type. We may need to run the task
multiple times, or letters may get delayed until the next day when
they'd be picked up (provided there's enough capacity then). The task
should only be re-run AFTER the ftp tasks have all finished, and updated
the letters to sending, or we run the risk of sending the same letters
twice.

For context, the largest number of letters we've ever sent in a day is
~65k, in March of this year.
2020-10-23 15:28:34 +01:00
Leo Hemsted
4b61060d32 stream notifications when collating zip files
we had issues where we had 150k 2nd class notifications, and the collate
task never ran properly, presumably because the volume of data being
returned was too big.

to try and help with this, we can switch to streaming rather than using
`.all` and building up lists of data. This should help, though the
initial query may still be a problem.
2020-10-23 12:20:26 +01:00
Pea Tyczynska
9ac65ee95c Start sending letters from insolvency service again 2020-10-21 16:56:18 +01:00
Pea Tyczynska
30bd311eb1 Temporarily do not send letters from Insolvency Service
to DVLA. This is a temporary measure over the weekend so that
DVLA can catch up with all other letters.

We should revert this on Monday 19.10.2020
2020-10-16 16:13:32 +01:00
Rebecca Law
b2ff4277c9 Adding service_id to the sort order for the letters being sent to print.
We have had a few instances where letters have caused problems. Particularly for precompiled letters, often the issue comes from the same service.
The hope is that adding service_id to the sort order will help the print provider narrow down the problem.

There is a small degradation of the performance of the query, but it's not enough to concern me.
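
A sketch of the ordering change (the columns surrounding service_id are illustrative):

```python
letters_query = letters_query.order_by(
    Notification.service_id,  # groups a problem service's letters together
    Notification.created_at,
)
```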
2020-10-15 09:39:07 +01:00
Leo Hemsted
1e928a926a rename sending_date to created_at
we don't name letters based on the day we send them on, rather, the day
we create them on. If we process a letter for a second time for whatever
reason, even if it's a couple of days later, it'll still go in a folder
based on the created_at timestamp. There's still a slight confusion,
however: if the timestamp is after 5:30pm, the folder will be for the
day after. But that's still based on the day of creation, so I think
created_at still makes the most sense.

Remove the term `sending_date` to try and make this relationship more
apparent.
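
A sketch of the naming rule (the helper name and the way the 5:30pm cutoff is applied are illustrative):

```python
from datetime import datetime, time, timedelta


def get_folder_name(created_at: datetime) -> str:
    # letters created after the 5:30pm print deadline go in the
    # following day's folder
    folder_date = created_at.date()
    if created_at.time() >= time(17, 30):
        folder_date += timedelta(days=1)
    return folder_date.strftime("%Y-%m-%d")
```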
2020-09-21 14:40:22 +01:00
Leo Hemsted
bb33927b3d rename letter get_folder_name args
`_now`? Why would we ever use a different _now? Instead say created_at,
because that's what it'll always be set to, even if we're replaying old
letters. We always set the folder name based on when the letter was
created, or we might not know where to look to find it.

`dont_use_sending_date` doesn't really tell us what happens if we don't
use it - the answer is that we return an empty string and ignore the
folder entirely, so let's call it that.

Also, remove use of freezegun in the tests, to prove that we don't use
the current time in any calculations. Also add an assert to a mock in
the get_pdf_for_templated_letter test, because we were mocking but not
asserting before, so the tests didn't fail when the function signature
changed.
2020-09-21 14:32:57 +01:00
Chris Hill-Scott
0e1f6f31e4 Use constant for notification type
Co-authored-by: Katie Smith <klssmith@users.noreply.github.com>
2020-09-09 11:12:06 +01:00
Chris Hill-Scott
cfda289746 Allow international letters to be cancelled
Our code was assuming that any notifications with `international` set to
`True` were text messages. It was then trying to look up delivery
information for a notification which wasn’t sent to a phone number,
causing an exception.
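
A sketch of the kind of guard involved (`update_international_sms_delivery_info` is a hypothetical name):

```python
# before: any 'international' notification was treated as a text
# message, so we tried a phone-number delivery lookup for letters too
if notification.international:
    update_international_sms_delivery_info(notification)

# after: only international *text messages* get the phone lookup
if notification.international and notification.notification_type == "sms":
    update_international_sms_delivery_info(notification)
```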
2020-09-09 10:55:55 +01:00
Pea M. Tyczynska
fbdfa6416f Merge pull request #2921 from alphagov/remove-statsd-http-api-decorators
Remove statsd http api decorators and turn statsd back on for celery apps
2020-07-14 10:16:44 +01:00
Pea M. Tyczynska
9186083ea7 Merge pull request #2796 from alphagov/split-letters-into-zips-based-on-postage
Split letters into zips based on postage
2020-07-08 11:49:21 +01:00
Pea Tyczynska
9c4205c7c6 Remove statsd decorators from dao functions
This is done so that we do not use statsd on our http endpoints.
We decided we do not need the metrics this gave us. If we change
our minds, we will add Prometheus-friendly decorators instead in
the future.
2020-07-07 18:02:24 +01:00
Pea Tyczynska
61de908c5d Simplify putting letters in right postage folders 2020-07-02 16:26:44 +01:00
David McDonald
12f460adc5 Turn off statsd wrapper for synchronous statsd calls during POSTs
This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint

This has been done so that creating a text message or email does not
call out to StatsD during the request process. These are the three
metrics currently emitted while processing one of those requests, and
so they have been removed from the API.

The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have seen two incidents
recently affected by DNS resolution for StatsD (either by a slow down in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.

We are not going to miss the removal of these metrics.
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
  our logs if we want it
- the response times and response codes per flask endpoint are already
  exposed using the gds metrics python library

I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
  we should have already removed most calls to StatsD, which should
  greatly reduce the chance of there being any further issues with
  DNS resolution
- Some of the other metrics still provide value, so there's no point
  deleting them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
  users, as tasks are async; also, we do not currently have the
  infrastructure set up to replace them with a Prometheus HTTP endpoint
  that can be scraped, so this would require more work
2020-06-29 12:40:22 +01:00
Rebecca Law
ce32e577b7 Remove the use of schedule_for in post_notifications.
Years ago we started to implement a way to schedule a notification. We hit a problem but we never came up with a good solution and the feature never made it back to the top of the priority list.

This PR removes the code for scheduled_for. There will be another PR to drop the scheduled_notifications table and remove the schedule_notifications service permission.

Unfortunately, I don't think we can remove the `scheduled_for` attribute from the notification.serialized method, because our clients might fail if something is missing. For now I have left it in but defaulted the value to None.
2020-06-24 14:54:40 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Chris Hill-Scott
80fc5e6600 Paginate search results for notifications
The standard way that we indicate that there are more results than can
be returned is by paginating. So even though we don’t intend to paginate
the search results in the admin app, it can still use the presence or
absence of a ‘next’ link to determine whether or not to show a message
about only showing the first 50 results.
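
A sketch of the idea (the response shape and function are illustrative, not the real serialiser):

```python
def search_response(results, page, page_size=50):
    # fetch page_size + 1 rows upstream; the extra row tells us
    # whether a 'next' link should be present
    response = {"notifications": results[:page_size], "links": {}}
    if len(results) > page_size:
        response["links"]["next"] = f"?page={page + 1}"
    return response
```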
2020-05-06 12:13:00 +01:00
Chris Hill-Scott
625aad97c9 Limit search by recipient to 50 results
Things could get ugly if you use a short search string on a service with
lots of notifications…
2020-05-06 11:44:35 +01:00
Chris Hill-Scott
1e20cb6be9 Stop searching the to field
We were doing this temporarily while the `normalised_to` field was not
populated for letters. Once we have a week of data in the
`normalised_to` field we can stop looking in the `to` field. This should
improve performance because:
- it’s one `WHERE` clause not two
- we had to do `ilike` on the `to` field, because we don’t lowercase its
  contents; even if the two `WHERE` clauses are somehow parallelized,
  `ilike` is slower than a `like` comparison, which is case-sensitive
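
A sketch of the before and after (assuming a SQLAlchemy `Notification` model; names are illustrative):

```python
from sqlalchemy import or_

# before: two clauses, one of them case-insensitive
query = query.filter(or_(
    Notification.normalised_to.like(f"%{search_term}%"),
    Notification.to.ilike(f"%{search_term}%"),  # slower: case-insensitive
))

# after: a single case-sensitive comparison on the normalised column
query = query.filter(Notification.normalised_to.like(f"%{search_term}%"))
```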

Depends on:
- [ ] Tuesday 5 May (7 full days after deploying https://github.com/alphagov/notifications-api/pull/2814)
2020-04-30 10:51:30 +01:00
Chris Hill-Scott
26793899d4 Merge pull request #2814 from alphagov/search-letters
Let users search for letters
2020-04-27 15:06:32 +01:00
Chris Hill-Scott
64674be817 Make allowed notification types explicit in search
Not having a catch-all else makes it clearer what we’re expecting. We
also think it’s worth adding a comment explaining why we normalise as
we do for letters and the `None` case.
2020-04-23 16:06:34 +01:00
Chris Hill-Scott
7897ae70ce Let users search for letters
Like we have search by email address or phone number, finding an
individual letter is a common task. At the moment users are having to
click through pages and pages of letters to find the one they’re looking
for.

We have to search in the `to` and `normalised_to` fields for now because
we’re not populating the `normalised_to` column for letters at the
moment.
2020-04-21 16:25:37 +01:00
Pea Tyczynska
9782cbc5b5 Check failure code on failure only.
We should check the error code on failure only, because even if we get
an error code alongside a pending status, we should still set the
notification to pending.
2020-04-21 13:43:45 +01:00
Pea Tyczynska
385d249787 Take Firetext code 000 (no error) into consideration. Also change logs from error to warning 2020-04-21 12:06:28 +01:00
Pea Tyczynska
b962dcac5d Check Firetext code first to determine status and log the reason
If there is no code, fall back to the old way of determining status.
If the code is not recognised, log an Error so we are notified
and can look into it.
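
A sketch of the precedence (the mapping contents and helper names are hypothetical):

```python
from flask import current_app

# hypothetical mapping of Firetext codes to notification statuses
FIRETEXT_CODE_TO_STATUS = {
    "1": "permanent-failure",
    "2": "temporary-failure",
}


def determine_status(code, legacy_status):
    if code is not None:
        try:
            return FIRETEXT_CODE_TO_STATUS[code]
        except KeyError:
            # unrecognised code: log an Error so we are notified
            current_app.logger.error("Unknown Firetext code: %s", code)
    # no code (or unrecognised): fall back to the old status logic
    return status_from_firetext_status(legacy_status)  # hypothetical helper
```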
2020-04-20 18:33:19 +01:00
Pea Tyczynska
470d00c26e Make comment clearer following review 2020-04-17 10:58:25 +01:00
Pea Tyczynska
a4507dbb3f Use Firetext response code, if available, to see if failure is temporary or permanent 2020-04-17 10:58:25 +01:00
Katie Smith
bfd40a843c Delete send-scheduled-notifications task
This was added a few years ago, but never used.
2020-04-02 14:45:18 +01:00
Rebecca Law
ac8fc49539 Merge pull request #2770 from alphagov/increase-qry-limit
Increase default query limit to 50,000 from 10,000
2020-03-31 17:15:12 +01:00
Rebecca Law
1444150d48 Add team key to the notifications to delete/insert into notifications/notification_history 2020-03-31 09:23:41 +01:00
Rebecca Law
5d0830d7f7 Rename function for clarity 2020-03-27 15:48:54 +00:00
Rebecca Law
8da73510c2 Delete all notification_status types.
This will ensure the whole day has been deleted. The stats table could get the wrong updates if there is partial data for a day.
2020-03-27 12:14:30 +00:00
Rebecca Law
e5b6fa2143 Reduce log messages. 2020-03-25 08:08:33 +00:00
Rebecca Law
31f83bb9c4 Increase default query limit to 50,000 from 10,000 2020-03-24 17:04:38 +00:00
Rebecca Law
69102db4cb Remove the hourly loop and replace it with a while loop, which checks if we still have things to delete for the day.
We are already limiting the query with a query limit.
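
A sketch of the pattern (`delete_notifications_for_day` and `QUERY_LIMIT` are hypothetical names):

```python
# delete in limited batches until a pass deletes nothing,
# i.e. the whole day is done
deleted_count = delete_notifications_for_day(day, limit=QUERY_LIMIT)
while deleted_count > 0:
    deleted_count = delete_notifications_for_day(day, limit=QUERY_LIMIT)
```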
2020-03-24 14:54:06 +00:00
Rebecca Law
3ca73a35ed Change as per code review
- fix test name
- make query filters consistent with each other
- add comment for clarity
- add inner loop to continue to insert and delete notifications while the delete count is greater than 0
2020-03-24 14:17:47 +00:00