Commit Graph

1129 Commits

Author SHA1 Message Date
Pea Tyczynska
0dbe4b27c8 Rearrange fixture for readability 2021-03-19 16:50:01 +00:00
Pea Tyczynska
100d47f4e8 Refactor test and fixture for getting billing report data
Names of services and orgs were confusing, and variable setting
was done in a way that made it easy to introduce errors.

Now hopefully it is more readable and more error-proof.
2021-03-19 16:50:00 +00:00
Ben Thorner
c76e789f1e Reduce extra S3 ops when working with letter PDFs
Previously we did some unnecessary work:

- Collate task. This had one S3 request to get a summary of the object,
which was then used in another request to get the full object. We only
need the size of the object, which is included in the summary [1].

- Archive task. This had one S3 request to get a summary of the object,
which was then used to make another request to delete it. We still need
both requests, but we can remove the S3.Object in the middle.

[1]: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#objectsummary
2021-03-16 12:53:13 +00:00
Ben Thorner
ff7eebc90a Simplify deleting old letters
Previously we made a call to S3 to list objects for a letter, even
though we already had the precise key of the single object to hand.
This removes the one usage of "get_s3_bucket_objects" and uses the
filename directly in the call to remove the object.
2021-03-15 17:18:20 +00:00
Ben Thorner
b43a367d5f Relax lookup of letter PDFs in S3 buckets
Previously we generated the filename we expected a letter PDF to be
stored at in S3, and used that to retrieve it. However, the generated
filename can change over the course of a notification's lifetime e.g.
if the service changes from crown ('.C.') to non-crown ('.N.').

The prefix of the filename is stable: it's based on properties of the
notification - reference and creation - that don't change. This commit
changes the way we interact with letter PDFs in S3:

- Uploading uses the original method to generate the full file name.
The method is renamed to 'generate_' to distinguish it from the new one.

- Downloading uses a new 'find_' method to get the filename using just
its prefix, which makes it agnostic to changes in the filename suffix.

Making this change helps to decouple our code from the requirements DVLA
have on the filenames. While it means more traffic to S3, we rely on S3
in any case to download the files. From experience, we know S3 is highly
reliable and performant, so don't anticipate any issues.

In the tests we favour using moto to mock S3, so that the behaviour is
realistic. There are a couple of places where we just mock the method,
since what it returns isn't important for the test.

Note that, since the new method requires a notification object, we need
to change a query in one place, the columns of which were only selected
to appease the original method to generate a filename.
2021-03-15 13:55:44 +00:00
David McDonald
41d95378ea Remove everything for the performance platform
We no longer will send them any stats so therefore don't need the code
- the code to work out the nightly stats
- the performance platform client
- any configuration for the client
- any nightly tasks that kick off the sending off the stats

We will require a change in cronitor as we no longer will have this task
run meaning we need to delete the cronitor check.
2021-03-15 12:04:53 +00:00
Leo Hemsted
ebd4eda8bd remove duplicate dao invite fns and improve naming 2021-03-12 13:56:05 +00:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Rebecca Law
19f7a6ce38 Refactor method for deciding the failure type 2021-03-10 14:39:55 +00:00
Rebecca Law
11d10d5293 Rename to performance_dashboard
Fix totals to return totals for all time rather than for date range.
Added more test data
2021-03-10 13:16:25 +00:00
Rebecca Law
b06850e611 Add an endpoint to return all the data required for the performance
platform page.
2021-03-05 09:59:03 +00:00
Rebecca Law
0849070eca Add created_at and updated_at columns to ft_processing_time 2021-02-26 07:49:49 +00:00
Rebecca Law
21edf7bfdd Persist the processing time statistics to the database.
The performance platform is going away soon. The only stat that we do not have in our database is the processing time. Let me clarify the only statistic we don't have in our database that we can query efficiently is the processing time. Any queries on notification_history are too inefficient to use on a web page.
Processing time = the total number of normal/team emails and text messages plus the number of messages that have gone from created to sending within 10 seconds per whole day. We can then easily calculate the percentage of messages that were marked as sending under 10 seconds.
2021-02-26 07:49:49 +00:00
Pea Tyczynska
e0c73ac342 Send daily email with letter and sheet volumes to DVLA 2021-02-23 15:13:19 +00:00
Pea Tyczynska
c8ffebcce8 Query to get letter and sheet volumes
So we can send daily email with these volumes to DVLA.
2021-02-23 15:13:18 +00:00
David McDonald
9aba3d758b Fix test that fails after 5:30pm
Was failing when ran after 5:30pm as this would cause the letters to be
in a different subfolder (for one day later). Solved by freezetiming it

Example build that failed: https://cd.gds-reliability.engineering/builds/1876957
2020-12-24 09:57:52 +00:00
Chris Hill-Scott
3b0b96834d Do extra code style checks with flake8-bugbear
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is having mutable default function arguments, which get
shared between every call to the function and therefore mutating a value
in one place can unexpectedly cause it to change in another.

This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
  this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
  some SQLAlchemy decorators (eg `declared_attr`) make things that
  aren’t formally class methods take a class not an instance as their
  first argument

It disables:
- _B306: BaseException.message is removed in Python 3_ because I think
  our exceptions have a custom structure that means the `.message`
  attribute is still present

Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
2020-12-22 16:26:45 +00:00
David McDonald
e35ea57ba2 Do not delete letters if not in final state
A few weeks ago, we deleted some pdf letters that had reached their
retention period. However, these letters were in the 'created' state so
it's very arguable that we should not have deleted them because we were
expecting to resend them and were unable to. Part of the reason for this
is that we marked the letters back to `created` as the status but we did
not nullify the `sent_at` timestamp, meaning the check on
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch it. Regardless of that check, which controls whether the
files were removed from S3, they were also archived into the
`notification_history` table as by default.

This commit does changes our code such that letters that are not in
their final state do not go through our retention process. This could
mean they violate their retention policy but that is likely the lesser
of two evils (the other being we delete them and are unable to resend
them).

Note, `sending` letters have been included in those not to be removed
because there is a risk that we give the letter to DVLA and put it in
`sending` but then they come back to us later telling us they've had
problems and require us to resend.
2020-12-16 10:50:11 +00:00
David McDonald
1bf9b29905 Document behaviour of s3 letter deleting
The behaviour was a bit of opaque so I have added tests around it so
it's clear what it is doing and why. No functionality has changed
2020-12-16 10:39:31 +00:00
David McDonald
219023f4c6 Fix test that was passing unintentionally
This test was added in
ebb43082d5

However, there are a few problems with it

1. The test name doesn't seem to match what the code is doing. It looks
   like instead that it is NOT trying to delete from s3 when the letter
   has not been sent and therefore I've updated the test name as such.

2. `delete_notifications_older_than_retention_by_type` was being called
   with `email` as it's argument which doesn't match. This is a test for
   letters. It definitely wouldn't do any looking in s3 for emails.

3. `created_at` needed bumping back into the past, past the default 7
   days retention so these letters would be considered old enough to
   delete

4. For the letter to not be sent, it needs to be in `created`, not in
   `sending` so I have updated the status. Note, there could be other
   statuses which class as 'not sent' but this is the most obvious one
   to test with
2020-12-15 09:48:08 +00:00
Chris Hill-Scott
682cbc5130 Don’t return jobs sent from contact lists
Now that we’re grouping jobs sent from contact lists within their
parent, they shouldn’t also be listed on the jobs page at the top level.

The jobs page uses the uploads API, not the jobs API, so this commit
makes sure that filtering is happening in the proper place.
2020-12-01 15:26:36 +00:00
Chris Hill-Scott
10e1fe6902 Revert "Don’t return jobs sent from contact lists"
This reverts commit 061c0a0050.
2020-12-01 15:18:32 +00:00
Chris Hill-Scott
061c0a0050 Don’t return jobs sent from contact lists
Now that we’re grouping jobs sent from contact lists within their
parent, they shouldn’t also be listed on the jobs page at the top level.
2020-12-01 11:56:34 +00:00
Leo Hemsted
0257774cfa add get_earlier_provider_message fn to broadcast_event
replacing get_earlier_provider_messages. The old function returned the
previous references for earlier events for a broadcast_message. However,
these depend on the message sent to a specific provider, so the function
needs to change. It now takes in a provider, and only returns
broadcast_provider_messages sent to that provider. If there are earlier
broadcast_events without a provider_message for the chosen provider, it
raises an exception - you cannot cancel a message if all the previous
events have not been created properly (as we wouldn't know what
references to cancel).
2020-11-19 15:50:37 +00:00
Leo Hemsted
f12c949ae9 create broadcast_provider_message and use id from that instead
(instead of using the id from broadcast_event)

we need every XML blob we send to have a different ID. if we're sending
different XML blobs for each provider, then each one should have a
different identifier. So, instead of taking the identifier from the
broadcast_event, take it from the broadcast_provider_message instead.

Note: We're still going to the broadcast_event for most fields, to
ensure they stay consistent between different providers. The last thing
we want is for different phone networks to get different content
2020-11-19 15:50:37 +00:00
Leo Hemsted
732c203d3e rename clients to notification_provider_clients
i think it's causing havoc with my attempts to mock stuff in the
`app.clients` directory because it's also accessible at that path. the
name's super vague and doesn't explain what it is anyway
2020-11-17 13:34:58 +00:00
Rebecca Law
df325c78b7 There seems to be a UTC time bug around the dao_get_uploads_by_service_id function.
To fix the build tonight I'm putting freezing the time for the test. But will investigate further tomorrow.
2020-10-27 17:25:58 +00:00
Leo Hemsted
3bc3ed88b3 use yield_per instead of limit
limit means we only return 50k letters, if there are more than that for
a service we'll skip them and they won't be picked up until the next
day.

If you remove the limit, sqlalchemy prefetches query results so it can
build up ORM results, for example collapsing joined rows into single
objects with chidren. SQLAlchemy streams the data into a buffer, and
normally will still prefetch the entire resultset so it can ensure
integrity of the session, (so that if you modify one result that is
duplicated further down in the results, both rows are updated in the
session for example). However, we don't care about that, but we do care
about preventing the result set taking up too much memory. We can use
`yield_per` to yield from sqlalchemy to the iterator (in this case the
`for letter in letters_awaiting_sending` loop in letters_pdf_tasks.py) -
this means every time we hit 10000 rows, we go back to the database to
get the next 10k. This way, we only ever need 10k rows in memory at a
time.

This has some caveats, mostly around how we handle the data the query
returns. They're a bit hard to parse but I'm pretty sure the notable
limitations are:

* It's dangerous to modify ORM objects returned by yield_per queries
* It's dangerous to join in a yield_per query if you think there will be
  more than one row per item (for example, if you join from notification
  to service, there'll be multiple result rows containing the same
  service, and if these are split over different yield chunks, then we
  may experience undefined behaviour.

These two limitations are focused around there being no guarantee of
having one unique row per item.

For more reading:
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
https://www.mail-archive.com/sqlalchemy@googlegroups.com/msg12443.html
2020-10-26 13:01:34 +00:00
Leo Hemsted
ed182c2a22 return just the columns we need for collating letters
previously we were returning the entire ORM object. Returning columns
has a couple of benefits:

* Means we can join on to services there and then, avoiding second
  queries to get the crown status of the service later in the collate
  flow.
* Massively reduces the amount of data we return - particularly free
  text fields like personalisation that could be potentially quite big.
  5 columns rather than 26 columns.
* Minor thing, but will skip some CPU cycles as sqlalchemy will no
  longer construct an ORM object and try and keep track of changes. We
  know this function doesn't change any of the values to persist them
  back, so this is an unnecessary step from sqlalchemy.

Disadvantages are:

* The dao_get_letters_to_be_printed return interface is now much more
  tightly coupled to the get_key_and_size_of_letters_to_be_sent_to_print
  function that calls it.
2020-10-23 20:01:18 +01:00
Chris Hill-Scott
88cd92b946 Revert "Remove the upload letters permission" 2020-10-23 15:14:37 +01:00
Chris Hill-Scott
e7c1f7c60e Merge pull request #3001 from alphagov/remove-upload-letters-permission
Remove the upload letters permission
2020-10-23 14:27:48 +01:00
Leo Hemsted
4b61060d32 stream notifications when collating zip files
we had issues where we had 150k 2nd class notifications, and the collate
task never ran properly, presumably because the volume of data being
returned was too big.

to try and help with this, we can switch to streaming rather than using
`.all` and building up lists of data. This should help, though the
initial query may be a problem still.
2020-10-23 12:20:26 +01:00
Pea Tyczynska
9ac65ee95c Start sending letters from insolvency service again 2020-10-21 16:56:18 +01:00
Chris Hill-Scott
182bfa7e10 Remove the upload letters permission
As of https://github.com/alphagov/notifications-admin/pull/3690 it’s no
longer referred to.
2020-10-20 11:46:11 +01:00
Pea Tyczynska
30bd311eb1 Temporarily do not send letters from Insolvency Service
to DVLA. This is a temporary measure over the weekend so that
DVLA can catch up with all other letters.

We should revert this on Monday 19.10.2020
2020-10-16 16:13:32 +01:00
Rebecca Law
b2ff4277c9 Adding service_id to the sort order for the letters being sent to print.
We have had a few instances where letters have caused problems. Particularly for precompiled letters, often the issue comes from the same service.
The hope is that by adding a sort order this will help the print provider narrow down the problem.

There is a small degradation of the performance of the query, but it's not enough to concern me.
2020-10-15 09:39:07 +01:00
Chris Hill-Scott
f2314333b5 Merge pull request #2982 from alphagov/improve-efficiency-of-process-missing-rows
Improve efficiency of process missing rows task
2020-10-02 11:23:35 +01:00
Chris Hill-Scott
1d50bfaafc Remove unused column from query 2020-09-26 12:11:15 +01:00
Chris Hill-Scott
aace1bdd8a Allow 20 minutes before checking for missing rows
Since we’ve doubled the number of rows in a job, jobs can take twice as
long to insert all the notifications. We don’t check for missing rows
until we’re pretty confident that the original tasks have finished
processing. This means we need to double the time we wait to still be
as sure.
2020-09-26 11:38:38 +01:00
Chris Hill-Scott
92af5b8d67 Merge pull request #2968 from alphagov/cancel-international-letters
Allow international letters to be cancelled
2020-09-09 14:10:23 +01:00
Chris Hill-Scott
cfda289746 Allow international letters to be cancelled
Our code was assuming that any notifications with `international` set to
`True` were text messages. It was then trying to look up delivery
information for a notification which wasn’t sent to a phone number,
causing an exception.
2020-09-09 10:55:55 +01:00
Rebecca Law
795a035fac When the organisation updates the crown attribute it should update all the services associated with that organisation too. 2020-09-09 10:43:16 +01:00
Rebecca Law
93475912ba Merge pull request #2950 from alphagov/international-letters-for-all
Default international_letters for service permissions.
2020-09-07 07:39:52 +01:00
Katie Smith
b30701d7e1 Set 'international' for letters in ft_billing
`international` for letters in `ft_billing` was always False. Now that
letters can be international, this changes the column value to the value
of `international` for the notification.
2020-08-21 09:19:27 +01:00
Rebecca Law
d9fd541ab7 Add international letters as a default permission when creating a new service 2020-08-11 15:59:09 +01:00
Chris Hill-Scott
32f5f454de Merge pull request #2928 from alphagov/serve-on-slash-guest-list
Rename API URLs for guest list to guest list
2020-08-03 16:44:41 +01:00
Chris Hill-Scott
5b1b82030d Rename test files
To reflect new name of feature.
2020-07-28 12:56:48 +01:00
Chris Hill-Scott
b19451c7c6 Rename DAO file
To reflect new name of feature
2020-07-28 12:56:40 +01:00
Chris Hill-Scott
65346852ed Rename variables and functions in tests
To reflect the new name of the feature.
2020-07-28 12:56:32 +01:00
Chris Hill-Scott
e41022214f Rename backref to service model
To reflect the new name. Appears this is only used by the tests.
2020-07-28 12:56:14 +01:00