Previously, the function would just return a presumed filename. Now that
it actually checks s3, if the file doesn't exist it'll raise an
exception. By default that's a StopIteration at the end of the bucket
iterator, which isn't ideal as this will get supressed if the function
is called within a generator loop further up or anything.
There are a couple of places where we expect the file may not exist, so
we define a custom exception to rescue specifically here. I did consider
subclassing boto's ClientError, but this wasn't straightforward as the
constructor expects to know the operation that failed, which for me is a
signal that it's not an appropriate (re-)use of the class.
Previously we generated the filename we expected a letter PDF to be
stored at in S3, and used that to retrieve it. However, the generated
filename can change over the course of a notification's lifetime e.g.
if the service changes from crown ('.C.') to non-crown ('.N.').
The prefix of the filename is stable: it's based on properties of the
notification - reference and creation - that don't change. This commit
changes the way we interact with letter PDFs in S3:
- Uploading uses the original method to generate the full file name.
The method is renamed to 'generate_' to distinguish it from the new one.
- Downloading uses a new 'find_' method to get the filename using just
its prefix, which makes it agnostic to changes in the filename suffix.
Making this change helps to decouple our code from the requirements DVLA
have on the filenames. While it means more traffic to S3, we rely on S3
in any case to download the files. From experience, we know S3 is highly
reliable and performant, so don't anticipate any issues.
In the tests we favour using moto to mock S3, so that the behaviour is
realistic. There are a couple of places where we just mock the method,
since what it returns isn't important for the test.
Note that, since the new method requires a notification object, we need
to change a query in one place, the columns of which were only selected
to appease the original method to generate a filename.
we don't name letters based on the day we send them on, rather, the day
we create them on. If we process a letter for a second time for whatever
reason, even if it's a couple of days later, it'll still go in a folder
based on the created_at timestamp. There's still a slight confusion,
however - if the timestamp is after 5:30pm, the folder will be for the
day after. However, still the day after creation, so I think created_at
still makes the most sense.
Remove the term `sending_date` to try and make this relationship more
apparent.
`_now`? why would we ever use a different _now? instead say created_at,
because that's what it'll always be set to, even if we're replaying old
letters. We always set the folder name to when the letter was
created_at, or we might not know where to look to find it.
`dont_use_sending_date` doesn't really tell us what might happen if we
don't use it - the answer is we return an empty string. we ignore the
folder entirely. so lets call it that.
Also, remove use of freeze_gun in the tests, to prove that we don't use
the current time in any calculations. Also add an assert to a mock in
the get_pdf_for_templated_letter test, because we were mocking but not
asserting before, so the tests didn't fail when the function signature
changed.
We were determing the filename for precompiled letters before we had
checked if the letters were international. This meant that a letter
could have a filename indicating it was 2nd class, but once we had
sanitised the letter and checked the address we were setting the
notification to international.
This stopped these letters from being picked up to be sent to the DVLA,
since the filename and postage of the letter did not match.
We now regenerate the filename after the letter has been sanitised (and when
we know the postage) and use the updated filename when moving the letter
into the live PDF letters bucket.
This has been replaced by a new task, `sanitise-letter`, to this deletes
all the code in the old task and ensures that when antivirus is not
enabled locally we are calling the new task.
Added a task, `sanitise-letter`, that will be called from antivirus when
a letter has passed virus scan. This calls a new task in
template-preview which will sanitise the PDF.
A second new task, `process-sanitised-letter`, will be called from the
template preview task and deals with updating the notification and
moving it to the relevant bucket.
Also use this metadata to decide whether preview pages need
overlay or not. So far we have always added overlay when validation
has failed. Now we will only show it when validation failed due to
content being outside of printable area.
Now we consistently use the created_at date, so we can always get the right file location and name.
The previous updates to this code were trying to solve the problem if a pdf being created at 17:29, but not ready to upload until 17:31 after the antivirus and validation check.
But in those cases we would have trouble finding the file.
The `get_bucket_name_and_prefix_for_notification` function was looking
at the `sent_at` or `updated_at` at time of a notification to see which
bucket it was in. Precompiled letters sent through the admin app don't
have either of these times - they only have a `created_at` time, so this
lets the function check `created_at` time too.
this is only applicable when getting a single notification by id. it's
also ignored if the notification isn't a letter.
Otherwise, it overwrites the 'body' field in the response.
If the notification's pdf is available, it returns that, base64
encoded. If the pdf is not available, it returns an empty string.
The pdf won't be available if the notification's status is:
* pending-virus-scan
* virus-scan-failed
* validation-failed
* technical-failure
The pdf will be retrieved from the correct s3 bucket based on its type
Using the created at date for the folder is not always going to work because the pdf created_at date could be just before the cut off date but virus scan and validation has yet to happen. By the time the letters is in the created state, the letter goes into the next days bucket.
It can also happen if the letters is stuck in `pending-virus-scan` and we need to restart the task, and the letters is in a different folder.
- Separated the logic of precompiled and template letters.
- Remove the check for research mode, research mode is not relevant to api calls. The test key is used for testing.
Refactor upload_pdf_letter to accept a precompile boolean to save a query to template.
application. If the Anti-virus app fails due to s3 errors or ClamAV
so does not scan (even after retries) the file at all an error needs
to be raised and the notification set to technical-failure.
Files should be moved to a 'folder' a separate one for ERROR and FAILURE.
* Added new letter task to process the error
* Added a new method to letter utils.py to move a file into an error or
failure folder based on the input
* Added tests to test the task and the utils.py method
- precompiled PDFs sent by test key uploaded to scan bucket
- set status to VIRUS-SCAN-FAILED for pdfs failing virus scan rather than PERMANENT-FAILURE
- Make call to AV app for precompiled letters sent via a test key, and set notification status to PENDING-VIRUS-SCAN
- add function to get reference from filename
- add function to move pdf from scan folder to process folder
- add function to delete pdfs from scan bucket for failed virus scans
* Added is_precompiled_letter method to letter/utils.py
* Added tests for letter/utils.py
* Added tests for the rest endpoint
* Moved the Precompiled name to a central location
* Added hidden field to the test method to create a template