This deletes a big ol' chunk of code related to letters. It's not everything—there are still a few things that might be tied to sms/email—but it's the the heart of letters function. SMS and email function should be untouched by this.
Areas affected:
- Things obviously about letters
- PDF tasks, used for precompiling letters
- Virus scanning, used for those PDFs
- FTP, used to send letters to the printer
- Postage stuff
there's not anything we know we need to do now that we resolve stuck
letters automatically. Letters couuld still get into this state, so it's
worth alerting us. However, we don't have anything concrete that we know
how to fix these letters, so we should just remove the runbook entirely.
Since sept 2019 we've had to log on to production around once every
twenty days to restart the virus scan task for a letter. Most of the
time this is just a case of making sure the file is in the scan bucket,
and then triggering the task. If the file isn't in the scan bucket we'd
need to do some more manual investigation to find out exactly where the
file got stuck, but I can only remember times when it's been in the scan
bucket.
So if the file is in the scan bucket, we can just check that with code
and kick the task off automatically.
We have made it so that gov.uk/alerts shows a ‘1 planned test’ banner
for the whole of the day when there has been an operator test on that
day.
We need to remove the banner when the day is over.
The most straightforward way to do this is to republish the site at the
start of every day. The gov.uk/alerts code[1] will work out if there are
or aren’t any planned tests to show that day.
1. 5a274af6d0/app/models/alerts.py (L38-L44)
The `auto-expire-broadcast-messages` task checks for expired broadcasts
at five minute intervals. This change now calls the
`publish-govuk-alerts` task in govuk-alerts if there are expired
broadcasts so that the site is updated.
Co-authored-by: Katie Smith <katie.smith@digital.cabinet-office.gov.uk>
This updates the tickets that are created when the
`check_if_letters_still_pending_virus_check` scheduled task detects
letters in the `pending-virus-check` state.
Since the expiry is sent as part of the message payload, we don't
need to invoke the CBC proxies (and indeed there's no way to do so
for an expired alert). In future we plan to extend this task so it
triggers the regeneration of content on gov.uk/alerts.
It's worth noting that 'finishes_at' can theoretically be None, in
which case it's unclear when the alert should expire. While alerts
from the Admin app should always have an expiry [1], we have many
in the DB that don't, so it's worth checking for this scenario.
[1]: 078ac10c8d/app/models/broadcast_message.py (L255)
Many of the team members do not look at emails from zendesk, adding a current_app.logger.error message for things we care about to give developers a better chance of seeing them.
I have purposely not added an erro log for `check_for_services_with_high_failure_rates_or_sending_to_tv_numbers` because it's not something we need to look at immediately.
`check_if_letters_still_in_created`
The message to Zendesk includes a list of notification ids, this isn't
really necessary and is included in the run book. Creation of the
Zendesk ticket can fail if the message is too long, removing the list of
ids can prevent that from happening.
We have a scheduled task that was checking for jobs still in progress.
We saw a case where a scheduled job was stuck in a `pending` status as a
result of an app shutting down. This changes the `check_job_status` task
so that it also checks for scheduled jobs which are still pending after
30 minutes.
This doesn't just relate to precompiled letters, it's actually just
checking that there are not any letters still waiting for a virus check
that should not be. This change to the naming makes it more accurate
and therefore easy to understand
This doesn't just relate to templated letters, it's actually just
checking that there are not any letters still in created that should not
be. This change to the naming makes it more accurate and therefore easy
to understand
We already trigger a zendesk ticket for these two cases, meaning that
whenever we get this situation, we get 3 emails. One for the zendesk
ticket, one from logit raising the fact an exception was raised and one
from cloudwatch raising the fact an exception was raised.
We don't need all these emails, a zendesk ticket is sufficient.
Downgrading to a warning means this event will still be findable in our
logs however.
previously we made some incorrect assumptions about set-up on staging
and prod - they currently don't have any cbc_proxy aws creds at all.
We shoudn't be attempting canaries or link tests when there's no AWS
infrastructure to connect to.
We also shouldn't bother writing a row into the database at all for the
broadcast_provider_message since we're not even attempting to send, and
we shouldn't get confused between messages that failed and messages we
never wanted to send at all.
this is a pretty big and convoluted refactor unfortunately.
Previously:
There was one global `cbc_proxy_client` object in apps. This class has
the information about how to invoke the bt-ee lambda, and handles all
calls to lambda. This includes calls to the canary too (which is a
separate lambda).
The future:
There's one global `cbc_proxy_client`. This knows about the different
provider functions and lambdas, and you'll need to ask this client for a
proxy for your chosen provider. call cbc_proxy_client.get_proxy('ee')`
and it'll return you a proxy that knows what ee's lambda function is,
how to transform any content in a way that is exclusive to ee, and in
future how to parse any response from ee.
The present:
I also cleaned up some duplicate tests.
I'm really not sure about the names of some of these variables - in
particular `cbc_proxy_client` isn't a client - it's more of a java style
factory, where you call a function on it to get the client of your
choice.
at the moment only EE is enabled (this is set in app.config, but also,
only EE have a function defined for them so even if another provider was
enabled without changing the dict in cbc_proxy.py we won't trigger
anything). this commit just adds wrapper tasks that check what providers
are enabled, and invokes the send function for each provider.
The send function doesn't currently distinguish between providers for
now - as we only have EE set up. in the future we'll want to separate
the cbc_proxy_client into separate clients for separate providers.
Different providers have different lambda functions, and have different
requirements. For example, we know that the two different CBC software
solutions handle references to previous messages differently.
For every missing row this was:
- downloading the CSV file from S3
- looping through every row in it until it found the one matching the
index of the missing row
`RecipientCSV` implements `__getitem__`[1] (which maybe it didn’t
before) so we can create it once, then index the relevant row directly.
***
1. 5ae0572d41/notifications_utils/recipients.py (L78-L79)
The task was raising a JobIncompleteError, yet it's not an error the task is performing it's task correctly and calling the appropriate task to restart the job.
Also used apply_sync to create the task instead of send_task.
create-letters-pdf-tasks is for contacting template preview.
letters-tasks is for doing other things around letters (including putting things on template preview queue)
We were just ignoring the errors and our users were not fixing things.
Given that 500 texts cost approx £8 it's not the end of the world.
In the long run we may decide to just stop letting people try and send
messages to TV numbers but this is a quick fix to stop emails coming in
which we ignore.
We don't need to log this as an exception. It's not an exception, it's
behaviour that is not ideal but is still expected so therefore I've
changed it to warn. Also it removes the email we get for the exception
which is not needed as we get the zendesk ticket instead.
I've also fixed the multiline string meaning the link to the runbook is
included in the zendesk ticket.