Commit Graph

900 Commits

Author SHA1 Message Date
Katie Smith
3d4796c924 Add task to resanitise and replace a PDF for precompiled letter
This adds a task which is designed to be used if we want to recreate the
PDF for a precompiled letter (either one that has been created using the
API or one that has been uploaded through the website).

The task takes the `notification_id` of the letter and passes template
preview the details it needs in order to sanitise the original file and
then replace the version in the letters-pdf bucket with the freshly
sanitised version.
2021-11-10 09:51:31 +00:00
Ben Thorner
d1586a8f81 CC DVLA in tickets about outstanding letters
Previously we sent them emails about this manually. We also tried
a Zendesk macro/trigger approach, but using a CC means:

- We can control the behaviour ourselves (Zendesk triggers can only
be edited by admins outside our team).

- We keep the DVLA notification approach consistent and in one place,
so notifications always go to the same people.

- Any further (public) updates to the ticket will also trigger a
notification to DVLA (previous trigger only notified on creation).
2021-10-29 11:46:29 +01:00
David McDonald
5a51ab6131 Bug fix: update normalised_to, not just to after letter sanitise
When a precompiled letter is sent to us, we set the `to` field as
'Provided as PDF' in
1c1023a877/app/v2/notifications/post_notifications.py (L100-L104)

This then also sets `normalised_to` as `providedaspdf`.

However, when template preview sanitises the letter, pulls out the
address and gives it to the API, we were only setting `to` to be
the new address and had forgotten to also amend `normalised_to` to
be the normalised version. This meant that for all these letters
we accidentally left `normalised_to` as `providedaspdf`. The impact
of this was that we could not then search for these letters in the
admin user interface, as the search relies on the `normalised_to` field
containing the recipient address.

This commit fixes that bug by also setting the `normalised_to`
field.
2021-10-27 11:56:25 +01:00
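Sketch for 5a51ab6131: a minimal illustration of keeping `to` and `normalised_to` in step after sanitising. The `Letter` class and the `normalise_recipient` helper are hypothetical stand-ins; the normalisation (lowercase, whitespace removed) is only inferred from 'Provided as PDF' becoming `providedaspdf`, and the real rules may differ.

```python
from dataclasses import dataclass


def normalise_recipient(value: str) -> str:
    # Hypothetical normalisation: drop all whitespace, then lowercase.
    return "".join(value.split()).lower()


@dataclass
class Letter:
    # Stand-in for the notification row; only the two relevant fields.
    to: str = "Provided as PDF"
    normalised_to: str = "providedaspdf"


def update_address_after_sanitise(letter: Letter, address: str) -> None:
    # Set both fields together so searches on normalised_to keep working.
    letter.to = address
    letter.normalised_to = normalise_recipient(address)
```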
Ben Thorner
d703251b13 Merge pull request #3348 from alphagov/better-callback-stats-180016688
Include status in stats about delivery times
2021-10-22 11:59:24 +01:00
Ben Thorner
f974108934 Include status in stats about delivery times
Previously these metrics weren't very useful because they could be
skewed by long timings for failed notifications, which can take up
to 72 hours to deliver. I'm intentionally not trying to have a dual
running period (with the old and new names) because:

- We don't use the current stats for anything (checking Grafana).

- The current stats get turned into a "bucket" metric in Prometheus
[1][2], which isn't very useful because it can only tell us the mean
time to deliver, but we're actually interested in percentiles.

Switching to a new naming is an opportunity to fix the raw data and
the way it's aggregated, using the same kind of "summary" metric that
we now use for stats about our Celery tasks [3].

[1]: c330a8ac8a/paas/statsd/statsd-mapping.yml (L82)
[2]: https://prometheus.io/docs/practices/histograms/#quantiles
[3]: https://github.com/alphagov/notifications-aws/pull/890
2021-10-20 17:22:59 +01:00
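Sketch for f974108934: a small worked example of why timings that are not split by status can mislead. The sample timings and the `mean_seconds_by_status` helper are made up for illustration; only the 72 hour figure comes from the commit message.

```python
from collections import defaultdict
from statistics import mean


def mean_seconds_by_status(timings):
    # Group (status, seconds) pairs by status before averaging.
    grouped = defaultdict(list)
    for status, seconds in timings:
        grouped[status].append(seconds)
    return {status: mean(values) for status, values in grouped.items()}


# Two quick deliveries and one failure reported after 72 hours.
timings = [("delivered", 2), ("delivered", 4), ("failed", 72 * 3600)]
overall = mean(seconds for _, seconds in timings)
by_status = mean_seconds_by_status(timings)
```

Here the overall mean is dominated by the single failed notification, while the per-status figure for delivered notifications stays at a few seconds.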
Leo Hemsted
0b8c6ef263 Merge pull request #3339 from alphagov/letter-runbook-link
tweak zendesk message for no ack files alert
2021-10-20 15:23:33 +01:00
Pea Tyczynska
1b6f9505da Call publish-govuk-alerts task when alert expires
The `auto-expire-broadcast-messages` task checks for expired broadcasts
at five minute intervals. This change now calls the
`publish-govuk-alerts` task in govuk-alerts if there are expired
broadcasts so that the site is updated.

Co-authored-by: Katie Smith <katie.smith@digital.cabinet-office.gov.uk>
2021-10-18 08:41:25 +01:00
Katie Smith
04bfd6bfdb Trigger task to publish alerts when sending or cancelling alert
When we send or cancel a broadcast message, we now trigger a task
in the govuk-alerts repo that polls our API for alerts and
publishes a fresh list of alerts.

Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>
2021-10-18 08:41:24 +01:00
Leo Hemsted
b8c4e19072 tweak zendesk message for no ack files alert
include a link to a runbook entry.

also the list of acknowledgement files can be very long, so make that
the last thing, and use new lines to space out the message.
2021-10-08 13:45:02 +01:00
Katie Smith
58597653df Update how "sending to TV numbers" Zendesk tickets are created
2021-09-29 11:26:20 +01:00
Katie Smith
0c0c7f4478 Update how "letters still created status" Zendesk tickets are created
2021-09-29 11:23:28 +01:00
Katie Smith
2f66e38fb9 Update how "missing ackfile for letters" Zendesk tickets are created
2021-09-29 11:10:50 +01:00
Katie Smith
64c0a3fb9d Update how 'letters still sending' Zendesk tickets are created
These now use the new Zendesk form.
2021-09-29 11:07:37 +01:00
Katie Smith
b114dadcae Update how pending virus check Zendesk tickets are created
This updates the tickets that are created when the
`check_if_letters_still_pending_virus_check` scheduled task detects
letters in the `pending-virus-check` state.
2021-09-29 11:03:48 +01:00
Katie Smith
9ff0ca0363 Update how live broadcast Zendesk tickets are created
These now use the Notify Form in Zendesk.
2021-09-29 10:59:07 +01:00
sakisv
9faa3d34e1 Fix tests
Specifically, no longer test for a p1 zendesk when sending an alert
and drop misleading "p1" from test name when cancelling an alert.

We're no longer creating a P1 from the code, but we _do_ create a
zendesk ticket when sending out an alert.

When cancelling, what we want to test is that we don't create a second
ticket when the alert is cancelled.
2021-09-10 10:14:28 +03:00
Ben Thorner
bf0bf4e31c Favour new "areas" format for PagerDuty alerts
Broadcasts created via the API [1] and the Admin app [2] should
both now have this field set. It's also more informative to show
this, and broadcasts created via the API don't have IDs anyway.

There's a small risk that an old broadcast that gets approved won't
have this data, but it's for information only and we intend to
backfill all old broadcasts in the near future.

[1]: 023a06d5fb
[2]: 7dbe3afa19
2021-08-27 14:22:12 +01:00
Ben Thorner
a7d92b9058 Replace / remove redundant uses of "areas"
In one case ("areas=['manchester']") the format was even invalid,
but in general the original value of the column is pretty much
irrelevant for tests that involve updating it (it's highly unlikely
the column would default to the same value as the test data).
2021-08-27 13:31:49 +01:00
Ben Thorner
312a895822 Merge pull request #3294 from alphagov/auto-expire-alerts-178926353
Auto expire old broadcast messages
2021-07-22 09:53:41 +01:00
Ben Thorner
5e9d8e5fa0 Auto expire old broadcast messages
Since the expiry is sent as part of the message payload, we don't
need to invoke the CBC proxies (and indeed there's no way to do so
for an expired alert). In future we plan to extend this task so it
triggers the regeneration of content on gov.uk/alerts.

It's worth noting that 'finishes_at' can theoretically be None, in
which case it's unclear when the alert should expire. While alerts
from the Admin app should always have an expiry [1], we have many
in the DB that don't, so it's worth checking for this scenario.

[1]: 078ac10c8d/app/models/broadcast_message.py (L255)
2021-07-21 13:05:11 +01:00
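Sketch for 5e9d8e5fa0: one possible way to guard against a missing `finishes_at` when deciding whether a broadcast has expired. Treating `None` as "never auto-expire" is an assumption made for this sketch; the commit message only says the scenario is checked for. The `is_expired` helper is hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_expired(finishes_at: Optional[datetime], now: datetime) -> bool:
    # Assumption: a broadcast without an expiry is left alone by the task.
    if finishes_at is None:
        return False
    return finishes_at <= now
```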
Ben Thorner
08f48379b4 Move ID generation into link test method
Unlike the other IDs which are stored in the DB, this isn't relevant
for the Celery task as it invokes a link test. Moving it into the
proxy client will also enable us to generate a second ID in the next
commits, where we start doing a link test for the failover lambda.
2021-07-19 16:00:55 +01:00
Ben Thorner
b6774bf0f7 Generate Vodafone link test sequence nos in proxy
Previously the Celery task to trigger a link test had to know about
the special case of a sequence number for Vodafone. Since we're about
to change the client to perform multiple tests it makes sense to give
it the knowledge of how to generate the number itself.

Note that we have to import the db inline to avoid a circular import,
since this module is itself imported by app/__init__.py.

Other invocations of the Vodafone client use stored sequence numbers
from the DB, which are called "message numbers" in that context. Since
the two use cases are very different (even the names are different!),
having them in two places shouldn't cause any confusion.
2021-07-19 15:43:36 +01:00
Leo Hemsted
2ad9a3a380 retry service callbacks on 429
if we're served a 429, put the item on the retry queue and retry the
same as if the service returned a 5xx. 429 is commonly returned when a
rate limit is exceeded, and retrying on a delay is a typical response to that.
2021-07-13 16:09:17 +01:00
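Sketch for 2ad9a3a380: the retry decision described above, reduced to a predicate. The function name `should_retry_callback` is hypothetical, and how the real task queues the retry is not shown.

```python
def should_retry_callback(status_code: int) -> bool:
    # Treat rate limiting (429) the same as a server error (5xx).
    return status_code == 429 or 500 <= status_code <= 599
```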
Pea Tyczynska
c28e9451d4 Bump moto version to try solve dependencies version conflict
Also update mock import statements in some test files as they
stopped working with this dependency update.
2021-07-08 15:37:19 +01:00
Rebecca Law
18dd9050a4 - make sure when processing a job that we check the total_sent + job.notification_count against the service.message_limit.
2021-06-28 13:07:48 +01:00
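Sketch for 18dd9050a4: the job-level check expressed as a single comparison. The helper name is hypothetical, and using a strict "greater than" (so a job that exactly reaches the limit is allowed) is an assumption.

```python
def job_would_exceed_limit(total_sent: int, notification_count: int, message_limit: int) -> bool:
    # Count the whole job up front, not just what has been sent so far.
    return total_sent + notification_count > message_limit
```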
Rebecca Law
fd7486d751 - Merge daily limit functions into one, refactor call for daily limit check from process_job
- refactor tests to standardise test names
- refactor some tests to be more clear
- remove unnecessary tests
- include missing test
2021-06-24 11:05:22 +01:00
Rebecca Law
35b20ba363 Correct the daily limits cache.
Last year we had an issue with the daily limit cache and the query that was populating it. As a result we have not been checking the daily limit properly. This PR should correct all that.

The daily limit cache is now incremented in app.notifications.process_notifications.persist_notification. This method is, and should always be, the only method used to create a notification.
We increment the daily limit cache if Redis is enabled (it is always enabled in production) and the key type for the notification is team or normal.

We check whether the daily limit is exceeded in many places:
 - app.celery.tasks.process_job
 - app.v2.notifications.post_notifications.post_notification
 - app.v2.notifications.post_notifications.post_precompiled_letter_notification
 - app.service.send_notification.send_one_off_notification
 - app.service.send_notification.send_pdf_letter_notification

If the daily limit cache is not found, we set it to 0 with an expiry of 24 hours. The daily limit cache key is service_id-yyyy-mm-dd-count, so a new cache entry is created each day.

The best thing about this PR is that the app.service_dao.fetch_todays_total_message_count query has been removed. This query was not performant and had been wrong for ages.
2021-06-22 16:15:36 +01:00
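Sketch for 35b20ba363: a rough model of the cache behaviour described above, using an in-memory stand-in for Redis so it runs on its own. `FakeCache`, the function names, and the choice that "exceeded" means the count has reached the limit are all assumptions; only the key shape, the 24 hour expiry, and the team/normal rule come from the commit message.

```python
from datetime import date

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24 hour expiry from the commit message


class FakeCache:
    """In-memory stand-in for Redis, used only to keep the sketch runnable."""

    def __init__(self):
        self.store = {}
        self.ttl = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, ex):
        self.store[key] = value
        self.ttl[key] = ex

    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1


def daily_limit_cache_key(service_id, day):
    # One key per service per day, so each day starts with a fresh count.
    return f"{service_id}-{day:%Y-%m-%d}-count"


def daily_limit_exceeded(cache, service_id, message_limit, day):
    key = daily_limit_cache_key(service_id, day)
    count = cache.get(key)
    if count is None:
        # Cache miss: start the day's count at 0 with a 24 hour expiry.
        cache.set(key, 0, ex=CACHE_TTL_SECONDS)
        count = 0
    # Assumption: "exceeded" means the count has reached the limit.
    return count >= message_limit


def increment_daily_limit(cache, service_id, key_type, day):
    # Only team and normal keys count towards the daily limit.
    if key_type in ("team", "normal"):
        cache.incr(daily_limit_cache_key(service_id, day))
```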
David McDonald
be035664c4 Add operator channel to broadcast settings route
Looks identical to the government channel in terms of the interface
2021-06-09 13:49:06 +01:00
Rebecca Law
1bf5ce08b2 Add an error log for alert tasks.
Many team members do not look at emails from Zendesk, so this adds a current_app.logger.error message for the things we care about, giving developers a better chance of seeing them.
I have purposely not added an error log for `check_for_services_with_high_failure_rates_or_sending_to_tv_numbers` because it's not something we need to look at immediately.
2021-05-26 11:06:21 +01:00
Katie Smith
829b646931 Allow "government" in broadcast_channel schema
This will allow admin to pass through a value of "government" for the
broadcast_channel. We don't have any logic around the value of service.broadcast_channel,
so no updates are needed to the tasks etc.
2021-05-11 16:56:56 +01:00
Katie Smith
4624328c36 Make service_broadcast_settings.provider non-nullable
We set all existing null values to "all", then make the column
non-nullable. Admin is already passing through the value of "all".
2021-05-10 15:59:22 +01:00
Katie Smith
1767535def Allow service.allowed_broadcast_provider to be "all"
We want to replace the value `None` for
service.allowed_broadcast_provider with the value of "all". As a first
step, we need to allow both values. Once notifications-admin has been
changed to pass through "all" and all the data in the database has been
updated, we can update the code to stop supporting both values.
2021-05-06 15:32:02 +01:00
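Sketch for 1767535def: how code might accept both `None` and "all" during the transition. The `provider_is_allowed` helper is hypothetical, as is the reading that both values mean "any provider"; the provider names in the usage are only examples.

```python
from typing import Optional


def provider_is_allowed(allowed_broadcast_provider: Optional[str], provider: str) -> bool:
    # During the migration, None and "all" are treated as equivalent.
    if allowed_broadcast_provider is None or allowed_broadcast_provider == "all":
        return True
    return allowed_broadcast_provider == provider
```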
Katie Smith
8365c749e4 Change letter zip file names for Insolvency Service letters
DVLA would like to be able to identify letters sent by the Insolvency
Service, so we are changing the zipfile name. They need all zipfile
names to have the same structure, so we can't just add a marker to files
sent by that service - we have to change all filenames.

The new format is like this:
`{NOTIFY}.{DATE}.{SEQUENCE_ID}.{UNIQUE_ID}.{SERVICE_ID}.{ORG_NAME}.{EXTENSION}`
2021-05-06 09:18:44 +01:00
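Sketch for 8365c749e4: assembling a file name in the new dot-separated format. The function name and every value in the usage below are placeholders; the commit message gives only the order of the fields, not their contents.

```python
def letter_zip_file_name(notify, date, sequence_id, unique_id, service_id, org_name, extension):
    # Join the seven fields in the order given by the new format.
    return ".".join([notify, date, sequence_id, unique_id, service_id, org_name, extension])


# Placeholder values, purely to show the shape of the result.
example = letter_zip_file_name("NOTIFY", "DATE", "SEQ", "UNIQUE", "SERVICE", "ORG", "EXT")
```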
Ben Thorner
23f4ae32df Merge pull request #3214 from alphagov/check-broadcast-suspended
Enforce service suspension for broadcasts
2021-04-28 15:01:11 +01:00
Ben Thorner
99bc29418e Move request_id injection into send_task override
This applies the same change we made in other apps [1][2]. Adding
the override here is special, though, because it means the others
will now get triggered, since this app is the start of the chain
of tasks for a request. We will also retain existing request_id
tracing for tasks within this app, since "apply_async" calls the
"send_task" method internally, which is the one we're overriding.

[1]: 6f3c118a1e
[2]: 2e08b7aa95
2021-04-27 10:35:21 +01:00
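Sketch for 99bc29418e: why overriding `send_task` also covers `apply_async`. `BaseCelery` is a simplified stand-in so the example runs without Celery (in Celery itself the two methods live on different objects), and `current_request_id` is a hypothetical helper.

```python
def current_request_id():
    # Hypothetical: the real app would read this from the request context.
    return "hypothetical-request-id"


class BaseCelery:
    """Simplified stand-in: apply_async delegates to send_task."""

    def send_task(self, name, args=None, kwargs=None, **other_kwargs):
        return {"name": name, "args": args, "kwargs": kwargs, **other_kwargs}

    def apply_async(self, name, args=None, kwargs=None, **other_kwargs):
        return self.send_task(name, args, kwargs, **other_kwargs)


class NotifyCelery(BaseCelery):
    def send_task(self, name, args=None, kwargs=None, **other_kwargs):
        # Copy kwargs so the caller's dict is not mutated, then inject the ID.
        kwargs = dict(kwargs or {})
        kwargs["request_id"] = current_request_id()
        return super().send_task(name, args, kwargs, **other_kwargs)
```

Because the stand-in `apply_async` calls `send_task`, both entry points end up with the injected `request_id`.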
Ben Thorner
a2af8b052a Split up authorisation vs. sequencing checks
While both of these are integrity errors (since we should never
reach this point in the code + data), this just means the original
method comment is still relevant to what immediately follows it.
2021-04-19 17:13:15 +01:00
Ben Thorner
936c9ebdfe Test sanity checks by calling top-level task
Since the checks are only performed in one place we can easily take
extra care to ensure this in the tests, noting that we don't need to
do any additional setup, except if no exception is raised - I've left
these tests as-is, to avoid doing more setup.

Note that we still check the happy path for when a provider message is
already sending - just in a different test [1].

[1]: 3d71815956/tests/app/celery/test_broadcast_message_tasks.py (L263)
2021-04-19 17:13:14 +01:00
Ben Thorner
ee52e3e2c9 Mirror integrity checks from the API
It makes sense to have these checks [1] here, since in future we may
add other ways of creating a broadcast event and omit them.

[1]: 3d71815956/app/broadcast_message/rest.py (L198)
2021-04-19 17:13:13 +01:00
Ben Thorner
0070473f31 Check for suspension before sending a broadcast
This mirrors the check we do for jobs, which are also a high-impact
task [1]. While this shouldn't be possible, just like other checks
we're adding it here to be doubly certain.

[1]: 3d71815956/app/celery/tasks.py (L74)
2021-04-19 17:13:12 +01:00
Ben Thorner
b2398fcaf4 Rename CBCProxyFatalException
We only actually use this when the data we're working with is in an
unexpected state, which is unrelated to the CBC Proxy. Using this
name also means we can re-use this exception in the next commits.

Note that we may still care if a broadcast message has expired, since
it's not expected that someone would send one in this condition.
2021-04-19 17:13:05 +01:00
Rebecca Law
34a378a60e Update the Zendesk ticket content for
`check_if_letters_still_in_created`

The message to Zendesk includes a list of notification ids; this isn't
really necessary and is included in the run book. Creation of the
Zendesk ticket can fail if the message is too long, so removing the list
of ids prevents that from happening.
2021-04-19 10:47:25 +01:00
Ben Thorner
be02573147 Fix apply_async not working with positional kwargs
Celery's apply_async function accepts 'kwargs' as (get ready to be
confused) either a positional argument, or a keyword argument:

Positional: apply_async(['args'], {'kw': 'args'})

Keyword: apply_async(args=['args'], kwargs={'kw': 'args'})

We rely on the positional form in at least one place [1]. This fixes
the overload of apply_async to cope with both forms, and continue to
pass through any other (confusion time again) keyword args to super(),
such as queue="queue".

Note that we've also decided to stop accepting other positional args,
since this is unnecessarily confusing, and we don't currently rely on
it in our code. This stops it creeping in in future.

[1]: fde927e00e/app/job/rest.py (L186)
2021-04-15 17:21:21 +01:00
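Sketch for be02573147: an override that accepts `kwargs` positionally or by keyword, passes other keyword arguments through, and rejects further positional arguments. `BaseTask` is a stand-in for Celery's task class so the example runs on its own, and the injected request ID value is a placeholder.

```python
class BaseTask:
    """Stand-in for the Celery task base class."""

    def apply_async(self, args=None, kwargs=None, **other_kwargs):
        return {"args": args, "kwargs": kwargs, **other_kwargs}


class NotifyTask(BaseTask):
    def apply_async(self, args=None, kwargs=None, **other_kwargs):
        # Python binds both the positional and keyword forms to the same
        # parameters, so one signature copes with either call style.
        kwargs = dict(kwargs or {})
        kwargs["request_id"] = "hypothetical-request-id"  # placeholder value
        # Anything else (such as queue="queue") is passed straight through.
        return super().apply_async(args, kwargs, **other_kwargs)
```

A third positional argument raises `TypeError`, which matches the decision to stop accepting other positional args.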
Ben Thorner
ec6d87cd0f Simplify argument passing in apply_async
This avoids the need to keep in-sync with any future changes to the
signature, and reduces the amount of irrelevant code to read.
2021-04-13 15:12:45 +01:00
David McDonald
2e6d761691 Merge pull request #3204 from alphagov/broadcast-envars
Broadcast envars
2021-04-12 17:25:15 +01:00
David McDonald
295162c81d Move CBC proxy enable check
This change will make our development environments closer to production
even if they aren't hooked up to the CBC proxy lambda functions.

Now in development, we will create the broadcast event and create tasks
for each broadcast provider event. We will still not create actual
broadcast provider message rows in the DB and talk to the CBC proxies.

This should be helpful in development to catch any issues we introduce
to do with sending broadcast messaging. In time we may wish to have some
fake CBC proxies in the AWS tools account that we can interact with to
make it even more realistic.
2021-04-12 17:05:41 +01:00
Ben Thorner
3e507eea55 Merge pull request #3201 from alphagov/revamp-celery-stats
Migrate towards new metrics for Celery tasks
2021-04-12 15:04:37 +01:00
Ben Thorner
37f91e0214 Add tests for apply_async injecting request_id
2021-04-12 14:50:55 +01:00
Ben Thorner
df6e27d8fd Add test for extracting request_id in __call__
Tasks will fail if we leave the kwarg in, so I think it's quite
important that we test this works. We don't cover this in any other
test because we call the task functions directly, so the request_id
kwarg doesn't get injected beforehand.
2021-04-12 14:50:53 +01:00
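Sketch for df6e27d8fd: why the injected kwarg has to be removed before the task body runs. `RequestIdTask` is a hypothetical wrapper, not the real task superclass; it only shows that popping `request_id` in `__call__` keeps the wrapped function's signature unchanged.

```python
class RequestIdTask:
    def __init__(self, func):
        self.func = func
        self.last_request_id = None

    def __call__(self, *args, **kwargs):
        # Remove the injected kwarg; the task function does not accept it.
        self.last_request_id = kwargs.pop("request_id", None)
        return self.func(*args, **kwargs)


def add(a, b):
    return a + b


task = RequestIdTask(add)
result = task(1, b=2, request_id="abc")
```

Calling `add(1, b=2, request_id="abc")` directly would raise `TypeError`, which is the failure the commit message describes.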
Ben Thorner
8954cec5a1 Add tests for celery task superclass
This requires upgrading freezegun, as time.monotonic wasn't frozen
by v1.0. Note that we need to explicitly specify the base class for
the task in the test, the reason for which is quite subtle:

- Normally, by using the 'notify_api' fixture, the base class is set
to NotifyTask automatically by running app.create_app [1].

- However, when run alongside other tests, the imports of files with
other celery tasks cause the base class to be instantiated and cached
as the default Celery one. This means none of our tests actually use
our custom superclass when testing tasks.

Because we can't run 'apply_async' directly (since this would require
an actual Celery broker), we need to manually push/pop the request
Context that's normally done as part of sending a task.

Note also that we use a UUID as the name for a task, since these are
global. We want to avoid the task polluting other tests in future,
as well as make it clear the task is being reused.

[1]: dea5828d0e/app/__init__.py (L113)
2021-04-12 14:50:02 +01:00
Leo Hemsted
4a5b1c23bd only send zendesk P1 for alerts
we don't need to be re-notified when someone clicks cancel
2021-04-08 12:22:18 +01:00