Commit Graph

1124 Commits

Author SHA1 Message Date
Leo Hemsted
49cc1b643f split delete task up into per service
we really don't gain anything by running each service delete in sequence
- we get the services, and then just loop through them deleting per
service. By deleting per service in separate tasks, we can take
advantage of parallelism. the only thing we lose is some log lines but I
don't think we're that interested in them.

only set query limit at the move_notifications dao function - the task
doesn't really care about the technical implementation of how it deletes
the notifications
2021-12-14 15:24:34 +00:00
Ben Thorner
c8cf057eba Record providers we time out notifications for
This will help us monitor issues with delivery receipts and keep
track of provider performance over time.

I'm not concerned about performance here:

- The number of notifications to time out is usually small.
- This task only runs once a day.
- Calls to StatsD are quick and cheap.
2021-12-14 13:04:39 +00:00
Ben Thorner
87cd40d00a Scale timeout task to work on arbitrary volumes
Previously this was limited to 500K notifications. While we don't
expect to reach this limit, it's not impossible e.g. if we had a
repeat of the incident where one of our providers stopped sending
us status updates. Although that's not great, it's worse if our
code can't cope with the unexpectedly high volume.

This reuses the technique we have elsewhere [1] to keep processing
in batches until there's nothing left. Specifying a cutoff point
means the total amount of work to do can't keep growing.

[1]: 2fb432adaf/app/dao/notifications_dao.py (L441)
2021-12-13 17:14:28 +00:00
Ben Thorner
76aeab24ce Rewrite DAO timeout method to take cutoff_time
Previously we specified the period and calculated the cutoff time
in the function. Passing it in means we can run the method multiple
times and avoid getting "new" notifications to time out in the time
it takes to process each batch.
2021-12-13 16:56:21 +00:00
Ben Thorner
ab4cb029df Remove alert for email / sms in created
In response to [1].

[1]: https://github.com/alphagov/notifications-api/pull/3383#discussion_r759379988

It turns out the code that inspired this new alert - in the old
"timeout-sending-notifications" task - was actually redundant as
we already have a task to "replay" notifications still in "created",
which is much better than just alerting about them.

It's possible the replayed notifications will also fail, but in
both cases we should see some kind of error due to this, so I don't
think we're losing anything by not having an alert.
2021-12-06 14:11:42 +00:00
Ben Thorner
04da017558 DRY-up conditionally creating callback tasks
This removes 3 duplicate instances of the same code, which is still
tested implicitly via test_process_ses_receipt_tasks [1]. In the
next commit we'll make this test more explicit, to reflect that it's
now being reused elsewhere and shouldn't change arbitrarily.

We do lose the "print" statement from the command instance of the
code, but I think that's a very tolerable loss.

[1]: 16ec8ccb8a/tests/app/celery/test_process_ses_receipts_tasks.py (L94)
2021-12-06 14:11:34 +00:00
Ben Thorner
0318229216 Stop 'timing out' old 'created' notifications
This is being replaced with a new alert and runbook [1]. It's not
always appropriate to change the status to 'technical-failure', and
the new alert means we'll act to fix the underlying issue promptly.

We'll look at tidying up the remaining code in the next commits.

[1]: https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-email-or-sms-still-in-created
2021-12-06 14:00:36 +00:00
Ben Thorner
f96ba5361a Add new task to alert about created email / sms
This will log an error when email or SMS notifications have been
stuck in 'created' for too long - normally they should be 'sending'
in seconds, noting that we have a goal of < 10s wait time for most
notifications being processed our platform.

In the next commits we'll decouple similar functionality from the
existing 'timeout-sending-notifications' task.
2021-12-06 14:00:31 +00:00
Leo Hemsted
f6d210f1e6 put delete tasks on the reporting worker
they share a lot with the reporting tasks (creating ft_billing and
ft_notification_status), in that they're run nightly, take a long time,
and we see error messages if they get run multiple times (due to
visibility timeout).

The periodic app has two concurrent processes - previously there was
just one delete task, which would use one of those processes, while the
other process would pick up anything else on the queue (at that time of
night, the regular provider switch checks and scheduled job checks).
However, when we switched to running the three delete notification types
separately, we saw visibility timeout issues - three tasks would be
created, all three would be picked up by one celery instance, the two
worker processes would start on two of them, and the third would sit on
the box, wait longer than the visibility timeout to be picked up (and
acknowledged), and so SQS would assume the task was lost and replay it.

it's queues all the way down!

By putting them on the reporting worker we can take advantage of tuning
that app (for example setting the prefetch multiplier to one) which is
designed to run large tasks. We've also got more concurrent workers on
this box, so we can run all three tasks at once.
2021-12-03 13:28:16 +00:00
Leo Hemsted
6bbec9f103 make delete notification tasks parallel by notification type
we used to do this until apr 2020. Let's try doing it again.
Back then, we had problems with timing. We did two things in spring
2020:

We moved to using an intermediary temp table [1]
We stopped the tasks being parallelised [2]

However, it turned out the real time saving was from changing what
services we delete for [3]. The task was actually CPU-bound rather than
DB-bound, so that's probably why having the tasks in parallel wasn't
helping, since they were all competing for the same CPU. It's worth
trying the parallel steps again now that we're no longer CPU bound.

Note: Temporary tables are in their own postgres schema, and are only
viewable by the current session (session == connection. Each celery
worker process has its own db connection). We don't need to worry about
separate workers both trying to use the same table at once.

I've also added a "DROP ON COMMIT" directive to the table definition
just to ensure it doesn't persist past the task even if there's an
exception. (This also drops on rollback).

Cronitor looks at the three functions separately so we don't need to worry
about the main task taking milliseconds where it used to take hours as
it isn't monitored itself.

I've also removed some unnecessary redundant exception logs.

[1] https://github.com/alphagov/notifications-api/pull/2767
[2] https://github.com/alphagov/notifications-api/pull/2798
[3] https://github.com/alphagov/notifications-api/pull/3381
2021-12-01 14:28:08 +00:00
Ben Thorner
bffca39223 Merge pull request #3373 from alphagov/centralise-celery-180213914
Use central NotifyCelery base class in utils
2021-11-17 12:09:28 +00:00
Ben Thorner
666ac1ab4f Log activity on all periodic Celery tasks
As stated in the comment, this would have been helpful during an
incident to give further reassurance that a task had at least
started running - at the time the only evidence for this was the
Cronitor dashboard itself, which we don't often look at.

I've removed other, equivalent "starting" logs, but kept those
that provide additional information in the log message.
2021-11-17 09:48:03 +00:00
Ben Thorner
528223ed61 Use central NotifyCelery base class in utils
Note that the new base class doesn't include a bespoke feature we
had here: 'log_on_worker_shutdown'. We've agreed it's reasonable
to remove it for now as it was introduced many years ago and its
use case is unclear - we can always add it back if needed.
2021-11-16 13:58:12 +00:00
Chris Hill-Scott
0236318189 Republish gov.uk/alerts every night to clear down planned tests
We have made it so that gov.uk/alerts shows a ‘1 planned test’ banner
for the whole of the day when there has been an operator test on that
day.

We need to remove the banner when the day is over.

The most straightforward way to do this is to republish the site at the
start of every day. The gov.uk/alerts code[1] will work out if there are
or aren’t any planned tests to show that day.

1. 5a274af6d0/app/models/alerts.py (L38-L44)
2021-11-15 14:23:32 +00:00
Ben Thorner
48e1482d90 Merge pull request #3366 from alphagov/celery-extend-request-id-180213914
Extend request tracing to cover Celery logs
2021-11-12 11:10:38 +00:00
Ben Thorner
d66c68d6d6 Merge pull request #3364 from alphagov/celery-headers-request-id-180213914
Move Celery task Request ID injection into headers
2021-11-12 11:10:29 +00:00
Ben Thorner
ac06529128 Enable request tracing on Celery success/fail logs
Previously these logs wouldn't have a Request ID attached since the
Celery hooks run after the __call__ method where we enable request
tracing for normal application logs. For the failure log especially
it will be useful to have this feature.
2021-11-10 18:04:20 +00:00
Ben Thorner
369a9f7521 Refactor queue_name and request_id into properties
This reduces the complexity of the original functions, which will
go up a bit in the next commit.
2021-11-10 18:04:19 +00:00
Ben Thorner
89a8dd1a03 Move Celery task Request ID injection into headers
Previously we passed along this piece of state via the kwargs for
a task, but this runs the risk of the task accidentally receiving
the extra kwarg unless we've covered all the code paths that could
invoke it directly e.g. retries don't invoke __call__.

This switches to using Celery "headers" to pass the extra state. It
turns out that a Celery has two "header" concepts, which leads to
some confusion and even a bug with the framework [1]:

- In older (pre v4.4) versions of Celery, the "headers" specified
by apply_async() would become _the_ headers in the message that
gets passed around workers, etc. These would be available later on
via "self.request.headers".

- Since Celery protocol v2, the meaning of "headers" in the message
changed to become (basically) _all_ metadata about the task [2],
with the "headers" option in apply_async() being merged [3] into
the big dict of metadata.

This makes using headers a bit confusing unfortunately, since the
data structure we put in is subtly different to what comes out in
the request context. Nonetheless, it still works. I've added some
comments to try and clarify it.

Note that one of the original tests is no longer necessary, since we
don't need to worry about argument passing styles with headers.

[1]: https://github.com/celery/celery/issues/4875
[2]: 663e4d3a0b (diff-07a65448b2db3252a9711766beec23372715cd7597c3e309bf53859eabc0107fR343)
[3]: 681a922220/celery/app/amqp.py (L495)
2021-11-10 18:03:40 +00:00
Katie Smith
3d4796c924 Add task to resanitise and replace a PDF for precompiled letter
This adds a task which is designed to be used if we want to recreate the
PDF for a precompiled letter (either one that has been created using the
API or one that has been uploaded through the website).

The task takes the `notification_id` of the letter and passes template
preview the details it needs in order to sanitise the original file and
then replace the version in the letters-pdf bucket with the freshly
sanitised version.
2021-11-10 09:51:31 +00:00
Ben Thorner
cdb43fbaf6 Only loop timeout task if there's more work
Previously this would repeat the task even the current iteration of
the loop had processed a non-full batch. This could cause the task
to error incorrectly if one or two notifications breach the timeout
threshold in between iterations.
2021-11-09 15:41:14 +00:00
Ben Thorner
77c8c0a501 Optimise query to get notifications to "time out"
From experimenting in production we found a "!=" caused the engine
to use a sequential scan, whereas explicitly listing all the types
ensured an index scan was used.

We also found that querying for many (over 100K) items leads to
the task stalling - no logs, but no evidence of it running either -
so we also add a limit to the query.

Since the query now only returns a subset of notifications, we need
to ensure the subsequent "update" query operates on the same batch.
Also, as a temporary measure, we have a loop in the task code to
ensure it operates on the total set of notifications to "time out",
which we assume is less than 500K for the time being.
2021-11-09 13:50:32 +00:00
Ben Thorner
3ecbdbb260 Temporarily disable task argument checking
This was added in Celery 4 [1]. and appears to be incompatible with
our approach of injecting "request_id" into task arguments (example
exception below). Although our other apps are on Celery 5 our logs
don't show any similar issues, probably because all their tasks are
invoked without request IDs. In the longterm we should decide if we
want to enable argument checking and fix the tracing approach, or
stop tracing request IDs in Celery tasks.

[1]: https://docs.celeryproject.org/en/stable/userguide/tasks.html#argument-checking

    2021-11-01T11:37:36 delivery delivery ERROR None "RETRY: Email notification f69a9305-686f-42eb-a2ee-61bc2ba1f5f3 failed" [in /Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py:68]
    Traceback (most recent call last):
      File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 53, in deliver_email
        raise TypeError("test retry")
    TypeError: test retry
    [2021-11-01 11:37:36,385: ERROR/ForkPoolWorker-1] RETRY: Email notification f69a9305-686f-42eb-a2ee-61bc2ba1f5f3 failed
    Traceback (most recent call last):
      File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 53, in deliver_email
        raise TypeError("test retry")
    TypeError: test retry
    [2021-11-01 11:37:36,394: WARNING/ForkPoolWorker-1] Task deliver_email[449cd221-173c-4e18-83ac-229e88c029a5] reject requeue=False: deliver_email() got an unexpected keyword argument 'request_id'
    Traceback (most recent call last):
      File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 53, in deliver_email
        raise TypeError("test retry")
    TypeError: test retry

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/task.py", line 731, in retry
        S.apply_async()
      File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/canvas.py", line 219, in apply_async
        return _apply(args, kwargs, **options)
      File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/task.py", line 537, in apply_async
        check_arguments(*(args or ()), **(kwargs or {}))
    TypeError: deliver_email() got an unexpected keyword argument 'request_id'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/trace.py", line 450, in trace_task
        R = retval = fun(*args, **kwargs)
      File "/Users/benthorner/Documents/Projects/api/app/celery/celery.py", line 74, in __call__
        return super().__call__(*args, **kwargs)
      File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/trace.py", line 731, in __protected_call__
        return self.run(*args, **kwargs)
      File "/Users/benthorner/Documents/Projects/api/app/celery/provider_tasks.py", line 71, in deliver_email
        self.retry(queue=QueueNames.RETRY)
      File "/Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/task.py", line 733, in retry
        raise Reject(exc, requeue=False)
    celery.exceptions.Reject: (TypeError("deliver_email() got an unexpected keyword argument 'request_id'",), False)
2021-11-01 11:39:57 +00:00
Ben Thorner
44b3b42aba Rewrite config to fix deprecation warnings
The new format was introduced in Celery 4 [1] and is due for removal
in Celery 6 [2], hence the warnings e.g.

    [2021-10-26 14:31:57,588: WARNING/MainProcess] /Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/utils.py:206: CDeprecationWarning:
        The 'CELERY_TIMEZONE' setting is deprecated and scheduled for removal in
        version 6.0.0. Use the timezone instead

      alternative=f'Use the {_TO_NEW_KEY[setting]} instead')

This rewrites the config to match our other apps [3][4]. Some of the
settings have been removed entirely:

- "CELERY_ENABLE_UTC = True" - this has been enabled by default since
  Celery 3 [5].

- "CELERY_ACCEPT_CONTENT = ['json']", "CELERY_TASK_SERIALIZER = 'json'"
  - these are the default settings since Celery 4 [6][7].

Finally, this removes a redundant (and broken) bit of development config
- NOTIFICATION_QUEUE_PREFIX - that should be set in environment.sh [8].

[1]: https://docs.celeryproject.org/en/stable/history/whatsnew-4.0.html#lowercase-setting-names
[2]: https://docs.celeryproject.org/en/stable/history/whatsnew-5.0.html#step-2-update-your-configuration-with-the-new-setting-names
[3]: 252ad01d39/app/config.py (L27)
[4]: 03df0d9252/app/__init__.py (L33)
[5]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-enable_utc
[6]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-task_serializer
[7]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-accept_content
[8]: 2edbdec4ee/README.md (environmentsh)
2021-11-01 09:54:05 +00:00
Ben Thorner
c2fe1b04bb Fix test checking for nested exception
Previously this type of exception was raised at the top level and
the task did not retry [1]. Since Celery 4+ the behaviour changed
so that a Retry exception will be raised unless we explicitly say
we want to raise the original one [2].

It's unclear if we actually want to retry this task for any type
of exception, but it's out-of-scope for this PR to decide on this,
so here we just reraise the exception to make it compatible with
the new version of Celery and the existing test.

[1]: https://github.com/alphagov/notifications-api/pull/2832/files#diff-926badba91648d56a973e16bd92da3345b23bc60dc89360119b1df08de52723fL77
[2]: 32b52ca875 (diff-db604dd7cb51e386710260ff2eba378aac19ba11eec97904bbf097b68caeada6L625)
2021-11-01 09:54:03 +00:00
Ben Thorner
d1586a8f81 CC DVLA in tickets about outstanding letters
Previously we sent them emails about this manually. We also tried
a Zendesk macro/trigger approach, but using a CC means:

- We can control the behaviour ourselves (Zendesk triggers can only
be edited by admins outside our team).

- We keep the DVLA notification approach consistent and in one place,
so notifications always go to the same people.

- Any further (public) updates to the ticket will also trigger a
notification to DVLA (previous trigger only notified on creation).
2021-10-29 11:46:29 +01:00
David McDonald
5a51ab6131 Bug fix: update normalised_to, not just to after letter sanitise
When a precompiled letter is sent to us, we set the `to` field as
'Provided as PDF' in
1c1023a877/app/v2/notifications/post_notifications.py (L100-L104)

This then also sets `normalised_to` as `providedaspdf`.

However, when template preview sanitises the letter, pulls out the
address and gives it to the API, we were only setting `to` to be
the new address and had forgotten to also amend `normalised_to` to
be the normalised version. This meant that for all these letters
we accidentally left `normalised_to` as `providedaspdf`. The impact
of this was that we can not then search for these letters in the
admin user interface as they rely on the `normalised_to` field
containing the recipient address.

This commit fixes that bug by also setting the `normalised_to`
field
2021-10-27 11:56:25 +01:00
Ben Thorner
d703251b13 Merge pull request #3348 from alphagov/better-callback-stats-180016688
Include status in stats about delivery times
2021-10-22 11:59:24 +01:00
Ben Thorner
f974108934 Include status in stats about delivery times
Previously these metrics weren't very useful because they could be
skewed by long timings for failed notifications, which can take up
to 72 hours to deliver. I'm intentionally not trying to have a dual
running period (with the old and new names) because:

- We don't use the current stats for anything (checking Grafana).

- The current stats get turned into a "bucket" metric in Prometheus
[1][2], which isn't very useful because it can only tell us the mean
time to deliver, but we're actually interested in percentiles.

Switching to a new naming is an opportunity to fix the raw data and
the way it's aggregated, using the same kind of "summary" metric that
we now use for stats about our Celery tasks [3].

[1]: c330a8ac8a/paas/statsd/statsd-mapping.yml (L82)
[2]: https://prometheus.io/docs/practices/histograms/#quantiles
[3]: https://github.com/alphagov/notifications-aws/pull/890
2021-10-20 17:22:59 +01:00
Leo Hemsted
0b8c6ef263 Merge pull request #3339 from alphagov/letter-runbook-link
tweak zendesk message for no ack files alert
2021-10-20 15:23:33 +01:00
Pea Tyczynska
1b6f9505da Call publish-govuk-alerts task when alert expires
The `auto-expire-broadcast-messages` task checks for expired broadcasts
at five minute intervals. This change now calls the
`publish-govuk-alerts` task in govuk-alerts if there are expired
broadcasts so that the site is updated.

Co-authored-by: Katie Smith <katie.smith@digital.cabinet-office.gov.uk>
2021-10-18 08:41:25 +01:00
Katie Smith
04bfd6bfdb Trigger task to publish alerts when sending or cancelling alert
When we send or cancel a broadcast message, we now trigger a task
in govuk-alerts repo that polls our API for alerts and
publishes a fresh list of alerts.

Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>
2021-10-18 08:41:24 +01:00
Leo Hemsted
b8c4e19072 tweak zendesk message for no ack files alert
include a link to a runbook entry.

also the list of acknowledgement files can be very long, so make that
the last thing, and use new lines to space out the message.
2021-10-08 13:45:02 +01:00
Katie Smith
58597653df Update how "sending to TV numbers" Zendesk tickets are created 2021-09-29 11:26:20 +01:00
Katie Smith
0c0c7f4478 Update how "letters still created status" Zendesk tickets are created 2021-09-29 11:23:28 +01:00
Katie Smith
2f66e38fb9 Update how "missing ackfile for letters" Zendesk tickets are created 2021-09-29 11:10:50 +01:00
Katie Smith
64c0a3fb9d Update how 'letters still sending' Zendesk tickets are created
These now use the new Zendesk form.
2021-09-29 11:07:37 +01:00
Katie Smith
b114dadcae Update how pending virus check Zendesk tickets are created
This updates the tickets that are created when the
`check_if_letters_still_pending_virus_check` scheduled task detects
letters in the `pending-virus-check` state.
2021-09-29 11:03:48 +01:00
Katie Smith
9ff0ca0363 Update how live broadcast Zendesk tickets are created
These now use the Notify Form in Zendesk
2021-09-29 10:59:07 +01:00
sakisv
df739d6b94 Fix flake8 2021-09-10 10:17:28 +03:00
sakisv
65c21f694c Don't raise P1 for broadcasts
This is happening on the AWS side now as part of
alphagov/notifications-broadcasts-infra#267 - but we still want to keep
the zendesk ticket as it contains useful context _and_ provides
visibility to the team.
2021-09-09 16:44:19 +03:00
Ben Thorner
bf0bf4e31c Favour new "areas" format for PagerDuty alerts
Broadcasts created via the API [1] and the Admin app [2] should
both now have this field set. It's also more informative to show
this, and broadcasts created via the API don't have IDs anyway.

There's a small risk that an old broadcast that gets approved won't
have this data, but it's for information only and we intend to
backfill all old broadcasts in the near future.

[1]: 023a06d5fb
[2]: 7dbe3afa19
2021-08-27 14:22:12 +01:00
Ben Thorner
8f39d476bd Start dual running with "areas" and (area) "ids"
This is necessary until:

- The Admin app is using the new "areas(_2)" format to store and
retrieve data.

- We've migrated all existing broadcast messages to use the new
format.

Note that "areas" / "ids" isn't actually used for anything except
printing out the PagerDuty message - it's not sent to the proxy [1].

[1]: 6edc6c70aa/app/celery/broadcast_message_tasks.py (L190-L193)
2021-08-26 15:34:24 +01:00
Ben Thorner
312a895822 Merge pull request #3294 from alphagov/auto-expire-alerts-178926353
Auto expire old broadcast messages
2021-07-22 09:53:41 +01:00
Ben Thorner
5e9d8e5fa0 Auto expire old broadcast messages
Since the expiry is sent as part of the message payload, we don't
need to invoke the CBC proxies (and indeed there's no way to do so
for an expired alert). In future we plan to extend this task so it
triggers the regeneration of content on gov.uk/alerts.

It's worth noting that 'finishes_at' can theoretically be None, in
which case it's unclear when the alert should expire. While alerts
from the Admin app should always have an expiry [1], we have many
in the DB that don't, so it's worth checking for this scenario.

[1]: 078ac10c8d/app/models/broadcast_message.py (L255)
2021-07-21 13:05:11 +01:00
Ben Thorner
cdc150de1b Change link test task to trigger both lambda
This modifies the previous "(_)send_link_test" method to trigger a
link test for a specific lambda. We then call the method with both
the primary and failover lambda in new orchestrator method.

Since the _invoke_lambda function doesn't raise exceptions if it
fails, there's no need to rescue anything in order to ensure the
second link test / invocation runs as well. It doesn't testing for
this, since it boils to an absence of code to raise any exception.

Note that, like the other parent tests, we only check the new method
works with a specific proxy client instance.
2021-07-19 16:00:56 +01:00
Ben Thorner
08f48379b4 Move ID generation into link test method
Unlike the other IDs which are stored in the DB, this isn't relevant
for the Celery task as it invokes a link test. Moving it into the
proxy client will also enable us to generate a second ID in the next
commits, where we start doing a link test for the failover lambda.
2021-07-19 16:00:55 +01:00
Ben Thorner
7fb65761f9 Always log when a lambda is invoked
This replaces the previous single-purpose log about a link test
with a more informative and generic one for all invocations.
2021-07-19 15:45:43 +01:00
Ben Thorner
b6774bf0f7 Generate Vodafone link test sequence nos in proxy
Previously the Celery task to trigger a link test had to know about
the special case of a sequence number for Vodafone. Since we're about
to change the client to perform multiple tests it makes sense to give
it the knowledge of how to generate number itself.

Note that we have to import the db inline to avoid a circular import,
since this module is itself imported by app/__init__.py.

Other invocations of the Vodafone client use stored sequence numbers
from the DB, which are called "message numbers" in that context. Since
the two use cases are very different (even the names are different!),
having them in two places shouldn't cause any confusion.
2021-07-19 15:43:36 +01:00
Leo Hemsted
2ad9a3a380 retry service callbacks on 429
if we're served a 429, put the item on the retry queue and retry the
same as if the service returned a 5xx. 429 is commonly returned for rate
limit exceeding, and retrying on a delay is a typical response to that.
2021-07-13 16:09:17 +01:00