import uuid
from datetime import datetime

from celery.exceptions import MaxRetriesExceededError
from flask import current_app
from notifications_utils.statsd_decorators import statsd
from sqlalchemy.schema import Sequence

from app import cbc_proxy_client, db, notify_celery
from app.clients.cbc_proxy import CBCProxyFatalException, CBCProxyRetryableException
from app.config import QueueNames
from app.dao.broadcast_message_dao import (
    create_broadcast_provider_message,
    dao_get_broadcast_event_by_id,
    update_broadcast_provider_message_status,
)
from app.models import BroadcastEventMessageType, BroadcastProvider, BroadcastProviderMessageStatus
from app.utils import format_sequential_number
retry sending broadcasts

Retry tasks if they fail to send a broadcast event. Note that each task
tries the regular proxy and the failover proxy for that provider. This
runs a bit differently from our other retries:

Retry with exponential backoff. Our other tasks retry with a fixed delay of
5 minutes between tries. If we can't send a broadcast, we want to retry
immediately. So instead, implement an exponential backoff (1, 2, 4, 8, ...
seconds of delay). We can't delay for longer than 310 seconds due to the
visibility timeout settings in SQS, so cap the delay at that amount.

Normally we give up retrying after a set number of retries (often around 4
hours' worth). As broadcast content is much more important than normal
notifications, we don't ever want to give up on sending them to phones...

...UNLESS WE DO!

Sometimes we do want to give up sending a broadcast though! Broadcasts have
an expiry time, after which they stop showing up on people's devices, so if
that has passed then we don't need to send the broadcast out.

Broadcast events can also be superseded by updates or cancels. Check that
the event is the most recent event for that broadcast message; if not, give
up, as we don't want to accidentally send out two conflicting events for
the same message.
reduce broadcast retry delay to 4 mins and drop prefetch

### The facts

* Celery grabs up to 10 tasks from an SQS queue by default.
* Each broadcast task takes a couple of seconds to execute, or double that
  if it has to go to the failover proxy.
* Broadcast tasks delay retries exponentially, up to 300 seconds.
* Tasks are acknowledged when celery starts executing them.
* If a task is not acknowledged before its visibility timeout of 310
  seconds, SQS assumes the celery app has died and puts it back on the
  queue.

### The situation

A task stuck in a retry loop was reaching its visibility timeout, and so
SQS was duplicating it. We're unsure of the exact cause of it reaching its
visibility timeout, but there were two contributing factors: the celery
prefetch and the delay of 300 seconds. Essentially, celery grabs the task,
keeps an eye on it locally while waiting for the delay ETA to come round,
then gives the task to a worker to run. However, that worker might already
have up to ten tasks that it has grabbed from SQS. This means the worker
only has 10 seconds to get through all those tasks and start working on the
delayed task before SQS moves the task back to available.

(Note that the delay of 300 seconds is translated into a timestamp based on
the time you called self.retry and put the task back on the queue, whereas
the visibility timeout starts ticking from the time that a celery worker
picked up the task.)

### The fix

#### Set the max retry delay for broadcast tasks to 240 seconds

Setting the max delay to 240 seconds means that instead of a 10 second
buffer before the visibility timeout is tripped, we've got a 70 second
buffer.

#### Set the prefetch limit to 1 for broadcast workers

This means that each worker will have up to 1 currently executing task and
1 task pending execution. If it has these, it won't grab any more off the
queue, so they can sit there without their visibility timeout ticking up.

Setting the prefetch limit to 1 will result in more queries to SQS and
lower throughput. This might matter for, say, sending emails, but the
broadcast worker is not hyper time-critical.

https://docs.celeryproject.org/en/3.1/getting-started/brokers/sqs.html?highlight=acknowledge#caveats
https://docs.celeryproject.org/en/3.1/userguide/optimizing.html?highlight=prefetch#reserve-one-task-at-a-time
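As a minimal sketch of the two worker settings described above - using current Celery option names, with an illustrative app name and broker URL rather than Notify's real worker configuration - the broadcast worker would be configured along these lines:

    from celery import Celery

    # illustrative broadcast worker config only, not the real Notify celery setup
    broadcast_worker = Celery('broadcast-worker', broker='sqs://')

    broadcast_worker.conf.update(
        # SQS makes an unacknowledged task visible again after this many seconds
        broker_transport_options={'visibility_timeout': 310},
        # each worker process holds at most one task waiting to run in addition
        # to the one it is executing, so prefetched tasks don't burn through
        # their visibility timeout while queued locally
        worker_prefetch_multiplier=1,
    )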
def get_retry_delay(retry_count):
    """
    Given a count of retries so far, return a delay for the next one.

    `retry_count` should be 0 the first time a task fails.
    """
    # TODO: replace with celery's built-in exponential backoff

    # 2 to the power of retry_count: 1, 2, 4, 8, 16, 32, ...
    delay = 2 ** retry_count

    # never wait longer than 4 minutes
    return min(delay, 240)
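For concreteness, a quick check in a Python shell shows the delay schedule this gives (in seconds), with the cap taking over once `retry_count` reaches 8:

    >>> [get_retry_delay(retry_count) for retry_count in range(10)]
    [1, 2, 4, 8, 16, 32, 64, 128, 240, 240]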
def check_provider_message_should_send(broadcast_event, provider):
    """
    If any previous event hasn't sent yet for that provider, then we shouldn't send the current event. Instead, fail
    and raise a P1 - so that a notify team member can assess the state of the previous messages and, if necessary,
    replay the `send_broadcast_provider_message` task once the previous message has been sent.

    Note: This is called before the new broadcast_provider_message is created.

    # Help, I've come across this code following a pagerduty alert, what should I do?

    1. Find the failing broadcast_provider_message associated with the previous event that caused this to trip.
    2. If that provider message is still failing to send, fix whatever is causing that. The task to send that previous
       message might still be retrying in the background - look for logs related to that task.
    3. If that provider message has sent successfully, you might need to send this task off again, depending on
       context. This might not always be true though - for example, it may not be necessary to send a cancel if the
       original alert has already expired.
    4. If you need to send this task off again, you'll need to run the following command on paas:
       `send_broadcast_provider_message.apply_async(args=(broadcast_event_id, provider), queue=QueueNames.BROADCASTS)`
    """
    if broadcast_event.transmitted_finishes_at < datetime.utcnow():
        # TODO: This should be a different kind of exception, to distinguish "we should know something went wrong,
        # but no immediate action is needed" from "we need to fix this immediately"
        raise CBCProxyFatalException(
            f'Cannot send broadcast_event {broadcast_event.id} ' +
            f'to provider {provider}: ' +
            f'The expiry time of {broadcast_event.transmitted_finishes_at} has already passed'
        )

    # get events sorted from earliest to latest
    events = sorted(broadcast_event.broadcast_message.events, key=lambda x: x.sent_at)

    for prev_event in events:
        if prev_event.id != broadcast_event.id and prev_event.sent_at < broadcast_event.sent_at:
            # get the record from when that event was sent to the same provider
            prev_provider_message = prev_event.get_provider_message(provider)

            # the previous event hasn't even got round to running `send_broadcast_provider_message` yet
            if not prev_provider_message:
                raise CBCProxyFatalException(
                    f'Cannot send {broadcast_event.id}. Previous event {prev_event.id} ' +
                    f'(type {prev_event.message_type}) has no provider_message for provider {provider} yet.\n' +
                    f'You must ensure that the other event sends successfully, then manually kick off this event ' +
                    f'again by re-running send_broadcast_provider_message for this event and provider.'
                )

            # there's a previous message that has started but not finished sending (whether it fatally errored or
            # is currently retrying)
            if prev_provider_message.status != BroadcastProviderMessageStatus.ACK:
                raise CBCProxyFatalException(
                    f'Cannot send {broadcast_event.id}. Previous event {prev_event.id} ' +
                    f'(type {prev_event.message_type}) has not finished sending to provider {provider} yet.\n' +
                    f'It is currently in status "{prev_provider_message.status}".\n' +
                    f'You must ensure that the other event sends successfully, then manually kick off this event ' +
                    f'again by re-running send_broadcast_provider_message for this event and provider.'
                )
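If you reach step 4 of the docstring above, the replay might look something like this from a shell session on the relevant app (assuming this module lives at `app.celery.broadcast_message_tasks`; the event id and provider value here are placeholders):

    # hypothetical replay of a provider message whose predecessor has now sent
    from app.celery.broadcast_message_tasks import send_broadcast_provider_message
    from app.config import QueueNames

    send_broadcast_provider_message.apply_async(
        args=('aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee', 'ee'),  # broadcast_event_id, provider - placeholders
        queue=QueueNames.BROADCASTS,
    )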
@notify_celery.task(name="send-broadcast-event")
@statsd(namespace="tasks")
def send_broadcast_event(broadcast_event_id):
    if not current_app.config['CBC_PROXY_ENABLED']:
        current_app.logger.info(f'CBC Proxy disabled, not sending broadcast_event {broadcast_event_id}')
        return

    broadcast_event = dao_get_broadcast_event_by_id(broadcast_event_id)

    for provider in broadcast_event.service.get_available_broadcast_providers():
        send_broadcast_provider_message.apply_async(
            kwargs={'broadcast_event_id': broadcast_event_id, 'provider': provider},
            queue=QueueNames.BROADCASTS
        )
# max_retries=None: retry forever
@notify_celery.task(bind=True, name="send-broadcast-provider-message", max_retries=None)
@statsd(namespace="tasks")
def send_broadcast_provider_message(self, broadcast_event_id, provider):
    broadcast_event = dao_get_broadcast_event_by_id(broadcast_event_id)

    check_provider_message_should_send(broadcast_event, provider)

    # the broadcast_provider_message may already exist if we retried previously
    broadcast_provider_message = broadcast_event.get_provider_message(provider)
    if broadcast_provider_message is None:
        broadcast_provider_message = create_broadcast_provider_message(broadcast_event, provider)

    formatted_message_number = None
    if provider == BroadcastProvider.VODAFONE:
        formatted_message_number = format_sequential_number(broadcast_provider_message.message_number)

    current_app.logger.info(
        f'invoking cbc proxy to send '
        f'broadcast_event {broadcast_event.reference} '
        f'msgType {broadcast_event.message_type}'
    )

    areas = [
        {"polygon": polygon}
        for polygon in broadcast_event.transmitted_areas["simple_polygons"]
    ]

    channel = "test"
    if broadcast_event.service.broadcast_channel:
        channel = broadcast_event.service.broadcast_channel

    cbc_proxy_provider_client = cbc_proxy_client.get_proxy(provider)

    try:
        if broadcast_event.message_type == BroadcastEventMessageType.ALERT:
            cbc_proxy_provider_client.create_and_send_broadcast(
                identifier=str(broadcast_provider_message.id),
                message_number=formatted_message_number,
                headline="GOV.UK Notify Broadcast",
                description=broadcast_event.transmitted_content['body'],
                areas=areas,
                sent=broadcast_event.sent_at_as_cap_datetime_string,
                expires=broadcast_event.transmitted_finishes_at_as_cap_datetime_string,
                channel=channel
            )
        elif broadcast_event.message_type == BroadcastEventMessageType.UPDATE:
            cbc_proxy_provider_client.update_and_send_broadcast(
                identifier=str(broadcast_provider_message.id),
                message_number=formatted_message_number,
                headline="GOV.UK Notify Broadcast",
                description=broadcast_event.transmitted_content['body'],
                areas=areas,
                previous_provider_messages=broadcast_event.get_earlier_provider_messages(provider),
                sent=broadcast_event.sent_at_as_cap_datetime_string,
                expires=broadcast_event.transmitted_finishes_at_as_cap_datetime_string,
                # We think an alert update should always go out on the same channel that created the alert.
                # We recognise there is a small risk here: if the service's channel was changed between an alert
                # being sent out and then updated, something might go wrong. But we are relying on service channels
                # changing almost never, and not mid-incident.
                # In future we may consider storing the channel a broadcast was sent on on the broadcast message
                # itself and picking the value from there instead of from the service.
                channel=channel
            )
        elif broadcast_event.message_type == BroadcastEventMessageType.CANCEL:
            cbc_proxy_provider_client.cancel_broadcast(
                identifier=str(broadcast_provider_message.id),
                message_number=formatted_message_number,
                previous_provider_messages=broadcast_event.get_earlier_provider_messages(provider),
                sent=broadcast_event.sent_at_as_cap_datetime_string,
            )
    except CBCProxyRetryableException as exc:
        delay = get_retry_delay(self.request.retries)
        current_app.logger.exception(
            f'Retrying send_broadcast_provider_message for broadcast_event {broadcast_event_id} and ' +
            f'provider {provider} in {delay} seconds'
        )

        # self.retry raises a celery Retry exception, so we never fall through to the ACK update below when retrying
        self.retry(
            exc=exc,
            countdown=delay,
            queue=QueueNames.BROADCASTS,
        )

    update_broadcast_provider_message_status(broadcast_provider_message, status=BroadcastProviderMessageStatus.ACK)
@notify_celery.task(name='trigger-link-test')
def trigger_link_test(provider):
    identifier = str(uuid.uuid4())

    formatted_seq_number = None
    if provider == BroadcastProvider.VODAFONE:
        sequence = Sequence('broadcast_provider_message_number_seq')
        sequential_number = db.session.connection().execute(sequence)
        formatted_seq_number = format_sequential_number(sequential_number)

    message = f"Sending a link test to CBC proxy for provider {provider} with ID {identifier}"
    current_app.logger.info(message)
    cbc_proxy_client.get_proxy(provider).send_link_test(identifier, formatted_seq_number)