otherwise we run into issues where we dont issue the cancel as we say
"oh look the expiry time just passed, so we shouldnt send this message
as it's already been removed from the cbc".
previously if we were deciding whether to retry or not, it meant that
future events wouldn't have context of what the task is doing. We'd
run into issues with not knowing what references to include when
updating/cancelling in future events.
Instead of deciding whether to retry or not, always retry. Instead, when
any event sends, regardless of whether it is a first time or a retry,
check the status of previous events for that broadcast message. There
are a few things that will mean we don't send.
* If the finishes_at time has already elapsed (ie: we have been trying
to resend this message and haven't had any luck and now the data is
obselete)
* A previous event has no provider message (this means that we never
picked the previous event off the queue for some reason)
* A previous event has a provider message that has anything other than
an ack response. This includes sending (the old message is currently
being sent), and technical-failure/returned-error (the old message is
currently in the retry loop, having experienced issues).
Retry tasks if they fail to send a broadcast event. Note that each task
tries the regular proxy and the failover proxy for that provider. This
runs a bit differently than our other retries:
Retry with exponential backoff. Our other tasks retry with a fixed delay
of 5 minutes between tries. If we can't send a broadcast, we want to try
immediately. So instead, implement an exponential backoff (1, 2, 4, 8,
... seconds delay). We can't delay for longer than 310 seconds due to
visibility timeout settings in SQS, so cap the delay at that amount.
Normally we give up retrying after a set amount of retries (often 4
hours). As broadcast content is much more important than normal
notifications, we don't ever want to give up on sending them to phones...
...UNLESS WE DO!
Sometimes we do want to give up sending a broadcast though! Broadcasts
have an expiry time, when they stop showing up on peoples devices, so if
that has passed then we don't need to send the broadcast out.
Broadcast events can also be superceded by updates or cancels. Check
that the event is the most recent event for that broadcast message, if
not, give up, as we don't want to accidentally send out two conflicting
events for the same message.
We already trigger a zendesk ticket for these two cases, meaning that
whenever we get this situation, we get 3 emails. One for the zendesk
ticket, one from logit raising the fact an exception was raised and one
from cloudwatch raising the fact an exception was raised.
We don't need all these emails, a zendesk ticket is sufficient.
Downgrading to a warning means this event will still be findable in our
logs however.
This falls back to the "test" channel if they do not have a
ServiceBroadcastSetting for the moment, but we intend in future PRs to
enforce that all broadcast services will have this property.
This moves the hardcoding to test channels one step up to where we call
`create_and_send_broadcast`
We can then after this, start to differ whether we give it the 'test' or
'severe' channel based on the services channel setting.
This will allow us to store details of which channel a service should be
sending to.
See the comment about how all broadcast services can have a row in the
table but may not at the moment. This has been done for speed as it's
the quickest way to let us set up different services to send to
different channels for some needed testing with the mobile handset
providers in the coming week.
This used to be hardcoded in the CBC proxy but now we will hardcode it
in the cbc_proxy_client.
In a future PR we can start choosing which channel a broadcast will go
to based on the channel configured for that broadcast service.
This is a boolean column. It will be set to True for broadcasts
created from training broadcast accounts.
This will help us debug, for example by excluding all the stubbed
broadcasts when we have some trouble with real broadcasts.
We’re going to let people pass in fairly complex polygons, but:
- we don’t want to store massive polygons
- we don’t want to pass the CBCs massive polygons
So this commit adds a step to simplify the polygons before storing them.
We think it’s best for us to do this because:
- writing code to do polygon simplification is non-trivial, and we don’t
want to make all potential integrators do it
- the simplification we’ve developed is domain-specific to emergency
alerting, so should throw away less information than
There’s a bit more detail about how we simplify polygons in
https://github.com/alphagov/notifications-admin/pull/3590/files
This gives us some extra confidence that there aren’t any problems with
the data we’re getting from the other service. It doesn’t address any
specific problems we’ve seen, rather it seems like a sensible precaution
to take.
This commit makes the existing endpoint also accept CAP XML, should the
appropriate `Content-Type` header be set.
It uses the translation code we added in a previous commit to convert
the CAP to a dict. We can then validate that dict against with the JSON
schema to ensure it’s something we can work with.
Other systems we’re working with won’t easily be adapted to emit JSON
instead of CAP, so it’s less work for us to do that conversion.
This commit adds to code to parse the XML and turn it into a dict that
we can work with, including converting the polygon string into native
Python lists.
We know there is at least one system which wants to integrate with
Notify to send out emergency alerts, rather than creating them manually.
This commit adds an endpoint to the public API to let them do that.
To start with we’ll just let the system create them in a single call,
meaning they still have to be approved manually. This reduces the risk
of an attacker being able to broadcast an alert via the API, should the
other system be compromised.
We’ve worked with the owners of the other system to define which fields
we should care about initially.
We want to rename the `bt-ee-1-proxy` lambda function to `ee-1-proxy`.
This change will need to be deployed at the same time that we change
the name of the lambda function in the Terraform.
o2 use One-2-many CBC so we can use the O2M/CAP client.
Once differences between CBCs have been worked out we can consolidate O2M clients to reduce duplication.
Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
So:
'billing_contact_email_address' becomes 'billing_contact_email_addresses'
AND
'billing_contact_name' becomes 'billing_contact_names'
This is to signify that each of those fields can contain numerous
items
This has been added in for speed of development but now we are getting
close to integrating with production systems, we will be turning off
these helpful hacks to reduce the risk of someone sending a real
broadcast to citizens.
Note, platforms are still able to approve broadcasts when their service
is in training mode.