This avoids any issues due to large payloads (e.g. with a lot of
polygons in the 'areas' field). While we may miss part of the log
in such cases, it's still more than we currently get.
This modifies the previous "(_)send_link_test" method to trigger a
link test for a specific lambda. We then call the method with both
the primary and failover lambdas in a new orchestrator method.
Since the _invoke_lambda function doesn't raise exceptions if it
fails, there's no need to rescue anything in order to ensure the
second link test / invocation runs as well. There's no test for this,
since it boils down to an absence of code that could raise an exception.
Note that, like the other parent tests, we only check the new method
works with a specific proxy client instance.
Unlike the other IDs which are stored in the DB, this isn't relevant
for the Celery task as it invokes a link test. Moving it into the
proxy client will also enable us to generate a second ID in the next
commits, where we start doing a link test for the failover lambda.
We want to get to the point where we're running link tests for each
lambda independently. The tests weren't checking for the failover
mechanism for link tests, so we can just remove it.
Previously the Celery task to trigger a link test had to know about
the special case of a sequence number for Vodafone. Since we're about
to change the client to perform multiple tests, it makes sense to give
it the knowledge of how to generate the number itself.
Note that we have to import the db inline to avoid a circular import,
since this module is itself imported by app/__init__.py.
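
Roughly, this takes the following shape - the method and sequence names
here are illustrative, not the real ones:

    class CBCProxyVodafone:

        def _get_sequence_number(self):
            # imported inline: app/__init__.py imports this module, so a
            # module-level `from app import db` would be circular
            from app import db

            # hypothetical sequence name, for illustration only
            return db.session.execute(
                "SELECT nextval('broadcast_sequence_number')"
            ).scalar()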
Other invocations of the Vodafone client use stored sequence numbers
from the DB, which are called "message numbers" in that context. Since
the two use cases are very different (even the names are different!),
having them in two places shouldn't cause any confusion.
We only actually use this when the data we're working with is in an
unexpected state, which is unrelated to the CBC Proxy. Using this
name also means we can re-use this exception in the next commits.
Note that we may still care if a broadcast message has expired, since
it's not expected that someone would send one in this condition.
boto returns a `StreamingBody`[1] response rather than a JSON struct.
We're currently just logging things like "Error calling lambda
o2-1-proxy with function error <botocore.response.StreamingBody object
at 0x7f74cd6e02e8>" which is obviously less than ideal. Also make the
tests properly reflect this - annoyingly it appears that we can't use
moto to reliably test this interface, as the moto `mock_lambda` decorator
needs you to be running inside a Docker container??
[1] https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html#botocore.response.StreamingBody
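
For reference, a rough sketch of reading the body before logging it -
the payload contents and the log line here are illustrative:

    import boto3

    lambda_client = boto3.client('lambda', region_name='eu-west-2')

    result = lambda_client.invoke(
        FunctionName='o2-1-proxy',
        Payload=b'{}',  # hypothetical payload
    )

    if 'FunctionError' in result:
        # result['Payload'] is a botocore.response.StreamingBody, so
        # read() and decode it to log the JSON error document rather
        # than the object's repr
        error = result['Payload'].read().decode('utf-8')
        print(f'Error calling lambda o2-1-proxy with function error {error}')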
these will happen if, for example, you have issues connecting to AWS or
problems with permissions.
Still failover if we get one of these exceptions, as I think it might be
possible to have a problem only related to one of the lambdas.
By creating a new `CBCProxyOne2ManyClient` class for the three One2Many
clients to inherit from. These three clients are the same apart from the
`lambda_name` and the `failover_lambda_name`.
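
Roughly, with the hierarchy looking like this - the O2 and Three lambda
names below are assumptions, only the shape is the point:

    class CBCProxyClientBase:
        pass  # shared invocation logic lives here

    class CBCProxyOne2ManyClient(CBCProxyClientBase):
        pass  # shared One2Many/CAP behaviour lives here

    class CBCProxyEE(CBCProxyOne2ManyClient):
        lambda_name = 'ee-1-proxy'
        failover_lambda_name = 'ee-2-proxy'

    class CBCProxyO2(CBCProxyOne2ManyClient):
        lambda_name = 'o2-1-proxy'
        failover_lambda_name = 'o2-2-proxy'

    class CBCProxyThree(CBCProxyOne2ManyClient):
        lambda_name = 'three-1-proxy'
        failover_lambda_name = 'three-2-proxy'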
Retry tasks if they fail to send a broadcast event. Note that each task
tries the regular proxy and the failover proxy for that provider. This
runs a bit differently than our other retries:
Retry with exponential backoff. Our other tasks retry with a fixed delay
of 5 minutes between tries. If we can't send a broadcast, we want to try
immediately. So instead, implement an exponential backoff (1, 2, 4, 8,
... seconds delay). We can't delay for longer than 310 seconds due to
visibility timeout settings in SQS, so cap the delay at that amount.
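
A sketch of how that backoff might look with Celery - the task name and
arguments here are illustrative:

    from celery import Celery

    notify_celery = Celery()

    @notify_celery.task(bind=True)
    def send_broadcast_provider_message(self, broadcast_event_id, provider):
        try:
            ...  # invoke the provider's lambda
        except Exception as e:
            # 1, 2, 4, 8, ... seconds, capped at the 310 second limit
            # imposed by the SQS visibility timeout
            self.retry(countdown=min(2 ** self.request.retries, 310), exc=e)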
Normally we give up retrying after a set amount of retries (often 4
hours). As broadcast content is much more important than normal
notifications, we don't ever want to give up on sending them to phones...
...UNLESS WE DO!
Sometimes we do want to give up sending a broadcast though! Broadcasts
have an expiry time, when they stop showing up on people's devices, so if
that has passed then we don't need to send the broadcast out.
Broadcast events can also be superseded by updates or cancels. Check
that the event is the most recent event for that broadcast message, if
not, give up, as we don't want to accidentally send out two conflicting
events for the same message.
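
Something like this, where the model and attribute names are
assumptions:

    from datetime import datetime, timezone

    class BroadcastIntegrityError(Exception):
        """Raised when we should give up retrying a broadcast event."""

    def check_event_is_still_wanted(broadcast_event):
        # the broadcast has expired off people's devices, so don't send it
        if broadcast_event.finishes_at < datetime.now(timezone.utc):
            raise BroadcastIntegrityError('broadcast event has expired')

        # a later update/cancel supersedes this event, so sending it now
        # could put two conflicting events out for the same message
        if broadcast_event != broadcast_event.broadcast_message.events[-1]:
            raise BroadcastIntegrityError('broadcast event has been superseded')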
This moves the hardcoding to test channels one step up to where we call
`create_and_send_broadcast`
After this, we can start to decide whether we give it the 'test' or
'severe' channel based on the service's channel setting.
This used to be hardcoded in the CBC proxy but now we will hardcode it
in the cbc_proxy_client.
In a future PR we can start choosing which channel a broadcast will go
to based on the channel configured for that broadcast service.
We want to rename the `bt-ee-1-proxy` lambda function to `ee-1-proxy`.
This change will need to be deployed at the same time that we change
the name of the lambda function in the Terraform.
O2 use a One-2-Many CBC so we can use the O2M/CAP client.
Once differences between CBCs have been worked out we can consolidate O2M clients to reduce duplication.
Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
No point trying, it's the same lambda. As `_invoke_lambda` currently
takes a bytes payload rather than a JSON payload, we had to decide
between encoding the payload in the canary or moving the encoding into
the `_invoke_lambda` function. We went for the former as the
lesser of two evils. We may end up doing the encoding twice in the case
of a failover but this avoids us having to put the encoding in our code
in several places (for example the canary and also soon to be the link
tests).
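
The call site ends up looking something like this - the lambda name,
payload and method names are illustrative:

    import json

    class CBCProxyClientBase:
        CANARY_LAMBDA_NAME = 'canary'  # hypothetical name

        def _invoke_lambda(self, lambda_name, payload_bytes):
            ...  # the actual call to AWS

        def trigger_canary(self):
            # encode here, since _invoke_lambda takes a bytes payload;
            # no failover needed - the canary only has the one lambda
            payload_bytes = json.dumps({'message_type': 'canary'}).encode('utf-8')
            self._invoke_lambda(self.CANARY_LAMBDA_NAME, payload_bytes)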
For all FunctionErrors, and for invoke errors (status > 299), we
want to retry with the failover lambda.
We are doing this because if there is a connection or other error
with one lambda, the failover lambda may still work and it's
worth trying.
With time, we will probably have a more complex retry flow, depending
on the error and maybe even differing for each MNO (broadcast provider).
We will need a lambda to fail over to if the first lambda fails. This
isn't so much a case of the lambda itself failing, since Lambda is
automatically a cross-availability-zone resource; it's more in case
something in the networking in our AZ goes down and the lambda
therefore can't call out to the CBC. In that case, we'll be able to
swap to the second AZ by calling the second lambda.
This is to prepare us for the case where, when we try to send/cancel a
broadcast, we may need to invoke more than one lambda. This might
happen if we invoke the first lambda, get an error, and therefore try
to invoke a failover/second lambda. `_invoke_lambda` will then be
responsible for the call to AWS, whereas `_invoke_lambda_with_failover`
will be responsible for picking the lambda and deciding on retry
behaviour in failure cases.
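
A sketch of the split - the exception type and the way success is
signalled here are assumptions:

    import boto3

    class CBCProxyRetryableException(Exception):
        pass

    class CBCProxyClientBase:
        lambda_name = 'ee-1-proxy'
        failover_lambda_name = 'ee-2-proxy'

        def __init__(self):
            self._lambda_client = boto3.client('lambda', region_name='eu-west-2')

        def _invoke_lambda_with_failover(self, payload_bytes):
            # picks the lambda and decides what to do on failure
            if not self._invoke_lambda(self.lambda_name, payload_bytes):
                if not self._invoke_lambda(self.failover_lambda_name, payload_bytes):
                    raise CBCProxyRetryableException('both lambdas failed')

        def _invoke_lambda(self, lambda_name, payload_bytes):
            # responsible only for the actual call to AWS
            result = self._lambda_client.invoke(
                FunctionName=lambda_name, Payload=payload_bytes
            )
            return result['StatusCode'] <= 299 and 'FunctionError' not in result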
This will make it impossible to create a new client without at least
having to define these properties. Which should get someone thinking
about language support…
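
For example, with abstract properties - the property names here are a
guess, based on the language work in the next commit:

    from abc import ABC, abstractmethod

    class CBCProxyClientBase(ABC):

        @property
        @abstractmethod
        def LANGUAGE_ENGLISH(self):
            pass

        @property
        @abstractmethod
        def LANGUAGE_WELSH(self):
            pass

    # instantiating a subclass without defining both properties raises
    # TypeError, so new clients can't silently skip language support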
If we’re sending non-GSM characters, we need to mark the language in the
XML as Welsh (`cy-GB` in CAP, `Welsh` in IBAG).
Currently, the CBC proxy checks the content we’re sending, and then uses
an approximation based on ASCII to determine whether we’re sending any
non-GSM characters, and if so, sets the language appropriately.
Instead, we can use functionality from the notifications-utils repo to
determine the language. If any non-GSM characters are used, then we can
set the language to Welsh.
We’ll need to update the proxy to look at this new language flag.
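
Something like this - assuming a non_gsm_characters helper from
notifications-utils; the exact import path is a guess:

    from notifications_utils.template import non_gsm_characters

    def infer_language(content):
        # any non-GSM characters means we treat the content as Welsh:
        # 'cy-GB' in CAP, 'Welsh' in IBAG
        if non_gsm_characters(content):
            return 'cy-GB'
        return 'en-GB'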
In IBAG format for broadcasts, we need to give the sequential number
of the previous message, and it needs to be formatted as hex, padded
with zeroes to be 8 characters long.
This commit adds the necessary formatting.
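
The formatting itself boils down to this - the function name is
illustrative:

    def format_sequential_number(sequential_number):
        # e.g. 42 -> '0000002a': hex, zero-padded to 8 characters
        return format(sequential_number, 'x').zfill(8)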
previously we made some incorrect assumptions about set-up on staging
and prod - they currently don't have any cbc_proxy aws creds at all.
We shouldn't be attempting canaries or link tests when there's no AWS
infrastructure to connect to.
We also shouldn't bother writing a row into the database at all for the
broadcast_provider_message since we're not even attempting to send, and
we shouldn't get confused between messages that failed and messages we
never wanted to send at all.
this is a pretty big and convoluted refactor unfortunately.
Previously:
There was one global `cbc_proxy_client` object in the app. This class
had the information about how to invoke the bt-ee lambda, and handled
all calls to lambda. This included calls to the canary too (which is a
separate lambda).
The future:
There's one global `cbc_proxy_client`. This knows about the different
provider functions and lambdas, and you'll need to ask this client for a
proxy for your chosen provider. Call `cbc_proxy_client.get_proxy('ee')`
and it'll return you a proxy that knows what ee's lambda function is,
how to transform any content in a way that is exclusive to ee, and in
future how to parse any response from ee.
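
So, roughly - the provider mapping and constructor here are
illustrative, and only 'ee' is live at this point:

    import boto3

    class CBCProxyEE:
        def __init__(self, lambda_client):
            self._lambda_client = lambda_client

    class CBCProxyClient:
        def init_app(self, app):
            self._lambda_client = boto3.client('lambda', region_name='eu-west-2')

        def get_proxy(self, provider):
            proxy_classes = {
                'ee': CBCProxyEE,
                # more providers get added here as they're set up
            }
            return proxy_classes[provider](self._lambda_client)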
The present:
I also cleaned up some duplicate tests.
I'm really not sure about the names of some of these variables - in
particular `cbc_proxy_client` isn't a client - it's more of a Java-style
factory, where you call a function on it to get the client of your
choice.
replacing get_earlier_provider_messages. The old function returned the
previous references for earlier events for a broadcast_message. However,
these depend on the message sent to a specific provider, so the function
needs to change. It now takes in a provider, and only returns
broadcast_provider_messages sent to that provider. If there are earlier
broadcast_events without a provider_message for the chosen provider, it
raises an exception - you cannot cancel a message if all the previous
events have not been created properly (as we wouldn't know what
references to cancel).
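
A sketch of the new behaviour - the model and relationship names are
assumptions:

    class BroadcastIntegrityError(Exception):
        pass

    def get_earlier_provider_messages(broadcast_event, provider):
        earlier_events = [
            event
            for event in broadcast_event.broadcast_message.events
            if event.sent_at < broadcast_event.sent_at
        ]
        provider_messages = [
            event.get_provider_message(provider) for event in earlier_events
        ]
        if any(message is None for message in provider_messages):
            # can't cancel if we don't know all the earlier references
            raise BroadcastIntegrityError(
                f'an earlier event has no provider_message for {provider}'
            )
        return provider_messages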
at the moment only EE is enabled (this is set in app.config, but also,
only EE have a function defined for them so even if another provider was
enabled without changing the dict in cbc_proxy.py, we won't trigger
anything). this commit just adds wrapper tasks that check which
providers are enabled and invoke the send function for each provider.
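
A sketch of the wrapper task - the config key and task names are
assumptions, and queue selection is omitted:

    from celery import Celery
    from flask import current_app

    notify_celery = Celery()

    @notify_celery.task(name='send-broadcast-provider-message')
    def send_broadcast_provider_message(broadcast_event_id, provider):
        ...  # invokes that provider's lambda

    @notify_celery.task(name='send-broadcast-event')
    def send_broadcast_event(broadcast_event_id):
        for provider in current_app.config['ENABLED_CBCS']:
            send_broadcast_provider_message.apply_async(
                kwargs={
                    'broadcast_event_id': broadcast_event_id,
                    'provider': provider,
                },
            )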
The send function doesn't currently distinguish between providers, as
we only have EE set up. In the future we'll want to separate the
cbc_proxy_client into separate clients for separate providers.
Different providers have different lambda functions, and have different
requirements. For example, we know that the two different CBC software
solutions handle references to previous messages differently.
moved the lambda invocation to a separate function to keep things DRY
asserts on exception types need to be outside of with blocks, or they
won't trip (as the exception will stop execution of the inner with
block). the asserts were also the wrong way round so fixed that.
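
The pattern looks like this - the exception type, fixture and message
are illustrative:

    import pytest

    class CBCProxyRetryableException(Exception):
        pass

    def test_send_broadcast_raises_on_failure(client):
        with pytest.raises(CBCProxyRetryableException) as exc_info:
            client.send_broadcast(payload=b'{}')

        # this assert has to sit outside the with block - the exception
        # stops execution inside the block, so it would never run there
        assert str(exc_info.value) == 'both lambdas failed'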