previously we made some incorrect assumptions about set-up on staging
and prod - they currently don't have any cbc_proxy aws creds at all.
We shoudn't be attempting canaries or link tests when there's no AWS
infrastructure to connect to.
We also shouldn't bother writing a row into the database at all for the
broadcast_provider_message since we're not even attempting to send, and
we shouldn't get confused between messages that failed and messages we
never wanted to send at all.
At the moment we log everytime we get a bounce from SES, however we
don't link it to a particular notification so it's hard to know for what
sub reason a notifcation did not deliver by looking at the logs.
This commit changes this by now looking the bounce reason after we have
found the notification ID and including them together. So if you know
search for a notification ID in Kibana, you will see full logs for why
it failed to deliver.
this is a pretty big and convoluted refactor unfortunately.
Previously:
There was one global `cbc_proxy_client` object in apps. This class has
the information about how to invoke the bt-ee lambda, and handles all
calls to lambda. This includes calls to the canary too (which is a
separate lambda).
The future:
There's one global `cbc_proxy_client`. This knows about the different
provider functions and lambdas, and you'll need to ask this client for a
proxy for your chosen provider. call cbc_proxy_client.get_proxy('ee')`
and it'll return you a proxy that knows what ee's lambda function is,
how to transform any content in a way that is exclusive to ee, and in
future how to parse any response from ee.
The present:
I also cleaned up some duplicate tests.
I'm really not sure about the names of some of these variables - in
particular `cbc_proxy_client` isn't a client - it's more of a java style
factory, where you call a function on it to get the client of your
choice.
replacing get_earlier_provider_messages. The old function returned the
previous references for earlier events for a broadcast_message. However,
these depend on the message sent to a specific provider, so the function
needs to change. It now takes in a provider, and only returns
broadcast_provider_messages sent to that provider. If there are earlier
broadcast_events without a provider_message for the chosen provider, it
raises an exception - you cannot cancel a message if all the previous
events have not been created properly (as we wouldn't know what
references to cancel).
(instead of using the id from broadcast_event)
we need every XML blob we send to have a different ID. if we're sending
different XML blobs for each provider, then each one should have a
different identifier. So, instead of taking the identifier from the
broadcast_event, take it from the broadcast_provider_message instead.
Note: We're still going to the broadcast_event for most fields, to
ensure they stay consistent between different providers. The last thing
we want is for different phone networks to get different content
at the moment only EE is enabled (this is set in app.config, but also,
only EE have a function defined for them so even if another provider was
enabled without changing the dict in cbc_proxy.py we won't trigger
anything). this commit just adds wrapper tasks that check what providers
are enabled, and invokes the send function for each provider.
The send function doesn't currently distinguish between providers for
now - as we only have EE set up. in the future we'll want to separate
the cbc_proxy_client into separate clients for separate providers.
Different providers have different lambda functions, and have different
requirements. For example, we know that the two different CBC software
solutions handle references to previous messages differently.
We don't retry any callbacks when it receives a 4xx status. We should
probably be aware of this happening and at the moment there is nothing
in our logs to easily identify whether the request failed and is being
retried or if it failed and is not being retried. This will enable us to
search our logs easily and figure out how much it's happening.
It's quite likely that we should in the future allow callbacks to retry
if they get a 429 http response (rate limiting) but we should do this in
a smart way (exponential backoff) and so this is a first step to being
aware of how big a problem it is in case we want to do something about
it.
A BroadcastEvent knows when an event was sent and should expire
We pass through these values directly to the CBC Proxy, because
BroadcastEvent knows how they should be formatted
Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
"areas" and "simple_polygons" in "transmitted_areas" do not have the
same length
as an example, choosing the area "england" results in a single item in
"areas" but many polygons in "simple_polygons"
therefore zipping these two together gives a list of areas:
* of length 1
* containing only new grimsby
which is not what we want
as the CBC does not care about the areaDesc field within CAP, we should
omit it from the function invocation and delegate the contents of
areaDesc to the CBC Proxy implementation
Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>
Co-authored-by: David <david.mcdonald@digital.cabinet-office.gov.uk>
We are phasing out our cbc-proxy stub which displayed CAP XML messages
We are in the process of testing with real CBCs, so maintaining our own
stub is not useful
This commit
* removes the HTTP POST requests to the CBC proxy
* writes up the update/cancel methods of the cbc_client (not impl)
Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
previously we were returning the entire ORM object. Returning columns
has a couple of benefits:
* Means we can join on to services there and then, avoiding second
queries to get the crown status of the service later in the collate
flow.
* Massively reduces the amount of data we return - particularly free
text fields like personalisation that could be potentially quite big.
5 columns rather than 26 columns.
* Minor thing, but will skip some CPU cycles as sqlalchemy will no
longer construct an ORM object and try and keep track of changes. We
know this function doesn't change any of the values to persist them
back, so this is an unnecessary step from sqlalchemy.
Disadvantages are:
* The dao_get_letters_to_be_printed return interface is now much more
tightly coupled to the get_key_and_size_of_letters_to_be_sent_to_print
function that calls it.
we had issues where we had 150k 2nd class notifications, and the collate
task never ran properly, presumably because the volume of data being
returned was too big.
to try and help with this, we can switch to streaming rather than using
`.all` and building up lists of data. This should help, though the
initial query may be a problem still.
This is at request of DVLA. They would prefer to have zip files with the
same number of arguments in the name. After being offered a few
different options, such as including an org and service id for all zips,
they chose to just remove the 'INSOLVENCY' tag.
For more context see PR that added the tag
https://github.com/alphagov/notifications-api/pull/3006
When letters are sent to DVLA, we will now put them in a separate
ZIP file for each service, so that if there are printing issues
due to bad files from one service, other services will hopefully
not be affected by that.
When we create a broadcast message, we should invoke the cbc proxy to
send a cap message
Either a function will be invoked within AWS, or a noop function call
is made, depending on the environment
We have only implemented CB message creation in the CBC Proxy, without
polygons, therefore we:
* only invoke the CBC Proxy during message creation
* only send description, identifier, and hard-coded headline
Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>
Co-authored-by: Katie <katie.smith@digital.cabinet-office.gov.uk>
For every missing row this was:
- downloading the CSV file from S3
- looping through every row in it until it found the one matching the
index of the missing row
`RecipientCSV` implements `__getitem__`[1] (which maybe it didn’t
before) so we can create it once, then index the relevant row directly.
***
1. 5ae0572d41/notifications_utils/recipients.py (L78-L79)
we don't name letters based on the day we send them on, rather, the day
we create them on. If we process a letter for a second time for whatever
reason, even if it's a couple of days later, it'll still go in a folder
based on the created_at timestamp. There's still a slight confusion,
however - if the timestamp is after 5:30pm, the folder will be for the
day after. However, still the day after creation, so I think created_at
still makes the most sense.
Remove the term `sending_date` to try and make this relationship more
apparent.
`_now`? why would we ever use a different _now? instead say created_at,
because that's what it'll always be set to, even if we're replaying old
letters. We always set the folder name to when the letter was
created_at, or we might not know where to look to find it.
`dont_use_sending_date` doesn't really tell us what might happen if we
don't use it - the answer is we return an empty string. we ignore the
folder entirely. so lets call it that.
Also, remove use of freeze_gun in the tests, to prove that we don't use
the current time in any calculations. Also add an assert to a mock in
the get_pdf_for_templated_letter test, because we were mocking but not
asserting before, so the tests didn't fail when the function signature
changed.
We were determing the filename for precompiled letters before we had
checked if the letters were international. This meant that a letter
could have a filename indicating it was 2nd class, but once we had
sanitised the letter and checked the address we were setting the
notification to international.
This stopped these letters from being picked up to be sent to the DVLA,
since the filename and postage of the letter did not match.
We now regenerate the filename after the letter has been sanitised (and when
we know the postage) and use the updated filename when moving the letter
into the live PDF letters bucket.
use the new endpoint from cbc proxy. create a new task that just
serializes the event and sends it across rather than sending a template
and the broadcast message.
some changes to serialize to make it json friendly etc. it also expects
sent_at and transmitted_finishes_at to always be set (we set them in the
code but don't enforce it n the DB right now), as they're required by
utils template. not sure whether we'll update db constraints to be more
strict or utils template to be more permissive just yet, wait until we
find out more about the requirements of the CBCs we integrate with.
We have hit throttling limits from SES approximately once a week during
a spike of traffic from GOV.UK. The rate limiting usually only lasts a
couple of minutes but generates enough exceptions to cause a p1 but with
no potential action for the responder.
Therefore we downgrade the warning for this case to a warning and assume
traffic will level back out such that the problem resolves itself.
Note, we will still get exceptions if we go over our daily limit, rather
than our per minute sending limit, which does require immediate action
by someone responding.
If we were to continually go over our per second sending rate for a long
continous period of time, then there is a chance we may not be aware but
given the risk of this happening is low I think it's an acceptable risk
for the moment.
The task was raising a JobIncompleteError, yet it's not an error the task is performing it's task correctly and calling the appropriate task to restart the job.
Also used apply_sync to create the task instead of send_task.
This will help us better understand how far through the task has got if
it gets interrupted halfway (as was the case this morning and we
struggled to understand).
solves `AttributeError: 'DummySession' object has no attribute 'query'`
if you don't do this you get really hard to diagnose errors in unrelated
tests, due to strange import order problems or something
task takes a brodcast_message_id, and makes a post to the cbc-proxy
for now, hardcode the url to the notify stub. the stub requires template
as the admin/api get it, so use the marshmallow schema to json dump it.
Note - this also required us to tweak the BroadcastMessage.serialize
function so that it converts uuids in to ids - flask's jsonify function
does that for free but requests.post doesn't sadly.
if the request fails (either 4xx or 5xx) just raise an exception and let
it bubble up for now - in the future we'll add retry logic