Commit Graph

4341 Commits

Author SHA1 Message Date
Leo Hemsted
fd335e3d8b move available provider logic to the service model
make sure it's in an accessible place so we don't end up duplicating our
work
2020-12-03 22:50:50 +00:00
Leo Hemsted
72f8a15d4f respect service broadcast provider restrictions when sending 2020-12-03 13:39:09 +00:00
Leo Hemsted
0ef063ab14 return allowed_broadcast_provider via get by service id 2020-12-03 12:38:31 +00:00
Leo Hemsted
0bbd00d2a5 return service restrictions from the service endpoint 2020-12-03 12:38:04 +00:00
Leo Hemsted
1a083134fa add service broadcast provider restriction table
some services only send to one provider. This is a platform admin
setting to allow us to test integrations and providers manually without
affecting other broadcasts from different services.

one-to-one - a service can either send to all as normal, or send to only
one provider.
2020-12-03 12:38:04 +00:00
Chris Hill-Scott
682cbc5130 Don’t return jobs sent from contact lists
Now that we’re grouping jobs sent from contact lists within their
parent, they shouldn’t also be listed on the jobs page at the top level.

The jobs page uses the uploads API, not the jobs API, so this commit
makes sure that filtering is happening in the proper place.
2020-12-01 15:26:36 +00:00
Chris Hill-Scott
10e1fe6902 Revert "Don’t return jobs sent from contact lists"
This reverts commit 061c0a0050.
2020-12-01 15:18:32 +00:00
Chris Hill-Scott
5bc8d8609b Merge pull request #2842 from alphagov/dont-return-jobs-from-contact-list
Don’t return jobs sent from contact lists
2020-12-01 14:48:04 +00:00
Chris Hill-Scott
061c0a0050 Don’t return jobs sent from contact lists
Now that we’re grouping jobs sent from contact lists within their
parent, they shouldn’t also be listed on the jobs page at the top level.
2020-12-01 11:56:34 +00:00
David McDonald
bb6e671bd1 Fix comparison of uuid to string
`service.id` is a uuid so will not be matched to anything in
`current_app.config.get('HIGH_VOLUME_SERVICE')` because that is a list
of strings.

This is why we are never falling into the first if statement and having
any metrics for high volume services on our dashboards at the moment.

Note, I had taken the existing line from the `post_notification`
endpoint, but that is using a serialised service which already has the
UUID converted to a string.
2020-12-01 11:16:15 +00:00
David McDonald
988c3a335a Remove old metric for delivery sending times
We no longer need a metric that covers both test and live keys as this
is not useful
2020-11-30 15:12:13 +00:00
David McDonald
b1336c97a4 Tweak sending time metrics to only include live notifications
Changes the high volume and not high volume metrics to both only include
non test notifications. This is because when looking at the grafana
metrics, it was impossible to tell what affect the high volume/non high
volume effect was having vs the test/live notification effect.

This leaves us with no break down of high volume/not high volume sending
times for test notifications but I don't think we really need that.
2020-11-30 15:05:40 +00:00
David McDonald
2110bc3eef Break down how long it takes to send a notification
We currently measure the sending time for all. This commit then breaks
it down into
- test keys and non test keys
- high volume services and non high volume services

Breaking it down into test keys and non test keys is important because
we don't care as much about sending test notifications within 10
seconds, only non test keys so we don't want our graphs to reflect poor
performance if it's just test keys affecting this

Breaking it down into high volume and non high volume will allow us to
easily debug issues with slow sending if they are high volume or non
high volume issues
2020-11-30 12:07:32 +00:00
Leo Hemsted
e2fa0116a0 add CBC_PROXY_ENABLED config flag to control if tasks are triggered
previously we made some incorrect assumptions about set-up on staging
and prod - they currently don't have any cbc_proxy aws creds at all.

We shoudn't be attempting canaries or link tests when there's no AWS
infrastructure to connect to.

We also shouldn't bother writing a row into the database at all for the
broadcast_provider_message since we're not even attempting to send, and
we shouldn't get confused between messages that failed and messages we
never wanted to send at all.
2020-11-26 10:16:22 +00:00
Leo Hemsted
54fecf2182 Merge pull request #3035 from alphagov/broadcast-event-response
Send broadcast events per provider
2020-11-25 10:16:30 +00:00
David McDonald
09759e1fe4 Revert "Turn on SMS and email stubs on staging" 2020-11-24 12:12:53 +00:00
Pea Tyczynska
d57b99e307 Turn on SMS and email stubs on staging
This is done because we will be load testing on staging.
2020-11-23 11:49:16 +00:00
David McDonald
43f1f48093 Add notification ID to SES bounce reason
At the moment we log everytime we get a bounce from SES, however we
don't link it to a particular notification so it's hard to know for what
sub reason a notifcation did not deliver by looking at the logs.

This commit changes this by now looking the bounce reason after we have
found the notification ID and including them together. So if you know
search for a notification ID in Kibana, you will see full logs for why
it failed to deliver.
2020-11-20 14:10:13 +00:00
Leo Hemsted
087cc5053d separate cbc proxy into separate clients
this is a pretty big and convoluted refactor unfortunately.

Previously:

There was one global `cbc_proxy_client` object in apps. This class has
the information about how to invoke the bt-ee lambda, and handles all
calls to lambda. This includes calls to the canary too (which is a
separate lambda).

The future:

There's one global `cbc_proxy_client`. This knows about the different
provider functions and lambdas, and you'll need to ask this client for a
proxy for your chosen provider. call cbc_proxy_client.get_proxy('ee')`
and it'll return you a proxy that knows what ee's lambda function is,
how to transform any content in a way that is exclusive to ee, and in
future how to parse any response from ee.

The present:

I also cleaned up some duplicate tests.
I'm really not sure about the names of some of these variables - in
particular `cbc_proxy_client` isn't a client - it's more of a java style
factory, where you call a function on it to get the client of your
choice.
2020-11-19 15:50:37 +00:00
Leo Hemsted
0257774cfa add get_earlier_provider_message fn to broadcast_event
replacing get_earlier_provider_messages. The old function returned the
previous references for earlier events for a broadcast_message. However,
these depend on the message sent to a specific provider, so the function
needs to change. It now takes in a provider, and only returns
broadcast_provider_messages sent to that provider. If there are earlier
broadcast_events without a provider_message for the chosen provider, it
raises an exception - you cannot cancel a message if all the previous
events have not been created properly (as we wouldn't know what
references to cancel).
2020-11-19 15:50:37 +00:00
Leo Hemsted
f12c949ae9 create broadcast_provider_message and use id from that instead
(instead of using the id from broadcast_event)

we need every XML blob we send to have a different ID. if we're sending
different XML blobs for each provider, then each one should have a
different identifier. So, instead of taking the identifier from the
broadcast_event, take it from the broadcast_provider_message instead.

Note: We're still going to the broadcast_event for most fields, to
ensure they stay consistent between different providers. The last thing
we want is for different phone networks to get different content
2020-11-19 15:50:37 +00:00
Leo Hemsted
7cc83e04eb move BroadcastProvider from models.py to config.py
It's not something that is tied to a database table, and was causing
circular import issues
2020-11-19 15:50:37 +00:00
Leo Hemsted
bc3512467b send messages to multiple providers
at the moment only EE is enabled (this is set in app.config, but also,
only EE have a function defined for them so even if another provider was
enabled without changing the dict in cbc_proxy.py we won't trigger
anything). this commit just adds wrapper tasks that check what providers
are enabled, and invokes the send function for each provider.

The send function doesn't currently distinguish between providers for
now - as we only have EE set up. in the future we'll want to separate
the cbc_proxy_client into separate clients for separate providers.
Different providers have different lambda functions, and have different
requirements. For example, we know that the two different CBC software
solutions handle references to previous messages differently.
2020-11-19 15:50:37 +00:00
Leo Hemsted
2e665de46d add broadcast provider message table to DB
we need to track the state of sending to different provider separately
(and trigger them off separately, refer to references separately, etc)
2020-11-19 15:50:37 +00:00
Leo Hemsted
3aa602bd6b Merge pull request #3036 from alphagov/cbc-proxy-refactor
Cbc proxy refactor
2020-11-19 15:50:02 +00:00
Pea Tyczynska
60bd9a6f82 Give providers equal shares of traffic
This is done on a temporary basis for billing-related reasons.
2020-11-19 10:28:42 +00:00
David McDonald
13450c8429 Merge pull request #3032 from alphagov/4xx-callbacks
Log when we don't retry a callback
2020-11-18 12:25:38 +00:00
Leo Hemsted
b72640bf5e refactor cbc proxy and fix tests
moved the lambda invocation to a separate function to keep DRY

asserts on exception types need to be outside of with blocks, or they
won't trip (as the exception will stop execution of the inner with
block). the asserts were also the wrong way round so fixed that.
2020-11-17 13:35:04 +00:00
Leo Hemsted
732c203d3e rename clients to notification_provider_clients
i think it's causing havoc with my attempts to mock stuff in the
`app.clients` directory because it's also accessible at that path. the
name's super vague and doesn't explain what it is anyway
2020-11-17 13:34:58 +00:00
David McDonald
224d9bf35a Log when we don't retry a callback
We don't retry any callbacks when it receives a 4xx status. We should
probably be aware of this happening and at the moment there is nothing
in our logs to easily identify whether the request failed and is being
retried or if it failed and is not being retried. This will enable us to
search our logs easily and figure out how much it's happening.

It's quite likely that we should in the future allow callbacks to retry
if they get a 429 http response (rate limiting) but we should do this in
a smart way (exponential backoff) and so this is a first step to being
aware of how big a problem it is in case we want to do something about
it.
2020-11-17 11:26:32 +00:00
Rebecca Law
171bc74c69 Rename check_character_count method to check_is_message_to_long.
Add different error message for email and text if content is too long.
Use utils version with is_message_too_long method implemented for email templates.
2020-11-09 16:06:57 +00:00
Rebecca Law
5bacfc1df9 Change how we validate the length of templates.
We want to add validation for an email that's too long, that way the user knows why the message is failing. At the moment if an email is too long it will get a technical failure, after the retries fail. This way the email post will get a validation error.

Once this: https://github.com/alphagov/notifications-utils/pull/804 is reverted, we can update the utils version.
2020-11-09 15:54:39 +00:00
Toby Lorne
31f845bbff Revert "Specify sslmode in Cloud Foundry environment variables" 2020-11-09 12:04:40 +00:00
Toby Lorne
cfcc3128c2 db: specify sslmode in Cloud Foundry env
Refer to
https://www.postgresql.org/docs/11/libpq-connect.html#LIBPQ-CONNECT-SSLMODE

GOV.UK PaaS gives us the database URI, and we use the default mode of
postgres auth which prefers a TLS connection instead of a plain TCP
connection

We are now specifying the SSL mode in the URI when establishing our
connection to the database, so that:

* We will not connect to the database via a plaintext connection
* We will verify the database connection against a list of trusted CAs

The RDS CA from which the database's certificate is issued is added into
the Cloud Foundry app container via
925681f19b/manifests/cf-manifest/operations.d/350-diego-cell.yml (L17-L22)

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: David <david.mcdonald@digital.cabinet-office.gov.uk>
2020-11-09 10:47:44 +00:00
Toby Lorne
00a1ba4b41 celery: link test less often
This is causing the disk of the CBCs to fill up quickly, and their
logrotate seems a bit flakey

Reducing the rate will ensure the disks fill up less often

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
2020-11-06 13:37:39 +00:00
Katie Smith
c4075f1fc0 Revert "Tailor message-too-long error message depending on the notification type" 2020-11-03 10:55:15 +00:00
Rebecca Law
184db4fa5d Merge pull request #3024 from alphagov/revert-3020-revert-3000-add-save-sms-api-task
Add a task to save-api-sms for high volume services.
2020-11-02 12:36:19 +00:00
Pea Tyczynska
41d1cf453d Update limit to 1MB and update tests
SES rejects email messages bigger than 10485760 bytes (just over 10 MB per message (after base64 encoding)):
https://docs.aws.amazon.com/ses/latest/DeveloperGuide/quotas.html#limits-message

Base64 is apparently wasteful because we use just 64 different values per byte, whereas a byte can represent
256 different characters. That is, we use bytes (which are 8-bit words) as 6-bit words. There is
a waste of 2 bits for each 8 bits of transmission data. To send three bytes of information
(3 times 8 is 24 bits), you need to use four bytes (4 times 6 is again 24 bits). Thus the base64 version
of a file is 4/3 larger than it might be. So we use 33% more storage than we could.
https://lemire.me/blog/2019/01/30/what-is-the-space-overhead-of-base64-encoding/

That brings down our max safe size to 7.5 MB == 7500000 bytes before base64 encoding

But this is not the end! The message we send to SES is structured as follows:
"Message": {
    'Subject': {
        'Data': subject,
    },
    'Body': {'Text': {'Data': body}, 'Html': {'Data': html_body}}
},
Which means that we are sending the contents of email message twice in one request: once in plain text
and once with html tags. That means our plain text content needs to be much shorter to make sure we
fit within the limit, especially since HTML body can be much byte-heavier than plain text body.

Hence, we decided to put the limit at 1MB, which is equivalent of between 250 and 500 pages of text.
That's still an extremely long email, and should be sufficient for all normal use, while at the same
time giving us safe margin while sending the emails through Amazon SES.
2020-10-29 14:07:49 +00:00
Pea Tyczynska
9708b09ba3 Tailor message-too-long error message
depending on the notification type.

Up until now, only sms messages could get message-too-long error,
but now we also need to validate the size of email messages, so
the message content needs to be tailored to the notification type.
2020-10-29 14:07:48 +00:00
Rebecca Law
29b6f84f6c Revert "Revert "Add a task to save-api-sms for high volume services."" 2020-10-29 11:12:46 +00:00
Toby Lorne
dd012d6831 client: cbc_proxy passes through sent/expires
A BroadcastEvent knows when an event was sent and should expire

We pass through these values directly to the CBC Proxy, because
BroadcastEvent knows how they should be formatted

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
2020-10-28 11:37:06 +00:00
Toby Lorne
dda71bf685 celery: add task for triggering link tests
We want to periodically kick off some link tests, so that:

- we are periodically communicating with the CBC Proxies
- each CBC Proxy is periodically communicating with its CBC

This simulation of traffic to the CBC will give us advance warning if
something is going to break, or is broken, before someone tries to send
a real live message

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>
Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>
2020-10-27 15:39:28 +00:00
Toby Lorne
7542709455 clients: cbc_proxy sends message_type
When we ask the CBC Proxy to send a message, we should specify that we
want to send a real message, when we want a real message

We will do this by specifying the message_type which can have 4 types, 3
of which represent a real message:

| Name   | Effect                   |
| ------ | ------------------------ |
| alert  | Create an alert          |
| update | Update an existing alert |
| cancel | Cancel an existing alert |
| test   | Send a link test         |

We will use message_type to represent the table above

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>
Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>
2020-10-27 15:24:02 +00:00
Chris Hill-Scott
5459d96c58 Merge pull request #3016 from alphagov/use-utils-to-format-broadcast-content
Use utils template to format broadcast content
2020-10-27 12:36:26 +00:00
Chris Hill-Scott
28f01bd776 Use utils template to format broadcast content
Brings in:
- [x] https://github.com/alphagov/notifications-utils/pull/801

Formats the content of the template at the time of creating the event.
This means that any downstream code (eg Celery tasks) can assume the
content is already formatted correctly.

Also, these downstream tasks don’t  know which template the broadcast
was created from, so if we support personalisation in the future this is
the most sensible place to bring together the template and the
personalisation.

---

I had to re-create some of the deleted code from utils for stuff like
formatting the timestamp to the CAP standard.
2020-10-27 10:55:26 +00:00
Pea Tyczynska
5cff30aa47 Send a log message informing that we are sending canary to CBC proxy 2020-10-27 10:45:32 +00:00
Toby Lorne
be90455944 Add task to send canary to cbc proxy
Create and schedule a Celery task that tests if we
can send a canary message to cbc proxy.

This will help us know if something happens to our
connection to cbc proxy.

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>
Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>
2020-10-27 10:38:09 +00:00
Toby Lorne
052de84c9e clients: cbc_proxy client has canary method
The CBC Proxy is essentially a lambda function which we invoke with
various arguments.

A way in which this can fail is that the notifications-api app invoking
the function may not be able, any longer, to invoke the function.

This could be caused by, for example:
* an egress restriction preventing access to eu-west-2.lambda.amazonaws.com
* a network partition preventing access to eu-west-2.lambda.amazonaws.com
* the app's credentials have been rotated or revoked

If we invoke a simple "canary" lambda function for which the app should
have access to invoke, and check it for failures, we will know quickly
if something is likely to be broken.

This is especially important for cell broadcasts compared to email/SMS
because we always have a baseline of traffic for email/SMS, and so any
failure is observed almost immediately. This is not true for CB where we
may expect to only see one CB message every week/month/quarter/year, as
opposed to every minute or second for email/SMS.

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>
2020-10-26 17:14:08 +00:00
Toby Lorne
d22adb7813 broadcasts: do not send description to cbc_proxy
"areas" and "simple_polygons" in "transmitted_areas" do not have the
same length

as an example, choosing the area "england" results in a single item in
"areas" but many polygons in "simple_polygons"

therefore zipping these two together gives a list of areas:
* of length 1
* containing only new grimsby

which is not what we want

as the CBC does not care about the areaDesc field within CAP, we should
omit it from the function invocation and delegate the contents of
areaDesc to the CBC Proxy implementation

Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>
Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>
Co-authored-by: David <david.mcdonald@digital.cabinet-office.gov.uk>
2020-10-26 15:16:33 +00:00
Leo Hemsted
902a2dde94 Merge pull request #3014 from alphagov/get-query-in-50k-batches
Optimise dao_get_letters_to_be_printed query
2020-10-26 13:23:17 +00:00