Commit Graph

4522 Commits

Author SHA1 Message Date
Pea Tyczynska
882da84182 Merge pull request #3096 from alphagov/add-notes-to-service
Add notes column to services table
2021-01-19 14:45:42 +00:00
David McDonald
c20fb8abce Remove duplicate keys 2021-01-19 14:01:51 +00:00
David McDonald
c1a77eefa1 Sort exclude list
This will make it easier to review and compare changes and also will
help identify duplicates.
2021-01-19 14:00:03 +00:00
Chris Hill-Scott
a496ef0a97 Merge pull request #2986 from alphagov/cache-on-tasks
Use cache to improve performance of CSV processing
2021-01-19 11:15:18 +00:00
David McDonald
ac6837cde5 Downgrade exception to warning for provider API call
When we send an HTTP request to our SMS providers, there is a
chance we get a 5xx status code back from them. Currently we log this as
two different exception level logs.

If a provider has a funny few minutes, we could end up with
hundreds of exceptions thrown and pagerduty waking someone up in the
middle of the night. These problems tend to pretty quickly fix
themselves as we balance traffic from one SMS to the other SMS provider
within 5 minutes.

By downgrading both exceptions to warning in the case of a
`SmsClientResponseException`, we will reduce the chance of waking us up
in the middle of the night for no reason.

If the error is not a `SmsClientResponseException`, then we will still
log at the exception level as before as this is more unexpected and we
may want to be alerted sooner.

We still want to find out if, say, both SMS providers went down at the
same time for 1 hour. We don't want our tasks to just sit there,
retrying every 5 minutes for the whole time without us being aware (so
we can at least raise a statuspage update). Luckily we will still be
alerted because our smoke tests will fail after 10 minutes and raise a
p1:
https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21
2021-01-18 17:00:21 +00:00
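The log-level split described in this commit can be sketched roughly as below. This is a minimal illustration, not the repository's code: the logger name, the `log_provider_failure` helper and the stand-in exception class are assumptions made for the example.

```python
import logging

logger = logging.getLogger("sms_delivery_sketch")  # hypothetical logger name


class SmsClientResponseException(Exception):
    """Stand-in for the provider error named in the commit message."""


def log_provider_failure(exc, notification_id):
    """Log an expected provider error quietly and anything else loudly.

    Returns the name of the level used so the behaviour is easy to check.
    """
    if isinstance(exc, SmsClientResponseException):
        # Expected when a provider returns a 5xx: a warning does not page anyone.
        logger.warning("SMS notification %s failed: %s", notification_id, exc)
        return "warning"
    # Anything else is unexpected: log at error level with the traceback,
    # which is what logger.exception does inside an except block.
    logger.error("SMS notification %s failed", notification_id, exc_info=exc)
    return "exception"
```

The task would still retry in both cases; only the severity of the log line changes.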
Pea Tyczynska
22f2eb7bfe Add notes column to services table 2021-01-18 10:36:51 +00:00
Chris Hill-Scott
4eb4ea1772 Use cache for tasks that save notifications
These tasks need to repeatedly get the same template and service from
the database. We should be able to improve their performance by getting
the template and service from the cache instead, like we do in the REST
endpoint code.
2021-01-18 10:25:24 +00:00
David McDonald
b9ec70acc2 Fix incorrect log line
Should have been `lambda_name` not `self.lambda_name`.
2021-01-15 16:48:04 +00:00
David McDonald
9c01d8018d Merge pull request #3093 from alphagov/broadcast-tasks-onto-worker
Broadcast tasks onto worker
2021-01-15 15:51:35 +00:00
Chris Hill-Scott
e161f6e4a1 Require reference if template not provided
In the admin app we need something to show in lieu of template
name when a template isn’t used. Let’s store this in the reference field
for now.
2021-01-15 14:57:36 +00:00
Chris Hill-Scott
0510311d63 Don’t require template when content is provided
So that the admin app can create broadcasts without a template it needs
to be allowed to create broadcasts from content instead.
2021-01-15 14:57:36 +00:00
Chris Hill-Scott
01504ed760 Merge pull request #3090 from alphagov/migration-allow-broadcasts-created-by-api
Allow broadcast messages to be created with API key
2021-01-15 14:05:11 +00:00
Chris Hill-Scott
78e87857e3 Don’t serialize nullable UUID columns to 'None'
We should return a proper `None` instead, so it gets JSONified as `null`
and returns what you’d expect when doing `bool(model.field)`
2021-01-15 13:15:00 +00:00
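A minimal sketch of the problem this commit describes, using two hypothetical helper functions rather than the real model serialisation code: calling `str()` on a missing UUID produces the truthy string `'None'`.

```python
import json
import uuid


def serialize_uuid_naive(value):
    # The behaviour being fixed: str(None) is the truthy string 'None'.
    return str(value)


def serialize_uuid(value):
    # Keep None as None so it becomes JSON null and stays falsy.
    return str(value) if value is not None else None


example_id = uuid.UUID(int=1)
payload = json.dumps({"set": serialize_uuid(example_id), "unset": serialize_uuid(None)})
```

With the second helper, `bool()` on the unset field is `False`, which matches what a caller would expect.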
Chris Hill-Scott
fe420448e1 Allow broadcast messages to be created with API key
When we have a public API there will be no human creating the broadcast
message. Instead it will be created by an API integration, authenticated
by a key (just like for emails or texts).

This updates the database to:
- add a new foreign key from BroadcastMessages to API keys
- add a `reference` column

It doesn’t change the model yet, because the model is used by previous
migrations, so would cause them to fail when run before the new columns
exist. We can make this change in later pull requests.
2021-01-15 12:52:24 +00:00
David McDonald
060ee54a74 Enable Three CBC 2021-01-14 11:52:23 +00:00
David McDonald
ff193387d1 Add proxy client for Three
Three uses the One 2 Many technology so should work in the same way as
our proxy for EE
2021-01-14 11:44:46 +00:00
David McDonald
f3ee2cdd48 Merge pull request #3086 from alphagov/lambda-errors
Failover to second lambda on error
2021-01-14 11:15:34 +00:00
David McDonald
3216a74fbd Don't try failover for canary
No point trying, it's the same lambda. As `_invoke_lambda` currently
takes a bytes payload, rather than a json payload, we had to choose
between encoding the payload in the canary or moving the encoding into
the `_invoke_lambda` function. We decided to go for the former as the
lesser of two evils. We may end up doing the encoding twice in the case
of a failover but this avoids us having to put the encoding in our code
in several places (for example the canary and also soon to be the link
tests).
2021-01-14 11:00:38 +00:00
Pea Tyczynska
b5a33ded98 Retry with failover lambda for FunctionError and status > 299
For all FunctionErrors, and for invoke errors (status > 299) we
want to retry with failover lambda.

We are doing this, because if there is a connection or other error
with one lambda, the failover lambda may still work and it's
worth trying.

With time, we will probably have a more complex retry flow, depending
on the error and maybe even differing for each MNO (broadcast provider).
2021-01-14 10:45:29 +00:00
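The retry rule in this commit can be sketched as follows. This is an illustration under assumptions: `invoke` is a hypothetical callable standing in for the real call to AWS, and the response is assumed to be a dict with a `StatusCode` key and an optional `FunctionError` key, as in the AWS Invoke response syntax.

```python
def invoke_succeeded(response):
    """True when there is no FunctionError and the status is 299 or below."""
    return "FunctionError" not in response and response["StatusCode"] <= 299


def invoke_with_failover(invoke, lambda_name, failover_name, payload):
    """Try the primary lambda, then the failover lambda if the first call failed.

    `invoke` is a hypothetical callable: (name, payload) -> response dict.
    Returns the name of the lambda that succeeded and its response.
    """
    response = invoke(lambda_name, payload)
    if invoke_succeeded(response):
        return lambda_name, response

    # Primary failed: the failover lambda may still be able to reach the CBC.
    response = invoke(failover_name, payload)
    if not invoke_succeeded(response):
        raise RuntimeError(f"Both {lambda_name} and {failover_name} failed")
    return failover_name, response
```

Keeping the success check in one small function makes it easier to swap in a more detailed retry policy later, for example one that differs per provider.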
David McDonald
20627d96ea Put all broadcast tasks on the broadcast worker 2021-01-13 17:21:40 +00:00
David McDonald
78db0f9c2b Add broadcasts worker and queue
This worker will be responsible for handling all broadcast tasks.

It is based on the internal worker which is currently handling broadcast
tasks.

Concurrency of 2 has been chosen fairly arbitrarily. Gunicorn will be
running 4 worker processes so we will end up with the ability to process
8 tasks per app instance given this.
2021-01-13 16:35:27 +00:00
David McDonald
fb5b05a983 Merge pull request #3089 from alphagov/everyday-2nd-class-alert
Alert on 2nd class letters still in sending every day
2021-01-13 12:16:55 +00:00
Leo Hemsted
5ade5ba13f Merge pull request #3087 from alphagov/migrate-broadcast
add content to old broadcast messages with no content
2021-01-13 11:39:30 +00:00
David McDonald
c3ef23c771 Alert on 2nd class letters still in sending every day
In 8285ef5f89
we turned off alerting on 2nd class letters still being in sending on
certain days of the week because we were only sending letters out on
Mon, Wed, Fri.

Now we have swapped back to sending out 2nd class letters on all
workdays so this change can be reverted. Note, I haven't reverted the
commit exactly but more so the behaviour, whilst leaving in some tests
to explicitly test 2nd class letters for the alert in case we change
this again.
2021-01-13 11:21:27 +00:00
Leo Hemsted
54495b4e14 add content to old broadcast messages with no content
new broadcast messages will have content filled whether they have a
template or not, but old ones won't, so populate them.

Stole the session constructor from 0044_jos_to_notification_hist.py
2021-01-13 10:09:16 +00:00
Pea Tyczynska
1aff854afd Create logs for invoking and finishing lambda, and for retry.
Those logs will give us extra visibility into lambda invocation
process.
2021-01-12 15:34:48 +00:00
David McDonald
24f52721f3 Retry with second lambda if connection error
Note, we assume whenever there is a `FunctionError` that there will be a
payload that contains an `errorMessage` key. This is implied by the docs
but not stated explicitly.

https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html#API_Invoke_ResponseSyntax
2021-01-12 15:34:47 +00:00
David McDonald
9da8e54d69 Define failover lambdas
We will need a lambda to failover to if the first lambda fails. This
isn't so much in case the lambda itself fails, as a lambda is
automatically a cross availability zone resource. It's more in case
something in the networking goes down in our AZ and therefore the lambda
can't call out to the CBC. In this case, we will be able to swap to
using the second AZ by calling the second lambda.
2021-01-12 15:34:47 +00:00
David McDonald
1e537d507b Make lambda_name abstract property
As we require all instances to have it
2021-01-12 15:34:46 +00:00
David McDonald
5a46662c28 Abstract invoking of lambda
This is to prepare us for the case where, when we try to send/cancel a
broadcast, we may need to invoke more than one lambda. This might happen
if we invoke the first lambda, get an error and therefore try to
invoke a failover/second lambda. Then `_invoke_lambda` will be
responsible for the call to AWS whereas `_invoke_lambda_with_failover`
will be responsible more for picking the lambda and deciding on retry
behaviour in failure cases.
2021-01-12 15:34:46 +00:00
Rebecca Law
e05e9bb5e0 Change the sort order for the organisation usage page.
Ensure the archived services are at the bottom of the list. The organisation trial mode page already sorts the archived services to the bottom.
2021-01-12 09:44:35 +00:00
Leo Hemsted
400dfe0217 allow broadcasts to have a template and no content
ensures code remains backwards compatible during the deploy. this commit
should be reverted once all broadcast_message.content fields have been
back-filled.
2021-01-11 15:56:40 +00:00
Leo Hemsted
2e929754ff add content to broadcast_message and make template fields nullable
we want to be able to create broadcast messages without templates. To
start with, these will come from the API, but in future we may want to
let people create via the admin interface without creating a template
too.

populate a non-nullable content field with the values supplied via the
template (or supplied directly if via api).
2021-01-08 18:58:17 +00:00
Chris Hill-Scott
d55b66a6d8 Merge pull request #3075 from alphagov/cache-provider-lookup
Cache provider lookups for 10 seconds
2021-01-04 10:00:25 +00:00
Leo Hemsted
4814c66c1d fix schema metaclasses
marshmallow-sqlalchemy v0.22.0 added load_instance and include_relationships
options, which we need to keep old ModelSchema code working
2020-12-31 14:13:05 +00:00
Leo Hemsted
a33ec5c7f1 remove deprecated ModelSchema class 2020-12-31 13:56:20 +00:00
David McDonald
57f5bd76de Merge pull request #3081 from alphagov/ses-error-logs
SES error logs
2020-12-31 13:13:20 +00:00
Leo Hemsted
d470c928cd Merge pull request #3072 from alphagov/doc-dl-exc
handle doc dl connection errors correctly
2020-12-31 11:24:00 +00:00
David McDonald
56879d0d22 Make sure error message is logged as part of the exception 2020-12-31 11:08:09 +00:00
Chris Hill-Scott
8834377a5d Merge pull request #3074 from alphagov/serialise-process-type
Serialise process_type for template history
2020-12-31 09:54:00 +00:00
Chris Hill-Scott
55afc9a401 Increase provider lookup cache TTL to 10 seconds
Tested locally with TTL values of:
- 2 seconds
- 5 seconds
- 10 seconds

The benefit really started showing at 10 seconds, where >50% of lookups
hit the cache rather than the database.

For graphs see https://github.com/alphagov/notifications-api/pull/3075#issuecomment-750836404
2020-12-31 09:36:55 +00:00
David McDonald
977554781f Add better logging message for tech failure
So we can easily identify which notification ID failed
2020-12-30 17:28:21 +00:00
David McDonald
2480f91667 Raise better exception on InvalidParameterValue error
There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files
is if we allow an email address on our side that SES rejects.

However, there are other types of errors that could cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError` as that is not appropriate.

Therefore, we introduce a new exception
`EmailClientNonRetryableException`, that represents any exception back
from an email client that we can use whenever we get a
`InvalidParameterValue` error.

Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` as our code needs to catch this
exception and it shouldn't be aware of what email client is being used,
it just needs to know that it came from one of the email clients (if in
time we have more than one).

In time, we may wish to extend the approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions as this should be
the most extendable pattern and a good abstraction.
2020-12-30 17:18:16 +00:00
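The provider-agnostic exception described in this commit could look roughly like the sketch below. `EmailClientException`, `to_client_exception` and `should_retry` are hypothetical names introduced for illustration; only `EmailClientNonRetryableException` and the `InvalidParameterValue` error code come from the commit message.

```python
class EmailClientException(Exception):
    """Hypothetical base class for errors coming back from any email client."""


class EmailClientNonRetryableException(EmailClientException):
    """An error from an email client that retrying will not fix."""


def to_client_exception(error_code, message):
    """Translate a provider error code into a provider-agnostic exception."""
    if error_code == "InvalidParameterValue":
        # Covers rejected addresses as well as cases like an over-long header.
        return EmailClientNonRetryableException(message)
    return EmailClientException(message)


def should_retry(exc):
    """Calling code only needs the generic type, not which provider raised it."""
    return not isinstance(exc, EmailClientNonRetryableException)
```

The calling code never mentions SES, which is the point of choosing the generic name.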
David McDonald
2079202160 Stop logging email addresses for SES errors
We shouldn't be logging PII so we should not log email addresses. We
remove the email address and just log the normal exception message.

Note, this meant before that you could see the email address and more
easily track down the notification ID in the database. Now instead, you
will need to search in the DB for notifications that have gone into
technical failure at the time of the log message (as we still don't
log the notification ID alongside the failure).
2020-12-30 17:18:15 +00:00
Chris Hill-Scott
9825469613 Make language attributes abstract properties
This will make it impossible to create a new client without at least
having to define these properties. Which should get someone thinking
about language support…
2020-12-24 15:19:46 +00:00
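A minimal sketch of the abstract-property approach, assuming hypothetical class and attribute names (`BroadcastClientSketch`, `language_english`, `language_welsh`). The `cy-GB` value comes from the commit log; `en-GB` is an assumption.

```python
from abc import ABC, abstractmethod


class BroadcastClientSketch(ABC):
    """Hypothetical base client: subclasses must define both language values."""

    @property
    @abstractmethod
    def language_english(self):
        ...

    @property
    @abstractmethod
    def language_welsh(self):
        ...


class CapStyleClient(BroadcastClientSketch):
    # Plain class attributes are enough to satisfy the abstract properties.
    language_english = "en-GB"  # assumed value
    language_welsh = "cy-GB"


class IncompleteClient(BroadcastClientSketch):
    # Forgets language_welsh, so instantiating it raises TypeError.
    language_english = "en-GB"
```

Instantiating `IncompleteClient` fails immediately, so a missing language value is caught when the client is created rather than when a broadcast is sent.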
Chris Hill-Scott
c3a1d5c506 Pass language through to lambda
If we’re sending non-GSM characters, we need to mark the language in the
XML as Welsh (`cy-GB` in CAP, `Welsh` in IBAG).

Currently, the CBC proxy checks the content we’re sending, and then uses
an approximation based on ASCII to determine whether we’re sending any
non-GSM characters, and if so, sets the language appropriately.

Instead, we should use functionality from the notifications-utils repo
to determine the language. If any non-GSM characters are used, then
we can set the language to Welsh.

We’ll need to update the proxy to look at this new language flag.
2020-12-24 15:15:32 +00:00
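The language decision can be sketched as below. This does not use the notifications-utils functionality the commit refers to: `GSM_SUBSET` is a deliberately partial stand-in for the GSM character set, `language_for` is a hypothetical helper, and `en-GB` is an assumed default. Only `cy-GB` comes from the commit message.

```python
import string

# Deliberately partial stand-in for the GSM 03.38 basic character set.
# The real set contains more characters than these.
GSM_SUBSET = set(string.ascii_letters + string.digits + " .,!?'-")


def language_for(content, english="en-GB", welsh="cy-GB"):
    """Return the Welsh code if any character falls outside GSM_SUBSET."""
    if any(char not in GSM_SUBSET for char in content):
        return welsh
    return english
```

For example, a message containing `ŵ` would be marked as Welsh, while plain ASCII text would keep the default.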
Chris Hill-Scott
b8a191c3e3 Merge pull request #3071 from alphagov/add-flake8-bugbear
Do extra code style checks with flake8-bugbear
2020-12-23 16:20:48 +00:00
Chris Hill-Scott
51b7192750 Cache provider lookups for 2 seconds
For every email or text message we send we have to work out which
provider to send it with. Every time we do this we go and load the list
of providers from the database.

For emails, the result will always be the same.

For text messages the result is randomly chosen to balance the load
between the providers.

For international text messages the result is always the same (we only
have one international text message provider).

This commit adds an in-memory cache with a 2 second TTL so that we’re
not fetching the providers from the database every time, which should
speed things up a bit.

This does mean that, for text messages, the random choice will ‘stick’
for two seconds on each instance, before being re-chosen. I think this
is OK because it will even out to the same distribution over time.

I really don’t like having to clear the cache in the tests, so would
welcome suggestions on a better way of doing this…
2020-12-23 16:17:27 +00:00
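An in-memory cache with a short TTL, as described in this commit, can be sketched with the standard library alone. This is not the repository's implementation: the `ttl_cache` decorator and the injectable `clock` parameter are assumptions, and the clock is there so tests can control time instead of sleeping.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache a function's result per argument tuple for `ttl_seconds`."""

    def decorator(func):
        store = {}  # args -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive lookup
            value = func(*args)
            store[args] = (now + ttl_seconds, value)
            return value

        wrapper.cache_clear = store.clear  # lets tests reset state explicitly
        return wrapper

    return decorator


@ttl_cache(ttl_seconds=2)
def get_providers(notification_type):
    # Placeholder for the database query; returns made-up data.
    return [f"{notification_type}-provider"]
```

Injecting the clock is one possible answer to the question about clearing the cache in tests: a test can advance a fake clock past the TTL rather than clearing shared state.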
Chris Hill-Scott
b1dc8cc758 Serialise process_type for template history
We already serialise it in the templates response. We should make sure
the field is also present in the history response, if we want to use
cached template versions when processing notifications.
2020-12-23 13:57:38 +00:00
Rebecca Law
a2bb775b6f Merge pull request #3069 from alphagov/add-request-id-if-in-context
Pass request_id onto the task if called from a task
2020-12-23 12:26:04 +00:00