notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2026-02-26 04:49:49 -05:00

Author	SHA1	Message	Date
Ben Thorner	3eeba0266b	Revert "add raw request timings to provider send functions" This reverts commit `f2f2509c9b`. Raw request stats were added to investigate a hunch about a performance issue we were seeing [1], but turned out not to be relevant. We don't use them anymore so we can tidy up. [1]: https://github.com/alphagov/notifications-api/pull/2858	2021-10-28 11:12:18 +01:00
Ben Thorner	d24b45bb67	Constrain log length for CBC proxy payload This avoids any issues due to large payloads (e.g. with a lot of polygons in the 'areas' field). While we may miss part of the log in such cases, this is more than we get already anyway.	2021-07-20 16:06:48 +01:00
Richard Baker	d1791c6bfe	Remove "link test" from cbc-proxy log output The `_invoke_lambda` function is called when sending both link test and real message types. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-07-20 15:35:50 +01:00
Ben Thorner	cdc150de1b	Change link test task to trigger both lambdas This modifies the previous "(_)send_link_test" method to trigger a link test for a specific lambda. We then call the method with both the primary and failover lambda in a new orchestrator method. Since the _invoke_lambda function doesn't raise exceptions if it fails, there's no need to rescue anything in order to ensure the second link test / invocation runs as well. This doesn't need testing, since it boils down to an absence of code that raises an exception. Note that, like the other parent tests, we only check the new method works with a specific proxy client instance.	2021-07-19 16:00:56 +01:00
Ben Thorner	08f48379b4	Move ID generation into link test method Unlike the other IDs which are stored in the DB, this isn't relevant for the Celery task as it invokes a link test. Moving it into the proxy client will also enable us to generate a second ID in the next commits, where we start doing a link test for the failover lambda.	2021-07-19 16:00:55 +01:00
Ben Thorner	7fb65761f9	Always log when a lambda is invoked This replaces the previous single-purpose log about a link test with a more informative and generic one for all invocations.	2021-07-19 15:45:43 +01:00
Ben Thorner	1e8eda0d15	Avoid failover when running link tests We want to get to the point where we're running link tests for each lambda independently. The tests weren't checking for the failover mechanism for link tests, so we can just remove it.	2021-07-19 15:45:42 +01:00
Ben Thorner	b6774bf0f7	Generate Vodafone link test sequence nos in proxy Previously the Celery task to trigger a link test had to know about the special case of a sequence number for Vodafone. Since we're about to change the client to perform multiple tests, it makes sense to give it the knowledge of how to generate the number itself. Note that we have to import the db inline to avoid a circular import, since this module is itself imported by app/__init__.py. Other invocations of the Vodafone client use stored sequence numbers from the DB, which are called "message numbers" in that context. Since the two use cases are very different (even the names are different!), having them in two places shouldn't cause any confusion.	2021-07-19 15:43:36 +01:00
Ben Thorner	23f4ae32df	Merge pull request #3214 from alphagov/check-broadcast-suspended Enforce service suspension for broadcasts	2021-04-28 15:01:11 +01:00
Rebecca Law	f3fdd3b09b	Add international API key for Firetext. We want to start using Firetext for sending international SMS. They require us to use a different API key for international SMS because it requires a new code path to switch the sender ID to something that the country will accept. This PR does not include switching the sender of international SMS to Firetext but sets us up to do so.	2021-04-20 13:58:55 +01:00
Ben Thorner	b2398fcaf4	Rename CBCProxyFatalException We only actually use this when the data we're working with is in an unexpected state, which is unrelated to the CBC Proxy. Using this name also means we can re-use this exception in the next commits. Note that we may still care if a broadcast message has expired, since it's not expected that someone would send one in this condition.	2021-04-19 17:13:05 +01:00
David McDonald	6d410daae4	Remove the emergency alerts canary See https://github.com/alphagov/notifications-broadcasts-infra/pull/197 for why we no longer need this and we get to delete some code!	2021-03-26 18:31:53 +00:00
David McDonald	41d95378ea	Remove everything for the performance platform We will no longer send them any stats, so we don't need the code: - the code to work out the nightly stats - the performance platform client - any configuration for the client - any nightly tasks that kick off the sending of the stats We will need a change in Cronitor: this task will no longer run, so we need to delete its Cronitor check.	2021-03-15 12:04:53 +00:00
Ben Thorner	a91fde2fda	Run auto-correct on app/ and tests/	2021-03-12 11:45:45 +00:00
Rebecca Law	19f7a6ce38	Refactor method for deciding the failure type	2021-03-10 14:39:55 +00:00
Leo Hemsted	90e82aff3e	properly log the lambda response correctly boto returns a `StreamingBody`[1] response rather than a json struct. We're currently just logging things like "Error calling lambda o2-1-proxy with function error <botocore.response.StreamingBody object at 0x7f74cd6e02e8>" which is obviously less than ideal. Also make the tests properly reflect this - annoyingly it appears like we can't use moto to reliably test this interface as the moto `mock_lambda` decorator needs you to be running inside a docker container?? [1] https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html#botocore.response.StreamingBody	2021-02-18 11:51:38 +00:00
Katie Smith	6b8ebb3421	Fix linting errors	2021-02-16 09:03:38 +00:00
Leo Hemsted	fed0d4c40e	Merge pull request #3137 from alphagov/revert-revert-revert Bring back retry logic	2021-02-15 12:21:13 +00:00
Leo Hemsted	62cf9f60a9	catch boto exceptions these will happen if, for example, you have issues connecting to AWS or permission issues. Still failover if we get one of these exceptions, as I think it might be possible to have a problem only related to one of the lambdas.	2021-02-12 19:48:32 +00:00
Katie Smith	bc4a2005d4	Refactor the CBC proxy classes By creating a new `CBCProxyOne2ManyClient` class for the three One2Many clients to inherit from. These three clients are the same apart from the `lambda_name` and the `failover_lambda_name`.	2021-02-11 16:56:15 +00:00
Leo Hemsted	4f89be6944	Revert "Merge pull request #3125 from alphagov/revert-retry" This reverts commit `6b9a50beff`, reversing changes made to `33f93dfea2`.	2021-02-09 17:01:04 +00:00
Leo Hemsted	bee0059e53	Revert "Merge pull request #3101 from alphagov/retry-broadcasts" This reverts commit `1bd99c779d`, reversing changes made to `d390eb2cac`.	2021-02-08 11:02:34 +00:00
Leo Hemsted	ac34fb9c05	retry sending broadcasts Retry tasks if they fail to send a broadcast event. Note that each task tries the regular proxy and the failover proxy for that provider. This runs a bit differently than our other retries: Retry with exponential backoff. Our other tasks retry with a fixed delay of 5 minutes between tries. If we can't send a broadcast, we want to try again immediately. So instead, implement an exponential backoff (1, 2, 4, 8, ... seconds delay). We can't delay for longer than 310 seconds due to visibility timeout settings in SQS, so cap the delay at that amount. Normally we give up retrying after a set number of retries (often 4 hours). As broadcast content is much more important than normal notifications, we don't ever want to give up on sending them to phones... ...UNLESS WE DO! Sometimes we do want to give up sending a broadcast though! Broadcasts have an expiry time, when they stop showing up on people's devices, so if that has passed then we don't need to send the broadcast out. Broadcast events can also be superseded by updates or cancels. Check that the event is the most recent event for that broadcast message; if not, give up, as we don't want to accidentally send out two conflicting events for the same message.	2021-02-03 16:43:01 +00:00
David McDonald	a46b8c3bba	Remove redundant comment	2021-02-01 14:10:39 +00:00
David McDonald	2aad3163e6	Allow CBC proxy client to take channel This moves the hardcoding to test channels one step up to where we call `create_and_send_broadcast`. After this, we can start to vary whether we give it the 'test' or 'severe' channel based on the service's channel setting.	2021-02-01 14:10:38 +00:00
David McDonald	a3d966056a	Merge pull request #3110 from alphagov/test-channel Set the default broadcast channel to test	2021-02-01 10:18:35 +00:00
David McDonald	86ea89cf76	Merge pull request #3098 from alphagov/downgrade-to-warning Downgrade SMS provider request exceptions to warnings	2021-01-29 11:52:10 +00:00
David McDonald	f9b1d3d573	Set the default broadcast channel to test This used to be hardcoded in the CBC proxy but now we will hardcode it in the cbc_proxy_client. In a future PR we can start choosing which channel a broadcast will go to based on the channel configured for that broadcast service.	2021-01-27 15:27:11 +00:00
Katie Smith	2681752f15	Rename bt-ee-proxy to ee-proxy We want to rename the `bt-ee-1-proxy` lambda function to `ee-1-proxy`. This change will need to be deployed at the same time that we change the name of the lambda function in the Terraform.	2021-01-26 14:36:20 +00:00
Richard Baker	6256cdf792	Add proxy client for o2 cell broadcasting o2 use One-2-many CBC so we can use the O2M/CAP client. Once differences between CBCs have been worked out we can consolidate O2M clients to reduce duplication. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-01-26 11:11:44 +00:00
David McDonald	ac6837cde5	Downgrade exception to warning for provider API call When we send an HTTP request to our SMS providers, there is a chance we get a 5xx status code back from them. Currently we log this as two different exception level logs. If a provider has a funny few minutes, we could end up with hundreds of exceptions thrown and PagerDuty waking someone up in the middle of the night. These problems tend to fix themselves pretty quickly, as we balance traffic from one SMS provider to the other within 5 minutes. By downgrading both exceptions to warnings in the case of a `SmsClientResponseException`, we will reduce the chance of someone being woken up in the middle of the night for no reason. If the error is not a `SmsClientResponseException`, then we will still log at the exception level as before, as this is more unexpected and we may want to be alerted sooner. We do still want to know if, say, both SMS providers went down at the same time for 1 hour. We don't want our tasks to just sit there, retrying every 5 minutes for the whole time without us being aware (so we can at least raise a statuspage update). Luckily we will still be alerted because our smoke tests will fail after 10 minutes and raise a p1: https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21	2021-01-18 17:00:21 +00:00
David McDonald	b9ec70acc2	Fix incorrect log line Should have been `lambda_name` not `self.lambda_name`.	2021-01-15 16:48:04 +00:00
David McDonald	ff193387d1	Add proxy client for Three Three uses the One 2 Many technology so should work in the same way as our proxy for EE	2021-01-14 11:44:46 +00:00
David McDonald	3216a74fbd	Don't try failover for canary No point trying, it's the same lambda. As `_invoke_lambda` currently takes a bytes payload, rather than a json payload, we had to decide between encoding the payload in the canary or moving the encoding into the `_invoke_lambda` function. We decided to go for the former as the lesser of two evils. We may end up doing the encoding twice in the case of a failover but this avoids us having to put the encoding in our code in several places (for example the canary and also soon to be the link tests).	2021-01-14 11:00:38 +00:00
Pea Tyczynska	b5a33ded98	Retry with failover lambda for FunctionError and status > 299 For all FunctionErrors, and for invoke errors (status > 299) we want to retry with failover lambda. We are doing this, because if there is a connection or other error with one lambda, the failover lambda may still work and it's worth trying. With time, we will probably have more complex retry flow, depending on the error and even maybe differing for each MNO (broadcast provider).	2021-01-14 10:45:29 +00:00
Pea Tyczynska	1aff854afd	Create logs for invoking and finishing lambda, and for retry. Those logs will give us extra visibility into lambda invocation process.	2021-01-12 15:34:48 +00:00
David McDonald	24f52721f3	Retry with second lambda if connection error Note, we assume whenever there is a `FunctionError` that there will be a payload that contains an `errorMessage` key. This is implied in the docs but not stated explicitly. https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html#API_Invoke_ResponseSyntax	2021-01-12 15:34:47 +00:00
David McDonald	9da8e54d69	Define failover lambdas We will need a lambda to failover to if the first lambda fails. This isn't so much a case of the lambda itself failing, as it is a cross availability zone resource automatically, it's more in case something in the networking goes down in our AZ and therefore the lambda can't call out to the CBC. In this case, we will be able to swap to using the second AZ by calling the second lambda.	2021-01-12 15:34:47 +00:00
David McDonald	1e537d507b	Make lambda_name abstract property As we require all instances to have it	2021-01-12 15:34:46 +00:00
David McDonald	5a46662c28	Abstract invoking of lambda This is to prepare for cases where, when we try to send/cancel a broadcast, we may need to invoke more than one lambda. This might happen if we invoke the first lambda, get an error and therefore try to invoke a failover/second lambda. Then `_invoke_lambda` will be responsible for the call to AWS, whereas `_invoke_lambda_with_failover` will be responsible for picking the lambda and deciding on retry behaviour in failure cases.	2021-01-12 15:34:46 +00:00
David McDonald	57f5bd76de	Merge pull request #3081 from alphagov/ses-error-logs SES error logs	2020-12-31 13:13:20 +00:00
Leo Hemsted	d470c928cd	Merge pull request #3072 from alphagov/doc-dl-exc handle doc dl connection errors correctly	2020-12-31 11:24:00 +00:00
David McDonald	56879d0d22	Make sure error message is logged as part of the exception	2020-12-31 11:08:09 +00:00
David McDonald	2480f91667	Raise better exception on InvalidParameterValue error There are several reasons why we might get an `InvalidParameterValue` from the SES API. One, as correctly identified before in https://github.com/alphagov/notifications-api/pull/713/files is if we allow an email address on our side that SES rejects. However, there are other types of errors that could cause an `InvalidParameterValue`. One example is a `Header too long: 'Subject'` error that we have seen happen in production. This shouldn't raise an `InvalidEmailError` as that is not appropriate. Therefore, we introduce a new exception, `EmailClientNonRetryableException`, that represents any non-retryable exception from an email client, and that we can use whenever we get an `InvalidParameterValue` error. Note, I chose `EmailClientNonRetryableException` rather than `SESClientNonRetryableException` as our code needs to catch this exception and it shouldn't be aware of what email client is being used, it just needs to know that it came from one of the email clients (if in time we have more than one). In time, we may wish to extend the approach of having generic `EmailClient` exceptions and `SMSClient` exceptions as this should be the most extendable pattern and a good abstraction.	2020-12-30 17:18:16 +00:00
David McDonald	2079202160	Stop logging email addresses for SES errors We shouldn't be logging PII so we should not log email addresses. We remove the email address and just log the normal exception message. Note, this meant before that you could see the email address and more easily track down the notification ID in the database. Now instead, you will need to search in the DB for notifications that have gone into technical failure at the time of the log message (as we still don't log the notification ID alongside the failure).	2020-12-30 17:18:15 +00:00
Chris Hill-Scott	9825469613	Make language attributes abstract properties This will make it impossible to create a new client without at least having to define these properties. Which should get someone thinking about language support…	2020-12-24 15:19:46 +00:00
Chris Hill-Scott	c3a1d5c506	Pass language through to lambda If we’re sending non-GSM characters, we need to mark the language in the XML as Welsh (`cy-GB` in CAP, `Welsh` in IBAG). Currently, the CBC proxy checks the content we’re sending, and then uses an approximation based on ASCII to determine whether we’re sending any non-GSM characters, and if so, sets the language appropriately. Instead, we can use functionality from the notifications-utils repo to determine the language. If any non-GSM characters are used, then we can set the language to Welsh. We’ll need to update the proxy to look at this new language flag.	2020-12-24 15:15:32 +00:00
Leo Hemsted	325f271e25	handle doc dl connection errors correctly previously we'd see an error message in the logs: `AttributeError: 'NoneType' object has no attribute 'status_code'` because we were assuming the requests exception would always have a response - it won't have a response if it wasn't able to create a connection at all.	2020-12-23 12:21:24 +00:00
Pea Tyczynska	95deb5a52f	Move DATETIME_FORMAT from app to app.utils To avoid cyclical import issues	2020-12-18 17:39:35 +00:00
Pea Tyczynska	ee833bd65b	Fix cancel broadcast by converting reference date to string A datetime object is not JSON serializable, so we have to convert it to a string for the created_at field of previous broadcast provider messages.	2020-12-18 17:22:21 +00:00
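Commits 1e537d507b, 9825469613 and bc4a2005d4 describe making `lambda_name` and the language attributes abstract properties. They also describe a shared `CBCProxyOne2ManyClient` parent whose subclasses differ only in their lambda names. The sketch below shows one way to express this with Python's `abc` module. `CBCProxyClientBase`, `CBCProxyEE`, the `LANGUAGE_*` attribute names, the `en-GB` code and the `ee-2-proxy` failover name are hypothetical. `ee-1-proxy` and `cy-GB` come from the log.

```python
from abc import ABC, abstractmethod


class CBCProxyClientBase(ABC):
    """Hypothetical base: every concrete proxy client must name its lambdas and languages."""

    @property
    @abstractmethod
    def lambda_name(self) -> str: ...

    @property
    @abstractmethod
    def failover_lambda_name(self) -> str: ...

    @property
    @abstractmethod
    def LANGUAGE_ENGLISH(self) -> str: ...

    @property
    @abstractmethod
    def LANGUAGE_WELSH(self) -> str: ...


class CBCProxyOne2ManyClient(CBCProxyClientBase):
    # One2Many CBCs use CAP, where Welsh is marked as cy-GB (commit c3a1d5c506).
    # The English code is an assumption for this sketch.
    LANGUAGE_ENGLISH = "en-GB"
    LANGUAGE_WELSH = "cy-GB"


class CBCProxyEE(CBCProxyOne2ManyClient):
    # One2Many clients differ only in their lambda names (commit bc4a2005d4).
    lambda_name = "ee-1-proxy"
    failover_lambda_name = "ee-2-proxy"  # hypothetical failover lambda name
```

Because the parent still leaves `lambda_name` abstract, instantiating `CBCProxyOne2ManyClient` directly raises `TypeError`. A new client therefore cannot be created until someone decides its lambda names and language codes.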
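Several commits (5a46662c28, 24f52721f3, b5a33ded98, 90e82aff3e, cdc150de1b) describe a split between two functions. `_invoke_lambda` makes the AWS call, and `_invoke_lambda_with_failover` decides whether to retry. Failover happens on a `FunctionError` or a status code above 299, and the error text is read from the `StreamingBody` payload. The following is a sketch under those assumptions, not the repository's implementation:

- The client can be anything with a boto3-style `invoke(FunctionName=..., InvocationType=..., Payload=...)` method.
- This sketch chooses to encode a dict payload inside the helper.
- `FakeLambdaClient` and the lambda names are hypothetical.
- The boto exception handling added in 62cf9f60a9 is omitted, so the example runs without AWS.

```python
import json
import logging
from io import BytesIO

logger = logging.getLogger(__name__)


def invoke_lambda(client, lambda_name: str, payload: dict) -> bool:
    """Make one call to AWS and report success; failures are logged, never raised."""
    response = client.invoke(
        FunctionName=lambda_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if response["StatusCode"] > 299:
        logger.warning("Error calling lambda %s with status code %s", lambda_name, response["StatusCode"])
        return False
    if "FunctionError" in response:
        # boto3 returns Payload as a StreamingBody, so read it before logging.
        # Like commit 24f52721f3, assume it carries an `errorMessage` key.
        error = json.loads(response["Payload"].read())
        logger.warning("Error calling lambda %s with function error %s", lambda_name, error.get("errorMessage"))
        return False
    return True


def invoke_lambda_with_failover(client, lambda_name: str, failover_lambda_name: str, payload: dict) -> bool:
    """Try the primary lambda, then the failover lambda if the first call fails."""
    if invoke_lambda(client, lambda_name, payload):
        return True
    logger.info("Retrying with failover lambda %s", failover_lambda_name)
    return invoke_lambda(client, failover_lambda_name, payload)


def send_link_tests(client, lambda_name: str, failover_lambda_name: str, payload: dict) -> list:
    """Link-test each lambda independently. No try/except is needed,
    because invoke_lambda never raises when a lambda fails."""
    return [invoke_lambda(client, name, payload) for name in (lambda_name, failover_lambda_name)]


class FakeLambdaClient:
    """Hypothetical stand-in for boto3.client("lambda") that returns canned responses."""

    def __init__(self, responses: dict):
        self.responses = responses
        self.calls = []

    def invoke(self, FunctionName, InvocationType, Payload):
        self.calls.append(FunctionName)
        return self.responses[FunctionName]


if __name__ == "__main__":
    client = FakeLambdaClient({
        "primary-proxy": {"StatusCode": 200, "FunctionError": "Unhandled",
                          "Payload": BytesIO(b'{"errorMessage": "CBC unreachable"}')},
        "failover-proxy": {"StatusCode": 200, "Payload": BytesIO(b"{}")},
    })
    print(invoke_lambda_with_failover(client, "primary-proxy", "failover-proxy", {"identifier": "x"}))
    print(client.calls)
```

Returning a boolean rather than raising is what lets the link-test orchestrator call both lambdas in sequence without any exception handling.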
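Commit ac34fb9c05 describes the broadcast retry policy:

- Use exponential backoff of 1, 2, 4, 8, ... seconds, capped at 310 seconds because of the SQS visibility timeout.
- Give up once the broadcast has expired or a newer event has superseded this one.

A minimal Python sketch of that policy follows. The function names and arguments are hypothetical. In a Celery task, the computed delay would typically be passed as `countdown` to `self.retry`.

```python
from datetime import datetime

# Cap taken from commit ac34fb9c05 (SQS visibility timeout limit).
MAX_RETRY_DELAY_SECONDS = 310


def retry_delay(retries: int) -> int:
    """Seconds to wait before the next attempt: 1, 2, 4, 8, ... capped at 310.

    `retries` is the number of retries already made (0 before the first retry).
    """
    return min(2 ** retries, MAX_RETRY_DELAY_SECONDS)


def should_give_up(expires_at: datetime, now: datetime, event_id: str, latest_event_id: str) -> bool:
    """Hypothetical check: stop retrying once the broadcast has expired,
    or once a newer event (update or cancel) has superseded this one."""
    return now >= expires_at or event_id != latest_event_id


if __name__ == "__main__":
    print([retry_delay(n) for n in range(10)])
    # [1, 2, 4, 8, 16, 32, 64, 128, 256, 310]
```

There is deliberately no maximum retry count here. Per the commit, the only reasons to stop are expiry and supersession.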
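Commit d24b45bb67 constrains how much of the CBC proxy payload is logged, so that a payload with many polygons in `areas` cannot produce an oversized log line. A minimal sketch of the idea follows. The function name and the 1000-character limit are hypothetical, because the commit does not say what length the real code uses.

```python
import json

# Hypothetical limit: the commit does not state the value used in practice.
MAX_LOGGED_PAYLOAD_CHARS = 1000


def payload_for_log(payload: dict, limit: int = MAX_LOGGED_PAYLOAD_CHARS) -> str:
    """Serialise a proxy payload for logging, keeping at most `limit` characters.

    Large payloads (for example many polygons in 'areas') are cut off
    instead of producing an oversized log line.
    """
    return json.dumps(payload)[:limit]
```

Truncating the serialised string, rather than the payload structure, keeps the logged prefix identical to what was sent.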
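Commit ac6837cde5 logs provider error responses (`SmsClientResponseException`) as warnings and everything else as exceptions. The sketch below shows that split. `SmsClientResponseException` is defined here only as a stand-in, and `send_with_provider` is a hypothetical name. Re-raising so that the task can retry later is an assumption of this sketch.

```python
import logging


class SmsClientResponseException(Exception):
    """Hypothetical stand-in for the provider client's error-response exception."""


def send_with_provider(logger: logging.Logger, provider_name: str, send):
    """Call `send()` and log failures at a level matching how expected they are.

    Provider error responses (e.g. a 5xx) are logged as warnings, since traffic
    is rebalanced to the other provider. Anything else is still logged as an
    exception. The error is re-raised either way so the caller can retry.
    """
    try:
        return send()
    except SmsClientResponseException as e:
        logger.warning("Provider %s returned an error response: %s", provider_name, e)
        raise
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("Unexpected error sending SMS via %s", provider_name)
        raise
```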
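Commit 2480f91667 maps every SES `InvalidParameterValue` error to a provider-agnostic `EmailClientNonRetryableException`. The sketch below illustrates that mapping from an error code string. With botocore, that code would come from `error.response["Error"]["Code"]` on a `ClientError`. The exception name `EmailClientNonRetryableException` comes from the commit. The `EmailClientException` base class and the function name are hypothetical. Following commit 2079202160, only the provider's message is passed on, not the recipient's email address.

```python
class EmailClientException(Exception):
    """Hypothetical base class for errors raised by any email client."""


class EmailClientNonRetryableException(EmailClientException):
    """The provider rejected the request in a way that retrying will not fix."""


def raise_for_email_provider_error(error_code: str, message: str):
    """Translate a provider error code into a provider-agnostic exception.

    Every `InvalidParameterValue` (a rejected address, or a too-long Subject
    header) becomes non-retryable; other codes stay generic in this sketch.
    """
    if error_code == "InvalidParameterValue":
        raise EmailClientNonRetryableException(message)
    raise EmailClientException(message)
```

Calling code catches `EmailClientNonRetryableException` without needing to know that SES was the client that raised it.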
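Commit 325f271e25 fixes an `AttributeError` that occurred when a `requests` exception carried no response. `requests` sets the exception's `response` attribute to `None` when no connection was made, for example on a `ConnectionError`. A minimal defensive sketch follows. The function name is hypothetical, and the code avoids importing `requests` so that it stays dependency-free.

```python
def status_code_from_error(error):
    """Return the HTTP status code attached to a requests-style exception, or None.

    requests sets `response` to None when no response was received (for example
    a ConnectionError), so `error.response.status_code` would raise AttributeError.
    """
    response = getattr(error, "response", None)
    return response.status_code if response is not None else None
```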


187 Commits