notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2026-05-17 23:34:05 -04:00

Author	SHA1	Message	Date
Jim Moffet	2bcae20441	initial sms provider cleanup	2022-06-25 13:05:10 -07:00
Jim Moffet	aa4ec532a4	implement SNS	2022-06-17 11:16:23 -07:00
Jim Moffet	59b72f4853	add devcontainer configs and docker network orchestration	2022-06-13 13:16:32 -07:00
Ben Thorner	c27107fa74	Remove support for Reach provider This provider was never active and support was never completed, so there's little value in keeping all this potentially confusing code.	2022-04-29 12:28:08 +01:00
Ben Thorner	44d90b0a4f	Remove redundant ternary on SMS client FROM_NUMBER Logs over the past 14 days confirm we never call this code with None as the sender, so it's safe to remove the ternary.	2022-04-12 14:59:21 +01:00
Ben Thorner	0d07220923	Fix log for sending SMS to be generic per client This was a mistake in [^1]. [^1]: `3b082477f0 (diff-95316b574f974237b3b7ce453fec09628bd1062087154789ff4d0176ea23c460R52)`	2022-04-05 10:46:14 +01:00
Ben Thorner	e6fffc00da	Add temporary log to check if code is in use In response to: [^1]. [^1]: https://github.com/alphagov/notifications-api/pull/3493#discussion_r838477599	2022-03-31 10:32:18 +01:00
Ben Thorner	7d92a0869a	Remove per-client SMS exception classes In response to: [^1]. The stacktrace conveys the same and more information. We don't do anything different for each exception class, so there's no value in having three of them over one exception. I did think about DRYing-up the duplicate exception behaviour into the base class one. This isn't ideal because the base class would be making assumptions about how inheriting classes make requests, which might change with future providers. Although it might be nice to have more info in the top-level message, we'll still get it in the stacktrace e.g. ValueError: Expected 'code' to be '0' During handling of the above exception, another exception occurred: app.clients.sms.SmsClientResponseException: SMS client error (Invalid response JSON) requests.exceptions.ReadTimeout During handling of the above exception, another exception occurred: app.clients.sms.SmsClientResponseException: SMS client error (Request failed) [^1]: https://github.com/alphagov/notifications-api/pull/3493#discussion_r837363717	2022-03-30 13:38:50 +01:00
Ben Thorner	a2e1d03009	Require "sender" argument to send_sms method In response to [^1]. [^1]: https://github.com/alphagov/notifications-api/pull/3493#discussion_r836616675	2022-03-30 13:38:48 +01:00
Ben Thorner	015152bab2	Add boilerplate for sending SMS via Reach This works in conjunction with the new SMS provider stub [^1]. Local testing: - Run the migrations to add Reach as an inactive provider. - Activate the Reach provider locally and deactivate the others. update provider_details set priority = 100, active = false where notification_type = 'sms'; update provider_details set active = true where identifier = 'reach'; - Tweak your local environment to point at the SMS stub. export REACH_URL="http://host.docker.internal:6300/reach" - Start / restart Celery to pick up the config change. - Send a SMS via the Admin app and see the stub log it. - Reset your environment so you can send normal SMS. update provider_details set active = true where notification_type = 'sms'; update provider_details set active = false where identifier = 'reach'; [^1]: https://github.com/alphagov/notifications-sms-provider-stub/pull/10	2022-03-30 13:38:46 +01:00
Ben Thorner	27ddc4501e	DRY-up overriding shortcode with sender This avoids duplicating the logic when we add a new provider.	2022-03-30 13:37:01 +01:00
Ben Thorner	3b082477f0	DRY-up logging and metrics for sending SMS This avoids duplicating it as we add a new provider and means we can test it all in one place (although it wasn't tested before). I'm not sure why the previous code did "super(..)__init__" in a non-init function - it's a bit late! - so I've just replaced it with a call to the new "init_app" function in the parent class.	2022-03-30 13:37:00 +01:00
Ben Thorner	22e055f4d1	DRY-up recording the outcome of SMS sending This reduces the code to copy when we add a new provider. I don't think we need to log the URL or status code each time: - The URL is always the same. - A "200" status code is implicit in "success". - Other status codes will be reported as exceptions. Removing these specific elements means "record_outcome" is generic and can be de-duplicated in the base class.	2022-03-30 13:36:58 +01:00
Ben Thorner	35f710bdf3	Remove redundant "multi" parameter for MMG client This is never overridden and can't be used in practice because all SMS clients have to use the same interface. Removing it will make it possible to DRY-up some of the code in this method.	2022-03-30 13:36:57 +01:00
Ben Thorner	e6e16a81d0	Simplify getting name of email / sms providers Previously we used a combination of "provider.name" and "get_name()" which was confusing. Using a non-property function also gave me the impression that the name was more dynamic than it actually is.	2022-03-30 13:36:55 +01:00
Ben Thorner	b439fd0718	Add boilerplate for Reach SMS callbacks This is enough to update a notification in DB: 1. First create a notification in the UI and send it. 2. Then reset its attributes to pretend it's for Reach. update notifications set sent_at = null, sent_by = null, notification_status='sending' where id='some-uuid'; 3. Change "notification_id" to "<some-uuid>" in the code. 4. Call the boilerplate endpoint for Reach callbacks. curl -X POST localhost:6011/notifications/sms/reach Interestingly there's no foreign key constraint on "sent_by" in the DB, so this just works: the notification is updated.	2022-03-24 16:56:33 +00:00
Ben Thorner	3eeba0266b	Revert "add raw request timings to provider send functions" This reverts commit `f2f2509c9b`. Raw request stats were added to investigate a hunch about a performance issue we were seeing [1], but turned out not to be relevant. We don't use them anymore so we can tidy up. [1]: https://github.com/alphagov/notifications-api/pull/2858	2021-10-28 11:12:18 +01:00
Ben Thorner	d24b45bb67	Constrain log length for CBC proxy payload This avoids any issues due to large payloads (e.g. with a lot of polygons in the 'areas' field). While we may miss part of the log in such cases, this is more than we get already anyway.	2021-07-20 16:06:48 +01:00
Richard Baker	d1791c6bfe	Remove "link test" from cbc-proxy log output The `_invoke_lambda` function is called when sending both link test and real message types. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-07-20 15:35:50 +01:00
Ben Thorner	cdc150de1b	Change link test task to trigger both lambdas This modifies the previous "(_)send_link_test" method to trigger a link test for a specific lambda. We then call the method with both the primary and failover lambda in a new orchestrator method. Since the _invoke_lambda function doesn't raise exceptions if it fails, there's no need to rescue anything in order to ensure the second link test / invocation runs as well. It doesn't need testing for this, since it boils down to an absence of code to raise any exception. Note that, like the other parent tests, we only check the new method works with a specific proxy client instance.	2021-07-19 16:00:56 +01:00
Ben Thorner	08f48379b4	Move ID generation into link test method Unlike the other IDs which are stored in the DB, this isn't relevant for the Celery task as it invokes a link test. Moving it into the proxy client will also enable us to generate a second ID in the next commits, where we start doing a link test for the failover lambda.	2021-07-19 16:00:55 +01:00
Ben Thorner	7fb65761f9	Always log when a lambda is invoked This replaces the previous single-purpose log about a link test with a more informative and generic one for all invocations.	2021-07-19 15:45:43 +01:00
Ben Thorner	1e8eda0d15	Avoid failover when running link tests We want to get to the point where we're running link tests for each lambda independently. The tests weren't checking for the failover mechanism for link tests, so we can just remove it.	2021-07-19 15:45:42 +01:00
Ben Thorner	b6774bf0f7	Generate Vodafone link test sequence nos in proxy Previously the Celery task to trigger a link test had to know about the special case of a sequence number for Vodafone. Since we're about to change the client to perform multiple tests it makes sense to give it the knowledge of how to generate the number itself. Note that we have to import the db inline to avoid a circular import, since this module is itself imported by app/__init__.py. Other invocations of the Vodafone client use stored sequence numbers from the DB, which are called "message numbers" in that context. Since the two use cases are very different (even the names are different!), having them in two places shouldn't cause any confusion.	2021-07-19 15:43:36 +01:00
Ben Thorner	23f4ae32df	Merge pull request #3214 from alphagov/check-broadcast-suspended Enforce service suspension for broadcasts	2021-04-28 15:01:11 +01:00
Rebecca Law	f3fdd3b09b	Add international API key for Firetext. We want to start using Firetext for sending international SMS. They require us to use a different API key for international SMS because it requires a new code path to switch the sender ID to something that the country will accept. This PR does not include switching the sender of international SMS to Firetext but sets us up to do so.	2021-04-20 13:58:55 +01:00
Ben Thorner	b2398fcaf4	Rename CBCProxyFatalException We only actually use this when the data we're working with is in an unexpected state, which is unrelated to the CBC Proxy. Using this name also means we can re-use this exception in the next commits. Note that we may still care if a broadcast message has expired, since it's not expected that someone would send one in this condition.	2021-04-19 17:13:05 +01:00
David McDonald	6d410daae4	Remove the emergency alerts canary See https://github.com/alphagov/notifications-broadcasts-infra/pull/197 for why we no longer need this and we get to delete some code!	2021-03-26 18:31:53 +00:00
David McDonald	41d95378ea	Remove everything for the performance platform We no longer will send them any stats so therefore don't need the code - the code to work out the nightly stats - the performance platform client - any configuration for the client - any nightly tasks that kick off the sending of the stats We will require a change in cronitor as we no longer will have this task run meaning we need to delete the cronitor check.	2021-03-15 12:04:53 +00:00
Ben Thorner	a91fde2fda	Run auto-correct on app/ and tests/	2021-03-12 11:45:45 +00:00
Rebecca Law	19f7a6ce38	Refactor method for deciding the failure type	2021-03-10 14:39:55 +00:00
Leo Hemsted	90e82aff3e	properly log the lambda response correctly boto returns a `StreamingBody`[1] response rather than a json struct. We're currently just logging things like "Error calling lambda o2-1-proxy with function error <botocore.response.StreamingBody object at 0x7f74cd6e02e8>" which is obviously less than ideal. Also make the tests properly reflect this - annoyingly it appears like we can't use moto to reliably test this interface as the moto `mock_lambda` decorator needs you to be running inside a docker container?? [1] https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html#botocore.response.StreamingBody	2021-02-18 11:51:38 +00:00
Katie Smith	6b8ebb3421	Fix linting errors	2021-02-16 09:03:38 +00:00
Leo Hemsted	fed0d4c40e	Merge pull request #3137 from alphagov/revert-revert-revert Bring back retry logic	2021-02-15 12:21:13 +00:00
Leo Hemsted	62cf9f60a9	catch boto exceptions these will happen if, for example, you have issues connecting to AWS or permission issues. Still failover if we get one of these exceptions, as I think it might be possible to have a problem only related to one of the lambdas.	2021-02-12 19:48:32 +00:00
Katie Smith	bc4a2005d4	Refactor the CBC proxy classes By creating a new `CBCProxyOne2ManyClient` class for the three One2Many clients to inherit from. These three clients are the same apart from the `lambda_name` and the `failover_lambda_name`.	2021-02-11 16:56:15 +00:00
Leo Hemsted	4f89be6944	Revert "Merge pull request #3125 from alphagov/revert-retry" This reverts commit `6b9a50beff`, reversing changes made to `33f93dfea2`.	2021-02-09 17:01:04 +00:00
Leo Hemsted	bee0059e53	Revert "Merge pull request #3101 from alphagov/retry-broadcasts" This reverts commit `1bd99c779d`, reversing changes made to `d390eb2cac`.	2021-02-08 11:02:34 +00:00
Leo Hemsted	ac34fb9c05	retry sending broadcasts Retry tasks if they fail to send a broadcast event. Note that each task tries the regular proxy and the failover proxy for that provider. This runs a bit differently than our other retries: Retry with exponential backoff. Our other tasks retry with a fixed delay of 5 minutes between tries. If we can't send a broadcast, we want to try immediately. So instead, implement an exponential backoff (1, 2, 4, 8, ... seconds delay). We can't delay for longer than 310 seconds due to visibility timeout settings in SQS, so cap the delay at that amount. Normally we give up retrying after a set amount of retries (often 4 hours). As broadcast content is much more important than normal notifications, we don't ever want to give up on sending them to phones... ...UNLESS WE DO! Sometimes we do want to give up sending a broadcast though! Broadcasts have an expiry time, when they stop showing up on people's devices, so if that has passed then we don't need to send the broadcast out. Broadcast events can also be superseded by updates or cancels. Check that the event is the most recent event for that broadcast message, if not, give up, as we don't want to accidentally send out two conflicting events for the same message.	2021-02-03 16:43:01 +00:00
David McDonald	a46b8c3bba	Remove redundant comment	2021-02-01 14:10:39 +00:00
David McDonald	2aad3163e6	Allow CBC proxy client to take channel This moves the hardcoding to test channels one step up to where we call `create_and_send_broadcast` After this, we can start to vary whether we give it the 'test' or 'severe' channel based on the service's channel setting.	2021-02-01 14:10:38 +00:00
David McDonald	a3d966056a	Merge pull request #3110 from alphagov/test-channel Set the default broadcast channel to test	2021-02-01 10:18:35 +00:00
David McDonald	86ea89cf76	Merge pull request #3098 from alphagov/downgrade-to-warning Downgrade SMS provider request exceptions to warnings	2021-01-29 11:52:10 +00:00
David McDonald	f9b1d3d573	Set the default broadcast channel to test This used to be hardcoded in the CBC proxy but now we will hardcode it in the cbc_proxy_client. In a future PR we can start choosing which channel a broadcast will go to based on the channel configured for that broadcast service.	2021-01-27 15:27:11 +00:00
Katie Smith	2681752f15	Rename bt-ee-proxy to ee-proxy We want to rename the `bt-ee-1-proxy` lambda function to `ee-1-proxy`. This change will need to be deployed at the same time that we change the name of the lambda function in the Terraform.	2021-01-26 14:36:20 +00:00
Richard Baker	6256cdf792	Add proxy client for o2 cell broadcasting o2 use One-2-many CBC so we can use the O2M/CAP client. Once differences between CBCs have been worked out we can consolidate O2M clients to reduce duplication. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-01-26 11:11:44 +00:00
David McDonald	ac6837cde5	Downgrade exception to warning for provider API call When we send an HTTP request to our SMS providers, there is a chance we get a 5xx status code back from them. Currently we log this as two different exception level logs. If a provider has a funny few minutes, we could end up with hundreds of exceptions thrown and pagerduty waking someone up in the middle of the night. These problems tend to pretty quickly fix themselves as we balance traffic from one SMS to the other SMS provider within 5 minutes. By downgrading both exceptions to warning in the case of a `SmsClientResponseException`, we will reduce the chance of waking us up in the middle of the night for no reason. If the error is not a `SmsClientResponseException`, then we will still log at the exception level as before as this is more unexpected and we may want to be alerted sooner. What we still want to happen though is that let's say both SMS providers went down at the same time for 1 hour. We don't want our tasks to just sit there, retrying every 5 minutes for the whole time without us being aware (so we can at least raise a statuspage update). Luckily we will still be alerted because our smoke tests will fail after 10 minutes and raise a p1: https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21	2021-01-18 17:00:21 +00:00
David McDonald	b9ec70acc2	Fix incorrect log line Should have been `lambda_name` not `self.lambda_name`.	2021-01-15 16:48:04 +00:00
David McDonald	ff193387d1	Add proxy client for Three Three uses the One 2 Many technology so should work in the same way as our proxy for EE	2021-01-14 11:44:46 +00:00
David McDonald	3216a74fbd	Don't try failover for canary No point trying, it's the same lambda. As `_invoke_lambda` currently takes a bytes payload, rather than a json payload, it meant the decision between encoding the payload in the canary or moving the encoding into the `_invoke_lambda` function. We decided to go for the former as the lesser of two evils. We may end up doing the encoding twice in the case of a failover but this avoids us having to put the encoding in our code in several places (for example the canary and also soon to be the link tests).	2021-01-14 11:00:38 +00:00
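
The retry schedule described in commit ac34fb9c05 (delays of 1, 2, 4, 8, ... seconds, capped at 310 seconds because of the SQS visibility timeout) can be sketched as below. This is only an illustration of that description, not the code from the repository: the names `MAX_DELAY_SECONDS` and `retry_delay` are hypothetical, and treating the retry count as 0-based is an assumption.

```python
# Hypothetical sketch of a capped exponential backoff, based only on the
# description in commit ac34fb9c05; names here are not taken from the repo.

MAX_DELAY_SECONDS = 310  # cap quoted in the commit message (SQS visibility timeout)


def retry_delay(retry_count: int) -> int:
    """Return the delay in seconds before the next attempt.

    `retry_count` is assumed to be 0-based: 0 for the first retry.
    """
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    # Double the delay on each retry, but never exceed the cap.
    return min(2 ** retry_count, MAX_DELAY_SECONDS)


delays = [retry_delay(n) for n in range(11)]
print(delays)
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 310, 310]
```

With this shape the first nine retries happen within about eight and a half minutes in total, after which every further retry waits the capped 310 seconds.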


203 Commits