notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-24 01:11:38 -05:00

Author	SHA1	Message	Date
David McDonald	070b79c27e	Downgrade exceptions to warnings to reduce emails We already trigger a zendesk ticket for these two cases, meaning that whenever we get this situation, we get 3 emails. One for the zendesk ticket, one from logit raising the fact an exception was raised and one from cloudwatch raising the fact an exception was raised. We don't need all these emails, a zendesk ticket is sufficient. Downgrading to a warning means this event will still be findable in our logs however.	2021-02-02 15:10:26 +00:00
David McDonald	86ea89cf76	Merge pull request #3098 from alphagov/downgrade-to-warning Downgrade SMS provider request exceptions to warnings	2021-01-29 11:52:10 +00:00
David McDonald	ac6837cde5	Downgrade exception to warning for provider API call When we send an HTTP request to our SMS providers, there is a chance we get a 5xx status code back from them. Currently we log this as two different exception level logs. If a provider has a funny few minutes, we could end up with hundreds of exceptions thrown and pagerduty waking someone up in the middle of the night. These problems tend to pretty quickly fix themselves as we balance traffic from one SMS provider to the other within 5 minutes. By downgrading both exceptions to warning in the case of a `SmsClientResponseException`, we will reduce the chance of waking us up in the middle of the night for no reason. If the error is not a `SmsClientResponseException`, then we will still log at the exception level as before as this is more unexpected and we may want to be alerted sooner. What we still want to happen though is that let's say both SMS providers went down at the same time for 1 hour. We don't want our tasks to just sit there, retrying every 5 minutes for the whole time without us being aware (so we can at least raise a statuspage update). Luckily we will still be alerted because our smoke tests will fail after 10 minutes and raise a p1: https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21	2021-01-18 17:00:21 +00:00
Chris Hill-Scott	4eb4ea1772	Use cache for tasks that save notifications These tasks need to repeatedly get the same template and service from the database. We should be able to improve their performance by getting the template and service from the cache instead, like we do in the REST endpoint code.	2021-01-18 10:25:24 +00:00
David McDonald	20627d96ea	Put all broadcast tasks on the broadcast worker	2021-01-13 17:21:40 +00:00
David McDonald	c3ef23c771	Alert on 2nd class letters still in sending every day In `8285ef5f89` we turned off alerting on 2nd class letters still being in sending on certain days of the week because we were only sending letters out on Mon, Wed, Fri. Now we have swapped back to sending out 2nd class letters on all workdays so this change can be reverted. Note, I haven't reverted the commit exactly but more so the behaviour, whilst leaving in some tests to explicitly test 2nd class letters for the alert in case we change this again.	2021-01-13 11:21:27 +00:00
David McDonald	977554781f	Add better logging message for tech failure So we can easily identify which notification ID failed	2020-12-30 17:28:21 +00:00
David McDonald	2480f91667	Raise better exception on InvalidParameterValue error There are several reasons why we might get an `InvalidParameterValue` from the SES API. One, as correctly identified before in https://github.com/alphagov/notifications-api/pull/713/files is if we allow an email address on our side that SES rejects. However, there are other types of errors that could cause an `InvalidParameterValue`. One example is a `Header too long: 'Subject'` error that we have seen happen in production. This shouldn't raise an `InvalidEmailError` as that is not appropriate. Therefore, we introduce a new exception `EmailClientNonRetryableException`, that represents any exception back from an email client that we can use whenever we get an `InvalidParameterValue` error. Note, I chose `EmailClientNonRetryableException` rather than `SESClientNonRetryableException` as our code needs to catch this exception and it shouldn't be aware of what email client is being used, it just needs to know that it came from one of the email clients (if in time we have more than one). In time, we may wish to extend the approach of having generic `EmailClient` exceptions and `SMSClient` exceptions as this should be the most extendable pattern and a good abstraction.	2020-12-30 17:18:16 +00:00
Rebecca Law	a2bb775b6f	Merge pull request #3069 from alphagov/add-request-id-if-in-context Pass request_id onto the task if called from a task	2020-12-23 12:26:04 +00:00
Rebecca Law	a1b31a6c20	Check for app_context and request in g to prevent Attribute Errors. We can add a request_id for tasks that are not spawned by an HTTP request, for example scheduled or nightly tasks. That means you can match up all the tasks spawned by a single task, for example, create-night-billing spawns 4 tasks, those would all have the same ID. Not sure if that is helpful or not. Also it might be confusing to have a request_id for logs that were not started from a request so I have left it out.	2020-12-23 09:47:47 +00:00
Rebecca Law	025b51c801	If the request_id exists in the Flask global context, add it to the kwargs for the task. The request_id is set if the task is created from an HTTP request; if that task then calls through to another task this will set the request_id from the global context. We should then be able to follow the creation of a notification all the way from the original HTTP request to the sending task.	2020-12-22 15:21:32 +00:00
David McDonald	fae7e917b5	Fix logging line to include response context We have been getting log lines of the following: `API POST request on https://api.notifications.service.gov.uk/notifications/sms/mmg failed with None` It's not clear what error caused the request to fail because the value of `api_error.response` is always `None`. There appears to be something wrong with this logging. `raise_for_status` will raise an `HTTPError`, so then there should be no reason to then pass that error into another `HTTPError` (which is causing the response to be lost). We can instead simply catch the `HTTPError` and log its status code. This might not be perfect, but it's definitely an improvement and should give us some more context about why these requests occasionally fail.	2020-12-21 14:39:10 +00:00
Pea Tyczynska	95deb5a52f	Move DATETIME_FORMAT from app to app.utils To avoid cyclical import issues	2020-12-18 17:39:35 +00:00
Pea Tyczynska	45b806f6db	Remove unused args from cancel broadcast call in tasks	2020-12-14 11:31:05 +00:00
Pea M. Tyczynska	a70b7c521e	Merge pull request #3053 from alphagov/ibag-message-number Add sequential message number to broadcast provider messages	2020-12-09 13:02:25 +00:00
Pea Tyczynska	def7a16765	Establish relation between provider message and message number this is so we can access brodcast_provider_message_number from BroadcastProviderMessage object	2020-12-09 11:41:22 +00:00
Pea Tyczynska	8af4b27fd6	Separate functions for cbc clients Also move message_format to the clients.	2020-12-09 11:13:50 +00:00
Pea Tyczynska	553565bc91	Send message format to CBC Either cap or ibag	2020-12-08 11:15:26 +00:00
Leo Hemsted	9502f17d84	flake8 fixes a stricter flake8 bump. mostly things around f strings and format strings, but a couple of bad placeholder names in loops	2020-12-07 15:24:02 +00:00
Pea Tyczynska	2952b70930	Only create sequential numbers for Vodafone messages	2020-12-07 13:13:13 +00:00
Pea Tyczynska	e95dc9450e	Include message number in send_broadcast_provider_message	2020-12-07 13:13:12 +00:00
Pea Tyczynska	a186d2d296	Format sequential number into an 8 char long hex As per Vodafone spec for ibag format message number	2020-12-07 13:13:11 +00:00
Pea Tyczynska	b34bffaae6	Sends sequential number to Vodafone as link test	2020-12-07 13:13:11 +00:00
Leo Hemsted	fd335e3d8b	move available provider logic to the service model make sure it's in an accessible place so we don't end up duplicating our work	2020-12-03 22:50:50 +00:00
Leo Hemsted	72f8a15d4f	respect service broadcast provider restrictions when sending	2020-12-03 13:39:09 +00:00
Leo Hemsted	e2fa0116a0	add CBC_PROXY_ENABLED config flag to control if tasks are triggered previously we made some incorrect assumptions about set-up on staging and prod - they currently don't have any cbc_proxy aws creds at all. We shouldn't be attempting canaries or link tests when there's no AWS infrastructure to connect to. We also shouldn't bother writing a row into the database at all for the broadcast_provider_message since we're not even attempting to send, and we shouldn't get confused between messages that failed and messages we never wanted to send at all.	2020-11-26 10:16:22 +00:00
Leo Hemsted	54fecf2182	Merge pull request #3035 from alphagov/broadcast-event-response Send broadcast events per provider	2020-11-25 10:16:30 +00:00
David McDonald	43f1f48093	Add notification ID to SES bounce reason At the moment we log every time we get a bounce from SES, however we don't link it to a particular notification so it's hard to know for what sub reason a notification did not deliver by looking at the logs. This commit changes this by now looking up the bounce reason after we have found the notification ID and including them together. So if you now search for a notification ID in Kibana, you will see full logs for why it failed to deliver.	2020-11-20 14:10:13 +00:00
Leo Hemsted	087cc5053d	separate cbc proxy into separate clients this is a pretty big and convoluted refactor unfortunately. Previously: There was one global `cbc_proxy_client` object in apps. This class has the information about how to invoke the bt-ee lambda, and handles all calls to lambda. This includes calls to the canary too (which is a separate lambda). The future: There's one global `cbc_proxy_client`. This knows about the different provider functions and lambdas, and you'll need to ask this client for a proxy for your chosen provider. Call `cbc_proxy_client.get_proxy('ee')` and it'll return you a proxy that knows what ee's lambda function is, how to transform any content in a way that is exclusive to ee, and in future how to parse any response from ee. The present: I also cleaned up some duplicate tests. I'm really not sure about the names of some of these variables - in particular `cbc_proxy_client` isn't a client - it's more of a java style factory, where you call a function on it to get the client of your choice.	2020-11-19 15:50:37 +00:00
Leo Hemsted	0257774cfa	add get_earlier_provider_message fn to broadcast_event replacing get_earlier_provider_messages. The old function returned the previous references for earlier events for a broadcast_message. However, these depend on the message sent to a specific provider, so the function needs to change. It now takes in a provider, and only returns broadcast_provider_messages sent to that provider. If there are earlier broadcast_events without a provider_message for the chosen provider, it raises an exception - you cannot cancel a message if all the previous events have not been created properly (as we wouldn't know what references to cancel).	2020-11-19 15:50:37 +00:00
Leo Hemsted	f12c949ae9	create broadcast_provider_message and use id from that instead (instead of using the id from broadcast_event) we need every XML blob we send to have a different ID. if we're sending different XML blobs for each provider, then each one should have a different identifier. So, instead of taking the identifier from the broadcast_event, take it from the broadcast_provider_message instead. Note: We're still going to the broadcast_event for most fields, to ensure they stay consistent between different providers. The last thing we want is for different phone networks to get different content	2020-11-19 15:50:37 +00:00
Leo Hemsted	7cc83e04eb	move BroadcastProvider from models.py to config.py It's not something that is tied to a database table, and was causing circular import issues	2020-11-19 15:50:37 +00:00
Leo Hemsted	bc3512467b	send messages to multiple providers at the moment only EE is enabled (this is set in app.config, but also, only EE have a function defined for them so even if another provider was enabled without changing the dict in cbc_proxy.py we won't trigger anything). this commit just adds wrapper tasks that check what providers are enabled, and invokes the send function for each provider. The send function doesn't currently distinguish between providers for now - as we only have EE set up. in the future we'll want to separate the cbc_proxy_client into separate clients for separate providers. Different providers have different lambda functions, and have different requirements. For example, we know that the two different CBC software solutions handle references to previous messages differently.	2020-11-19 15:50:37 +00:00
David McDonald	224d9bf35a	Log when we don't retry a callback We don't retry any callbacks when they receive a 4xx status. We should probably be aware of this happening and at the moment there is nothing in our logs to easily identify whether the request failed and is being retried or if it failed and is not being retried. This will enable us to search our logs easily and figure out how much it's happening. It's quite likely that we should in the future allow callbacks to retry if they get a 429 http response (rate limiting) but we should do this in a smart way (exponential backoff) and so this is a first step to being aware of how big a problem it is in case we want to do something about it.	2020-11-17 11:26:32 +00:00
Rebecca Law	29b6f84f6c	Revert "Revert "Add a task to save-api-sms for high volume services.""	2020-10-29 11:12:46 +00:00
Toby Lorne	dd012d6831	client: cbc_proxy passes through sent/expires A BroadcastEvent knows when an event was sent and should expire We pass through these values directly to the CBC Proxy, because BroadcastEvent knows how they should be formatted Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-10-28 11:37:06 +00:00
Toby Lorne	dda71bf685	celery: add task for triggering link tests We want to periodically kick off some link tests, so that: - we are periodically communicating with the CBC Proxies - each CBC Proxy is periodically communicating with its CBC This simulation of traffic to the CBC will give us advance warning if something is going to break, or is broken, before someone tries to send a real live message Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>	2020-10-27 15:39:28 +00:00
Pea Tyczynska	5cff30aa47	Send a log message informing that we are sending canary to CBC proxy	2020-10-27 10:45:32 +00:00
Toby Lorne	be90455944	Add task to send canary to cbc proxy Create and schedule a Celery task that tests if we can send a canary message to cbc proxy. This will help us know if something happens to our connection to cbc proxy. Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>	2020-10-27 10:38:09 +00:00
Toby Lorne	d22adb7813	broadcasts: do not send description to cbc_proxy "areas" and "simple_polygons" in "transmitted_areas" do not have the same length as an example, choosing the area "england" results in a single item in "areas" but many polygons in "simple_polygons" therefore zipping these two together gives a list of areas: * of length 1 * containing only new grimsby which is not what we want as the CBC does not care about the areaDesc field within CAP, we should omit it from the function invocation and delegate the contents of areaDesc to the CBC Proxy implementation Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk> Co-authored-by: David <david.mcdonald@digital.cabinet-office.gov.uk>	2020-10-26 15:16:33 +00:00
Leo Hemsted	902a2dde94	Merge pull request #3014 from alphagov/get-query-in-50k-batches Optimise dao_get_letters_to_be_printed query	2020-10-26 13:23:17 +00:00
Toby Lorne	fdacb2e0d7	broadcasts: remove references to cbc stub We are phasing out our cbc-proxy stub which displayed CAP XML messages We are in the process of testing with real CBCs, so maintaining our own stub is not useful This commit * removes the HTTP POST requests to the CBC proxy * writes up the update/cancel methods of the cbc_client (not impl) Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-10-26 10:23:49 +00:00
Leo Hemsted	ed182c2a22	return just the columns we need for collating letters previously we were returning the entire ORM object. Returning columns has a couple of benefits: * Means we can join on to services there and then, avoiding second queries to get the crown status of the service later in the collate flow. * Massively reduces the amount of data we return - particularly free text fields like personalisation that could be potentially quite big. 5 columns rather than 26 columns. * Minor thing, but will skip some CPU cycles as sqlalchemy will no longer construct an ORM object and try and keep track of changes. We know this function doesn't change any of the values to persist them back, so this is an unnecessary step from sqlalchemy. Disadvantages are: * The dao_get_letters_to_be_printed return interface is now much more tightly coupled to the get_key_and_size_of_letters_to_be_sent_to_print function that calls it.	2020-10-23 20:01:18 +01:00
Toby Lorne	aa002afd31	clients: cbc_proxy actions accepts areas param related: https://github.com/alphagov/notifications-broadcasts-infra/pull/23 Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-10-23 17:09:00 +01:00
Leo Hemsted	4b61060d32	stream notifications when collating zip files we had issues where we had 150k 2nd class notifications, and the collate task never ran properly, presumably because the volume of data being returned was too big. to try and help with this, we can switch to streaming rather than using `.all` and building up lists of data. This should help, though the initial query may be a problem still.	2020-10-23 12:20:26 +01:00
David McDonald	3dcb97c45a	Remove 'INSOLVENCY' from zip files for insolvency letters This is at request of DVLA. They would prefer to have zip files with the same number of arguments in the name. After being offered a few different options, such as including an org and service id for all zips, they chose to just remove the 'INSOLVENCY' tag. For more context see PR that added the tag https://github.com/alphagov/notifications-api/pull/3006	2020-10-23 09:58:28 +01:00
David McDonald	890a66de46	Merge pull request #3003 from alphagov/staging-preview-cbc Non-production environments invoke CBC Proxy during broadcast event creation	2020-10-22 16:21:11 +01:00
Pea Tyczynska	deb360846d	Mark letters from insolvency service So that DVLA knows to process them separately to avoid issues.	2020-10-21 16:56:18 +01:00
Pea Tyczynska	d745ba310e	Divide letters by service when putting in ZIPs When letters are sent to DVLA, we will now put them in a separate ZIP file for each service, so that if there are printing issues due to bad files from one service, other services will hopefully not be affected by that.	2020-10-21 15:19:06 +01:00
Toby Lorne	2cc0a65851	celery: broadcast msg create invokes cbc proxy When we create a broadcast message, we should invoke the cbc proxy to send a cap message Either a function will be invoked within AWS, or a noop function call is made, depending on the environment We have only implemented CB message creation in the CBC Proxy, without polygons, therefore we: * only invoke the CBC Proxy during message creation * only send description, identifier, and hard-coded headline Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk> Co-authored-by: Katie <katie.smith@digital.cabinet-office.gov.uk>	2020-10-20 11:57:26 +01:00
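
Commit a186d2d296 above describes formatting the sequential message number as an 8-character hex string. A minimal sketch of that kind of formatting follows. The function name `format_message_number`, the lowercase digits, and the range check are assumptions for illustration and are not taken from the repository or from the Vodafone spec.

```python
def format_message_number(sequence_number: int) -> str:
    """Render a sequential number as a zero-padded, 8-character hex string.

    Hypothetical helper: the name, the lowercase output and the range
    check are assumptions, not the repository's actual implementation.
    """
    # Eight hex digits can only represent 0 .. 0xFFFFFFFF (32 bits).
    if not 0 <= sequence_number <= 0xFFFFFFFF:
        raise ValueError("sequence number does not fit in 8 hex characters")
    # "08x" means: pad with zeros to width 8, lowercase hexadecimal.
    return format(sequence_number, "08x")


if __name__ == "__main__":
    print(format_message_number(1))
    print(format_message_number(255))
```

Rejecting out-of-range values keeps the output at exactly eight characters; if the spec calls for uppercase digits, `"08X"` would be the equivalent format string.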


851 Commits