notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-23 08:51:30 -05:00

Author	SHA1	Message	Date
Pea Tyczynska	3037bf5fff	Set broadcast message to stubbed when posting broadcast via API	2021-02-09 10:41:36 +00:00
Pea Tyczynska	e0ddb5a39e	Merge pull request #3126 from alphagov/fix-cryptography-build-problem Pin cryptography to a version < 3.4	2021-02-08 17:25:44 +00:00
Pea Tyczynska	7cc8371c7f	Pin cryptography to a version < 3.4 One of our dependencies has a dependency on cryptography, which has recently released version 3.4. This version introduced a circular import error (pyca/cryptography#5756) which was fixed in 3.4.1. However, 3.4.1 has a different error where it fails because it cannot find a rust compiler. The suggested solutions are: Install a newer version of pip which will install a pre-compiled cryptography wheel OR Have rust installed and available on our PATH so that it can be used to build the package. Since we can't change the buildpack's pip version and we cannot install rust ourselves, the only option we're left with is to avoid upgrading to 3.4 - at least until PaaS updates their python buildpacks.	2021-02-08 17:05:46 +00:00
Pea Tyczynska	f8b4c9151c	Merge pull request #3122 from alphagov/add-billing-details-orgs Add billing details for organisation	2021-02-08 16:43:08 +00:00
Leo Hemsted	6b9a50beff	Merge pull request #3125 from alphagov/revert-retry Revert cell broadcast retry logic	2021-02-08 11:16:48 +00:00
Leo Hemsted	bee0059e53	Revert "Merge pull request #3101 from alphagov/retry-broadcasts" This reverts commit `1bd99c779d`, reversing changes made to `d390eb2cac`.	2021-02-08 11:02:34 +00:00
Leo Hemsted	49e6ec1ead	Revert "Merge pull request #3123 from alphagov/retry-loop-fix" This reverts commit `541a765811`, reversing changes made to `6a9ac654a6`.	2021-02-08 11:01:33 +00:00
Pea Tyczynska	bbc8cffb5b	No need to alter the columns in services, they are already the right type	2021-02-08 10:45:28 +00:00
Chris Hill-Scott	33f93dfea2	Merge pull request #3124 from alphagov/handle-xml-with-declaration Handle XML files that have a declaration	2021-02-08 10:29:11 +00:00
Chris Hill-Scott	dec16a98f6	Handle XML files that have a declaration `lxml` wants its input in bytes: > XML is explicitly defined as a stream of bytes. It's not Unicode text. > […] rule number one: do not decode your XML data yourself. – https://lxml.de/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings It will accept strings unless the document contains a declaration[1] with an `encoding` attribute. Then it will refuse to parse the document and raise a `ValueError`[2]. We can fix this by passing `lxml` the bytes from the request, rather than the decoded text. 1. > XML documents may begin with an XML declaration that describes some > information about themselves. An example is > `<?xml version="1.0" encoding="UTF-8"?>`. – https://en.wikipedia.org/wiki/XML#XML_declaration 2. See an example of this exception being raised in production here: https://kibana.logit.io/s/9423a789-282c-4113-908d-0be3b1bc9d1d/app/kibana#/doc/logstash-*/logstash-2021.02.05/syslog?id=AXdzfZVz5ZSa5DKpJiYd&_g=()	2021-02-08 08:51:14 +00:00
Leo Hemsted	541a765811	Merge pull request #3123 from alphagov/retry-loop-fix broadcast event retry loop fix	2021-02-05 14:56:45 +00:00
Pea Tyczynska	aa7bc3d9b4	Serialise org notes and billing details	2021-02-05 14:44:43 +00:00
Pea Tyczynska	02bc87c096	Add billing details and notes for organisation table	2021-02-05 14:44:42 +00:00
Leo Hemsted	d582e35471	dont try and send broadcast event if it's already in technical-failure this gives us an option to manually set a status in the database and avoid things being stuck in a retry loop forever	2021-02-05 12:52:37 +00:00
Leo Hemsted	0ddebc63a8	reduce broadcast retry delay to 4 mins and drop prefetch. ### The facts * Celery grabs up to 10 tasks from an SQS queue by default * Each broadcast task takes a couple of seconds to execute, or double that if it has to go to the failover proxy * Broadcast tasks delay retry exponentially, up to 300 seconds. * Tasks are acknowledged when celery starts executing them. * If a task is not acknowledged before its visibility timeout of 310 seconds, sqs assumes the celery app has died, and puts it back on the queue. ### The situation A task stuck in a retry loop was reaching its visibility timeout, and as such SQS was duplicating it. We're unsure of the exact cause of reaching its visibility timeout, but there were two contributing factors: The celery prefetch and the delay of 300 seconds. Essentially, celery grabs the task, keeps an eye on it locally while waiting for the delay ETA to come round, then gives the task to a worker to do. However, that worker might already have up to ten tasks that it's grabbed from SQS. This means the worker only has 10 seconds to get through all those tasks and start working on the delayed task, before SQS moves the task back into available. (Note that the delay of 300 seconds is translated into a timestamp based on the time you called self.retry and put the task back on the queue. Whereas the visibility timeout starts ticking from the time that a celery worker picked up the task.) ### The fix #### Set the max retry delay for broadcast tasks to 240 seconds Setting the max delay to 240 seconds means that instead of a 10 second buffer before the visibility timeout is tripped, we've got a 70 second buffer. #### Set the prefetch limit to 1 for broadcast workers This means that each worker will have up to 1 currently executing task, and 1 task pending execution. If it has these, it won't grab any more off the queue, so they can sit there without their visibility timeout ticking up. 
Setting a prefetch limit to 1 will result in more queries to SQS and a lower throughput. This might be relevant in, eg, sending emails. But the broadcast worker is not hyper-time critical. https://docs.celeryproject.org/en/3.1/getting-started/brokers/sqs.html?highlight=acknowledge#caveats https://docs.celeryproject.org/en/3.1/userguide/optimizing.html?highlight=prefetch#reserve-one-task-at-a-time	2021-02-05 12:49:51 +00:00
Leo Hemsted	6a9ac654a6	Merge pull request #3121 from alphagov/dont-update-finishes-at dont update finishes_at when cancelling broadcast	2021-02-04 14:37:49 +00:00
Leo Hemsted	eff0119f5c	dont update finishes at when cancelling broadcast otherwise we run into issues where we dont issue the cancel as we say "oh look the expiry time just passed, so we shouldnt send this message as it's already been removed from the cbc".	2021-02-04 14:25:38 +00:00
Leo Hemsted	1bd99c779d	Merge pull request #3101 from alphagov/retry-broadcasts retry sending broadcasts	2021-02-04 12:01:06 +00:00
Leo Hemsted	1ef3f96bd7	test sending broadcast message for all statuses of existing provider_msg also clean up some comments	2021-02-04 11:50:06 +00:00
Leo Hemsted	e9f9fe8101	be stricter on broadcast message area validation even if there is a json struct, make sure it actually contains polygons, or we'll send to the entire country.	2021-02-03 18:11:54 +00:00
Leo Hemsted	bbae209200	check provider message status etc when sending rather than when retrying previously if we were deciding whether to retry or not, it meant that future events wouldn't have context of what the task is doing. We'd run into issues with not knowing what references to include when updating/cancelling in future events. Instead of deciding whether to retry or not, always retry. Instead, when any event sends, regardless of whether it is a first time or a retry, check the status of previous events for that broadcast message. There are a few things that will mean we don't send. * If the finishes_at time has already elapsed (ie: we have been trying to resend this message and haven't had any luck and now the data is obsolete) * A previous event has no provider message (this means that we never picked the previous event off the queue for some reason) * A previous event has a provider message that has anything other than an ack response. This includes sending (the old message is currently being sent), and technical-failure/returned-error (the old message is currently in the retry loop, having experienced issues).	2021-02-03 18:11:52 +00:00
Leo Hemsted	96a0935d1c	update broadcast provider message status on success/error so we can distinguish erroring messages that are currently retrying from those that sent successfully.	2021-02-03 18:03:16 +00:00
Leo Hemsted	3dcbfc3612	re-use existing provider message if task retries previously it would crash with a unique constraint error. now, grab the previous message.	2021-02-03 18:01:54 +00:00
Leo Hemsted	ac34fb9c05	retry sending broadcasts Retry tasks if they fail to send a broadcast event. Note that each task tries the regular proxy and the failover proxy for that provider. This runs a bit differently than our other retries: Retry with exponential backoff. Our other tasks retry with a fixed delay of 5 minutes between tries. If we can't send a broadcast, we want to try immediately. So instead, implement an exponential backoff (1, 2, 4, 8, ... seconds delay). We can't delay for longer than 310 seconds due to visibility timeout settings in SQS, so cap the delay at that amount. Normally we give up retrying after a set amount of retries (often 4 hours). As broadcast content is much more important than normal notifications, we don't ever want to give up on sending them to phones... ...UNLESS WE DO! Sometimes we do want to give up sending a broadcast though! Broadcasts have an expiry time, when they stop showing up on people's devices, so if that has passed then we don't need to send the broadcast out. Broadcast events can also be superseded by updates or cancels. Check that the event is the most recent event for that broadcast message, if not, give up, as we don't want to accidentally send out two conflicting events for the same message.	2021-02-03 16:43:01 +00:00
David McDonald	d390eb2cac	Merge pull request #3112 from alphagov/channel-restriction Set broadcast channel as a service setting	2021-02-03 11:46:04 +00:00
David McDonald	f441d5b4ce	Add comment about service channels for updating	2021-02-03 11:37:02 +00:00
David McDonald	41b54b81ee	Merge pull request #3116 from alphagov/less-emails Downgrade exceptions to warnings to reduce emails	2021-02-02 15:21:34 +00:00
David McDonald	070b79c27e	Downgrade exceptions to warnings to reduce emails We already trigger a zendesk ticket for these two cases, meaning that whenever we get this situation, we get 3 emails. One for the zendesk ticket, one from logit raising the fact an exception was raised and one from cloudwatch raising the fact an exception was raised. We don't need all these emails, a zendesk ticket is sufficient. Downgrading to a warning means this event will still be findable in our logs however.	2021-02-02 15:10:26 +00:00
David McDonald	f90b479c8d	Use service setting to pick broadcast channel This falls back to the "test" channel if they do not have a ServiceBroadcastSetting for the moment, but we intend in future PRs to enforce that all broadcast services will have this property.	2021-02-01 14:10:41 +00:00
David McDonald	b2ed9efe85	Extend test cases for every MNO Seems to be no harm in extending these for every mobile network just to give us slightly better coverage	2021-02-01 14:10:40 +00:00
David McDonald	a46b8c3bba	Remove redundant comment	2021-02-01 14:10:39 +00:00
David McDonald	2aad3163e6	Allow CBC proxy client to take channel This moves the hardcoding to test channels one step up to where we call `create_and_send_broadcast` We can then after this, start to vary whether we give it the 'test' or 'severe' channel based on the service's channel setting.	2021-02-01 14:10:38 +00:00
David McDonald	91f5be835a	Add DB table for service broadcast settings This will allow us to store details of which channel a service should be sending to. See the comment about how all broadcast services can have a row in the table but may not at the moment. This has been done for speed as it's the quickest way to let us set up different services to send to different channels for some needed testing with the mobile handset providers in the coming week.	2021-02-01 14:10:37 +00:00
David McDonald	a3d966056a	Merge pull request #3110 from alphagov/test-channel Set the default broadcast channel to test	2021-02-01 10:18:35 +00:00
Katie Smith	a31cdb44a6	Merge pull request #3113 from alphagov/bump-utils-43.8.1 Bump utils to 43.8.1	2021-02-01 09:06:39 +00:00
Pea Tyczynska	df4ba22912	Merge pull request #3114 from alphagov/preview-wont-stub Don't stub broadcasts on preview	2021-01-29 16:00:11 +00:00
Katie Smith	fc9ecaba1d	Bump utils to 43.8.1 This brings in the change to stop TV numbers from being treated as international numbers.	2021-01-29 15:53:35 +00:00
Pea Tyczynska	552e543bc2	Don't stub broadcasts on preview So that MNOs can use training mode accounts to test end-to-end broadcast sending. This will enable them to approve their own broadcasts.	2021-01-29 15:49:50 +00:00
David McDonald	86ea89cf76	Merge pull request #3098 from alphagov/downgrade-to-warning Downgrade SMS provider request exceptions to warnings	2021-01-29 11:52:10 +00:00
Katie Smith	7b60aeb14d	Merge pull request #3109 from alphagov/letter-rates Add February 2021 letter rates	2021-01-29 10:28:16 +00:00
Katie Smith	d7387869c4	Add February 2021 letter rates All rates are changing, so we add an end date for the current rates and insert new rates for every post_class, sheet count and crown status.	2021-01-28 14:35:21 +00:00
Chris Hill-Scott	837e464081	Merge pull request #3060 from alphagov/add-broadcast-api-endpoint Add public API endpoint to create broadcast messages	2021-01-28 12:59:41 +00:00
Pea Tyczynska	51c0ece130	Merge pull request #3108 from alphagov/stub-training-broadcasts Stub training broadcasts	2021-01-28 11:58:47 +00:00
Katie Smith	4ed79dae48	Merge pull request #3105 from alphagov/rename-bt-ee Rename bt-ee-proxy to ee-proxy	2021-01-27 17:05:26 +00:00
David McDonald	f9b1d3d573	Set the default broadcast channel to test This used to be hardcoded in the CBC proxy but now we will hardcode it in the cbc_proxy_client. In a future PR we can start choosing which channel a broadcast will go to based on the channel configured for that broadcast service.	2021-01-27 15:27:11 +00:00
Pea Tyczynska	d4cc250510	Don't create broadcast provider messages for stubbed broadcasts	2021-01-27 10:20:44 +00:00
Pea Tyczynska	26d6b4a958	Mark broadcast message as stubbed when sent from training account	2021-01-27 10:20:43 +00:00
Pea Tyczynska	a93a35de8d	Add 'stubbed' column to broadcast_message table This is a boolean column. It will be set to True for broadcasts created from training broadcast accounts. This will help us debug, for example by excluding all the stubbed broadcasts when we have some trouble with real broadcasts.	2021-01-27 10:20:43 +00:00
Chris Hill-Scott	ca6c46c4dd	Add logging for successful broadcast message creation	2021-01-26 16:24:45 +00:00
Chris Hill-Scott	0398ac57f1	Use correct HTTP status code for bad content type 415 is the status code for ‘Unsupported media type’ https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/415	2021-01-26 16:24:45 +00:00
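
The retry timing described in commits ac34fb9c05 and 0ddebc63a8 (later reverted in bee0059e53) can be illustrated with a small sketch. It is not the repository's code: the function names `retry_delay` and `buffer_before_redelivery` are hypothetical, and only the figures (1, 2, 4, 8, ... second backoff, the 300 and 240 second caps, the 310 second visibility timeout) come from the commit messages.

```python
# Figure taken from the commit messages: SQS puts an unacknowledged task
# back on the queue after this many seconds.
VISIBILITY_TIMEOUT_SECONDS = 310


def retry_delay(retries: int, max_delay: int) -> int:
    """Exponential backoff (1, 2, 4, 8, ... seconds), capped at max_delay.

    Hypothetical helper; `retries` is the number of attempts made so far.
    """
    return min(2 ** retries, max_delay)


def buffer_before_redelivery(max_delay: int) -> int:
    """Seconds left between the longest retry delay and the visibility timeout."""
    return VISIBILITY_TIMEOUT_SECONDS - max_delay


# Delays for the first ten retries with the 240 second cap.
delays_240 = [retry_delay(n, 240) for n in range(10)]

# Buffer with the original 300 second cap versus the reduced 240 second cap.
buffer_300 = buffer_before_redelivery(300)
buffer_240 = buffer_before_redelivery(240)

print(delays_240)
print(buffer_300, buffer_240)
```

With the 300 second cap the worker has 10 seconds to start the delayed task before SQS makes it visible again; lowering the cap to 240 seconds widens that margin to 70 seconds, which is the arithmetic the commit message relies on.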


7896 Commits