notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2026-05-19 16:20:55 -04:00

Author	SHA1	Message	Date
Rebecca Law	933bad857a	Merge pull request #3146 from alphagov/use-number-for-international-text Send text messages from a number for international	2021-02-17 13:31:42 +00:00
Rebecca Law	77b76ea0a4	Rename variable, it's a better name now.	2021-02-17 13:15:29 +00:00
Rebecca Law	e77534fb17	Send text messages that are to an international number from a number rather than "Notify" Update `send_user_2fa_code` to send from number when recipient is international Update `update_user_attribute` to send from number when recipient is international	2021-02-17 12:14:47 +00:00
Chris Hill-Scott	8e8601338e	Merge pull request #3136 from alphagov/validate-template-length-broadcast-api Validate content length on broadcast API	2021-02-17 11:34:29 +00:00
David McDonald	abb3b3307c	Fix flake8	2021-02-16 10:31:12 +00:00
David McDonald	6fcda6debb	Make set_as_broadcast_service use a single DB commit We don't want things in a half state if there is an error during the method. Therefore, we move it all into a single function that is wrapped in a transaction. Note, we copy the approach of https://github.com/alphagov/notifications-api/blob/master/app/dao/services_dao.py#L293 by having a single new dao function that does all the DB work.	2021-02-16 10:31:11 +00:00
David McDonald	f9c87bafa3	Add `go_live_at` timestamp to set_as_broadcast_service Note, I haven't added anything for the `go_live_user` because it doesn't quite make sense because here a user isn't requesting to go live. So there should be no reason to record this. We will in time though want to add audit events to capture every change to the service broadcast settings, that will actually capture who has done what.	2021-02-16 10:31:10 +00:00
David McDonald	42163813fe	Hardcode service broadcast channel that API shows We are in a weird situation where at the moment, we have services with the broadcast permission that do not have a row in the service_broadcast_settings table and therefore do not have defined whether they should send messages on the 'test' or 'severe' channel. We currently get around this when we send broadcast messages out as such: https://github.com/alphagov/notifications-api/blob/master/app/celery/broadcast_message_tasks.py#L51 We need to do something equivalent for the broadcast channel that the API says the service is on. In time, when we have added a row in the service_broadcast_settings table for every service with the broadcast permission then we can remove both of these two hardcodings. Note, one option would have been to move the default of `test` on to the `Service` model rather than having it in both the broadcast_message_tasks file and the `ServiceSchema` class. However, I went for the quickest thing which was to add it here.	2021-02-16 10:31:09 +00:00
David McDonald	d846ed79d2	Improve tests and remove unneeded code Some of the fixtures weren't needed so have been removed. I've also moved from using `client.post` to using `admin_request.post` which saves a bit of code too. Also one small assertion tidied up to make it a bit stronger regarding permissions.	2021-02-16 10:31:09 +00:00
David McDonald	4f7afa3fbe	Set provider restriction	2021-02-16 10:31:08 +00:00
David McDonald	cb70b81ea4	make service live or training	2021-02-16 10:31:07 +00:00
David McDonald	9f4b82f074	Make service a member of the broadcast organisation We will use this to easily identify all our broadcast services. There could be other ways to deal with finding and seeing all broadcast services but this is a good and easy way to start.	2021-02-16 10:31:06 +00:00
David McDonald	cdcbd1e238	Set count as live to false for broadcast services We think it would be a security risk to show the name of services involved in emergency alerts as they may be responsible for things such as counter terrorism. On top of that, showing broadcast services in the list of all services could enable someone to use that information to try and trick an admin into giving them access to a particular service given the fact they know the name of it	2021-02-16 10:31:05 +00:00
David McDonald	54b9d20f73	Give broadcast permission to broadcast services	2021-02-16 10:31:04 +00:00
David McDonald	3f16549f64	Use `sample_broadcast_service` for update test We can use the `sample_broadcast_service` as this gives us a broadcast service with service broadcast settings already for us to update rather than needing to create our own settings db row	2021-02-16 10:31:03 +00:00
David McDonald	16ee040923	Add service broadcast settings to `sample_broadcast_service` This will make our `sample_broadcast_service` look more realistic. The one downside to this is that there will be a short amount of time where we have broadcast services that do not have a row in the service_broadcast_settings table until we have backfilled this data. Our unit tests therefore won't have coverage for this case but I think the risk is small and acceptable for the moment as this will no longer be the case in, say, a week's time.	2021-02-16 10:31:02 +00:00
David McDonald	3b5d86c854	Add endpoint to set broadcast service channel	2021-02-16 10:31:01 +00:00
David McDonald	5d62647b9d	Add broadcast channel to service schema This will show which channel is configured, if any, for a service. It mimics what we are doing for the `allowed_broadcast_provider`.	2021-02-16 10:31:00 +00:00
Rebecca Law	dda7f0d47f	Revert "Improve sender task"	2021-02-16 10:19:53 +00:00
Chris Hill-Scott	0bb671df45	Validate content length on broadcast API The maximum content count of a broadcast varies depending on its encoding, so we can’t simply validate it against a schema. This commit moves to using the validation from `notifications-utils`, and raising a custom error response.	2021-02-16 09:30:40 +00:00
Katie Smith	8e91eccc94	Merge pull request #3140 from alphagov/fix-flake8 Fix flake8	2021-02-16 09:28:23 +00:00
Rebecca Law	681ad6db56	Merge pull request #3134 from alphagov/improve-sender-task Improve sender task	2021-02-16 09:26:04 +00:00
Katie Smith	6b8ebb3421	Fix linting errors	2021-02-16 09:03:38 +00:00
Rebecca Law	d67f9fcfd6	Use from_id_and_service_id method from SerialisedTemplate. Minor updates as per requested from review	2021-02-15 12:41:50 +00:00
Leo Hemsted	fed0d4c40e	Merge pull request #3137 from alphagov/revert-revert-revert Bring back retry logic	2021-02-15 12:21:13 +00:00
Chris Hill-Scott	e8a79f5413	Don’t accept cancel or update via broadcast API We don’t support these methods at the moment. Instead we were just ignoring the `msgType` field, so issuing one of these commands would cause a new alert to be broadcast 🙃 We might want to support `Cancel` in the future, but for now let’s reject anything that isn’t `Alert` (CAP terminology for the initial broadcast).	2021-02-15 09:32:33 +00:00
Leo Hemsted	62cf9f60a9	catch boto exceptions these will happen if, for example, you have issues connecting to AWS or permission issues. Still failover if we get one of these exceptions, as I think it might be possible to have a problem only related to one of the lambdas.	2021-02-12 19:48:32 +00:00
David McDonald	a1e539e785	Merge pull request #3132 from alphagov/created-letters-runbook Improvements to our letter checking tasks	2021-02-12 16:30:42 +00:00
Rebecca Law	61af203ad6	Use cache for service in send_to_provider methods. This will remove a call to the db if the service exists in the cache.	2021-02-11 16:45:52 +00:00
Rebecca Law	200f8aad81	Use the cached template object. By adding SerialisedTemplate we can avoid a database call for the template. This is useful when sending many many emails/sms for the same template/version.	2021-02-11 16:45:52 +00:00
Rebecca Law	2270832873	Remove validate_and_format for email and phone numbers, using normalised_to instead because that function has already happened when persisting the notification. Remove 2 extra select queries after the update and commit. Once a transaction is committed SQLAlchemy will query for the db model if referenced after a commit.	2021-02-11 16:45:46 +00:00
David McDonald	5f58f6c698	Merge pull request #3128 from alphagov/move-provider-restriction Move provider restriction into broadcast service settings table	2021-02-10 17:32:11 +00:00
Katie Smith	4be3af5410	Merge pull request #3119 from alphagov/new-callbacks-q Put service callback retries on a different queue	2021-02-10 15:56:46 +00:00
David McDonald	5526c89c34	Rename task and function for clarity This doesn't just relate to precompiled letters, it's actually just checking that there are not any letters still waiting for a virus check that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:53 +00:00
David McDonald	1b9d8252ec	Rename task and function for clarity This doesn't just relate to templated letters, it's actually just checking that there are not any letters still in created that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:52 +00:00
David McDonald	3c0e609cc9	Add link to runbook for created letter alert We've got the entry in the runbook, this will make it clear to go and look at it.	2021-02-10 15:23:51 +00:00
Rebecca Law	87cf3afdc9	Update notifications-utils version. Postal address validation now includes `< >` in the characters that are invalid at the start of an address line.	2021-02-10 10:26:00 +00:00
Leo Hemsted	4f89be6944	Revert "Merge pull request #3125 from alphagov/revert-retry" This reverts commit `6b9a50beff`, reversing changes made to `33f93dfea2`.	2021-02-09 17:01:04 +00:00
David McDonald	b2213dad19	Move provider restriction into broadcast settings This means we will have a much easier way of knowing what the settings are for a broadcast service. Note, we can just move data directly into the newer table as there is nothing on the API or admin app that is putting data in the `service_broadcast_provider_restriction` table, this was being done manually for the few services that needed it.	2021-02-09 15:40:32 +00:00
Katie Smith	5eebcf6452	Put service callback retries on a different queue At the moment, if a service callback fails, it will get put on the retry queue. This causes a potential problem though: If a service's callback server goes down, we may generate a lot of retries and this may then put a lot of items on the retry queue. The retry queue is also responsible for other important parts of Notify such as retrying message delivery and we don't want a service's callback server going down to have an impact on the rest of Notify. Putting the retries on a different queue means that tasks get processed faster than if they were put back on the same 'service-callbacks' queue.	2021-02-09 13:31:16 +00:00
Pea Tyczynska	3037bf5fff	Set broadcast message to stubbed when posting broadcast via API	2021-02-09 10:41:36 +00:00
Pea Tyczynska	f8b4c9151c	Merge pull request #3122 from alphagov/add-billing-details-orgs Add billing details for organisation	2021-02-08 16:43:08 +00:00
Leo Hemsted	bee0059e53	Revert "Merge pull request #3101 from alphagov/retry-broadcasts" This reverts commit `1bd99c779d`, reversing changes made to `d390eb2cac`.	2021-02-08 11:02:34 +00:00
Leo Hemsted	49e6ec1ead	Revert "Merge pull request #3123 from alphagov/retry-loop-fix" This reverts commit `541a765811`, reversing changes made to `6a9ac654a6`.	2021-02-08 11:01:33 +00:00
Chris Hill-Scott	dec16a98f6	Handle XML files that have a declaration `lxml` wants its input in bytes: > XML is explicitly defined as a stream of bytes. It's not Unicode text. > […] rule number one: do not decode your XML data yourself. – https://lxml.de/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings It will accept strings, unless the document contains a declaration[1] with an `encoding` attribute. Then it will refuse to parse the document and raises a `ValueError`[2]. We can fix this by passing `lxml` the bytes from the request, rather than the decoded text. 1. > XML documents may begin with an XML declaration that describes some > information about themselves. An example is > `<?xml version="1.0" encoding="UTF-8"?>`. – https://en.wikipedia.org/wiki/XML#XML_declaration 2. See an example of this exception being raised in production here: https://kibana.logit.io/s/9423a789-282c-4113-908d-0be3b1bc9d1d/app/kibana#/doc/logstash-*/logstash-2021.02.05/syslog?id=AXdzfZVz5ZSa5DKpJiYd&_g=()	2021-02-08 08:51:14 +00:00
Pea Tyczynska	aa7bc3d9b4	Serialise org notes and billing details	2021-02-05 14:44:43 +00:00
Leo Hemsted	d582e35471	dont try and send broadcast event if it's already in technical-failure this gives us an option to manually set a status in the database and avoid things being stuck in a retry loop forever	2021-02-05 12:52:37 +00:00
Leo Hemsted	0ddebc63a8	reduce broadcast retry delay to 4 mins and drop prefetch. ### The facts * Celery grabs up to 10 tasks from an SQS queue by default * Each broadcast task takes a couple of seconds to execute, or double that if it has to go to the failover proxy * Broadcast tasks delay retry exponentially, up to 300 seconds. * Tasks are acknowledged when celery starts executing them. * If a task is not acknowledged before its visibility timeout of 310 seconds, sqs assumes the celery app has died, and puts it back on the queue. ### The situation A task stuck in a retry loop was reaching its visibility timeout, and as such SQS was duplicating it. We're unsure of the exact cause of reaching its visibility timeout, but there were two contributing factors: The celery prefetch and the delay of 300 seconds. Essentially, celery grabs the task, keeps an eye on it locally while waiting for the delay ETA to come round, then gives the task to a worker to do. However, that worker might already have up to ten tasks that it's grabbed from SQS. This means the worker only has 10 seconds to get through all those tasks and start working on the delayed task, before SQS moves the task back into available. (Note that the delay of 300 seconds is translated into a timestamp based on the time you called self.retry and put the task back on the queue. Whereas the visibility timeout starts ticking from the time that a celery worker picked up the task.) ### The fix #### Set the max retry delay for broadcast tasks to 240 seconds Setting the max delay to 240 seconds means that instead of a 10 second buffer before the visibility timeout is tripped, we've got a 70 second buffer. #### Set the prefetch limit to 1 for broadcast workers This means that each worker will have up to 1 currently executing task, and 1 task pending execution. If it has these, it won't grab any more off the queue, so they can sit there without their visibility timeout ticking up. 
Setting a prefetch limit to 1 will result in more queries to SQS and a lower throughput. This might be relevant in, eg, sending emails. But the broadcast worker is not hyper-time critical. https://docs.celeryproject.org/en/3.1/getting-started/brokers/sqs.html?highlight=acknowledge#caveats https://docs.celeryproject.org/en/3.1/userguide/optimizing.html?highlight=prefetch#reserve-one-task-at-a-time	2021-02-05 12:49:51 +00:00
Leo Hemsted	eff0119f5c	dont update finishes at when cancelling broadcast otherwise we run into issues where we dont issue the cancel as we say "oh look the expiry time just passed, so we shouldnt send this message as it's already been removed from the cbc".	2021-02-04 14:25:38 +00:00
Leo Hemsted	1ef3f96bd7	test sending broadcast message for all statuses of existing provider_msg also clean up some comments	2021-02-04 11:50:06 +00:00
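
The timing argument in commit `0ddebc63a8` above can be checked with a little arithmetic. The sketch below is not code from the repository: the function names and the doubling backoff schedule are assumptions made for illustration. Only the 310-second visibility timeout and the 300- and 240-second retry caps come from the commit message.

```python
# Figures quoted in the commit message.
VISIBILITY_TIMEOUT = 310  # seconds before SQS makes an unacknowledged task visible again
OLD_MAX_DELAY = 300       # previous cap on the broadcast retry delay
NEW_MAX_DELAY = 240       # cap introduced by the commit


def retry_delay(retries, max_delay):
    """Exponential backoff capped at max_delay.

    The doubling schedule (2 ** retries) is an assumption; the commit only
    says the delay grows exponentially up to a maximum.
    """
    return min(2 ** retries, max_delay)


def buffer_before_redelivery(max_delay, visibility_timeout=VISIBILITY_TIMEOUT):
    """Seconds a worker has to start a maximally delayed task before SQS
    puts it back on the queue."""
    return visibility_timeout - max_delay


old_buffer = buffer_before_redelivery(OLD_MAX_DELAY)
new_buffer = buffer_before_redelivery(NEW_MAX_DELAY)
print(f"old buffer: {old_buffer}s, new buffer: {new_buffer}s")
```

The buffer is approximate. As the commit notes, the retry delay is measured from the call to `self.retry`, whereas the visibility timeout starts when a worker picks the task up, so the real margin depends on how far apart those two moments are.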

3817 Commits