notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-12 08:12:27 -05:00

Author	SHA1	Message	Date
David McDonald	1b9d8252ec	Rename task and function for clarity This doesn't just relate to templated letters, it's actually just checking that there are not any letters still in created that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:52 +00:00
David McDonald	3c0e609cc9	Add link to runbook for created letter alert We've got the entry in the runbook, this will make it clear to go and look at it.	2021-02-10 15:23:51 +00:00
David McDonald	070b79c27e	Downgrade exceptions to warnings to reduce emails We already trigger a zendesk ticket for these two cases, meaning that whenever we get this situation, we get 3 emails. One for the zendesk ticket, one from logit raising the fact an exception was raised and one from cloudwatch raising the fact an exception was raised. We don't need all these emails, a zendesk ticket is sufficient. Downgrading to a warning means this event will still be findable in our logs however.	2021-02-02 15:10:26 +00:00
David McDonald	20627d96ea	Put all broadcast tasks on the broadcast worker	2021-01-13 17:21:40 +00:00
Leo Hemsted	e2fa0116a0	add CBC_PROXY_ENABLED config flag to control if tasks are triggered previously we made some incorrect assumptions about set-up on staging and prod - they currently don't have any cbc_proxy aws creds at all. We shouldn't be attempting canaries or link tests when there's no AWS infrastructure to connect to. We also shouldn't bother writing a row into the database at all for the broadcast_provider_message since we're not even attempting to send, and we shouldn't get confused between messages that failed and messages we never wanted to send at all.	2020-11-26 10:16:22 +00:00
Leo Hemsted	087cc5053d	separate cbc proxy into separate clients this is a pretty big and convoluted refactor unfortunately. Previously: There was one global `cbc_proxy_client` object in apps. This class has the information about how to invoke the bt-ee lambda, and handles all calls to lambda. This includes calls to the canary too (which is a separate lambda). The future: There's one global `cbc_proxy_client`. This knows about the different provider functions and lambdas, and you'll need to ask this client for a proxy for your chosen provider. Call `cbc_proxy_client.get_proxy('ee')` and it'll return you a proxy that knows what ee's lambda function is, how to transform any content in a way that is exclusive to ee, and in future how to parse any response from ee. The present: I also cleaned up some duplicate tests. I'm really not sure about the names of some of these variables - in particular `cbc_proxy_client` isn't a client - it's more of a Java-style factory, where you call a function on it to get the client of your choice.	2020-11-19 15:50:37 +00:00
Leo Hemsted	bc3512467b	send messages to multiple providers at the moment only EE is enabled (this is set in app.config, but also, only EE have a function defined for them so even if another provider was enabled without changing the dict in cbc_proxy.py we won't trigger anything). this commit just adds wrapper tasks that check what providers are enabled, and invokes the send function for each provider. The send function doesn't currently distinguish between providers for now - as we only have EE set up. in the future we'll want to separate the cbc_proxy_client into separate clients for separate providers. Different providers have different lambda functions, and have different requirements. For example, we know that the two different CBC software solutions handle references to previous messages differently.	2020-11-19 15:50:37 +00:00
Toby Lorne	dda71bf685	celery: add task for triggering link tests We want to periodically kick off some link tests, so that: - we are periodically communicating with the CBC Proxies - each CBC Proxy is periodically communicating with its CBC This simulation of traffic to the CBC will give us advance warning if something is going to break, or is broken, before someone tries to send a real live message Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>	2020-10-27 15:39:28 +00:00
Pea Tyczynska	5cff30aa47	Send a log message informing that we are sending canary to CBC proxy	2020-10-27 10:45:32 +00:00
Toby Lorne	be90455944	Add task to send canary to cbc proxy Create and schedule a Celery task that tests if we can send a canary message to cbc proxy. This will help us know if something happens to our connection to cbc proxy. Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>	2020-10-27 10:38:09 +00:00
Chris Hill-Scott	f2314333b5	Merge pull request #2982 from alphagov/improve-efficiency-of-process-missing-rows Improve efficiency of process missing rows task	2020-10-02 11:23:35 +01:00
Chris Hill-Scott	1d50bfaafc	Remove unused column from query	2020-09-26 12:11:15 +01:00
Chris Hill-Scott	9ad33045dd	Improve efficiency of process missing rows task For every missing row this was: - downloading the CSV file from S3 - looping through every row in it until it found the one matching the index of the missing row `RecipientCSV` implements `__getitem__`[1] (which maybe it didn’t before) so we can create it once, then index the relevant row directly. *** 1. `5ae0572d41/notifications_utils/recipients.py (L78-L79)`	2020-09-26 12:06:48 +01:00
Rebecca Law	d7e53cdf50	Adding reference to message for "precompiled letters have been pending-virus-check for over 90 minutes" This saves having to go to the db to get it.	2020-09-24 14:16:16 +01:00
Rebecca Law	dd126df122	We have a scheduled task to check that all the jobs have completed. This will catch the case where an app is shut down and the job is not complete yet; we only wait 10 seconds before forcing the app to shut down. The task was raising a JobIncompleteError, yet it's not an error: the task is performing its job correctly and calling the appropriate task to restart the job. Also used apply_async to create the task instead of send_task.	2020-07-22 17:00:20 +01:00
Pea Tyczynska	5d6f2da155	Rename task from create_letters_pdf to get_pdf_for_templated_letter In a separate PR we will have to delete the vestigial create_letters_pdf task that now only redirects to get_pdf_for_templated_letter.	2020-05-11 13:33:05 +01:00
Leo Hemsted	d59b2f4cc9	make create_letters_pdf always use right queue create-letters-pdf-tasks is for contacting template preview. letters-tasks is for doing other things around letters (including putting things on template preview queue)	2020-04-22 16:07:51 +01:00
Katie Smith	bfd40a843c	Delete send-scheduled-notifications task This was added a few years ago, but never used.	2020-04-02 14:45:18 +01:00
David McDonald	87cb02fa1d	Add runbook link to resolve letters pending virus scan	2020-03-30 12:07:05 +01:00
Rebecca Law	51f43563d3	Fix serialisation error when creating the `create_letters_pdf` in `resend_created_notifications_older_than`	2020-03-17 08:48:02 +00:00
David McDonald	3a0aece6a1	Up threshold for sms to telephone numbers We were just ignoring the errors and our users were not fixing things. Given that 500 texts cost approx £8 it's not the end of the world. In the long run we may decide to just stop letting people try and send messages to TV numbers but this is a quick fix to stop emails coming in which we ignore.	2020-01-17 13:26:20 +00:00
David McDonald	61f1469a20	Improve developer experience for zendesk ticket We don't need to log this as an exception. It's not an exception, it's behaviour that is not ideal but is still expected so therefore I've changed it to warn. Also it removes the email we get for the exception which is not needed as we get the zendesk ticket instead. I've also fixed the multiline string meaning the link to the runbook is included in the zendesk ticket.	2019-12-17 11:47:23 +00:00
Leo Hemsted	31d1abd6d1	add task to move sms providers back towards shared load we generally aim to share the load between the two providers equally (more or less). When one provider has struggled, we deprioritise them, this commit adds a function that gradually restores balance. It checks every five minutes, if it's been more than an hour since the providers were last changed then it adjusts them towards a 50/50 split. Except it's not quite 50/50 due to #reasons (we want to slightly favour MMG), it's actually 60/40. That's defined in a new dict in config.py.	2019-12-13 10:02:39 +00:00
Pea Tyczynska	c00f82b81b	Co-Authored-By: Chris Hill-Scott <me@quis.cc> Use .format instead of concatenation to avoid type issues Trying to concatenate uuid onto a string was throwing an error. Also it is not possible to use uuid in parametrize statements it seems as it messes up with running tests on multiple threads	2019-12-11 11:18:42 +00:00
Pea Tyczynska	87bc86efa7	Reference dev runbook for instructions in the zendesk ticket	2019-12-06 17:05:43 +00:00
Pea Tyczynska	1b7b26bf24	Query directly for services with high failure rate	2019-12-06 16:57:56 +00:00
Pea Tyczynska	b8de67ae54	Update error message to include a url to offending service	2019-12-06 16:57:54 +00:00
Pea Tyczynska	cfbb080f57	Simplify failure rate by building separate query	2019-12-06 16:57:44 +00:00
Pea Tyczynska	53efd87e28	Check for services sending sms messages to tv numbers	2019-12-06 16:57:34 +00:00
Pea Tyczynska	d72ab4f4a6	Send zendesk ticket when services found with high failure rates	2019-12-06 16:57:04 +00:00
Leo Hemsted	f7fbd6de5b	make 500s change priorities quicker it's not acceptable for a constantly failing provider to take 50 minutes to drain (5x reducing priority by 10). But similarly, we need _some_ delay, or a handful of concurrent failures will completely turn off a provider, rendering the whole exercise kinda pointless. Setting the delay before it tries to reduce priority again to one minute is nice because it means that if one request times out and returns 502, then any other requests that are in flight at that time will time out before the one minute is up and not switch, but any requests made after the switch that take sixty seconds to time out will affect it.	2019-11-28 13:29:39 +00:00
Leo Hemsted	cfe82f8f4a	make 500 error provider switches also check for recent changes moving the logic and the test from switch provider on slow delivery to dao reduce sms provider priority	2019-11-28 13:29:39 +00:00
Leo Hemsted	2a392e7137	update switch provider scheduled task it now looks at both providers and works out whether to deprioritise one, rather than binary switching from one to the other. If anything has altered the priorities in the last ten minutes it won't take any action. If both providers are slow it also won't take any action.	2019-11-28 13:29:38 +00:00
Leo Hemsted	fa7e0a1e84	add dao_reduce_sms_provider_priority function retrieve the sms providers from the DB, and decrease the chosen provider's priority by 10, while increasing the other by 10. add a check in to ensure we never decrease below 0 or increase above 100 - this is per provider, we don't check that the two add up to 100 or anything. If the values are outside of this range (eg: set via the UI) then they'll probably* fix themselves at some point - we've added tests to document these cases. Use with_for_update to ensure that the method can only run once at a time - other invocations of the function will be held on that line until the currently running one ends and commits the transaction. This doesn't affect anyone doing things from the UI.	2019-11-28 13:29:01 +00:00
Leo Hemsted	6f38cbbcf1	randomly choose from providers based on priority todo: make sure if they don't add up to 100 we do something sensible, especially if they're both 0.	2019-11-28 13:29:01 +00:00
Rebecca Law	4fd6f33af2	Merge pull request #2658 from alphagov/fix-letters-in-created-status Alert if a letter doesn't make it past created status	2019-11-27 13:38:51 +00:00
Rebecca Law	853df6fbfb	Fix reference to old time frame for task.	2019-11-27 13:26:53 +00:00
Rebecca Law	e0b4b258aa	Shortened the length of time to check for messages with the wrong state. There is a chance that there is an outstanding retry task that has yet to run, but the tasks that are replayed here protect against the task running twice. So this just means it might get sent sooner rather than later.	2019-11-21 15:51:27 +00:00
Rebecca Law	ac4f0e8027	After a comment from @idavidmcdonald, I asked myself why we are not creating the task to upload the pdf and update the notification. The assumption was that S3 would throw an exception if the object was uploaded twice. That's not the case: the default behaviour is that if a file already exists it will be overwritten. So it is completely safe to run the task from the alert. It can also mean that we don't need to wait 4 hours 15 minutes. Shall I decrease the amount of time before restarting the task?	2019-11-19 16:04:21 +00:00
Rebecca Law	918975b0a6	Use sender_id from CSV metadata. When we upload a CSV for a job, we add the sender_id as metadata to the file that is uploaded on S3. There is more than one place where we process rows from that CSV. - process_job - scheduled_job - check_for_missing_rows_in_completed_jobs - check_job_status All of these places need to use the sender_id, now the sender_id is always read from the file metadata. In a subsequent PR we can remove the optional sender_id parameter from process_job task.	2019-11-15 15:42:29 +00:00
Rebecca Law	6155f7666e	Testing with latest	2019-11-15 15:42:24 +00:00
Rebecca Law	516190262a	[WIP]	2019-11-15 15:41:27 +00:00
Rebecca Law	c42420c329	Add an alert when a letter is created but doesn't have a file in S3 for sending. We can tell this is the case because there is no updated_at and billable units are still 0. At this point we are just creating a zendesk ticket - perhaps we can just call the create_letter_pdf task.	2019-11-13 16:39:59 +00:00
Rebecca Law	5aaf5cd588	Add the missing format for the log message when a missing row is processed.	2019-11-07 15:01:23 +00:00
Rebecca Law	559faf3034	Fix the query. Missing the where clause to join the two tables.... OOPS	2019-11-07 10:57:31 +00:00
Rebecca Law	db5a50c5a7	Adding a scheduled task to process missing rows from job Sometimes a job finishes but has missed a row in the middle. It is a mystery why this is happening; it could be that the task to save the notifications has been dropped. So until we solve the missing let's find missing rows and process them. A new scheduled task has been added to find any "finished" jobs that do not have enough notifications created. If there are missing notifications the job processes those rows for the job. Adding the new task to beat schedule will be done in the next commit. A unique key constraint has been added to Notifications to ensure that the row is not added twice. Any index or constraint can affect performance, but this unique constraint should not affect it enough for us to notice.	2019-11-06 10:49:46 +00:00
Leo Hemsted	975af113e4	Merge pull request #2639 from alphagov/remove-loadtesting-db-migration remove loadtesting from the database	2019-11-06 10:49:46 +00:00
Katie Smith	a790acc091	Create a Zendesk ticket for letters in the wrong state This creates a Zendesk ticket if either the `check_precompiled_letter_state` or `check_templated_letter_state` tasks fail.	2019-06-18 10:58:58 +01:00
Katie Smith	c518f6ca76	Add scheduled task to find old letters which still have 'created' status Added a scheduled task to run once a day and check if there were any letters from before 17.30 that still have a status of 'created'. This logs an exception instead of trying to fix the error because the fix will be different depending on which bucket the letter is in.	2019-06-18 10:58:58 +01:00
Katie Smith	a2f324ad7e	Add scheduled task to find precompiled letters in wrong state Added a task which runs twice a day on weekdays and checks for letters that have been in the state of `pending-virus-check` for over 90 minutes. This is just logging an exception for now, not trying to fix things, since we will need to manually check where the issue was.	2019-06-18 10:58:58 +01:00

280 Commits