notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-13 00:32:16 -05:00

Author	SHA1	Message	Date
Kenneth Kehl	be443172b0	fix celery bug	2025-07-14 07:49:34 -07:00
Kenneth Kehl	c81264f098	cleanup	2025-07-10 14:26:56 -07:00
Kenneth Kehl	2bfcd4a54d	fix cronitor	2025-07-10 09:43:32 -07:00
ccostino	c0a97985d7	Merge pull request #1820 from GSA/acceptable_finish_time fix acceptable_finish_time	2025-07-09 15:11:45 -04:00
Kenneth Kehl	7e6bd32305	fix acceptable_finish_time	2025-07-09 10:41:08 -07:00
Kenneth Kehl	256ab0f83c	fix cronitor task name	2025-07-09 10:23:57 -07:00
Kenneth Kehl	3d6f112324	fix minor startup error	2025-02-03 10:40:55 -08:00
Kenneth Kehl	64a85868a3	code review feedback and merge from main	2024-09-11 09:39:18 -07:00
Kenneth Kehl	c0ab7c8a68	more exc_info	2024-08-15 11:07:36 -07:00
Kenneth Kehl	146f0cc787	initial	2024-08-15 10:31:02 -07:00
Cliff Hill	fef3173876	Minor fix. Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>	2024-07-26 12:42:36 -04:00
Cliff Hill	17b078982f	Logging where the error is at. Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>	2024-07-26 09:46:37 -04:00
Kenneth Kehl	905df17f65	remove datetime.utcnow()	2024-05-23 13:59:51 -07:00
Cliff Hill	3982f061b6	Made enums.py for all the enums to avoid cyclic imports. Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>	2024-02-28 12:43:31 -05:00
Cliff Hill	ac9591ec7c	More tweaks, trying to get tests to be clean. Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>	2024-02-28 12:43:31 -05:00
Cliff Hill	43f18eed6a	More changes for enums. Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>	2024-02-28 12:41:57 -05:00
Cliff Hill	820ee5a942	Cleaning up a lot of things, getting Enums used everywhere. Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>	2024-02-28 12:40:52 -05:00
Kenneth Kehl	1ecb747c6d	reformat	2023-08-29 14:54:30 -07:00
Kenneth Kehl	39707adf10	Merge branch 'main' of https://github.com/GSA/notifications-api into notify-260	2023-05-30 11:47:36 -07:00
Kenneth Kehl	6f6061455c	notify-162 delete incomplete s3 uploads (#276) Co-authored-by: Kenneth Kehl <@kkehl@flexion.us>	2023-05-23 11:31:30 -04:00
Kenneth Kehl	08c1ad75c8	notify-260 remove server-side timezone handling	2023-05-10 08:39:50 -07:00
Kenneth Kehl	001954538e	notify-243 remove statsd	2023-04-25 07:50:56 -07:00
Steven Reilly	ff4190a8eb	Remove letters-related code (#175) This deletes a big ol' chunk of code related to letters. It's not everything—there are still a few things that might be tied to sms/email—but it's the heart of letters function. SMS and email function should be untouched by this. Areas affected: - Things obviously about letters - PDF tasks, used for precompiling letters - Virus scanning, used for those PDFs - FTP, used to send letters to the printer - Postage stuff	2023-03-02 20:20:31 -05:00
stvnrlly	7f5c3c785e	est_date to local_date, too	2022-11-21 12:05:23 -05:00
stvnrlly	9e7ee1c0f8	migrate bst_date to local_date	2022-11-21 11:49:59 -05:00
stvnrlly	99de747a36	fix formatting	2022-11-21 11:29:38 -05:00
stvnrlly	6908bd3cf5	use only convert_utc_to_local_timezone	2022-11-16 12:51:46 -05:00
stvnrlly	53019995d1	more time test times	2022-11-14 14:53:28 -05:00
stvnrlly	b50cb4712f	tz utility swap and many test updates	2022-11-10 12:33:25 -05:00
jimmoffet	c04d1df6b3	fixing tests	2022-10-03 17:16:59 -07:00
Katie Smith	514bd48614	Update flake8-bugbear from 20.11.1 to 22.1.11 And ignore a warning, since I did not think that in this case "Using .strip() with multi-character strings is misleading the reader".	2022-03-02 16:51:09 +00:00
Ben Thorner	7f4b140f97	Rename function to make it consistent This is consistent with the new "on_date" function. It was going off the edge of my screen before in some parts of the code.	2022-02-09 17:39:08 +00:00
Leo Hemsted	246016a894	don't log if we don't delete anything for a service we try and delete for lots of services. this includes services that don't actually have anything to delete that day. that might be because they had a custom data retention so we always go to check them, or because they only sent test notifications (which we'll delete but not include in the count in the log line). we don't really need to see log lines saying that we didn't delete anything for that service - that's just a long list of boring log messages that will hide the actual interesting stuff - which services we did delete content for.	2022-01-21 11:04:37 +00:00
Leo Hemsted	228d72dc8f	update log messages in delete task. less prose, clearer output. (hopefully)	2021-12-14 15:24:35 +00:00
Leo Hemsted	49cc1b643f	split delete task up into per service we really don't gain anything by running each service delete in sequence - we get the services, and then just loop through them deleting per service. By deleting per service in separate tasks, we can take advantage of parallelism. the only thing we lose is some log lines but I don't think we're that interested in them. only set query limit at the move_notifications dao function - the task doesn't really care about the technical implementation of how it deletes the notifications	2021-12-14 15:24:34 +00:00
Ben Thorner	c8cf057eba	Record providers we time out notifications for This will help us monitor issues with delivery receipts and keep track of provider performance over time. I'm not concerned about performance here: - The number of notifications to time out is usually small. - This task only runs once a day. - Calls to StatsD are quick and cheap.	2021-12-14 13:04:39 +00:00
Ben Thorner	87cd40d00a	Scale timeout task to work on arbitrary volumes Previously this was limited to 500K notifications. While we don't expect to reach this limit, it's not impossible e.g. if we had a repeat of the incident where one of our providers stopped sending us status updates. Although that's not great, it's worse if our code can't cope with the unexpectedly high volume. This reuses the technique we have elsewhere [1] to keep processing in batches until there's nothing left. Specifying a cutoff point means the total amount of work to do can't keep growing. [1]: `2fb432adaf/app/dao/notifications_dao.py (L441)`	2021-12-13 17:14:28 +00:00
Ben Thorner	76aeab24ce	Rewrite DAO timeout method to take cutoff_time Previously we specified the period and calculated the cutoff time in the function. Passing it in means we can run the method multiple times and avoid getting "new" notifications to time out in the time it takes to process each batch.	2021-12-13 16:56:21 +00:00
Ben Thorner	04da017558	DRY-up conditionally creating callback tasks This removes 3 duplicate instances of the same code, which is still tested implicitly via test_process_ses_receipt_tasks [1]. In the next commit we'll make this test more explicit, to reflect that it's now being reused elsewhere and shouldn't change arbitrarily. We do lose the "print" statement from the command instance of the code, but I think that's a very tolerable loss. [1]: `16ec8ccb8a/tests/app/celery/test_process_ses_receipts_tasks.py (L94)`	2021-12-06 14:11:34 +00:00
Ben Thorner	0318229216	Stop 'timing out' old 'created' notifications This is being replaced with a new alert and runbook [1]. It's not always appropriate to change the status to 'technical-failure', and the new alert means we'll act to fix the underlying issue promptly. We'll look at tidying up the remaining code in the next commits. [1]: https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-email-or-sms-still-in-created	2021-12-06 14:00:36 +00:00
Leo Hemsted	f6d210f1e6	put delete tasks on the reporting worker they share a lot with the reporting tasks (creating ft_billing and ft_notification_status), in that they're run nightly, take a long time, and we see error messages if they get run multiple times (due to visibility timeout). The periodic app has two concurrent processes - previously there was just one delete task, which would use one of those processes, while the other process would pick up anything else on the queue (at that time of night, the regular provider switch checks and scheduled job checks). However, when we switched to running the three delete notification types separately, we saw visibility timeout issues - three tasks would be created, all three would be picked up by one celery instance, the two worker processes would start on two of them, and the third would sit on the box, wait longer than the visibility timeout to be picked up (and acknowledged), and so SQS would assume the task was lost and replay it. it's queues all the way down! By putting them on the reporting worker we can take advantage of tuning that app (for example setting the prefetch multiplier to one) which is designed to run large tasks. We've also got more concurrent workers on this box, so we can run all three tasks at once.	2021-12-03 13:28:16 +00:00
Leo Hemsted	6bbec9f103	make delete notification tasks parallel by notification type we used to do this until apr 2020. Let's try doing it again. Back then, we had problems with timing. We did two things in spring 2020: We moved to using an intermediary temp table [1] We stopped the tasks being parallelised [2] However, it turned out the real time saving was from changing what services we delete for [3]. The task was actually CPU-bound rather than DB-bound, so that's probably why having the tasks in parallel wasn't helping, since they were all competing for the same CPU. It's worth trying the parallel steps again now that we're no longer CPU bound. Note: Temporary tables are in their own postgres schema, and are only viewable by the current session (session == connection. Each celery worker process has its own db connection). We don't need to worry about separate workers both trying to use the same table at once. I've also added an "ON COMMIT DROP" directive to the table definition just to ensure it doesn't persist past the task even if there's an exception. (This also drops on rollback). Cronitor looks at the three functions separately so we don't need to worry about the main task taking milliseconds where it used to take hours as it isn't monitored itself. I've also removed some unnecessary redundant exception logs. [1] https://github.com/alphagov/notifications-api/pull/2767 [2] https://github.com/alphagov/notifications-api/pull/2798 [3] https://github.com/alphagov/notifications-api/pull/3381	2021-12-01 14:28:08 +00:00
Ben Thorner	cdb43fbaf6	Only loop timeout task if there's more work Previously this would repeat the task even if the current iteration of the loop had processed a non-full batch. This could cause the task to error incorrectly if one or two notifications breach the timeout threshold in between iterations.	2021-11-09 15:41:14 +00:00
Ben Thorner	77c8c0a501	Optimise query to get notifications to "time out" From experimenting in production we found a "!=" caused the engine to use a sequential scan, whereas explicitly listing all the types ensured an index scan was used. We also found that querying for many (over 100K) items leads to the task stalling - no logs, but no evidence of it running either - so we also add a limit to the query. Since the query now only returns a subset of notifications, we need to ensure the subsequent "update" query operates on the same batch. Also, as a temporary measure, we have a loop in the task code to ensure it operates on the total set of notifications to "time out", which we assume is less than 500K for the time being.	2021-11-09 13:50:32 +00:00
Ben Thorner	d1586a8f81	CC DVLA in tickets about outstanding letters Previously we sent them emails about this manually. We also tried a Zendesk macro/trigger approach, but using a CC means: - We can control the behaviour ourselves (Zendesk triggers can only be edited by admins outside our team). - We keep the DVLA notification approach consistent and in one place, so notifications always go to the same people. - Any further (public) updates to the ticket will also trigger a notification to DVLA (previous trigger only notified on creation).	2021-10-29 11:46:29 +01:00
Leo Hemsted	b8c4e19072	tweak zendesk message for no ack files alert include a link to a runbook entry. also the list of acknowledgement files can be very long, so make that the last thing, and use new lines to space out the message.	2021-10-08 13:45:02 +01:00
Katie Smith	2f66e38fb9	Update how "missing ackfile for letters" Zendesk tickets are created	2021-09-29 11:10:50 +01:00
Katie Smith	64c0a3fb9d	Update how 'letters still sending' Zendesk tickets are created These now use the new Zendesk form.	2021-09-29 11:07:37 +01:00
Ben Thorner	e3e067c795	Remove redundant @statsd timing decorators These are superseded by timing task execution generically in the NotifyTask superclass [1]. Note that we need to wait until we've gathered enough data under the new metrics before removing these. [1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376	2021-04-12 15:19:18 +01:00
David McDonald	41d95378ea	Remove everything for the performance platform We no longer will send them any stats so therefore don't need the code - the code to work out the nightly stats - the performance platform client - any configuration for the client - any nightly tasks that kick off the sending of the stats We will require a change in cronitor as we no longer will have this task run meaning we need to delete the cronitor check.	2021-03-15 12:04:53 +00:00


68 Commits
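Commits 76aeab24ce, 87cd40d00a and cdb43fbaf6 together describe one batching pattern for the notification timeout task. The task computes the cutoff time once and passes it into the DAO. It then processes a limited batch at a time and loops again only if the previous batch was full. The in-memory sketch below illustrates that pattern under stated assumptions. The `Notification` class, the `timeout_batch` helper and the `pending`/`timed-out` status values are hypothetical stand-ins, not the repository's actual models or DAO functions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Notification:
    # Hypothetical model: only the fields the sketch needs.
    id: int
    status: str
    created_at: datetime


def timeout_batch(store, cutoff_time, limit):
    """Hypothetical DAO stand-in: mark at most `limit` pending notifications
    created before `cutoff_time` as timed out, and return that batch."""
    batch = [
        n for n in store
        if n.status == "pending" and n.created_at < cutoff_time
    ][:limit]
    for n in batch:
        n.status = "timed-out"
    return batch


def timeout_notifications(store, timeout_period, limit, now=None):
    """Time out old notifications in batches until none are left."""
    now = now or datetime.now(timezone.utc)
    # Fix the cutoff once, so notifications that age past the threshold
    # while we are working cannot keep growing the total amount of work.
    cutoff_time = now - timeout_period
    timed_out = []
    while True:
        batch = timeout_batch(store, cutoff_time, limit)
        timed_out.extend(batch)
        # A non-full batch means nothing older than the cutoff remains.
        if len(batch) < limit:
            break
    return timed_out
```

For example, suppose seven notifications are older than the cutoff and the batch limit is 3. The loop then processes batches of 3, 3 and 1 and stops after the partial batch. Newer notifications are left untouched.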