Commit Graph

68 Commits

Author SHA1 Message Date
Kenneth Kehl
be443172b0 fix celery bug 2025-07-14 07:49:34 -07:00
Kenneth Kehl
c81264f098 cleanup 2025-07-10 14:26:56 -07:00
Kenneth Kehl
2bfcd4a54d fix cronitor 2025-07-10 09:43:32 -07:00
ccostino
c0a97985d7 Merge pull request #1820 from GSA/acceptable_finish_time
fix acceptable_finish_time
2025-07-09 15:11:45 -04:00
Kenneth Kehl
7e6bd32305 fix acceptable_finish_time 2025-07-09 10:41:08 -07:00
Kenneth Kehl
256ab0f83c fix cronitor task name 2025-07-09 10:23:57 -07:00
Kenneth Kehl
3d6f112324 fix minor startup error 2025-02-03 10:40:55 -08:00
Kenneth Kehl
64a85868a3 code review feedback and merge from main 2024-09-11 09:39:18 -07:00
Kenneth Kehl
c0ab7c8a68 more exc_info 2024-08-15 11:07:36 -07:00
Kenneth Kehl
146f0cc787 initial 2024-08-15 10:31:02 -07:00
Cliff Hill
fef3173876 Minor fix.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-07-26 12:42:36 -04:00
Cliff Hill
17b078982f Logging where the error is at.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-07-26 09:46:37 -04:00
Kenneth Kehl
905df17f65 remove datetime.utcnow() 2024-05-23 13:59:51 -07:00
Cliff Hill
3982f061b6 Made enums.py for all the enums to avoid cyclic imports.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-02-28 12:43:31 -05:00
Cliff Hill
ac9591ec7c More tweaks, trying to get tests to be clean.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-02-28 12:43:31 -05:00
Cliff Hill
43f18eed6a More changes for enums.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-02-28 12:41:57 -05:00
Cliff Hill
820ee5a942 Cleaning up a lot of things, getting Enums used everywhere.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-02-28 12:40:52 -05:00
Kenneth Kehl
1ecb747c6d reformat 2023-08-29 14:54:30 -07:00
Kenneth Kehl
39707adf10 Merge branch 'main' of https://github.com/GSA/notifications-api into notify-260 2023-05-30 11:47:36 -07:00
Kenneth Kehl
6f6061455c notify-162 delete incomplete s3 uploads (#276)
Co-authored-by: Kenneth Kehl <@kkehl@flexion.us>
2023-05-23 11:31:30 -04:00
Kenneth Kehl
08c1ad75c8 notify-260 remove server-side timezone handling 2023-05-10 08:39:50 -07:00
Kenneth Kehl
001954538e notify-243 remove statsd 2023-04-25 07:50:56 -07:00
Steven Reilly
ff4190a8eb Remove letters-related code (#175)
This deletes a big ol' chunk of code related to letters. It's not everything—there are still a few things that might be tied to sms/email—but it's the the heart of letters function. SMS and email function should be untouched by this.

Areas affected:

- Things obviously about letters
- PDF tasks, used for precompiling letters
- Virus scanning, used for those PDFs
- FTP, used to send letters to the printer
- Postage stuff
2023-03-02 20:20:31 -05:00
stvnrlly
7f5c3c785e est_date to local_date, too 2022-11-21 12:05:23 -05:00
stvnrlly
9e7ee1c0f8 migrate bst_date to local_date 2022-11-21 11:49:59 -05:00
stvnrlly
99de747a36 fix formatting 2022-11-21 11:29:38 -05:00
stvnrlly
6908bd3cf5 use only convert_utc_to_local_timezone 2022-11-16 12:51:46 -05:00
stvnrlly
53019995d1 more time test times 2022-11-14 14:53:28 -05:00
stvnrlly
b50cb4712f tz utility swap and many test updates 2022-11-10 12:33:25 -05:00
jimmoffet
c04d1df6b3 fixing tests 2022-10-03 17:16:59 -07:00
Katie Smith
514bd48614 Update flake8-bugbear from 20.11.1 to 22.1.11
And ignore a warning, since I did not think that in this case "Using
.strip() with multi-character strings is misleading the reader".
2022-03-02 16:51:09 +00:00
Ben Thorner
7f4b140f97 Rename function to make it consistent
This is consistent with the new "on_date" function. It was going
off the edge of my screen before in some parts of the code.
2022-02-09 17:39:08 +00:00
Leo Hemsted
246016a894 don't log if we dont delete anything for a service
we try and delete for lots of services. this includes services that
don't actually have anything to delete that day. that might be because
they had a custom data retention so we always go to check them, or
because they only sent test notifications (which we'll delete but not
include in the count in the log line). we don't really need to see log
lines saying that we didn't delete anything for that service - that's
just a long list of boring log messages that will hide the actual
interesting stuff - which services we did delete content for.
2022-01-21 11:04:37 +00:00
Leo Hemsted
228d72dc8f update log messages in delete task.
less prose, clearer output. (hopefully)
2021-12-14 15:24:35 +00:00
Leo Hemsted
49cc1b643f split delete task up into per service
we really don't gain anything by running each service delete in sequence
- we get the services, and then just loop through them deleting per
service. By deleting per service in separate tasks, we can take
advantage of parallelism. the only thing we lose is some log lines but I
don't think we're that interested in them.

only set query limit at the move_notifications dao function - the task
doesn't really care about the technical implementation of how it deletes
the notifications
2021-12-14 15:24:34 +00:00
Ben Thorner
c8cf057eba Record providers we time out notifications for
This will help us monitor issues with delivery receipts and keep
track of provider performance over time.

I'm not concerned about performance here:

- The number of notifications to time out is usually small.
- This task only runs once a day.
- Calls to StatsD are quick and cheap.
2021-12-14 13:04:39 +00:00
Ben Thorner
87cd40d00a Scale timeout task to work on arbitrary volumes
Previously this was limited to 500K notifications. While we don't
expect to reach this limit, it's not impossible e.g. if we had a
repeat of the incident where one of our providers stopped sending
us status updates. Although that's not great, it's worse if our
code can't cope with the unexpectedly high volume.

This reuses the technique we have elsewhere [1] to keep processing
in batches until there's nothing left. Specifying a cutoff point
means the total amount of work to do can't keep growing.

[1]: 2fb432adaf/app/dao/notifications_dao.py (L441)
2021-12-13 17:14:28 +00:00
Ben Thorner
76aeab24ce Rewrite DAO timeout method to take cutoff_time
Previously we specified the period and calculated the cutoff time
in the function. Passing it in means we can run the method multiple
times and avoid getting "new" notifications to time out in the time
it takes to process each batch.
2021-12-13 16:56:21 +00:00
Ben Thorner
04da017558 DRY-up conditionally creating callback tasks
This removes 3 duplicate instances of the same code, which is still
tested implicitly via test_process_ses_receipt_tasks [1]. In the
next commit we'll make this test more explicit, to reflect that it's
now being reused elsewhere and shouldn't change arbitrarily.

We do lose the "print" statement from the command instance of the
code, but I think that's a very tolerable loss.

[1]: 16ec8ccb8a/tests/app/celery/test_process_ses_receipts_tasks.py (L94)
2021-12-06 14:11:34 +00:00
Ben Thorner
0318229216 Stop 'timing out' old 'created' notifications
This is being replaced with a new alert and runbook [1]. It's not
always appropriate to change the status to 'technical-failure', and
the new alert means we'll act to fix the underlying issue promptly.

We'll look at tidying up the remaining code in the next commits.

[1]: https://github.com/alphagov/notifications-manuals/wiki/Support-Runbook#deal-with-email-or-sms-still-in-created
2021-12-06 14:00:36 +00:00
Leo Hemsted
f6d210f1e6 put delete tasks on the reporting worker
they share a lot with the reporting tasks (creating ft_billing and
ft_notification_status), in that they're run nightly, take a long time,
and we see error messages if they get run multiple times (due to
visibility timeout).

The periodic app has two concurrent processes - previously there was
just one delete task, which would use one of those processes, while the
other process would pick up anything else on the queue (at that time of
night, the regular provider switch checks and scheduled job checks).
However, when we switched to running the three delete notification types
separately, we saw visibility timeout issues - three tasks would be
created, all three would be picked up by one celery instance, the two
worker processes would start on two of them, and the third would sit on
the box, wait longer than the visibility timeout to be picked up (and
acknowledged), and so SQS would assume the task was lost and replay it.

it's queues all the way down!

By putting them on the reporting worker we can take advantage of tuning
that app (for example setting the prefetch multiplier to one) which is
designed to run large tasks. We've also got more concurrent workers on
this box, so we can run all three tasks at once.
2021-12-03 13:28:16 +00:00
Leo Hemsted
6bbec9f103 make delete notification tasks parallel by notification type
we used to do this until apr 2020. Let's try doing it again.
Back then, we had problems with timing. We did two things in spring
2020:

We moved to using an intermediary temp table [1]
We stopped the tasks being parallelised [2]

However, it turned out the real time saving was from changing what
services we delete for [3]. The task was actually CPU-bound rather than
DB-bound, so that's probably why having the tasks in parallel wasn't
helping, since they were all competing for the same CPU. It's worth
trying the parallel steps again now that we're no longer CPU bound.

Note: Temporary tables are in their own postgres schema, and are only
viewable by the current session (session == connection. Each celery
worker process has its own db connection). We don't need to worry about
separate workers both trying to use the same table at once.

I've also added a "DROP ON COMMIT" directive to the table definition
just to ensure it doesn't persist past the task even if there's an
exception. (This also drops on rollback).

Cronitor looks at the three functions separately so we don't need to worry
about the main task taking milliseconds where it used to take hours as
it isn't monitored itself.

I've also removed some unnecessary redundant exception logs.

[1] https://github.com/alphagov/notifications-api/pull/2767
[2] https://github.com/alphagov/notifications-api/pull/2798
[3] https://github.com/alphagov/notifications-api/pull/3381
2021-12-01 14:28:08 +00:00
Ben Thorner
cdb43fbaf6 Only loop timeout task if there's more work
Previously this would repeat the task even the current iteration of
the loop had processed a non-full batch. This could cause the task
to error incorrectly if one or two notifications breach the timeout
threshold in between iterations.
2021-11-09 15:41:14 +00:00
Ben Thorner
77c8c0a501 Optimise query to get notifications to "time out"
From experimenting in production we found a "!=" caused the engine
to use a sequential scan, whereas explicitly listing all the types
ensured an index scan was used.

We also found that querying for many (over 100K) items leads to
the task stalling - no logs, but no evidence of it running either -
so we also add a limit to the query.

Since the query now only returns a subset of notifications, we need
to ensure the subsequent "update" query operates on the same batch.
Also, as a temporary measure, we have a loop in the task code to
ensure it operates on the total set of notifications to "time out",
which we assume is less than 500K for the time being.
2021-11-09 13:50:32 +00:00
Ben Thorner
d1586a8f81 CC DVLA in tickets about outstanding letters
Previously we sent them emails about this manually. We also tried
a Zendesk macro/trigger approach, but using a CC means:

- We can control the behaviour ourselves (Zendesk triggers can only
be edited by admins outside our team).

- We keep the DVLA notification approach consistent and in one place,
so notifications always go to the same people.

- Any further (public) updates to the ticket will also trigger a
notification to DVLA (previous trigger only notified on creation).
2021-10-29 11:46:29 +01:00
Leo Hemsted
b8c4e19072 tweak zendesk message for no ack files alert
include a link to a runbook entry.

also the list of acknowledgement files can be very long, so make that
the last thing, and use new lines to space out the message.
2021-10-08 13:45:02 +01:00
Katie Smith
2f66e38fb9 Update how "missing ackfile for letters" Zendesk tickets are created 2021-09-29 11:10:50 +01:00
Katie Smith
64c0a3fb9d Update how 'letters still sending' Zendesk tickets are created
These now use the new Zendesk form.
2021-09-29 11:07:37 +01:00
Ben Thorner
e3e067c795 Remove redundant @statsd timing decorators
These are superseded by timing task execution generically in the
NotifyTask superclass [1]. Note that we need to wait until we've
gathered enough data under the new metrics before removing these.

[1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376
2021-04-12 15:19:18 +01:00
David McDonald
41d95378ea Remove everything for the performance platform
We no longer will send them any stats so therefore don't need the code
- the code to work out the nightly stats
- the performance platform client
- any configuration for the client
- any nightly tasks that kick off the sending off the stats

We will require a change in cronitor as we no longer will have this task
run meaning we need to delete the cronitor check.
2021-03-15 12:04:53 +00:00