notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-30 12:23:04 -05:00

Author	SHA1	Message	Date
David McDonald	5a51ab6131	Bug fix: update normalised_to, not just `to` after letter sanitise When a precompiled letter is sent to us, we set the `to` field as 'Provided as PDF' in `1c1023a877/app/v2/notifications/post_notifications.py (L100-L104)` This then also sets `normalised_to` as `providedaspdf`. However, when template preview sanitises the letter, pulls out the address and gives it to the API, we were only setting `to` to be the new address and had forgotten to also amend `normalised_to` to be the normalised version. This meant that for all these letters we accidentally left `normalised_to` as `providedaspdf`. The impact of this was that we can not then search for these letters in the admin user interface as they rely on the `normalised_to` field containing the recipient address. This commit fixes that bug by also setting the `normalised_to` field	2021-10-27 11:56:25 +01:00
Ben Thorner	d703251b13	Merge pull request #3348 from alphagov/better-callback-stats-180016688 Include status in stats about delivery times	2021-10-22 11:59:24 +01:00
Ben Thorner	f974108934	Include status in stats about delivery times Previously these metrics weren't very useful because they could be skewed by long timings for failed notifications, which can take up to 72 hours to deliver. I'm intentionally not trying to have a dual running period (with the old and new names) because: - We don't use the current stats for anything (checking Grafana). - The current stats get turned into a "bucket" metric in Prometheus [1][2], which isn't very useful because it can only tell us the mean time to deliver, but we're actually interested in percentiles. Switching to a new naming is an opportunity to fix the raw data and the way it's aggregated, using the same kind of "summary" metric that we now use for stats about our Celery tasks [3]. [1]: `c330a8ac8a/paas/statsd/statsd-mapping.yml (L82)` [2]: https://prometheus.io/docs/practices/histograms/#quantiles [3]: https://github.com/alphagov/notifications-aws/pull/890	2021-10-20 17:22:59 +01:00
Leo Hemsted	0b8c6ef263	Merge pull request #3339 from alphagov/letter-runbook-link tweak zendesk message for no ack files alert	2021-10-20 15:23:33 +01:00
Pea Tyczynska	1b6f9505da	Call `publish-govuk-alerts` task when alert expires The `auto-expire-broadcast-messages` task checks for expired broadcasts at five minute intervals. This change now calls the `publish-govuk-alerts` task in govuk-alerts if there are expired broadcasts so that the site is updated. Co-authored-by: Katie Smith <katie.smith@digital.cabinet-office.gov.uk>	2021-10-18 08:41:25 +01:00
Katie Smith	04bfd6bfdb	Trigger task to publish alerts when sending or cancelling alert When we send or cancel a broadcast message, we now trigger a task in govuk-alerts repo that polls our API for alerts and publishes a fresh list of alerts. Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>	2021-10-18 08:41:24 +01:00
Leo Hemsted	b8c4e19072	tweak zendesk message for no ack files alert include a link to a runbook entry. also the list of acknowledgement files can be very long, so make that the last thing, and use new lines to space out the message.	2021-10-08 13:45:02 +01:00
Katie Smith	58597653df	Update how "sending to TV numbers" Zendesk tickets are created	2021-09-29 11:26:20 +01:00
Katie Smith	0c0c7f4478	Update how "letters still created status" Zendesk tickets are created	2021-09-29 11:23:28 +01:00
Katie Smith	2f66e38fb9	Update how "missing ackfile for letters" Zendesk tickets are created	2021-09-29 11:10:50 +01:00
Katie Smith	64c0a3fb9d	Update how 'letters still sending' Zendesk tickets are created These now use the new Zendesk form.	2021-09-29 11:07:37 +01:00
Katie Smith	b114dadcae	Update how pending virus check Zendesk tickets are created This updates the tickets that are created when the `check_if_letters_still_pending_virus_check` scheduled task detects letters in the `pending-virus-check` state.	2021-09-29 11:03:48 +01:00
Katie Smith	9ff0ca0363	Update how live broadcast Zendesk tickets are created These now use the Notify Form in Zendesk	2021-09-29 10:59:07 +01:00
sakisv	df739d6b94	Fix flake8	2021-09-10 10:17:28 +03:00
sakisv	65c21f694c	Don't raise P1 for broadcasts This is happening on the AWS side now as part of alphagov/notifications-broadcasts-infra#267 - but we still want to keep the zendesk ticket as it contains useful context _and_ provides visibility to the team.	2021-09-09 16:44:19 +03:00
Ben Thorner	bf0bf4e31c	Favour new "areas" format for PagerDuty alerts Broadcasts created via the API [1] and the Admin app [2] should both now have this field set. It's also more informative to show this, and broadcasts created via the API don't have IDs anyway. There's a small risk that an old broadcast that gets approved won't have this data, but it's for information only and we intend to backfill all old broadcasts in the near future. [1]: `023a06d5fb` [2]: `7dbe3afa19`	2021-08-27 14:22:12 +01:00
Ben Thorner	8f39d476bd	Start dual running with "areas" and (area) "ids" This is necessary until: - The Admin app is using the new "areas(_2)" format to store and retrieve data. - We've migrated all existing broadcast messages to use the new format. Note that "areas" / "ids" isn't actually used for anything except printing out the PagerDuty message - it's not sent to the proxy [1]. [1]: `6edc6c70aa/app/celery/broadcast_message_tasks.py (L190-L193)`	2021-08-26 15:34:24 +01:00
Ben Thorner	312a895822	Merge pull request #3294 from alphagov/auto-expire-alerts-178926353 Auto expire old broadcast messages	2021-07-22 09:53:41 +01:00
Ben Thorner	5e9d8e5fa0	Auto expire old broadcast messages Since the expiry is sent as part of the message payload, we don't need to invoke the CBC proxies (and indeed there's no way to do so for an expired alert). In future we plan to extend this task so it triggers the regeneration of content on gov.uk/alerts. It's worth noting that 'finishes_at' can theoretically be None, in which case it's unclear when the alert should expire. While alerts from the Admin app should always have an expiry [1], we have many in the DB that don't, so it's worth checking for this scenario. [1]: `078ac10c8d/app/models/broadcast_message.py (L255)`	2021-07-21 13:05:11 +01:00
Ben Thorner	cdc150de1b	Change link test task to trigger both lambdas This modifies the previous "(_)send_link_test" method to trigger a link test for a specific lambda. We then call the method with both the primary and failover lambda in a new orchestrator method. Since the _invoke_lambda function doesn't raise exceptions if it fails, there's no need to rescue anything in order to ensure the second link test / invocation runs as well. This doesn't need testing, since it boils down to an absence of code to raise any exception. Note that, like the other parent tests, we only check the new method works with a specific proxy client instance.	2021-07-19 16:00:56 +01:00
Ben Thorner	08f48379b4	Move ID generation into link test method Unlike the other IDs which are stored in the DB, this isn't relevant for the Celery task as it invokes a link test. Moving it into the proxy client will also enable us to generate a second ID in the next commits, where we start doing a link test for the failover lambda.	2021-07-19 16:00:55 +01:00
Ben Thorner	7fb65761f9	Always log when a lambda is invoked This replaces the previous single-purpose log about a link test with a more informative and generic one for all invocations.	2021-07-19 15:45:43 +01:00
Ben Thorner	b6774bf0f7	Generate Vodafone link test sequence nos in proxy Previously the Celery task to trigger a link test had to know about the special case of a sequence number for Vodafone. Since we're about to change the client to perform multiple tests it makes sense to give it the knowledge of how to generate the number itself. Note that we have to import the db inline to avoid a circular import, since this module is itself imported by app/__init__.py. Other invocations of the Vodafone client use stored sequence numbers from the DB, which are called "message numbers" in that context. Since the two use cases are very different (even the names are different!), having them in two places shouldn't cause any confusion.	2021-07-19 15:43:36 +01:00
Leo Hemsted	2ad9a3a380	retry service callbacks on 429 if we're served a 429, put the item on the retry queue and retry the same as if the service returned a 5xx. 429 is commonly returned for rate limit exceeding, and retrying on a delay is a typical response to that.	2021-07-13 16:09:17 +01:00
Rebecca Law	18dd9050a4	- make sure when processing a job that we check the total_sent + job.notification_count against the service.message_limit.	2021-06-28 13:07:48 +01:00
Rebecca Law	fd7486d751	- Merge daily limit functions into one, refactor call for daily limit check from process_job - refactor tests to standardise test names - refactor some tests to be more clear - remove unnecessary tests - include missing test	2021-06-24 11:05:22 +01:00
Rebecca Law	35b20ba363	Correct the daily limits cache. Last year we had an issue with the daily limit cache and the query that was populating it. As a result we have not been checking the daily limit properly. This PR should correct all that. The daily limit cache is now being incremented in app.notifications.process_notifications.persist_notification; this method is and should always be the only method used to create a notification. We increment the daily limit cache if redis is enabled (and it is always enabled for production) and the key type for the notification is team or normal. We check if the daily limit is exceeded in many places: - app.celery.tasks.process_job - app.v2.notifications.post_notifications.post_notification - app.v2.notifications.post_notifications.post_precompiled_letter_notification - app.service.send_notification.send_one_off_notification - app.service.send_notification.send_pdf_letter_notification If the daily limits cache is not found, set the cache to 0 with an expiry of 24 hours. The daily limit cache key is service_id-yyy-mm-dd-count, so each day a new cache is created. The best thing about this PR is that the app.service_dao.fetch_todays_total_message_count query has been removed. This query was not performant and had been wrong for ages.	2021-06-22 16:15:36 +01:00
Rebecca Law	8af10eb1f0	Update the job_status to in-progress sooner. We had a situation where the delivery-worker app instance was terminated before the job was marked as `in-progress`, presumably because the query to check the daily limits was taking too long to complete. If the job was in progress the `check_job_status` task would have restarted the job. Updating the status to in-progress sooner will help.	2021-06-15 07:58:17 +01:00
Rebecca Law	1bf5ce08b2	Add an error log for alert tasks. Many of the team members do not look at emails from Zendesk, so this adds a current_app.logger.error message for things we care about, to give developers a better chance of seeing them. I have purposely not added an error log for `check_for_services_with_high_failure_rates_or_sending_to_tv_numbers` because it's not something we need to look at immediately.	2021-05-26 11:06:21 +01:00
Katie Smith	8365c749e4	Change letter zip file names for Insolvency Service letters DVLA would like to be able to identify letters sent by the Insolvency Service, so we are changing the zipfile name. They need all zipfile names to have the same structure, so we can't just add a marker to files sent by that service - we have to change all filenames. The new format is like this: `{NOTIFY}.{DATE}.{SEQUENCE_ID}.{UNIQUE_ID}.{SERVICE_ID}.{ORG_NAME}.{EXTENSION}`	2021-05-06 09:18:44 +01:00
Ben Thorner	23f4ae32df	Merge pull request #3214 from alphagov/check-broadcast-suspended Enforce service suspension for broadcasts	2021-04-28 15:01:11 +01:00
Ben Thorner	99bc29418e	Move request_id injection into send_task override This applies the same change we made in other apps [1][2]. Adding the override here is special, though, because it means the others will now get triggered, since this app is the start of the chain of tasks for a request. We will also retain existing request_id tracing for tasks within this app, since "apply_async" calls the "send_task" method internally, which is the one we're overriding. [1]: `6f3c118a1e` [2]: `2e08b7aa95`	2021-04-27 10:35:21 +01:00
Katie Smith	e6357c91c9	Add more details to messages in send_broadcast_provider_message task This ensures that the log messages both contain broadcast_event id and broadcast_provider_message id. It also removes the broadcast_event reference since this isn't particularly useful in helping to find an event.	2021-04-20 15:34:49 +01:00
Katie Smith	c9c4bd8b44	Clarify log line when sending a link test It wasn't clear what the ID in the message was. It's not possible to add more details to the message - we don't create a broadcast message or event for a link test.	2021-04-20 14:54:53 +01:00
Ben Thorner	a2af8b052a	Split up authorisation vs. sequencing checks While both of these are integrity errors (since we should never reach this point in the code + data), this just means the original method comment is still relevant to what immediately follows it.	2021-04-19 17:13:15 +01:00
Ben Thorner	ee52e3e2c9	Mirror integrity checks from the API It makes sense to have these checks [1] here, since in future we may add other ways of creating a broadcast event and omit them. [1]: `3d71815956/app/broadcast_message/rest.py (L198)`	2021-04-19 17:13:13 +01:00
Ben Thorner	0070473f31	Check for suspension before sending a broadcast This mirrors the check we do for jobs, which are also a high-impact task [1]. While this shouldn't be possible, just like other checks we're adding it here to be doubly certain. [1]: `3d71815956/app/celery/tasks.py (L74)`	2021-04-19 17:13:12 +01:00
Ben Thorner	b2398fcaf4	Rename CBCProxyFatalException We only actually use this when the data we're working with is in an unexpected state, which is unrelated to the CBC Proxy. Using this name also means we can re-use this exception in the next commits. Note that we may still care if a broadcast message has expired, since it's not expected that someone would send one in this condition.	2021-04-19 17:13:05 +01:00
Rebecca Law	34a378a60e	Update the Zendesk ticket content for `check_if_letters_still_in_created` The message to Zendesk includes a list of notification ids, this isn't really necessary and is included in the run book. Creation of the Zendesk ticket can fail if the message is too long, removing the list of ids can prevent that from happening.	2021-04-19 10:47:25 +01:00
Ben Thorner	be02573147	Fix apply_async not working with positional kwargs Celery's apply_async function accepts 'kwargs' as (get ready to be confused) either a positional argument, or a keyword argument: Positional: apply_async(['args'], {'kw': 'args'}) Keyword: apply_async(args=['args'], kwargs={'kw': 'args'}) We rely on the positional form in at least one place [1]. This fixes the overload of apply_async to cope with both forms, and continue to pass through any other (confusion time again) keyword args to super(), such as queue="queue". Note that we've also decided to stop accepting other positional args, since this is unnecessarily confusing, and we don't currently rely on it in our code. This stops it creeping in in future. [1]: `fde927e00e/app/job/rest.py (L186)`	2021-04-15 17:21:21 +01:00
Ben Thorner	fde927e00e	Merge pull request #3205 from alphagov/celery-consistency-tweaks Small refactor for new Celery / StatsD code	2021-04-15 11:40:42 +01:00
Ben Thorner	f85dad5acf	Merge pull request #3203 from alphagov/remove-statsd-decorators Remove redundant @statsd timing decorators	2021-04-14 10:04:04 +01:00
Ben Thorner	5eb265138b	Remove unnecessary statsd_client parameter It turns out this is available from the app object [1], and we were already assuming this in the tests. [1]: `48c6c822e8/notifications_utils/clients/statsd/statsd_client.py (L52)`	2021-04-13 15:12:55 +01:00
Ben Thorner	ec6d87cd0f	Simplify argument passing in apply_async This avoids the need to keep in-sync with any future changes to the signature, and reduces the amount of irrelevant code to read.	2021-04-13 15:12:45 +01:00
David McDonald	2e6d761691	Merge pull request #3204 from alphagov/broadcast-envars Broadcast envars	2021-04-12 17:25:15 +01:00
David McDonald	295162c81d	Move CBC proxy enable check This change will make our development environments closer to production even if they aren't hooked up to the CBC proxy lambda functions. Now in development, we will create the broadcast event and create tasks for each broadcast provider event. We will still not create actual broadcast provider message rows in the DB and talk to the CBC proxies. This should be helpful in development to catch any issues we introduce to do with sending broadcast messaging. In time we may wish to have some fake CBC proxies in the AWS tools account that we can interact with to make it even more realistic.	2021-04-12 17:05:41 +01:00
Ben Thorner	e3e067c795	Remove redundant @statsd timing decorators These are superseded by timing task execution generically in the NotifyTask superclass [1]. Note that we need to wait until we've gathered enough data under the new metrics before removing these. [1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376	2021-04-12 15:19:18 +01:00
Ben Thorner	3e507eea55	Merge pull request #3201 from alphagov/revamp-celery-stats Migrate towards new metrics for Celery tasks	2021-04-12 15:04:37 +01:00
Ben Thorner	ab8dd6d52c	Duplicate metrics to StatsD for Celery tasks Previously we used a '@statsd' decorator to time and count Celery tasks [1]. Using a decorator isn't ideal since we need to remember to add it to every task we define. In addition, it's not possible to use data like the task name and queue. In order to avoid breaking existing stats, this duplicates them as new StatsD metrics until we have sufficient data to update dashboards using the old ones. Using the CeleryTask superclass to send metrics avoids a future maintenance overhead, and means we can include more useful data in the StatsD metric. Note that the new metrics will sit in StatsD until we add a mapping for them [2]. StatsD automatically produces a 'count' stat for timing metrics, so we don't need to increment a separate counter for successful tasks. [1]: `dea5828d0e/app/celery/tasks.py (L65)` [2]: https://github.com/alphagov/notifications-aws/blob/master/paas/statsd/statsd-mapping.yml	2021-04-08 18:02:53 +01:00
Ben Thorner	248f5a0708	Include queue name in Celery task logs This is mainly so we can use it in the new metrics we send to StatsD in the following commits, but it should also be useful in the logs. I've taken the opportunity to make the log format consistent between success / failure, and with our Template Preview app [1]. [1]: `f456433a5a/app/celery/celery.py (L19)`	2021-04-08 18:02:51 +01:00
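
Commit `be02573147` above describes an `apply_async` override that accepts `kwargs` either positionally or by keyword, passes other keyword args such as `queue` through to `super()`, and rejects further positional args. The following is only a minimal sketch of that idea, not the repository's code: `BaseTask` is a stand-in for Celery's task class so the example runs without Celery installed, and `RequestIdTask` and the `request_id` value are hypothetical names chosen for illustration.

```python
class BaseTask:
    """Stand-in for a Celery task class (assumption: not the real celery.Task)."""

    def apply_async(self, args=None, kwargs=None, **options):
        # Record the call so the behaviour can be inspected.
        return {"args": args, "kwargs": kwargs, "options": options}


class RequestIdTask(BaseTask):
    """Hypothetical subclass that injects a request_id into the task kwargs."""

    request_id = "req-123"  # illustrative value only

    def apply_async(self, args=None, kwargs=None, **other_kwargs):
        # 'kwargs' may arrive positionally or by keyword; both bind to the
        # same parameter, so a single code path handles the two forms.
        kwargs = dict(kwargs or {})
        kwargs["request_id"] = self.request_id
        # Anything else (e.g. queue="queue") is passed through untouched.
        return super().apply_async(args, kwargs, **other_kwargs)


task = RequestIdTask()
positional = task.apply_async(["args"], {"kw": "args"})
keyword = task.apply_async(args=["args"], kwargs={"kw": "args"}, queue="queue")
```

Because the override declares only `args` and `kwargs` as positional parameters, a third positional argument raises a `TypeError`, which matches the commit's stated decision to stop accepting other positional args.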


948 Commits