notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-24 01:11:38 -05:00

Author	SHA1	Message	Date
Katie Smith	e6357c91c9	Add more details to messages in send_broadcast_provider_message task This ensures that the log messages both contain broadcast_event id and broadcast_provider_message id. It also removes the broadcast_event reference since this isn't particularly useful in helping to find an event.	2021-04-20 15:34:49 +01:00
Katie Smith	c9c4bd8b44	Clarify log line when sending a link test It wasn't clear what the ID in the message was. It's not possible to add more details to the message - we don't create a broadcast message or event for a link test.	2021-04-20 14:54:53 +01:00
Ben Thorner	a2af8b052a	Split up authorisation vs. sequencing checks While both of these are integrity errors (since we should never reach this point in the code + data), this just means the original method comment is still relevant to what immediately follows it.	2021-04-19 17:13:15 +01:00
Ben Thorner	ee52e3e2c9	Mirror integrity checks from the API It makes sense to have these checks [1] here, since in future we may add other ways of creating a broadcast event and omit them. [1]: `3d71815956/app/broadcast_message/rest.py (L198)`	2021-04-19 17:13:13 +01:00
Ben Thorner	0070473f31	Check for suspension before sending a broadcast This mirrors the check we do for jobs, which are also a high-impact task [1]. While this shouldn't be possible, just like other checks we're adding it here to be doubly certain. [1]: `3d71815956/app/celery/tasks.py (L74)`	2021-04-19 17:13:12 +01:00
Ben Thorner	b2398fcaf4	Rename CBCProxyFatalException We only actually use this when the data we're working with is in an unexpected state, which is unrelated to the CBC Proxy. Using this name also means we can re-use this exception in the next commits. Note that we may still care if a broadcast message has expired, since it's not expected that someone would send one in this condition.	2021-04-19 17:13:05 +01:00
Rebecca Law	34a378a60e	Update the Zendesk ticket content for `check_if_letters_still_in_created` The message to Zendesk includes a list of notification ids, this isn't really necessary and is included in the run book. Creation of the Zendesk ticket can fail if the message is too long, removing the list of ids can prevent that from happening.	2021-04-19 10:47:25 +01:00
Ben Thorner	be02573147	Fix apply_async not working with positional kwargs Celery's apply_async function accepts 'kwargs' as (get ready to be confused) either a positional argument, or a keyword argument: Positional: apply_async(['args'], {'kw': 'args'}) Keyword: apply_async(args=['args'], kwargs={'kw': 'args'}) We rely on the positional form in at least one place [1]. This fixes the overload of apply_async to cope with both forms, and continue to pass through any other (confusion time again) keyword args to super(), such as queue="queue". Note that we've also decided to stop accepting other positional args, since this is unnecessarily confusing, and we don't currently rely on it in our code. This stops it creeping in in future. [1]: `fde927e00e/app/job/rest.py (L186)`	2021-04-15 17:21:21 +01:00
Ben Thorner	fde927e00e	Merge pull request #3205 from alphagov/celery-consistency-tweaks Small refactor for new Celery / StatsD code	2021-04-15 11:40:42 +01:00
Ben Thorner	f85dad5acf	Merge pull request #3203 from alphagov/remove-statsd-decorators Remove redundant @statsd timing decorators	2021-04-14 10:04:04 +01:00
Ben Thorner	5eb265138b	Remove unnecessary statsd_client parameter It turns out this is available from the app object [1], and we were already assuming this in the tests. [1]: `48c6c822e8/notifications_utils/clients/statsd/statsd_client.py (L52)`	2021-04-13 15:12:55 +01:00
Ben Thorner	ec6d87cd0f	Simplify argument passing in apply_async This avoids the need to keep in-sync with any future changes to the signature, and reduces the amount of irrelevant code to read.	2021-04-13 15:12:45 +01:00
David McDonald	2e6d761691	Merge pull request #3204 from alphagov/broadcast-envars Broadcast envars	2021-04-12 17:25:15 +01:00
David McDonald	295162c81d	Move CBC proxy enable check This change will make our development environments closer to production even if they aren't hooked up to the CBC proxy lambda functions. Now in development, we will create the broadcast event and create tasks for each broadcast provider event. We will still not create actual broadcast provider message rows in the DB and talk to the CBC proxies. This should be helpful in development to catch any issues we introduce to do with sending broadcast messaging. In time we may wish to have some fake CBC proxies in the AWS tools account that we can interact with to make it even more realistic.	2021-04-12 17:05:41 +01:00
Ben Thorner	e3e067c795	Remove redundant @statsd timing decorators These are superseded by timing task execution generically in the NotifyTask superclass [1]. Note that we need to wait until we've gathered enough data under the new metrics before removing these. [1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376	2021-04-12 15:19:18 +01:00
Ben Thorner	3e507eea55	Merge pull request #3201 from alphagov/revamp-celery-stats Migrate towards new metrics for Celery tasks	2021-04-12 15:04:37 +01:00
Ben Thorner	ab8dd6d52c	Duplicate metrics to StatsD for Celery tasks Previously we used a '@statsd' decorator to time and count Celery tasks [1]. Using a decorator isn't ideal since we need to remember to add it to every task we define. In addition, it's not possible to use data like the task name and queue. In order to avoid breaking existing stats, this duplicates them as new StatsD metrics until we have sufficient data to update dashboards using the old ones. Using the CeleryTask superclass to send metrics avoids a future maintenance overhead, and means we can include more useful data in the StatsD metric. Note that the new metrics will sit in StatsD until we add a mapping for them [2]. StatsD automatically produces a 'count' stat for timing metrics, so we don't need to increment a separate counter for successful tasks. [1]: `dea5828d0e/app/celery/tasks.py (L65)` [2]: https://github.com/alphagov/notifications-aws/blob/master/paas/statsd/statsd-mapping.yml	2021-04-08 18:02:53 +01:00
Ben Thorner	248f5a0708	Include queue name in Celery task logs This is mainly so we can use it in the new metrics we send to StatsD in the following commits, but it should also be useful in the logs. I've taken the opportunity to make the log format consistent between success / failure, and with our Template Preview app [1]. [1]: `f456433a5a/app/celery/celery.py (L19)`	2021-04-08 18:02:51 +01:00
Ben Thorner	19be4faf45	Switch to monotonic time for task logs This matches the approach we take in utils [1]. Monotonic time is better because it avoids weird negative results due to clock shift. [1]: `5d18ebd796/notifications_utils/statsd_decorators.py (L14)`	2021-04-08 13:00:24 +01:00
Ben Thorner	054205835b	Remove unused metric for SQS apply duration This was added as part of a wider performance investigation [1]. I checked with Leo, who made the change, and while the other metrics are still useful, there's no reason to keep this one. [1]: `6e32ca5996 (diff-76936416943346b5f691dac57a64acebc6a1227293820d1d9af4791087c9fb9eR23)`	2021-04-08 13:00:21 +01:00
Leo Hemsted	4a5b1c23bd	only send zendesk P1 for alerts we don't need to be re-notified when someone clicks cancel	2021-04-08 12:22:18 +01:00
Leo Hemsted	9bd8c0239c	look for 'live', not 'production' config['NOTIFY_ENVIRONMENT'] is hardcoded to `'live'` in the Live config class. The values as seen on the environment which we send real messages from: ``` >>> json.loads(os.environ['VCAP_APPLICATION'])['space_name'] # what cloudfoundry sets 'production' >>> os.environ['NOTIFY_ENVIRONMENT'] # we set this from cloudfoundry 'production' >>> current_app.config['NOTIFY_ENVIRONMENT'] # hardcoded in the Live config 'live' >>> current_app.config['NOTIFICATION_QUEUE_PREFIX'] # pulled from env var of same name 'live' >>> current_app.config['ENV'] # this is an unrelated flask variable 'production' ```	2021-04-08 12:17:22 +01:00
Leo Hemsted	df393e36c5	send a p1 when a broadcast goes out on production it's important to keep tabs on when these things leave our system. Sending a zendesk ticket that triggers a P1 is probably our simplest way of notifying the team when this happens (it's what we do with out of hours emergencies on the admin app too). We don't have any direct pagerduty integrations from the api app, but we already have the zendesk client hooked up. After broadcasts go live, we may want to change this to a P2 (but even then, there's arguments for keeping it P1 to start with I think). Don't cause a P1 if it goes out on staging as that might be MNOs testing.	2021-04-06 11:32:19 +01:00
Katie Smith	3bfd084a77	Simplify send_delivery_status_to_service code Now that https://github.com/alphagov/notifications-api/pull/3184 has been deployed for a while, the `send_delivery_status_to_service` task will always have `template_id` and `template_version` being passed in. This means we don't need to check if those fields are there.	2021-03-30 08:50:42 +01:00
David McDonald	6d410daae4	Remove the emergency alerts canary See https://github.com/alphagov/notifications-broadcasts-infra/pull/197 for why we no longer need this and we get to delete some code!	2021-03-26 18:31:53 +00:00
Pea Tyczynska	52c529ab3a	Use personalisation to set client_reference for letters which were sent through Notify interface only. This is done to avoid performance dip from additional operation for other notification types.	2021-03-24 14:55:10 +00:00
Ben Thorner	b2b14f39a3	Merge pull request #3183 from alphagov/remove-crown-letter-filename Remove non/crown indicator in letter filenames	2021-03-24 13:06:58 +00:00
Katie Smith	27b3cece7d	Send template id and version with delivery status callback This adds the `template_id` and `template_version` fields to the data sent to services from the `send_delivery_status_to_service` task. We need to account for the task not being passed these fields at first since there might be tasks retrying which don't have that data. Once all tasks have been called with the new fields we can then update the code to assume they are always there. Since we only send delivery status callbacks for SMS and emails, I've removed the tests where we call that task with letters.	2021-03-24 10:55:45 +00:00
Ben Thorner	8219b3c032	Remove non/crown indicator in letter filenames This is not required by DVLA and since [1] we no longer care about the end of letter filenames when collating them, so removing it is safe to do. Note that the name of the ZIP files of collated letters is based on a hash of the filenames, which needed updating in tests. Before merging this we need to do a test run in Staging, so DVLA can check that a mixture of the old / new filenames won't cause issues. [1]: https://github.com/alphagov/notifications-api/pull/3172	2021-03-18 13:05:12 +00:00
Katie Smith	3b78f863d5	Check for incomplete pending jobs We have a scheduled task that was checking for jobs still in progress. We saw a case where a scheduled job was stuck in a `pending` status as a result of an app shutting down. This changes the `check_job_status` task so that it also checks for scheduled jobs which are still pending after 30 minutes.	2021-03-18 08:24:36 +00:00
Ben Thorner	c76e789f1e	Reduce extra S3 ops when working with letter PDFs Previously we did some unnecessary work: - Collate task. This had one S3 request to get a summary of the object, which was then used in another request to get the full object. We only need the size of the object, which is included in the summary [1]. - Archive task. This had one S3 request to get a summary of the object, which was then used to make another request to delete it. We still need both requests, but we can remove the S3.Object in the middle. [1]: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#objectsummary	2021-03-16 12:53:13 +00:00
Leo Hemsted	6784ae62a6	Raise Exception if letter PDF not in S3 Previously, the function would just return a presumed filename. Now that it actually checks s3, if the file doesn't exist it'll raise an exception. By default that's a StopIteration at the end of the bucket iterator, which isn't ideal as this will get suppressed if the function is called within a generator loop further up or anything. There are a couple of places where we expect the file may not exist, so we define a custom exception to rescue specifically here. I did consider subclassing boto's ClientError, but this wasn't straightforward as the constructor expects to know the operation that failed, which for me is a signal that it's not an appropriate (re-)use of the class.	2021-03-15 17:18:11 +00:00
Ben Thorner	b43a367d5f	Relax lookup of letter PDFs in S3 buckets Previously we generated the filename we expected a letter PDF to be stored at in S3, and used that to retrieve it. However, the generated filename can change over the course of a notification's lifetime e.g. if the service changes from crown ('.C.') to non-crown ('.N.'). The prefix of the filename is stable: it's based on properties of the notification - reference and creation - that don't change. This commit changes the way we interact with letter PDFs in S3: - Uploading uses the original method to generate the full file name. The method is renamed to 'generate_' to distinguish it from the new one. - Downloading uses a new 'find_' method to get the filename using just its prefix, which makes it agnostic to changes in the filename suffix. Making this change helps to decouple our code from the requirements DVLA have on the filenames. While it means more traffic to S3, we rely on S3 in any case to download the files. From experience, we know S3 is highly reliable and performant, so don't anticipate any issues. In the tests we favour using moto to mock S3, so that the behaviour is realistic. There are a couple of places where we just mock the method, since what it returns isn't important for the test. Note that, since the new method requires a notification object, we need to change a query in one place, the columns of which were only selected to appease the original method to generate a filename.	2021-03-15 13:55:44 +00:00
David McDonald	41d95378ea	Remove everything for the performance platform We no longer will send them any stats so therefore don't need the code - the code to work out the nightly stats - the performance platform client - any configuration for the client - any nightly tasks that kick off the sending of the stats We will require a change in cronitor as we no longer will have this task run meaning we need to delete the cronitor check.	2021-03-15 12:04:53 +00:00
David McDonald	8325431462	Move saving of processing time into separate task We currently do this as part of send-daily-performance-platform-stats but now this moves it into its own separate task. This is for two reasons - we will shortly get rid of the send-daily-performance-platform-stats task as we no longer will need to send anything to performance platform - even if we did decide to keep the task send-daily-performance-platform-stats and remove the specific bits that relate to the performance platform, it's probably nicer to rewrite the new task from scratch to make sure it's all clear and easy to understand	2021-03-15 11:44:01 +00:00
Ben Thorner	0379d721e5	Add missing statsd timers to celery tasks All other tasks in app/celery/*_tasks.py have timers on them. Some of these timers will be useful to check before/after performance as a way to reassure ourselves about the impact of [1]. [1]: https://github.com/alphagov/notifications-api/pull/3172	2021-03-12 12:32:22 +00:00
Ben Thorner	a91fde2fda	Run auto-correct on app/ and tests/	2021-03-12 11:45:45 +00:00
Rebecca Law	acfb759cb9	Change DVLA_EMAIL_ADDRESS to a list	2021-02-26 11:21:16 +00:00
David McDonald	82e5a1804b	Merge pull request #3155 from alphagov/migrate-broadcast-settings Backfill services_broadcast_settings table	2021-02-25 12:16:36 +00:00
Pea Tyczynska	4fc3af9811	Add date to personalisation for DVLA email Personalisation was missing date attribute. The email still got sent tonight, just it didn't have a value for date placeholder.	2021-02-24 10:22:22 +00:00
David McDonald	3ea86bfb48	Remove hardcoded default to use test channel There is no need for a default now as every broadcast service has set on it which broadcast channel to use.	2021-02-23 17:15:07 +00:00
Pea Tyczynska	e0c73ac342	Send daily email with letter and sheet volumes to DVLA	2021-02-23 15:13:19 +00:00
Pea Tyczynska	6dab63130d	Make import order alphabetical	2021-02-23 15:13:19 +00:00
Leo Hemsted	0088bcd98b	only retry if the broadcast message task is in sending previously we would retry if the task was queued up for retry but the status is in "received-ack" or "received-err". We don't expect that a task will be retried after getting this status, but if there are duplicate tasks that could happen. Let's plan for the worst by saying "only process a retry if the task is currently in sending". this way, if a duplicate task is on retry and the first task goes through successfully, the duplicate task will give up.	2021-02-18 12:03:36 +00:00
Ben Thorner	474b93f183	Remove redundant (renamed) letters task This was renamed in [1], and enough time has elapsed that instances of the task should all have finished processing. [1]: `5d6f2da155`	2021-02-17 12:57:50 +00:00
Katie Smith	6b8ebb3421	Fix linting errors	2021-02-16 09:03:38 +00:00
Leo Hemsted	fed0d4c40e	Merge pull request #3137 from alphagov/revert-revert-revert Bring back retry logic	2021-02-15 12:21:13 +00:00
David McDonald	a1e539e785	Merge pull request #3132 from alphagov/created-letters-runbook Improvements to our letter checking tasks	2021-02-12 16:30:42 +00:00
David McDonald	5526c89c34	Rename task and function for clarity This doesn't just relate to precompiled letters, it's actually just checking that there are not any letters still waiting for a virus check that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:53 +00:00
David McDonald	1b9d8252ec	Rename task and function for clarity This doesn't just relate to templated letters, it's actually just checking that there are not any letters still in created that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:52 +00:00
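
The two calling forms described in be02573147 (kwargs passed positionally or by keyword to `apply_async`) can be illustrated with a minimal sketch. This is not the repository's code: `BaseTask` is a stand-in for Celery's `Task` class so the example runs without Celery installed, and `SketchTask` is a hypothetical subclass. The only detail taken from the commit message is that `args` and `kwargs` are accepted in either form, other keyword arguments such as `queue="queue"` are passed through to `super()`, and further positional arguments are not accepted.

```python
class BaseTask:
    """Stand-in for Celery's Task class (assumption: lets the sketch run without Celery)."""

    def apply_async(self, args=None, kwargs=None, **options):
        # Echo back what was received so the two calling forms can be compared.
        return {"args": args, "kwargs": kwargs, "options": options}


class SketchTask(BaseTask):
    """Hypothetical subclass; not the repository's NotifyTask."""

    def apply_async(self, args=None, kwargs=None, **other_kwargs):
        # 'args' and 'kwargs' may arrive positionally or by keyword.
        # Anything else (e.g. queue="queue") must be given by keyword and is
        # forwarded untouched to the superclass. A third positional argument
        # raises TypeError because the signature has no slot for it.
        return super().apply_async(args, kwargs, **other_kwargs)


task = SketchTask()

# Positional form: apply_async(['args'], {'kw': 'args'})
positional = task.apply_async(["args"], {"kw": "args"}, queue="queue")

# Keyword form: apply_async(args=['args'], kwargs={'kw': 'args'})
keyword = task.apply_async(args=["args"], kwargs={"kw": "args"}, queue="queue")
```

Both calls reach the base class with the same `args`, `kwargs`, and options, which is the behaviour the commit message describes wanting from the overload.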

1116 Commits