notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2026-01-12 13:41:19 -05:00

Author	SHA1	Message	Date
David McDonald	a1e539e785	Merge pull request #3132 from alphagov/created-letters-runbook Improvements to our letter checking tasks	2021-02-12 16:30:42 +00:00
David McDonald	5526c89c34	Rename task and function for clarity This doesn't just relate to precompiled letters, it's actually just checking that there are not any letters still waiting for a virus check that should not be. This change to the naming makes it more accurate and therefore easier to understand	2021-02-10 15:23:53 +00:00
David McDonald	1b9d8252ec	Rename task and function for clarity This doesn't just relate to templated letters, it's actually just checking that there are not any letters still in created that should not be. This change to the naming makes it more accurate and therefore easier to understand	2021-02-10 15:23:52 +00:00
Katie Smith	5eebcf6452	Put service callback retries on a different queue At the moment, if a service callback fails, it will get put on the retry queue. This causes a potential problem though: If a service's callback server goes down, we may generate a lot of retries and this may then put a lot of items on the retry queue. The retry queue is also responsible for other important parts of Notify such as retrying message delivery and we don't want a service's callback server going down to have an impact on the rest of Notify. Putting the retries on a different queue means that tasks get processed faster than if they were put back on the same 'service-callbacks' queue.	2021-02-09 13:31:16 +00:00
Richard Baker	6256cdf792	Add proxy client for O2 cell broadcasting O2 use One-2-many CBC so we can use the O2M/CAP client. Once differences between CBCs have been worked out we can consolidate O2M clients to reduce duplication. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-01-26 11:11:44 +00:00
David McDonald	9c01d8018d	Merge pull request #3093 from alphagov/broadcast-tasks-onto-worker Broadcast tasks onto worker	2021-01-15 15:51:35 +00:00
David McDonald	060ee54a74	Enable Three CBC	2021-01-14 11:52:23 +00:00
David McDonald	78db0f9c2b	Add broadcasts worker and queue This worker will be responsible for handling all broadcast tasks. It is based on the internal worker which is currently handling broadcast tasks. Concurrency of 2 has been chosen fairly arbitrarily. Gunicorn will be running 4 worker processes so we will end up with the ability to process 8 tasks per app instance given this.	2021-01-13 16:35:27 +00:00
Pea Tyczynska	9e4176ac50	Add Vodafone client to list of allowed CBCs	2020-12-08 09:51:21 +00:00
Leo Hemsted	e2fa0116a0	add CBC_PROXY_ENABLED config flag to control if tasks are triggered previously we made some incorrect assumptions about set-up on staging and prod - they currently don't have any cbc_proxy aws creds at all. We shouldn't be attempting canaries or link tests when there's no AWS infrastructure to connect to. We also shouldn't bother writing a row into the database at all for the broadcast_provider_message since we're not even attempting to send, and we shouldn't get confused between messages that failed and messages we never wanted to send at all.	2020-11-26 10:16:22 +00:00
Leo Hemsted	7cc83e04eb	move BroadcastProvider from models.py to config.py It's not something that is tied to a database table, and was causing circular import issues	2020-11-19 15:50:37 +00:00
Leo Hemsted	bc3512467b	send messages to multiple providers at the moment only EE is enabled (this is set in app.config, but also, only EE have a function defined for them so even if another provider was enabled without changing the dict in cbc_proxy.py we won't trigger anything). this commit just adds wrapper tasks that check what providers are enabled, and invokes the send function for each provider. The send function doesn't currently distinguish between providers for now - as we only have EE set up. in the future we'll want to separate the cbc_proxy_client into separate clients for separate providers. Different providers have different lambda functions, and have different requirements. For example, we know that the two different CBC software solutions handle references to previous messages differently.	2020-11-19 15:50:37 +00:00
Pea Tyczynska	60bd9a6f82	Give providers equal shares of traffic This is done on a temporary basis for billing-related reasons.	2020-11-19 10:28:42 +00:00
Toby Lorne	00a1ba4b41	celery: link test less often This is causing the disk of the CBCs to fill up quickly, and their logrotate seems a bit flakey Reducing the rate will ensure the disks fill up less often Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-11-06 13:37:39 +00:00
Rebecca Law	29b6f84f6c	Revert "Revert "Add a task to save-api-sms for high volume services.""	2020-10-29 11:12:46 +00:00
Toby Lorne	dda71bf685	celery: add task for triggering link tests We want to periodically kick off some link tests, so that: - we are periodically communicating with the CBC Proxies - each CBC Proxy is periodically communicating with its CBC This simulation of traffic to the CBC will give us advance warning if something is going to break, or is broken, before someone tries to send a real live message Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk>	2020-10-27 15:39:28 +00:00
Toby Lorne	be90455944	Add task to send canary to cbc proxy Create and schedule a Celery task that tests if we can send a canary message to cbc proxy. This will help us know if something happens to our connection to cbc proxy. Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk> Co-authored-by: Richard <richard.baker@digital.cabinet-office.gov.uk>	2020-10-27 10:38:09 +00:00
Toby Lorne	fdacb2e0d7	broadcasts: remove references to cbc stub We are phasing out our cbc-proxy stub which displayed CAP XML messages We are in the process of testing with real CBCs, so maintaining our own stub is not useful This commit * removes the HTTP POST requests to the CBC proxy * writes up the update/cancel methods of the cbc_client (not impl) Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-10-26 10:23:49 +00:00
Toby Lorne	1badf93f0a	config: correctly get cbc_proxy_client env vars Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-10-22 12:11:22 +01:00
Toby Lorne	33ea75930a	clients: add cbc proxy clients We are going to invoke a lambda to send a message to the CBC We need a CBC Proxy Client to do this The Client will be able to send/update/cancel broadcasts in the CBC Unless we have configured the app with AWS credentials for the CBCProxyClient, we just want to use a client that does nothing: the noop client The AWS access keys are separate for the CBC Proxy vs other Notify AWS things because the CBC Proxy lives in another AWS account Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk> Co-authored-by: Pea <pea.tyczynska@digital.cabinet-office.gov.uk> Co-authored-by: Katie <katie.smith@digital.cabinet-office.gov.uk>	2020-10-20 11:23:16 +01:00
Rebecca Law	cea46a0758	On the crazy weekend we checked for scheduled jobs every minute. This changes that scheduled task to run at the top of the hour, 15, 30, and 45 minutes past	2020-10-01 14:14:08 +01:00
Rebecca Law	e86c6b23c4	Make run-scheduled jobs to run every minute	2020-09-27 11:30:12 +01:00
David McDonald	f9911c7965	Send broadcast invite email for broadcast service invites This means the copy is more accurate and mentions sending emergency alerts rather than previous copy about sending emails texts and letters.	2020-09-15 16:47:41 +01:00
Pete Herlihy	7db4a882ce	Converting resting provider split to 60/40	2020-09-08 10:17:27 +01:00
David McDonald	2daa945ef6	Turn redis back on	2020-08-11 14:21:57 +01:00
David McDonald	7b6bd21f19	Turn off redis for cred rotation Short turn off in all environments to enable us to rotate creds. We will then immediately follow up with enabling it back on.	2020-08-10 10:38:56 +01:00
David McDonald	6f66a63d67	Remove unneeded hardcoding of REDIS_ENABLED This keeps things consistent with the live environment and also how we do it for the admin app where it is entirely up to environment variables whether redis is enabled or not. This changes nothing in terms of functionality as currently in our environment variables redis is enabled for the API in staging.	2020-08-10 10:36:11 +01:00
Toby Lorne	fd389d5c76	cbc: use one stub cbc per environment so we can break preview without breaking staging or production related https://github.com/alphagov/notifications-cbc-proxy/pull/7 Signed-off-by: Toby Lorne <toby.lornewelch-richards@digital.cabinet-office.gov.uk>	2020-07-21 13:39:10 +01:00
Rebecca Law	8bba04e1ce	Update the MMG/Firetext ratio to 70/30 as requested.	2020-07-15 14:07:19 +01:00
Pea M. Tyczynska	fbdfa6416f	Merge pull request #2921 from alphagov/remove-statsd-http-api-decorators Remove statsd http api decorators and turn statsd back on for celery apps	2020-07-14 10:16:44 +01:00
Leo Hemsted	eca37d0853	add send_broadcast_message task task takes a broadcast_message_id, and makes a post to the cbc-proxy for now, hardcode the url to the notify stub. the stub requires template as the admin/api get it, so use the marshmallow schema to json dump it. Note - this also required us to tweak the BroadcastMessage.serialize function so that it converts uuids into ids - flask's jsonify function does that for free but requests.post doesn't sadly. if the request fails (either 4xx or 5xx) just raise an exception and let it bubble up for now - in the future we'll add retry logic	2020-07-09 18:23:30 +01:00
Pea Tyczynska	fff004da2b	Turn statsd back on, but not for api main app, only for delivery workers.	2020-07-09 10:10:24 +01:00
Pea Tyczynska	e02508d6f7	Turn off the sms stub and email stub in staging	2020-07-02 17:23:50 +01:00
Chris Hill-Scott	3ffdb3093b	Revert "Revert "Merge pull request #2887 from alphagov/cache-the-serialised-things"" This reverts commit `7e85e37e1d`.	2020-06-26 14:10:12 +01:00
Chris Hill-Scott	7e85e37e1d	Revert "Merge pull request #2887 from alphagov/cache-the-serialised-things" This reverts commit `b8c2c6b291`, reversing changes made to `351aca2c5a`.	2020-06-26 13:42:44 +01:00
Chris Hill-Scott	b8c2c6b291	Merge pull request #2887 from alphagov/cache-the-serialised-things Serialise and cache services and API keys	2020-06-26 09:18:45 +01:00
Leo Hemsted	bb05dcf221	disable statsd	2020-06-24 16:35:44 +01:00
Chris Hill-Scott	6a9818b5fd	Cache services and API keys in memory Same as we’ve done for templates. For high volume services this should mean avoiding calls to external services, either the database or Redis. TTL is set to 2 seconds, so that’s the maximum time it will take for revoking an API key or renaming a service to propagate. Some of the tests created services with the same service ID. This caused intermittent failures because the cache relies on unique service IDs (like we have in the real world) to key itself.	2020-06-24 08:46:13 +01:00
David McDonald	48abc8ffb5	Turn on email stub for load testing	2020-06-23 11:09:24 +01:00
Leo Hemsted	728e5eee24	re-enable statsd (and cache statsd DNS lookup) see https://github.com/alphagov/notifications-utils/pull/752 for implementation details. Cache DNS for 15 seconds. Note: This cache is per eventlet, so concurrent requests will handle their requests separately, but the cache does persist between sequential requests. re-enable statsd for all environments.	2020-06-18 11:20:00 +01:00
David McDonald	92c3170fe3	Turn off statsd for the API in all environments We have seen bad latency across all apps in the last week. After running a canary with statsd turned off we have seen these issues disappear on that canary instance. We therefore will turn off statsd for all instances of the API. We will still need to investigate tomorrow what exactly changed or is causing the issue with statsd as we hadn't made any changes to it ourselves. Note, this keeps statsd on for all the other apps but this will be a first step. We will also need to check performance of the other apps after releasing this.	2020-06-17 17:31:36 +01:00
Pea Tyczynska	bb3672100f	Turn off email stub on staging To check if it was our congestion point for the soak test. (Email stub is a bit slower to respond than Amazon SES, and the difference could lead to us having too many connections to db open as we wait for response from the stub)	2020-06-09 11:21:48 +01:00
Pea Tyczynska	2aa6470817	Point API for staging at email and sms stubs for the soak tests. This is done to avoid sending real email and sms and incurring unnecessary charges while we run the soak tests.	2020-06-08 17:14:16 +01:00
Pea Tyczynska	8b7a0b88cb	Ensure that aws ses stub client is not run in production	2020-06-04 15:43:47 +01:00
Pea Tyczynska	6422a88c8c	Stub SES email client to avoid hitting SES during load testing If we set an environment variable, we can stub out calls to SES and send them to our own stub app. If the environment variable is not set, things work as normal. To be used alongside https://github.com/alphagov/notifications-email-provider-stub	2020-06-03 11:11:43 +01:00
Pete Herlihy	8a67e14e2b	Moving SMS supplier resting point to 50/50	2020-06-02 12:27:14 +01:00
Pea Tyczynska	3a00c19390	Polish and test the small task that updates billable units for letter	2020-05-11 13:32:09 +01:00
Pea Tyczynska	24a89c1c19	Modify tasks for getting letter pdf and updating billable units So that they talk with new template preview task for pdf creation	2020-05-11 13:30:59 +01:00
David McDonald	a237162106	Reduce concurrency and prefetch count of reporting celery app We have seen the reporting app run out of memory multiple times when dealing with overnight tasks. The app runs 11 worker threads and we reduce this to 2 worker threads to put less pressure on a single instance. The number 2 was chosen as most of the tasks processed by the reporting app only take a few minutes and only one or two usually take more than an hour. This would mean with 2 processes across our current 2 instances, a long running task should hopefully only wait behind a few short running tasks before being picked up and therefore we shouldn't see a large increase in overall time taken to run all our overnight reporting tasks. On top of reducing the concurrency for the reporting app, we also set CELERYD_PREFETCH_MULTIPLIER=1. We do this as suggested by the celery docs because this app deals with long running tasks. https://docs.celeryproject.org/en/3.1/userguide/optimizing.html#optimizing-prefetch-limit The change in prefetch multiplier should again optimise the overall time it takes to process our tasks by ensuring that tasks are given to instances that have (or will soon have) spare workers to deal with them, rather than committing to putting all the tasks on certain workers in advance. Note, another optimisation suggested by the docs is to start setting `ACKS_LATE` on the long running tasks. This setting would effectively change us from prefetching 1 task per worker to prefetching 0 tasks per worker and further optimise how we distribute our tasks across instances. However, we decided not to try this setting as we weren't sure whether it would conflict with our visibility_timeout. We decided not to spend the time investigating but it may be worth revisiting in the future, as long as tasks are idempotent. Overall, this commit takes us from potentially having all 18 of our reporting tasks get fetched onto a single instance to now having a process that will ensure tasks are distributed more fairly across instances based on when they have available workers to process the tasks.	2020-04-28 10:47:46 +01:00
Leo Hemsted	ad419f7592	restart reporting worker after each task The reporting worker tasks fetch large amounts of data from the db, do some processing then store back in the database. As the reporting worker only processes the create nightly billing/stats table tasks, which aren't high performance or high volume, we're fine with the performance hit from restarting the worker between every task (which based on limited local testing takes about a second or so). This causes some real funky shit with the app_context (used for accessing current_app.logger). To access flask's global state we use the standard way of importing `from flask import current_app`. However, processes after the first one don't have the current_app available on shut down (they're fine during the actual task running), and are unable to call `with current_app.app_context()` to create it. They _are_ able to call `with app.app_context()` to create it, where `app` is the initial app that we pass in to `NotifyCelery.init_app`. NotifyCelery.init_app is only called once, in the master process - I think the application state is then stored and passed to the celery workers. But then it looks like the teardown might clear it, but it never gets set up again for the new workers? Unsure. To fix this, store a copy of the initial flask app on the NotifyCelery object and then use that from within the shutdown signal logging function. Nothing's ever easy ¯\_(ツ)_/¯	2020-04-24 12:28:25 +01:00


297 Commits
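Commits bc3512467b and e2fa0116a0 describe the broadcast send path: a wrapper checks which providers are enabled, calls the send function for each enabled provider that has one defined, and does nothing at all when `CBC_PROXY_ENABLED` is off. The following is a minimal Python sketch of that control flow under those assumptions, not the code from those commits. The function name `send_to_enabled_providers`, the lowercase provider keys and the `senders` mapping are hypothetical, and in the real app each send would be a Celery task rather than a plain call.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: a send function takes a broadcast message id.
SendFunction = Callable[[str], None]


def send_to_enabled_providers(
    broadcast_message_id: str,
    enabled_providers: Iterable[str],
    senders: Dict[str, SendFunction],
    cbc_proxy_enabled: bool,
) -> List[str]:
    """Call the send function for every enabled provider that has one.

    Returns the providers that were actually sent to.
    """
    if not cbc_proxy_enabled:
        # No CBC proxy infrastructure to talk to: skip sending entirely,
        # so nothing is attempted (or recorded) for this message.
        return []

    sent: List[str] = []
    for provider in enabled_providers:
        send = senders.get(provider)
        if send is None:
            # Enabled in config but no send function defined: trigger nothing.
            continue
        send(broadcast_message_id)
        sent.append(provider)
    return sent
```

With `senders={"ee": ...}` and `enabled_providers=["ee", "three"]`, only `"ee"` is called, which mirrors the note in bc3512467b that enabling another provider without defining a function for it triggers nothing.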
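Commit 33ea75930a chooses a CBC proxy client at start-up: one that invokes a Lambda in the CBC proxy's separate AWS account when credentials are configured, and a noop client otherwise. Below is a minimal sketch of that choice, assuming a Lambda client object with an `invoke(FunctionName=..., InvocationType=..., Payload=...)` method like boto3's. The class names, the `send` method, the factory and the function-name argument are all hypothetical.

```python
import json
from typing import Any, Callable, Optional, Protocol


class LambdaClient(Protocol):
    """Anything with a boto3-style Lambda `invoke` method."""

    def invoke(self, **kwargs: Any) -> Any: ...


class CBCProxyNoopClient:
    """Used when no CBC proxy credentials are configured: every call does nothing."""

    def send(self, payload: dict) -> None:
        return None


class CBCProxyLambdaClient:
    """Sends a broadcast by synchronously invoking a Lambda function in the CBC proxy account."""

    def __init__(self, lambda_client: LambdaClient, function_name: str) -> None:
        self._lambda_client = lambda_client
        self._function_name = function_name

    def send(self, payload: dict) -> None:
        self._lambda_client.invoke(
            FunctionName=self._function_name,
            InvocationType="RequestResponse",
            Payload=json.dumps(payload).encode("utf-8"),
        )


def make_cbc_proxy_client(
    aws_access_key_id: Optional[str],
    aws_secret_access_key: Optional[str],
    lambda_client_factory: Callable[[str, str], LambdaClient],
    function_name: str,
):
    # Without both keys there is nothing to connect to, so fall back to the noop client.
    if not (aws_access_key_id and aws_secret_access_key):
        return CBCProxyNoopClient()
    return CBCProxyLambdaClient(
        lambda_client_factory(aws_access_key_id, aws_secret_access_key), function_name
    )
```

In a deployed app the factory could build a boto3 Lambda client from the two keys plus a region; keeping it injectable means the noop/real decision can be tested with a fake, without AWS access.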
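Several commits in this listing change the resting SMS traffic split between providers: 50/50 in June 2020, MMG/Firetext at 70/30 in July, 60/40 in September, then equal shares in November. One way to apply such a split is a weighted random choice per message. The sketch below assumes that approach and is not necessarily how Notify routes traffic; `choose_sms_provider` and the provider keys are hypothetical.

```python
import random
from typing import Dict, Optional


def choose_sms_provider(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick a provider with probability proportional to its weight (e.g. 70/30)."""
    # Providers with a zero weight are never chosen.
    positive = {name: w for name, w in weights.items() if w > 0}
    if not positive:
        raise ValueError("at least one provider needs a positive weight")
    rng = rng or random.Random()
    names = list(positive)
    return rng.choices(names, weights=[positive[n] for n in names], k=1)[0]
```

Over many messages, `{"mmg": 70, "firetext": 30}` sends roughly 70% of traffic to the first provider; changing the split is then a matter of changing the weights.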
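Commit 6a9818b5fd caches services and API keys in memory with a 2-second TTL, so revoking a key or renaming a service takes at most that long to propagate. The following is a minimal TTL cache illustrating that trade-off; it is not the code from the commit, and `TTLCache` and `get_or_load` are hypothetical names. The clock is injectable so expiry can be tested without sleeping. As the commit notes, correctness depends on keys being unique, such as real service IDs.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """In-process cache whose entries expire `ttl` seconds after they were stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], V]) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            # Still fresh: skip the database/Redis round trip.
            return entry[1]
        # Missing or expired: reload and remember when we stored it.
        value = loader()
        self._entries[key] = (now, value)
        return value
```

Because each process keeps its own cache, two instances can briefly disagree, but never for longer than the TTL.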
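Commits a237162106 and ad419f7592 tune the reporting worker: concurrency down from 11 to 2, `CELERYD_PREFETCH_MULTIPLIER=1`, and a restart after every task. Celery's prefetch count is the concurrency multiplied by the prefetch multiplier. Assuming the multiplier was previously at Celery 3.1's default of 4, 11 processes could reserve up to 44 tasks, enough for all 18 reporting tasks to land on one instance. The settings use Celery 3.1's uppercase names. Expressing the restart-after-each-task behaviour as `CELERYD_MAX_TASKS_PER_CHILD=1` is an assumption about how it could be configured, not necessarily how the commit did it.

```python
# Celery 3.1-style setting names for the reporting worker.
REPORTING_WORKER_SETTINGS = {
    "CELERYD_CONCURRENCY": 2,          # down from 11 worker processes
    "CELERYD_PREFETCH_MULTIPLIER": 1,  # reserve one task per process, suited to long tasks
    "CELERYD_MAX_TASKS_PER_CHILD": 1,  # assumption: replace each process after one task
}

CELERY_3_1_DEFAULT_PREFETCH_MULTIPLIER = 4


def prefetch_limit(concurrency: int, multiplier: int) -> int:
    """Maximum number of unacknowledged tasks one worker instance may reserve."""
    return concurrency * multiplier


before = prefetch_limit(11, CELERY_3_1_DEFAULT_PREFETCH_MULTIPLIER)
after = prefetch_limit(
    REPORTING_WORKER_SETTINGS["CELERYD_CONCURRENCY"],
    REPORTING_WORKER_SETTINGS["CELERYD_PREFETCH_MULTIPLIER"],
)
print(before, after)  # 44 2
```

With a limit of 2 per instance and 2 instances, at most 4 of the 18 tasks are reserved at once, and the rest stay on the queue until a process frees up.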