grab the worker app name and task name rather than the web host and
endpoint. also add a fallback for when we're not in a web request or a
celery task. I think that'll probably happen when we use alembic, or if
we do things from within flask shell
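A minimal sketch of what that fallback chain might look like, assuming
flask's `has_request_context()` and celery's `current_task` are the
signals used (the helper name and the fallback label values are
hypothetical):

```python
from celery import current_task
from flask import has_request_context, request


def get_log_labels():
    # hypothetical helper: prefer celery task details, then web request
    # details, then a generic fallback for alembic / flask shell
    if current_task:
        return current_task.app.main or "celery", current_task.name
    if has_request_context():
        return request.host, request.endpoint
    return "none", "none"
```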
add the following prometheus metrics to keep track of general app
instance health (a sketch of possible definitions follows the metric
descriptions below).
# sqs_apply_async_duration
how long does the actual SQS call (a standard web request to AWS) take.
a histogram with default bucket sizes, split up per task that was
created.
# concurrent_web_request_count
how many web requests is this app currently serving. this is split up
per process, so we'd expect multiple responses per app instance
# db_connection_total_connected
how many connections does this app (process) have open to the database.
They might be idle.
# db_connection_total_checked_out
how many connections does this app (process) have open that are
currently in use by a web worker
# db_connection_open_duration_seconds
a histogram per endpoint of how long the db connection was taken from
the pool for. won't have any data if a connection was never opened.
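A sketch of how these might be declared with `prometheus_client`; the
metric names follow the descriptions above, but the label names and
bucket choices are assumptions:

```python
from prometheus_client import Gauge, Histogram

# the real registry/setup lives elsewhere in the app; these are illustrative
SQS_APPLY_ASYNC_DURATION = Histogram(
    "sqs_apply_async_duration",
    "Duration of the SQS call made when a task is created",
    ["task_name"],
)
CONCURRENT_WEB_REQUEST_COUNT = Gauge(
    "concurrent_web_request_count",
    "Web requests currently being served by this process",
)
DB_CONNECTION_TOTAL_CONNECTED = Gauge(
    "db_connection_total_connected",
    "Open database connections for this process, including idle ones",
)
DB_CONNECTION_TOTAL_CHECKED_OUT = Gauge(
    "db_connection_total_checked_out",
    "Database connections currently checked out of the pool by a web worker",
)
DB_CONNECTION_OPEN_DURATION_SECONDS = Histogram(
    "db_connection_open_duration_seconds",
    "How long a connection was checked out of the pool",
    ["endpoint"],
)
```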
The reporting worker tasks fetch large amounts of data from the db, do
some processing, then store the results back in the database. As the
reporting worker only processes the tasks that create the nightly
billing/stats tables, which aren't high performance or high volume,
we're fine with the performance hit of restarting the worker between
every task (which, based on limited local testing, takes about a second
or so).
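For reference, a hedged sketch of the restart-per-task setting, assuming
it's done through celery config (the same behaviour can also be set with
the `--max-tasks-per-child` CLI flag, and the app name here is made up):

```python
from celery import Celery

# hypothetical reporting worker app; the real one lives elsewhere
reporting_celery_app = Celery("reporting")

# recycle the worker process after every single task
reporting_celery_app.conf.worker_max_tasks_per_child = 1
```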
This causes some real funky shit with the app_context (used for
accessing current_app.logger). To access flask's global state we use the
standard way of importing `from flask import current_app`. However,
processes after the first one don't have current_app available on
shutdown (they're fine while the task is actually running), and are
unable to call `with current_app.app_context()` to create it. They
_are_ able
to call `with app.app_context()` to create it, where `app` is the
initial app that we pass in to `NotifyCelery.init_app`.
NotifyCelery.init_app is only called once, in the master process - I
think the application state is then stored and passed to the celery
workers. But then it looks like the teardown might clear it, but it
never gets set up again for the new workers? Unsure.
To fix this, store a copy of the initial flask app on the NotifyCelery
object and then use that from within the shutdown signal logging
function.
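A minimal sketch of that fix, using a simplified `NotifyCelery` and
celery's `worker_process_shutdown` signal (the attribute name and log
message are illustrative; the real `init_app` does more than this):

```python
from celery import Celery
from celery.signals import worker_process_shutdown


class NotifyCelery(Celery):
    def init_app(self, flask_app):
        # keep a reference to the initial flask app: current_app isn't
        # reliably available in forked worker processes at shutdown time
        self._flask_app = flask_app


notify_celery = NotifyCelery()


@worker_process_shutdown.connect
def log_worker_shutdown(sender=None, pid=None, exitcode=None, **kwargs):
    # use the stored app rather than flask.current_app, which may be gone
    with notify_celery._flask_app.app_context():
        notify_celery._flask_app.logger.info(
            "Celery worker process %s shut down with exit code %s", pid, exitcode
        )
```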
Nothing's ever easy ¯\_(ツ)_/¯
We want to pass the `request_id` to Celery tasks if the task is called
from an HTTP request, so that we can add the `request_id` to the logs.
This change overrides `apply_async` to add the `request_id` to the
kwargs if available. When the task is called, we then add the
`request_id` to `g` on Flask's application context.
Tasks called from `send_task` won't have a `request_id` for now, and
this change only affects tasks called from HTTP requests (not from other
tasks or from Celery Beat).
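Roughly, and with hypothetical attribute names (`request.request_id`
here stands in for however the incoming request id is exposed), the
override might look like this:

```python
from celery import Task
from flask import g, has_request_context, request


class NotifyTask(Task):
    def apply_async(self, args=None, kwargs=None, **options):
        kwargs = kwargs or {}
        if has_request_context() and hasattr(request, "request_id"):
            # only set when the task is kicked off from an HTTP request
            kwargs["request_id"] = request.request_id
        return super().apply_async(args, kwargs, **options)

    def __call__(self, *args, **kwargs):
        # the task body runs inside an app context (see the NotifyTask
        # notes further down), so g is available to the logging formatter
        g.request_id = kwargs.pop("request_id", None)
        return super().__call__(*args, **kwargs)
```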
The Notify team needs to investigate when a notification is marked as failed.
We will process the whole file and mark the notifications with the appropriate
status; if any have failed, an exception is raised.
The exception will trigger a CloudWatch error for the team to investigate.
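As a hedged sketch (the status values and the `update_notification_status`
helper are assumptions):

```python
def process_notification_status_file(rows):
    # rows: iterable of (reference, status) pairs parsed from the file
    failed_references = []
    for reference, status in rows:
        update_notification_status(reference, status)  # assumed helper
        if status in ("temporary-failure", "permanent-failure"):
            failed_references.append(reference)

    # raise only after the whole file has been processed, so every row
    # still gets its status; the exception surfaces as a CloudWatch error
    if failed_references:
        raise ValueError(f"Failed notifications in file: {failed_references}")
```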
Celery tasks require an active app context in order to use the app
logger, the database or app configuration values. Since there's no
built-in support, we create this context by using a custom celery task
class, NotifyTask.
However, NotifyTask was using `current_app`, which itself is only
available within an app context so the code was pushing an initial
context in run_celery.py.
This works with the default prefork pool implementation, but raises
a "working outside of application context" error with an eventlet
pool, since that initial application context is local to a thread and
the eventlet celery worker pool creates multiple green threads.
To avoid this, we bind NotifyTask to the app variable with a closure
and use that variable instead of `current_app` to create the context
for executing the task. This avoids any issues caused by shared
initial app context being lost when spawning additional worker threads.
We still need to keep the context push in run_celery.py for prefork
workers since it's required for logging events outside of tasks
(eg logging during `worker_process_shutdown` signal processing).
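A minimal sketch of the closure-based binding, assuming celery's
`task_cls` argument is used to install the custom task class:

```python
from celery import Celery, Task
from flask import Flask


def create_notify_celery(flask_app: Flask) -> Celery:
    # bind the flask app in a closure so every pool implementation
    # (prefork, eventlet green threads, ...) pushes a context from the
    # same app object instead of relying on a shared initial current_app
    class NotifyTask(Task):
        def __call__(self, *args, **kwargs):
            with flask_app.app_context():
                return super().__call__(*args, **kwargs)

    return Celery(flask_app.import_name, task_cls=NotifyTask)
```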
logging at info level and as such no longer prints out the celery task
timings, which were useful for finding out whether a task had been
called and how long it took. Added an extra timing message for celery
tasks to get that information back; these messages should be less
frequent than the API call logs and provide more useful information.
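The extra timing message could look something like this (a sketch only;
the real message format and where it hooks in may differ):

```python
import logging
import time

from celery import Task

logger = logging.getLogger(__name__)


class TimedTask(Task):
    # log at info level so the timing survives now that debug output is gone
    def __call__(self, *args, **kwargs):
        start = time.monotonic()
        try:
            return super().__call__(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            logger.info("Celery task %s took %.4f seconds", self.name, elapsed)
```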
they were always caught locally by celery's base handler; however,
we weren't logging them ourselves, which meant they wouldn't be put into
the JSON logs that are sent to CloudWatch.
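One way to do that is to log from the task's failure hook, sketched here
with an assumed `NotifyTask` base class:

```python
from celery import Task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)


class NotifyTask(Task):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # log the exception ourselves so it reaches the JSON logs shipped
        # to CloudWatch, instead of only being seen by celery's base handler
        logger.error("Celery task %s (id %s) failed: %r", self.name, task_id, exc)
        super().on_failure(exc, task_id, args, kwargs, einfo)
```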
We don't use boto2 on the api anymore, not since celery 4.0.2
Note - if you run locally with boto2 still installed you'll see errors
that complain about things like:
boto.exception.SQSError: SQSError: 403 Forbidden
<?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>SignatureDoesNotMatch</Code><Message>Credential should be scoped to a valid region, not 'queue'. </Message><Detail/></Error><RequestId>52207ca4-9131-58cb-89ae-2d45f06623a3</RequestId></ErrorResponse>
If so, make sure boto2 is completely uninstalled.