Commit Graph

18 Commits

Author SHA1 Message Date
Leo Hemsted
c4dc0f64c5 dd sqlalchemy connection metrics for celery tasks
grab the worker app name and task name rather than the web host and
endpoint. also add a fallback for if we're not in a web request or a
celery task. I think that'll probably happen when we use alembic, or if
we do things from within flask shell
2020-06-12 14:52:22 +01:00
Leo Hemsted
6e32ca5996 add prometheus metrics for connection pools (both web and sql)
add the following prometheus metrics to keep track of general app
instance health.

 # sqs_apply_async_duration

how long does the actual SQS call (a standard web request to AWS) take.
a histogram with default bucket sizes, split up per task that was
created.

 # concurrent_web_request_count

how many web requests is this app currently serving. this is split up
per process, so we'd expect multiple responses per app instance

 # db_connection_total_connected

how many connections does this app (process) have open to the database.
They might be idle.

 # db_connection_total_checked_out

how many connections does this app (process) have open that are
currently in use by a web worker

 # db_connection_open_duration_seconds

a histogram per endpoint of how long the db connection was taken from
the pool for. won't have any data if a connection was never opened.
2020-06-12 14:52:22 +01:00
Leo Hemsted
ad419f7592 restart reporting worker after each task
The reporting worker tasks fetch large amounts of data from the db, do
some processing then store back in the database. As the reporting worker
only processes the create nightly billing/stats table tasks, which
aren't high performance or high volume, we're fine with the performance
hit from restarting the worker between every task (which based on
limited local testing takes about a second or so).

This causes some real funky shit with the app_context (used for
accessing current_app.logger). To access flask's global state we use the
standard way of importing `from flask import current_app`. However,
processes after the first one don't have the current_app available on
shut down (they're fine during the actual task running), and are unable
to call `with current_app.app_context()` to create it. They _are_ able
to call `with app.app_context()` to create it, where `app` is the
initial app that we pass in to `NotifyCelery.init_app`.

NotifyCelery.init_app is only called once, in the master process - I
think the application state is then stored and passed to the celery
workers. But then it looks like the teardown might clear it, but it
never gets set up again for the new workers? Unsure.

To fix this, store a copy of the initial flask app on the NotifyCelery
object and then use that from within the shutdown signal logging
function.

Nothing's ever easy ¯\_(ツ)_/¯
2020-04-24 12:28:25 +01:00
Katie Smith
ceb7cee009 Pass request_id to tasks if available
We want to pass the `request_id` to Celery tasks if the task is called
from an HTTP request, so that we can add the `request_id` to the logs.
This change overwrites `apply_async` to add the `request_id` to the
kwargs if available. When we call the task, we then add the `request_id`
to g on Flask's application context.

Tasks called from `send_task` won't have a `request_id` for now, and
this change only affects tasks called from HTTP requests (not from other
tasks or from Celery Beat).
2019-10-28 10:59:25 +00:00
Leo Hemsted
6e87b36303 remove duplication shutdown loggers
also add **kwargs to make it celery4 compatible
2018-07-20 12:09:00 +01:00
Rebecca Law
385653af44 Added a new exception type for DVLAException.
The Notify team needs to investigate when a notification is marked as failed.
We will process the whole file and mark the notifications with the appropriate status, if any are failed an exception is raised.
The exception will trigger a cloud watch error for the team to investigate.
2018-02-22 15:05:37 +00:00
Alexey Bezhan
599e611278 Create an app context for each celery task run
Celery tasks require an active app context in order to use app logger,
database or app configuration values. Since there's no builtin support
we create this context by using a custom celery task class NotifyTask.

However, NotifyTask was using `current_app`, which itself is only
available within an app context so the code was pushing an initial
context in run_celery.py.

This works with the default prefork pool implementation, but raises
a "working outside of application context" error with an eventlet
pool since that initial  application context is local to a thread
and eventlet celery worker pool will create multiple green threads.

To avoid this, we bind NotifyTask to the app variable with a closure
and use that variable instead of `current_app` to create the context
for executing the task. This avoids any issues caused by shared
initial app context being lost when spawning additional worker threads.

We still need to keep the context push in run_celery.py for prefork
workers since it's required for logging events outside of tasks
(eg logging during `worker_process_shutdown` signal processing).
2018-02-13 16:44:30 +00:00
Richard Chapman
632364633b Fixed typo in call, should be self.name not Task.name.
Task.name always returned None.
2018-02-08 17:47:59 +00:00
Richard Chapman
2d670e8cf0 Updated utils to the latest version. This version of utils has less
logging at info level and as such no longer prints out the celery task
timing which are found to be use to find out if a tasks has been called
but also the timing for the task. Added an extra timing message for
celery tasks so that it can be determined if the these are less frequent
than the API calls and provide more useful information
2018-02-05 14:58:02 +00:00
Leo Hemsted
2622866622 log unhandled celery exceptions
they were always caught locally by celery's base handler, however,
we weren't logging them ourselves, which meant it wouldn't be put into
the json logs that are sent to cloudwatch.
2017-08-31 12:52:06 +01:00
Leo Hemsted
6c61a3fc2a Revert celery4
Revert the following three pull requests:
https://github.com/alphagov/notifications-api/pull/1085
https://github.com/alphagov/notifications-api/pull/1086
https://github.com/alphagov/notifications-api/pull/1088

celery 4.0.2 looked promising, however, on staging under mild load
(5/sec api calls) the performance was actually worse than 3.1.25
2017-07-19 15:17:19 +01:00
Leo Hemsted
d577855eb3 remove sqs from region - this is only needed in boto2
We don't use boto2 on the api anymore, not since celery 4.0.2

Note - if you run locally with boto2 still installed you'll see errors
that complain about things like:

boto.exception.SQSError: SQSError: 403 Forbidden
<?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>SignatureDoesNotMatch</Code><Message>Credential should be scoped to a valid region, not 'queue'. </Message><Detail/></Error><RequestId>52207ca4-9131-58cb-89ae-2d45f06623a3</RequestId></ErrorResponse>

If so, make sure boto2 is completely uninstalled.
2017-07-13 13:16:28 +01:00
Leo Hemsted
1a03248317 temp fix to sort out circular imports 2017-07-12 13:02:19 +01:00
Martyn Inglis
0e9e8955f7 Finished celery refactor - set up config for queue prefix
LEO notes: Also made sure the Test BROKER_URL is preserved so that
tests warn you when celery isn't mocked out
2017-07-12 12:37:18 +01:00
Martyn Inglis
786adb5d71 Move Queuenames in with the celery code, revamp config to allow move to celery 4.x 2017-07-12 12:01:52 +01:00
Martyn Inglis
e0106eb1be hacked celery4.0.2 in. Runs and works
- note though this version of master I branched had split head on sqlalchemey. This needs a new master merge to fix
2017-07-12 11:58:04 +01:00
Martyn Inglis
fcc1585fdf Merge branch 'master' into celery-spike
Conflicts:
	app/__init__.py
	app/notifications/rest.py
	config.py
	wsgi.py
2016-02-12 09:36:49 +00:00
Martyn Inglis
fb41acdac9 Celery tests 2016-02-09 13:31:45 +00:00