Reduce concurrency and prefetch count of reporting celery app

We have seen the reporting app run out of memory multiple times when
dealing with overnight tasks. The app currently runs 11 worker
processes and we reduce this to 2 to put less pressure on a single
instance.

The number 2 was chosen because most of the tasks processed by the
reporting app take only a few minutes, and usually only one or two take
more than an hour. With 2 processes across our current 2 instances, a
long running task should hopefully only wait behind a few short running
tasks before being picked up, so we shouldn't see a large increase in
the overall time taken to run all our overnight reporting tasks.

On top of reducing the concurrency for the reporting app, we also set
CELERYD_PREFETCH_MULTIPLIER=1. We do this because the app deals with
long running tasks, as suggested by the celery docs:
https://docs.celeryproject.org/en/3.1/userguide/optimizing.html#optimizing-prefetch-limit
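
As a rough sketch of why the multiplier matters (our own illustration,
not code from this commit): the number of tasks an instance reserves at
once is its concurrency multiplied by the prefetch multiplier.

```python
# Illustration only (not code from this commit): how many tasks one
# instance reserves from the queue before any of its workers are free.
def tasks_reserved(concurrency, prefetch_multiplier):
    return concurrency * prefetch_multiplier

# Celery's default multiplier is 4; even with our new concurrency of 2
# that would still reserve 8 tasks per instance.
print(tasks_reserved(2, 4))  # 8
# With the multiplier set to 1, an instance only reserves one task per
# worker process.
print(tasks_reserved(2, 1))  # 2
```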

The change in prefetch multiplier should again optimise the overall time
it takes to process our tasks by ensuring that tasks are given to
instances that have (or will soon have) spare workers to deal with them,
rather than committing all the tasks to certain workers in advance.

Note: another suggestion from the optimisation docs is to set
`ACKS_LATE` on the long running tasks. This setting would effectively
change us from prefetching 1 task per worker to prefetching 0 tasks per
worker, further optimising how we distribute our tasks across instances.
However, we decided not to try this setting as we weren't sure whether
it would conflict with our visibility_timeout. We decided not to spend
the time investigating, but it may be worth revisiting in the future, as
long as the tasks are idempotent.
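
If we do revisit it, a minimal sketch of what the setting might look
like (hypothetical, not part of this commit; the interaction with
visibility_timeout is exactly what we haven't verified):

```python
# Hypothetical sketch only -- NOT adopted in this commit. With late acks
# a task is acknowledged after it finishes rather than when a worker
# receives it, so workers reserve no tasks in advance.
CELERY_ACKS_LATE = True
# Caveat: a task that outlives the broker's visibility_timeout may be
# redelivered and run twice, hence the tasks must be idempotent.
```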

Overall, this commit takes us from potentially having all 18 of our
reporting tasks fetched onto a single instance to having tasks
distributed more fairly across instances, based on when they have
workers available to process them.
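
The before/after worst case can be sketched numerically (a toy model of
prefetching, not Celery internals; the 18 tasks and worker counts are
from this commit):

```python
# Toy model: the first instance to poll the queue greedily reserves
# tasks up to its limit of concurrency * prefetch multiplier.
def fetched_by_first_instance(queued_tasks, concurrency, multiplier):
    return min(queued_tasks, concurrency * multiplier)

# Old config: 11 workers with Celery's default multiplier of 4 could
# reserve up to 44 tasks, so one instance could grab all 18.
print(fetched_by_first_instance(18, 11, 4))  # 18
# New config: 2 workers with a multiplier of 1 reserve only 2 tasks; the
# other 16 stay queued for whichever instance has a free worker.
print(fetched_by_first_instance(18, 2, 1))  # 2
```
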
David McDonald
2020-04-27 15:42:48 +01:00
parent 5d88a1dbf4
commit a237162106
3 changed files with 10 additions and 2 deletions

@@ -176,6 +176,9 @@ class Config(object):
     CELERY_TASK_SERIALIZER = 'json'
     # on reporting worker, restart workers after each task is executed to help prevent memory leaks
     CELERYD_MAX_TASKS_PER_CHILD = os.getenv('CELERYD_MAX_TASKS_PER_CHILD')
+    # we can set celeryd_prefetch_multiplier to be 1 for celery apps which handle only long running tasks
+    if os.getenv('CELERYD_PREFETCH_MULTIPLIER'):
+        CELERYD_PREFETCH_MULTIPLIER = os.getenv('CELERYD_PREFETCH_MULTIPLIER')
     CELERY_IMPORTS = (
         'app.celery.tasks',
         'app.celery.scheduled_tasks',

@@ -30,7 +30,12 @@
     'notify-delivery-worker-research': {},
     'notify-delivery-worker-sender': {'disk_quota': '2G', 'memory': '3G'},
     'notify-delivery-worker-periodic': {},
-    'notify-delivery-worker-reporting': {'additional_env_vars': {'CELERYD_MAX_TASKS_PER_CHILD': 1}},
+    'notify-delivery-worker-reporting': {
+        'additional_env_vars': {
+            'CELERYD_MAX_TASKS_PER_CHILD': 1,
+            'CELERYD_PREFETCH_MULTIPLIER': 1,
+        }
+    },
     'notify-delivery-worker-priority': {},
     'notify-delivery-worker-letters': {},
     'notify-delivery-worker-retry-tasks': {},

@@ -29,7 +29,7 @@ case $NOTIFY_APP_NAME in
         -Q periodic-tasks 2> /dev/null
     ;;
   delivery-worker-reporting)
-    exec scripts/run_app_paas.sh celery -A run_celery.notify_celery worker --loglevel=INFO --concurrency=11 -Ofair \
+    exec scripts/run_app_paas.sh celery -A run_celery.notify_celery worker --loglevel=INFO --concurrency=2 -Ofair \
         -Q reporting-tasks 2> /dev/null
     ;;
   delivery-worker-priority)