Reduce concurrency and prefetch count of reporting celery app

We have seen the reporting app run out of memory multiple times when
dealing with overnight tasks. The app currently runs 11 worker
processes and we reduce this to 2 to put less pressure on a single
instance.

The number 2 was chosen because most of the tasks processed by the
reporting app take only a few minutes, and usually only one or two take
more than an hour. With 2 processes across our current 2 instances, a
long running task should hopefully only wait behind a few short running
tasks before being picked up, so we shouldn't see a large increase in
the overall time taken to run all our overnight reporting tasks.

On top of reducing the concurrency for the reporting app, we also set
CELERYD_PREFETCH_MULTIPLIER=1. We do this because the app deals with
long running tasks, as suggested by the celery docs:
https://docs.celeryproject.org/en/3.1/userguide/optimizing.html#optimizing-prefetch-limit
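
As a rough sketch of why the multiplier matters (our own illustration,
not code from this commit): the number of tasks an instance reserves at
once is its concurrency multiplied by the prefetch multiplier.

```python
# Illustration only (not code from this commit): how many tasks one
# instance reserves from the queue before any of its workers are free.
def tasks_reserved(concurrency, prefetch_multiplier):
    return concurrency * prefetch_multiplier

# Celery's default multiplier is 4; even with our new concurrency of 2
# that would still reserve 8 tasks per instance.
print(tasks_reserved(2, 4))  # 8
# With the multiplier set to 1, an instance only reserves one task per
# worker process.
print(tasks_reserved(2, 1))  # 2
```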

The change in prefetch multiplier should again optimise the overall time
it takes to process our tasks by ensuring that tasks are given to
instances that have (or will soon have) spare workers to deal with them,
rather than committing all the tasks to certain workers in advance.

Note: another suggestion from the optimisation docs is to set
`ACKS_LATE` on the long running tasks. This setting would effectively
change us from prefetching 1 task per worker to prefetching 0 tasks per
worker, further optimising how we distribute our tasks across instances.
However, we decided not to try this setting as we weren't sure whether
it would conflict with our visibility_timeout. We decided not to spend
the time investigating, but it may be worth revisiting in the future, as
long as the tasks are idempotent.
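
If we do revisit it, a minimal sketch of what the setting might look
like (hypothetical, not part of this commit; the interaction with
visibility_timeout is exactly what we haven't verified):

```python
# Hypothetical sketch only -- NOT adopted in this commit. With late acks
# a task is acknowledged after it finishes rather than when a worker
# receives it, so workers reserve no tasks in advance.
CELERY_ACKS_LATE = True
# Caveat: a task that outlives the broker's visibility_timeout may be
# redelivered and run twice, hence the tasks must be idempotent.
```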

Overall, this commit takes us from potentially having all 18 of our
reporting tasks fetched onto a single instance to having tasks
distributed more fairly across instances, based on when they have
workers available to process them.
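
The before/after worst case can be sketched numerically (a toy model of
prefetching, not Celery internals; the 18 tasks and worker counts are
from this commit):

```python
# Toy model: the first instance to poll the queue greedily reserves
# tasks up to its limit of concurrency * prefetch multiplier.
def fetched_by_first_instance(queued_tasks, concurrency, multiplier):
    return min(queued_tasks, concurrency * multiplier)

# Old config: 11 workers with Celery's default multiplier of 4 could
# reserve up to 44 tasks, so one instance could grab all 18.
print(fetched_by_first_instance(18, 11, 4))  # 18
# New config: 2 workers with a multiplier of 1 reserve only 2 tasks; the
# other 16 stay queued for whichever instance has a free worker.
print(fetched_by_first_instance(18, 2, 1))  # 2
```
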
David McDonald
2020-04-27 15:42:48 +01:00
parent 5d88a1dbf4
commit a237162106
3 changed files with 10 additions and 2 deletions

@@ -176,6 +176,9 @@ class Config(object):
     CELERY_TASK_SERIALIZER = 'json'
     # on reporting worker, restart workers after each task is executed to help prevent memory leaks
     CELERYD_MAX_TASKS_PER_CHILD = os.getenv('CELERYD_MAX_TASKS_PER_CHILD')
+    # we can set celeryd_prefetch_multiplier to be 1 for celery apps which handle only long running tasks
+    if os.getenv('CELERYD_PREFETCH_MULTIPLIER'):
+        CELERYD_PREFETCH_MULTIPLIER = os.getenv('CELERYD_PREFETCH_MULTIPLIER')
     CELERY_IMPORTS = (
         'app.celery.tasks',
         'app.celery.scheduled_tasks',

@@ -30,7 +30,12 @@
     'notify-delivery-worker-research': {},
     'notify-delivery-worker-sender': {'disk_quota': '2G', 'memory': '3G'},
     'notify-delivery-worker-periodic': {},
-    'notify-delivery-worker-reporting': {'additional_env_vars': {'CELERYD_MAX_TASKS_PER_CHILD': 1}},
+    'notify-delivery-worker-reporting': {
+        'additional_env_vars': {
+            'CELERYD_MAX_TASKS_PER_CHILD': 1,
+            'CELERYD_PREFETCH_MULTIPLIER': 1,
+        }
+    },
     'notify-delivery-worker-priority': {},
     'notify-delivery-worker-letters': {},
     'notify-delivery-worker-retry-tasks': {},

@@ -29,7 +29,7 @@ case $NOTIFY_APP_NAME in
         -Q periodic-tasks 2> /dev/null
     ;;
   delivery-worker-reporting)
-    exec scripts/run_app_paas.sh celery -A run_celery.notify_celery worker --loglevel=INFO --concurrency=11 -Ofair \
+    exec scripts/run_app_paas.sh celery -A run_celery.notify_celery worker --loglevel=INFO --concurrency=2 -Ofair \
         -Q reporting-tasks 2> /dev/null
     ;;
   delivery-worker-priority)