Commit Graph

14 Commits

Athanasios Voutsadakis
3528aab25b Kill the other processes started by the script
We use `exec` to start awslogs_agent and then a `tail` to print logs to
stdout. The CF docs[1] recommend using `exec` to start processes, which
seems to imply that as long as there are commands running the container
will remain up and running.

This commit ensures that if there are no celery tasks running we kill
any other processes we have started, so that the container is no longer
considered healthy by Cloud Foundry and is replaced.

1: https://docs.cloudfoundry.org/devguide/deploy-apps/manifest.html#start-commands
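A minimal sketch of that cleanup step (the process names, `started_pids`
and `celery_running` are illustrative stand-ins, not the actual script):

```shell
#!/usr/bin/env bash
# Sketch: when no celery processes remain, kill every other process
# this entrypoint started, so the entrypoint itself can exit.
set -euo pipefail

started_pids=()

# Start a background process and remember its PID (a stand-in for
# awslogs_agent / the log tail).
sleep 60 &
started_pids+=("$!")

celery_running() {
  # The real script checks for running celery workers; here we
  # hard-code "not running" so the sketch terminates.
  return 1
}

if ! celery_running; then
  for pid in "${started_pids[@]}"; do
    kill "$pid" 2>/dev/null || true
  done
fi

# With its children gone, the entrypoint can exit, so Cloud Foundry
# stops considering the container healthy and replaces it.
echo "cleaned up ${#started_pids[@]} process(es)"
```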
2019-01-23 16:23:58 +00:00
Toby Lorne
afcdf1f9a1 Exit if celery processes are not running
In 4427827b2f celery monitoring was
changed from using PID files to actually looking at processes.

If celery workers were OOM-killed (for instance), the container init
script would not restart them. This is because `get_celery_pids` would
not find any processes whose command line contained the string celery,
which caused the pipeline to fail (`set -o pipefail`). APP_PIDS would
not get updated, but the script would continue to run and so never
restarted the celery processes.

We think the correct behaviour when celery processes are killed (i.e.
there are no more celery processes running in a container) is to kill
the container. The PaaS should then schedule new ones which may
remediate the cause of the celery processes being killed.

Upon detection of no celery processes running, some diagnostic
information from the environment is sent to the logs, e.g.:

```
CF_INSTANCE_ADDR=10.0.32.4:61012
CF_INSTANCE_INTERNAL_IP=10.255.184.9
CF_INSTANCE_GUID=81c57dbc-e706-411e-6a5f-2013
CF_INSTANCE_PORT=61012
CF_INSTANCE_IP=10.0.32.4
```

Then the script (which is the container entrypoint) exits 1.
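Roughly, the failure mode and the new behaviour look like this
(`get_celery_pids` and `APP_PIDS` are named in the commit message; the
rest of this block is an illustrative sketch, not the actual script):

```shell
#!/usr/bin/env bash
# With `set -o pipefail`, a pipeline whose first command finds no
# matching processes fails; we detect that explicitly instead of
# silently carrying on.
set -o pipefail

get_celery_pids() {
  # Stand-in for the real check. The [c] trick stops pgrep from
  # matching this script's own command line; pgrep exits 1 when
  # nothing matches, and pipefail propagates that through the pipe.
  pgrep -f '[c]elery-worker-that-does-not-exist' | sort -n
}

exit_code=0
if ! APP_PIDS="$(get_celery_pids)"; then
  # No celery workers left: log the CF_INSTANCE_* environment for
  # diagnosis; the real entrypoint then exits 1.
  env | grep '^CF_INSTANCE_' || true
  echo "no celery processes found" >&2
  exit_code=1
fi

echo "would exit with: $exit_code"
```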

Co-author: @servingupaces @tlwr
2019-01-23 14:28:53 +00:00
Athanasios Voutsadakis
4427827b2f Handle celery PIDs more reliably
This addresses some problems that existed in the previous approach:

1. There was a race condition between checking for the existence of
the `.pid` files and actually reading them.

2. If for some reason a `.pid` file was left behind after its process
had died, the script would never know, because we do:

    kill -s ${1} ${APP_PID} || true
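A safer read tolerates a file vanishing mid-loop and verifies the PID
is actually live (the directory and the stale PID here are
illustrative, not the actual script):

```shell
#!/usr/bin/env bash
# Sketch: read celery*.pid files defensively. A file may disappear
# between globbing and reading, and a leftover file may name a PID
# that no longer exists, so check it with `kill -0`.
set -u

PID_DIR="$(mktemp -d)"
echo 99999999 > "$PID_DIR/celery1.pid"   # stale PID: no such process

live_pids=()
for pid_file in "$PID_DIR"/celery*.pid; do
  # The file may have vanished since the glob was expanded.
  pid="$(cat "$pid_file" 2>/dev/null)" || continue
  # A leftover .pid file does not mean the process is alive.
  if kill -0 "$pid" 2>/dev/null; then
    live_pids+=("$pid")
  fi
done

echo "live celery processes: ${#live_pids[@]}"
rm -rf "$PID_DIR"
```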
2019-01-17 16:55:44 +00:00
Leo Hemsted
80454579ee fix multi worker exit script
previously the script would:

try to SIGTERM each celery process every second for the 9-second
timeout, and then SIGKILL every second after that, with no upper bound.

This commit changes this to:
* SIGTERM each process once.
* Wait nine seconds, checking each second whether the pid files are
  still present.
* SIGKILL any remaining processes once.
* Exit.
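The new sequence could be sketched like this, with a throwaway `sleep`
standing in for a celery worker (names and the liveness check are
illustrative, not the actual script):

```shell
#!/usr/bin/env bash
# Sketch: SIGTERM once, wait up to 9s, SIGKILL once, exit.
set -u

TIMEOUT=9

sleep 300 &
worker_pids=("$!")

alive() { kill -0 "$1" 2>/dev/null; }

# 1. SIGTERM each process once.
for pid in "${worker_pids[@]}"; do
  kill -s TERM "$pid" 2>/dev/null || true
done

# 2. Wait up to TIMEOUT seconds, checking once per second whether
#    any worker is still running.
for _ in $(seq 1 "$TIMEOUT"); do
  any_alive=false
  for pid in "${worker_pids[@]}"; do
    alive "$pid" && any_alive=true
  done
  $any_alive || break
  sleep 1
done

# 3. SIGKILL any remaining process once, then exit.
for pid in "${worker_pids[@]}"; do
  alive "$pid" && kill -s KILL "$pid" 2>/dev/null
done

echo "shutdown complete"
```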
2019-01-02 15:15:46 +00:00
Alexey Bezhan
676e3ec39a Stream delivery worker logs to stdout when running on PaaS
Our application servers and celery workers write logs both to a
file that is shipped to CloudWatch and to stdout, which is picked
up by CloudFoundry and sent to Logit Logstash.

This works with gunicorn and single-worker celery deployments, however
celery multi daemonizes worker processes, which detaches them from
stdout, so there's no log output in `cf logs` or Logit.

To fix this, we start a separate tail process to duplicate logs written
to a file to stdout, which should be picked up by CloudFoundry.
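A minimal sketch of the tail trick (paths and timings here are
illustrative, not the actual start script):

```shell
#!/usr/bin/env bash
# Sketch: a backgrounded `tail` follows the log file and copies each
# new line to the entrypoint's stdout, which Cloud Foundry picks up
# even though the daemonized workers are detached from stdout.
set -u

LOG_FILE="$(mktemp)"

# -F keeps following across rotation; -n 0 skips pre-existing lines.
tail -F -n 0 "$LOG_FILE" &
TAIL_PID=$!

# Simulate a daemonized worker writing to the shared log file.
sleep 0.2                       # give tail a moment to start
echo "worker log line" >> "$LOG_FILE"
sleep 0.5                       # give tail a moment to print it

kill "$TAIL_PID" 2>/dev/null || true
rm -f "$LOG_FILE"
```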
2018-06-29 11:49:02 +01:00
Athanasios Voutsadakis
d365921093 Address PR comments
* Make the timeout 9s in both files
* Use a more descriptive variable name
* Declare the logs dir as a constant
2018-03-06 16:11:42 +00:00
Athanasios Voutsadakis
3045f6233b Use continue instead of break on SIGKILL
Using `break` would keep trying to kill the same process and never
move on to the next one.
2018-02-28 14:50:23 +00:00
Athanasios Voutsadakis
b158c66705 Set timeout to 9
This will allow us an extra second to `kill -9` any remaining processes.
2018-02-28 14:37:02 +00:00
Athanasios Voutsadakis
7d562f9e85 Get empty array when no .pid file exists 2018-02-28 14:36:40 +00:00
Athanasios Voutsadakis
1adee47230 Always get the latest PIDs to check 2018-02-28 14:36:06 +00:00
Athanasios Voutsadakis
138d4eee25 Better logging of PIDs array 2018-02-28 14:34:48 +00:00
Athanasios Voutsadakis
bef80e3414 Use eval to run the command instead of exec
`exec` replaces the current shell with the command, which means that
script execution stops at that line.

Sending it to the background with `exec "$@" &` won't work either,
because the script will move straight on to the next command, which
looks for the `.pid` files that have not yet been created, since celery
takes a few seconds to spin up all the processes.

Using `sleep X` to remedy this seems wrong, given that

1. we can use `eval`, which blocks until the command returns
2. there is no obvious benefit to sticking with `exec`
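The difference can be demonstrated in isolation (a sketch, not the
actual start script; the function names are made up):

```shell
#!/usr/bin/env bash
# Sketch: `exec` replaces the shell, so nothing after it runs;
# `eval` runs the command in the current shell and blocks until it
# returns.

run_with_eval() {
  eval "echo started"
  echo "after eval"          # reached: eval returned
}

run_with_exec() {
  # In a subshell so we don't replace this script's own process.
  (
    exec echo started
    echo "after exec"        # never reached: the subshell is gone
  )
}

eval_out="$(run_with_eval)"
exec_out="$(run_with_exec)"

echo "eval: $eval_out"
echo "exec: $exec_out"
```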
2018-02-28 14:16:15 +00:00
Athanasios Voutsadakis
9a8e771d95 Lower the termination timeout to 10s
Cloud Foundry gives us only 10s to terminate our processes:
https://docs.cloudfoundry.org/devguide/deploy-apps/app-lifecycle.html#shutdown
2018-02-27 17:44:16 +00:00
Athanasios Voutsadakis
7c36c3894d Add a start script for running celery multi
The existing script would not work with `celery multi`, as it was trying
to put it in the background and then get its PID.

`celery multi` creates a number of worker processes and stores their
PIDs in files named celeryN.pid, where N is the index number of the worker
(starting at 1).
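Collecting those worker PIDs might look like this (the directory, worker
count and PID values are illustrative; the real files are written by
celery itself):

```shell
#!/usr/bin/env bash
# Sketch: gather the PIDs of `celery multi` workers from the
# celeryN.pid files they leave behind.
set -u

PID_DIR="$(mktemp -d)"

# Pretend `celery multi` wrote one pid file per worker.
for n in 1 2 3; do
  echo "$((1000 + n))" > "$PID_DIR/celery$n.pid"
done

APP_PIDS=()
for pid_file in "$PID_DIR"/celery*.pid; do
  APP_PIDS+=("$(cat "$pid_file")")
done

echo "workers: ${#APP_PIDS[@]} pids: ${APP_PIDS[*]}"
rm -rf "$PID_DIR"
```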
2018-02-27 17:37:09 +00:00