We use exec to start awslogs_agent and then a tail to print logs to
stdout. The CF docs[1] recommend using exec to start processes, which
seems to imply that as long as those commands are running the container
will remain up and running.
This commit ensures that if no celery processes are running we kill any
other processes we have started, so that the container is no longer
considered healthy by CloudFoundry and gets replaced.
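A minimal sketch of that cleanup (the use of `jobs -p` to find our
background children is an assumption, not the script's actual code):
```
# no celery processes left: kill the other processes we started
# (awslogs_agent, tail) so the container stops being healthy
kill $(jobs -p) 2>/dev/null || true
```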
1: https://docs.cloudfoundry.org/devguide/deploy-apps/manifest.html#start-commands
In 4427827b2f, celery monitoring was changed from using PID files to
actually looking at processes.
If the celery workers get OOM-killed (for instance), the container init
script would not restart them. This is because the output of
`get_celery_pids` would no longer contain any process matching the
string "celery", which made the pipeline fail (`-o pipefail`).
APP_PIDS would not get updated, but the script would carry on running,
so it never restarted the celery processes.
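A minimal sketch of the failure mode, reconstructed from the description
above (the function body and the surrounding check are assumptions):
```
set -o pipefail

get_celery_pids() {
  # when no process matches "celery", grep exits 1, so under pipefail
  # the whole pipeline fails even though ps succeeded
  ps ax | grep '[c]elery' | awk '{print $1}'
}

if pids=$(get_celery_pids); then
  APP_PIDS="${pids}"
fi
# once the workers are gone the branch is never taken: APP_PIDS keeps
# its stale value and the script carries on as if nothing happened
```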
We think the correct behaviour when celery processes are killed (i.e.
there are no more celery processes running in a container) is to kill
the container. The PaaS should then schedule new containers, which may
remediate whatever caused the celery processes to be killed.
When no celery processes are detected, the script writes some
diagnostic information from the environment to the logs, e.g.:
```
CF_INSTANCE_ADDR=10.0.32.4:61012
CF_INSTANCE_INTERNAL_IP=10.255.184.9
CF_INSTANCE_GUID=81c57dbc-e706-411e-6a5f-2013
CF_INSTANCE_PORT=61012
CF_INSTANCE_IP=10.0.32.4
```
Then the script (which is the container entrypoint) exits 1.
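A minimal sketch of the dump (the grep pattern is inferred from the
sample output above):
```
# log every CF_INSTANCE_* variable for diagnosis, then exit non-zero
# so the platform replaces the container
env | grep '^CF_INSTANCE_'
exit 1
```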
Co-authored-by: @servingupaces
Co-authored-by: @tlwr
This addresses some problems that existed in the previous approach:
1. There was a race condition between checking for the existence of the
.pid files and actually reading them.
2. If for some reason a .pid file was left behind after its process had
died, the script would never know, because we do:
```
kill -s ${1} ${APP_PID} || true
```
The `|| true` discards the exit status, so a failed kill (no such
process) looks the same as a successful one.
Previously the script would try to SIGTERM each celery process every
second for the nine-second timeout, and then SIGKILL every second after
that, with no upper bound.
This commit changes this to (sketched below):
* SIGTERM each process once.
* Wait nine seconds, checking each second whether the pid files are
still present.
* SIGKILL any remaining processes once.
* Exit.
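A minimal sketch of the new sequence (the pid-file location is an
assumption):
```
# SIGTERM each worker once
for pidfile in /tmp/celery*.pid; do
  kill -TERM "$(cat "${pidfile}")" || true
done

# wait up to nine seconds; workers remove their pid files when they
# shut down cleanly
for _ in 1 2 3 4 5 6 7 8 9; do
  sleep 1
  ls /tmp/celery*.pid >/dev/null 2>&1 || exit 0
done

# SIGKILL anything still holding a pid file, then exit
for pidfile in /tmp/celery*.pid; do
  kill -KILL "$(cat "${pidfile}")" || true
done
exit 0
```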
Our application servers and celery workers write logs both to a
file that is shipped to CloudWatch and to stdout, which is picked
up by CloudFoundry and sent to Logit Logstash.
This works with gunicorn and single-worker celery deployments. However,
`celery multi` daemonizes its worker processes, which detaches them
from stdout, so there's no log output in `cf logs` or Logit.
To fix this, we start a separate tail process to duplicate logs written
to a file to stdout, which should be picked up by CloudFoundry.
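A minimal sketch, assuming the log file path:
```
# mirror everything appended to the shared log file onto stdout, where
# CloudFoundry picks it up and forwards it to Logstash
touch /home/vcap/logs/app.log
tail -n 0 -f /home/vcap/logs/app.log &
```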
`exec` replaces the current shell with the command, which means that
script execution stops at that line.
Backgrounding it with `exec "$@" &` won't work either, because the
script moves straight on to the next command, which looks for the
`.pid` files that have not yet been created: celery takes a few seconds
to spin up all its processes.
Using `sleep X` to remedy this seems just wrong, given that
1. we can use `eval`, which blocks until the command returns
2. there is no obvious benefit to sticking with `exec`
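A minimal sketch of the difference (the surrounding script is elided):
```
# with exec the shell is replaced, so nothing below this line runs:
#   exec "$@"

# eval blocks until `celery multi` returns, i.e. until the workers have
# been daemonized and their pid files written, then the script goes on:
eval "$@"
ls celery*.pid
```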
The existing script would not work with `celery multi`, as it was
trying to put it in the background and then get its PID.
`celery multi` creates a number of worker processes and stores their
PIDs in files named celeryN.pid, where N is the index number of the worker
(starting at 1).
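For example (the app module and pid-file path are assumptions):
```
# start three workers named celery1..celery3; %n expands to the node
# name in each worker's pid-file path
celery multi start 3 -A app --pidfile=/tmp/%n.pid
ls /tmp/celery*.pid
# /tmp/celery1.pid  /tmp/celery2.pid  /tmp/celery3.pid
```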