We use exec to start awslogs_agent and then a tail to print logs to
stdout. The CF docs[1] recommend using exec to start processes, which seems
to imply that as long as there are commands running the container will
remain up and running.
This commit ensures that if there are no celery tasks running we
kill any other processes that we have started, so that the container will
no longer be considered healthy by CloudFoundry and will be replaced.
1: https://docs.cloudfoundry.org/devguide/deploy-apps/manifest.html#start-commands
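A minimal sketch of the resulting entrypoint shape; the agent path, log
path, and the celery check are assumptions, not the real script:
```
#!/usr/bin/env bash
# Illustrative only: paths and process names are assumptions.

# Start the logs agent and a tail in the background; exec means each
# background job becomes the real process rather than a wrapper shell.
(exec ./bin/awslogs-agent-launcher.sh) &
(exec tail -n0 -f /home/vcap/logs/app.log) &

# Once celery is no longer running, kill the processes we started above so
# the container stops looking healthy and CloudFoundry replaces it.
while pgrep -f celery > /dev/null; do
  sleep 1
done
kill $(jobs -p) 2> /dev/null || true
```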
In 4427827b2f, celery monitoring was
changed from using PID files to actually looking at processes.
If celery workers got OOM killed (for instance), the container init
script would not restart them. This is because `get_celery_pids` would
not find any processes containing the string celery, which would
cause the pipeline to fail (-o pipefail): APP_PIDS would not get updated,
but the script would continue to run, and so it never restarted
the celery processes.
We think the correct behaviour when celery processes are killed (i.e.
there are no more celery processes running in a container) is to kill
the container. The PaaS should then schedule new ones which may
remediate the cause of the celery processes being killed.
Upon detection of no celery processes running, some diagnostic
information from the environment is sent to the logs, e.g.:
```
CF_INSTANCE_ADDR=10.0.32.4:61012
CF_INSTANCE_INTERNAL_IP=10.255.184.9
CF_INSTANCE_GUID=81c57dbc-e706-411e-6a5f-2013
CF_INSTANCE_PORT=61012
CF_INSTANCE_IP=10.0.32.4
```
Then the script (which is the container entrypoint) exits 1.
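A sketch of the check described above; the function and variable names
follow the commit message, the rest is assumed:
```
# With `set -o pipefail`, a grep that matches nothing fails the whole
# pipeline, so an empty result has to be handled explicitly rather than
# silently leaving APP_PIDS stale.
get_celery_pids() {
  # the [c]elery trick stops grep matching its own command line
  APP_PIDS=$(ps ax -o pid,command | grep '[c]elery' | awk '{print $1}')
}

get_celery_pids
if [ -z "${APP_PIDS}" ]; then
  echo "No celery processes found - printing diagnostics and exiting"
  env | grep '^CF_INSTANCE_'   # CF_INSTANCE_ADDR, CF_INSTANCE_IP, etc.
  exit 1  # this script is the container entrypoint, so the container dies
fi
```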
Co-author: @servingupaces @tlwr
This addresses some problems that existed in the previous approach:
1. There was a race condition between the time we checked for the
existence of the .pid files and when we actually read them.
2. If for some reason the .pid file was left behind after a process had
died, the script would never know, because we do:
`kill -s ${1} ${APP_PID} || true`
Previously the script would:
try to SIGTERM each celery process every second for the 9 second
timeout, and then SIGKILL every second after that, with no upper bound.
This commit changes this to (sketched below):
* SIGTERM each process once.
* Wait nine seconds (checking each second whether the pid files are still
present).
* SIGKILL any remaining processes once.
* exit
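A sketch of that sequence, assuming the pid files live under
/home/vcap/app (the helper name is illustrative):
```
shutdown_celery() {
  pkill -TERM -f celery || true   # SIGTERM each process once

  for _ in $(seq 1 9); do         # wait up to nine seconds, checking each
    sleep 1                       # second whether pid files remain
    ls /home/vcap/app/celery*.pid > /dev/null 2>&1 || break
  done

  pkill -KILL -f celery || true   # SIGKILL any remaining processes once
  exit
}
```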
Remove pip-accel - it hasn't been updated in two years, and pins our
version of pip to a version that is several breaking changes old.
Make sure commands work if you're already in a venv - mostly by
checking for the presence of $VIRTUAL_ENV, and ensuring we use the correct
pip to install packages. Also clean up the commands a bit.
You need to `pip install celery[sqs]` to get the additional
dependencies that celery needs to use SQS queues - there are two libs:
boto3 and pycurl.
pycurl is a set of Python bindings around libcurl, so it needs to be
installed from source so it can link against your curl/SSL libs. On PaaS and
in Docker this works fine (we needed to add `libcurl4-openssl-dev` to the
Docker container), but on macOS it can't find OpenSSL. We need to pass
a couple of flags in:
* set the environment variable PYCURL_SSL_LIBRARY=openssl
* pass in the global options `build_ext` and `-I{openssl_headers_path}`.
As shown here:
https://github.com/pycurl/pycurl/issues/530#issuecomment-395403253
The env var is no biggie, but using any install-option flags disables
wheels for the whole `pip install` run. (See
https://github.com/pypa/pip/issues/2677 and
https://github.com/pypa/pip/issues/4118 for more context on the
install-options flags.) A whole bunch of our dependencies don't
install nicely from source (but do from wheel), so this commit installs
pycurl separately as an initial step, with the requisite flags, and
then installs the rest of the requirements as before.
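For example, the separate step might look like this on macOS, assuming
Homebrew's OpenSSL lives under /usr/local/opt/openssl (adjust the path
for your setup):
```
# install pycurl first, from source, pointing it at the right SSL headers;
# the --global-option flags disable wheels for this invocation only
PYCURL_SSL_LIBRARY=openssl pip install pycurl \
  --global-option=build_ext \
  --global-option="-I/usr/local/opt/openssl/include"

# then install the rest of the requirements, with wheels still enabled
pip install -r requirements.txt
```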
I've updated the makefile and bootstrap.sh files to reflect this, but
if you run `pip install -r requirements.txt` from scratch you will run
into issues.
The list of top-level dependencies is moved to requirements-app.txt,
which is used by `make freeze-requirements` to generate the full
list of requirements in requirements.txt.
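A rough sketch of what that target might do, assuming it uses a throwaway
virtualenv (the real target may differ):
```
# resolve the top-level dependencies in a clean venv, then pin everything
python -m venv /tmp/freeze-venv
/tmp/freeze-venv/bin/pip install -r requirements-app.txt
/tmp/freeze-venv/bin/pip freeze > requirements.txt
rm -rf /tmp/freeze-venv
```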
This is based on alphagov/digitalmarketplace-api#615, so rationale
from that PR applies here.
We had a problem with unpinned packages on new deployments leading
to failed tests (e.g. alphagov/notifications-admin#2144) which is
why we're implementing this now.
After re-evaluating pipenv again, this still seems like the least
disruptive approach:
* pyup.io has experimental support for Pipfile, but doesn't respect
version ranges or update hashes in the lock file
* CloudFoundry buildpack recognizes and supports Pipfiles out of the
box, but the support is relatively new. For example, until recently
CF would install dev packages during deployment. It's also based on
generating a requirements file from the Pipfile, which doesn't
properly support pinning VCS dependencies (e.g. it doesn't set the
#egg= version, meaning pip will not upgrade the package if it's
already installed).
* pipenv has a strict dependency resolution algorithm, which doesn't
appear to be well documented and can cause some unexpected failures.
For example, pipenv doesn't seem to be able to install the `awscli-cwlogs`
package at all, believing it to have a version conflict for `botocore`
(which it doesn't list as a direct dependency) while neither `pip` nor
`pip-tools` highlight any issues with it.
* While trying out `pipenv install` on our list of dependencies it would
regularly fail to install utils with a "Will try again." message.
While the installation succeeds after a retry, this doesn't inspire
confidence.
* The switch to Pipfile and pipenv-managed virtualenvs requires a series
of changes to `make` targets and scripts - replacing `pip install` with
`pipenv`, removing references to requirements files and prefixing
commands with `pipenv run`. While it's likely to simplify the overall
process of managing dependencies, it would require time to properly
implement across our applications and environments (Jenkins, PaaS,
docker containers, and dev machines).
Our application servers and celery workers write logs both to a
file that is shipped to CloudWatch and to stdout, which is picked
up by CloudFoundry and sent to Logit Logstash.
This works with gunicorn and single-worker celery deployments; however,
celery multi daemonizes worker processes, which detaches them from
stdout, so there's no log output in `cf logs` or Logit.
To fix this, we start a separate tail process to duplicate logs written
to a file to stdout, which should be picked up by CloudFoundry.
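Roughly, with an assumed log path:
```
# duplicate anything celery writes to its log file onto our stdout,
# where CloudFoundry will pick it up
tail -n0 -f /home/vcap/logs/app.log &
```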
`exec` replaces the current shell with the command it runs, which means
that script execution stops at that line.
Passing it to the background with `exec "$@" &` won't work either,
because the script will move directly to the next command, which
looks for the `.pid` files that have not yet been created, since celery
takes a few seconds to spin up all the processes.
Using `sleep X` to remedy this seems just wrong, given that:
1. we can use `eval`, which blocks until the command returns
2. there is no obvious benefit in sticking with `exec` (see the sketch
below)
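To make the contrast concrete:
```
exec "$@"    # replaces the shell: nothing below this line ever runs

exec "$@" &  # backgrounds the command, but the script immediately moves on
             # and looks for celery*.pid files that don't exist yet

eval "$@"    # blocks until `celery multi` returns, by which point the pid
             # files have been written, then the script carries on
```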
The existing script would not work with `celery multi`, as it was trying
to put it in the background and then get its PID.
`celery multi` creates a number of worker processes and stores their
PIDs in files named celeryN.pid, where N is the index number of the worker
(starting at 1).
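So once `celery multi` has returned, the worker PIDs can be read straight
from those files (the directory is an assumption):
```
# one pid per file: celery1.pid, celery2.pid, ...
APP_PIDS=$(cat /home/vcap/app/celery*.pid)
echo "celery worker pids: ${APP_PIDS}"
```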
If a PR is going to fail because tests aren’t passing, then you:
- should know about it as quickly as possible
- shouldn’t waste precious Jenkins CPU running subsequent tests
This commit adds the `-x` flag to pytest, which stops the test run as
soon as one failing test is discovered.
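The invocation becomes something like this (the test path is illustrative):
```
pytest -x tests/   # -x stops the run at the first failing test
```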
Changes the generate manifest script to parse the variables file as YAML
and only add variables to the manifest if they're already listed
in the `env` section.
This allows us to use a single variables file for all applications
and avoid duplicating secrets across multiple files while adding
only the relevant secrets to the application manifest.
`./scripts/generate_manifest.py` takes a path to a PaaS manifest file
and a list of variable files and prints a single CloudFoundry manifest.
The generated manifest replaces all `inherit` keys by loading the data
from parent manifests. This allows us to pipe the script output directly
to CF CLI, without saving it to disk, which minimises the risk of it
being accidentally included in the deployment artefact. The combined
manifest might differ from the results produced by CF CLI itself, so
the original manifest shouldn't normally be used on its own.
After combining the manifests the script will load and parse all listed
variable files and add them to the manifest's `env` by merging the files
together in the order they were listed (so in case of any key conflicts
the latest file overwrites previous values), upper-casing keys and
processing any list or dictionary values with `json.dumps`, so that they
can be set as environment variables.
This gives us a full list of environment variables that were previously
parsed from the CloudFoundry user services data.
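Example usage (the file names are illustrative), using process
substitution so the combined manifest never touches disk:
```
# generate the manifest and hand it straight to the CF CLI; later variable
# files overwrite earlier ones on key conflicts
cf push -f <(./scripts/generate_manifest.py manifests/api.yml \
    credentials/common.yml credentials/api.yml)
```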
We saw an issue where the app started, then immediately crashed due to
a setup error. However, Jenkins had already returned positively, and
the deploy continued.
cf-deploy should fail if the app doesn't start up.
We do this by looking through the CloudFoundry events, and aborting
if there are any `app.crash` events for the new GUID.
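A simplified version of the check; the real script filters events by the
new app GUID, and the variable name here is illustrative:
```
# abort the deploy if the freshly pushed app has crashed
if cf events "${APP_NAME}" | grep -q 'app\.crash'; then
  echo "new app instance crashed after deploy - aborting"
  exit 1
fi
```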
When generating a new migration we give it a number that increments
the latest existing migration on master. This means that when there
are multiple PRs open containing a migration and one of them gets
merged the others need to be updated to move their migration files
to apply on top of the recently merged one.
This requires renaming the file and changing migration references
for both the migration revision and down_revision.
If a PR introduced more than one migration, they all need to be updated
one after another, since each one needs to be renamed.
This adds a script to simplify this process. `./scripts/fix_migrations.py`
will check for any branch points. If it finds exactly one (which
should be the common case), it asks which migration should be moved
and renames / updates references to move the selected branch on top
of the other one.
It won't resolve any conflicts within the migrations themselves (e.g. if
both branches modified the same column) and it won't try to resolve
cases with more than one branch point.
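For illustration, with hypothetical migration names, the manual steps the
script automates look like this:
```
# our branch's migration 0103_add_template_type has to move on top of the
# freshly merged 0103_add_service_flag
git mv migrations/versions/0103_add_template_type.py \
       migrations/versions/0104_add_template_type.py
# then, inside 0104_add_template_type.py, update the references:
#   revision      = '0104_add_template_type'
#   down_revision = '0103_add_service_flag'
```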
click (http://click.pocoo.org/) is used by flask to run its CLI
commands. In removing flask_script (it's unmaintained), we had to
migrate all our commands to use click. This is a change for the better
in my eyes - you don't need to define the command in several places,
and it makes managing options a bit easier.
View diff with whitespace turned off unless you're a masochist.
Previously they were using the sample_service fixture under the hood, but
with full permissions added - this works fine, **unless** there's
already a service with the name "sample service" in the database. This
can happen for two reasons:
* A previous test didn't tear down correctly
* This test already invoked the sample_service fixture somehow
If this happens, we just return the existing service, without modifying
its values - values that we might change in tests, such as
research mode or letters permissions.
In the future, we'll have to be vigilant! and aware! and careful! not
to use sample_service if we're doing tests involving letters, since
they now create a service with a different name.
Previously we didn't do this because the tests all used the same DB
(test_notifications_api); however, @minglis shared a snippet that simply
creates one test db per thread.
Previously, in run_app_paas.sh, we captured stdout from the app and
piped it into the log file. However, this caused a bunch of
problems, mainly:
* exceptions with stack traces often weren’t formatted properly,
and kibana could not parse them
* celery logs were duplicated - we’d collect both the json logs and
the human readable stdout logs.
Instead, with the updated utils library, we can log json
straight to the appropriate directory.