Commit Graph

27 Commits

Author SHA1 Message Date
David McDonald
a13b17a7c9 Transform list to json so it can be read by json.loads 2020-02-21 13:41:08 +00:00
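A minimal Python sketch of why this change was needed. A Python list printed with `str()` uses single quotes, which `json.loads` rejects, while `json.dumps` output round-trips. The list of provider names is hypothetical.

```python
import json

# Hypothetical list that might end up in an environment variable.
providers = ["mmg", "firetext"]

as_python_repr = str(providers)  # "['mmg', 'firetext']"
as_json = json.dumps(providers)  # '["mmg", "firetext"]'


def parse(value):
    """Return the parsed value, or None if it isn't valid JSON."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return None


print(parse(as_python_repr))  # None: single-quoted strings aren't valid JSON
print(parse(as_json))         # ['mmg', 'firetext']
```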
David McDonald
2dc5550159 Change variable name to make it more descriptive
Also remove an unnecessary if statement
Also add a manifest change to make sure the relevant environment variables
make it into the app
2020-02-20 15:48:15 +00:00
David McDonald
9de3a9ff43 Set healthcheck timeout as integer
It isn't allowed to be a non-integer, so we've upped it to 3 rather than
going down to 2 (a fairly arbitrary choice).
2019-12-19 10:55:38 +00:00
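The choice above amounts to rounding the 2.5 second timeout from 5aaf109ae1 up rather than down. A one-function Python sketch, assuming (as the commit says) that only whole seconds are accepted; the helper name is hypothetical.

```python
import math


def health_check_timeout(seconds):
    """Round a desired timeout up to a whole number of seconds.

    Rounding up never makes the check stricter than intended.
    """
    return math.ceil(seconds)


print(health_check_timeout(2.5))  # 3
```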
David McDonald
5aaf109ae1 Up API health check timeout to 2.5 seconds
This was after we saw an instance of the API failing its health check
even though it was still healthy enough to serve requests to users.

This follows the change we've also made to template-preview and admin of
upping the health check timeout. Unlike those, where we set it to 10
seconds, we have been less lenient here and only chosen 2.5 seconds.
This was at the suggestion of Toby from PaaS, as the api should generally
have quicker response times, and users would be more affected if we let
an instance that was unable to serve requests successfully stick around
for 10 seconds.
2019-12-18 10:36:04 +00:00
Leo Hemsted
4701e5d9af don't define MMG_URL and FIRETEXT_URL in manifest
these URLs never change, and defining them led to surprising issues where an
updated default MMG_URL wasn't actually respected on PaaS. These URLs
aren't private and don't need to be stored in credentials.

By not defining them in the manifest, we expect them to use the default
unless `cf set-env` has been specifically used to modify them in an app.
2019-12-04 15:26:49 +00:00
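The fallback behaviour described above can be sketched in Python as an environment lookup with a code default. The default URL and the `mmg_url` helper are hypothetical placeholders, not the app's real config.

```python
import os

# Hypothetical default; the real value lives in the app's config code.
DEFAULT_MMG_URL = "https://mmg.example.com/send"


def mmg_url(environ=os.environ):
    """Use the code default unless MMG_URL is explicitly set (e.g. via cf set-env)."""
    return environ.get("MMG_URL", DEFAULT_MMG_URL)


print(mmg_url({}))                                   # falls back to the default
print(mmg_url({"MMG_URL": "https://stub.example"}))  # explicit override wins
```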
Leo Hemsted
e094dd4bfd remove loadtesting from providers
we don't use it since we wrote our own provider stubs for performance
tests.

this removes it from the api - it's still in the DB and will be
retrieved by queries, but is set to disabled on prod
2019-10-23 11:45:07 +01:00
Leo Hemsted
9c2ded00c1 scale up api to 25 instances before deploy on production
deploys take up to five minutes, during which notify-paas-autoscaler
can't scale the app. We saw 502s due to a large volume of traffic
coming in during that time, and we couldn't react because we were
deploying.

scale up to 25 instances; the autoscaler won't be able to scale down
until after the deploy has finished.
2019-10-14 16:12:35 +01:00
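A hedged sketch of the pre-deploy step as a Python helper that builds, but does not run, the cf CLI command. The instance count and the production-only condition come from the commit; the function and app name are illustrative.

```python
PRE_DEPLOY_INSTANCES = 25


def pre_deploy_commands(app, environment):
    """Return the cf CLI commands to run before pushing, as argv lists."""
    if environment != "production":
        return []
    # `cf scale APP -i N` sets the number of instances for an app.
    return [["cf", "scale", app, "-i", str(PRE_DEPLOY_INSTANCES)]]


print(pre_deploy_commands("notify-api", "production"))
```

The argv lists could be handed to `subprocess.run` in a deploy script; building them separately keeps the sketch runnable without a cf login.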
Leo Hemsted
3a0bf2b23e Add reporting worker
also remove references to the unused statistics queue
2019-08-15 16:42:15 +01:00
Athanasios Voutsadakis
cd936d2e71 Enable statsd exporter for production
Also bump the utils version to include a fix to the error handling logic
for when we fail to send a metric.
2019-08-14 11:42:13 +01:00
Andy Paine
088f234185 REP-340: Use PaaS statsd exporter
- We are running the statsd-exporter on the PaaS now so we can use the
  internal UDP route to talk to it
- Still only update preview and staging so that we can get the
  dashboards fully up to date before switching prod
2019-08-05 10:36:58 +01:00
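For illustration, a minimal Python sketch of sending one StatsD counter over UDP, as an app would to an exporter on an internal route. The hostname and port are assumptions: `apps.internal` is Cloud Foundry's default internal domain, but the actual route and port aren't given in the commit.

```python
import socket

# Assumed values: the real internal route and port aren't stated above.
STATSD_HOST = "notify-statsd-exporter.apps.internal"
STATSD_PORT = 8125


def format_counter(name, value=1):
    """Format a StatsD counter in the plain-text line protocol."""
    return f"{name}:{value}|c"


def send_metric(line, host=STATSD_HOST, port=STATSD_PORT):
    """Fire-and-forget a single metric over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

UDP suits metrics because a lost packet only drops one data point and the sender never blocks waiting for the exporter.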
Andy Paine
57705fd6fe AUTO: Explicitly include FIRETEXT_URL in manifest
- We are explicit about MMG_URL but not FIRETEXT_URL
- credentials has already been updated (checked by doing make
  generate-manifest for all envs)
2019-06-14 15:22:18 +01:00
Andy Paine
2d17827780 AUTO: Enable statsd exporter on staging
- We want to do some load testing so we want to use the Prometheus
  metrics for observing the system
- Roll out the statsd exporter work to staging too
2019-06-10 11:12:44 +01:00
Andy Paine
e61619f3e0 AUTO-413: Point preview statsd at tools
- We are running a statsd exporter on tools to collect all our statsd
  metrics for scraping by Prometheus
- Update preview to point there instead of at the local one which has
  issues with redeployment and DNS changing
2019-05-30 17:03:08 +01:00
Andy Paine
adf81ef689 BAU: Use port health checks for API
- We've been seeing an issue where, during traffic spikes, the http health
  checks take over 1s and PaaS kills the app
- Port health checks aren't affected by requests being stuck in a queue, so
  they should continue to work even at high loads
- We have functional tests to catch a deployment that brings up the app
  (and so passes the port health check) but then doesn't work
2019-05-30 11:56:19 +01:00
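A rough Python sketch of what a port-type check verifies: only that a TCP connection can be opened. This approximates the idea and is not Cloud Foundry's actual health checker.

```python
import socket


def port_health_check(host, port, timeout=1.0):
    """Healthy if a TCP connection can be opened; no HTTP request is sent."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The kernel completes the TCP handshake into the listen backlog, so a check like this can pass while every worker is busy, which is the behaviour the commit relies on. It also says nothing about whether requests are served correctly, hence the functional tests.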
Andy Paine
655d5a4e16 AUTO-413: Use an internal app for statsd preview
- We are running statsd exporter as an app with a public route for
  Prometheus to scrape
- This updates preview to send statsd metrics over the CF internal
  networking to the statsd exporter
- Removes the sidecar statsd exporters too
2019-05-23 11:10:33 +01:00
Leo Hemsted
10a6f32a09 add routes for all apps
all apps get a route assigned when using v3-zdt-push.

> By default, the web process has a route and one instance. Other processes have zero instances by default.

([source](https://docs.cloudfoundry.org/devguide/multiple-processes.html))

When we push apps to multiple environments they need different routes
or the second push will fail, so this means that we need to define
routes ourselves for every app.

We're also manually flagging the health-check as either "http" or
"process" - http for the api, process for all others.

If not specified, cloudfoundry sets the health check to `port`. We've
seen some issues with upgrading the deployment from v2 to v3 when using
port: it adds apps to the load balancer before they're ready, which can
result in 404s. By setting the health check to http, it'll wait for the
/status endpoint to return 200, which only happens once flask has got
everything up and running properly.
2019-05-15 16:01:28 +01:00
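The two rules in this commit can be sketched in Python: http checks for the API, process checks for everything else, and a distinct route per app and environment so pushes don't clash. The app names, route format and domain are hypothetical. Note that adf81ef689 later moved the API to port checks.

```python
# Hypothetical naming scheme; the real routes live in the manifest template.
def health_check_type(app_name):
    """'http' for the API, 'process' for every other app."""
    return "http" if app_name == "notify-api" else "process"


def route_for(app_name, environment, domain="apps.example.com"):
    """Give each app a distinct route per environment."""
    return f"{app_name}-{environment}.{domain}"


print(health_check_type("notify-api"))      # http
print(route_for("notify-api", "preview"))   # notify-api-preview.apps.example.com
```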
Leo Hemsted
7a711cf314 Revert "Zero downtime deploy" 2019-05-15 13:48:40 +01:00
Leo Hemsted
53806f168d add routes for all apps
all apps get a route assigned when using v3-zdt-push.

> By default, the web process has a route and one instance. Other processes have zero instances by default.

([source](https://docs.cloudfoundry.org/devguide/multiple-processes.html))

When we push apps to multiple environments they need different routes
or the second push will fail, so this means that we need to define
routes ourselves for every app.

We're also manually flagging the health-check as either "http" or
"process" - http for the api, process for all others.
2019-05-09 14:37:03 +01:00
Leo Hemsted
fda3f4b41a Revert "Stub out SMS providers on staging for the perf tests" 2019-05-07 15:32:35 +01:00
Pea Tyczynska
b59bca0fc2 Rename workers so they are less wordy xd 2019-05-01 14:51:43 +01:00
Pea Tyczynska
6163ca8b45 Change distribution of queues among notify delivery workers
This is so that the retry-tasks queue, which can carry quite a lot of
load, has its own worker, and the other queues are paired with queues
that flow similarly:
- letter-tasks with create-letters-pdf-tasks
- job-tasks with database-tasks
2019-04-30 12:03:06 +01:00
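The new distribution can be written as a simple mapping. The queue names and pairings come from the commit; the worker names are hypothetical, and the helper assumes Celery workers started with the comma-separated `-Q` option.

```python
# Hypothetical worker names; queue names and pairings are from the commit.
WORKER_QUEUES = {
    "delivery-worker-retry-tasks": ["retry-tasks"],
    "delivery-worker-letters": ["letter-tasks", "create-letters-pdf-tasks"],
    "delivery-worker-jobs": ["job-tasks", "database-tasks"],
}


def celery_queue_arg(worker):
    """Build the comma-separated queue list passed to `celery worker -Q`."""
    return ",".join(WORKER_QUEUES[worker])


print(celery_queue_arg("delivery-worker-jobs"))  # job-tasks,database-tasks
```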
Alexey Bezhan
bc7d91daec Merge pull request #2468 from alphagov/local-statsd-exporter
Local statsd exporter
2019-04-24 15:27:55 +01:00
Alexey Bezhan
ba2abc9127 Add a local_statsd configuration to PaaS manifest template
Running `statsd_exporter` alongside the app process allows us to get
StatsD metrics pushed by workers to Prometheus.

This requires adding a route to the worker instances and binding the
RE prometheus discovery service, so this approach won't work for the API
and admin, since they already have `gunicorn` bound to `$PORT`.

Since we're not ready to switch all apps to Prometheus metrics at once
and we don't currently have a way to push statsd metrics to multiple
destinations, we're using a configuration setting in the manifest template
to switch individual workers in specific environments.

`local_statsd` contains a list of environments where the app should
use local `statsd_exporter` for pushing statsd metrics instead of
HostedGraphite.
2019-04-24 13:50:13 +01:00
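A Python sketch of the `local_statsd` switch: an app uses the local exporter only in environments it lists, and HostedGraphite otherwise. The app and environment values and the HostedGraphite hostname are hypothetical placeholders; the real logic lives in the jinja manifest template.

```python
# Hypothetical values: which apps opt in, and where, isn't stated above.
LOCAL_STATSD = {
    "notify-delivery-worker": ["preview"],
}
HOSTED_GRAPHITE_HOST = "hostedgraphite.example"  # placeholder hostname


def statsd_host(app, environment):
    """Local statsd_exporter if this app opts in here, else HostedGraphite."""
    if environment in LOCAL_STATSD.get(app, []):
        return "localhost"
    return HOSTED_GRAPHITE_HOST
```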
Alexey Bezhan
7520cc46de Stub out SMS providers on staging for the perf tests
This points MMG and Firetext on staging to a stub service run on
PaaS to avoid text message costs during the load test.
2019-04-24 11:37:41 +01:00
Leo Hemsted
c786258a60 give instances and NOTIFY_APP_NAME sensible defaults
NOTIFY_APP_NAME follows precedent and just tries to strip 'notify-'
from the beginning of the string.

instances is not specified at all if not defined: it'll scale to the
same number of instances as are currently present, and then the
autoscaler will take over anyway
2019-04-11 14:57:22 +01:00
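A Python equivalent of the two defaults described above, as a sketch. The actual logic is in the jinja template, and the option dictionary shape is illustrative.

```python
def notify_app_name(cf_app):
    """Strip a leading 'notify-' from the CF app name, if present."""
    prefix = "notify-"
    return cf_app[len(prefix):] if cf_app.startswith(prefix) else cf_app


def app_options(cf_app, instances=None):
    """Build app options, leaving out 'instances' entirely when not given."""
    options = {"name": cf_app, "env": {"NOTIFY_APP_NAME": notify_app_name(cf_app)}}
    if instances is not None:
        options["instances"] = instances
    return options


print(notify_app_name("notify-api"))  # api
```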
Leo Hemsted
7d9cd58e89 rename public-api to api
we don't use the public-api bit anywhere - even cloudwatch overwrites it
based on CW_APP_NAME (which we can get rid of now that this distinction
is gone)
2019-04-10 15:19:46 +01:00
Leo Hemsted
66ca98fbfb create manifest from jinja template
newer versions of cf api don't allow you to have multiple apps per
manifest file. So, instead of our current inheritance-based model, move
to the newer jinja-based model, as used by doc-dl/antivirus/template-preview.

the new single manifest.yml.j2 file sets a bunch of variables based on
the CF_APP variable - things like NOTIFY_APP_NAME, default instances,
etc. Then the manifest is built up to define all of the app options
based on these defaults. Things default to sensible values, which can
vary based on environment.

When adding new environment variables, you'll need to add them to the
manifest file. If they're JSON-encoded lists, you'll need to pass them
through the `tojson` filter, or jinja2 will print them as python lists,
with single quotes around strings.
2019-04-10 15:15:48 +01:00
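A small demonstration of the `tojson` point, assuming the third-party jinja2 package (version 2.9 or later, where `tojson` is a built-in filter) is installed. The variable name is hypothetical.

```python
import json

from jinja2 import Template  # third-party: assumes jinja2 >= 2.9

values = ["a", "b"]

plain = Template("{{ values }}").render(values=values)
as_json = Template("{{ values | tojson }}").render(values=values)

print(plain)    # ['a', 'b']  (Python list repr, single quotes)
print(as_json)  # ["a", "b"]  (valid JSON)
```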