10 Commits

Author SHA1 Message Date
David McDonald
d4ed909d0f Revert "Revert "Statsd to prometheus"" 2020-07-01 13:27:12 +01:00
David McDonald
5fb58260e2 Revert "Statsd to prometheus" 2020-07-01 10:00:39 +01:00
David McDonald
043e6ac69c Add GDS metrics package to admin app
Follows the code from the API
2020-06-30 14:24:34 +01:00
David McDonald
6958c0d677 Remove statsd
We don't expose these metrics anywhere anyway and we want to move to
prometheus too (which will be done in the next commit)
2020-06-30 11:08:11 +01:00
David McDonald
5f548a395a Remove unused environment variable
We no longer use this.

See f56795655e for further details.
2020-03-06 13:25:53 +00:00
David McDonald
f8dc3936fc Change http healthcheck invocation timeout to 10 seconds
We have seen multiple issues in production where healthchecks have
failed for our applications because responses have taken longer than
1 second (the default health check invocation timeout). This has
marked the instance as unhealthy and restarted it. The restarts have
dropped inflight requests and caused 502s for our users.

We are not entirely sure why the healthchecks sometimes take longer than
expected. One hypothesis is large amounts of traffic slowing response
times of the apps; however, we have also seen contradictory evidence
where health checks can still fail even when apps are getting very low
levels of traffic. There could also be an issue with the actual
healthcheck process itself.

Regardless of the cause, we think that changing the timeout to 10 seconds
might stop our apps being restarted when they are in fact still
healthy enough to serve requests to users. Further investigation will
also be done by the PaaS team into the health check process itself to
see if this throws any more light on the situation.

10 seconds was a fairly arbitrary choice that was significantly longer
than 1 second.
2019-11-28 13:35:54 +00:00
Andy Paine
5242f67d97 REP-340: Use PaaS hosted stats exporter
- We are running the statsd exporter on PaaS now and we can route to it
  on apps.internal
- Send metrics there instead so they end up in Prometheus
2019-08-05 13:47:53 +01:00
Leo Hemsted
89c7a87af2 use http healthcheck
prevents traffic being routed to apps that haven't warmed up yet
2019-05-17 14:50:36 +01:00
Leo Hemsted
9f10cd0b82 ensure env vars in manifest are quoted
if they're not defined in the credentials file, they should be an empty
string, rather than null.
2019-04-18 15:14:32 +01:00
Leo Hemsted
9547fedcc3 move manifest to single jinja template
uses the same pattern as other apps. Infers route based on app name;
you can declare more in the dict at the top if you need to.
2019-04-16 14:46:00 +01:00