we have one global metrics variable `metrics = GDSMetrics()`, and we
then call `metrics.init_app` from within the flask application set up.
The v2/test_errors.py app_for_test fixture calls create_app, would call
metrics.init_app multiple times for the same metrics instance. This
causes errors, so change the fixture to session level so it only calls
once per test run.
grab the worker app name and task name rather than the web host and
endpoint. also add a fallback for if we're not in a web request or a
celery task. I think that'll probably happen when we use alembic, or if
we do things from within flask shell
add the following prometheus metrics to keep track of general app
instance health.
# sqs_apply_async_duration
how long does the actual SQS call (a standard web request to AWS) take.
a histogram with default bucket sizes, split up per task that was
created.
# concurrent_web_request_count
how many web requests is this app currently serving. this is split up
per process, so we'd expect multiple responses per app instance
# db_connection_total_connected
how many connections does this app (process) have open to the database.
They might be idle.
# db_connection_total_checked_out
how many connections does this app (process) have open that are
currently in use by a web worker
# db_connection_open_duration_seconds
a histogram per endpoint of how long the db connection was taken from
the pool for. won't have any data if a connection was never opened.
If we set an environment variable, we can stub out calls to SES and send
them to our own stub app. If the environment variable is not set, things
work as normal.
To be used alongside
https://github.com/alphagov/notifications-email-provider-stub
Now that the encryption module has been moved from this app to utils, we
can remove it from here (along with its tests) and import it from utils
instead. This also renames the `encryption.py` file to `hashing.py`,
since it no longer contains the encryption class.
It is likely this endppoint will need additional data for the UI to display, for the first iteration this will enable the /uploads page to show both letters and jobs. Only letter uploaded by the UI are included in the resultset.
Add file name to resultset.
we don't use it since we wrote our own provider stubs for performance
tests.
this removes it from the api - it's still in the DB and will be
retrieved by queries, but is set to disabled on prod
This table is no longer used or referenced in the code.
We can remove this mapping table now that the organisation to service relationship is handled by the foreign key in Services.
Handle werkzeug http errors separately. Also, improve the generic
`Exception` handler to be more flexible when exceptions don't look
like http errors.
note: because they're 405s the route isn't matched, so the v2 error
handlers don't trip properly. this might be an issue because the
internal endpoints expect a different return format. Might have to
think about how to handle this.
A recent issue with a long-running query (#2288) highlighted the
fact that even though the original HTTP connection might be closed
(for example after gorouter timeout of 15 minutes, which returns a
504 response to the client), the request worker will not be stopped.
This means that the worker is spending time and potentially DB
resources generating a response that will never be delivered.
Gunicorn's timeout setting only applies to sync workers and there
doesn't seem to be an option to interrupt individual requests in
gevent/eventlet deployments.
Since the most likely (and potentially most dangerous) scenario for
this is a long-running DB query, we can set a statement timeout on
our DB connections. This will raise a sqlalchemy.exc.OperationalError
(wrapping psycopg2.extensions.QueryCanceledError), interrupting the
request after the given timeout has been reached.
This is a Postgres client setting, so the database itself will abort
the transaction when it reaches the set timeout.
Since this will also apply to our celery tasks (including potentially
long-running nightly tasks) we set a timeout of 20 minutes to begin
with.
This can potentially be split in the future to set a different value
for each app, so that we could limit API requests even more.
By default, SQLAlchemy will start a transaction with an
existing connection without checking that the connection
is still valid.
Enabling "pre-ping" makes the ORM send a `SELECT 1` when
acquiring a connection, which should help avoid some errors
caused by connections breaking during a DB failover.
The added statement has a constant overhead for all transactions,
so we should only keep it enabled when we're planning to switch
or upgrade the database server.
https://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-pessimistic
* create template folder
* rename template folder
* get list of template folders for service (not nested/presented in any
particular way)
* delete template folder
Also removed `lazy=dynamic` from the `template_folder.templates`
relationship. lazy=dynamic returns a query object (which you can then
filter further). We just want to return the entire fetched list, at
least for now.
Created a platform-stats blueprint and moved the new platform stats
endpoint to the new blueprint (it was previously in the service
blueprint). Since the original platform stats route and the new platform
stats route are now in different blueprints, their view functions can
have the same name without any issues.
> On Python 3.3 or newer, monotonic will be an alias of time.monotonic
> from the standard library. On older versions, it will fall back to an
> equivalent implementation.
– https://pypi.org/project/monotonic/
Allows uploading documents to the Document Download API.
The client is configured with an API host and auth token. There's
no need for a flag to disable the client in the test environments
at the moment since the upload is only triggered by a specific
payload which would only be sent with an explicit goal of using
document download.
The main drive behind this is to allow us to enable http healthchecks on
the `/_status` endpoint. The healthcheck requests are happening directly
on the instances without going to the proxy to get the header properly
set.
In any case, endpoints like `/_status` should be generally accessible by
anything without requiring any form of authorization.
notable things that have been kept until migration is complete:
* passing in `organisation` to update_service will update email branding
* both `/email-branding` and `/organisation` hit the same code
* service endpoints still return organisation as well as email branding
logging at info level and as such no longer prints out the celery task
timing which are found to be use to find out if a tasks has been called
but also the timing for the task. Added an extra timing message for
celery tasks so that it can be determined if the these are less frequent
than the API calls and provide more useful information
click (http://click.pocoo.org/) is used by flask to run its cli args.
In removing flask_script (it's unmaintained), we had to migrate all our
commands to use click. This is a change for the better in my eyes - you
don't need to define the command in several places, and it makes
managing options a bit easier.
View diff with whitespace turned off unless you're a masochist.
- had to update the serialization in the model so that the date time is appended with the UTC timezone
- test has been added to ensure that the schema will validate the response correctly
This was only ever a spike into what it might look like to document
Notify’s API with Swagger (see
7c3d25a87a).
It’s no longer updated, and only talks about version 1 of the public
API.
Keeping it around now is just a liability, and gives us additional Pyup
upgrades to deal with.