Most of the time the admin app takes to generate a page is spent
waiting for the API. This is slow for three reasons:
1. Talking to the API means going out to the internet, then through
   nginx, the Flask app, SQLAlchemy, down to the database, and then
   serialising the result to JSON and turning it into an HTTP response
2. Each call to the API is synchronous, so if a page needs 3 API
   calls to render then the second call won’t be made until the
   first has finished, and the third won’t start until the second has
   finished
3. Every request for a service page in the admin app makes a minimum
of two requests to the API (`GET /service/…` and `GET /user/…`)
Hitting the database will always be the slowest part of an app like
Notify. But this slowness is exacerbated by 2. and 3. Conversely,
every speedup made to 1. is multiplied by 2. and 3.
So this pull request aims to make 1. a _lot_ faster by taking nginx,
Flask, SQLAlchemy and the database out of the equation. It replaces them
with Redis, which, as an in-memory key/value store, is a lot faster than
Postgres. There is still the overhead of going across the network to
talk to Redis, but the net improvement is vast.
This commit only caches the `GET /service` response, but is written in
such a way that we can easily expand to caching other responses down the
line.
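In outline, the caching layer could look like the sketch below. The key
format, function names and the in-memory stand-in for Redis are all
illustrative, not the real implementation:

```python
import json
import time
from functools import wraps


class FakeRedis:
    """In-memory stand-in for a Redis client, so the sketch runs
    without a server; real code would use e.g. redis.StrictRedis."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        return None

    def set(self, key, value, ex):
        self.store[key] = (value, time.time() + ex)


redis_client = FakeRedis()


def cache(key_template, ttl=24 * 60 * 60):
    """Cache a JSON-serialisable return value under a key built from
    the decorated function's keyword arguments."""
    def decorator(func):
        @wraps(func)
        def wrapper(**kwargs):
            key = key_template.format(**kwargs)
            cached = redis_client.get(key)
            if cached is not None:
                return json.loads(cached)
            result = func(**kwargs)
            # the 24h expiry bounds how long a stale entry can survive
            redis_client.set(key, json.dumps(result), ex=ttl)
            return result
        return wrapper
    return decorator


api_calls = []


@cache("service-{service_id}")
def get_service(service_id):
    api_calls.append(service_id)  # stands in for the real API round trip
    return {"id": service_id, "name": "Example service"}


get_service(service_id="123")
get_service(service_id="123")  # second call is served from the cache
```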
The tradeoff here is that our code is more complex, and we risk
introducing edge cases where a cache becomes stale. The mitigations
against this are:
- invalidating all caches after 24h so a stale cache doesn’t remain
around indefinitely
- being careful when we add new stuff to the service response
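Beyond the 24h expiry, writes also need to invalidate: when a service is
updated, the cached copy should be deleted so that the next read goes
back to the API. A minimal sketch, with a plain dict standing in for
Redis and made-up key names:

```python
# plain dict standing in for Redis, keyed the same way as the cache
cache_store = {"service-123": '{"name": "Old name"}'}


def update_service(service_id, data):
    # ...call the API to persist the change (omitted)...
    # then delete the cached copy so the next read is fetched fresh
    cache_store.pop(f"service-{service_id}", None)


update_service("123", {"name": "New name"})  # cache entry for 123 is gone
```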
---
Some indicative numbers, based on:
- `GET http://localhost:6012/services/<service_id>/template/<template_id>`
- with the admin app running locally
- talking to Redis running locally
- also talking to the API running locally, itself talking to a local
Postgres instance
- times measured with Chrome web inspector, average of 10 requests
|  | No cache | Cache service | Cache service and user | Cache service, user and template |
| -- | -- | -- | -- | -- |
| **Request time** | 136ms | 97ms | 73ms | 37ms |
| **Improvement** | 0% | 41% | 88% | 265% |
---
Estimates of how much storage this requires:
- Services: 1,942 on production × 2KB = 4MB
- Users: 4,534 on production × 2KB = 9MB
- Templates: 7,079 on production × 4KB = 28MB
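These figures can be sanity-checked with some quick arithmetic (the
counts come from production; the per-entry sizes are assumed averages):

```python
# (count on production, assumed average KB per cached entry)
estimates = {
    "services": (1_942, 2),
    "users": (4_534, 2),
    "templates": (7_079, 4),
}

for name, (count, kb) in estimates.items():
    print(f"{name}: {count * kb / 1024:.1f}MB")

total_kb = sum(count * kb for count, kb in estimates.values())
print(f"total: {total_kb / 1024:.1f}MB")  # roughly 40MB in all
```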
* Juror Central Summoning Bureau - Online – HM Courts & Tribunals Service
* Richmond and Wandsworth Council – Richmond and Wandsworth Council
* ESFA feedback – Department for Education
* PINS Digital Service – Planning Inspectorate
* HMPO Belfast – Home Office
* food.gov.uk – Food Standards Agency
* NCC Mobile Team – Newcastle City Council
* Cambs CC Emergency planning – Cambridgeshire County Council
* HMPO Durham – Home Office
* HMPO London – Home Office
* HMPO Glasgow – Home Office
* HMPO Peterborough – Home Office
* HMPO Southport – Home Office
* Student Loans Company. Password Reset. – Student Loans Company
* DigITS – Crown Commercial Service
* Ofsted Reports Beta – Ofsted
* Revenue Accounts – Ministry of Housing, Communities & Local Government
* Luton Council - Electoral Registration – Luton Council
* Cael eich Pensiwn y Wladwriaeth – Department for Work and Pensions
* Warwick Crown Court – Ministry of Justice
* ProLive - Housing Repairs – Pembrokeshire County Council
* Wirral Council – Wirral Council
* DfT Bus Open Data – Department for Transport
* Paris File Watcher – Pembrokeshire County Council
* WFDC Electoral Services – Wyre Forest District Council
Adding a ‘testing’ template is not enough. It needs to have some
real-looking content, so that we can:
- work out what a service is doing
- assess whether that’s a reasonable (i.e. meeting the terms of use)
  thing to be doing with Notify
At the moment we’re having to go back to services quite a lot when they
request to go live and ask them for this information.
The check page now expects the template ID to be passed through in the
URL, not the session. The send test letter page wasn’t changed to match.
This commit changes it, and adds a test to make sure this path is
covered.
The start job endpoint needs the template ID in order to make the API
call.
It doesn’t make sense to add it to the start job URL, because users
could potentially start a job with the wrong template by hacking the URL
(which would blow up at some point, if the template didn’t match the
columns in the file).
As of this commit’s parent we are storing `template_id` and
`original_file_name` in the URL. Getting them from the URL is better,
so the check page no longer needs to look for them in the session. This
commit removes the code that looks for these values in the session.
At the moment you can’t press refresh on the check page if there are
errors, because the session gets cleared when there are errors. This is
a bad user experience.
The data that this page is relying on (from the session) is:
- template ID
- original file name
Neither of these things need to be in the session because:
- they are not secret
- the user can modify them already (by choosing a different template or
renaming their file locally)
So this commit additionally stores them in the URL.
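As a sketch, the helper that builds the check-page URL could look like
this (the route shape and parameter names are hypothetical):

```python
from urllib.parse import urlencode


def check_page_url(service_id, template_id, original_file_name):
    # carry both values in the query string instead of the session;
    # they aren't secret, and the user can change them anyway
    query = urlencode({
        "template_id": template_id,
        "original_file_name": original_file_name,
    })
    return f"/services/{service_id}/check?{query}"


url = check_page_url("abc", "123", "contacts.csv")
# → /services/abc/check?template_id=123&original_file_name=contacts.csv
```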
If a user signs in again, clear their file upload data from any
aborted journeys from before, so that their cookies don’t fill up.
Also add some temporary logging for when the session starts getting
full.
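A rough sketch of both changes (the `upload_` key prefix and the byte
threshold are assumptions, not the real names):

```python
import logging

logger = logging.getLogger(__name__)

# browsers cap cookies at roughly 4KB, so warn somewhere below that
SESSION_WARN_BYTES = 3_000


def clear_aborted_uploads(session):
    """On sign-in, drop leftover file-upload data from abandoned
    journeys so the session cookie doesn't keep growing."""
    for key in [k for k in session if k.startswith("upload_")]:
        del session[key]


def warn_if_session_full(serialised_session):
    # temporary logging to see how close real sessions get to the limit
    if len(serialised_session) > SESSION_WARN_BYTES:
        logger.warning("Session is %d bytes", len(serialised_session))


session = {"upload_1234": {"file": "contacts.csv"}, "user_id": "abc"}
clear_aborted_uploads(session)  # only user_id remains
```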