The app is now always getting an organisation by its `id` (or domain),
and never by `service_id`.
This means that the client method and associated mocking can be removed.
I think the client method must date from a time when the service
response didn’t include `organisation_id`, but now it always does.
A lot of pages in the admin app are now generated entirely from Redis,
without touching the API.
The one remaining API call that a lot of pages make, when the user is
a platform admin or a member of an organisation, is to get the name of
the current service’s organisation.
This commit adds some code to start caching that as well, which should
speed up page loads when we’re clicking around the admin app (it
typically takes 100ms just to get the organisation, and more than that
when the API is under load).
This means changing the service model to get the organisation from the
API by ID, not by service ID. Otherwise it would be very hard to clear
the cache if the name of the organisation ever changed.
We can’t cache the whole organisation because it has a
`count_of_live_services` field which can change at any time, without
the organisation itself being updated.
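As a rough sketch of the shape this takes (the key format, URL and
client objects here are illustrative assumptions, not the real code):

```python
import redis
import requests

# Assumptions for illustration: Redis and the API both running locally
redis_client = redis.Redis()
API_BASE_URL = 'http://localhost:6011'
ONE_DAY_IN_SECONDS = 24 * 60 * 60


def get_organisation_name(organisation_id):
    # Key on the organisation's own ID (not a service ID) so there is
    # exactly one entry to clear if the organisation's name changes
    key = f'organisation-{organisation_id}-name'

    cached = redis_client.get(key)
    if cached:
        return cached.decode('utf-8')

    organisation = requests.get(
        f'{API_BASE_URL}/organisations/{organisation_id}'
    ).json()

    # Cache only the name: volatile fields like
    # `count_of_live_services` would go stale without warning
    redis_client.set(key, organisation['name'], ex=ONE_DAY_IN_SECONDS)
    return organisation['name']
```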
Updating an organisation’s branding might now also update the branding
of services associated with that organisation. This is similar to how
updating an organisation’s type can update the organisation type for its
services.
In the latter case we already make sure to clear the cached versions of
these services held in Redis.
This commit does the same cache clearing when updating an
organisation’s branding (and does a bit of refactoring to do so without
duplicating code).
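A minimal sketch of the refactored shape, where one code path clears
the cached services whichever attribute triggered the cascade (endpoint
paths, key format and field names are assumptions):

```python
import redis
import requests

redis_client = redis.Redis()
API_BASE_URL = 'http://localhost:6011'  # assumed local API, as above


def update_organisation(organisation_id, **kwargs):
    requests.post(
        f'{API_BASE_URL}/organisations/{organisation_id}',
        json=kwargs,
    )

    # Changing the organisation's type or branding cascades down to its
    # services, so their cached copies are now stale and must go too
    if {'organisation_type', 'email_branding_id'} & kwargs.keys():
        services = requests.get(
            f'{API_BASE_URL}/organisations/{organisation_id}/services'
        ).json()
        for service in services:
            redis_client.delete(f"service-{service['id']}")
```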
This should make the ‘All organisations’ page load a lil’ bit quicker.
Still worth caching the domains separately so the response is smaller
when we only care about domains. This is because the code that uses the
domains is part of the sign up flow, so it’s really important that it’s
snappy.
The domains lookup is a bit slow because it’s serialising all the
organisations in the database. Since we’re putting this in the sign up
flow it should feel snappy, so let’s cache just the domain bit of it in
Redis.
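Sketched out, the idea is to cache just the flattened list of domains
under its own key, rather than the full serialised organisations (the
key name, endpoint and field names are assumed):

```python
import json

import redis
import requests

redis_client = redis.Redis()
API_BASE_URL = 'http://localhost:6011'  # assumed local API, as above
ONE_DAY_IN_SECONDS = 24 * 60 * 60


def get_organisation_domains():
    # One small, hardcoded key for the whole list: much less data than
    # the serialised organisations, so the sign up flow stays snappy
    cached = redis_client.get('domains')
    if cached:
        return json.loads(cached)

    organisations = requests.get(f'{API_BASE_URL}/organisations').json()
    domains = sorted(
        domain
        for organisation in organisations
        for domain in organisation['domains']
    )

    redis_client.set('domains', json.dumps(domains), ex=ONE_DAY_IN_SECONDS)
    return domains
```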
At the moment we have to update a YAML file and deploy the change to get
a new domain whitelisted.
We already have a mechanism for adding new domains – organisations.
This commit extends the validation to look in the `domains` table on the
API if it can’t find anything in the YAML whitelist.
This has the advantages of:
- not having to deploy code to whitelist a new domain
- forcing us to create new organisations as they come along, so that
users’ services automatically get allocated to the organisation once
their domain is whitelisted
This is the first step of replacing the `domains.yml` file.
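In outline, the validation becomes a two-step lookup: the YAML
whitelist first, then the API’s `domains` table. This is a simplified
sketch (the file path, endpoint and exact matching rules are
assumptions, and the real code will also need to handle subdomains):

```python
import requests
import yaml

API_BASE_URL = 'http://localhost:6011'  # assumed local API, as above


def is_allowed_email_domain(email_address):
    domain = email_address.split('@')[-1].lower()

    # Step 1: the static whitelist, exactly as before
    with open('app/domains.yml') as f:
        if domain in yaml.safe_load(f):
            return True

    # Step 2: fall back to the domains the API knows about, so
    # whitelisting a new domain means creating an organisation rather
    # than deploying code
    return domain in requests.get(f'{API_BASE_URL}/domains').json()
```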
In order to replicate the functionality we get from the `domains.yml`
file and its associated code, this commit adds an `Organisation`
model. This model copies a lot of methods from the
`AgreementInfo` class which wrapped the `domains.yml` file.
It factors out some functionality that would otherwise be duplicated
between the `Organisation` and `Service` models, in such a way that it
could be reused for making other models in the future.
This commit doesn’t change other parts of the code to make use of this
new model yet – that will come in subsequent commits.
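Something like the following gives the flavour of the factored-out
part: a small base class that wraps the JSON from the API and exposes
a declared set of fields as attributes (the name `JSONModel` and the
field lists are assumptions for illustration):

```python
class JSONModel:

    ALLOWED_PROPERTIES = set()

    def __init__(self, _dict):
        # Each model instance just wraps the dict the API returned
        self._dict = _dict or {}

    def __getattr__(self, attr):
        # Only declared fields are exposed, so a typo fails loudly
        if attr in self.ALLOWED_PROPERTIES:
            return self._dict[attr]
        raise AttributeError(
            f'{type(self).__name__} has no attribute {attr!r}'
        )


class Organisation(JSONModel):
    ALLOWED_PROPERTIES = {'id', 'name', 'domains'}


class Service(JSONModel):
    ALLOWED_PROPERTIES = {'id', 'name', 'organisation_id'}
```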
This returns the data calculated by the API, stored in Redis against a
hardcoded key, so that no-one hammering the home page is directly
hitting the database.
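That pattern, roughly (the key name and endpoint are illustrative):

```python
import json

import redis
import requests

redis_client = redis.Redis()
API_BASE_URL = 'http://localhost:6011'  # assumed local API, as above


def get_global_stats():
    # One fixed key shared by every visitor: however hard the home page
    # gets hit, the database is only asked once per expiry period
    cached = redis_client.get('global-stats')
    if cached:
        return json.loads(cached)

    stats = requests.get(f'{API_BASE_URL}/stats').json()
    redis_client.set('global-stats', json.dumps(stats), ex=24 * 60 * 60)
    return stats
```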
This removes some code which is duplicative and obscure (i.e. it’s not
very clear why we do `"a" * 73`, even though there is a Very Good
Reason for doing so).
This is easier to read than having to understand that arguments 1…n of
the cache decorator are ‘magic’, and it gives us more flexibility about
how the cache keys are formatted, e.g. being able to add words in the
middle of them.
Also changes the key format for all templates to be
`service-{service_id}-templates` instead of `templates-{service_id}`
because then it’s clearer what the ID represents.
`@cache.delete('user', 'user_id')` is easier to read and understand than
`@cache.delete('user', key_from_args=[1])`. This will become even more
apparent if we have to start doing stuff like `key_from_args=[1, 5]`,
which is a lot more opaque than just saying
`'service_id', 'template_id'`.
It does make the implementation a bit more complex, but I’m not too
worried about that because:
- the tests are solid
- it’s nicely encapsulated
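To make the comparison concrete, here is a minimal sketch of a
decorator along these lines (not the real implementation: the key
joining, expiry and the requirement to pass arguments by keyword are
all assumptions):

```python
import functools
import json

import redis

redis_client = redis.Redis()
ONE_DAY_IN_SECONDS = 24 * 60 * 60


class Cache:

    @staticmethod
    def _make_key(parts, kwargs):
        # Each part is either a literal word or, if it matches one of
        # the decorated function's keyword arguments, that argument's
        # value. So ('service', 'service_id', 'templates') becomes
        # something like 'service-1234-templates'
        return '-'.join(str(kwargs.get(part, part)) for part in parts)

    def set(self, *parts):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(**kwargs):
                key = self._make_key(parts, kwargs)
                cached = redis_client.get(key)
                if cached:
                    return json.loads(cached)
                result = func(**kwargs)  # must be JSON-serialisable
                redis_client.set(
                    key, json.dumps(result), ex=ONE_DAY_IN_SECONDS
                )
                return result
            return wrapper
        return decorator

    def delete(self, *parts):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(**kwargs):
                result = func(**kwargs)
                # Clear the cache only after the update has succeeded
                redis_client.delete(self._make_key(parts, kwargs))
                return result
            return wrapper
        return decorator


cache = Cache()


@cache.delete('user', 'user_id')
def update_user(*, user_id, **data):
    ...  # call the API; the decorator then clears 'user-{user_id}'
```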
Most of the time spent by the admin app to generate a page is spent
waiting for the API. This is slow for three reasons:
1. Talking to the API means going out to the internet, then through
nginx, the Flask app, SQLAlchemy, down to the database, and then
serialising the result to JSON and making it into an HTTP response
2. Each call to the API is synchronous, therefore if a page needs 3 API
calls to render then the second API call won’t be made until the
first has finished, and the third won’t start until the second has
finished
3. Every request for a service page in the admin app makes a minimum
of two requests to the API (`GET /service/…` and `GET /user/…`)
Hitting the database will always be the slowest part of an app like
Notify. But this slowness is exacerbated by 2. and 3. Conversely, every
speedup made to 1. is multiplied by 2. and 3.
So this pull request aims to make 1. a _lot_ faster by taking nginx,
Flask, SQLAlchemy and the database out of the equation. It replaces them
with Redis, which as an in-memory key/value store is a lot faster than
Postgres. There is still the overhead of going across the network to
talk to Redis, but the net improvement is vast.
This commit only caches the `GET /service` response, but is written in
such a way that we can easily expand to caching other responses down the
line.
The tradeoff here is that our code is more complex, and we risk
introducing edge cases where a cache becomes stale. The mitigations
against this are:
- invalidating all caches after 24h so a stale cache doesn’t remain
around indefinitely
- being careful when we add new stuff to the service response
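Applied to the service client, the caching might look something like
this, reusing the decorator sketched earlier (the import path,
endpoints and method names are assumptions):

```python
import requests

from app.extensions import cache  # hypothetical home of the sketch above

API_BASE_URL = 'http://localhost:6011'  # assumed local API, as above


@cache.set('service', 'service_id')
def get_service(*, service_id):
    # Served from Redis for up to 24 hours; most page loads never reach
    # nginx, Flask, SQLAlchemy or Postgres at all
    return requests.get(f'{API_BASE_URL}/service/{service_id}').json()


@cache.delete('service', 'service_id')
def update_service(*, service_id, **data):
    # Any write clears the cached copy straight away; the 24 hour
    # expiry is just a backstop against missed invalidations
    return requests.post(
        f'{API_BASE_URL}/service/{service_id}', json=data
    ).json()
```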
---
Some indicative numbers, based on:
- `GET http://localhost:6012/services/<service_id>/template/<template_id>`
- with the admin app running locally
- talking to Redis running locally
- also talking to the API running locally, itself talking to a local
Postgres instance
- times measured with Chrome web inspector, average of 10 requests
╲ | No cache | Cache service | Cache service and user | Cache service, user and template
-- | -- | -- | -- | --
**Request time** | 136ms | 97ms | 73ms | 37ms
**Improvement** | 0% | 41% | 88% | 265%
---
Estimates of how much storage this requires:
- Services: 1,942 on production × 2KB = 4MB
- Users: 4,534 on production × 2KB = 9MB
- Templates: 7,079 on production × 4KB = 28MB
Currently, requests to the API made from the admin app go from the
PaaS admin app to the nginx router ELB, which then routes them back
to the api app on PaaS.
This makes sense for external requests, but for requests made from
the admin app we could skip nginx and go directly to the api PaaS
host, which should reduce load on the nginx instances and
potentially reduce latency of the api requests.
API apps on PaaS check the `X-Custom-Forwarder` header (which is set
by nginx on proxy_pass requests) to allow only requests that have come
through the proxy.
This adds the custom header to the API client requests, so that they
can pass that header check without going through nginx.
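The change amounts to something like this (the header name comes from
the commit message; the internal hostname and how the value is
configured are assumptions):

```python
import requests

# Hypothetical internal PaaS hostname and shared value: whatever nginx
# sets on proxy_pass is what the client must now send itself
API_INTERNAL_URL = 'https://notify-api.apps.internal'
FORWARDER_VALUE = 'value-loaded-from-config'


def api_get(path):
    # Go straight to the API's PaaS host, attaching the header the
    # API checks for, instead of routing via the public nginx proxy
    return requests.get(
        API_INTERNAL_URL + path,
        headers={'X-Custom-Forwarder': FORWARDER_VALUE},
    )
```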
Done using isort[1], with the following command:
```
isort -rc ./app ./tests
```
Adds linting to the `run_tests.sh` script to stop badly-sorted imports
getting re-introduced.
The chosen style is ‘Vertical Hanging Indent’ with trailing commas,
because I think it gives the cleanest diffs, e.g.:
```
from third_party import (
    lib1,
    lib2,
    lib3,
    lib4,
)
```
1. https://pypi.python.org/pypi/isort