We were trying to delete the old 'template-{template-id}' keys but
should have been deleting the new keys which have the service id as part
of the key name. This was causing the cache to not be correctly purged
when we did things like update sender names or set defaults. This should
fix it.
We can drop use of the old key as we no longer need to read data from
the old key. Either data exists in the new key and we read it from there
or data doesn't exist in the new key and we go to the API to get it and
then set it in redis.
Note, the previous commit is important because it means we aren't at
risk of when this commit is being deployed out, of us getting stale data
from the old key.
We want to change cache keys for templates and broadcasts to include
their service ID. So cache keys should change from
`template-{template_id}-versions` to
`service-{service_id}-template-{template_id}-versions`.
The first step of this which needs to be deployed as a change first is
to delete both keys when updating service templates (even if they key is
not yet set). This means that when we release code in the next PR to
start setting the new key, we won't run into a case where either the old
or the new key can remain set with stale data.
sms sender, email reply to, letter contact blocks.
These are all cached within template, under `template.reply_to` - if the
template doesnt have a specific default, then that field stores the
service default. So when service default changes, we need to clear the
template cache so that it is updated to reflect that.
We already use this pattern for deleting the template cache for a bunch
of templates in `template_folder_api_client.move_to_folder`
The API now[1] accepts requests on `…/guest-list` as well as
`…/whitelist`. This commit starts using the former, which means:
- the use of ‘whitelist’ is fully gone from the admin app
- the API can stop using it, at least in URLs
1. As of https://github.com/alphagov/notifications-api/pull/2928
This should speed things up by:
- less time waiting for big blobs of JSON to come from Redis or the API
- less time spent deserialising big blobs of JSON
Fixes a bug where we were calling a wrapper method when instead we
should have been calling the redis_client. This had resulted in no
actual calls to redis happening.
When the admin app gets user objects from the API, these include a dict
of permissions by service for what the user can do to that services.
Permissions for inactive services are not included in the response as
per:
87cb6f2597/app/dao/permissions_dao.py (L66)
However, this causes a bug where a service is archived but cached user
data still tells us that the user has permissions to view the service.
This should not be the case and causes errors where users can still see
the archived service page, it's settings, and even request to go live
for it, because they are using old cached data for the user.
We solve this by deleting the users who are part of the service from the
cache.
We also delete the templates for this service from the cache as the
templates are also archived when we ask the API to archive the service
as per:
d95c0131e0/app/service/rest.py (L597)
Note, one decision I had to make was whether to delete the user cache
for just active team members or also invited users. Assuming an invited
user can't see the service until they've accepted their invite anyway, it
shouldn't make any difference whether we delete their cache or not.
We’re going to start using the returned letters summary to show some
info on the dashboard.
This means we will be accessing it more often than it changes. And we
know exactly when it changes because it’s us manually submitting the
references we get from DVLA.
This makes it a good candidate for being cached, and Redis is where we
cache stuff that we’d otherwise go to the API for.
Before, the delete decorator would delete the keys from Redis and then
we made the request to api to change the data. However, it is possible
that the cache could get re-populated in between these two things
happening, and so would cache outdated data.
This changes the order to send the api request first. We then always
delete the specified keys from Redis. Changing the order of the code in
the decorator changes the order in which the cache keys get deleted, so
the tests have been updated.
Added a folder permissions form to the page to invite users to services.
This only shows if the service has 'edit_folder_permissions' enabled,
and all folder checkboxes are checked by default. This change means that
InviteApiClient.create_invite now sends folder_permissions through to
notifications_api (so invites get created with folder permissions).
Started passing the folder_permissions through to notifications-api when
accepting an invite. This changes UserApiClient.add_user_to_service to
send folder_permissions to notifications_api so that new users get folder
permissions when they are added to the service.
when clients are defined in app/__init__.py, it increases the chance of
cyclical imports. By moving module level client singletons out to a
separate extensions file, we stop cyclical imports, but keep the same
code flow - the clients are still initialised in `create_app` in
`__init__.py`.
The redis client in particular is no longer separate - previously redis
was set up on the `NotifyAdminAPIClient` base class, but now there's one
singleton in `app.extensions`. This was done so that we can access redis
from outside of the existing clients.
Pytest is deprecating the direct calling of fixtures. One fixture that
we call directly quite a lot is `fake_uuid`. Since it just returns the
value of `sample_uuid()` we can either call that instead (where we need
a fixed value) or generate a new UUID each time (where a fixed value is
not needed).
Allows getting notification counts for a given number of days to
support services with custom data retention periods (admin
dashboard page should still display counts for the last 7 days,
while the notifications page displays all stored notifications).
we're not actually looking at the detailed service aspects - just
the stats. We're doing this in three places:
* dashboard
* notification activity page
* when checking jobs to see if we're over the daily limit
change these places to use a new api endpoint (service/id/statistics),
which hopefully be a little more performant, and will definitely be a
little more organised - moving away from generic endpoints with loads
of optional parameters.
We still need the detailed endpoints for the platform admin page tho.
Depends on https://github.com/alphagov/notifications-api/pull/1865
we were seeing isort produce different outputs locally and in docker -
this was due to it having different opinions about whether the tests
module (ie all our unit tests) is a first party (local) or third party
(pip installed) import. It's a first party import, so by defining this
in the setup.cfg isort settings, we can force it to be consistent
between environments.
Note: I don't know why it was different in the first place though
Redis is giving us a big performance boost (it’s roughly halved the
median request time on the admin app).
Once we’re confident that it’s working properly[1] we can eke out a bit
more performance from it by keeping the caches alive for longer. As
far as I can tell we’re still using Redis in a very low-volume way[2],
so increasing the number of things we’re storing shouldn’t start taxing
our Redis server at all. But reducing the number of times we have to
hit the API to refresh the cache _should_ result in some performance
increase.
---
1. ie we’re not seeing instances of stale caches not being invalidated
2. We have 2.5G of available space in Redis. Here is our current usage:
```
used_memory:7728960
used_memory_human:7.37M
used_memory_rss:7728960
used_memory_peak:16563776
used_memory_peak_human:15.79M
used_memory_lua:37888
```
This is easier to read than having to understand the arguments 1…n of
the cache decorator are ‘magic’, and gives us more flexibility about
how the cache keys are formatted, eg being able to add words in the
middle of them.
Also changes the key format for all templates to be
`service-{service_id}-templates` instead of `templates-{service_id}`
because then it’s clearer what the ID represents.
A lot of the frequently-used pages in the admin app rely on the API to
get templates.
So this commit adds three new caches:
- a single template version (including a key without a version number,
which is the current version)
- all the templates for a service
- all versions of a template
The first will be the most crucial for performance, but there’s not much
cost to adding the other two.
Accepting an invite changes:
- the `user_to_service` list of users returned by `GET /service/<id>`
- the `services` list return by `GET /user/<id>`
The latter change is causing the functional tests to fail.
In the same way, and for the same reasons that we’re caching the service
object.
Here’s a sample of the data returned by the API – so we should make sure
that any changes to this data invalidate the cache.
If we ever change a user’s phone number (for example) directly in the
database, then we will need to invalidate this cache manually.
```python
{
'data':{
'organisations':[
'4c707b81-4c6d-4d33-9376-17f0de6e0405'
],
'logged_in_at':'2018-04-10T11:41:03.781990Z',
'id':'2c45486e-177e-40b8-997d-5f4f81a461ca',
'email_address':'test@example.gov.uk',
'platform_admin':False,
'password_changed_at':'2018-01-01 10:10:10.100000',
'permissions':{
'42a9d4f2-1444-4e22-9133-52d9e406213f':[
'manage_api_keys',
'send_letters',
'manage_users',
'manage_templates',
'view_activity',
'send_texts',
'send_emails',
'manage_settings'
],
'a928eef8-0f25-41ca-b480-0447f29b2c20':[
'manage_users',
'manage_templates',
'manage_settings',
'send_texts',
'send_emails',
'send_letters',
'manage_api_keys',
'view_activity'
],
},
'state':'active',
'mobile_number':'07700900123',
'failed_login_count':0,
'name':'Example',
'services':[
'6078a8c0-52f5-4c4f-b724-d7d1ff2d3884',
'6afe3c1c-7fda-4d8d-aa8d-769c4bdf7803',
],
'current_session_id':'fea2ade1-db0a-4c90-93e7-c64a877ce83e',
'auth_type':'sms_auth'
}
}
```
Most of the time spent by the admin app to generate a page is spent
waiting for the API. This is slow for three reasons:
1. Talking to the API means going out to the internet, then through
nginx, the Flask app, SQLAlchemy, down to the database, and then
serialising the result to JSON and making it into a HTTP response
2. Each call to the API is synchronous, therefore if a page needs 3 API
calls to render then the second API call won’t be made until the
first has finished, and the third won’t start until the second has
finished
3. Every request for a service page in the admin app makes a minimum
of two requests to the API (`GET /service/…` and `GET /user/…`)
Hitting the database will always be the slowest part of an app like
Notify. But this slowness is exacerbated by 2. and 3. Conversely every
speedup made to 1. is multiplied by 2. and 3.
So this pull request aims to make 1. a _lot_ faster by taking nginx,
Flask, SQLAlchemy and the database out of the equation. It replaces them
with Redis, which as an in-memory key/value store is a lot faster than
Postgres. There is still the overhead of going across the network to
talk to Redis, but the net improvement is vast.
This commit only caches the `GET /service` response, but is written in
such a way that we can easily expand to caching other responses down the
line.
The tradeoff here is that our code is more complex, and we risk
introducing edge cases where a cache becomes stale. The mitigations
against this are:
- invalidating all caches after 24h so a stale cache doesn’t remain
around indefinitely
- being careful when we add new stuff to the service response
---
Some indicative numbers, based on:
- `GET http://localhost:6012/services/<service_id>/template/<template_id>`
- with the admin app running locally
- talking to Redis running locally
- also talking to the API running locally, itself talking to a local
Postgres instance
- times measured with Chrome web inspector, average of 10 requests
╲ | No cache | Cache service | Cache service and user | Cache service, user and template
-- | -- | -- | -- | --
**Request time** | 136ms | 97ms | 73ms | 37ms
**Improvement** | 0% | 41% | 88% | 265%
---
Estimates of how much storage this requires:
- Services: 1,942 on production × 2kb = 4Mb
- Users: 4,534 on production × 2kb = 9Mb
- Templates: 7,079 on production × 4kb = 28Mb
Done using isort[1], with the following command:
```
isort -rc ./app ./tests
```
Adds linting to the `run_tests.sh` script to stop badly-sorted imports
getting re-introduced.
Chosen style is ‘Vertical Hanging Indent’ with trailing commas, because
I think it gives the cleanest diffs, eg:
```
from third_party import (
lib1,
lib2,
lib3,
lib4,
)
```
1. https://pypi.python.org/pypi/isort
When users request to go live we check stuff like:
- if they’ve added templates
- if they have email templates (then we can check their reply to
address)
This commit adds a method to do this programatically rather than
manually.
We _could_ do this in SQL, but for page that’s used intermittently it
doesn’t feel worth the work/optimisation (and the client method is at
least in place now if we do ever need to lean on this code more
heavily).
Different parts of government get billed slightly differently, and
there’s differences in how much money we’re allowed to give them.
Think these numbers are right, but should be double checked.
So that we can default services to their appropriate text allowance, we
need to find out what sector they're in. So let's start collecting that
from teams as they create new services.
I think Central/Local/NHS are the right options, but these can be easily
changed if not.
Mutating dictionaries is gross and doesn’t work as you’d expect. Better
to have the function return a new dictionary instead.
Means we can be explicit that `created_by` is one of the allowed params
when updating a service.
To prevent typos and inadvertently updating something we shouldn’t,
this adds some filtering to the update_service method to make sure it
is only allowed to update certain attributes of a service.