If we know the code won’t pass the validation on the API side, we might
as well tell the user before even passing it to the API.
So this commit:
- adds some more validators to the field
- rewrites the validation function on the form to actually call the
field-level validators before hitting the API 🤦♂️
- refactors the tests to be parametrize, which means they can be
shorter, easier to read, and more comprehensive
tests weren't patching out create_event (which is invoked every time a
user logs in). This was getting caught by our egress proxy on jenkins.
We didn't notice because the event handler code was swallowing all
exceptions and not re-raising.
This changes that code to no longer swallow exceptions. Since we did
that, we also need to update all the tests that test log-in to mock
the call
Adds a platform admin button to the service settings to turn on/off
'upload_document' service permission. The permission allows uploading
documents to document-download-api through the post notification API
endpoint.
If a user clicks ‘back’ once they’ve sent a job we don’t want them to
land on the ‘check’ page again. This would suggest that they can send
the same job again (they can’t because that `job_id` is in the database
already). That said, it’s confusing to see that page; the natural thing
is to go jump back another step, to where you uploaded the file.
We’re going to stop storing job metadata in the session. So we can’t
rely on it for checking whether a file is valid. That safeguard is
happening in the API instead now (because it’s looking at the metadata
stored in S3).
For both SMS senders and email reply to addresses this commit adds:
- a delete link
- a confirmation loop
It doesn’t let users delete:
- default SMS senders or reply to addresses (they always have to have
one)
- inbound numbers
It assumes that the API will allow updating of an attribute named
`active` on the respective database rows. It could work in a different
way. We can’t do complete deletion though because these will still be
keyed to notifications.
we reckon users will like to see gov reply-to email addresses because
it will improve their confidence in the email.
however, some services, for a few complex reasons, don't want a gov
reply to address. rather than add their specific domains to the
whitelist for signups etc, just make reply tos allowed from any domain.
We vet reply-tos before services go live anyway.
S3 has a limit of 2kb for metadata:
> the user-defined metadata is limited to 2 KB in size. The size of
> user-defined metadata is measured by taking the sum of the number of
> bytes in the UTF-8 encoding of each key and value.
– https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata
This means we have a limit of 1870 bytes for the filename:
```python
encoded = 'notification_count50000template_id665d26e7-ceac-4cc5-82ed-63d773d21561validTrueoriginal_file_name'.encode('utf-8')
sys.getsizeof(b)
>>> 130
2000-130
>>> 1870
```
Or, in other words, ~918 characters:
```python
sys.getsizeof(('ü'*918).encode('utf-8'))
>>> 1869
```
By doing this we no longer have to store it in the session. This is the
last thing that’s currently in the session, so removing it means we can
drop session storage for file uploads entirely.
Storing things in the session is proving buggy – we still have one user
(that we know about) where the session data isn’t getting written, so
they’re blocked from uploading a file.
Since all the info we’re storing in the session is about the file, it
makes sense to store it with the file.
This commit only does the writing of the metadata, once we’re sure this
is working we can do subsequent work to read it back, and remove
reliance on the session.
p1 == "should notify team be alerted of this (via pagerduty)"
urgent == "should the user be told we'll look at it"
* If it's in office hours, it's always urgent. It's never a P1 because
we'll notice it anyway
* If it's outside of office hours, it's urgent and P1 if it's severe,
otherwise it's neither
We strip most whitespace as of:
https://github.com/alphagov/notifications-admin/pull/1701
However we are still getting some bad email addresses through, for
example one that had a leading zero-width space character. This means
that the user sees a validation error; really we should just deal with
the mess for them.
So this commit also includes characters without Unicode character
property "WSpace=Y" (which includes zero-width space) to those which are
stripped from form submissions.
List taken from here: https://en.wikipedia.org/wiki/Whitespace_character
See issue and discussion here: https://bugs.python.org/issue13391
we were seeing isort produce different outputs locally and in docker -
this was due to it having different opinions about whether the tests
module (ie all our unit tests) is a first party (local) or third party
(pip installed) import. It's a first party import, so by defining this
in the setup.cfg isort settings, we can force it to be consistent
between environments.
Note: I don't know why it was different in the first place though
It is standard practice when using GOV.UK template to highlight the
selected navigation item in the propositional navigation (black bar) by
colouring it blue.
This commit adds a new subclass of `Navigation` with the mapping needed
to decide which pages belong to which item in the navigation (or none
at all).
Because we have multiple navigations, which will share the same methods
(by subclassing) but different mappings of navigation items to endpoints
by overriding the `.mapping` and `.exclude` attributes.
In research I’ve sometimes seen people click the wrong nav item. I
reckon that people’s concept of which pages live behind which navigation
items isn’t very strong.
We can reinforce this relationship by showing, for every page, which is
the corresponding nav item. The conventional way of doing this is either
with some kind of emphasis, typically colour or bold. I’ve gone for bold
because colour would be weird.
---
The implementation of this is quite loosely coupled to our application
code because:
- our application code is not well structured (eg we don’t make any use
of blueprints)
- spreading this change across lots of files in our application would
make it harder to test without actually hitting each endpoints; such
tests would be slow and verbose
So I’ve gone for more of a meta approach. Rather than testing that each
endpoint has a specific navigation item selected, I’ve gone for
validating that:
- all endpoints being mapped to are real
- all endpoints have _a_ selected navigation item (or are specifically
excluded)
This means that it’s impossible to add, change or remove an endpoint
without also updating which navigation item should be selected. And the
actual mapping is so declarative that it testing it would be redundant.
Redis is giving us a big performance boost (it’s roughly halved the
median request time on the admin app).
Once we’re confident that it’s working properly[1] we can eke out a bit
more performance from it by keeping the caches alive for longer. As
far as I can tell we’re still using Redis in a very low-volume way[2],
so increasing the number of things we’re storing shouldn’t start taxing
our Redis server at all. But reducing the number of times we have to
hit the API to refresh the cache _should_ result in some performance
increase.
---
1. ie we’re not seeing instances of stale caches not being invalidated
2. We have 2.5G of available space in Redis. Here is our current usage:
```
used_memory:7728960
used_memory_human:7.37M
used_memory_rss:7728960
used_memory_peak:16563776
used_memory_peak_human:15.79M
used_memory_lua:37888
```
This is easier to read than having to understand the arguments 1…n of
the cache decorator are ‘magic’, and gives us more flexibility about
how the cache keys are formatted, eg being able to add words in the
middle of them.
Also changes the key format for all templates to be
`service-{service_id}-templates` instead of `templates-{service_id}`
because then it’s clearer what the ID represents.
A lot of the frequently-used pages in the admin app rely on the API to
get templates.
So this commit adds three new caches:
- a single template version (including a key without a version number,
which is the current version)
- all the templates for a service
- all versions of a template
The first will be the most crucial for performance, but there’s not much
cost to adding the other two.
Accepting an invite changes:
- the `user_to_service` list of users returned by `GET /service/<id>`
- the `services` list return by `GET /user/<id>`
The latter change is causing the functional tests to fail.
In the same way, and for the same reasons that we’re caching the service
object.
Here’s a sample of the data returned by the API – so we should make sure
that any changes to this data invalidate the cache.
If we ever change a user’s phone number (for example) directly in the
database, then we will need to invalidate this cache manually.
```python
{
'data':{
'organisations':[
'4c707b81-4c6d-4d33-9376-17f0de6e0405'
],
'logged_in_at':'2018-04-10T11:41:03.781990Z',
'id':'2c45486e-177e-40b8-997d-5f4f81a461ca',
'email_address':'test@example.gov.uk',
'platform_admin':False,
'password_changed_at':'2018-01-01 10:10:10.100000',
'permissions':{
'42a9d4f2-1444-4e22-9133-52d9e406213f':[
'manage_api_keys',
'send_letters',
'manage_users',
'manage_templates',
'view_activity',
'send_texts',
'send_emails',
'manage_settings'
],
'a928eef8-0f25-41ca-b480-0447f29b2c20':[
'manage_users',
'manage_templates',
'manage_settings',
'send_texts',
'send_emails',
'send_letters',
'manage_api_keys',
'view_activity'
],
},
'state':'active',
'mobile_number':'07700900123',
'failed_login_count':0,
'name':'Example',
'services':[
'6078a8c0-52f5-4c4f-b724-d7d1ff2d3884',
'6afe3c1c-7fda-4d8d-aa8d-769c4bdf7803',
],
'current_session_id':'fea2ade1-db0a-4c90-93e7-c64a877ce83e',
'auth_type':'sms_auth'
}
}
```
Most of the time spent by the admin app to generate a page is spent
waiting for the API. This is slow for three reasons:
1. Talking to the API means going out to the internet, then through
nginx, the Flask app, SQLAlchemy, down to the database, and then
serialising the result to JSON and making it into a HTTP response
2. Each call to the API is synchronous, therefore if a page needs 3 API
calls to render then the second API call won’t be made until the
first has finished, and the third won’t start until the second has
finished
3. Every request for a service page in the admin app makes a minimum
of two requests to the API (`GET /service/…` and `GET /user/…`)
Hitting the database will always be the slowest part of an app like
Notify. But this slowness is exacerbated by 2. and 3. Conversely every
speedup made to 1. is multiplied by 2. and 3.
So this pull request aims to make 1. a _lot_ faster by taking nginx,
Flask, SQLAlchemy and the database out of the equation. It replaces them
with Redis, which as an in-memory key/value store is a lot faster than
Postgres. There is still the overhead of going across the network to
talk to Redis, but the net improvement is vast.
This commit only caches the `GET /service` response, but is written in
such a way that we can easily expand to caching other responses down the
line.
The tradeoff here is that our code is more complex, and we risk
introducing edge cases where a cache becomes stale. The mitigations
against this are:
- invalidating all caches after 24h so a stale cache doesn’t remain
around indefinitely
- being careful when we add new stuff to the service response
---
Some indicative numbers, based on:
- `GET http://localhost:6012/services/<service_id>/template/<template_id>`
- with the admin app running locally
- talking to Redis running locally
- also talking to the API running locally, itself talking to a local
Postgres instance
- times measured with Chrome web inspector, average of 10 requests
╲ | No cache | Cache service | Cache service and user | Cache service, user and template
-- | -- | -- | -- | --
**Request time** | 136ms | 97ms | 73ms | 37ms
**Improvement** | 0% | 41% | 88% | 265%
---
Estimates of how much storage this requires:
- Services: 1,942 on production × 2kb = 4Mb
- Users: 4,534 on production × 2kb = 9Mb
- Templates: 7,079 on production × 4kb = 28Mb