For both SMS senders and email reply to addresses this commit adds:
- a delete link
- a confirmation loop
It doesn’t let users delete:
- default SMS senders or reply to addresses (they always have to have
one)
- inbound numbers
It assumes that the API will allow updating of an attribute named
`active` on the respective database rows. It could work in a different
way. We can’t do complete deletion though because these will still be
keyed to notifications.
we reckon users will like to see gov reply-to email addresses because
it will improve their confidence in the email.
however, some services, for a few complex reasons, don't want a gov
reply to address. rather than add their specific domains to the
whitelist for signups etc, just make reply tos allowed from any domain.
We vet reply-tos before services go live anyway.
S3 has a limit of 2kb for metadata:
> the user-defined metadata is limited to 2 KB in size. The size of
> user-defined metadata is measured by taking the sum of the number of
> bytes in the UTF-8 encoding of each key and value.
– https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata
This means we have a limit of 1870 bytes for the filename:
```python
encoded = 'notification_count50000template_id665d26e7-ceac-4cc5-82ed-63d773d21561validTrueoriginal_file_name'.encode('utf-8')
sys.getsizeof(b)
>>> 130
2000-130
>>> 1870
```
Or, in other words, ~918 characters:
```python
sys.getsizeof(('ü'*918).encode('utf-8'))
>>> 1869
```
By doing this we no longer have to store it in the session. This is the
last thing that’s currently in the session, so removing it means we can
drop session storage for file uploads entirely.
Storing things in the session is proving buggy – we still have one user
(that we know about) where the session data isn’t getting written, so
they’re blocked from uploading a file.
Since all the info we’re storing in the session is about the file, it
makes sense to store it with the file.
This commit only does the writing of the metadata, once we’re sure this
is working we can do subsequent work to read it back, and remove
reliance on the session.
p1 == "should notify team be alerted of this (via pagerduty)"
urgent == "should the user be told we'll look at it"
* If it's in office hours, it's always urgent. It's never a P1 because
we'll notice it anyway
* If it's outside of office hours, it's urgent and P1 if it's severe,
otherwise it's neither
We strip most whitespace as of:
https://github.com/alphagov/notifications-admin/pull/1701
However we are still getting some bad email addresses through, for
example one that had a leading zero-width space character. This means
that the user sees a validation error; really we should just deal with
the mess for them.
So this commit also includes characters without Unicode character
property "WSpace=Y" (which includes zero-width space) to those which are
stripped from form submissions.
List taken from here: https://en.wikipedia.org/wiki/Whitespace_character
See issue and discussion here: https://bugs.python.org/issue13391
we were seeing isort produce different outputs locally and in docker -
this was due to it having different opinions about whether the tests
module (ie all our unit tests) is a first party (local) or third party
(pip installed) import. It's a first party import, so by defining this
in the setup.cfg isort settings, we can force it to be consistent
between environments.
Note: I don't know why it was different in the first place though
It is standard practice when using GOV.UK template to highlight the
selected navigation item in the propositional navigation (black bar) by
colouring it blue.
This commit adds a new subclass of `Navigation` with the mapping needed
to decide which pages belong to which item in the navigation (or none
at all).
Because we have multiple navigations, which will share the same methods
(by subclassing) but different mappings of navigation items to endpoints
by overriding the `.mapping` and `.exclude` attributes.
In research I’ve sometimes seen people click the wrong nav item. I
reckon that people’s concept of which pages live behind which navigation
items isn’t very strong.
We can reinforce this relationship by showing, for every page, which is
the corresponding nav item. The conventional way of doing this is either
with some kind of emphasis, typically colour or bold. I’ve gone for bold
because colour would be weird.
---
The implementation of this is quite loosely coupled to our application
code because:
- our application code is not well structured (eg we don’t make any use
of blueprints)
- spreading this change across lots of files in our application would
make it harder to test without actually hitting each endpoints; such
tests would be slow and verbose
So I’ve gone for more of a meta approach. Rather than testing that each
endpoint has a specific navigation item selected, I’ve gone for
validating that:
- all endpoints being mapped to are real
- all endpoints have _a_ selected navigation item (or are specifically
excluded)
This means that it’s impossible to add, change or remove an endpoint
without also updating which navigation item should be selected. And the
actual mapping is so declarative that it testing it would be redundant.
Redis is giving us a big performance boost (it’s roughly halved the
median request time on the admin app).
Once we’re confident that it’s working properly[1] we can eke out a bit
more performance from it by keeping the caches alive for longer. As
far as I can tell we’re still using Redis in a very low-volume way[2],
so increasing the number of things we’re storing shouldn’t start taxing
our Redis server at all. But reducing the number of times we have to
hit the API to refresh the cache _should_ result in some performance
increase.
---
1. ie we’re not seeing instances of stale caches not being invalidated
2. We have 2.5G of available space in Redis. Here is our current usage:
```
used_memory:7728960
used_memory_human:7.37M
used_memory_rss:7728960
used_memory_peak:16563776
used_memory_peak_human:15.79M
used_memory_lua:37888
```
This is easier to read than having to understand the arguments 1…n of
the cache decorator are ‘magic’, and gives us more flexibility about
how the cache keys are formatted, eg being able to add words in the
middle of them.
Also changes the key format for all templates to be
`service-{service_id}-templates` instead of `templates-{service_id}`
because then it’s clearer what the ID represents.
A lot of the frequently-used pages in the admin app rely on the API to
get templates.
So this commit adds three new caches:
- a single template version (including a key without a version number,
which is the current version)
- all the templates for a service
- all versions of a template
The first will be the most crucial for performance, but there’s not much
cost to adding the other two.
Accepting an invite changes:
- the `user_to_service` list of users returned by `GET /service/<id>`
- the `services` list return by `GET /user/<id>`
The latter change is causing the functional tests to fail.
In the same way, and for the same reasons that we’re caching the service
object.
Here’s a sample of the data returned by the API – so we should make sure
that any changes to this data invalidate the cache.
If we ever change a user’s phone number (for example) directly in the
database, then we will need to invalidate this cache manually.
```python
{
'data':{
'organisations':[
'4c707b81-4c6d-4d33-9376-17f0de6e0405'
],
'logged_in_at':'2018-04-10T11:41:03.781990Z',
'id':'2c45486e-177e-40b8-997d-5f4f81a461ca',
'email_address':'test@example.gov.uk',
'platform_admin':False,
'password_changed_at':'2018-01-01 10:10:10.100000',
'permissions':{
'42a9d4f2-1444-4e22-9133-52d9e406213f':[
'manage_api_keys',
'send_letters',
'manage_users',
'manage_templates',
'view_activity',
'send_texts',
'send_emails',
'manage_settings'
],
'a928eef8-0f25-41ca-b480-0447f29b2c20':[
'manage_users',
'manage_templates',
'manage_settings',
'send_texts',
'send_emails',
'send_letters',
'manage_api_keys',
'view_activity'
],
},
'state':'active',
'mobile_number':'07700900123',
'failed_login_count':0,
'name':'Example',
'services':[
'6078a8c0-52f5-4c4f-b724-d7d1ff2d3884',
'6afe3c1c-7fda-4d8d-aa8d-769c4bdf7803',
],
'current_session_id':'fea2ade1-db0a-4c90-93e7-c64a877ce83e',
'auth_type':'sms_auth'
}
}
```
Most of the time spent by the admin app to generate a page is spent
waiting for the API. This is slow for three reasons:
1. Talking to the API means going out to the internet, then through
nginx, the Flask app, SQLAlchemy, down to the database, and then
serialising the result to JSON and making it into a HTTP response
2. Each call to the API is synchronous, therefore if a page needs 3 API
calls to render then the second API call won’t be made until the
first has finished, and the third won’t start until the second has
finished
3. Every request for a service page in the admin app makes a minimum
of two requests to the API (`GET /service/…` and `GET /user/…`)
Hitting the database will always be the slowest part of an app like
Notify. But this slowness is exacerbated by 2. and 3. Conversely every
speedup made to 1. is multiplied by 2. and 3.
So this pull request aims to make 1. a _lot_ faster by taking nginx,
Flask, SQLAlchemy and the database out of the equation. It replaces them
with Redis, which as an in-memory key/value store is a lot faster than
Postgres. There is still the overhead of going across the network to
talk to Redis, but the net improvement is vast.
This commit only caches the `GET /service` response, but is written in
such a way that we can easily expand to caching other responses down the
line.
The tradeoff here is that our code is more complex, and we risk
introducing edge cases where a cache becomes stale. The mitigations
against this are:
- invalidating all caches after 24h so a stale cache doesn’t remain
around indefinitely
- being careful when we add new stuff to the service response
---
Some indicative numbers, based on:
- `GET http://localhost:6012/services/<service_id>/template/<template_id>`
- with the admin app running locally
- talking to Redis running locally
- also talking to the API running locally, itself talking to a local
Postgres instance
- times measured with Chrome web inspector, average of 10 requests
╲ | No cache | Cache service | Cache service and user | Cache service, user and template
-- | -- | -- | -- | --
**Request time** | 136ms | 97ms | 73ms | 37ms
**Improvement** | 0% | 41% | 88% | 265%
---
Estimates of how much storage this requires:
- Services: 1,942 on production × 2kb = 4Mb
- Users: 4,534 on production × 2kb = 9Mb
- Templates: 7,079 on production × 4kb = 28Mb
Adding a ‘testing’ template it not enough. It needs to have some real
looking content, so that we can:
- work out what a service is doing
- assess whether that’s a reasonable (ie meeting the terms of use) thing
to be doing with Notify
At the moment we’re having to go back to services quite a lot when they
request to go live and ask them for this stuff.
The check page expects template ID to be passed through in the URL not
the session now. The send test letter page wasn’t changed.
This commit changes it, and adds a test to make sure this path is
covered.
The start job endpoint needs the template ID in order to make the API
call.
It doesn’t make sense to add it to the start job URL, because users
could potentially start a job with the wrong template by hacking the URL
(which would blow up at some point, if the template didn’t match the
columns in the file).
A of this commit’s parent we are storing `template_id` and
`original_file_name` in the URL. Getting them from the URL is better,
so the check page no longer needs to look for them in the session. This
commit removes the code that looks for these values in the session.
At the moment you can’t press refresh on the check page if there’s
errors. This is because the session gets cleared when there’s errors.
This is a bad user experience.
The data that this page is relying on (from the session) is:
- template ID
- original file name
Neither of these things need to be in the session because:
- they are not secret
- the user can modify them already (by choosing a different template or
renaming their file locally)
So this commit additionally stores them in the URL.
Because we now[1] store info about each file upload separately in the
session the session isn’t overridden every time you upload a file. This
is good because you can do multiple file uploads idempotently.
Generally we are cleaning up after ourselves because we pop anything to
do with that upload from the session. However there is an edge case: if
you never send the file then the info about the file stays in the
session in perpetuity[2]. This is generally happening when people are
uploading files that are impossible to send, ie ones that have errors.
So this commit makes two changes:
1. remove info about a file upload from the session as soon as we know
that it contains errors
2. `POST` reuploads to the same endpoint as initial uploads because
otherwise we need to keep info about bad uploads in the session,
which would prevent us from doing 1.
1. https://github.com/alphagov/notifications-admin/pull/1968
2. or at least until the session is cleared by the user logging out
We prefer people downloading the agreement if they can. If we don’t know
which agreement they should be using (ie we don’t know their crown
status) then we fall back to having them contact us.
Rather than making users contact us to get the agreement, we should just
let them download it, when we know which version to send them.
This commit adds two endpoints:
- one to serve a page which links to the agreement
- one to serve the agreement itself
These pages are not linked to anywhere because the underlying files
don’t exist yet. So I haven’t bothered putting real content on the page
yet either. I imagine the deploy sequence will be:
1. Upload the files to the buckets in each environment
2. Deploy this code through each enviroment, checking the links work
3. Make another PR to start linking to the endpoints added by this
commit