This adds a link to the user profile of the person who requested to go
live for "Request to go live" Zendesk tickets. Viewing a user's profile
page helps us to check for duplicate organisations and services from
that user.
This stat is shown on a few of our pages, such as our homepage,
the performance page and also a platform admin page
and is currently catched for the
default TTL of 1 week. I think there is no reason we can't make
this only cache once an hour and give slightly more up to date stats
which will update more regularly.
This mimics the approach and also the TTL choice of 1 hour that has
been added for the performance page (although there is no
particular bug here to fix, it is just nice to have slightly more up
up to date data).
Note, the API call only takes about 0.3 seconds at the moment
so it is not particularly intensive on the DB to run this more
regularly.
This is a quick additional check to protect the user:
- From getting a CloudFront 502 error if the file takes too
long to upload. I was surprised to find it takes about 1 minute
to upload a 70Mb file to S3.*
- From getting a CloudFront 502 error when we follow the redirect
and run through the slow processing code in utils that builds a
RecipientCSV [1].
For context, a CSV with 100K rows and a few columns is around 5Mb,
so a 10Mb limit should be enough. Analysis over the past week shows
that the vast majority of CSV uploads are actually < 2.5Mb.
I haven't added any tests for this because:
- The check isn't critical, as the worst case scenario is the user
gets a worse error than this in-app one.
- There's no easy way to mock the validation, and I didn't want to
have a test that depends on a 10Mb+ file.
*We're using "key.put" to upload the file, when we could be doing
a multipart upload [2]. However, I tried this myself with a chunk
size of 1000 bytes and found it only led to a marginal improvement.
[1]: https://github.com/alphagov/notifications-utils/pull/930
[2]: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
When we combine simplified polygons from different areas we try to use
the polygons that are defined in metres as a speed optimisation.
This doesn’t work when those polygons are in several different
projections because the distance in metres is relative to the edge of
the projection. Just naively choosing one projection means that some
of the polygons will shift position on the earth by 100s of kilometers
once they are converted back to degrees.
To avoid this we need to:
- check for situations where we have polygons in multiple different
projections
- transforms these polygons back into degrees and let `Polygons` pick a
common projection for all of them before doing any
merging/smoothing/etc.
I came across a bug where the performance page might be visited
on say the 22nd and we expect it to show data for yesterday and
the past 7 days (so the 21st and back 7 days) but instead it
only shows it for the 20th and then back 6 days.
The cause is that we have nightly tasks that run to
calculate the number of notifications sent per service (the
ft_notification_status table)
calculate the number of notifications sent within 10 seconds
(the ft_processing_time table)
Both of these tables get filled between the hours of midnight and
2:30am. The first seems to be filled by about 12:30 every night
whereas the processing stats seems to be filled about 2am every
night.
The admin app currently caches the results of the call to the API
(https://github.com/alphagov/notifications-api/blob/master/app/performance_dashboard/rest.py#L24)
to get this data with the key
performance-stats-{start_date}-to-{end_date}.
Now the issue here is that if the performance page is visited at
00:01 on the 23rd, it will call the API for data on the 22nd and
backwards 7 days. The data it gets at 00:01 will not yet have the
data for the 22nd, it will only go as far as the 21st, because
the overnight jobs to put data in the tables have not run yet. The
admin app then caches this data, meaning that for the rest of the
day the page will be missing data for the 22nd, even though it
becomes available after approx 2am. We have seen it a few times
in production where the performance page has indeed been visited
before say 2 am, leaving the page with missing data.
In terms of solutions, there are several potential options.
1. Remove the caching that we do in redis. The downside of this is
that we will hit the API more (currently once a day for the first
visitor to the performance page). We have had 255 requests
to the performance page in the last week and when there is no value
in redis, the call to the API takes approximately 1-2 seconds.
Removing the caching would against why the caching was originally
added which was to act as a basic protection against malicious
DOS attempts.
2. Make the admin app only set the cache value if it is after
2:30am. This one feels a bit gross and snowflakey.
3. The API flushes the redis cache after it has finished its nightly
tasks. We don't have that much precedent of the API interacting
with redis, it is mostly the admin app that does but this is still
a valid option.
4. Cache in redis the data for 2 days ago to 7 days ago and then
get the data for yesterday freshly every time. This would mean
some things are still cached but the smallest bit that we can't rely
on can be gathered fresh.
5. Remove the caching we do in redis but then try and optimise the
API endpoint to return results even faster and with less
interaction with the DB. My hypothesis is that the slowest part
is where the API has to calculate how many notifications Notify
has ever sent
(https://github.com/alphagov/notifications-api/blob/master/app/performance_dashboard/rest.py#L36)
and so we could potentially try and store these totals in the DB
every night and update them nightly rather than having to query all
the data in the table to figure them out every time.
6. Reduce the expiry time of the cache key so although the bug will
still exist, it will only exist for a much shorter time.
In the end, we've gone for option 6. The user impact of
this bug sticking around for an hour is minimal, in particular
because that would be in the early hours of the morning. The solution
is also quick to implement and keeps the protection against a DOS attack.
The value of 1 hour for expiry was a fairly abitrary choice, it could
have been as little as 1 minute or as much as 6 hours.
This has several advantages:
- It gives us more room to explain the error and actions. This will
be useful for upcoming work we want to do, which will add yet more
validations for CSV uploads.
- We already use a flash to show certain kinds of errors on these
pages (just above). This is more consistent.
- It's potentially more accessible. Previously the error and the
button text used to be read out as a single sentence. Now the page
reloads and reads the flash error alone.
In theory we should show an error in both places, but this can be
confusing on pages where there's only a single form control, and
especially if the error is long.
The original idea behind was to always ask users to re-enter their
password any time:
- we want them to be sure that they want to do what they’re about to do
- we want to be sure it’s really the user trying to do the thing (and
not someone malicious)
In reality we:
- removed this from the initial place it was added (a descendent of the
‘suspend service’ feature)
- only ever added it to the ‘rename service’ feature
So in reality it’s not a pattern we have persisted with. Arguably there
are several things you can now do in the admin app without re-entering
your password which are much more high consequence than changing the
service name.
Also, with browser autofill there’s a lot less chance that forcing
someone to re-enter a password really gives much defence against an
unatteneded laptop, for example.
I also wonder whether we might get people to give better service names
if we make the process of renaming the service less intimidating.
So this commit removes the need to re-enter your password when renaming
a service.
Note that re-naming an organisation still has the same check, but I
haven’t removed that too for the sake of keeping scope of the PR small.
Transforming full resolution polygons to linear coordinates is somewhat
expensive. We can make things run faster by:
- looking at `simple_polygons` which have fewer points and are therefore
faster to transform.
- reusing polygons that have already been transformed where possible
Brings in https://github.com/alphagov/notifications-utils/pull/889/files
At the moment, we are not doing any transformation of features before
applying geometric algorithms to them. This is, in effect, assuming that
the earth is flat.
This new version of utils implements the transformation of our polygons
to a Cartesian plane. In other words, it converts them from being
defined in spherical degrees to metres.
For the admin app this means we need to convert places where the code
expects things to be measured in degrees to work in metres instead.
We think that the API is returning incorrect data for this column.
It’s going to take a while to figure out what’s going on with the
queries in the API, so this pull request temporarily removes the column
so we’ve not giving people incorrect data.
We are starting to see lots of 100.0%s in the current table
and we think this looks suspiciously too good so think it is
beneficial to change it to be 2dp such that we get a few more
non 100.0% values.
This will put all the values in to having 2dp, however it will
also require the API to have a change to
https://github.com/alphagov/notifications-api/blob/master/app/performance_dashboard/rest.py#L81
where it is currently losing the granularity down to a single
decimal place (meaning that if we were really at 99.894% then that
would be shown on the page as 99.90% rather than 99.89%). However,
I don't think it is a blocker that we get that sorted before this
can be merged.
This means that the data in the report will match what’s on the page,
where the values are rounded to the nearest penny.
This uses the same string formatting to round the numbers which the
`big_number` component does, so it should round the numbers in the same
way.
This is so org level users can use this data easier for things
like determining spending per service.
We do not include sms fragments sent column and remove other sms columns
consistency.
Do not add sms fragments sent column for now until we agree on an
unambiguous name for it. The data in this column is sms billing units
multiplied by international sms weighing. My favourite for a clear
name would be 'text message credits used', but we need a naming
strategy for this.
This makes it clearer that this model collection isn’t the organisations
for a user or a service or some other entity, like most model
collections are.
It will also lets us make a separate Organisations model, without the
name conflicting.
This commits adds test coverage for ther HTML in several of the forms
which had broken autofocus.
It means that if we make changes to the HTML which triggers autofocus in
the future it should be more obvious that something is depending on the
attributes being added/removed.
In 674c27a693 we updated the autofocus
Javascript to be compatible with GOV.UK Frontend textboxes, which have
the `data-module` attribute set on the `input` element, rather than on
a wrapper element.
However we still have some `<textarea>`s and `<input>`s which haven’t
moved to GOV.UK Frontend and therefore aren’t getting picked up by the
Javascript which is supposed to focus them.
This commit makes the Javascript work with both kinds of textbox, which
is needed until we move entirely to GOV.UK Frontend.
This moves things out of the parametrization that didn't need to be
in there and also makes the test match the test below,
test_shows_delete_link_for_error_on_post_request_for_edit_email_reply_to_address,
more closely.
https://www.pivotaltracker.com/story/show/180026726
There was a bug where if you enter an invalid email address in
to the edit reply to email address form and click save, the
form you get shown with your error message will always contain
the field to set as default the reply to and also delete. This should
not have been the case. If you make an error on the form when
changing a reply to that is already a default, then you should not
be given the chance to change it to not default, nor should you
be able to delete it.
This commit fixes that bug by making sure the additional form fields
are only shown if the reply to being changed is not the default.
This test is now failing due to the combination of both Werkzeug having
had a major version bump in a previous PR and the Python version
changing. Calling `current_service` is now giving an error when trying
to get the `service` from the `_request_ctx_stack` because
`_request_ctx_stack` has no `service` attribute. There was no error
before, so it appears some underlying behaviour has changed. I've fixed
the test by changing the setup, but there is no effect on production
code - when the app is running `load_service_before_request` is called
before every request.
The current link button for removing an area is
created using a psuedo element with an 'x' as
content.
The inline box for the 'x' overlapped its parent.
This is visible in high contrast mode, breaking the
parent's border. Despite existing in CSS, the 'x'
is also announced by screen readers, which is not
what we want.
This changes it to be an inline SVG with a role of
image. It doesn't require as large an inline box
so doesn't cause visual issues in high contrast
modes. It also means we can set it's label similar to
how you would an image's alt text, giving us
control over what is announced by screen readers.
This commit also includes some extra CSS,
targeting high contrast modes, giving the
link button the following when viewed in those
modes:
- a complete border so it is
distinguishable from the list item
- a focus style