Previously we were passing a flag to the API which handled this. Now
we are doing it at the time of clicking the link, not at the time of
storing the new password. We don’t need to update the timestamp twice,
so this commit removes the code which tells the API to do it.
When someone uses a fresh password reset link they have proved that they
have access to their inbox.
At the moment, when revalidating a user’s email address we wait until
after they’ve put in the 2FA code before updating the timestamp which
records when they last validated their email address[1].
We can’t think of a good reason that we need the extra assurance of a
valid 2FA code to assert that the user has access to their email –
they’ve done that just by clicking the link. When the user clicks the
link we already update their failed login count, before they complete
2FA. It makes sense to handle `email_access_validated_at` at the same point too.
As a bonus, the functional tests never go as far as getting a 2FA code
after a password reset[2], so the functional test user never gets its
timestamp updated. This causes the functional tests to start failing after
90 days. By moving the update to this point we ensure that the
functional tests will keep passing indefinitely.
1. This code in the API (91542ad33e/app/dao/users_dao.py (L131))
which is called by this code in the admin app (9ba37249a4/app/utils/login.py (L26))
2. 5837eb01dc/tests/functional/preview_and_dev/test_email_auth.py (L43-L46)
Theoretically the maximum expiry time of a broadcast should be 24 hours.
If it goes over 24 hours there can be problems.
However we want to make it more conservative to mitigate two potential
issues:
1. The CBC has a repetition period (eg 60 seconds) and a count (eg
1,440). If these were slightly inaccurate or generous it could take
us over 24 hours. For this reason we should give ourselves half an
hour of buffer.
2. It’s possible that the CBC could interpret a UTC time as BST or vice
versa. Until we’re sure that it’s using UTC everywhere, we need to
remove another whole hour as buffer.
In total this means we remove 1 hour 30 minutes from 24 hours, giving an
expiry time of 22 hours 30 minutes.
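As a rough sketch of the arithmetic (the constant names here are illustrative, not the real config):

```
from datetime import timedelta

# Theoretical CBC maximum, minus the two buffers described above
MAX_BROADCAST_DURATION = timedelta(hours=24)
REPETITION_BUFFER = timedelta(minutes=30)  # covers a generous repetition period / count
TIMEZONE_BUFFER = timedelta(hours=1)       # covers a possible UTC/BST mix-up

MAX_EXPIRY = MAX_BROADCAST_DURATION - REPETITION_BUFFER - TIMEZONE_BUFFER
assert MAX_EXPIRY == timedelta(hours=22, minutes=30)
```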
While testing alerts on these channels the MNOs sometimes need to
restart their CBCs to make sure everything is failing over properly.
If the CBC does not come back up, for whatever reason, then we are left
in a state where the alert can’t be cancelled.
To minimise the impact to the public in this scenario we should keep the
expiry time at 4 hours for alerts sent on test channels. We recently
increased it back up to 24 hours for all channels, so this in effect is
reverting that change for channels that won’t be used in a real
emergency.
The page which shows the count of phones does some logic based on how
close the ‘will get’ and ‘likely to get’ numbers are. This means it
accesses the `BroadcastMessage.count_of_phones` and
`BroadcastMessage.count_of_phones_likely` properties multiple times.
These properties are computed fresh every time, and are quite expensive
to compute. By caching them in memory we can cut the page load time
approximately in half.
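A minimal sketch of the idea, assuming `functools.cached_property` (the real properties and their calculation are more involved):

```
from functools import cached_property


class BroadcastMessage:
    @cached_property
    def count_of_phones(self):
        # computed once per instance, then served from memory on repeat access
        return self._expensive_count('will get')

    @cached_property
    def count_of_phones_likely(self):
        return self._expensive_count('likely to get')

    def _expensive_count(self, kind):
        # stand-in for the real per-area population calculation
        return {'will get': 40_000, 'likely to get': 50_000}[kind]
```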
This renames the two functions we have to translate between UI and
DB permissions, as well as some of their associated variables to
make it clearer which kind of permission they contain.
We don't use this term consistently and it's not defined anywhere.
Since most of the Admin app deals with user-facing permissions, it's
OK to just use the term "permissions". Where both types of permission
are present in the same file, we can more clearly distinguish them
as "UI permissions" and "DB permissions".
At the moment we say that you either ‘add’ an alert or ‘send’ it.
This is confusing because:
- an alert isn’t received on people’s phones until it’s approved, so
this is really when it is ‘sent’ conceptually
- an alert can be rejected before anyone receives it, so the UI can say
an alert that no-one ever received was sent
This commit re-labels things so that the first part of the process
is ‘creating’ the alert.
This makes all the permissions nice and distinct from each other. Adding
templates and adding alerts feel like conceptually quite different things
(what are you adding the alert to?).
It will likely be the same people who have permission to create alerts
and edit templates (maybe someone in a comms role).
But combining the two permissions makes the options presented in the
form feel clunky because ‘alerts’ and ‘templates’ are conceptually quite
different.
So I think it makes sense to keep the templates permission the same as
it is for regular Notify services.
This is useful information to store for the event, which would be
lost if someone subsequently changed them.
Rather than updating lots of mock assertions, I've replaced them
with a single test / assert at a lower level, which is consistent
with auditing being a non-critical function.
I've used the term "admin_roles" in the event data to try and show
that these are not the permissions we store in the DB. This is the
name we use for the abstracted form of permissions in the Admin app.
While we could store the DB permissions, that would be a bit more
effort and arguably it's clearer to keep the event data consistent
with the options the user actually saw / chose.
Usually we have imports at the top. It looks like the reason for
them being inline was to avoid a circular import, but we can also
avoid this by not importing everything from the app module.
Since we're about to add more imports from event_handlers, now is
a good time to refactor them. Note this matches how we import the
event handlers in every other module.
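For illustration only (module and handler names are made up), the shape of the change is roughly:

```
# Before: imported inside the function body to dodge the circular import
def update_user_email(user):
    from app import event_handlers
    event_handlers.create_email_change_event(...)

# After: import only the names we need at the top of the module, which
# avoids the cycle because we no longer pull in everything `app` imports
from app.event_handlers import create_email_change_event
```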
We've added new broadcast roles in the database (`create_broadcasts` and
`approve_broadcasts`).
Adding these has meant we've needed to do a bit of a rewrite of the roles and
permissions code since this had been based on the assumption that each
database permission only belongs to one admin role - this is no longer true.
This means that flipping the roles dict round to create a dict which
contains database permissions as the keys is no longer possible. We can't
necessarily tell, from a single database permission, which admin role someone has.
To check if a user has an admin role given a list of database permissions,
the user must now have ALL the database permissions mapped to that role
(instead of just one). This works because no one has the `manage_users`
permission without also having the `manage_settings` (and similar for
the other admin roles which map to multiple database permissions).
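A minimal sketch of the new check (the role-to-permission mapping here is illustrative):

```
ROLE_TO_DB_PERMISSIONS = {
    'manage_service': {'manage_users', 'manage_settings'},
    'create_broadcasts': {'create_broadcasts'},
}


def has_admin_role(role, db_permissions):
    # the user must hold every DB permission mapped to the role, not just one
    return ROLE_TO_DB_PERMISSIONS[role] <= set(db_permissions)


assert has_admin_role('manage_service', ['manage_users', 'manage_settings', 'view_activity'])
assert not has_admin_role('manage_service', ['manage_users'])
```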
Some test data was changed because it was using admin roles where
database permissions are actually used when the app is running. I've kept
the behaviour of the `translate_permissions_from_db_to_admin_roles`
function where it passes through any unknown permissions it is given.
This isn't strictly necessary, so it can be changed later if we decide it
will never be used. However, removing it now would mean updating a lot of
tests, since they rely on this behaviour.
Added two new permissions - `create_broadcasts` and
`approve_broadcasts`. These new permissions get added to the
`has_permissions` decorator of the broadcast routes to allow the routes
to be accessed with either the old permissions or the new ones while we
switch over.
We were using the `send_messages` permission for the broadcast routes.
By having two new permissions we can allow a more granular control of
these routes.
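Sketched below with made-up helper code – the point is that the decorator treats its arguments as "any of these permissions is enough", so the old and new permissions both grant access during the switchover:

```
from functools import wraps


def has_permissions(*permitted):
    def decorator(view):
        @wraps(view)
        def wrapper(user, *args, **kwargs):
            # allow the route if the user holds ANY of the listed permissions
            if not set(permitted) & set(user['permissions']):
                raise PermissionError('forbidden')
            return view(user, *args, **kwargs)
        return wrapper
    return decorator


@has_permissions('send_messages', 'create_broadcasts', 'approve_broadcasts')
def broadcast_dashboard(user):
    return 'ok'


assert broadcast_dashboard({'permissions': ['create_broadcasts']}) == 'ok'
```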
This allows us to roll out the feature to other users. Note that
the flag is also "True" if the user has "webauthn_auth" as their
auth type, so this is compatible with the more fine-grained check
we have on the authentication parts of the feature. We could do a
more explicit "can_use_webauthn or webauthn_auth" check here, but
the idea is that we'll be able to get rid of this flag eventually,
so I've optimised for brevity instead.
I've modified a couple of the unhappy-path tests to make it more
explicit that the flag is false, since it can be true for Platform
Admins and "normal users" alike.
Our current assumption is that the bleed area has the same population
density as the broadcast area.
This is particularly naïve when:
- the bleed area overlaps the sea – no-one lives in the sea
- the broadcast area is a village and the bleed area is the surrounding
countryside
- the broadcast area is adjacent to a densely populated area like a city
We can be smarter about this now that we have a way of determining the
number of phones in an arbitrary area, based on the known areas that we
have population data about.
Calculating the population in an overlap is a slightly more intensive
calculation, so we only do it for areas which are small enough that it
doesn’t slow things down too much. For larger areas we still use the
more naïve algorithm.
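A sketch of the cut-off (the threshold and all names are made up for illustration):

```
MAX_AREA_SQ_KM_FOR_OVERLAP_METHOD = 5_000  # illustrative threshold


def phones_in_bleed(area_sq_km, bleed_sq_km, phone_density, overlap_estimate=None):
    if area_sq_km <= MAX_AREA_SQ_KM_FOR_OVERLAP_METHOD and overlap_estimate is not None:
        # smaller areas: use the more expensive overlap-based estimate
        return overlap_estimate
    # larger areas: keep the naive assumption that the bleed has the same
    # population density as the broadcast area itself
    return phone_density * bleed_sq_km
```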
We signal that we're mid-way through the sign-in flow by adding a
`user_details` dict to the session.
previously, we'd only put a user's details in the session in `User.sign_in`,
just before sending any 2fa prompt and redirecting to the two factor
pages.
However, we found a bug where a user with no session (eg, using a fresh
browser) tried to log in, but they had never clicked the link to
validate their email address when registering. Their user's state was
still in "pending", so we redirected to `main.resend_email_verification`
as intended - however, they didn't have anything in the session and the
resend page expected to get the email address to resend to out of that.
To be safe, as soon as we've confirmed the user has entered their
password correctly, let’s save the session data at that point. That way
any redirects will be fine.
Previously this was duplicated between the "two_factor" and the
"webauthn" views, and required more test setup. This DRYs up the
check and tests it once, using mocks to simplify the view tests.
As part of DRYing up the check into a util module, I've also moved
the "is_less_than_days_ago" function it uses.
Previously this was hidden away in an anonymous `__init__.py` file.
I did think about splitting the models into individual files, like
we do with the top-level models for the app. Since the models are
only imported in one place - i.e. are all used together - it didn't
seem worth the hassle, so I've kept them in one file.
This saves a bit of repetition, and lets us attach other methods to the
collection, rather than having multiple methods on the user object
prefixed with the same name, or random functions floating about.
_complete_webauthn_authentication -> _verify_webauthn_authentication
This function just does verification of the actual auth process -
checking the challenge is correct, the signature matches the public key
we have stored in our database, etc.
verify_webauthn_login -> _complete_webauthn_login_attempt
This function doesn't do any actual verification, we've already verified
the user is who they say they are (or not), it's about marking the
attempt, either unsuccessful (we bump the failed_login_count in the db)
or successful (we set the logged_in_at and current_session_id in the
db).
This change also informs changes to the names of methods on the user
model and in user_api_client.
The JS `fetch` function will follow redirects blindly and return you the
final 200 response. When there’s an error, we don’t want to go anywhere,
and we want to use the Flask `flash` functionality to pop up an error
message (the likely reason for seeing this is using a YubiKey that isn’t
associated with your user). Using `flash` and then
`window.location.reload()` handles this fine.
However, when the user does log in successfully we need to properly log
them in (sketched after this list) – this includes:
* checking their account isn't over the max login count
* resetting failed login count to 0 if not
* setting a new session id in the database (so other browser windows are
logged out)
* checking if they need to revalidate their email access (every 90 days)
* clearing old user out of the cache
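Roughly, the sequence looks like this (function, endpoint and attribute names are illustrative, not the real code):

```
import uuid
from datetime import datetime

from flask import abort, redirect, url_for


def complete_login(user):
    if user.failed_login_count >= MAX_FAILED_LOGIN_COUNT:  # illustrative constant
        abort(403)
    user.reset_failed_login_count()
    # a new session id in the DB logs out any other browser windows
    user.update(current_session_id=str(uuid.uuid4()), logged_in_at=datetime.utcnow())
    if not user.email_access_validated_within_last_90_days:  # illustrative property
        return redirect(url_for('main.revalidate_email_sent'))
    clear_user_from_cache(user.id)  # illustrative helper
    return redirect(url_for('main.show_accounts_or_dashboard'))
```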
This code all happens in the ajax function rather than being in a
separate redirect, so that you can’t just navigate to the login flow. I
wasn’t able to unit test that function due to how it uses the session and
other Flask globals, so I moved the auth into its own function so it’s
easy to stub out all that CBOR nonsense.
TODO: We still need to pass any `next` URLs through the chain from login
page all the way through the javascript AJAX calls and redirects to the
log_in_user function
Previously we would raise a 500 error in a variety of cases:
- If a second key was being registered simultaneously (e.g. in a
separate tab), which means the registration state could be missing
after the first registration completes. That smells like an attack.
- If the server-side verification failed e.g. origin verification,
challenge verification, etc. The library seems to use 'ValueError'
for all such errors [1] (after auditing its 'raise' statements, and
excluding AttestationError [2], since we're not doing that).
- If a key is used that attempts to sign with an unsupported
algorithm. This would normally raise a NotImplemented error as part
of verifying attestation [3], but we don't do that, so we need to
verify the algorithm is supported by the library manually.
This adds error handling to return a 400 response and error message
in these cases, since the error is not unexpected (i.e. not a 500).
A 400 seems more appropriate than a 403, since in many cases it's
not clear if the request data is valid.
I've used CBOR for the transport encoding, to match the successful
request / response encoding. Note that the ordering of then/catch
matters in JS - we don't want to catch our own throws!
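As a rough sketch of the error path (not the real view code; the library's `cbor` helper is used for the body so it matches the rest of the flow):

```
from fido2 import cbor


def complete_webauthn_registration(server, state, client_data, attestation_object):
    try:
        # the library raises ValueError for origin / challenge / signature problems
        server.register_complete(state, client_data, attestation_object)
    except ValueError as error:
        # known-bad request, so a 400 with a CBOR-encoded message rather than a 500
        return cbor.encode({'message': str(error)}), 400, {'Content-Type': 'application/cbor'}
    return cbor.encode('registration complete'), 200, {'Content-Type': 'application/cbor'}
```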
[1]: 142587b3e6/fido2/server.py (L255)
[2]: c42d9628a4/fido2/attestation/base.py (L39)
[3]: c42d9628a4/fido2/cose.py (L92)
This links up the `get_webauthn_credentials_for_user` and
`create_webauthn_credential_for_user` methods of the user api client to
notifications-api.
To send data to the API we need strings to be unicode, so we call
decode('utf-8') on base64 objects.
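For example (illustrative bytes):

```
import base64

# raw credential bytes -> base64 bytes -> unicode string for the JSON payload
credential_data = base64.b64encode(b'\x01\x02\x03').decode('utf-8')
```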
Co-authored-by: Leo Hemsted <leo.hemsted@digital.cabinet-office.gov.uk>
This adds Yubico's FIDO2 library and two APIs for working with the
"navigator.credentials.create()" function in JavaScript. The GET
API uses the library to generate options for the "create()" function,
and the POST API decodes and verifies the resulting credential. While
the options and response are dict-like, CBOR is necessary to encode
some of the byte-level values, which can't be represented in JSON.
Much of the code here is based on the Yubico library example [1][2].
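A condensed sketch of the registration flow, loosely following that example (the exact constructor and argument shapes vary between library versions, and the user plumbing here is illustrative):

```
from fido2 import cbor
from fido2.server import Fido2Server
from fido2.webauthn import PublicKeyCredentialRpEntity

server = Fido2Server(PublicKeyCredentialRpEntity('localhost', 'Notify'))


def registration_options(user, existing_credentials):
    # GET: generate the options for navigator.credentials.create()
    options, state = server.register_begin(
        {'id': user['id'].encode(), 'name': user['email'], 'displayName': user['name']},
        existing_credentials,
    )
    # CBOR, because the options contain raw bytes that JSON can't represent
    return cbor.encode(options), state
```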
Implementation notes:
- There are definitely better ways to alert the user about failure, but
window.alert() will do for the time being. Using location.reload() is
also a bit jarring if the page scrolls, but not a major issue.
- Ideally we would use window.fetch() to do AJAX calls, but we don't
have a polyfill for this, and we use $.ajax() elsewhere [3]. We need
to do a few weird tricks [6] to stop jQuery trashing the data.
- The FIDO2 server doesn't serve web requests; it's just a "server" in
the sense of WebAuthn terminology. It lives in its own module, since it
needs to be initialised with the app / config.
- $.ajax returns a promise-like object. Although we've used ".fail()"
elsewhere [3], I couldn't find a stub object that supports it, so I've
gone for ".catch()", and used a Promise stub object in tests.
- WebAuthn only works over HTTPS, but there's an exception for "localhost"
[4]. However, the library is a bit too strict [5], so we have to disable
origin verification to avoid needing HTTPS for dev work.
[1]: c42d9628a4/examples/server/server.py
[2]: c42d9628a4/examples/server/static/register.html
[3]: 91453d3639/app/assets/javascripts/updateContent.js (L33)
[4]: https://stackoverflow.com/questions/55971593/navigator-credentials-is-null-on-local-server
[5]: c42d9628a4/fido2/rpid.py (L69)
[6]: https://stackoverflow.com/questions/12394622/does-jquery-ajax-or-load-allow-for-responsetype-arraybuffer
When showing what auth type a user uses to sign in, add text for
users with webauthn.
On password change or sign in, raise a not-implemented error
if the user uses webauthn auth.
This adds a new platform admin settings row, leading to a page which
shows any existing keys and allows a new one to be registered. Until
the APIs for this are implemented, the user API client just returns
some stubbed data for manual testing.
This also includes a basic JavaScript module to do the main work of
registering a new authenticator, to be implemented in the next commits.
Some more minor notes:
- Setting the headings in the mapping_table is necessary to get the
horizontal rule along the top (to match the design).
- Setting caption to False in the mapping_table is necessary to stop
an extra margin appearing at the top.
The code for this page was making assumptions about properties which
aren’t present on rejected broadcasts.
This commit accounts for those properties and presents the relevant
elements on the page.
A previous commit (600e3affc1) falls back to the string 'Unknown' for
actions done by users who don’t belong to the service.
This commit changes the behaviour such that if the user is not in the
list of active users for a service, we fetch the user from the DB
(or Redis). This should be fine, as Redis will protect us from most
of the calls, and most of these cases are for platform admins.
This means we can now see which platform admin put a service
live, rather than seeing 'Unknown'.
We don’t vary this between different environments so it doesn’t need to
be in the config.
I was trying to look up what this value was and found it a bit confusing
that it was spread across multiple places.
There are basically two kinds of 4G masts:
Frequency | Range        | Bandwidth
----------|--------------|----------------------------------
800MHz    | Long (5km)   | Low (can handle a bit of traffic)
1800MHz   | Short (500m) | High (can handle lots of traffic)
The 1800MHz masts are better in terms of how much traffic they can
handle and how fast a connection they provide. But because they have
quite short range, it’s only economical to install them in very built up
areas†.
In more rural areas the 800MHz masts are better because they cover a
wider area, and have enough bandwidth for the lower population density.
The net effect of this is that cell broadcasts in rural areas are likely
to bleed further, because the masts they are being broadcast from are
less precise.
We can use population density as a proxy for how likely it is to be
covered by 1800MHz masts, and therefore how much bleed we should expect.
So this commit varies the amount of bleed shown based on the population
density.
I came up with the formula based on 3 fixed points:
- The most remote areas (for example the Scottish Highlands) should have
the highest average bleed, estimated at 5km
- A town, like Crewe, should have about the same bleed as we were
estimating before (1.5km) – Pete D thinks this is about right based on
his knowledge of the area around his office in Crewe
- The most built up areas, like London boroughs, could have as little as
500m of bleed
Based on these three figures I came up with the following formula, which
roughly gives the right bleed distance (`b`) for each of their population
densities (`d`):
```
b = 5900 - (log10(d) × 1_250)
```
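To sanity-check the formula against those three points (the densities below are rough assumed figures, in people per square km, and `b` comes out in metres):

```
from math import log10


def bleed_in_metres(people_per_sq_km):
    return 5_900 - (log10(people_per_sq_km) * 1_250)


assert round(bleed_in_metres(5), -2) == 5_000       # very remote, e.g. the Highlands
assert round(bleed_in_metres(3_300), -2) == 1_500   # a town like Crewe
assert round(bleed_in_metres(21_000), -2) == 500    # a dense London borough
```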
Plotted on a curve it looks like this:
This is based on averages – remember that the UI shows where is _likely_
to receive the alert, based on bleed, not where it’s _possible_ to
receive the alert.
Here’s what it looks like on the map:
---
†There are some additional subtleties which make this not strictly true:
- The 800MHz masts are also used in built up areas to fill in the gaps
between the areas covered by the 1800MHz masts
- Switching between masts is inefficient, so if you’re moving fast
through a built up area (for example on a train) your phone will only
use the 800MHz masts so that you have to handoff from one mast to
another less often
The `invited_user` objects can be arbitrarily large, and when we put them
in the session we risk going over the session cookie's 4kb size limit.
Since https://github.com/alphagov/notifications-admin/pull/3827 was
merged, we store the user id in the session. Now that's been live for a
day or two, we can safely stop putting the rich object in the session.
Needed to change a bunch of tests for this to make sure appropriate
mocks were set. Also some tests were accidentally re-using fake_uuid.
Still pop the object when cleaning up sessions. We'll need to remove
that in a future PR.