For the public API we actually receive a "name" instead of an ID,
which we also want to start sending from the Admin app.
Unlike IDs, which aren't really used anywhere, the names are needed
to display the alerts on gov.uk/alerts.
This is necessary until:
- The Admin app is using the new "areas(_2)" format to store and
retrieve data.
- We've migrated all existing broadcast messages to use the new
format.
Note that "areas" / "ids" isn't actually used for anything except
printing out the PagerDuty message - it's not sent to the proxy [1].
[1]: 6edc6c70aa/app/celery/broadcast_message_tasks.py (L190-L193)
Currently we have:
- An "areas" column in the DB that stores a JSON blob.
- An "areas" field inside the "areas" JSON that stores area IDs.
- Each field has to be manually copied into the JSON column.
We want to move to:
- An "areas" column in the DB (unchanged).
- An "ids" field inside the "areas" JSON (to replace "areas").
- The Admin app sending other data inside an "areas" JSON field.
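As a rough sketch of the two shapes (the example values, and any
field other than "ids", are illustrative):

    # Current: the inner "areas" field duplicates the column name and
    # holds area IDs
    {
        "areas": ["area-id-1", "area-id-2"]
    }

    # Target: "ids" replaces the inner "areas", and the Admin app can
    # send other data (such as the names used by gov.uk/alerts)
    # alongside it
    {
        "ids": ["area-id-1", "area-id-2"],
        "names": ["Area name 1", "Area name 2"]
    }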
The API design for areas is confusing and difficult to extend.
Here we duplicate the current API functionality using an "areas_2"
field. Once the Admin app is using this field, we'll be able to
rename it to just "areas", which is where we want to get to.
In the next commits we'll build on this to support the migration
from "areas"."areas" to "areas"."ids".
This is a temporary feature to make it easy to migrate the format
of the "areas" column and backfill extra data for it.
It's not possible to use this feature to update the status of an
old broadcast message, so the risk from this override is minimal.
If a polygon is smaller than the largest polygon in our dataset of
simplified polygons then we’re only throwing away useful detail by
simplifying it.
We should still simplify larger polygons as a fallback, to avoid sending
anything to the CBC that we’re not sure it will like.
The thresholds here are low: we can raise them as we test and experiment
more.
Here’s some data about the Flood Warning Service polygons:

Percentile  | 80% | 90%   | 95%    | 98%     | 99%     | 99.9%
------------|-----|-------|--------|---------|---------|----------
Point count | 226 | 401.9 | 640.45 | 1015.38 | 1389.07 | 3008.609

Percentile    | 80% | 90% | 95% | 98% | 99% | 99.9%
--------------|-----|-----|-----|-----|-----|-------
Polygon count | 2   | 3   | 5   | 8   | 10  | 40.469
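A minimal sketch of the simplification fallback described above,
assuming Shapely geometries (the threshold and tolerance values are
placeholders, not the real config):

    from shapely.geometry import Polygon

    MAX_POINT_COUNT = 250           # placeholder threshold
    SIMPLIFY_TOLERANCE_METRES = 25  # placeholder tolerance

    def maybe_simplify(polygon: Polygon) -> Polygon:
        # Small polygons are already within what we know the CBC accepts,
        # so simplifying them would only throw away useful detail
        if len(polygon.exterior.coords) <= MAX_POINT_COUNT:
            return polygon
        # Fall back to simplifying anything larger, so we never send the
        # CBC something we're not sure it will like
        return polygon.simplify(SIMPLIFY_TOLERANCE_METRES, preserve_topology=True)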
This new version of utils implements the transformation of our polygons
to a Cartesian plane. In other words, it converts them from being
defined in spherical degrees to metres.
For the API this means our simplification will be slightly more
accurate.
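Roughly speaking (the projection choice and helper name are
assumptions; the real utils code may differ):

    from pyproj import Transformer
    from shapely.geometry import Polygon
    from shapely.ops import transform

    # WGS84 lat/long degrees to the metre-based British National Grid
    # (EPSG:27700) - chosen here purely for illustration
    _to_metres = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

    def to_cartesian(polygon: Polygon) -> Polygon:
        # With coordinates in metres, simplification tolerances are in
        # metres too, which is slightly more accurate than degrees
        return transform(_to_metres.transform, polygon)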
Regardless of channel.
Do not include:
- broadcasts older than 25.05.2021
- stubbed broadcasts
- broadcasts that were not transmitted, so only broadcasting,
cancelled and completed broadcasts make the list.
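As a hedged sketch of the filter (attribute and status names are
assumptions based on the description above):

    from datetime import datetime

    OLDEST_BROADCAST = datetime(2021, 5, 25)
    TRANSMITTED_STATUSES = {"broadcasting", "cancelled", "completed"}

    def should_include(broadcast) -> bool:
        # Only broadcasts that were actually transmitted, aren't stubbed,
        # and aren't older than the cut-off date make the list
        return (
            broadcast.created_at >= OLDEST_BROADCAST
            and not broadcast.stubbed
            and broadcast.status in TRANSMITTED_STATUSES
        )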
This is the original behaviour [1]. Since all internal requests will
have corresponding logs from public-facing apps that are making them,
there's little value in logging them.
Logging internal requests doesn't lead to a significant increase in
our overall log ingestion: a rough estimate is it's an extra 5000 logs
per minute, out of about 900K per minute (roughly 0.6% of the total).
[1]: e08d726f05/app/authentication/auth.py (L153)
The rest of the tests need to construct the header directly so they
can pass custom tokens. But for the three tests that actually make
a request to prove the auth functions work as wrappers, we can use
the same factory functions we use everywhere else in the tests.
While "key" is the term used by the JWT library, all the rest of
our code - the ApiKey model, the Python client - all use the term
"secret" instead. Although "secret" is less precise than "key", it
does help avoid confusion with (api) key (as a model object).
We can define the API properly in future work. I've used a separate
blueprint from "broadcasts" since this API is purely internal, and
it's helpful to make it clear it's specific to govuk-alerts.
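Something along these lines (the blueprint name and URL prefix are
assumptions):

    from flask import Blueprint

    # A separate blueprint makes it clear this API exists purely for the
    # internal govuk-alerts consumer
    govuk_alerts_blueprint = Blueprint(
        "govuk_alerts", __name__, url_prefix="/govuk-alerts"
    )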
This adds previously missing tests and changes the existing ones
to test the "requires_internal_auth" function directly. In order
to make the tests generic, we have a fake auth function and an
associated fixture.
Having generic tests for internal auth will make it easier to add
other "requires..." functions in future.
This makes the tests easier to read by avoiding request boilerplate
and making a clearer link between the name of each test and the
function it is actually testing.
This is a lot of code to check we haven't written a single line,
which we can just visually see isn't there. We should avoid having
tests that check code _isn't_ there, since there's no limit to what
we could check the absence of.
We can simplify the rest of the tests to avoid the boilerplate of
making an actual request. But it's worth keeping these two to prove
the wrappers work correctly for an arbitrary route.
Previously we had a lot of duplicate tests inconsistently checking
each of the "requires_" functions. Since both of them now use the
same "_decode_jwt_token" helper, we can consolidate all the tests
onto that. In future commits we'll look at testing the top-level
functions in terms of what they do specifically.
This switches to testing the two functions directly as trying to
test them through the top-level "requires_..." functions or calls
to endpoints doesn't scale as we add more of them.
While this has a slight risk that a "requires_..." function might
not be using these helpers, it seems unlikely and we can always
add a mock to check this if we're concerned in future.
Previously "requires_auth" and "requires_admin_auth" had similar
but different ways of checking their keys. This switches them to
use the same checks, with the admin / internal auth passing in a
fake / stub set of "api keys" to check.
Pulling out the logic this way will make it easier to unpick the
tests, so we can focus on testing what's unique to each kind of
API auth and avoid future duplication when we start calling the
"requires_internal_auth" method with other client_ids.
Note that a couple of error messages / response codes have changed
for admin / internal auth. None of these occur in practice, so we
can make them consistent with the behaviour for the public API.
Previously this was heavily duplicated, with the odd test using a
__create_token method. This adds some fixtures to remove all the
boilerplate and standardise how tokens are created in each test.
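For instance (fixture and config names are illustrative;
create_jwt_token is assumed to be the helper from the Python client):

    import pytest

    from notifications_python_client.authentication import create_jwt_token

    @pytest.fixture
    def admin_jwt_token(notify_api):
        # One standard way to build a valid admin token, instead of each
        # test rolling its own __create_token equivalent
        return create_jwt_token(
            secret=notify_api.config["ADMIN_CLIENT_SECRET"],
            client_id="notify-admin",
        )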
This is to see whether the worker needs slightly more memory than it
has access to, or whether there is a memory leak somewhere in the
code that needs further investigation.
This comes on the heels of yesterday's issue, where we could not
process the CSVs users uploaded: the memory graph for this worker
showed it was using almost all of its available memory, and a
redeploy fixed the problem.
Previously we just had a single array of API keys / secrets, any of
which could be used to get past the "requires_admin_auth" check.
While multiple keys are necessary to allow for rotation, we should
avoid giving other apps access this way (too much privilege).
This converts the existing config vars into a new dictionary, keyed
by client_id. We can then use the dictionary to scope auth for new
API consumers like gov.uk/alerts to just the endpoints they need to
access, while maintaining existing access for the Admin app.
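As an illustration (the variable name, client ids and secrets are all
made up), the dictionary might look like:

    # Lists of secrets per client, so each consumer can still rotate keys
    INTERNAL_CLIENT_API_KEYS = {
        "notify-admin": ["current-admin-secret", "previous-admin-secret"],
        "govuk-alerts": ["current-alerts-secret"],
    }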
Once the new dictionary is available as a JSON environment variable,
we'll be able to remove the old credentials / config. In the next
commits, we'll look at more tests for the new functionality.
that is already cancelled vs when it is too late to cancel the
letter vs when we don't know the cause of the failure.
This is so we can show useful error messages to the users
and also to aid debugging.
On a regular Notify service anyone with permission can create an API
key. If this service is then given permission to send emergency alerts,
it will have an API key which can create emergency alerts. This feels
dangerous.
Secondly, if a service which legitimately has an API key for sending
alerts in training mode is changed to live mode you now have an API key
which people will think isn’t going to create a real alert but actually
will. This feels really dangerous.
Neither of these scenarios is something we should be doing, but the
fact that they're possible still makes me feel uncomfortable.
This commit revokes all API keys for a service when its broadcast
settings change, the same way we remove all permissions for its users.
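A rough sketch of the behaviour (function and attribute names are
assumptions about the surrounding code):

    def handle_broadcast_settings_change(service):
        # Revoke every API key for the service, the same way we remove
        # all permissions for its users, so no pre-existing key can
        # create an alert under the new settings
        for api_key in service.api_keys:
            api_key.revoked = True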
Since the expiry is sent as part of the message payload, we don't
need to invoke the CBC proxies (and indeed there's no way to do so
for an expired alert). In future we plan to extend this task so it
triggers the regeneration of content on gov.uk/alerts.
It's worth noting that 'finishes_at' can theoretically be None, in
which case it's unclear when the alert should expire. While alerts
from the Admin app should always have an expiry [1], we have many
in the DB that don't, so it's worth checking for this scenario.
[1]: 078ac10c8d/app/models/broadcast_message.py (L255)
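A minimal sketch of the check, assuming the task receives a broadcast
message object (attribute and status names are illustrative):

    from datetime import datetime

    def expire_broadcast_message(broadcast_message):
        if broadcast_message.finishes_at is None:
            # Many old rows have no expiry, so we can't tell when they
            # should expire - skip rather than guess
            return
        if broadcast_message.finishes_at <= datetime.utcnow():
            # The expiry was sent in the original message payload, so
            # there is no CBC proxy call to make - just record the
            # state change
            broadcast_message.status = "completed"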