'Usage for all services' is a platform admin report that groups letters by
postage. We want it to show `europe` and `rest-of-world` letters under a
single category of `international`, so this updates the query to group them
that way and to order the postage categories appropriately.
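A rough sketch of the kind of grouping involved, assuming a SQLAlchemy 1.3-style `case()` and a `postage` column - the model and column names are illustrative, not the real report query:

```python
from sqlalchemy import case, func

# stand-ins for the real session/model used by the report query
from app import db
from app.models import Notification

# collapse the two overseas postage classes into a single 'international' bucket
postage_group = case(
    [(Notification.postage.in_(['europe', 'rest-of-world']), 'international')],
    else_=Notification.postage,
).label('postage')

letters_by_postage = (
    db.session.query(postage_group, func.count(Notification.id).label('letter_count'))
    .group_by(postage_group)
    .order_by(postage_group)
    .all()
)
```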
This will help us understand how far through the task has got if it gets
interrupted halfway (as happened this morning, when we struggled to work out
where it had got to).
solves `AttributeError: 'DummySession' object has no attribute 'query'`.
if you don't do this you get really hard-to-diagnose errors in unrelated
tests, seemingly due to strange import-order problems.
task takes a broadcast_message_id, and makes a post to the cbc-proxy.
for now, hardcode the url to the notify stub. the stub requires the template
in the same shape the admin/api get it, so use the marshmallow schema to json
dump it.
Note - this also required us to tweak the BroadcastMessage.serialize
function so that it converts uuids into string ids - flask's jsonify function
does that for free but requests.post sadly doesn't.
if the request fails (either 4xx or 5xx) just raise an exception and let
it bubble up for now - in the future we'll add retry logic
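A minimal sketch of the shape of the task - the task name, dao helper and stub URL are assumptions, and it assumes a marshmallow 3-style `dump()` that returns a dict:

```python
import requests

from app import notify_celery
from app.dao.broadcast_message_dao import dao_get_broadcast_message_by_id
from app.schemas import template_schema

# hardcoded for now - points at the notify cbc-proxy stub
CBC_PROXY_STUB_URL = 'http://localhost:6016/broadcasts'


@notify_celery.task(name='send-broadcast-message')
def send_broadcast_message(broadcast_message_id):
    broadcast_message = dao_get_broadcast_message_by_id(broadcast_message_id)

    response = requests.post(
        CBC_PROXY_STUB_URL,
        json={
            # serialize() now turns UUIDs into string ids, since requests.post
            # won't do that for us the way flask's jsonify does
            'broadcast_message': broadcast_message.serialize(),
            # the stub wants the template in the same shape the admin/api get it
            'template': template_schema.dump(broadcast_message.template),
        },
    )
    # no retry logic yet - a 4xx or 5xx just raises and bubbles up
    response.raise_for_status()
```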
add `datetime` format (note, not the built-in `date-time`) to our json
schemas. this uses the iso8601 library to try to parse the string.
also, move `strict-rfc3339` and `rfc3987` (used by jsonschema to
validate the `date-time` and `uri` formats respectively) from test
requirements to regular requirements. if they're not installed, format
validation silently succeeds, so validation wouldn't reject anything bad
on prod, only in unit tests.
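Roughly how the new format can be registered, sketched against jsonschema's `FormatChecker` API - the validator class and example schema are just for illustration:

```python
import iso8601
from jsonschema import Draft7Validator, FormatChecker

format_checker = FormatChecker()


@format_checker.checks('datetime', raises=iso8601.ParseError)
def validate_datetime_format(instance):
    # only strings get parsed; anything else falls through to the type checks
    if isinstance(instance, str):
        iso8601.parse_date(instance)
    return True


validator = Draft7Validator(
    {'type': 'object', 'properties': {'starts_at': {'type': 'string', 'format': 'datetime'}}},
    format_checker=format_checker,
)
```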
new blueprint `/service/<id>/broadcast-message` with the following
endpoints:
* GET / - get all broadcast messages for a service
* GET /<id> - get a single broadcast message
* POST / - create a new broadcast message
* POST /<id> - update an existing broadcast message's data
* POST /<id>/status - move a broadcast message to a new status
I've kept the regular data update (eg personalisation, start and end
times) separate from the status update, just to keep separation of
concerns a bit more rigid, especially around who can update.
I've included schemas for the three POSTs; they're pretty straightforward.
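For illustration, a guess at what the status-update schema might look like - the field names and the uuid format are assumptions, not the real schema, and the statuses are the ones described in the broadcast_message table commit further down:

```python
post_update_broadcast_message_status_schema = {
    '$schema': 'http://json-schema.org/draft-07/schema#',
    'description': 'POST update the status of a broadcast message',
    'type': 'object',
    'properties': {
        'status': {
            'type': 'string',
            'enum': [
                'draft',
                'pending-approval',
                'rejected',
                'broadcasting',
                'completed',
                'cancelled',
            ],
        },
        'created_by': {'type': 'string', 'format': 'uuid'},
    },
    'required': ['status', 'created_by'],
    'additionalProperties': False,
}
```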
This is done so that we do not use StatsD on our HTTP endpoint.
We decided we do not need the metrics that this gave us. If we
change our minds, we will add Prometheus-friendly decorators
instead in the future.
ALLOWED_PROPERTIES is a list of the fields that SerialisedModel will attempt
to deserialise into the returned object. If a field isn't present on the
underlying data source (whether that's a dump from a marshmallow schema,
a blob retrieved from redis, or anything else), then SerialisedModel
will raise an exception. However, SerialisedModel isn't involved when
setting the cache, so the correct flow for adding a column to a cached
database model is:
PR #1
* add new column to DB
* add new column to model, and to template_schema (because that schema
  is used to create the dict that goes into redis). New redis keys
  start getting populated.
Deploy that through, and then clear redis
PR #2
* add new column to SerialisedTemplate.ALLOWED_PROPERTIES. this means
that it'll start reading that value from redis
* now instances of the app will always have the new field in their
  template objects, whether they came from the database directly or from
  the cache
This commit removes the field from ALLOWED_PROPERTIES. After it's
deployed we'll be able to clear redis, observe redis being populated
with the new field, and then we'll be able to re-add it to
ALLOWED_PROPERTIES, ready to use.
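To make the ordering constraint concrete, here's a paraphrase of how SerialisedModel consumes ALLOWED_PROPERTIES - not the notifications-utils source, and the template fields shown are illustrative:

```python
class SerialisedModel:
    ALLOWED_PROPERTIES = []

    def __init__(self, _dict):
        for prop in self.ALLOWED_PROPERTIES:
            # raises if the cached dict (or schema dump) is missing the field,
            # which is why the field has to exist in redis before it's listed here
            setattr(self, prop, _dict[prop])


class SerialisedTemplate(SerialisedModel):
    ALLOWED_PROPERTIES = [
        'id',
        'content',
        'template_type',
        'version',
        # the new column is only added here in PR #2, once redis has been
        # cleared and repopulated with dicts that contain it
    ]
```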
make it clear that this is for the public api, and we shouldn't add
fields to it without considering the impact
also add the broadcast_messages relationship on service and template to
the exclude list in the marshmallow schemas, so it's not included elsewhere
new broadcast_message table. contains a whole bunch of timestamps, a
couple of user ids for who created and who approved, some
personalisation and a template id/version. areas is a jsonb field - it
is expected that this will be an array, for example `["england",
"wales"]`
intended valid state transitions are as follows, from an initial status
of draft
* draft -> pending-approval, rejected
* pending-approval -> broadcasting, rejected, (draft maybe)
* broadcasting -> completed, cancelled
rejected, completed, cancelled and technical-failure are terminating
states.
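To pin down the shape a bit, here's an illustrative sketch of the table's model and of the transitions above written out as a lookup - the column names are guesses from this description, not the actual migration:

```python
import uuid

from sqlalchemy.dialects.postgresql import JSONB, UUID

from app import db


class BroadcastMessage(db.Model):
    __tablename__ = 'broadcast_message'

    id = db.Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    service_id = db.Column(UUID(as_uuid=True), db.ForeignKey('services.id'), nullable=False)

    template_id = db.Column(UUID(as_uuid=True), db.ForeignKey('templates.id'), nullable=False)
    template_version = db.Column(db.Integer, nullable=False)
    personalisation = db.Column(JSONB)
    areas = db.Column(JSONB, nullable=False, default=list)  # expected to be an array, eg ["england", "wales"]

    status = db.Column(db.String, nullable=False, default='draft')

    # the "whole bunch of timestamps"
    created_at = db.Column(db.DateTime, nullable=False)
    approved_at = db.Column(db.DateTime)
    starts_at = db.Column(db.DateTime)
    finishes_at = db.Column(db.DateTime)

    # who created it and who approved it
    created_by_id = db.Column(UUID(as_uuid=True), db.ForeignKey('users.id'))
    approved_by_id = db.Column(UUID(as_uuid=True), db.ForeignKey('users.id'))


# the intended transitions from the list above, as a lookup
# (the pending-approval -> draft move is included, but is still a maybe)
BROADCAST_STATUS_TRANSITIONS = {
    'draft': {'pending-approval', 'rejected'},
    'pending-approval': {'broadcasting', 'rejected', 'draft'},
    'broadcasting': {'completed', 'cancelled'},
    # terminating states - nothing moves out of these
    'rejected': set(),
    'completed': set(),
    'cancelled': set(),
    'technical-failure': set(),
}
```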
also copy across the enum switching code from migration 0060 (adding
letter type).
had to go through the code and change a few places where we filter on
template types. I specifically didn't worry about jobs or notifications.
Also, add broadcast_data - a json column that might contain arbitrary
broadcast data that we'll figure out as we go. We don't know what it'll
look like, but it should be returned by the API
This prevents a race condition where we get the delivery receipt before
updating the notification to sending, meaning the sending status would
supersede the delivered status and the notification would then time out
as temporary-failure after three days.
As we gradually move from statsd to prometheus, we change this metric to
be a prometheus metric rather than a statsd one.
The change worth pointing out is that we have dropped the 'successful'
and 'failed' statuses from the metrics. I don't think it's useful to
have these statuses. It's very rare for an inbound message to fail when
we receive it and when it does, we raise an error and see it in our
logs. We aren't going to be looking at a graph of it as it's a rare
event, not typical behaviour that we want to monitor with a graph.
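A minimal sketch of the status-less counter, assuming the prometheus_client library and an illustrative metric name:

```python
from prometheus_client import Counter

# one counter, no per-status labels - failures are rare enough that we just log them
INBOUND_SMS_RECEIVED = Counter(
    'inbound_sms_received_total',
    'Number of inbound SMS messages received',
)


def handle_inbound_sms(message):
    INBOUND_SMS_RECEIVED.inc()
    # ... existing processing ...
```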
This was broken because sometimes `service.permissions` is a list of
strings (for when we’re caching the service object) and sometimes it’s a
list of permission objects (when we’re dealing with ORM objects).
Because the validator code is shared, the least-messy way to fix it is
to make sure it can handle both types.
It can’t just take a list of permissions as an argument, because the
validator uses other fields on the service.
It would be messy to rewrite the endpoint to use a serialised service
because the tests all expect to be dealing with database objects, so it
would be a faff to change what they’re mocking.
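One way of handling both shapes, sketched from the description above - the helper name is made up, and it assumes the ORM permission objects expose a `.permission` attribute:

```python
def _permission_names(service):
    # service.permissions is a list of strings for a serialised (cached) service,
    # or a list of permission ORM objects when loaded from the database
    return {
        p if isinstance(p, str) else p.permission
        for p in service.permissions
    }


def service_can_send(service, permission_required):
    return permission_required in _permission_names(service)
```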
This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint
This has been done with the purpose of not having the creation of text
messages or emails call out to StatsD during the request process. These
are the three metrics currently called during the processing of one of
those requests, and so they have been removed from the API.
The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have seen two incidents
recently affected by DNS resolution for StatsD (either a slowdown in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.
We are not going to miss the removal of these metrics.
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
our logs if we want it
- the response times and response codes per flask endpoint are already
exposed using the gds metrics python library
I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
  we should have already removed most calls to StatsD, which should
  greatly reduce the chance of there being any further issues with
  DNS resolution
- Some of the other metrics still provide value so no point deleting
them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
  users as they are async. Also, we do not currently have the
  infrastructure set up to replace them with a Prometheus HTTP endpoint
  that can be scraped, so this would require more work
This reverts commit 58a9862cd1.
That commit tried to optimise the fetch template query by avoiding an
extra database query.
However it had the side effect of making Marshmallow ignore `created_by`
when loading the JSON in the post request. So the field on the model was
never being updated, just copied from the original template.
The quick way of getting things to work again is to revert this
optimisation.
There’s probably some Marshmallow magic we could use to avoid the extra
query and still have created_by not be ignored.
It does mean we’re introducing an extra query, but I’m not too fussed
about that since the caching seems to be working well.
Because the IDs of our callback and inbound SMS APIs were stored in
lists instead of directly on the serialised model, they weren’t getting
cast to strings before we tried to JSONify them. And JSON doesn’t know
what to do with a UUID object.
For some reason this was only affecting the endpoint for fetching
inbound SMS.
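The underlying problem in miniature - Python's json module won't serialise UUID objects, so the ids in those lists have to be cast to strings first (the variable names here are illustrative):

```python
import json
import uuid

inbound_api_ids = [uuid.uuid4(), uuid.uuid4()]

# json.dumps(inbound_api_ids) raises:
#   TypeError: Object of type UUID is not JSON serializable
serialisable_ids = [str(api_id) for api_id in inbound_api_ids]
json.dumps(serialisable_ids)  # works once the UUIDs are plain strings
```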
The API needs these to check whether a service can send a notification.
This commit also updates all the tests in `test_validators.py` to take
a serialised service, not a database object.
We changed auth.py to import from app.serialised_models here:
https://github.com/alphagov/notifications-api/pull/2887/files#diff-77cbb1e03185c7319f0311371c438b0cR11
`serialised_models.py` imports from `templates_dao.py`
`templates_dao.py` imports from `users_dao.py`
`users_dao.py` imports from `errors.py`
`errors.py` imports from `auth.py` … and the circle is complete 💥
For some reason this caused the Celery workers to crash on startup, but
not the app, which I guess is why the integration tests didn’t catch
this?