add `datetime` format (note, not the built-in `date-time`) to our json
schemas. this uses the iso8601 library to try and parse the string.
also, move `strict-rfc3339` and `rfc3987` (used by jsonschema to
validate `date-time` and `uri` formats respectively from test
requirements to regular requirements. if they're not installed,
validation silently succeeds, so validation wouldnt reject anything bad
on prod, only in unit tests.
new blueprint `/service/<id>/broadcast-message` with the following
endpoints:
* GET / - get all broadcast messages for a service
* GET /<id> - get a single broadcast message
* POST / - create a new broadcast message
* POST /<id> - update an existing broadcast message's data
* POST /<id>/status - move a broadcast message to a new status
I've kept the regular data update (eg personalisation, start and end
times) separate from the status update, just to keep separation of
concerns a bit more rigid, especially around who can update.
I've included schemas for the three POSTs, they're pretty
straightforward.
ALLOWED_PROPERTIES is a list containing fields that it will attempt to
deserialize into the returned object. If a field isn't present on the
underlying data source (whether that's a dump from a marshmallow schema,
a blob retrieved from redis, or anything else), then SerialisedModel
will raise an exception. However, SerialisedModel isn't involved when
setting the cache, so with that said the correct flow for adding a
column to a cached database model:
PR #1
* add new column to DB
* add new column to model, and to template_schema (because that schema
is used to create the dict that goes in to redis). New redis keys
start getting populated.
Deploy that through, and then clear redis
PR #2
* add new column to SerialisedTemplate.ALLOWED_PROPERTIES. this means
that it'll start reading that value from redis
* now instances of the app will always have the new field in their
template objects, whether they came from database directly or from
the cache
This commit removes the field from ALLOWED_PROPERTIES. After it's
deployed we'll be able to clear redis, observe redis being populated
with the new field, and then we'll be able to re-add it to
ALLOWED_PROPERTIES, ready to use.
make it clear that this is for the public api, and we shouldn't add
fields to it without considering impacts
also add the broadcast_messages relationship on service and template to
the exclude from the marshmallow schemas, so it's not included elsewhere
new broadcast_message table. contains a whole bunch of timestamps, a
couple of user ids for who created and who approved, some
personalisation and a template id/version. areas is a jsonb field - it
is expected that this will be an array, for example `["england",
"wales"]`
intended valid state transitions are as follows, from an initial status
of draft
* draft -> pending-approval, rejected
* pending-approval -> broadcasting, rejected, (draft maybe)
* broadcasting -> completed, cancelled
rejected, completed, cancelled and technical-failure are terminating
states.
also copy across the enum switching code from migration 0060 (adding
letter type).
had to go through the code and change a few places where we filter on
template types. i specifically didn't worry about jobs or notifications.
Also, add braodcast_data - a json column that might contain arbitrary
broadcast data that we'll figure out as we go. We don't know what it'll
look like, but it should be returned by the API
This prevents a race condition when we get delivery receipt before
updating notification to sending, and so the sending status would
supersede the delivered status, and the notification would time out
as temporary-failure after three days.
As we gradually move from statsd to prometheus, we change the metric to
be a prometheus metric rather than statsd.
The change worth pointing out is that we have dropped the 'successful'
and 'failed' statuses from the metrics. I don't think it's useful to
have these statuses. It's very rare for an inbound message to fail when
we receive it and when it does, we raise an error and see it in our
logs. We aren't going to be looking at a graph of it as it's a rare
event, not typical behaviour that we want to monitor with a graph.
This was broken because sometimes `service.permissions` is a list of
strings (for when we’re caching the service object) and sometimes it’s a
list of permission objects (when we’re dealing with ORM objects).
Because the validator code is shared, the least-messy way to fix it is
to make sure it can handle both types.
It can’t just take a list of permissions as argument, because it uses
other fields on the service.
It would be messy to rewrite the endpoint to use a serialised service
because the tests all expect to be dealing with database objects, so it
would be a faff to change what they’re mocking.
This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint
This has been done with the purpose of not having the creation of text
messages or emails call out to StatsD during the request process. These
are the three current metrics that are currently called during the
processing of one of those requests and so have been removed from the
API.
The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have seen two incidents
recently affected by DNS resolution for StatsD (either by a slow down in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.
We are not going to miss the removal of these metrics.
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
our logs if we want it
- the response times and response codes per flask endpoint are already
exposed using the gds metrics python library
I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
we should have already removed most calls to StatsD which should
greatly reduce the chance of their being any further issues with
DNS resolution
- Some of the other metrics still provide value so no point deleting
them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
users as they are async and also we do not currently have the
infrastructure set up to replace them with a prometheus HTTP endpoint that
can be scraped so this would require more work
This reverts commit 58a9862cd1.
That commit tried to optimize the fetch template query by
However it had the side effect of making Marshmallow ignore `created_by`
when loading the JSON in the post request. So the field on the model was
never being updated, just copied from the original template.
The quick way of getting things to work again is to revert this
optimisation.
There’s probably some Marshmallow magic we could use to avoid the extra
query and still have created_by not be ignored.
It does mean we’re introducing an extra query, but I’m not too fussed
about that since the caching seems to be working well.
Because the IDs of our callback and inbound SMS APIs were stored in
lists instead of directly on the serialised model they weren’t getting
cast to a string before trying to JSONify them. And JSON doesn’t know
what to do with a UUID object.
For some reason this was only affecting the endpoint for fetching
inbound SMS.
normally we check the app's status page to see if migrations need
running. However, if the _status endpoint doesn't respond with 200, we
don't necessarily want to abort the deploy - we may be trying to deploy
a code fix that fixes that status endpoint for example.
We don't know whether to run the migrations or not, so err on the side
of caution by re-running the migration. The migration itself might be
the fix that gets the app working after all.
had to do a little song and dance because sometimes the response won't
be populated before an exception is thrown
The API needs these to check whether a service can send a notification.
This commit also updates all the tests in `test_validators.py` to take
a serialised service, not a database object.
We changed auth.py to import from app.serialised_models here:
https://github.com/alphagov/notifications-api/pull/2887/files#diff-77cbb1e03185c7319f0311371c438b0cR11
`serialised_models.py` imports from `templates_dao.py`
`templates_dao.py` imports from `users_dao.py`
`users_dao.py` imports from `errors.py`
`errors.py` imports from `auth.py` … and the circle is complete 💥
For some reason this caused the Celery workers to crash on startup, but
not the app. Which I guess is why the integration tests didn’t catch
this?