This was broken because sometimes `service.permissions` is a list of
strings (when we’re dealing with a cached, serialised service object)
and sometimes it’s a list of permission objects (when we’re dealing
with ORM objects).
Because the validator code is shared, the least-messy way to fix it is
to make sure it can handle both types.
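One way to do that is to normalise the permissions at the top of the
validator, roughly like this sketch (the `.permission` attribute on the
ORM object is an assumption):

```python
def _permission_names(service):
    # accept both a cached service (list of strings) and an ORM
    # service (list of permission objects)
    return [
        p if isinstance(p, str) else p.permission
        for p in service.permissions
    ]
```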
It can’t just take a list of permissions as argument, because it uses
other fields on the service.
It would be messy to rewrite the endpoint to use a serialised service
because the tests all expect to be dealing with database objects, so it
would be a faff to change what they’re mocking.
This commit turns off StatsD metrics for the following:
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per Flask endpoint
This has been done so that creating a text message or email no longer
calls out to StatsD during the request process. These are the three
metrics currently emitted while processing one of those requests, and
so they have been removed from the API.
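For context, the removed calls were roughly this shape, sketched here
with the generic `statsd` client (the real client wrapper and metric
names differ):

```python
from statsd import StatsClient

statsd_client = StatsClient(host="statsd.example.internal", port=8125)

statsd_client.incr("dao.dao_create_notification")  # per-DAO-call counter
statsd_client.incr("user-agent.python-requests")   # user-agent counter
statsd_client.timing("post-notification.200", 42)  # per-endpoint response time (ms)
```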
The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have recently seen two
incidents affected by DNS resolution for StatsD (either a slowdown in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.
We are not going to miss these metrics:
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
our logs if we want it
- the response times and response codes per Flask endpoint are already
exposed using the GDS metrics Python library
I did not remove the StatsD metrics from any other parts of the API
because:
- As the POST notification endpoints are the main source of web traffic,
we should have already removed most calls to StatsD, which should
greatly reduce the chance of there being any further issues with
DNS resolution
- Some of the other metrics still provide value, so there's no point in
deleting them if we don't need to
- The metrics on Celery tasks will not affect any HTTP requests from
users as they are async. We also do not currently have the
infrastructure set up to replace them with a Prometheus HTTP endpoint
that can be scraped, so this would require more work
This reverts commit 58a9862cd1.
That commit tried to optimise the fetch template query.
However it had the side effect of making Marshmallow ignore `created_by`
when loading the JSON in the POST request. So the field on the model was
never being updated, just copied from the original template.
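For context, this is the general Marshmallow behaviour in play (whether
the optimisation used `dump_only` exactly like this is an assumption):
dump-only fields are silently dropped on load when unknown fields are
excluded.

```python
from marshmallow import EXCLUDE, Schema, fields

class TemplateSchema(Schema):
    class Meta:
        unknown = EXCLUDE

    name = fields.String()
    created_by = fields.String(dump_only=True)  # only used when dumping

print(TemplateSchema().load({"name": "reminder", "created_by": "some-user-id"}))
# {'name': 'reminder'} <- created_by has been silently ignored
```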
The quick way of getting things to work again is to revert this
optimisation.
There’s probably some Marshmallow magic we could use to avoid the extra
query and stop `created_by` from being ignored.
It does mean we’re introducing an extra query, but I’m not too fussed
about that since the caching seems to be working well.
Because the IDs of our callback and inbound SMS APIs were stored in
lists instead of directly on the serialised model, they weren’t getting
cast to a string before trying to JSONify them. And JSON doesn’t know
what to do with a UUID object.
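A sketch of the failure mode (field names invented): the top-level IDs
had been cast, but the ones inside lists had not.

```python
import json
import uuid

serialised = {
    "id": str(uuid.uuid4()),        # top-level IDs were already cast to strings
    "inbound_api": [uuid.uuid4()],  # IDs inside lists were still UUID objects
}

try:
    json.dumps(serialised)
except TypeError as e:
    print(e)  # Object of type UUID is not JSON serializable

# the fix: cast the IDs inside the lists too
serialised["inbound_api"] = [str(i) for i in serialised["inbound_api"]]
print(json.dumps(serialised))
```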
For some reason this was only affecting the endpoint for fetching
inbound SMS.
The API needs these to check whether a service can send a notification.
This commit also updates all the tests in `test_validators.py` to take
a serialised service, not a database object.
We changed auth.py to import from app.serialised_models here:
https://github.com/alphagov/notifications-api/pull/2887/files#diff-77cbb1e03185c7319f0311371c438b0cR11
`serialised_models.py` imports from `templates_dao.py`
`templates_dao.py` imports from `users_dao.py`
`users_dao.py` imports from `errors.py`
`errors.py` imports from `auth.py` … and the circle is complete 💥
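A minimal reproduction of that failure mode, with invented module
bodies (the real chain goes through the intermediate modules listed
above):

```python
# auth.py
from errors import InvalidRequest  # stands in for the whole import chain

class AuthError(Exception):
    pass

# errors.py
from auth import AuthError
# auth.py is still mid-import when this line runs, so Python raises:
# ImportError: cannot import name 'AuthError' from partially initialized
# module 'auth' (most likely due to a circular import)
```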
For some reason this caused the Celery workers to crash on startup, but
not the app. Which I guess is why the integration tests didn’t catch
this?
Years ago we started to implement a way to schedule a notification. We hit a problem but we never came up with a good solution and the feature never made it back to the top of the priority list.
This PR removes the code for `scheduled_for`. There will be another PR to drop the `scheduled_notifications` table and remove the `schedule_notifications` service permission.
Unfortunately, I don't think we can remove the `scheduled_for` attribute from the `notification.serialized` method because our clients might fail if something is missing. For now I have left it in but defaulted the value to None.
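So the serialised payload keeps the key but it is always null now. A
sketch (the real method returns many more fields):

```python
def serialized(self):
    return {
        "id": str(self.id),
        # the scheduling feature is gone, but keep the key so existing
        # clients that expect it don't break
        "scheduled_for": None,
    }
```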
Same as we’re doing for templates.
This means avoiding a database call, even for services that don’t hit
our API so often.
They’ll still need to go to the database for the API keys, because we’re
not comfortable putting the API key secrets in Redis.
But once a service has got its keys from the database we commit the
transaction, so the connection can be freed up until we need it again to
insert the notification.
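Roughly like this sketch, assuming Flask-SQLAlchemy’s `db.session`
(model and helper names invented):

```python
from app import db
from app.models import ApiKey

def get_api_keys_for_service(service_id):
    keys = ApiKey.query.filter_by(service_id=service_id).all()
    # we're done reading, so commit straight away to release the
    # connection back to the pool until we insert the notification
    db.session.commit()
    return keys
```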
Same as we’ve done for templates.
For high volume services this should mean avoiding calls to external
services, either the database or Redis.
TTL is set to 2 seconds, so that’s the maximum time it will take for
revoking an API key or renaming a service to propagate.
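The idea, sketched with `cachetools` (the real implementation and the
fallback function are assumptions):

```python
from cachetools import TTLCache, cached

# entries live for at most 2 seconds, so a revoked key or renamed
# service takes effect within that window
@cached(cache=TTLCache(maxsize=1024, ttl=2))
def get_serialised_service(service_id):
    return fetch_serialised_service(service_id)  # falls back to Redis/the database
```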
Some of the tests created services with the same service ID. This
caused intermittent failures because the cache relies on unique service
IDs (like we have in the real world) to key itself.
We think that holding open database transactions while we go and do
something else is causing us to have poor performance.
Because we’re now serialising everything as soon as we pull it out of
the database, we can guarantee that we don’t need to go back to the
database again.
So let’s see if explicitly closing the transaction helps with
performance.
By serialising these straight away we can:
- not go back to the database later, potentially closing the connection
sooner
- potentially cache the serialised data, meaning we don’t touch the
database at all
For some reason our V1 get template response wraps the whole template in
a dictionary with one key, `'data'`:
0d99033889/app/template/rest.py (L166)
That means when the admin app caches the response it also caches it in
this format.
The API needs to do the same, otherwise it will be caching data with a
schema that the admin app isn’t expecting, and vice versa.
We need to serialise the template to JSON to store it in Redis. Python’s
built-in JSON serialiser doesn’t know what to do with a UUID object, so
we need to manually cast it to a string instead.
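Putting those two points together, what goes into Redis looks roughly
like this (names invented):

```python
import json
import uuid

template_id = uuid.uuid4()

# wrap in the V1 {'data': ...} shape, and cast the UUID by hand because
# json.dumps raises a TypeError on UUID objects
cache_value = json.dumps({"data": {"id": str(template_id), "name": "reminder"}})
```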