Commit Graph

3485 Commits

Author SHA1 Message Date
Pea Tyczynska
efdaadbdf4 Do not update notification to sending if the status is already final
This prevents a race condition when we get delivery receipt before
updating notification to sending, and so the sending status would
supersede the delivered status, and the notification would time out
as temporary-failure after three days.
2020-07-03 17:19:03 +01:00
David McDonald
7e4acda633 Merge pull request #2910 from alphagov/inbound-sms-metric
Add prometheus client metric for number of inbound text messages
2020-06-30 11:49:57 +01:00
David McDonald
1e253b2257 Add prometheus client metric for number of inbound text messages
As we gradually move from statsd to prometheus, we change the metric to
be a prometheus metric rather than statsd.

The change worth pointing out is that we have dropped the 'successful'
and 'failed' statuses from the metrics. I don't think it's useful to
have these statuses. It's very rare for an inbound message to fail when
we receive it and when it does, we raise an error and see it in our
logs. We aren't going to be looking at a graph of it as it's a rare
event, not typical behaviour that we want to monitor with a graph.
2020-06-30 10:13:55 +01:00
Chris Hill-Scott
5ef9ad2953 Fix sending one-off international text messages
This was broken because sometimes `service.permissions` is a list of
strings (for when we’re caching the service object) and sometimes it’s a
list of permission objects (when we’re dealing with ORM objects).

Because the validator code is shared, the least-messy way to fix it is
to make sure it can handle both types.

It can’t just take a list of permissions as argument, because it uses
other fields on the service.

It would be messy to rewrite the endpoint to use a serialised service
because the tests all expect to be dealing with database objects, so it
would be a faff to change what they’re mocking.
2020-06-29 14:43:33 +01:00
Chris Hill-Scott
40f09097c8 Merge pull request #2908 from alphagov/fix-template-history-created-by
Fix template history created by
2020-06-29 13:12:44 +01:00
David McDonald
12f460adc5 Turn off statsd wrapper for synchronous statsd calls during POSTs
This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint

This has been done with the purpose of not having the creation of text
messages or emails call out to StatsD during the request process. These
are the three current metrics that are currently called during the
processing of one of those requests and so have been removed from the
API.

The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have seen two incidents
recently affected by DNS resolution for StatsD (either by a slow down in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.

We are not going to miss the removal of these metrics.
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
  our logs if we want it
- the response times and response codes per flask endpoint are already
  exposed using the gds metrics python library

I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
  we should have already removed most calls to StatsD which should
  greatly reduce the chance of their being any further issues with
  DNS resolution
- Some of the other metrics still provide value so no point deleting
  them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
  users as they are async and also we do not currently have the
  infrastructure set up to replace them with a prometheus HTTP endpoint that
  can be scraped so this would require more work
2020-06-29 12:40:22 +01:00
Chris Hill-Scott
02e5ff61a4 Test that template history updates created by 2020-06-29 12:30:08 +01:00
Chris Hill-Scott
8def7d0d3b Fix serialisation of callbacks
Because the IDs of our callback and inbound SMS APIs were stored in
lists instead of directly on the serialised model they weren’t getting
cast to a string before trying to JSONify them. And JSON doesn’t know
what to do with a UUID object.

For some reason this was only affecting the endpoint for fetching
inbound SMS.
2020-06-26 16:31:49 +01:00
Chris Hill-Scott
9f41e77bf7 Add rate_limit and message_limit to SerialisedService
The API needs these to check whether a service can send a notification.
This commit also updates all the tests in `test_validators.py` to take
a serialised service, not a database object.
2020-06-26 14:10:25 +01:00
Chris Hill-Scott
3ffdb3093b Revert "Revert "Merge pull request #2887 from alphagov/cache-the-serialised-things""
This reverts commit 7e85e37e1d.
2020-06-26 14:10:12 +01:00
Chris Hill-Scott
59aba018bd Ensure rate limit is in serialised service
Once we start using the serialised service to power the `POST`
notifications endpoint it needs to include rate limit to do the rate
limit checks.
2020-06-26 13:46:32 +01:00
Chris Hill-Scott
7e85e37e1d Revert "Merge pull request #2887 from alphagov/cache-the-serialised-things"
This reverts commit b8c2c6b291, reversing
changes made to 351aca2c5a.
2020-06-26 13:42:44 +01:00
Chris Hill-Scott
b8c2c6b291 Merge pull request #2887 from alphagov/cache-the-serialised-things
Serialise and cache services and API keys
2020-06-26 09:18:45 +01:00
Rebecca Law
ce32e577b7 Remove the use of schedule_for in post_notifications.
Years ago we started to implement a way to schedule a notification. We hit a problem but we never came up with a good solution and the feature never made it back to the top of the priority list.

This PR removes the code for scheduled_for. There will be another PR to drop the scheduled_notifications table and remove the schedule_notifications service permission

Unfortunately, I don't think we can remove the `scheduled_for` attribute from the notification.serialized method because out clients might fail if something is missing. For now I have left it in but defaulted the value to None.
2020-06-24 14:54:40 +01:00
Chris Hill-Scott
d16d06fdef Cache serialised services in Redis
Same as we’re doing for templates.

This means avoiding a database call, even for services that don’t hit
our API so often.

They’ll still need to go to the database for the API keys, because we’re
not comfortable putting the API key secrets in Redis.

But once a service has got its keys from the database we commit the
transaction, so the connection can be freed up until we need it again to
insert the notification.
2020-06-24 08:52:12 +01:00
Chris Hill-Scott
6a9818b5fd Cache services and API keys in memory
Same as we’ve done for templates.

For high volume services this should mean avoiding calls to external
services, either the database or Redis.

TTL is set to 2 seconds, so that’s the maximum time it will take for
revoking an API key or renaming a service to propagate.

Some of the tests created services with the same service ID. This
caused intermittent failures because the cache relies on unique service
IDs (like we have in the real world) to key itself.
2020-06-24 08:46:13 +01:00
Chris Hill-Scott
320bca70f7 Serialise service, API keys and permissions
By serialising these straight away we can:
- not go back to the database later, potentially closing the connection
  sooner
- potentially cache the serialised data, meaning we don’t touch the
  database at all
2020-06-23 16:00:41 +01:00
Chris Hill-Scott
5ae3b0fc64 Ensure templates are cached with correct schema
For some reason our V1 get template response wraps the whole template in
a dictionary with one key, `'data'`:
0d99033889/app/template/rest.py (L166)

That means when the admin app caches the response it also caches it in
this format.

The API needs to do the same, otherwise it will be cacheing data with a
schema that the admin app isn’t expecting, and vice-versa.
2020-06-23 15:09:38 +01:00
Chris Hill-Scott
af1b021dbe Add test for when template is found in Redis
Ensures that we’re not calling the dao method when it is
2020-06-23 14:01:20 +01:00
Chris Hill-Scott
cb4b809131 Add extra assertion
To be crystal clear 💎
2020-06-23 13:48:55 +01:00
Chris Hill-Scott
996800f90d Cache serialised template in Redis 2020-06-23 10:03:27 +01:00
Chris Hill-Scott
87afc439fe Cache serialised template in memory 2020-06-23 10:03:26 +01:00
Chris Hill-Scott
14e7b6b87e Merge pull request #2880 from alphagov/serialise-less-stuff-templates
Exclude unneeded fields from all templates response
2020-06-23 09:07:30 +01:00
Chris Hill-Scott
718f8ccad0 Remove content and subject from get all templates
Content and subject are user-submitted so are effectively unbounded in
size. And we’re serialising them for every template when sending the
list of templates to the admin app.

For the service with the most templates this results in a 1.3Mb blob of
JSON going over the wire, and being cached in Redis.

And then the admin app completely ignores these fields, because it does
show template content until you’ve clicked into a single template.

This commit adds a new query parameter, `detailed`, that the admin app
can set to `False`. When it does only the fields needed to render the
`/templates` page are returned.

This is done with a new parameter so as not to break the V1 API.
Although I looked in Kibana and it doesn’t seem like anyone external is
using this endpoint we’ve come this far without breaking the API so…
2020-06-22 13:08:41 +01:00
Chris Hill-Scott
5a2f2a9ec2 Rename JSONModel to SerialisedModel 2/2
This class doesn’t actually wrap JSON, it wraps serialised data.

So this name feels better.
2020-06-22 10:20:53 +01:00
Chris Hill-Scott
e6b7e0e16c Rename JSONModel to SerialisedModel 1/2
This class doesn’t actually wrap JSON, it wraps serialised data.

So this name feels better.

This commit only renames the file for an easier diff.
2020-06-22 10:20:53 +01:00
Chris Hill-Scott
608812d314 Don’t store the underlying dict
This will give us smaller objects to cache, and forces us to be explicit
about which properties we’re using.
2020-06-22 10:20:52 +01:00
Chris Hill-Scott
ad2328fc05 Serialise template immediately after fetching
This commit changes the code in post notification endpoint to handle a
serialised template (ie a `dict`) rather than a database object.

This is the first step towards being able to cache the template and not
hit the database on every request.

There should be no functional changes here, it’s just refactoring.

There are some changes to the tests where the signature of functions
has changed.

Importing of the template schema has to be done at a function level,
otherwise Marshmallow gets weird.

This commit also copies the `JSONModel` class from the admin app, which
turns serialised data (a dict made from JSON) into an object on which
certain predefined properties are allowed.

This means we can still do the caching of serialised data, without
having to change too much of the code in the app, or make it ugly by
sprinkling dict lookups everywhere.

We’re not copying all of JSONModel from the admin app, just the bits we
need. We don’t need to compare or hash these objects, they’re just used
for lookups. And redefining `__getattribute__` scares Leo.
2020-06-22 10:20:51 +01:00
Pea Tyczynska
ef9f3c1e5f Make sure we check if a service can send to number harmoniously
We were checking this separately in two places in the code. Now
we will have this logic in one place, in validators.

Also pull in utils version that recognises crown depenency numbers
as international.
2020-06-19 15:59:15 +01:00
Katie Smith
ab956c9eb3 Update JSON schema postage validation for new values 2020-06-19 15:36:10 +01:00
Katie Smith
bdde035a0c Drop postage constraints
The constaints on notifications and notification_history have already
been dropped in production, but still exist in staging and in dev
environments.

The constraints on templates and templates_history exist in all
environments.
2020-06-19 15:36:10 +01:00
Katie Smith
98a69684c5 Update get_template_by_id_response & post_template_preview_response schemas
To check the format of postage. Neither of these two schemas are used
for validating - they seem to be added for reference.
2020-06-19 08:59:27 +01:00
Katie Smith
72be10c681 Add JSON schema for updating template
We did not have a JSON schema for updating a template. Since we will
remove the postage constraint from the templates table, this adds a JSON
schema for updating a template so that we can use it to check that the
postage is one of the allowed values.
2020-06-19 08:59:27 +01:00
Katie Smith
15112b2148 Update post_create_template_schema
Updated the `post_create_template_schema` to check that the postage is
one of our allowed values.
2020-06-19 08:59:27 +01:00
Rebecca Law
be7afdd12b In the effort to reduce the number of database connections I introduced a small bug. This only affected the test templated letter flow, a None type error would happen when trying to creathe completed_at timestamp for a delivered message.
In the previous PR I removed the `update_notification` method to reduce the need for another update query. However, that meant the notification was marked as delivered without an updated_at timestamp.

It is weird to set the updated_at when we create the notification. So is this a better fix? Or do I put the update back now?

I recommend we push this fix now.
2020-06-18 08:30:19 +01:00
Rebecca Law
77ce058732 Merge pull request #2882 from alphagov/reduce-db-connection
Reduce the use of db connections
2020-06-17 14:22:58 +01:00
Rebecca Law
a8661fbd4b Add None test back 2020-06-17 13:48:48 +01:00
Rebecca Law
e22f61977f Remove print stmts 2020-06-17 12:13:49 +01:00
Rebecca Law
a5ed8f2079 Update the post letter flow - not able to get reduce the dB transactions used in the letter flow. Prioritising the reduction for the SMS/Email flow.
Only update the daily limit cache if the service is in trial mode.
2020-06-17 12:11:28 +01:00
Chris Hill-Scott
b3c69087d8 Serialise less stuff from the service object
By default Marshallow includes unknown properties. This means every time
a new property is added to the service model it gets included in the
JSON-serialised response sent to the admin app.

This is particuarly bad because it means that for returned letters the
ID of every returned letter. So the JSON stored in Redis for the
Check Your State Pension service is 86kb.

Similarly the JSON stored in Redis for a big user of inbound text
messaging is 458kb(!!!) because it has the ID of every received text
message. That’s ~8,500 UUIDs.

Luckily the admin app tells us exactly which keys it’s using here:
5952d9c26d/app/models/service.py (L31-L52)

```python
- `active`
- `contact_link`
- `email_branding`
- `email_from`
- `id`
- `inbound_api`
- `letter_branding`
- `letter_contact_block`
- `message_limit`
- `name`
- `prefix_sms`
- `research_mode`
- `service_callback_api`
- `volume_email`
- `volume_sms`
- `volume_letter`
- `consent_to_research`
- `count_as_live`
- `go_live_user`
- `go_live_at`
}
```

Plus these which it does not get automatically:
- `email_branding`
- `letter_branding`
- `organisation`
- `organisation_type`
- `permissions`
- `restricted`

The API is returning all of these:
- `active`
- `all_template_folders`
- `annual_billing`
- `consent_to_research`
- `contact_link`
- `contact_list`
- `count_as_live`
- `created_by`
- `crown`
- `email_branding`
- `email_from`
- `go_live_at`
- `go_live_user`
- `id`
- `inbound_api`
- `inbound_number`
- `inbound_sms`
- `letter_branding`
- `letter_contact_block`
- `letter_logo_filename`
- `message_limit`
- `name`
- `organisation`
- `organisation_type`
- `permissions`
- `prefix_sms`
- `rate_limit`
- `research_mode`
- `restricted`
- `returned_letters`
- `service_callback_api`
- `users`
- `version`
- `volume_email`
- `volume_letter`
- `volume_sms`
- `whitelist`

So the ones that the admin is getting but not expecting are:
- `all_template_folders`
- `annual_billing`
- `contact_list`
- `created_by`
- `crown`
- `inbound_number`
- `inbound_sms`
- `letter_logo_filename`
- `rate_limit`
- `returned_letters`
- `users`
- `version`
- `whitelist`

Which is what this PR adds to the exclude list, except for `created_by`
which is keeps because it’s needed to validate the JSON provided when
creating a service.
2020-06-16 16:47:56 +01:00
Rebecca Law
21a1b8e8bd Remove the call to the db after the notification has committed.
After the commit we issue two calls to the db to get service and get notification. This is because after the commit the ORM wants to ensure that the data model objects are the latest.

So far this is just a proof of concept, but the letter flow needs to be updated and we should be able to get rid of research mode. And it needs some tidy up.
2020-06-16 14:33:53 +01:00
David McDonald
8b4a424df1 Tidy up 2020-06-12 16:51:44 +01:00
Leo Hemsted
cd9b80f415 set test_errors app fixture to session scope
we have one global metrics variable `metrics = GDSMetrics()`, and we
then call `metrics.init_app` from within the flask application set up.
The v2/test_errors.py app_for_test fixture calls create_app, would call
metrics.init_app multiple times for the same metrics instance. This
causes errors, so change the fixture to session level so it only calls
once per test run.
2020-06-12 14:52:22 +01:00
Pea Tyczynska
b81c7dd5ee Put status codes in logs to see if lack of detailed status is
us not recognising a code or provider not having sent the detailed
status.

It seems like Firetext is sometimes sending us permanent-failure
without detailed status. It could be due to:
- them really not sending any detailed status
- them sending a status code we don't recognise
- them sending 000 code that means 'no errors', which we ignore

To see which one it is, and to debug such issues quicker in the
future, this PR adds status and detailed status codes to the logs.
2020-06-02 13:26:41 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Pea Tyczynska
a4b942cf6c Log detailed sms delivery status for mmg from process_sms_client_response task.
Also log detailed delivery status for firetext in the same place in addition
to it being logged from notifications_dao.

Logging detailed delivery statuses will help us see why messages
fail to deliver. In the future we could persist detailed delivery
status in the database.
2020-06-01 12:44:49 +01:00
Katie Smith
64cd8f39c2 Add the date to the service name and email_reply_to when archiving
This copies what we do to a user's email address when archiving the user
by prefixing it with `_archived_{date}`. We already prefixed the
service name and email_reply_to with `_archived`, but this didn't allow
a service with the same name to be archived more than once.
2020-05-22 09:37:45 +01:00
Katie Smith
0b28766442 Reverts the new postage constraints
Reverts https://github.com/alphagov/notifications-api/pull/2843 and https://github.com/alphagov/notifications-api/pull/2848
2020-05-20 18:31:25 +01:00
Katie Smith
4116affe7f Merge pull request #2843 from alphagov/update-postage-constraint-take-2
Update postage constraint (take 2)
2020-05-20 14:41:44 +01:00
Chris Hill-Scott
95a779c649 Merge pull request #2841 from alphagov/jobs-by-contact-list
Allow jobs to be grouped by contact list
2020-05-20 11:49:25 +01:00