This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint
This has been done with the purpose of not having the creation of text
messages or emails call out to StatsD during the request process. These
are the three current metrics that are currently called during the
processing of one of those requests and so have been removed from the
API.
The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have seen two incidents
recently affected by DNS resolution for StatsD (either by a slow down in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.
We are not going to miss the removal of these metrics.
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
our logs if we want it
- the response times and response codes per flask endpoint are already
exposed using the gds metrics python library
I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
we should have already removed most calls to StatsD which should
greatly reduce the chance of their being any further issues with
DNS resolution
- Some of the other metrics still provide value so no point deleting
them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
users as they are async and also we do not currently have the
infrastructure set up to replace them with a prometheus HTTP endpoint that
can be scraped so this would require more work
Same as we’ve done for templates.
For high volume services this should mean avoiding calls to external
services, either the database or Redis.
TTL is set to 2 seconds, so that’s the maximum time it will take for
revoking an API key or renaming a service to propagate.
Some of the tests created services with the same service ID. This
caused intermittent failures because the cache relies on unique service
IDs (like we have in the real world) to key itself.
We were checking this separately in two places in the code. Now
we will have this logic in one place, in validators.
Also pull in utils version that recognises crown depenency numbers
as international.
see https://github.com/alphagov/notifications-utils/pull/752 for
implementation details. Cache DNS for 15 seconds. Note: This cache is
per eventlet, so concurrent requests will handle their requests
separately, but the cache does persist between sequential requests.
re-enable statsd for all environments.
https://github.com/alphagov/notifications-utils/pull/742
We had been seeing phone numbers making it through in jobs containing
this no-break space which were then causing errors as we could not
process the job as it thought it was an invalid number. This should
solve the problem and strip that special character.
`allow_international_letters` is a new, required argument, so the tests
that make some `Row` objects need to provide that.
There are now 8 possible address columns (7 plus postcode) so the tests
need to expect that. But this won’t have any user-facing impact.
We’ve added some new properties to the templates in utils that we can
use instead of doing weird things like
`WithSubjectTemplate.__str__(another_instance)`
We don’t need to reformat the postcode here once template preview takes
care of it when rendering the PDF.
It’s better (and less code) to store what people give us, so we give
them back the same thing.
- update check_sms_content_char_count to use the SMSTemplate.is_message_too_long function, and updated the error message to align with the message returned by the admin app.
- Update the the code used by version 1 of the api to use the validate_template method.
- I did find a couple of services still using the old api, however, this change should not affect them as I checked the messages being sent and they are not too long.
- We will be sending a message to them to see if they can upgrade.
- Update the log message for authenication to include the URL - makes it easier to track if a service is using version 1 of the api.
This fixes the test in the previous commit and means we will catch other
unexpected jwt errors which are now raised as `TokenError`s and raise an
AuthError based on this.
This will stop us serving 5xx to users when we don't catch an exception.
Also runs make freeze-requirements
Update cffi from 1.13.1 to 1.13.2
Update jsonschema from 3.1.1 to 3.2.0
Update marshmallow-sqlalchemy from 0.19.0 to 0.21.0
Update marshmallow from 2.20.2 to 3.4.0
Update sqlalchemy from 1.3.10 to 1.3.13
Update notifications-python-client from 5.4.1 to 5.5.1
We aren't aware of any reason we need to use our fork of boto anymore.
We therefore swap to use `celery[sqs]` which brings in the original
version of boto and we can remove our use of the fork.
This has been tested by running the celery app and seeing it connect to
sqs and grab messages off a queue.