Commit Graph

87 Commits

Author SHA1 Message Date
Kenneth Kehl
6d84ec64e5 notify-api-522 2023-10-24 11:35:00 -07:00
Kenneth Kehl
2487aeb657 remove debugging 2023-10-02 14:13:25 -07:00
Kenneth Kehl
bd09c63ea9 notify-api-521 fix sms temporary failure message 2023-10-02 14:09:50 -07:00
Kenneth Kehl
8af7a5552f notify-api-520 persist the provider response even for successful sms messages 2023-09-29 13:39:10 -07:00
Kenneth Kehl
bebce829af instructions for bulk testing and change delivery receipt delay to 2 minutes 2023-09-28 14:27:16 -07:00
Kenneth Kehl
62f83ffe1e revamp how authentication code is displayed as per Steven 2023-08-30 09:29:08 -07:00
Kenneth Kehl
4df9fa934d Tweak signin so you don't need a phone to login in dev mode 2023-08-30 07:41:04 -07:00
Kenneth Kehl
1ecb747c6d reformat 2023-08-29 14:54:30 -07:00
Kenneth Kehl
29a280ced4 notify-api-390 2023-08-29 13:12:18 -07:00
Kenneth Kehl
17e9fc1e8f notify-api-317 fix the scrubbing of pii for successful notifications 2023-06-27 10:48:14 -07:00
Kenneth Kehl
e4a98dfcc2 cleanup, get rid of print statements, etc. 2023-06-15 10:45:03 -07:00
Kenneth Kehl
8db16f9410 fix tests 2023-06-15 08:23:00 -07:00
Kenneth Kehl
008c3a8d68 initial 2023-06-13 12:57:51 -07:00
Kenneth Kehl
633532b639 code review feedback 2023-05-26 13:47:05 -07:00
Kenneth Kehl
8f5f9f8f59 Merge branch 'main' of https://github.com/GSA/notifications-api into notify-233b 2023-05-26 13:13:13 -07:00
Kenneth Kehl
1e72f97b17 code review feedback 2023-05-09 08:45:51 -07:00
Kenneth Kehl
b59e4df06d code review feedback 2023-05-05 08:09:15 -07:00
Kenneth Kehl
3fb113a83e notify-152 sms delivery receipts 2023-05-04 07:56:24 -07:00
Kenneth Kehl
6e3d3f325d notify-233: delete notifications from notifications table after they are successfully sent 2023-04-18 12:42:23 -07:00
Ben Thorner
3988a6cd07 Include exception info in SMS warning log
This makes it easier to debug failures when adding a new provider.
2022-03-30 13:36:56 +01:00
Ben Thorner
e3e067c795 Remove redundant @statsd timing decorators
These are superseded by timing task execution generically in the
NotifyTask superclass [1]. Note that we need to wait until we've
gathered enough data under the new metrics before removing these.

[1]: https://github.com/alphagov/notifications-api/pull/3201#pullrequestreview-633549376
2021-04-12 15:19:18 +01:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
David McDonald
ac6837cde5 Downgrade exception to warning for provider API call
When we send an HTTP request to our SMS providers, there is a
chance we get a 5xx status code back from them. Currently we log this as
two different exception level logs.

If a provider has a funny few minutes, we could end up with
hundreds of exceptions thrown and pagerduty waking someone up in the
middle of the night. These problems tend to pretty quickly fix
themselves as we balance traffic from one SMS to the other SMS provider
within 5 minutes.

By downgrading both exceptions to warning in the case of a
`SmsClientResponseException`, we will reduce the change of waking us up
in the middle of the night for no reason.

If the error is not a `SmsClientResponseException`, then we will still
log at the exception level as before as this is more unexpected and we
may want to be alerted sooner.

What we still want to happen though is that let's say both SMS providers
went down at the same time for 1 hour. We don't want our tasks to just
sit there, retrying every 5 minutes for the whole time without us being
aware (so we can at least raise a statuspage update). Luckily we will
still be alerted because our smoke tests will fail after 10 minutes and
raise a p1:
https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21
2021-01-18 17:00:21 +00:00
David McDonald
977554781f Add better logging message for tech failure
So we can easily identify which notification ID failed
2020-12-30 17:28:21 +00:00
David McDonald
2480f91667 Raise better exception on InvalidParameterValue error
There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files
is if we allow an email address on our side that SES rejects.

However, there are other types of errors that could cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError` as that is not appropriate.

Therefore, we introduce a new exception
`EmailClientNonRetryableException`, that represents any exception back
from an email client that we can use whenever we get a
`InvalidParameterValue` error.

Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` as our code needs to catch this
exception and it shouldn't be aware of what email client is being used,
it just needs to know that it came from one of the email clients (if in
time we have more than one).

In time, we may wish to extend the approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions as this should be
the most extendable pattern and a good abstraction.
2020-12-30 17:18:16 +00:00
David McDonald
36614e5492 Log warning for SES send rate throttling rather than exception
We have hit throttling limits from SES approximately once a week during
a spike of traffic from GOV.UK. The rate limiting usually only lasts a
couple of minutes but generates enough exceptions to cause a p1 but with
no potential action for the responder.

Therefore we downgrade the warning for this case to a warning and assume
traffic will level back out such that the problem resolves itself.

Note, we will still get exceptions if we go over our daily limit, rather
than our per minute sending limit, which does require immediate action
by someone responding.

If we were to continually go over our per second sending rate for a long
continous period of time, then there is a chance we may not be aware but
given the risk of this happening is low I think it's an acceptable risk
for the moment.
2020-08-13 17:51:09 +01:00
Katie Smith
355fb07eb2 Revert "Change email status to permanent-failure if SES raises InvalidParameterValue"
This reverts commit 51716fbaf8.

Instead of relying on catching SES errors we will convert all emails to
punycode before sending instead.
2019-08-12 13:51:24 +01:00
Katie Smith
51716fbaf8 Change email status to permanent-failure if SES raises InvalidParameterValue
If SES raised an `InvalidParameterValue` error (because an email address
was wrong) we were logging an exception and setting the email status to
`technical-failure`. We now set it to `permanent-failure` instead and
change the log level to `info` - setting it to `permanent-failure` means
that people will know not to retry the message.
2019-08-12 10:24:59 +01:00
Katie Smith
e449e234db Retry deliver_sms task immediately if sending fails
If the `deliver_sms` catches an exception when trying to send an SMS, we
want the first retry to happen immediately (because we will have
switched providers), then every retry after that to happen at the
standard intervals.
2019-08-08 09:34:38 +01:00
Leo Hemsted
267c4fc07b bump requirements, fix pyflake8 things, unpin botocore/awscli 2018-11-07 13:39:08 +00:00
Leo Hemsted
6e87b36303 remove duplication shutdown loggers
also add **kwargs to make it celery4 compatible
2018-07-20 12:09:00 +01:00
Rebecca Law
c75458cee9 Revert change to exception log. 2018-03-26 16:44:29 +01:00
Rebecca Law
598539dcb3 Update logging for provider tasks.
Move the info message before the fetch.
Include the exception in the log message.
2018-03-26 15:24:21 +01:00
Rebecca Law
28e78780d0 Added more logging for provider tasks. 2018-03-26 09:31:52 +01:00
Rebecca Law
cd2d85f2a3 Updates after code review.
- Remove print
- Update exception message.
2018-03-19 14:08:38 +00:00
Rebecca Law
0dc50190b2 Throw an exception whenever we updated a notification to technical failure.
If this is happening we want to know about it.
2018-03-16 17:18:44 +00:00
Richard Chapman
d855b4e4ec Removed statsd from the api and use the statsd in the utils library.
The statsd code was added to the utils library a while ago, uses the
statsd from the util library and therefore consolidates the code into
once place.
2018-02-06 09:52:15 +00:00
Leo Hemsted
28d5f9b87f flake8 - remove unused imports and ensure they're always at the top of the file 2017-11-28 14:28:01 +00:00
Richard Chapman
cc4d022213 Adding extra logging to celery tasks ans gunicorn, specifically log on SIGTERM and SIGINIT so that we can track better when an app restarts and why it restarts e.g. when it restarts after another signal. 2017-10-12 11:39:21 +01:00
Leo Hemsted
6c61a3fc2a Revert celery4
Revert the following three pull requests:
https://github.com/alphagov/notifications-api/pull/1085
https://github.com/alphagov/notifications-api/pull/1086
https://github.com/alphagov/notifications-api/pull/1088

celery 4.0.2 looked promising, however, on staging under mild load
(5/sec api calls) the performance was actually worse than 3.1.25
2017-07-19 15:17:19 +01:00
Martyn Inglis
786adb5d71 Move Queuenames in with the celery code, revamp config to allow move to celery 4.x 2017-07-12 12:01:52 +01:00
Martyn Inglis
4768f0b9fd Change retries policy.
Before we had a long back off, now we have more, but shorter backoffs.

- PREVIOUS
When we had an error talking to a provider we retried quickly and if we still got errors we backed off more and more. Maximum attempts was 5, max delay 4hours. This was to allow us time to ship a build if that was required.

- NOW
Backing off 48 times of 5 minutes each. This gives us the same total backoff, but many more tries in that period.

- WHY
Having the long back off meant messages could be delayed 4 hours. This was happening more and more, as PaaS deploys can place things into the "inflight" state in SQS. The inflight state MUST have an expiry time LONGER than the maximum retry back off. This meant that messages would be delayed 4 hours, even when there was no app error.

By doing this we can reduce this delay to 5 minutes. Whilst still giving us time to fix issues.
2017-05-25 11:12:40 +01:00
Martyn Inglis
2591d3a1df This massive set of changes uses the new queue names object throughout the app and tests.
Lots of changes, all changing the line of code that puts things into queues, and the code that tests that.
2017-05-25 10:51:49 +01:00
Imdad Ahad
b5d4acb758 Make message more accurate 2017-04-27 16:58:00 +01:00
Leo Hemsted
0136e1e32d fix invalid logging
the first argument to ANY logger.____ function is ALWAYS cast to a
string and used as a format argument for ALL remaining arguments
using %s formatting. even `logger.exception`, which just logs as
normal and then appends the stack trace.

so we shouldn't be passing `e` into logger.exception - just
`logger.exception('something went wrong!')`

also de-duplicated a test
2016-12-19 17:13:10 +00:00
Leo Hemsted
a2c3d265de remove unused former send_sms_to_provider and send_sms_to_email functions
they were superceded by deliver_sms and deliver_email in the same file 3 wks ago
2016-10-13 15:53:01 +01:00
Leo Hemsted
a095aa41f3 don't retry task if InvalidEmailError
just record it as a technical error - retrying wont fix a bad email
2016-10-13 15:27:47 +01:00
Martyn Inglis
376f8355cb Updated clients to have a more robust error handling
- fire text and omg much more similar. Ready to be combined.
- Error handling now for JSON valid responses
2016-09-22 17:18:05 +01:00
Martyn Inglis
59ab5da5d3 Handle errors where the notification isn't found when executing the tasks
- Thows a NoResultFound sqlalchemy exception
- Which causes a retry. This means we give it a few goes (5, max 5 hours)  for the notification to appear.
- Should never happen, only if we get some task overlaps that are unusual that leads to tasks executed in an overlapping nature.
2016-09-22 09:52:23 +01:00
Martyn Inglis
2f36e0dbcf Refactor to make the new send_to_provider methods take a notification not a notification ID.
- Driven by the fact we won't know the type in the API call
- hence we need to load notification earlier , so pass it not the id through to the send task to avoid loading it twice.
2016-09-22 09:16:58 +01:00