Commit Graph

98 Commits

Author SHA1 Message Date
Kenneth Kehl
7529ae0153 more 2024-12-13 15:04:18 -08:00
Kenneth Kehl
6de1b226d0 cleanup 2024-12-13 10:15:47 -08:00
Kenneth Kehl
0d280d608a start writing message ids to the notifications table 2024-12-13 07:42:08 -08:00
Kenneth Kehl
921eefbb1e add expire times for redis objects 2024-12-04 08:04:34 -08:00
Kenneth Kehl
205a1da257 add provider tasks tests 2024-10-22 14:14:36 -07:00
Kenneth Kehl
3a04836fb2 add provider tasks tests 2024-10-22 14:05:44 -07:00
Kenneth Kehl
85219bf2d7 add provider tasks tests 2024-10-22 13:54:48 -07:00
Kenneth Kehl
01c811e04a add provider tasks tests 2024-10-22 13:37:26 -07:00
Kenneth Kehl
b07af91653 add provider tasks tests 2024-10-22 13:27:44 -07:00
Kenneth Kehl
697c8edf0e add provider tasks tests 2024-10-22 13:20:27 -07:00
Kenneth Kehl
4b09a2c863 add provider tasks tests 2024-10-22 13:07:54 -07:00
Kenneth Kehl
f2dec7e564 add provider tasks tests 2024-10-22 12:30:43 -07:00
Kenneth Kehl
749d1ac534 add provider tasks tests 2024-10-22 12:26:38 -07:00
Kenneth Kehl
571e91bd93 add provider tasks tests 2024-10-22 12:09:11 -07:00
Kenneth Kehl
ac03cde770 add provider tasks tests 2024-10-22 11:54:50 -07:00
Kenneth Kehl
e2b2cdcfbc fix tests 2024-03-06 10:55:03 -08:00
Cliff Hill
e9f9a3c6f1 Lots more cleanup.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-02-28 12:58:23 -05:00
Cliff Hill
43a8b6539f More fixes, removing literal "created" from code.
Signed-off-by: Cliff Hill <Clifford.hill@gsa.gov>
2024-02-28 12:58:22 -05:00
Kenneth Kehl
bd09c63ea9 notify-api-521 fix sms temporary failure message 2023-10-02 14:09:50 -07:00
Kenneth Kehl
1ecb747c6d reformat 2023-08-29 14:54:30 -07:00
Kenneth Kehl
f2f0e5a0f1 code review feedback 2023-08-15 14:50:41 -07:00
Kenneth Kehl
8f5f9f8f59 Merge branch 'main' of https://github.com/GSA/notifications-api into notify-233b 2023-05-26 13:13:13 -07:00
Kenneth Kehl
d8c3b0dfe4 fix more skips 2023-05-18 10:56:58 -07:00
Kenneth Kehl
3fb113a83e notify-152 sms delivery receipts 2023-05-04 07:56:24 -07:00
Kenneth Kehl
6e3d3f325d notify-233: delete notifications from notifications table after they are successfully sent 2023-04-18 12:42:23 -07:00
stvnrlly
e9fdfd59f4 clean flake8 except provider code 2022-10-19 16:16:26 +00:00
Christa Hartsock
af6495cd4c Get tests passing locally
When we cloned the repository and started making modifications, we
didn't initially keep tests in step. This commit tries to get us to a
clean test run by skipping tests that are failing and removing some
that we no longer expect to use (MMG, Firetext), with the intention that
we will come back in future and update or remove them as appropriate.

To find all skipped tests, search for
`@pytest.mark.skip(reason="Needs updating for TTS:`. Each one carries a brief
description of the work needed to get it passing, if known. Delete that
line to include the test in a standard test run (`make test`).
2022-07-07 15:41:15 -07:00
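The search described above can be scripted. This is a minimal sketch in Python, assuming the marker sits on a single line in `.py` files under a `tests/` directory; `find_tts_skips` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path

# Marker prefix quoted in the commit message; the free-text description follows it.
MARKER = '@pytest.mark.skip(reason="Needs updating for TTS:'


def find_tts_skips(root):
    """Return (path, line number, stripped line) for every TTS skip marker under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if MARKER in line:
                hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_tts_skips("tests")` from the repository root would list each skipped test with its file and line number, much like running `grep -rn` on the same string.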
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Katie Smith
6b8ebb3421 Fix linting errors 2021-02-16 09:03:38 +00:00
David McDonald
ac6837cde5 Downgrade exception to warning for provider API call
When we send an HTTP request to our SMS providers, there is a
chance we get a 5xx status code back from them. Currently we log this as
two different exception level logs.

If a provider has a funny few minutes, we could end up with
hundreds of exceptions thrown and PagerDuty waking someone up in the
middle of the night. These problems tend to fix themselves fairly quickly,
as we balance traffic from one SMS provider to the other
within 5 minutes.

By downgrading both exceptions to warnings in the case of a
`SmsClientResponseException`, we will reduce the chance of someone being woken up
in the middle of the night for no reason.

If the error is not a `SmsClientResponseException`, then we will still
log at the exception level as before as this is more unexpected and we
may want to be alerted sooner.

We do still want to know if, say, both SMS providers went down at the
same time for 1 hour. We don't want our tasks to just sit there,
retrying every 5 minutes for the whole time without us being aware (so
we can at least raise a statuspage update). Luckily we will still be
alerted, because our smoke tests will fail after 10 minutes and
raise a p1:
https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21
2021-01-18 17:00:21 +00:00
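The log-level split described in this commit can be sketched as below. `SmsClientResponseException` is named in the commit; here it is only a stand-in class, and `log_level_for`, `log_send_failure` and the logger name are hypothetical. The real task code may differ:

```python
import logging

logger = logging.getLogger("provider_tasks")  # hypothetical logger name


class SmsClientResponseException(Exception):
    """Stand-in for the provider-response exception named in the commit message."""


def log_level_for(exc):
    # Provider 5xx responses usually clear up once traffic moves to the other
    # provider, so they are logged as warnings rather than paging anyone.
    if isinstance(exc, SmsClientResponseException):
        return logging.WARNING
    # Anything else is unexpected: keep exception-level (ERROR) logging.
    return logging.ERROR


def log_send_failure(notification_id, exc):
    """Log a failed send at the chosen level, with the traceback attached."""
    level = log_level_for(exc)
    logger.log(level, "SMS notification delivery for id: %s failed",
               notification_id, exc_info=exc)
    return level
```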
David McDonald
1cfd19fb6a Reorder tests
So all tests related to sms sending is done first and then all email
sending task tests happen after
2021-01-18 16:46:12 +00:00
David McDonald
2480f91667 Raise better exception on InvalidParameterValue error
There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files
is if we allow an email address on our side that SES rejects.

However, there are other types of errors that could cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError` as that is not appropriate.

Therefore, we introduce a new exception,
`EmailClientNonRetryableException`, which represents any non-retryable
error back from an email client. We raise it whenever we get an
`InvalidParameterValue` error.

Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` as our code needs to catch this
exception and it shouldn't be aware of what email client is being used,
it just needs to know that it came from one of the email clients (if in
time we have more than one).

In time, we may wish to extend the approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions as this should be
the most extendable pattern and a good abstraction.
2020-12-30 17:18:16 +00:00
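A sketch of the translation this commit describes, assuming the SES error code has already been extracted (with boto3 it is available as `error.response["Error"]["Code"]` on a `botocore.exceptions.ClientError`). Only `EmailClientNonRetryableException` comes from the commit; `EmailClientException` and `translate_ses_error` are hypothetical names used for illustration:

```python
class EmailClientException(Exception):
    """Hypothetical base class for errors raised by any email client."""


class EmailClientNonRetryableException(EmailClientException):
    """The provider rejected the request; retrying the same payload will not help."""


def translate_ses_error(error_code, message):
    # Map an SES error code onto a client-agnostic exception, so calling code
    # never needs to know which email client produced it.
    if error_code == "InvalidParameterValue":
        # Covers a rejected address *and* cases like "Header too long: 'Subject'".
        return EmailClientNonRetryableException(message)
    return EmailClientException(message)
```

The caller then catches `EmailClientNonRetryableException` without importing anything SES-specific, which is the abstraction the commit argues for.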
David McDonald
36614e5492 Log warning for SES send rate throttling rather than exception
We have hit throttling limits from SES approximately once a week during
a spike of traffic from GOV.UK. The rate limiting usually only lasts a
couple of minutes, but it generates enough exceptions to trigger a p1,
with no action the responder can take.

Therefore we downgrade the exception in this case to a warning and assume
traffic will level back out such that the problem resolves itself.

Note, we will still get exceptions if we go over our daily sending limit
(as opposed to our per-second sending rate), since that does require
immediate action by someone responding.

If we were to continually go over our per-second sending rate for a long
continuous period of time, then there is a chance we may not be aware, but
given the risk of this happening is low, I think it's an acceptable risk
for the moment.
2020-08-13 17:51:09 +01:00
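A minimal sketch of the distinction, assuming SES reports both cases with the `Throttling` error code and tells them apart only by message text (the AWS SES documentation describes messages along the lines of "Maximum sending rate exceeded." and "Daily message quota exceeded."). `throttling_log_level` is a hypothetical helper:

```python
import logging


def throttling_log_level(error_code, message):
    """Pick a log level for an SES error (sketch; message strings are assumed)."""
    if error_code == "Throttling" and "Maximum sending rate exceeded" in message:
        # Per-second rate limit: usually clears within minutes on its own.
        return logging.WARNING
    # Daily quota exhaustion, and anything else, still needs a human.
    return logging.ERROR
```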
Leo Hemsted
3c63ccb159 move from dao_toggle_sms_provider to dao_reduce_sms_provider_priority 2019-11-28 13:29:02 +00:00
Katie Smith
09e8ac9644 Fix assertions when we catch an error in the tests
Code that is within a `with Python.raises(...)` context manager but
comes after the line that raises the exception doesn't get evaluated.
We had some assertions that we never being tested because of this, so
this ensures that they will always get run and fixes them where
necessary.
2019-09-18 11:04:24 +01:00
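The pitfall is ordinary Python control flow: once the exception leaves the call, the rest of the `with` block is skipped. The sketch below uses the standard library's `contextlib.suppress`, which swallows the exception the same way `pytest.raises` does, so it runs without pytest; the function names are illustrative only:

```python
from contextlib import suppress


def fail():
    raise ValueError("boom")


def broken_check():
    ran = []
    with suppress(ValueError):       # behaves like pytest.raises here
        fail()
        ran.append("assertion")      # never reached: fail() already raised
    return ran


def fixed_check():
    ran = []
    with suppress(ValueError):
        fail()
    ran.append("assertion")          # moved after the with block, so it always runs
    return ran

# With pytest the fix has the same shape: keep only the raising call inside
# `with pytest.raises(ValueError) as exc_info:` and assert on exc_info.value afterwards.
```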
Rebecca Law
ae1bc54f9e Update NotificationTechnicalFailureException
- Change the NotificationTechnicalFailureException so that it only inherits from Exception.
- The notify_celery task should create the logging message on failure.
- Fix unit tests
- Remove named parameter when raising exception.
2019-08-12 16:51:39 +01:00
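A sketch of the exception shape the bullets describe: a plain `Exception` subclass, raised with a positional message rather than a named parameter. The helper `mark_technical_failure` and its message text are hypothetical:

```python
class NotificationTechnicalFailureException(Exception):
    """Inherits only from Exception: no extra base classes, no custom __init__."""


def mark_technical_failure(notification_id):
    # Hypothetical helper: positional message only, no named parameter.
    raise NotificationTechnicalFailureException(
        f"Notification {notification_id} has been updated to technical-failure"
    )
```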
Katie Smith
355fb07eb2 Revert "Change email status to permanent-failure if SES raises InvalidParameterValue"
This reverts commit 51716fbaf8.

Instead of relying on catching SES errors, we will convert all email
addresses to punycode before sending.
2019-08-12 13:51:24 +01:00
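A minimal sketch of punycode conversion using Python's built-in `idna` codec (IDNA 2003). It converts only the domain part, on the assumption that the local part is left as-is; `punycode_email` is a hypothetical helper, not the code that was eventually written:

```python
def punycode_email(address):
    """Return the address with its domain converted to ASCII (punycode)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    # The "idna" codec works label by label: ASCII labels pass through
    # unchanged, non-ASCII labels become "xn--..." punycode labels.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}{sep}{ascii_domain}"
```

For example, `punycode_email("user@bücher.example")` gives `"user@xn--bcher-kva.example"`, which SES accepts as a plain ASCII domain.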
Katie Smith
51716fbaf8 Change email status to permanent-failure if SES raises InvalidParameterValue
If SES raised an `InvalidParameterValue` error (because an email address
was wrong) we were logging an exception and setting the email status to
`technical-failure`. We now set it to `permanent-failure` instead and
change the log level to `info` - setting it to `permanent-failure` means
that people will know not to retry the message.
2019-08-12 10:24:59 +01:00
Katie Smith
e449e234db Retry deliver_sms task immediately if sending fails
If the `deliver_sms` task catches an exception when trying to send an SMS, we
want the first retry to happen immediately (because we will have
switched providers), then every retry after that to happen at the
standard intervals.
2019-08-08 09:34:38 +01:00
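The retry schedule described here can be sketched as a function of how many retries have already happened. The 5-minute standard delay is borrowed from the May 2017 retry-policy commit further down this log and is an assumption for 2019; `retry_countdown` is hypothetical. In a bound Celery task, the result could be passed as `self.retry(countdown=retry_countdown(self.request.retries))`:

```python
STANDARD_RETRY_DELAY_SECONDS = 5 * 60  # assumed standard interval (5 minutes)


def retry_countdown(retries_so_far):
    """Seconds to wait before the next deliver_sms retry (sketch)."""
    # The first retry goes straight away: providers have already been
    # switched, so the immediate retry goes to the other provider.
    if retries_so_far == 0:
        return 0
    # Every later retry waits the standard interval.
    return STANDARD_RETRY_DELAY_SECONDS
```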
Katie Smith
ce24b13654 Combine two tests into one 2019-05-08 11:39:01 +01:00
Katie Smith
c809bf2207 Ensure notifications have billable units and provider if sending fails
If we try to send an SMS to the provider and the provider throws an exception
(because they return a 503 status code) the notification should retry. But if
we get the callback from the provider before the notification has been retried, the
notification will have no billable units or provider set.

To avoid this, we now set billable_units and provider even if there has been
an exception from our provider.
2019-05-08 10:54:01 +01:00
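The fix can be sketched with `try`/`finally`, so the fields are written whether or not the provider call raises. The `Notification` dataclass, its `sent_by` field, `ProviderError`, and `send_sms_to_provider` are simplified, hypothetical stand-ins for the real model and task:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Notification:
    id: str
    billable_units: int = 0
    sent_by: Optional[str] = None  # provider name


class ProviderError(Exception):
    """Stand-in for the exception raised on, e.g., a 503 from the provider."""


def send_sms_to_provider(notification: Notification, provider_name: str,
                         fragment_count: int,
                         send: Callable[[Notification], None]) -> None:
    try:
        send(notification)  # may raise; the caller's retry logic handles that
    finally:
        # Runs on success *and* on failure, so a delivery receipt that arrives
        # before the retry still finds billing details on the notification.
        notification.billable_units = fragment_count
        notification.sent_by = provider_name
```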
Rebecca Law
598539dcb3 Update logging for provider tasks.
Move the info message before the fetch.
Include the exception in the log message.
2018-03-26 15:24:21 +01:00
Rebecca Law
0dc50190b2 Throw an exception whenever we update a notification to technical failure.
If this is happening we want to know about it.
2018-03-16 17:18:44 +00:00
Leo Hemsted
d2154451e5 update research mode email callbacks to add process-ses-response task to queue
this involved:
* moving that task to callback_tasks to prevent circular imports
* updating the dummy research mode callbacks (with actual SNS messages from the
  ses simulator emails)
* refactoring tests
2017-11-17 13:41:45 +00:00
Martyn Inglis
4768f0b9fd Change retries policy.
Before, we had a long back-off; now we have more, but shorter, back-offs.

- PREVIOUS
When we had an error talking to a provider, we retried quickly, and if we still got errors we backed off more and more. Maximum attempts was 5, with a maximum delay of 4 hours. This was to give us time to ship a build if one was required.

- NOW
We now back off up to 48 times, 5 minutes each time. This gives us the same total backoff, but many more tries in that period.

- WHY
Having the long back off meant messages could be delayed 4 hours. This was happening more and more, as PaaS deploys can place things into the "inflight" state in SQS. The inflight state MUST have an expiry time LONGER than the maximum retry back off. This meant that messages would be delayed 4 hours, even when there was no app error.

By doing this, we can reduce this delay to 5 minutes, whilst still giving us time to fix issues.
2017-05-25 11:12:40 +01:00
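The arithmetic behind the new policy, as a small sketch using only the numbers in the commit message. In Celery these values would typically be set with the `max_retries` and `default_retry_delay` task options, but this sketch does not show the project's actual configuration:

```python
NEW_MAX_RETRIES = 48
NEW_RETRY_DELAY = 5 * 60        # seconds per back-off under the new policy
OLD_MAX_DELAY = 4 * 60 * 60     # seconds: longest single back-off under the old policy

# Total time spent backing off under the new policy: 48 x 5 min = 240 min = 4 h,
# the same window the old policy allowed for shipping a fix.
new_total_backoff = NEW_MAX_RETRIES * NEW_RETRY_DELAY

# The SQS "inflight" expiry only has to outlast the longest *single* back-off,
# so after a redeploy a message is held for about 5 minutes instead of 4 hours.
new_longest_wait = NEW_RETRY_DELAY
```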
Martyn Inglis
2591d3a1df This massive set of changes uses the new queue names object throughout the app and tests.
Lots of changes, all changing the line of code that puts things into queues, and the code that tests that.
2017-05-25 10:51:49 +01:00
Rebecca Law
c36d61c071 Update as per comments on review 2017-02-06 16:20:44 +00:00
Imdad Ahad
0a277b26b6 Switch providers ONLY on provider exception 2017-01-20 16:14:29 +00:00
Imdad Ahad
c221118669 Auto switch providers if exception is returned on sms delivery 2017-01-17 15:46:02 +00:00
Leo Hemsted
a2c3d265de remove unused former send_sms_to_provider and send_sms_to_email functions
they were superseded by deliver_sms and deliver_email in the same file 3 weeks ago
2016-10-13 15:53:01 +01:00