Commit Graph

73 Commits

Author SHA1 Message Date
stvnrlly
e9fdfd59f4 clean flake8 except provider code 2022-10-19 16:16:26 +00:00
Christa Hartsock
af6495cd4c Get tests passing locally
When we cloned the repository and started making modifications, we
didn't initially keep tests in step. This commit tries to get us to a
clean test run by skipping tests that are failing and removing some
that we no longer expect to use (MMG, Firetext), with the intention that
we will come back in future and update or remove them as appropriate.

To find all tests skipped, search for `@pytest.mark.skip(reason="Needs
updating for TTS:`. There will be a brief description of the work that
needs to be done to get them passing, if known. Delete that line to make
them run in a standard test run (`make test`).
2022-07-07 15:41:15 -07:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Katie Smith
6b8ebb3421 Fix linting errors 2021-02-16 09:03:38 +00:00
David McDonald
ac6837cde5 Downgrade exception to warning for provider API call
When we send an HTTP request to our SMS providers, there is a
chance we get a 5xx status code back from them. Currently we log this as
two different exception level logs.

If a provider has a funny few minutes, we could end up with
hundreds of exceptions thrown and pagerduty waking someone up in the
middle of the night. These problems tend to pretty quickly fix
themselves as we balance traffic from one SMS to the other SMS provider
within 5 minutes.

By downgrading both exceptions to warning in the case of a
`SmsClientResponseException`, we will reduce the change of waking us up
in the middle of the night for no reason.

If the error is not a `SmsClientResponseException`, then we will still
log at the exception level as before as this is more unexpected and we
may want to be alerted sooner.

What we still want to happen though is that let's say both SMS providers
went down at the same time for 1 hour. We don't want our tasks to just
sit there, retrying every 5 minutes for the whole time without us being
aware (so we can at least raise a statuspage update). Luckily we will
still be alerted because our smoke tests will fail after 10 minutes and
raise a p1:
https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21
2021-01-18 17:00:21 +00:00
David McDonald
1cfd19fb6a Reorder tests
So all tests related to sms sending is done first and then all email
sending task tests happen after
2021-01-18 16:46:12 +00:00
David McDonald
2480f91667 Raise better exception on InvalidParameterValue error
There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files
is if we allow an email address on our side that SES rejects.

However, there are other types of errors that could cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError` as that is not appropriate.

Therefore, we introduce a new exception
`EmailClientNonRetryableException`, that represents any exception back
from an email client that we can use whenever we get a
`InvalidParameterValue` error.

Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` as our code needs to catch this
exception and it shouldn't be aware of what email client is being used,
it just needs to know that it came from one of the email clients (if in
time we have more than one).

In time, we may wish to extend the approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions as this should be
the most extendable pattern and a good abstraction.
2020-12-30 17:18:16 +00:00
David McDonald
36614e5492 Log warning for SES send rate throttling rather than exception
We have hit throttling limits from SES approximately once a week during
a spike of traffic from GOV.UK. The rate limiting usually only lasts a
couple of minutes but generates enough exceptions to cause a p1 but with
no potential action for the responder.

Therefore we downgrade the warning for this case to a warning and assume
traffic will level back out such that the problem resolves itself.

Note, we will still get exceptions if we go over our daily limit, rather
than our per minute sending limit, which does require immediate action
by someone responding.

If we were to continually go over our per second sending rate for a long
continous period of time, then there is a chance we may not be aware but
given the risk of this happening is low I think it's an acceptable risk
for the moment.
2020-08-13 17:51:09 +01:00
Leo Hemsted
3c63ccb159 move from dao_toggle_sms_provider to dao_reduce_sms_provider_priority 2019-11-28 13:29:02 +00:00
Katie Smith
09e8ac9644 Fix assertions when we catch an error in the tests
Code that is within a `with Python.raises(...)` context manager but
comes after the line that raises the exception doesn't get evaluated.
We had some assertions that we never being tested because of this, so
this ensures that they will always get run and fixes them where
necessary.
2019-09-18 11:04:24 +01:00
Rebecca Law
ae1bc54f9e Update NotificationTechnicalFailureException
- Change the NotificationTechnicalFailureException so that it only inherits from Exception.
- The notify_celery task should create the logging message on failure.
- Fix unit tests
- Remove named parameter when raising exception.
2019-08-12 16:51:39 +01:00
Katie Smith
355fb07eb2 Revert "Change email status to permanent-failure if SES raises InvalidParameterValue"
This reverts commit 51716fbaf8.

Instead of relying on catching SES errors we will convert all emails to
punycode before sending instead.
2019-08-12 13:51:24 +01:00
Katie Smith
51716fbaf8 Change email status to permanent-failure if SES raises InvalidParameterValue
If SES raised an `InvalidParameterValue` error (because an email address
was wrong) we were logging an exception and setting the email status to
`technical-failure`. We now set it to `permanent-failure` instead and
change the log level to `info` - setting it to `permanent-failure` means
that people will know not to retry the message.
2019-08-12 10:24:59 +01:00
Katie Smith
e449e234db Retry deliver_sms task immediately if sending fails
If the `deliver_sms` catches an exception when trying to send an SMS, we
want the first retry to happen immediately (because we will have
switched providers), then every retry after that to happen at the
standard intervals.
2019-08-08 09:34:38 +01:00
Katie Smith
ce24b13654 Combine two tests into one 2019-05-08 11:39:01 +01:00
Katie Smith
c809bf2207 Ensure notifications have billable units and provider if sending fails
If we try to send an SMS to the provider and the provider throws an exception
(because they return a 503 status code) the notification should retry. But if
we get the callback from the provider before the notification has been retried, the
notification will have no billable units or provider set.

To avoid this, we now set billable_units and provider even if there has been
an exception from our provider.
2019-05-08 10:54:01 +01:00
Rebecca Law
598539dcb3 Update logging for provider tasks.
Move the info message before the fetch.
Include the exception in the log message.
2018-03-26 15:24:21 +01:00
Rebecca Law
0dc50190b2 Throw an exception whenever we updated a notification to technical failure.
If this is happening we want to know about it.
2018-03-16 17:18:44 +00:00
Leo Hemsted
d2154451e5 update research mode email callbacks to add process-ses-response task to queue
this involved:
* moving that task to callback_tasks to prevent circular imports
* updating the dummy research mode callbacks (with actual SNS messages from the
  ses simulator emails)
* refactoring tests
2017-11-17 13:41:45 +00:00
Martyn Inglis
4768f0b9fd Change retries policy.
Before we had a long back off, now we have more, but shorter backoffs.

- PREVIOUS
When we had an error talking to a provider we retried quickly and if we still got errors we backed off more and more. Maximum attempts was 5, max delay 4hours. This was to allow us time to ship a build if that was required.

- NOW
Backing off 48 times of 5 minutes each. This gives us the same total backoff, but many more tries in that period.

- WHY
Having the long back off meant messages could be delayed 4 hours. This was happening more and more, as PaaS deploys can place things into the "inflight" state in SQS. The inflight state MUST have an expiry time LONGER than the maximum retry back off. This meant that messages would be delayed 4 hours, even when there was no app error.

By doing this we can reduce this delay to 5 minutes. Whilst still giving us time to fix issues.
2017-05-25 11:12:40 +01:00
Martyn Inglis
2591d3a1df This massive set of changes uses the new queue names object throughout the app and tests.
Lots of changes, all changing the line of code that puts things into queues, and the code that tests that.
2017-05-25 10:51:49 +01:00
Rebecca Law
c36d61c071 Update as per comments on review 2017-02-06 16:20:44 +00:00
Imdad Ahad
0a277b26b6 Switch providers ONLY on provider exception 2017-01-20 16:14:29 +00:00
Imdad Ahad
c221118669 Auto switch providers if exception is returned on sms delivery 2017-01-17 15:46:02 +00:00
Leo Hemsted
a2c3d265de remove unused former send_sms_to_provider and send_sms_to_email functions
they were superceded by deliver_sms and deliver_email in the same file 3 wks ago
2016-10-13 15:53:01 +01:00
Leo Hemsted
a095aa41f3 don't retry task if InvalidEmailError
just record it as a technical error - retrying wont fix a bad email
2016-10-13 15:27:47 +01:00
Martyn Inglis
c13ead77e4 Ensures that both the CSV processing and the API both use the new deliver_email and deliver_sms tasks not the old ones with the redundant parameter. 2016-09-28 15:05:50 +01:00
Martyn Inglis
137aa1e2d2 Merge branch 'master' into refactor-send-tasks-into-shared-code
Conflicts:
	app/celery/provider_tasks.py
	tests/app/celery/test_provider_tasks.py
2016-09-23 09:54:53 +01:00
Martyn Inglis
37a2dc7925 Implemented the REST endpoint to communicate with the new send-to-provider code
- this allows us to send a notification to a provider by means of an API call
- This is in addition to the celery code.
- idea is that we can use this method to help speed up throughput by generating API traffic by node/lambda etc to supplement the celery code in times of high load.
2016-09-22 14:01:25 +01:00
Martyn Inglis
59ab5da5d3 Handle errors where the notification isn't found when executing the tasks
- Thows a NoResultFound sqlalchemy exception
- Which causes a retry. This means we give it a few goes (5, max 5 hours)  for the notification to appear.
- Should never happen, only if we get some task overlaps that are unusual that leads to tasks executed in an overlapping nature.
2016-09-22 09:52:23 +01:00
Martyn Inglis
991cc884b4 Refactoring the send to provider code out of the tasks folder
- building a rest endpoint to call that code to compliment the existing task based approach.
2016-09-21 13:27:32 +01:00
Martyn Inglis
bd06b18684 Refactor of celery tasks.
- Aim to move the code that contacts providers into it's own module.
- Celery tasks now call this module to send to provider
- No exceptions caught in the new module. Celery tasks now use any exception to trigger a retry.
- tests moved about - new test directory for the new class, all tests from celery test module moved, excepting the retry logic.
2016-09-20 17:24:28 +01:00
Rebecca Law
51b3119a21 Remove provider_statistics_dao.get_provider_statistics
The provider_statistics table is no longer being populated, get rid of any reads from this table.
2016-09-19 17:24:26 +01:00
minglis
287d32c037 Merge pull request #638 from alphagov/remove-contested-writes
Remove contested writes
2016-08-30 08:56:45 +01:00
Chris Hill-Scott
f7391da350 Don’t prefix text messages is sender name is set
Implements:
- [x] https://github.com/alphagov/notifications-utils/pull/66
2016-08-26 14:45:01 +01:00
Martyn Inglis
893164ae40 Removed contented updates the notifications stats table
- As before this is now driven from the notifications history table

- Removed from updates and create
- Signatures changes to removed unused params hits many files
- Also potential issue around rate limiting - we used to get the number sent per day from the stats table - which was a single row lookup, now we have to count this. This applies to EVERY API CALL. Probably not a good thing and should be addressed urgently.
2016-08-25 11:55:38 +01:00
Martyn Inglis
708f566c24 Removed updates to the provider stats table
- again these new come from the notifications history table
- We update this when we sent a notification, so removed from celery tasks
- tests removed also
2016-08-25 10:33:12 +01:00
Martyn Inglis
44bc071037 Removed the updates to the job table to track delivery of notifications
Previously we kept a running total of job progress/success/failure on the job table. This causes contention, we now generate this data from notification history.

Removed these updates.
2016-08-25 09:29:53 +01:00
Leo Hemsted
5418f65ae7 Merge pull request #599 from alphagov/version-500
fix GET /notifications 500 error
2016-08-22 10:59:46 +01:00
Leo Hemsted
46c0728b12 add actual_template relationship to notification
also renamed the function to make it apparent that it'll join and grab personalisation
2016-08-09 16:57:18 +01:00
Leo Hemsted
e88624ed4a attempt to pull logos from the admin app's static images directory
(this is configured by a config value)
2016-08-09 16:03:04 +01:00
Leo Hemsted
ebb8947290 look at service's organisation for branding to pass through to email renderer 2016-08-09 16:03:04 +01:00
minglis
d67e43a6d0 Merge pull request #583 from alphagov/stats-db-updates
Stats db updates
2016-08-08 11:48:01 +01:00
Martyn Inglis
f223446f73 Refactor statsd logging
Removed all existing statsd logging and replaced with:

- statsd decorator. Infers the stat name from the decorated function call. Delegates statsd call to statsd client. Calls incr and timing for each decorated method. This is applied to all tasks and all dao methods that touch the notifications/notification_history tables

- statsd client changed to prefix all stats with "notification.api."

- Relies on https://github.com/alphagov/notifications-utils/pull/61 for request logging. Once integrated we pass the statsd client to the logger, allowing us to statsd all API calls. This passes in the start time and the method to be called (NOT the url) onto the global flask object. We then construct statsd counters and timers in the following way

	notifications.api.POST.notifications.send_notification.200

This should allow us to aggregate to the level of

	- API or ADMIN
	- POST or GET etc
	- modules
	- methods
	- status codes

Finally we count the callbacks received from 3rd parties to mapped status.
2016-08-05 10:44:43 +01:00
Leo Hemsted
143cfb526c change update_provider_stats to use billable_units
updated tests etc, and removed some old tests that are no longer relevant
2016-08-03 17:22:20 +01:00
Leo Hemsted
527a5c4eaa calculate billable units when sending an sms
don't calculate it if we're in research mode
* added tests to prove this
* removed last code referring to content_char_count
2016-08-03 16:46:05 +01:00
Adam Shimali
cfd42a8a05 Statsd timings were being passed a timedelta instead for float for
milliseconds.

Bugfix for https://www.pivotaltracker.com/story/show/126852733
2016-07-22 12:13:24 +01:00
Chris Hill-Scott
aa12c88551 Add a test for sending email to provider
We had a test like this for sending sms, but not email. This meant that,
for example, we weren’t checking that the provider was getting passed
the HTML and plain text versions of the email.
2016-07-08 11:16:45 +01:00
Chris Hill-Scott
bb3c55ca6c Make send sms test use a template with a newline 2016-07-08 11:16:45 +01:00
Chris Hill-Scott
a8a556d02a Use PassThrough renderer
Implements and depends on:
- [ ] https://github.com/alphagov/notifications-utils/pull/49
2016-07-07 22:48:07 +01:00