42 Commits

Author SHA1 Message Date
David McDonald
ac6837cde5 Downgrade exception to warning for provider API call
When we send an HTTP request to our SMS providers, there is a
chance we get a 5xx status code back from them. Currently we log this as
two different exception level logs.

If a provider has a funny few minutes, we could end up with
hundreds of exceptions thrown and PagerDuty waking someone up in the
middle of the night. These problems tend to fix themselves quickly,
as we balance traffic from one SMS provider to the other within
5 minutes.

By downgrading both exceptions to warning in the case of a
`SmsClientResponseException`, we will reduce the chance of waking us up
in the middle of the night for no reason.

If the error is not a `SmsClientResponseException`, then we will still
log at the exception level as before as this is more unexpected and we
may want to be alerted sooner.

What we still want to know about, though, is the case where, say, both
SMS providers go down at the same time for 1 hour. We don't want our
tasks to just sit there, retrying every 5 minutes for the whole time
without us being aware (so we can at least raise a statuspage update).
Luckily we will still be alerted because our smoke tests will fail
after 10 minutes and raise a p1:
https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21
2021-01-18 17:00:21 +00:00
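A minimal sketch of the level selection described in ac6837cde5, assuming a stand-in `SmsClientResponseException` class and a hypothetical `log_provider_failure` helper (neither is taken from the actual diff). It only illustrates the rule: provider response errors are logged as warnings, anything else stays at the louder level.

```python
import logging

logger = logging.getLogger("sms_provider")  # hypothetical logger name


class SmsClientResponseException(Exception):
    """Stand-in for the provider client error named in the commit message."""


def log_provider_failure(exc, notification_id):
    """Log a failed provider call and return the level that was used."""
    if isinstance(exc, SmsClientResponseException):
        # Provider-side failure (for example a 5xx): expected to recover on
        # its own once traffic is rebalanced, so a warning is enough.
        logger.warning(
            "SMS provider error for notification %s: %s", notification_id, exc
        )
        return logging.WARNING
    # Anything else is unexpected, so keep the error-level log with traceback.
    logger.error(
        "Unexpected error for notification %s", notification_id, exc_info=exc
    )
    return logging.ERROR
```

Returning the level is only there to make the choice easy to inspect; real code would more likely just log and re-raise or retry.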
Pea M. Tyczynska
9f816ad5f5 Merge pull request #2856 from alphagov/mmg_detailed_response
Capture detailed delivery receipt status from MMG
2020-06-01 15:23:53 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Pea Tyczynska
a4b942cf6c Log detailed sms delivery status for mmg from process_sms_client_response task.
Also log detailed delivery status for firetext in the same place in addition
to it being logged from notifications_dao.

Logging detailed delivery statuses will help us see why messages
fail to deliver. In the future we could persist detailed delivery
status in the database.
2020-06-01 12:44:49 +01:00
Leo Hemsted
f2f2509c9b add raw request timings to provider send functions
we're using statsd to monitor how long provider requests are taking.
However, there's lots of busy work that happens inside our statsd
metrics timing window. Things like json dumping and loading, building
headers, exception handling, etc.

for firetext/mmg, the response object from requests has an elapsed
property [1], which captures from sending raw data to parsing the
response headers. for ses, it's a bit trickier, but boto3 exposes a few
event hooks [2]. it's hard to find them without stepping through the
code, but the interesting ones are before-call, after-call,
after-call-error, request-created, and response-received. The
before-call and after-call involve some marshalling, built-in retrying,
etc, while request-created and response-received are much lower level.
They might be called more than once per ses request, if boto3 itself
retries the request on 5xx, 429 and low level socket errors [3].

Add these as new `raw-request-time` metrics rather than overwriting to
avoid changing the meaning of an existing metric, and to let us compare
the metrics to see if there's a noticeable difference at all

[1] https://requests.readthedocs.io/en/master/api/#requests.Response.elapsed
[2] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/events.html
[3] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#legacy-retry-mode
2020-05-29 14:04:46 +01:00
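As a rough illustration of the `raw-request-time` idea in f2f2509c9b, the sketch below reads the `elapsed` timedelta from a response and reports it under its own metric name. The `record_raw_request_time` helper, the `clients.<provider>.` metric prefix and the `record_timing` callback are assumptions for illustration; a `SimpleNamespace` stands in for a `requests.Response` so the example runs without a network call.

```python
from datetime import timedelta
from types import SimpleNamespace


def record_raw_request_time(response, record_timing, provider):
    """Report the HTTP-only duration under a separate metric name."""
    # requests sets Response.elapsed to a timedelta covering the time from
    # sending the request until the response headers have been parsed.
    seconds = response.elapsed.total_seconds()
    metric = f"clients.{provider}.raw-request-time"  # prefix is hypothetical
    record_timing(metric, seconds)
    return metric, seconds


# Stand-in for a requests.Response object.
fake_response = SimpleNamespace(elapsed=timedelta(milliseconds=250))
recorded = []
record_raw_request_time(
    fake_response,
    lambda name, value: recorded.append((name, value)),
    "firetext",
)
```

Keeping this as a new metric, rather than overwriting the existing timing, matches the commit's stated aim of comparing the two numbers side by side.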
Pea Tyczynska
9782cbc5b5 Check failure code on failure only.
We should check the error code on failure only, because if we get an
error code alongside a pending status, we should still set the
notification to pending status.
2020-04-21 13:43:45 +01:00
Pea Tyczynska
b962dcac5d Check Firetext code first to determine status and log the reason
If there is no code, fall back to old way of determining status.
If the code is not recognised, log an Error so we are notified
and can look into it.
2020-04-20 18:33:19 +01:00
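The behaviour described in b962dcac5d and refined in 9782cbc5b5 could be sketched roughly as follows. The two lookup tables, their values and the `get_notification_status` name are illustrative assumptions, not the real Firetext tables or the project's code.

```python
import logging

logger = logging.getLogger("firetext")  # hypothetical logger name

# Illustrative values only; the real Firetext tables are not shown in this log.
STATUS_MAP = {"0": "delivered", "1": "permanent-failure", "2": "pending"}
FAILURE_STATUS = "1"
FAILURE_CODE_MAP = {"101": "permanent-failure", "102": "temporary-failure"}


def get_notification_status(status, code=None):
    """Map a delivery receipt to a notification status."""
    fallback = STATUS_MAP[status]
    # Only consult the detailed code on failure: a pending receipt stays
    # pending even if it happens to carry a code.
    if status != FAILURE_STATUS or code is None:
        return fallback
    if code not in FAILURE_CODE_MAP:
        # Unknown code: log at error level so someone looks into it, then
        # fall back to the status-based result.
        logger.error("Unrecognised Firetext code %s", code)
        return fallback
    return FAILURE_CODE_MAP[code]
```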
Pea Tyczynska
91fe68eed4 WIP read firetext response codes 2020-04-17 10:58:25 +01:00
Alexey Bezhan
330afab5e2 Make Firetext URL configurable through the application environment
Similar to MMG, there's a new env variable FIRETEXT_URL that can be
used to override the Firetext api URL.

This will be used to stub out both providers during the load test or
can be used to run a local API against a fake provider endpoint.
2019-04-12 12:03:58 +01:00
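A minimal sketch of the override described in 330afab5e2, assuming a hypothetical default URL and a hypothetical `firetext_url` helper; only the `FIRETEXT_URL` variable name comes from the commit message.

```python
import os

# Hypothetical default; substitute the real provider endpoint.
DEFAULT_FIRETEXT_URL = "https://firetext.example/api/sendsms/json"


def firetext_url(environ=os.environ):
    """Return the Firetext API URL, letting FIRETEXT_URL override the default."""
    return environ.get("FIRETEXT_URL", DEFAULT_FIRETEXT_URL)
```

Passing the environment as a parameter makes it easy to point a local API or a load test at a stub, for example `firetext_url({"FIRETEXT_URL": "http://localhost:6300/firetext"})`.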
Chris Hill-Scott
cefa253578 Remove monotonic
> On Python 3.3 or newer, monotonic will be an alias of time.monotonic
> from the standard library. On older versions, it will fall back to an
> equivalent implementation.

– https://pypi.org/project/monotonic/
2018-05-04 10:56:51 +01:00
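With the third-party package gone, timing code can use the standard library directly. The `timed` helper below is a hypothetical example of that, not code from the commit.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    # time.monotonic() never goes backwards, so the difference is safe to use
    # for durations even if the system clock is adjusted mid-call.
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed
```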
Rebecca Law
b2dfa59b1b We are not handling the case of an unknown status code sent by the SMS provider. This PR attempts to fix that.
- If the SMS client sends a status code that we do not recognize, raise a ClientException and set the notification status to technical-failure
- Simplified the code in process_client_response, using a simple map.
2018-02-07 17:49:15 +00:00
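One way to read the two bullets in b2dfa59b1b is a plain dictionary lookup that raises on unknown codes, with the caller turning that error into `technical-failure`. The map contents and function names below are assumptions made for illustration.

```python
class ClientException(Exception):
    """Stand-in for the client error named in the commit message."""


# Illustrative map; real provider codes and statuses may differ.
RESPONSE_MAP = {"0": "delivered", "1": "permanent-failure", "2": "pending"}


def status_from_provider(code):
    """Look up the notification status, rejecting unknown codes."""
    try:
        return RESPONSE_MAP[code]
    except KeyError:
        raise ClientException(f"Unknown provider status code: {code}") from None


def process_client_response(code):
    """Return the status to store for a provider callback."""
    try:
        return status_from_provider(code)
    except ClientException:
        # Unknown codes are treated as a technical failure on our side.
        return "technical-failure"
```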
Martyn Inglis
72ecca4dab Added a 60 second timeout to the MMG and fire text clients.
- This is a CONNECT and a READ timeout.
- Gets wrapped in the standard client exception, with a status code of 504, message Gateway timeout.
- Is quite noisy in logs to allow us to see it
- Ensures we flick across the provider.

To test change the timeout to 0 and it will timeout.
2017-04-05 15:31:37 +01:00
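The timeout handling in 72ecca4dab might look something like the sketch below. `SmsClientException`, `send_sms` and the injected `post` callable are hypothetical; the built-in `TimeoutError` stands in for whatever timeout error the HTTP library raises (with `requests`, a single `timeout=60` value applies to both connect and read, and the error to catch would be `requests.exceptions.Timeout`).

```python
REQUEST_TIMEOUT_SECONDS = 60


class SmsClientException(Exception):
    """Stand-in for the standard client exception mentioned in the commit."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def send_sms(post, url, payload):
    """Send via the injected post callable, mapping timeouts to a 504 error."""
    try:
        return post(url, data=payload, timeout=REQUEST_TIMEOUT_SECONDS)
    except TimeoutError as exc:
        # Wrapping the timeout in the usual client exception lets the caller
        # treat it like any other provider failure and switch provider.
        raise SmsClientException(504, "Gateway timeout") from exc
```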
Martyn Inglis
57bd6494e1 WIP - adding request timeout 2017-04-04 16:05:25 +01:00
Martyn Inglis
2d946736e0 Log notification ID on deliver tasks and in clients.
- help tie things together in Kibana.
2016-12-20 13:24:08 +00:00
Martyn Inglis
6f7dd149e2 Tidied up pull request comments 2016-09-23 15:59:23 +01:00
Martyn Inglis
54c28a4bea Updated logging as logger was parsing some JSON and erroring.
- Don't write whole JSON response. Safer not to anyway. No need.
2016-09-23 10:24:12 +01:00
Martyn Inglis
376f8355cb Updated clients to have a more robust error handling
- fire text and mmg much more similar. Ready to be combined.
- Error handling now for JSON valid responses
2016-09-22 17:18:05 +01:00
Martyn Inglis
fe54fa9f73 Final pass through existing statsd endpoints to ensure they match new naming strategy.
Updates accordingly.
2016-08-08 11:23:58 +01:00
Adam Shimali
c29dd23702 Add sms sender to service to be used in sms templates
in place of default numeric short code.

If not present default short code is used.
2016-07-01 15:27:54 +01:00
NIcholas Staples
964fd5ac35 Merge pull request #377 from alphagov/single_from_number
Replaced mmg from number and firetext from number with single from nu…
2016-06-09 10:41:39 +01:00
Nicholas Staples
903c479d7a Update log messages so they don't produce an error on the utils logger. 2016-06-07 11:59:13 +01:00
Nicholas Staples
fe7d894420 Replaced mmg from number and firetext from number with single from number.
Fix merge mistake.

Fix tests from merge.

Update config to include correct staging and live names.
2016-06-06 16:41:33 +01:00
Leo Hemsted
0a8cb679d7 make mmg and firetext client params consistent 2016-06-01 15:58:48 +01:00
Rebecca Law
97088a4af1 Added a comment to explain how the delivery receipt will affect the notification status. 2016-05-27 12:28:58 +01:00
Rebecca Law
25a1b7f31c Firetext does not have a status code for temporary-failure.
In order to set a message as temporary-failure, we check if it is in pending status first.
Otherwise a delivery receipt for failure is set to permanent failure.
2016-05-26 16:46:00 +01:00
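The rule in 25a1b7f31c can be expressed as a small function of the notification's current status. The function name is hypothetical; the status strings follow the wording of the commit message.

```python
def status_after_failure_receipt(current_status):
    """Pick the status to store when a Firetext failure receipt arrives."""
    # A failure that follows a pending receipt is treated as temporary;
    # a failure with no prior pending state is treated as permanent.
    if current_status == "pending":
        return "temporary-failure"
    return "permanent-failure"
```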
Rebecca Law
fa152c4431 Mark a declined message from Firetext as permanent-failure rather than failed. 2016-05-24 15:32:48 +01:00
Martyn Inglis
3f7559b286 Added statsd integration into the API
- new client for statsd, follows conventions used elsewhere for configuration
- client wraps underlying library so we can use a config property to send/not send statsd

Added statsd metrics for:
- count of API successful calls SMS/Email
- count of successful task execution for SMS/Email
- count of errors from Client libraries
- timing of API calls to third party clients
- timing of how long messages live on the SQS queue
2016-05-13 17:15:39 +01:00
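A wrapper of the kind described in 3f7559b286 could be as small as the sketch below, assuming a duck-typed backend with `incr` and `timing` methods and an `enabled` flag read from config. The class names and metric names are illustrative, not the project's actual client.

```python
class StatsdClient:
    """Thin wrapper that forwards metrics only when enabled in config."""

    def __init__(self, backend, enabled):
        self._backend = backend
        self._enabled = enabled

    def incr(self, stat, count=1):
        if self._enabled:
            self._backend.incr(stat, count)

    def timing(self, stat, delta):
        if self._enabled:
            self._backend.timing(stat, delta)


class RecordingBackend:
    """Test double that records calls instead of sending UDP packets."""

    def __init__(self):
        self.calls = []

    def incr(self, stat, count=1):
        self.calls.append(("incr", stat, count))

    def timing(self, stat, delta):
        self.calls.append(("timing", stat, delta))
```

Callers can then emit metrics unconditionally, and environments without a statsd endpoint simply construct the wrapper with `enabled=False`.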
Nicholas Staples
90f0505a3d Update limit to message_limit.
Further db changes and updates.

Remove traceback print out.

Fix bug in passing template id to a task.
2016-04-11 16:53:40 +01:00
Rebecca Law
4806123d5c Add process_mmg_responses
Refactor process_firetext_responses
Removed the abstract ClientResponses for firetext and mmg. There is a map for each response to handle the status codes sent by each client.
Since MMG has about 20 different status codes, none of which seem to be a pending state (unlike firetext, which has 3 statuses, one of them for pending - network delay), for MMG status codes we look for 00 as successful and assume everything else is a failure.
2016-04-06 14:31:33 +01:00
Rebecca Law
f2ee8f3eb7 WIP 2016-04-05 14:39:59 +01:00
Rebecca Law
2ba12da77d WIP: adding delivery receipt endpoint for mmg 2016-04-04 15:02:21 +01:00
Martyn Inglis
e0316d1881 Adds notification stats update into the callback process
- when a provider callback occurs and we update the status of the notification, also update the statistics table

Adds:
- Mapping object to the clients to handle mapping to various states from the response codes, this replaces the map.
- query lookup in the DAO to get the query based on response type / template type

Tests around rest class and dao to check correct updating of stats

Missing:
- multiple client callbacks will keep incrementing the counts of success/failure. This edge case needs to be handled in a future story.
2016-03-21 13:24:37 +00:00
Martyn Inglis
e2cfbce8c4 Added base object for response statuses, and tests around its behaviour 2016-03-21 09:20:38 +00:00
Martyn Inglis
69654f4209 Parking some code that updates stats when notification delivery happens 2016-03-15 14:40:42 +00:00
Martyn Inglis
2922712f0b Make sms code task use a reference too
- makes the fire text callback behave in consistent way
2016-03-10 15:51:11 +00:00
Martyn Inglis
1f22f2b7cc Updates to fire text integration:
- client updated to raise errors with fire text error codes/messages

New endpoint
- /notifications/sms/firetext
For delivery notifications to be sent to.
2016-03-10 15:40:41 +00:00
Martyn Inglis
c580b9c084 Pass notification ID to fire text as our reference
- also handle fire text errors, non-zero response code means error.
2016-03-10 13:22:45 +00:00
Martyn Inglis
35b7b884f8 Add logging around 3rd party delivery calls
- time SES, Twilio, fire text calls
- use monotonic for accuracy
2016-03-02 09:33:20 +00:00
Martyn Inglis
44632c36d3 Add sender name to the notification
- also ensure that the created time is handled properly
2016-02-25 11:23:04 +00:00
Rebecca Law
9073814d9f I have an issue with the test, not sure why? 2016-02-17 17:48:23 +00:00
Martyn Inglis
837c9b7cdb Removed logging, and make fire text only client. 2016-02-17 13:59:01 +00:00
Martyn Inglis
226459132a Basic (Very basic) implementation of the fire text API.
https://www.firetext.co.uk/docs#sendingsms

Not to be merged. This API has a limit on it at the moment that will need to be removed before it is used in anger.
2016-02-17 12:57:51 +00:00