Commit Graph

68 Commits

Author SHA1 Message Date
Ben Thorner
3eeba0266b Revert "add raw request timings to provider send functions"
This reverts commit f2f2509c9b.
Raw request stats were added to investigate a hunch about a
performance issue we were seeing [1], but turned out not to
be relevant. We don't use them anymore so we can tidy up.

[1]: https://github.com/alphagov/notifications-api/pull/2858
2021-10-28 11:12:18 +01:00
Rebecca Law
f3fdd3b09b Add internation api key for firetext.
We want to start using Firetext for sending international SMS. They
require us to use a different API key for international SMS because it
requires a new code path to switch the sender ID to something that the
country will accept.
This PR does not include switching the sender of international SMS to
Firetext but sets us up to do so.
2021-04-20 13:58:55 +01:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Rebecca Law
19f7a6ce38 Refactor method for deciding the failure type 2021-03-10 14:39:55 +00:00
David McDonald
ac6837cde5 Downgrade exception to warning for provider API call
When we send an HTTP request to our SMS providers, there is a
chance we get a 5xx status code back from them. Currently we log this as
two different exception level logs.

If a provider has a funny few minutes, we could end up with
hundreds of exceptions thrown and pagerduty waking someone up in the
middle of the night. These problems tend to pretty quickly fix
themselves as we balance traffic from one SMS to the other SMS provider
within 5 minutes.

By downgrading both exceptions to warning in the case of a
`SmsClientResponseException`, we will reduce the change of waking us up
in the middle of the night for no reason.

If the error is not a `SmsClientResponseException`, then we will still
log at the exception level as before as this is more unexpected and we
may want to be alerted sooner.

What we still want to happen though is that let's say both SMS providers
went down at the same time for 1 hour. We don't want our tasks to just
sit there, retrying every 5 minutes for the whole time without us being
aware (so we can at least raise a statuspage update). Luckily we will
still be alerted because our smoke tests will fail after 10 minutes and
raise a p1:
https://github.com/alphagov/notifications-functional-tests/blob/master/tests/functional/staging_and_prod/notify_api/test_notify_api_sms.py#L21
2021-01-18 17:00:21 +00:00
Pea M. Tyczynska
9f816ad5f5 Merge pull request #2856 from alphagov/mmg_detailed_response
Capture detailed delivery receipt status from MMG
2020-06-01 15:23:53 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Pea Tyczynska
a4b942cf6c Log detailed sms delivery status for mmg from process_sms_client_response task.
Also log detailed delivery status for firetext in the same place in addition
to it being logged from notifications_dao.

Logging detailed delivery statuses will help us see why messages
fail to deliver. In the future we could persist detailed delivery
status in the database.
2020-06-01 12:44:49 +01:00
Leo Hemsted
f2f2509c9b add raw request timings to provider send functions
we're using statsd to monitor how long provider requests are taking.
However, there's lots of busy work that happens inside our statsd
metrics timing window. Things like json dumping and loading, building
headers, exception handling, etc.

for firetext/mmg, the response object from requests has an elapsed
property [1], which captures from sending raw data to parsing the
response headers. for ses, it's a bit trickier, but boto3 exposes a few
event hooks [2]. it's hard to find them without stepping through the
code, but the interesting ones are before-call, after-call,
after-call-error, request-created, and response-received. The
before-call and after-call involve some marshalling, built-in retrying,
etc, while request-created and response-received are much lower level.
They might be called more than once per ses request, if boto3 itself
retries the request on 5xx, 429 and low level socket errors [3].

Add these as new `raw-request-time` metrics rather than overwriting to
avoid changing the meaning of an existing metric, and to let us compare
the metrics to see if there's a noticeable difference at all

[1] https://requests.readthedocs.io/en/master/api/#requests.Response.elapsed
[2] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/events.html
[3] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#legacy-retry-mode
2020-05-29 14:04:46 +01:00
Pea Tyczynska
9782cbc5b5 Check failure code on failure only.
We should check error code on failure only, because if we get an
error code on pending code, we should still set the notification
to pending status.
2020-04-21 13:43:45 +01:00
Pea Tyczynska
b962dcac5d Check Firetext code first to determine status and log the reason
If there is no code, fall back to old way of determining status.
If the code is not recognised, log an Error so we are notified
and can look into it.
2020-04-20 18:33:19 +01:00
Pea Tyczynska
91fe68eed4 WIP read firetext response codes 2020-04-17 10:58:25 +01:00
Leo Hemsted
e094dd4bfd remove loadtesting from providers
we don't use it since we wrote our own provider stubs for performance
tests.

this removes it from the api - it's still in the DB and will be
retrieved by queries, but is set to disabled on prod
2019-10-23 11:45:07 +01:00
Alexey Bezhan
330afab5e2 Make Firetext URL configurable through the application environment
Similar to MMG, there's a new env variable FIRETEXT_URL that can be
used to override the Firetext api URL.

This will be used to stub out both providers during the load test or
can be used to run a local API against a fake provider endpoint.
2019-04-12 12:03:58 +01:00
Leo Hemsted
267c4fc07b bump requirements, fix pyflake8 things, unpin botocore/awscli 2018-11-07 13:39:08 +00:00
Chris Hill-Scott
cefa253578 Remove monotonic
> On Python 3.3 or newer, monotonic will be an alias of time.monotonic
> from the standard library. On older versions, it will fall back to an
> equivalent implementation.

– https://pypi.org/project/monotonic/
2018-05-04 10:56:51 +01:00
Rebecca Law
b2dfa59b1b We are not handling the case of an unknown status code sent by the SMS provider. This PR attempts to fix that.
- If the SMS client sends a status code that we do not recognize raise a ClientException and set the notification status to technical-failure
- Simplified the code in process_client_response, using a simple map.
2018-02-07 17:49:15 +00:00
Martyn Inglis
72ecca4dab Added a 60 second timeout to the MMG and fire text clients.
- This is a CONNECT and a READ timeout.
- Gets wrapped in the standard client exception, with a status code of 504, message Gateway timeout.
- Is quiet noisy in logs to allow us to see it
- Ensures we flick across the provider.

To test change the timeout to 0 and it will timeout.
2017-04-05 15:31:37 +01:00
Martyn Inglis
57bd6494e1 WIP - adding request timeout 2017-04-04 16:05:25 +01:00
Martyn Inglis
2d946736e0 Log notification ID on deliver tasks and in clients.
- help tie things together  in Kibana.
2016-12-20 13:24:08 +00:00
Martyn Inglis
6f7dd149e2 Tidied up pull request comments 2016-09-23 15:59:23 +01:00
Martyn Inglis
124487f204 Fix from number on Load testing client 2016-09-23 10:31:18 +01:00
Martyn Inglis
54c28a4bea Updated logging as logger was parsing some JSON and erroring.
- Don't write whole JSON response. Safer not to anyway. No need.
2016-09-23 10:24:12 +01:00
Martyn Inglis
376f8355cb Updated clients to have a more robust error handling
- fire text and omg much more similar. Ready to be combined.
- Error handling now for JSON valid responses
2016-09-22 17:18:05 +01:00
Rebecca Law
3d41fa9634 Update mmg status code 2 to be a permanent failure. 2016-09-01 10:23:37 +01:00
Leo Hemsted
26d7675baa pep8 fixes
no idea why the build/local pep8s weren't picking them up before.

also excluded import order pep8
2016-08-23 12:05:47 +01:00
Martyn Inglis
fe54fa9f73 Final pass through existing statsd endpoints to ensure they match new naming strategy.
Updates accordingly.
2016-08-08 11:23:58 +01:00
Rebecca Law
c8ad5362eb Moved mmg_url to configs.
Fix up the tests
2016-07-20 10:40:50 +01:00
Rebecca Law
aca23472e3 Use the new mmg api 2016-07-19 15:27:52 +01:00
Adam Shimali
c29dd23702 Add sms sender to service to be used in sms templates
in place of default numeric short code.

If not present default short code is used.
2016-07-01 15:27:54 +01:00
NIcholas Staples
964fd5ac35 Merge pull request #377 from alphagov/single_from_number
Replaced mmg from number and firetext from number with single from nu…
2016-06-09 10:41:39 +01:00
Nicholas Staples
903c479d7a Update log messages so they don't produce an error on the utils logger. 2016-06-07 11:59:13 +01:00
Nicholas Staples
fe7d894420 Replaced mmg from number and firetext from number with single from number.
Fix merge mistake.

Fix tests from merge.

Update config to include correct staging and live names.
2016-06-06 16:41:33 +01:00
Leo Hemsted
0a8cb679d7 make mmg and firetext client params consistent 2016-06-01 15:58:48 +01:00
Rebecca Law
97088a4af1 Added a comment to explain how the delivery receipt will affect the notification status. 2016-05-27 12:28:58 +01:00
Rebecca Law
25a1b7f31c Firetext does not have a status code for temporary-failure.
In order to set a message as temporary-failure, we check if it is in pending status first.
Otherwise a delivery receipt for failure is set to permanent failure.
2016-05-26 16:46:00 +01:00
Rebecca Law
fa152c4431 Mark a declined message from Firetext as permanent-failure rather than failed. 2016-05-24 15:32:48 +01:00
Adam Shimali
955005d7fe Added additional outcome status codes to mmg responses. 2016-05-19 11:27:22 +01:00
Martyn Inglis
3f7559b286 Added statsd integration into the API
- new client for statsd, follows conventions used elsewhere for configuration
- client wraps underlying library so we can use a config property to send/not send statsd

Added statsd metrics for:
- count of API successful calls SMS/Email
- count of successful task execution for SMS/Email
- count of errors from Client libraries
- timing of API calls to third party clients
- timing of how long messages live on the SQS queue
2016-05-13 17:15:39 +01:00
Martyn Inglis
06bfe81329 Load testing client added 2016-05-12 10:46:35 +01:00
Nicholas Staples
3b1423a2ea Provider Statistics added.
Rates command added with a test.

Updated to include added migration.
2016-04-21 13:47:04 +01:00
Rebecca Law
9a25062943 Update mmg responses so that 3 is success.
Status codes from mmg are:
2: UNDELIVERABLE Message is undeliverable.
3: DELIVERED Message is delivered.
4: EXPIRED Message is expired.
5: REJECTED Message is rejected.
2016-04-20 14:29:31 +01:00
Rebecca Law
f283379646 Defend against status code 00 or 0 from mmg 2016-04-20 11:07:21 +01:00
Rebecca Law
41ce691704 Update to processing the the response from MMG
MMG changed the datatype and the status codes they send us for the delivery receipts.
This PR accounts for that change.
2016-04-20 09:45:13 +01:00
Nicholas Staples
15c4fb47f7 Added multi parameter for sending messages. 2016-04-18 09:55:56 +01:00
Nicholas Staples
90f0505a3d Update limit to message_limit.
Further db changes and updates.

Remove traceback print out.

Fix bug in passing template id to a task.
2016-04-11 16:53:40 +01:00
Rebecca Law
fb04b36cba Fix **kwargs 2016-04-07 10:53:59 +01:00
Rebecca Law
c132bbf46e Rename NOTIFY_FROM_NUMBER to MMG_FROM_NUMBER, there should be a separate short code per provider. 2016-04-07 10:18:46 +01:00
Rebecca Law
340f8ceaf6 Update mmg send_sms to include cid and request headers
Use mmg to send_sms
2016-04-06 17:35:14 +01:00
Rebecca Law
323b2ff537 Use MMG client for send-sms 2016-04-06 15:56:34 +01:00