Commit Graph

123 Commits

Author SHA1 Message Date
Kenneth Kehl
a5f78224b2 remove print statements 2024-01-05 10:39:07 -08:00
Kenneth Kehl
88379c9e46 final 2024-01-05 10:35:14 -08:00
Kenneth Kehl
065009bb7a merge from main and reformat 2023-08-29 16:21:18 -07:00
Kenneth Kehl
1ecb747c6d reformat 2023-08-29 14:54:30 -07:00
Kenneth Kehl
5a350560d7 notify-api-433b remove research mode 2023-08-25 12:09:00 -07:00
Kenneth Kehl
00fd3a72bb code review feedback, fix setup.cfg and reformat 2023-08-25 08:10:33 -07:00
Kenneth Kehl
ccf6f75df9 fix static scan 2023-08-23 11:27:28 -07:00
Kenneth Kehl
026dc14021 notify-api-412 use black to enforce python style standards 2023-08-23 10:35:43 -07:00
Kenneth Kehl
f2f0e5a0f1 code review feedback 2023-08-15 14:50:41 -07:00
Kenneth Kehl
008c3a8d68 initial 2023-06-13 12:57:51 -07:00
Kenneth Kehl
ebb4a37f8d fix more skips 2023-05-22 09:12:12 -07:00
Kenneth Kehl
3fb113a83e notify-152 sms delivery receipts 2023-05-04 07:56:24 -07:00
Kenneth Kehl
001954538e notify-243 remove statsd 2023-04-25 07:50:56 -07:00
Ryan Ahearn
041cd08097 Clean up more mmg and firetext references 2022-12-22 09:31:12 -05:00
stvnrlly
e9fdfd59f4 clean flake8 except provider code 2022-10-19 16:16:26 +00:00
jimmoffet
f1aec54665 clean up comments and method dupes 2022-09-15 15:48:37 -07:00
jimmoffet
b0f819dbd9 canada UK ses callbacks monster mash 2022-09-15 14:59:13 -07:00
Ryan Ahearn
806e2ad2dc Review and update uses of PRNG 2022-08-19 15:26:12 +00:00
Jim Moffet
2bcae20441 initial sms provider cleanup 2022-06-25 13:05:10 -07:00
Jim Moffet
aa4ec532a4 implement SNS 2022-06-17 11:16:23 -07:00
Ben Thorner
e6e16a81d0 Simplify getting name of email / sms providers
Previously we used a combination of "provider.name" and "get_name()"
which was confusing. Using a non-property function also gave me the
impression that the name was more dynamic than it actually is.
2022-03-30 13:36:55 +01:00
David McDonald
2584946823 Close DB connection whilst making HTTP to SMS providers
At the moment, when we are processing and sending an SMS we open
a DB connection at the start of the celery task and then close it
at the end of the celery task. Nice and simple.

However, during that celery task we make an HTTP call out to our
SMS providers. If our SMS providers have problems or response times
start to slow then it means we have an open DB connection sat waiting
for our SMS providers to respond which could take seconds. If our
SMS providers grind to a halt, this would cause all of the
celery tasks to hold on to their connections and we would run out
of DB connections and Notify would fall over.

We think we can solve this by closing the DB session which releases
the DB connection back to the pool.

Note, we've seen this happen in staging during load testing if our
SMS provider stub has fallen over. We've never seen it in production
and it may be less unlikely to happen as we are balancing traffic
across two providers and they generally have very good uptime.

One downside to be aware of is there could be a slight increase in
time spent to send an SMS as we will now spend a bit of extra time
closing the DB session and then reopening it again after the HTTP
request is done.

Note, there is no reason this approach couldn't be copied for our
email provider too if it appears successful.
2021-12-21 17:45:53 +00:00
Chris Hill-Scott
6900505b05 Don’t call random.choices with zero weighting
As of 041d8b48a2
it’s not valid to call `random.choices` without giving at least one of
the options a positive weighting.

This makes sense, because giving a zero weighting is effectively saying
‘theres’s only one choice, but don’t choose it’.

In our codebase this is applicable where there’s only one international
provider, which we want to use even when it’s been de-prioritised for
domestic SMS.

This doesn’t cause a problem now, but will if we upgrade to Python
versions greater than 3.9.0.
2021-08-31 11:08:27 +01:00
Chris Hill-Scott
57249b43c8 Refactor high volume into serialised service model
Just looks a bit tidier and less repetitive.

I’ve only done this for the serialised service because:
- we’re only checking this in places where we’re already using the
  serialised service
- if we want to check this elsewhere there’s a good chance that new code
  should be using the serialised service, since it’ll itself be doing
  some kind of performance optimisation
2021-06-16 10:46:18 +01:00
Rebecca Law
f3fdd3b09b Add internation api key for firetext.
We want to start using Firetext for sending international SMS. They
require us to use a different API key for international SMS because it
requires a new code path to switch the sender ID to something that the
country will accept.
This PR does not include switching the sender of international SMS to
Firetext but sets us up to do so.
2021-04-20 13:58:55 +01:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Rebecca Law
b464894325 update to check for instance of SerialisedService 2021-02-18 12:54:22 +00:00
Rebecca Law
dd686bd7a8 Add caching and remove extra call to database
Add caching by using the SeriralisedTemplate and SerialisedService objects
Removed extra call to the database to fetch the notification after the commit by saving the created_at and key_type to a local variable. After the update to the notification to mark it as sending the db.session is committed. Any reference to the the Notification data model after that will require a query to fetch the object again because it is considered "dirty" or out of date.
Added name, sms_prefix and email branding to SerialisedService.
Refactor the get_html_options to work with the SerialisedService object.
Removed the need to validate and format the to field by using `normalised_to`, since when persisting the notification the `normalised_to` field has already had this done.
Removed the validate and format for reply_to_text for email reply_to, this has been done when the email address has been added via the frontend, no need to validate this address every time a services sends an email.
2021-02-16 14:53:58 +00:00
Rebecca Law
dda7f0d47f Revert "Improve sender task" 2021-02-16 10:19:53 +00:00
Rebecca Law
d67f9fcfd6 Use from_id_and_service_id method from SerialisedTemplate.
Minor updates as per requested from review
2021-02-15 12:41:50 +00:00
Rebecca Law
61af203ad6 User cache for service in send_to_provider methods.
This will remove a call to the db if the service exists in the cache.
2021-02-11 16:45:52 +00:00
Rebecca Law
200f8aad81 Use the cached template object.
By adding SerialisedTemplate we can avoid a database call for the template. This is useful when sending many many emails/sms for the same template/version.
2021-02-11 16:45:52 +00:00
Rebecca Law
2270832873 Remove validate_and_format for email and phone numbers, using normalised_to instead because that function has already happened when persisting the notificaiton.
Remove 2 extra select queries after the update and commit. Once a transaction is committed SQLAlchemy will query for the db model if referenced after a commit.
2021-02-11 16:45:46 +00:00
Chris Hill-Scott
55afc9a401 Increase provider lookup cache TTL to 10 seconds
Tested locally with TTL values of:
- 2 seconds
- 5 seconds
- 10 seconds

The benefit really started showing at 10 seconds, where >50% of lookups
hit the cache rather than the database.

For graphs see https://github.com/alphagov/notifications-api/pull/3075#issuecomment-750836404
2020-12-31 09:36:55 +00:00
Chris Hill-Scott
51b7192750 Cache provider lookups for 2 seconds
For every email or text message we send we have to work out which
provider to send it with. Every time we do this we go and load the list
of providers from the database.

For emails, the result will always be the same.

For text messages the result is randomly chosen to balance the load
between the providers.

For international text messages the result is always the same (we only
have one international text message provider).

This commit adds an in-memory cache with a 2 second TTL so that we’re
not fetching the providers from the database every time, which should
speed things up a bit.

This does mean that, for text messages, the random choice will ‘stick’
for two seconds on each instance, before being re-chosen. I think this
is  OK because it will even out to the same distribution over time.

I really don’t like having to clear the cache in the tests, so would
welcome suggestions on a better way of doing this…
2020-12-23 16:17:27 +00:00
David McDonald
bb6e671bd1 Fix comparison of uuid to string
`service.id` is a uuid so will not be matched to anything in
`current_app.config.get('HIGH_VOLUME_SERVICE')` because that is a list
of strings.

This is why we are never falling into the first if statement and having
any metrics for high volume services on our dashboards at the moment.

Note, I had taken the existing line from the `post_notification`
endpoint, but that is using a serialised service which already has the
UUID converted to a string.
2020-12-01 11:16:15 +00:00
David McDonald
988c3a335a Remove old metric for delivery sending times
We no longer need a metric that covers both test and live keys as this
is not useful
2020-11-30 15:12:13 +00:00
David McDonald
b1336c97a4 Tweak sending time metrics to only include live notifications
Changes the high volume and not high volume metrics to both only include
non test notifications. This is because when looking at the grafana
metrics, it was impossible to tell what affect the high volume/non high
volume effect was having vs the test/live notification effect.

This leaves us with no break down of high volume/not high volume sending
times for test notifications but I don't think we really need that.
2020-11-30 15:05:40 +00:00
David McDonald
2110bc3eef Break down how long it takes to send a notification
We currently measure the sending time for all. This commit then breaks
it down into
- test keys and non test keys
- high volume services and non high volume services

Breaking it down into test keys and non test keys is important because
we don't care as much about sending test notifications within 10
seconds, only non test keys so we don't want our graphs to reflect poor
performance if it's just test keys affecting this

Breaking it down into high volume and non high volume will allow us to
easily debug issues with slow sending if they are high volume or non
high volume issues
2020-11-30 12:07:32 +00:00
Leo Hemsted
732c203d3e rename clients to notification_provider_clients
i think it's causing havoc with my attempts to mock stuff in the
`app.clients` directory because it's also accessible at that path. the
name's super vague and doesn't explain what it is anyway
2020-11-17 13:34:58 +00:00
Pea Tyczynska
efdaadbdf4 Do not update notification to sending if the status is already final
This prevents a race condition when we get delivery receipt before
updating notification to sending, and so the sending status would
supersede the delivered status, and the notification would time out
as temporary-failure after three days.
2020-07-03 17:19:03 +01:00
Leo Hemsted
bd433ad24f bump utils to fix statsd timing bug
statsd timing should always be in seconds, not milliseconds
2020-06-11 15:01:15 +01:00
Leo Hemsted
f7fbd6de5b make 500s change priorities quicker
it's not acceptable for a constantly failing provider to take 50 minutes
to drain (5x reducing priority by 10). But similarly, we need _some_
delay, or a handful of concurrent failures will completely turn off a
provider, rendering the whole excercise kinda pointless. Setting the
delay before it tries to reduce priority again to one minute is nice
because it means that if one request times out and returns 502, then any
other requests that are in flight at that time will time out before the
one minute is up and not switch, but any requests made after the switch
that take sixty seconds to time out will affect it.
2019-11-28 13:29:39 +00:00
Leo Hemsted
3c63ccb159 move from dao_toggle_sms_provider to dao_reduce_sms_provider_priority 2019-11-28 13:29:02 +00:00
Leo Hemsted
fa7e0a1e84 add dao_reduce_sms_provider_priority function
retrive the sms providers from the DB, and decrease the chosen
provider's priority by 10, while increasing the other by 10.

add a check in to ensure we never decrease below 0 or increase above 100
- this is per provider, we don't check that the two add up to 100 or
  anything. If the values are outside of this range (eg: set via the UI)
then they'll probably* fix themselves at some point - we've added tests
to document these cases.

Use with_for_update to ensure that the method can only run once at a
time - other invocations of the function will be held on that line until
the currently running one ends and commits the transaction. This doesn't
affect anyone doing things from the UI.
2019-11-28 13:29:01 +00:00
Leo Hemsted
6f38cbbcf1 randomly choose from providers based on priority
todo: make sure if they don't add up to 100 we do something sensible,
especially if they're both 0.
2019-11-28 13:29:01 +00:00
Katie Smith
3b670a0dbe Delete unused delivery blueprint 2019-08-12 10:54:11 +01:00
Katie Smith
284785a7d7 Bump utils to add alt text to email branding
Utils 33.0.0 adds alt text to email branding - the HTMLEmailTemplate now
initializes slightly differently as a result (with both `branding_name`
and `branding_text`).
2019-06-25 16:53:07 +01:00
Rebecca Law
2844fa530b Fix a bug introduced when refactoring some code. The notification status happened in the wrong order - this resolved that.
This meant notifications sent with a test key never got a 'delivered' status.
2019-05-23 17:04:55 +01:00
Rebecca Law
fc61fd2086 Remove updating provider on failure - we aren't sure yet who is sending the message the callback code sets that property as a fallback anyway.
Remove debug statements.
2019-05-08 16:31:51 +01:00