Commit Graph

97 Commits

Author SHA1 Message Date
Ryan Ahearn
806e2ad2dc Review and update uses of PRNG 2022-08-19 15:26:12 +00:00
Jim Moffet
2bcae20441 initial sms provider cleanup 2022-06-25 13:05:10 -07:00
Jim Moffet
aa4ec532a4 implement SNS 2022-06-17 11:16:23 -07:00
Ben Thorner
e6e16a81d0 Simplify getting name of email / sms providers
Previously we used a combination of "provider.name" and "get_name()"
which was confusing. Using a non-property function also gave me the
impression that the name was more dynamic than it actually is.
2022-03-30 13:36:55 +01:00
David McDonald
2584946823 Close DB connection whilst making HTTP to SMS providers
At the moment, when we are processing and sending an SMS we open
a DB connection at the start of the celery task and then close it
at the end of the celery task. Nice and simple.

However, during that celery task we make an HTTP call out to our
SMS providers. If our SMS providers have problems or response times
start to slow then it means we have an open DB connection sat waiting
for our SMS providers to respond which could take seconds. If our
SMS providers grind to a halt, this would cause all of the
celery tasks to hold on to their connections and we would run out
of DB connections and Notify would fall over.

We think we can solve this by closing the DB session which releases
the DB connection back to the pool.

Note, we've seen this happen in staging during load testing if our
SMS provider stub has fallen over. We've never seen it in production
and it may be less unlikely to happen as we are balancing traffic
across two providers and they generally have very good uptime.

One downside to be aware of is there could be a slight increase in
time spent to send an SMS as we will now spend a bit of extra time
closing the DB session and then reopening it again after the HTTP
request is done.

Note, there is no reason this approach couldn't be copied for our
email provider too if it appears successful.
2021-12-21 17:45:53 +00:00
Chris Hill-Scott
6900505b05 Don’t call random.choices with zero weighting
As of 041d8b48a2
it’s not valid to call `random.choices` without giving at least one of
the options a positive weighting.

This makes sense, because giving a zero weighting is effectively saying
‘theres’s only one choice, but don’t choose it’.

In our codebase this is applicable where there’s only one international
provider, which we want to use even when it’s been de-prioritised for
domestic SMS.

This doesn’t cause a problem now, but will if we upgrade to Python
versions greater than 3.9.0.
2021-08-31 11:08:27 +01:00
Chris Hill-Scott
57249b43c8 Refactor high volume into serialised service model
Just looks a bit tidier and less repetitive.

I’ve only done this for the serialised service because:
- we’re only checking this in places where we’re already using the
  serialised service
- if we want to check this elsewhere there’s a good chance that new code
  should be using the serialised service, since it’ll itself be doing
  some kind of performance optimisation
2021-06-16 10:46:18 +01:00
Rebecca Law
f3fdd3b09b Add internation api key for firetext.
We want to start using Firetext for sending international SMS. They
require us to use a different API key for international SMS because it
requires a new code path to switch the sender ID to something that the
country will accept.
This PR does not include switching the sender of international SMS to
Firetext but sets us up to do so.
2021-04-20 13:58:55 +01:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
Rebecca Law
b464894325 update to check for instance of SerialisedService 2021-02-18 12:54:22 +00:00
Rebecca Law
dd686bd7a8 Add caching and remove extra call to database
Add caching by using the SeriralisedTemplate and SerialisedService objects
Removed extra call to the database to fetch the notification after the commit by saving the created_at and key_type to a local variable. After the update to the notification to mark it as sending the db.session is committed. Any reference to the the Notification data model after that will require a query to fetch the object again because it is considered "dirty" or out of date.
Added name, sms_prefix and email branding to SerialisedService.
Refactor the get_html_options to work with the SerialisedService object.
Removed the need to validate and format the to field by using `normalised_to`, since when persisting the notification the `normalised_to` field has already had this done.
Removed the validate and format for reply_to_text for email reply_to, this has been done when the email address has been added via the frontend, no need to validate this address every time a services sends an email.
2021-02-16 14:53:58 +00:00
Rebecca Law
dda7f0d47f Revert "Improve sender task" 2021-02-16 10:19:53 +00:00
Rebecca Law
d67f9fcfd6 Use from_id_and_service_id method from SerialisedTemplate.
Minor updates as per requested from review
2021-02-15 12:41:50 +00:00
Rebecca Law
61af203ad6 User cache for service in send_to_provider methods.
This will remove a call to the db if the service exists in the cache.
2021-02-11 16:45:52 +00:00
Rebecca Law
200f8aad81 Use the cached template object.
By adding SerialisedTemplate we can avoid a database call for the template. This is useful when sending many many emails/sms for the same template/version.
2021-02-11 16:45:52 +00:00
Rebecca Law
2270832873 Remove validate_and_format for email and phone numbers, using normalised_to instead because that function has already happened when persisting the notificaiton.
Remove 2 extra select queries after the update and commit. Once a transaction is committed SQLAlchemy will query for the db model if referenced after a commit.
2021-02-11 16:45:46 +00:00
Chris Hill-Scott
55afc9a401 Increase provider lookup cache TTL to 10 seconds
Tested locally with TTL values of:
- 2 seconds
- 5 seconds
- 10 seconds

The benefit really started showing at 10 seconds, where >50% of lookups
hit the cache rather than the database.

For graphs see https://github.com/alphagov/notifications-api/pull/3075#issuecomment-750836404
2020-12-31 09:36:55 +00:00
Chris Hill-Scott
51b7192750 Cache provider lookups for 2 seconds
For every email or text message we send we have to work out which
provider to send it with. Every time we do this we go and load the list
of providers from the database.

For emails, the result will always be the same.

For text messages the result is randomly chosen to balance the load
between the providers.

For international text messages the result is always the same (we only
have one international text message provider).

This commit adds an in-memory cache with a 2 second TTL so that we’re
not fetching the providers from the database every time, which should
speed things up a bit.

This does mean that, for text messages, the random choice will ‘stick’
for two seconds on each instance, before being re-chosen. I think this
is  OK because it will even out to the same distribution over time.

I really don’t like having to clear the cache in the tests, so would
welcome suggestions on a better way of doing this…
2020-12-23 16:17:27 +00:00
David McDonald
bb6e671bd1 Fix comparison of uuid to string
`service.id` is a uuid so will not be matched to anything in
`current_app.config.get('HIGH_VOLUME_SERVICE')` because that is a list
of strings.

This is why we are never falling into the first if statement and having
any metrics for high volume services on our dashboards at the moment.

Note, I had taken the existing line from the `post_notification`
endpoint, but that is using a serialised service which already has the
UUID converted to a string.
2020-12-01 11:16:15 +00:00
David McDonald
988c3a335a Remove old metric for delivery sending times
We no longer need a metric that covers both test and live keys as this
is not useful
2020-11-30 15:12:13 +00:00
David McDonald
b1336c97a4 Tweak sending time metrics to only include live notifications
Changes the high volume and not high volume metrics to both only include
non test notifications. This is because when looking at the grafana
metrics, it was impossible to tell what affect the high volume/non high
volume effect was having vs the test/live notification effect.

This leaves us with no break down of high volume/not high volume sending
times for test notifications but I don't think we really need that.
2020-11-30 15:05:40 +00:00
David McDonald
2110bc3eef Break down how long it takes to send a notification
We currently measure the sending time for all. This commit then breaks
it down into
- test keys and non test keys
- high volume services and non high volume services

Breaking it down into test keys and non test keys is important because
we don't care as much about sending test notifications within 10
seconds, only non test keys so we don't want our graphs to reflect poor
performance if it's just test keys affecting this

Breaking it down into high volume and non high volume will allow us to
easily debug issues with slow sending if they are high volume or non
high volume issues
2020-11-30 12:07:32 +00:00
Leo Hemsted
732c203d3e rename clients to notification_provider_clients
i think it's causing havoc with my attempts to mock stuff in the
`app.clients` directory because it's also accessible at that path. the
name's super vague and doesn't explain what it is anyway
2020-11-17 13:34:58 +00:00
Pea Tyczynska
efdaadbdf4 Do not update notification to sending if the status is already final
This prevents a race condition when we get delivery receipt before
updating notification to sending, and so the sending status would
supersede the delivered status, and the notification would time out
as temporary-failure after three days.
2020-07-03 17:19:03 +01:00
Leo Hemsted
bd433ad24f bump utils to fix statsd timing bug
statsd timing should always be in seconds, not milliseconds
2020-06-11 15:01:15 +01:00
Leo Hemsted
f7fbd6de5b make 500s change priorities quicker
it's not acceptable for a constantly failing provider to take 50 minutes
to drain (5x reducing priority by 10). But similarly, we need _some_
delay, or a handful of concurrent failures will completely turn off a
provider, rendering the whole excercise kinda pointless. Setting the
delay before it tries to reduce priority again to one minute is nice
because it means that if one request times out and returns 502, then any
other requests that are in flight at that time will time out before the
one minute is up and not switch, but any requests made after the switch
that take sixty seconds to time out will affect it.
2019-11-28 13:29:39 +00:00
Leo Hemsted
3c63ccb159 move from dao_toggle_sms_provider to dao_reduce_sms_provider_priority 2019-11-28 13:29:02 +00:00
Leo Hemsted
fa7e0a1e84 add dao_reduce_sms_provider_priority function
retrive the sms providers from the DB, and decrease the chosen
provider's priority by 10, while increasing the other by 10.

add a check in to ensure we never decrease below 0 or increase above 100
- this is per provider, we don't check that the two add up to 100 or
  anything. If the values are outside of this range (eg: set via the UI)
then they'll probably* fix themselves at some point - we've added tests
to document these cases.

Use with_for_update to ensure that the method can only run once at a
time - other invocations of the function will be held on that line until
the currently running one ends and commits the transaction. This doesn't
affect anyone doing things from the UI.
2019-11-28 13:29:01 +00:00
Leo Hemsted
6f38cbbcf1 randomly choose from providers based on priority
todo: make sure if they don't add up to 100 we do something sensible,
especially if they're both 0.
2019-11-28 13:29:01 +00:00
Katie Smith
284785a7d7 Bump utils to add alt text to email branding
Utils 33.0.0 adds alt text to email branding - the HTMLEmailTemplate now
initializes slightly differently as a result (with both `branding_name`
and `branding_text`).
2019-06-25 16:53:07 +01:00
Rebecca Law
2844fa530b Fix a bug introduced when refactoring some code. The notification status happened in the wrong order - this resolved that.
This meant notifications sent with a test key never got a 'delivered' status.
2019-05-23 17:04:55 +01:00
Rebecca Law
fc61fd2086 Remove updating provider on failure - we aren't sure yet who is sending the message the callback code sets that property as a fallback anyway.
Remove debug statements.
2019-05-08 16:31:51 +01:00
Rebecca Law
77c35da5ca Refactor send_email_to_provider
- Remove debug statement
2019-05-08 15:55:45 +01:00
Rebecca Law
53f0e9334a Refactor send_to_providers.
- Remove some redundant code for research mode.
- The international parameter in update_notification_to_sending is not needed.
- Update unit tests and removed duplicates
2019-05-08 15:45:19 +01:00
Katie Smith
c809bf2207 Ensure notifications have billable units and provider if sending fails
If we try to send an SMS to the provider and the provider throws an exception
(because they return a 503 status code) the notification should retry. But if
we get the callback from the provider before the notification has been retried, the
notification will have no billable units or provider set.

To avoid this, we now set billable_units and provider even if there has been
an exception from our provider.
2019-05-08 10:54:01 +01:00
Chris Hill-Scott
0c47d41977 Remove govuk from possible brands
‘GOV.UK’ doesn’t make sense as a type of brand. It only made sense as
a type of branding that a service had.

Since we’ve:
- deprecated the service branding column
- made sure it’s not used as a value in the email branding table

we can remove this value from the table of possible brand types.
2018-08-30 16:36:35 +01:00
Chris Hill-Scott
2671ca3e5e Look in email_branding for brand_type, not service
https://www.pivotaltracker.com/story/show/159986276

We are now setting the type of branding on the branding itself, not on
the service.

This commit switches over from looking in the old place (on the service)
to looking in the new place (on the branding).
2018-08-28 16:22:41 +01:00
Rebecca Law
562facb49c Now that EmailBranding includes the brand type we do not need two separate colours. This PR removes the new colour columns. 2018-08-24 13:51:11 +01:00
Rebecca Law
fd16d7060a Added a method to decide which colour to send to the notifications-utils HTMLEmailTemplate to create the content. 2018-08-22 12:59:06 +01:00
Rebecca Law
69ab649256 Use the EmailBranding.text for the bannder instead of the name. 2018-08-13 10:29:06 +01:00
Rebecca Law
0dc50190b2 Throw an exception whenever we updated a notification to technical failure.
If this is happening we want to know about it.
2018-03-16 17:18:44 +00:00
Katie Smith
7f2e9f507e Delete functions which call the job statistics tasks
The JobStatistics table is going to be deleted. There are currently
3 tasks which use the JobStatistics model via the Statistics DAO, so we
need to make sure that these tasks aren't being used before they are
deleted in a separate PR.

This commit deletes:
* The `create_initial_notification_statistic_tasks` function which gets
used to call the `record_initial_job_statistics` task.
* The `create_outcome_notification_statistic_tasks` function which gets
used to call the `record_outcome_job_statistics` task.
* And the scheduling of the `timeout-job-statistics` scheduled task.
2018-03-07 09:23:29 +00:00
Rebecca Law
8aa829e93a Merge pull request #1622 from alphagov/reduce-logging
Reducing the logging for the life cycle of a notification
2018-02-08 16:57:21 +00:00
Leo Hemsted
2f79da8702 create, edit and use email branding instead of organisation
notable things that have been kept until migration is complete:

* passing in `organisation` to update_service will update email branding
* both `/email-branding` and `/organisation` hit the same code
* service endpoints still return organisation as well as email branding
2018-02-06 11:23:34 +00:00
Rebecca Law
dce79832ff As Notify matures we probably need less logging, especially to report happy path events.
This PR is a proposal to reduce the average messages we see for a single notification from about 7 messages to 2.

Messaging would change to something like this:
February 2nd 2018, 15:39:05.885	Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
{'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']}
February 2nd 2018, 15:39:05.885	Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:57.727	SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814
February 2nd 2018, 15:38:56.727	Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181
February 2nd 2018, 15:38:56.727	Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827
February 2nd 2018, 15:38:49.449	sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
February 2nd 2018, 15:38:49.449	sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery

To somthing like this:
February 2nd 2018, 15:39:05.885	Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:49.449	sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
2018-02-02 15:55:25 +00:00
Rebecca Law
fbcf777884 Remove unused import 2017-11-29 14:38:38 +00:00
Rebecca Law
ac6d041d26 Merge branch 'master' into use-reply-to-in-send-to-provider 2017-11-29 14:34:28 +00:00
Leo Hemsted
28088428f1 flake8 - misc flake8 errs.
* unused variables
* variables in loops overshadowing imports
* excepts with no defined exc type (tried to avoid `except Exception` too)
* history mapper is still too complex
* default variables should never be mutable
2017-11-28 14:28:01 +00:00
Rebecca Law
03c3ebbbe7 Update send_to_providers and create_dvla_file_contents_for_notifications to use notification.reply_to_text.
The next thing to do is to stop updating the notification to sender mapping tables.
2017-11-27 15:24:16 +00:00
Athanasios Voutsadakis
a0beb190f3 Merge branch 'master' into bump_utils_2301 2017-11-20 11:16:29 +00:00