Commit Graph

7259 Commits

Author SHA1 Message Date
Leo Hemsted
cd9b80f415 set test_errors app fixture to session scope
we have one global metrics variable `metrics = GDSMetrics()`, and we
then call `metrics.init_app` from within the flask application set up.
The v2/test_errors.py app_for_test fixture calls create_app, which would
call metrics.init_app multiple times for the same metrics instance. This
causes errors, so change the fixture to session scope so that it is only
called once per test run.
2020-06-12 14:52:22 +01:00
Leo Hemsted
faa8faa0c4 bump prometheus-client to 0.7.1
there are multiple performance improvements since prometheus-client 0.2.0.
pin this bump while we wait for gds metrics client to increase its
dependency
2020-06-12 14:52:22 +01:00
Leo Hemsted
c4dc0f64c5 add sqlalchemy connection metrics for celery tasks
grab the worker app name and task name rather than the web host and
endpoint. also add a fallback for if we're not in a web request or a
celery task. I think that'll probably happen when we use alembic, or if
we do things from within flask shell
2020-06-12 14:52:22 +01:00
Leo Hemsted
6e32ca5996 add prometheus metrics for connection pools (both web and sql)
add the following prometheus metrics to keep track of general app
instance health.

 # sqs_apply_async_duration

how long does the actual SQS call (a standard web request to AWS) take.
a histogram with default bucket sizes, split up per task that was
created.

 # concurrent_web_request_count

how many web requests is this app currently serving. this is split up
per process, so we'd expect multiple responses per app instance

 # db_connection_total_connected

how many connections does this app (process) have open to the database.
They might be idle.

 # db_connection_total_checked_out

how many connections does this app (process) have open that are
currently in use by a web worker

 # db_connection_open_duration_seconds

a histogram per endpoint of how long the db connection was taken from
the pool for. won't have any data if a connection was never opened.
2020-06-12 14:52:22 +01:00
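
The difference between the connection counters described in 6e32ca5996 can be sketched with plain bookkeeping. This is a minimal illustration only, not the code from the commit: the `PoolStats` class and its method names are hypothetical, and the real change exposes these values as prometheus metrics rather than plain attributes.

```python
import time
from contextlib import contextmanager


class PoolStats:
    """Hypothetical stand-in for the per-process connection metrics."""

    def __init__(self):
        self.total_connected = 0    # open connections, idle or in use
        self.total_checked_out = 0  # connections currently in use
        self.open_durations = []    # seconds each checkout lasted

    def connect(self):
        self.total_connected += 1

    def close(self):
        self.total_connected -= 1

    @contextmanager
    def checked_out(self):
        # Count the connection as in use for the duration of the block,
        # then record how long it was held.
        self.total_checked_out += 1
        start = time.monotonic()
        try:
            yield
        finally:
            self.total_checked_out -= 1
            self.open_durations.append(time.monotonic() - start)
```

A connection that is open but idle counts towards `total_connected` only; it counts towards `total_checked_out` while a worker holds it, and a duration is recorded only once a checkout has actually happened.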
Leo Hemsted
4bb37a05ec Merge pull request #2871 from alphagov/utils-bump
bump utils to fix statsd timing bug
2020-06-11 15:13:04 +01:00
Leo Hemsted
bd433ad24f bump utils to fix statsd timing bug
statsd timing should always be in seconds, not milliseconds
2020-06-11 15:01:15 +01:00
Chris Hill-Scott
4d2396699a Merge pull request #2866 from alphagov/bump-utils-39.4.3
Bump utils to 39.4.3
2020-06-10 10:28:00 +01:00
Pea M. Tyczynska
852d1ffa03 Merge pull request #2867 from alphagov/turn-off-email-stub
Turn off email stub on staging
2020-06-09 12:10:28 +01:00
Pea Tyczynska
bb3672100f Turn off email stub on staging
To check whether it was our congestion point for the soak test.
(The email stub is a bit slower to respond than Amazon SES, and
the difference could lead to us having too many connections to the db
open as we wait for a response from the stub.)
2020-06-09 11:21:48 +01:00
Chris Hill-Scott
0f353739f5 Bump utils to 39.4.3
Brings in a bug fix for when a personalisation value is an empty list.
2020-06-09 10:45:09 +01:00
Pea M. Tyczynska
c63a78242c Merge pull request #2865 from alphagov/stub-email-and-sms-in-staging
Point API for staging at email and sms stubs for the soak tests.
2020-06-08 17:45:44 +01:00
Pea Tyczynska
2aa6470817 Point API for staging at email and sms stubs for the soak tests.
This is done to avoid sending real email and sms and incurring
unnecessary charges while we run the soak tests.
2020-06-08 17:14:16 +01:00
David McDonald
a1f8ec7d7c Merge pull request #2862 from alphagov/ses-client-stub
Stub SES email client to avoid hitting SES during load testing
2020-06-08 13:49:13 +01:00
David McDonald
7bc02ac26e Add better logging to understand callback delivery cases
In particular, we want to know how often callbacks arrive before the
notification being persisted
2020-06-04 16:00:43 +01:00
Pea Tyczynska
8b7a0b88cb Ensure that aws ses stub client is not run in production 2020-06-04 15:43:47 +01:00
Pea Tyczynska
6422a88c8c Stub SES email client to avoid hitting SES during load testing
If we set an environment variable, we can stub out calls to SES and send
them to our own stub app. If the environment variable is not set, things
work as normal.

To be used alongside
https://github.com/alphagov/notifications-email-provider-stub
2020-06-03 11:11:43 +01:00
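
The toggle described in 6422a88c8c, together with the production guard from 8b7a0b88cb, might look roughly like the sketch below. The variable name `SES_STUB_URL`, the function name and the choice to raise an error in production are all assumptions for illustration; the commits do not state them.

```python
# Hypothetical name: the real environment variable is not given in the commit.
STUB_URL_ENV_VAR = "SES_STUB_URL"


def choose_email_endpoint(environ, environment_name):
    """Return the stub URL if one is configured, else None for real SES."""
    stub_url = environ.get(STUB_URL_ENV_VAR)
    if not stub_url:
        # Variable not set: things work as normal.
        return None
    if environment_name == "production":
        # Assumed behaviour: refuse to run the stub client in production.
        raise RuntimeError("SES stub must not be used in production")
    return stub_url
```

In an application this would be called with `os.environ` and the configured environment name. Passing the mapping in as an argument keeps the function easy to test.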
Pea M. Tyczynska
db040c40b9 Merge pull request #2860 from alphagov/put-status-codes-in-logs
Put status codes in logs
2020-06-02 16:06:05 +01:00
Pea Tyczynska
b81c7dd5ee Put status codes in logs to see if lack of detailed status is
us not recognising a code or provider not having sent the detailed
status.

It seems like Firetext is sometimes sending us permanent-failure
without detailed status. It could be due to:
- them really not sending any detailed status
- them sending a status code we don't recognise
- them sending 000 code that means 'no errors', which we ignore

To see which one it is, and to debug such issues quicker in the
future, this PR adds status and detailed status codes to the logs.
2020-06-02 13:26:41 +01:00
Katie Smith
66dc6b046f Merge pull request #2861 from alphagov/provider-split
Moving SMS supplier resting point to 50/50
2020-06-02 12:41:26 +01:00
Pete Herlihy
8a67e14e2b Moving SMS supplier resting point to 50/50 2020-06-02 12:27:14 +01:00
Pea M. Tyczynska
9f816ad5f5 Merge pull request #2856 from alphagov/mmg_detailed_response
Capture detailed delivery receipt status from MMG
2020-06-01 15:23:53 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Pea Tyczynska
a4b942cf6c Log detailed sms delivery status for mmg from process_sms_client_response task.
Also log detailed delivery status for firetext in the same place in addition
to it being logged from notifications_dao.

Logging detailed delivery statuses will help us see why messages
fail to deliver. In the future we could persist detailed delivery
status in the database.
2020-06-01 12:44:49 +01:00
David McDonald
82cb1593b8 Merge pull request #2859 from alphagov/bump-utils
Bump utils to 39.4.2 to fix whitespace strip bug
2020-06-01 11:58:02 +01:00
David McDonald
a497f1b945 Bump utils to 39.4.2 to fix whitespace strip bug
https://github.com/alphagov/notifications-utils/pull/742

We had been seeing phone numbers containing a no-break space make it
through in jobs. These then caused errors because we could not process
the job: the number was treated as invalid. This should solve the
problem by stripping that special character.
2020-06-01 11:12:05 +01:00
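
As an illustration of the kind of cleaning a497f1b945 refers to, the sketch below removes a no-break space from a phone number before validation. It is not the notifications-utils implementation (see the linked pull request for that); the function name and the exact set of characters removed are assumptions.

```python
# Assumed set of characters to drop: no-break space, zero-width space and
# zero-width no-break space. The real list lives in notifications-utils.
INVISIBLE_CHARACTERS = "\u00a0\u200b\ufeff"

_REMOVE = {ord(ch): None for ch in INVISIBLE_CHARACTERS}


def clean_phone_number(raw):
    """Drop the invisible characters anywhere in the string, then trim."""
    return raw.translate(_REMOVE).strip()
```

For example, `clean_phone_number("\u00a007700 900123")` returns `"07700 900123"`.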
Leo Hemsted
5bd400da7d Merge pull request #2858 from alphagov/more-timings
add raw request timings to provider send functions
2020-05-29 15:27:54 +01:00
Leo Hemsted
f2f2509c9b add raw request timings to provider send functions
we're using statsd to monitor how long provider requests are taking.
However, there's lots of busy work that happens inside our statsd
metrics timing window. Things like json dumping and loading, building
headers, exception handling, etc.

for firetext/mmg, the response object from requests has an elapsed
property [1], which captures from sending raw data to parsing the
response headers. for ses, it's a bit trickier, but boto3 exposes a few
event hooks [2]. it's hard to find them without stepping through the
code, but the interesting ones are before-call, after-call,
after-call-error, request-created, and response-received. The
before-call and after-call involve some marshalling, built-in retrying,
etc, while request-created and response-received are much lower level.
They might be called more than once per ses request, if boto3 itself
retries the request on 5xx, 429 and low level socket errors [3].

Add these as new `raw-request-time` metrics rather than overwriting to
avoid changing the meaning of an existing metric, and to let us compare
the metrics to see if there's a noticeable difference at all

[1] https://requests.readthedocs.io/en/master/api/#requests.Response.elapsed
[2] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/events.html
[3] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#legacy-retry-mode
2020-05-29 14:04:46 +01:00
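
The two timing windows that f2f2509c9b distinguishes can be sketched with the standard library alone. This is an illustration under stated assumptions: `fake_transport` stands in for the provider HTTP call, and all names are hypothetical. In the commit itself the raw figure comes from the requests `elapsed` property for firetext/mmg and from boto3 event hooks for ses.

```python
import json
import time


def fake_transport(payload: bytes) -> bytes:
    """Hypothetical stand-in for the network round trip to a provider."""
    time.sleep(0.01)
    return b'{"status": "ok"}'


def send_with_timings(message: dict):
    # Outer window: includes json dumping and loading, like the existing metric.
    outer_start = time.monotonic()
    payload = json.dumps(message).encode("utf-8")

    # Raw window: only the request itself.
    raw_start = time.monotonic()
    response_bytes = fake_transport(payload)
    raw_elapsed = time.monotonic() - raw_start

    response = json.loads(response_bytes)
    outer_elapsed = time.monotonic() - outer_start
    return response, outer_elapsed, raw_elapsed
```

Reporting the two values as separate metrics, as the commit does with `raw-request-time`, lets them be compared without changing the meaning of the existing metric.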
Leo Hemsted
5462087f21 Merge pull request #2840 from alphagov/api-failwhale
add api-paas-failwhale
2020-05-28 13:49:42 +01:00
Chris Hill-Scott
b404e2ca67 Merge pull request #2852 from alphagov/bump-utils-letter-timings
Bump utils to 39.4.0
2020-05-26 11:30:29 +01:00
Chris Hill-Scott
cac24bfc5e Bump utils to 39.4.1 2020-05-26 11:20:15 +01:00
Chris Hill-Scott
14d5db58f4 Bump utils to 39.4.0
Adds delivery estimates for letters posted to Europe or the rest of the
world.
2020-05-26 11:20:14 +01:00
Katie Smith
29caa362a3 Merge pull request #2851 from alphagov/archive-service
Allow service to be archived if service with the same name has already been archived
2020-05-22 11:12:32 +01:00
Katie Smith
b1d9fd3ee9 Merge pull request #2853 from alphagov/dependency-upgrades
Dependency upgrades
2020-05-22 11:12:23 +01:00
Katie Smith
f22483a1ab Delete unused Marshmallow schemas 2020-05-22 10:49:59 +01:00
Katie Smith
e770108ec4 Update sqlalchemy from 1.3.13 to 1.3.17 2020-05-22 10:43:29 +01:00
Katie Smith
20e97bf78a Update psycopg2-binary from 2.8.4 to 2.8.5 2020-05-22 10:41:16 +01:00
Katie Smith
ba38a2d5dd Update marshmallow from 2.20.5 to 2.21.0 2020-05-22 10:38:27 +01:00
Katie Smith
8c72dabbe8 Update marshmallow-sqlalchemy from 0.22.3 to 0.23.0 2020-05-22 10:36:21 +01:00
Katie Smith
bd0ba67889 Update eventlet from 0.25.1 to 0.25.2 2020-05-22 10:34:13 +01:00
Katie Smith
eae99f7504 Update flask from 1.1.1 to 1.1.2 2020-05-22 10:31:15 +01:00
Katie Smith
d06c7adb82 Update flask-migrate from 2.5.2 to 2.5.3 2020-05-22 10:29:16 +01:00
Katie Smith
64cd8f39c2 Add the date to the service name and email_reply_to when archiving
This copies what we do to a user's email address when archiving the user
by prefixing it with `_archived_{date}`. We already prefixed the
service name and email_reply_to with `_archived`, but this didn't allow
a service with the same name to be archived more than once.
2020-05-22 09:37:45 +01:00
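
The naming scheme in 64cd8f39c2 can be sketched as below. The function name, the ISO date format and the underscore before the original value are assumptions; the commit only states that the value is prefixed with `_archived_{date}`.

```python
from datetime import date


def archived_value(value: str, on: date) -> str:
    """Prefix a name or address with `_archived_{date}` (assumed format)."""
    return f"_archived_{on.isoformat()}_{value}"
```

Because the date is part of the prefix, a service called "My Service" archived on two different days gets two distinct archived names, which the bare `_archived` prefix did not allow.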
Katie Smith
13f7fecd5b Move function to get archived email address value
This function will be used when archiving services too, so it has been
renamed and moved to `app/utils.py`.
2020-05-22 09:36:07 +01:00
Katie Smith
6e18250fe4 Merge pull request #2849 from alphagov/revert-postage-constraint-2
Revert the new postage constraints
2020-05-20 18:45:37 +01:00
Katie Smith
0b28766442 Reverts the new postage constraints
Reverts https://github.com/alphagov/notifications-api/pull/2843 and https://github.com/alphagov/notifications-api/pull/2848
2020-05-20 18:31:25 +01:00
Katie Smith
5f46d7ba3e Merge pull request #2848 from alphagov/add-commit-migration
Adding a commit at the end of each migration file.
2020-05-20 16:59:26 +01:00
Rebecca Law
fa7d2c2c3f Adding a commit at the end of each migration file.
The assumption that a commit is issued in between migration files was false. This will force the db connection to commit, the alter table commands will complete and the alembic version updated.
2020-05-20 16:50:46 +01:00
Katie Smith
4116affe7f Merge pull request #2843 from alphagov/update-postage-constraint-take-2
Update postage constraint (take 2)
2020-05-20 14:41:44 +01:00
Chris Hill-Scott
95a779c649 Merge pull request #2841 from alphagov/jobs-by-contact-list
Allow jobs to be grouped by contact list
2020-05-20 11:49:25 +01:00
Katie Smith
6d89b01f1e Update JSON schema postage validation for new values 2020-05-19 16:04:36 +01:00