Commit Graph

4075 Commits

Author SHA1 Message Date
Chris Hill-Scott
b3c69087d8 Serialise less stuff from the service object
By default Marshallow includes unknown properties. This means every time
a new property is added to the service model it gets included in the
JSON-serialised response sent to the admin app.

This is particuarly bad because it means that for returned letters the
ID of every returned letter. So the JSON stored in Redis for the
Check Your State Pension service is 86kb.

Similarly the JSON stored in Redis for a big user of inbound text
messaging is 458kb(!!!) because it has the ID of every received text
message. That’s ~8,500 UUIDs.

Luckily the admin app tells us exactly which keys it’s using here:
5952d9c26d/app/models/service.py (L31-L52)

```python
- `active`
- `contact_link`
- `email_branding`
- `email_from`
- `id`
- `inbound_api`
- `letter_branding`
- `letter_contact_block`
- `message_limit`
- `name`
- `prefix_sms`
- `research_mode`
- `service_callback_api`
- `volume_email`
- `volume_sms`
- `volume_letter`
- `consent_to_research`
- `count_as_live`
- `go_live_user`
- `go_live_at`
}
```

Plus these which it does not get automatically:
- `email_branding`
- `letter_branding`
- `organisation`
- `organisation_type`
- `permissions`
- `restricted`

The API is returning all of these:
- `active`
- `all_template_folders`
- `annual_billing`
- `consent_to_research`
- `contact_link`
- `contact_list`
- `count_as_live`
- `created_by`
- `crown`
- `email_branding`
- `email_from`
- `go_live_at`
- `go_live_user`
- `id`
- `inbound_api`
- `inbound_number`
- `inbound_sms`
- `letter_branding`
- `letter_contact_block`
- `letter_logo_filename`
- `message_limit`
- `name`
- `organisation`
- `organisation_type`
- `permissions`
- `prefix_sms`
- `rate_limit`
- `research_mode`
- `restricted`
- `returned_letters`
- `service_callback_api`
- `users`
- `version`
- `volume_email`
- `volume_letter`
- `volume_sms`
- `whitelist`

So the ones that the admin is getting but not expecting are:
- `all_template_folders`
- `annual_billing`
- `contact_list`
- `created_by`
- `crown`
- `inbound_number`
- `inbound_sms`
- `letter_logo_filename`
- `rate_limit`
- `returned_letters`
- `users`
- `version`
- `whitelist`

Which is what this PR adds to the exclude list, except for `created_by`
which is keeps because it’s needed to validate the JSON provided when
creating a service.
2020-06-16 16:47:56 +01:00
Leo Hemsted
58ab99d74b add more prometheus metrics
Two new metrics:

auth_db_connection_duration_seconds (histogram)
  wraps the first DB call of post notifications. This includes waiting
  to get a connection from the pool, and also making the actual request
  to the db to retrieve the service and api keys. (i'm not sure there's
  an easy way to separate these two things)

post_notification_json_parse_duration_seconds
  wraps parsing the v2 post notifications json parsing and schema
  validation. Shouldn't include any async code
2020-06-15 16:26:56 +01:00
David McDonald
8b4a424df1 Tidy up 2020-06-12 16:51:44 +01:00
Leo Hemsted
15ce9fe3f9 add metrics for redis timings 2020-06-12 14:52:22 +01:00
Leo Hemsted
cd9b80f415 set test_errors app fixture to session scope
we have one global metrics variable `metrics = GDSMetrics()`, and we
then call `metrics.init_app` from within the flask application set up.
The v2/test_errors.py app_for_test fixture calls create_app, would call
metrics.init_app multiple times for the same metrics instance. This
causes errors, so change the fixture to session level so it only calls
once per test run.
2020-06-12 14:52:22 +01:00
Leo Hemsted
c4dc0f64c5 dd sqlalchemy connection metrics for celery tasks
grab the worker app name and task name rather than the web host and
endpoint. also add a fallback for if we're not in a web request or a
celery task. I think that'll probably happen when we use alembic, or if
we do things from within flask shell
2020-06-12 14:52:22 +01:00
Leo Hemsted
6e32ca5996 add prometheus metrics for connection pools (both web and sql)
add the following prometheus metrics to keep track of general app
instance health.

 # sqs_apply_async_duration

how long does the actual SQS call (a standard web request to AWS) take.
a histogram with default bucket sizes, split up per task that was
created.

 # concurrent_web_request_count

how many web requests is this app currently serving. this is split up
per process, so we'd expect multiple responses per app instance

 # db_connection_total_connected

how many connections does this app (process) have open to the database.
They might be idle.

 # db_connection_total_checked_out

how many connections does this app (process) have open that are
currently in use by a web worker

 # db_connection_open_duration_seconds

a histogram per endpoint of how long the db connection was taken from
the pool for. won't have any data if a connection was never opened.
2020-06-12 14:52:22 +01:00
Leo Hemsted
bd433ad24f bump utils to fix statsd timing bug
statsd timing should always be in seconds, not milliseconds
2020-06-11 15:01:15 +01:00
Pea Tyczynska
bb3672100f Turn off email stub on staging
To check if it was our congestion point for the soak test.
(Email stub is a bit slower to respond than Amazon SES, and
the difference could lead to us having to many connections to db open
as we wait for response from the stub)
2020-06-09 11:21:48 +01:00
Pea Tyczynska
2aa6470817 Point API for staging at email and sms stubs for the soak tests.
This is done to avoid sending real email and sms and incurring
unnecessary charges while we run the soak tests.
2020-06-08 17:14:16 +01:00
David McDonald
7bc02ac26e Add better logging to understand callback delivery cases
In particular, we want to know how often callbacks arrive before the
notification being persisted
2020-06-04 16:00:43 +01:00
Pea Tyczynska
8b7a0b88cb Ensure that aws ses stub client is not run in production 2020-06-04 15:43:47 +01:00
Pea Tyczynska
6422a88c8c Stub SES email client to avoid hitting SES during load testing
If we set an environment variable, we can stub out calls to SES and send
them to our own stub app. If the environment variable is not set, things
work as normal.

To be used alongside
https://github.com/alphagov/notifications-email-provider-stub
2020-06-03 11:11:43 +01:00
Pea M. Tyczynska
db040c40b9 Merge pull request #2860 from alphagov/put-status-codes-in-logs
Put status codes in logs
2020-06-02 16:06:05 +01:00
Pea Tyczynska
b81c7dd5ee Put status codes in logs to see if lack of detailed status is
us not recognising a code or provider not having sent the detailed
status.

It seems like Firetext is sometimes sending us permanent-failure
without detailed status. It could be due to:
- them really not sending any detailed status
- them sending a status code we don't recognise
- them sending 000 code that means 'no errors', which we ignore

To see which one it is, and to debug such issues quicker in the
future, this PR adds status and detailed status codes to the logs.
2020-06-02 13:26:41 +01:00
Katie Smith
66dc6b046f Merge pull request #2861 from alphagov/provider-split
Moving SMS supplier resting point to 50/50
2020-06-02 12:41:26 +01:00
Pete Herlihy
8a67e14e2b Moving SMS supplier resting point to 50/50 2020-06-02 12:27:14 +01:00
Pea M. Tyczynska
9f816ad5f5 Merge pull request #2856 from alphagov/mmg_detailed_response
Capture detailed delivery receipt status from MMG
2020-06-01 15:23:53 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Pea Tyczynska
a4b942cf6c Log detailed sms delivery status for mmg from process_sms_client_response task.
Also log detailed delivery status for firetext in the same place in addition
to it being logged from notifications_dao.

Logging detailed delivery statuses will help us see why messages
fail to deliver. In the future we could persist detailed delivery
status in the database.
2020-06-01 12:44:49 +01:00
Leo Hemsted
f2f2509c9b add raw request timings to provider send functions
we're using statsd to monitor how long provider requests are taking.
However, there's lots of busy work that happens inside our statsd
metrics timing window. Things like json dumping and loading, building
headers, exception handling, etc.

for firetext/mmg, the response object from requests has an elapsed
property [1], which captures from sending raw data to parsing the
response headers. for ses, it's a bit trickier, but boto3 exposes a few
event hooks [2]. it's hard to find them without stepping through the
code, but the interesting ones are before-call, after-call,
after-call-error, request-created, and response-received. The
before-call and after-call involve some marshalling, built-in retrying,
etc, while request-created and response-received are much lower level.
They might be called more than once per ses request, if boto3 itself
retries the request on 5xx, 429 and low level socket errors [3].

Add these as new `raw-request-time` metrics rather than overwriting to
avoid changing the meaning of an existing metric, and to let us compare
the metrics to see if there's a noticeable difference at all

[1] https://requests.readthedocs.io/en/master/api/#requests.Response.elapsed
[2] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/events.html
[3] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#legacy-retry-mode
2020-05-29 14:04:46 +01:00
Katie Smith
29caa362a3 Merge pull request #2851 from alphagov/archive-service
Allow service to be archived if service with the same name has already been archived
2020-05-22 11:12:32 +01:00
Katie Smith
f22483a1ab Delete unused Marshmallow schemas 2020-05-22 10:49:59 +01:00
Katie Smith
64cd8f39c2 Add the date to the service name and email_reply_to when archiving
This copies what we do to a user's email address when archiving the user
by prefixing it with `_archived_{date}`. We already prefixed the
service name and email_reply_to with `_archived`, but this didn't allow
a service with the same name to be archived more than once.
2020-05-22 09:37:45 +01:00
Katie Smith
13f7fecd5b Move function to get archived email address value
This function will be used when archiving services too, so it has been
renamed and moved to `app/utils.py`.
2020-05-22 09:36:07 +01:00
Katie Smith
0b28766442 Reverts the new postage constraints
Reverts https://github.com/alphagov/notifications-api/pull/2843 and https://github.com/alphagov/notifications-api/pull/2848
2020-05-20 18:31:25 +01:00
Katie Smith
4116affe7f Merge pull request #2843 from alphagov/update-postage-constraint-take-2
Update postage constraint (take 2)
2020-05-20 14:41:44 +01:00
Chris Hill-Scott
95a779c649 Merge pull request #2841 from alphagov/jobs-by-contact-list
Allow jobs to be grouped by contact list
2020-05-20 11:49:25 +01:00
Katie Smith
6d89b01f1e Update JSON schema postage validation for new values 2020-05-19 16:04:36 +01:00
Katie Smith
7fd52017d0 Update postage db constraints for international letters
The `notifications`, `notification_history`, `templates` and `templates_history`
tables all had a check constraint on the postage column which specified
that the postage had to be `first` or `second` if the notification or
template was a letter. We now have two more options for postage -
`europe` and `rest-of-world`.

It's not possible to alter a check constraint, so the constraints have
to be dropped then recreated. We are not recreating the constraint on
the `notification_history` table since values here are always copied
from the `notifications` table.

The constraints get added as `NOT VALID` at first - this stage will lock
the tables, so updating the `notification` table and `templates` and
`templates_history` are done in separate migrations so that we don't lock
all tables at the same time. In a third migration we then run
`VALIDATE CONSTRAINT` for all tables - this will lock a row at a time,
not the whole table.
2020-05-19 16:04:36 +01:00
Chris Hill-Scott
c7f914122a Merge pull request #2839 from alphagov/group-letter-uploads
Group letter uploads by printing day
2020-05-19 11:05:14 +01:00
David McDonald
2f0b3a9636 Fix edge case in func test data purging for created_by_id
When running the purge command I found about 4 users who could not be
deleted because their user id was still referenced in the services table
as they had created the service yet they were not a member of that
service anymore.

I have fixed this by checking that if they are not a member but created
the service then we also delete the service for them.

Note, I've followed the previous convention of no tests for this
function. I've run it locally and executed the code path so there should
be no major flaws in the code. There is a small chance I wasn't able to
exactly replicate the state that existed in preview on my local but
hopefully it was close enough to be accurate.
2020-05-18 10:30:28 +01:00
David McDonald
df5ccae4c5 Add in positive logging case for purge command
This is useful so we can see that it's doing things, which case is being
hit and know that an empty terminal for an hour isn't a bad thing
2020-05-15 17:34:30 +01:00
David McDonald
dbb2dfa502 Merge pull request #2836 from alixedi/add-csv-support
Add support for CSV files
2020-05-15 12:16:16 +01:00
Rebecca Law
aecf17fef1 This morning we raise a ParseError for a bad date format for the DateReceived attribute on the /notifications/receive/mmg request.
This PR tries to parse the date, if that throws an error return now as the datereceived. This will at least allow the message to be persisted. Typically the DateReceived, provider_date, and the created_at date in the inbound_sms table are within a second of each other.
2020-05-13 10:37:41 +01:00
Chris Hill-Scott
cce153eee8 Return 400 for bad date argument
It’s more of a bad request, because the input is bad, rather than
something on our side being not found.
2020-05-13 08:56:54 +01:00
Chris Hill-Scott
c61f7e70c2 Add comments to explain time intervals 2020-05-13 08:53:40 +01:00
Chris Hill-Scott
c8cd3c2b70 Return template name in job response
If you’ve sent a bunch of jobs from the same contact list then a handy
way to differentiate between them will be date sent, but also template
name (in effect the message you sent).

This commit extends the job response to include template name, using the
same pattern as for template type.
2020-05-12 13:10:39 +01:00
Chris Hill-Scott
3ed1700231 Count how many times a contact list has been used
Because we’ll be grouping jobs under their parent contact lists it will
be useful to have a way of showing how many times a contact list has
been used. This will give the right information scent to indicate that
clicking into a contact list is where you go to see its jobs. This means
that the API needs to return a count of jobs for each contact list.

Putting this code feels very non-idiomatic for our API. So suggestions
about how to better architect it welcome…
2020-05-12 13:00:54 +01:00
Chris Hill-Scott
18ffccf8c9 Allow jobs to be filtered by contact list
Rather than showing all jobs that have been ‘copied’ from a contact list
I think it makes more sense to group them under the contact list. This
way it’s easier to see what messages have been sent to a given group of
contacts over time.

Part of this work means the API needs to return only jobs that have been
created from a given contact list, when asked.
2020-05-12 12:58:39 +01:00
Chris Hill-Scott
27a0ba1a65 Reformat arguments for readability
We want to add another argument here, and doing so would make the line
length too long with all the arguments on one line.

Also uses the * operator to enforce keyword-only arguments.
2020-05-12 12:57:54 +01:00
Pea Tyczynska
5d6f2da155 Rename task from create_letters_pdf to get_pdf_for_templated_letter
In a separate PR we will have to delete vestigial create_letters_pdf
tasks that now only redirects to get_pdf_for_templated_letter.
2020-05-11 13:33:05 +01:00
Pea Tyczynska
879d15b736 Test logging and error message 2020-05-11 13:33:04 +01:00
Pea Tyczynska
3a00c19390 Polish and test the small task that updates billable units for letter 2020-05-11 13:32:09 +01:00
Pea Tyczynska
24a89c1c19 Modify tasks for getting letter pdf and updating billable units
So that they talk with new template preview task for pdf creation
2020-05-11 13:30:59 +01:00
Chris Hill-Scott
864c6772b3 And an endpoint to get uploaded letters for a day
Because we won’t be showing uploaded letters individually on the uploads
page any more we need a way of listing them. This should be by printing
day, to match how we’re grouping them on the uploads page.

The response matches a normal `get_notifications` response so we can
reuse the same code in the admin app.
2020-05-11 10:51:40 +01:00
Chris Hill-Scott
421c1aac96 Group uploaded letters by day of printing
Some teams have started uploading quite a lot of letters (in the
hundreds per week). They’re also uploading CSVs of emails. This means
the uploads page ends up quite jumbled.

This is because:
- there’s just a lot of items to scan through
- conceptually it’s a bit odd to have batches of things displayed
  alongside individual things on the same page

So instead this commit starts grouping together uploaded letters. It
does this by the date on which we ‘start’ printing them, or in other
words the time at which they can no longer be cancelled.

This feels like a natural grouping, and it matches what we know about
people’s mental models of ‘batches’ and ‘runs’ when talking about
printing.

The code for this is a bit gnarly because:
- timezones
- the print cutoff doesn’t align with the end of a day
- we have to do this in SQL because it wouldn’t be efficient to query
  thousands of letters and then do the timezone calculations on them in
  Python
2020-05-11 10:51:33 +01:00
Ali Zaidi
642ab1ad1e Add support for CSV files 2020-05-06 14:11:50 +01:00
Chris Hill-Scott
80fc5e6600 Paginate search results for notifications
The standard way that we indicate that there are more results than can
be returned is by paginating. So even though we don’t intend to paginate
the search results in the admin app, it can still use the presence or
absence of a ‘next’ link to determine whether or not to show a message
about only showing the first 50 results.
2020-05-06 12:13:00 +01:00
Chris Hill-Scott
625aad97c9 Limit search by recipient to 50 results
Things could get ugly if you use a short search string on a service with
lots of notifications…
2020-05-06 11:44:35 +01:00