Commit Graph

1201 Commits

Author SHA1 Message Date
David McDonald
12f460adc5 Turn off statsd wrapper for synchronous statsd calls during POSTs
This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint

This has been done so that creating a text message or email no longer
calls out to StatsD during the request process. These are the three
metrics currently recorded while processing one of those requests, and
so they have been removed from the API.

The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have recently seen two
incidents affected by DNS resolution for StatsD (either a slowdown in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.

We are not going to miss these metrics:
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be retrieved
  from our logs if we want it
- the response times and response codes per flask endpoint are already
  exposed using the gds metrics python library

I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
  removing these calls should already account for most of our StatsD
  traffic, which should greatly reduce the chance of there being any
  further issues with DNS resolution
- Some of the other metrics still provide value, so there is no point in
  deleting them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
  users, as the tasks are async. We also do not currently have the
  infrastructure set up to replace them with a Prometheus HTTP endpoint
  that can be scraped, so removing them would require more work
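For illustration, a minimal sketch of the pattern being removed, assuming a hypothetical `statsd_client` wrapper (the wiring here is not the repo's actual code):

```python
# Hypothetical sketch: a synchronous StatsD timing call on the POST path.
# Sending the metric means resolving the StatsD hostname, which is the
# step that caused trouble in the incidents mentioned above.
import time


class StatsdClient:
    def timing(self, metric_name, elapsed_ms):
        # resolve host and send a UDP packet; a slow or failing DNS
        # lookup here stalls the whole request
        pass


statsd_client = StatsdClient()


def dao_create_notification(notification):
    start = time.monotonic()
    # ... write the notification to the database ...
    # Removed by this commit, so the POST no longer touches StatsD:
    statsd_client.timing("dao_create_notification", (time.monotonic() - start) * 1000)
```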
2020-06-29 12:40:22 +01:00
Rebecca Law
ce32e577b7 Remove the use of schedule_for in post_notifications.
Years ago we started to implement a way to schedule a notification. We hit a problem but we never came up with a good solution and the feature never made it back to the top of the priority list.

This PR removes the code for scheduled_for. There will be another PR to drop the scheduled_notifications table and remove the schedule_notifications service permission

Unfortunately, I don't think we can remove the `scheduled_for` attribute from the notification.serialized method, because our clients might fail if something is missing. For now I have left it in but defaulted the value to None.
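A minimal sketch of that compatibility shim (field names assumed):

```python
# Hedged sketch: keep the key in the serialised payload so clients that
# expect it don't break, but always emit None now that scheduling is gone.
def serialize(notification):
    return {
        "id": str(notification.id),
        # scheduling has been removed, but the key stays for compatibility
        "scheduled_for": None,
    }
```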
2020-06-24 14:54:40 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Katie Smith
64cd8f39c2 Add the date to the service name and email_reply_to when archiving
This copies what we do to a user's email address when archiving the user
by prefixing it with `_archived_{date}`. We already prefixed the
service name and email_reply_to with `_archived`, but this didn't allow
a service with the same name to be archived more than once.
2020-05-22 09:37:45 +01:00
Katie Smith
13f7fecd5b Move function to get archived email address value
This function will be used when archiving services too, so it has been
renamed and moved to `app/utils.py`.
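A sketch of what such a helper might look like (the real one in `app/utils.py` may differ in detail):

```python
from datetime import datetime, timezone


def get_archived_db_column_value(column_value):
    """Prefix a value with '_archived_' and today's date, so the same
    name can be archived more than once."""
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    return f"_archived_{today}_{column_value}"


# e.g. 'Super Service' -> '_archived_2020-05-22_Super Service'
```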
2020-05-22 09:36:07 +01:00
Chris Hill-Scott
95a779c649 Merge pull request #2841 from alphagov/jobs-by-contact-list
Allow jobs to be grouped by contact list
2020-05-20 11:49:25 +01:00
Chris Hill-Scott
c7f914122a Merge pull request #2839 from alphagov/group-letter-uploads
Group letter uploads by printing day
2020-05-19 11:05:14 +01:00
David McDonald
2f0b3a9636 Fix edge case in func test data purging for created_by_id
When running the purge command I found about 4 users who could not be
deleted because their user id was still referenced in the services
table: they had created the service but were no longer a member of it.

I have fixed this so that if a user is not a member of a service but
created it, we also delete the service for them.

Note, I've followed the previous convention of no tests for this
function. I've run it locally and executed the code path so there should
be no major flaws in the code. There is a small chance I wasn't able to
exactly replicate the state that existed in preview on my local but
hopefully it was close enough to be accurate.
2020-05-18 10:30:28 +01:00
Chris Hill-Scott
c61f7e70c2 Add comments to explain time intervals 2020-05-13 08:53:40 +01:00
Chris Hill-Scott
18ffccf8c9 Allow jobs to be filtered by contact list
Rather than showing all jobs that have been ‘copied’ from a contact list
I think it makes more sense to group them under the contact list. This
way it’s easier to see what messages have been sent to a given group of
contacts over time.

Part of this work means the API needs to return only jobs that have been
created from a given contact list, when asked.
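A sketch of the DAO query this implies, assuming the `Job` model gained a `contact_list_id` column (attribute names are assumptions):

```python
from app.models import Job  # repo model; attribute names assumed


def dao_get_jobs_by_contact_list(service_id, contact_list_id):
    return (
        Job.query
        .filter(
            Job.service_id == service_id,
            Job.contact_list_id == contact_list_id,
        )
        .order_by(Job.created_at.desc())
        .all()
    )
```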
2020-05-12 12:58:39 +01:00
Chris Hill-Scott
27a0ba1a65 Reformat arguments for readability
We want to add another argument here, and doing so would make the line
length too long with all the arguments on one line.

Also uses the * operator to enforce keyword-only arguments.
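For example (names here are illustrative, not the repo's):

```python
# Everything after the bare `*` must be passed by keyword, which keeps
# long call sites unambiguous when arguments sit one per line.
def create_job(
    *,
    service_id,
    template_id,
    contact_list_id=None,
):
    return {"service": service_id, "template": template_id, "contact_list": contact_list_id}


create_job(service_id="s-1", template_id="t-1")  # OK
# create_job("s-1", "t-1")  # TypeError: takes 0 positional arguments
```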
2020-05-12 12:57:54 +01:00
Chris Hill-Scott
864c6772b3 Add an endpoint to get uploaded letters for a day
Because we won’t be showing uploaded letters individually on the uploads
page any more we need a way of listing them. This should be by printing
day, to match how we’re grouping them on the uploads page.

The response matches a normal `get_notifications` response so we can
reuse the same code in the admin app.
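A minimal sketch of such an endpoint (the route shape and DAO call are assumptions; the point is that the payload mirrors `get_notifications`):

```python
from flask import Blueprint, jsonify

uploads_blueprint = Blueprint("uploads", __name__)


def dao_get_uploaded_letters(service_id, print_day):
    return []  # stub standing in for the real DAO query


@uploads_blueprint.route("/service/<service_id>/uploaded-letters/<print_day>")
def get_uploaded_letters_for_print_day(service_id, print_day):
    letters = dao_get_uploaded_letters(service_id, print_day)
    # same shape as a get_notifications response, so the admin app can
    # reuse its existing rendering code
    return jsonify(notifications=letters, links={})
```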
2020-05-11 10:51:40 +01:00
Chris Hill-Scott
421c1aac96 Group uploaded letters by day of printing
Some teams have started uploading quite a lot of letters (in the
hundreds per week). They’re also uploading CSVs of emails. This means
the uploads page ends up quite jumbled.

This is because:
- there’s just a lot of items to scan through
- conceptually it’s a bit odd to have batches of things displayed
  alongside individual things on the same page

So instead this commit starts grouping together uploaded letters. It
does this by the date on which we ‘start’ printing them, or in other
words the time at which they can no longer be cancelled.

This feels like a natural grouping, and it matches what we know about
people’s mental models of ‘batches’ and ‘runs’ when talking about
printing.

The code for this is a bit gnarly because:
- timezones
- the print cutoff doesn’t align with the end of a day
- we have to do this in SQL because it wouldn’t be efficient to query
  thousands of letters and then do the timezone calculations on them in
  Python
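A sketch of the timezone logic in Python, assuming a 17:30 Europe/London cutoff (the real implementation does this in SQL, as the last bullet explains):

```python
from datetime import datetime, time, timedelta

import pytz

LONDON = pytz.timezone("Europe/London")
PRINT_CUTOFF = time(17, 30)  # assumed cutoff, for illustration only


def printing_day(created_at_utc):
    """Map a UTC creation time to the day we 'start' printing the letter:
    anything after the cutoff belongs to the next day's print run."""
    local = pytz.utc.localize(created_at_utc).astimezone(LONDON)
    if local.time() >= PRINT_CUTOFF:
        local += timedelta(days=1)
    return local.date()


print(printing_day(datetime(2020, 5, 10, 16, 45)))  # 17:45 BST -> 2020-05-11
```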
2020-05-11 10:51:33 +01:00
Chris Hill-Scott
80fc5e6600 Paginate search results for notifications
The standard way that we indicate that there are more results than can
be returned is by paginating. So even though we don’t intend to paginate
the search results in the admin app, it can still use the presence or
absence of a ‘next’ link to determine whether or not to show a message
about only showing the first 50 results.
2020-05-06 12:13:00 +01:00
Chris Hill-Scott
625aad97c9 Limit search by recipient to 50 results
Things could get ugly if you use a short search string on a service with
lots of notifications…
2020-05-06 11:44:35 +01:00
Chris Hill-Scott
9f6bfb1b4e Merge pull request #2824 from alphagov/stop-searching-to-field
Stop searching the to field
2020-05-05 16:20:17 +01:00
Katie Smith
5b8b0dfc6e Include pending notifications in monthly data by service report
The `/service/monthly-data-by-service` endpoint, which is used for the
'Monthly notification statuses for live services' Platform Admin report,
did not include `pending` notifications. This updates the DAO function
that the endpoint calls to group `pending` and `sending` notifications together.
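A minimal illustration of that grouping as SQL (table and column names are assumptions, not the repo's exact query):

```python
# Group 'pending' in with 'sending' when counting monthly statuses.
MONTHLY_STATUS_SQL = """
SELECT
    service_id,
    date_trunc('month', created_at) AS month,
    CASE WHEN notification_status = 'pending'
         THEN 'sending'
         ELSE notification_status
    END AS status,
    count(*) AS count
FROM notification_history
GROUP BY 1, 2, 3
"""
```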
2020-04-30 13:53:01 +01:00
Chris Hill-Scott
1e20cb6be9 Stop searching the to field
We were doing this temporarily while the `normalised_to` field was not
populated for letters. Once we have a week of data in the
`normalised_to` field we can stop looking in the `to` field. This should
improve performance because:
- it’s one `WHERE` clause not two
- we had to do `ilike` on the `to` field, because we don't lowercase its
  contents – even if the two where clauses are somehow parallelised,
  `ilike` is slower than a `like` comparison, which is case-sensitive

Depends on:
- [ ] Tuesday 5 May (7 full days after deploying https://github.com/alphagov/notifications-api/pull/2814)
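A sketch of the before and after, with a stub model standing in for the repo's `Notification` (only the two columns that matter here):

```python
from sqlalchemy import Column, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Notification(Base):
    __tablename__ = "notifications"
    id = Column(String, primary_key=True)
    to = Column(String)             # raw recipient, mixed case
    normalised_to = Column(String)  # lowercased at write time


# Old: two WHERE clauses, one of them a case-insensitive ILIKE scan.
old_filter = (
    Notification.normalised_to.like("%07700900123%")
    | Notification.to.ilike("%07700900123%")
)

# New: one case-sensitive LIKE against the pre-lowercased column.
new_filter = Notification.normalised_to.like("%07700900123%")
```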
2020-04-30 10:51:30 +01:00
Chris Hill-Scott
26793899d4 Merge pull request #2814 from alphagov/search-letters
Let users search for letters
2020-04-27 15:06:32 +01:00
Chris Hill-Scott
64674be817 Make allowed notification types explicit in search
Not having a catch-all else makes it clearer what we're expecting. We
also think it's worth adding a comment explaining why we normalise as we
do for letters and the `None` case.
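A sketch of the explicit branching described (the per-type rules here are illustrative, not the repo's exact normalisation):

```python
def normalise_search_term(search_term, notification_type):
    if notification_type == "email":
        return search_term.lower()
    elif notification_type == "sms":
        return "".join(c for c in search_term if c not in " ()-+")
    elif notification_type in ("letter", None):
        # letters keep spaces in the normalised address; None means we
        # don't know the type, so normalise as little as possible
        return " ".join(search_term.lower().split())
    # deliberately no catch-all else: an unexpected type fails loudly
    raise ValueError(f"Unexpected notification type {notification_type}")
```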
2020-04-23 16:06:34 +01:00
Chris Hill-Scott
7897ae70ce Let users search for letters
Like we have search by email address or phone number, finding an
individual letter is a common task. At the moment users are having to
click through pages and pages of letters to find the one they’re looking
for.

We have to search in the `to` and `normalised_to` fields for now because
we’re not populating the `normalised_to` column for letters at the
moment.
2020-04-21 16:25:37 +01:00
Pea Tyczynska
9782cbc5b5 Check failure code on failure only.
We should check the error code on failure only, because if we get an
error code alongside a pending status, we should still set the
notification to pending.
2020-04-21 13:43:45 +01:00
Pea Tyczynska
385d249787 Take into account the Firetext 000 code, which means no error. Also change logs from error to warning 2020-04-21 12:06:28 +01:00
Pea Tyczynska
b962dcac5d Check Firetext code first to determine status and log the reason
If there is no code, fall back to the old way of determining status.
If the code is not recognised, log an Error so we are notified
and can look into it.
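Taken together, the three commits above suggest logic along these lines (all mappings are placeholders, not Firetext's real code lists):

```python
import logging

logger = logging.getLogger(__name__)

STATUS_MAP = {"0": "delivered", "1": "failure", "2": "pending"}  # assumed
FAILURE_CODE_MAP = {"101": "permanent-failure", "102": "temporary-failure"}  # assumed


def map_firetext_callback(status, code):
    base = STATUS_MAP[status]
    if base != "failure":
        # check the code on failure only: a pending message that arrives
        # with an error code must still be recorded as pending
        return base
    if code in (None, "000"):  # 000 means 'no error'
        return "temporary-failure"
    try:
        return FAILURE_CODE_MAP[code]
    except KeyError:
        # downgraded from error to warning per the later commit above
        logger.warning("Unrecognised Firetext code %s", code)
        return "temporary-failure"
```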
2020-04-20 18:33:19 +01:00
Pea Tyczynska
470d00c26e Make comment clearer following review 2020-04-17 10:58:25 +01:00
Pea Tyczynska
a4507dbb3f Use firetext response code to see if temporary or permanent failure if available 2020-04-17 10:58:25 +01:00
Katie Smith
bfd40a843c Delete send-scheduled-notifications task
This was added a few years ago, but never used.
2020-04-02 14:45:18 +01:00
Rebecca Law
ac8fc49539 Merge pull request #2770 from alphagov/increase-qry-limit
Increase default query limit to 50,000 from 10,000
2020-03-31 17:15:12 +01:00
Rebecca Law
a7a158fe69 Merge pull request #2784 from alphagov/add-team-key-to-query
Add team key to query
2020-03-31 17:13:42 +01:00
David McDonald
2a528f03b8 Merge pull request #2781 from alphagov/fix-slow-org-user-query
fix the sql
2020-03-31 09:32:10 +01:00
Rebecca Law
1444150d48 Add team key to the notifications to delete/insert into notifications/notification_history 2020-03-31 09:23:41 +01:00
Leo Hemsted
53b80b931f fix the sql
"if you want something done right, then do it yourself"

The ORM was building a weird and inefficient query to get all the users
for a given organisation_id:

old query:

```sql
SELECT users.*
FROM users
WHERE (
    EXISTS (
        SELECT 1
        FROM user_to_organisation, organisation
        WHERE users.id = user_to_organisation.user_id AND
        organisation.id = user_to_organisation.organisation_id AND
        organisation.id = '63b9557c-22ea-42ac-bcba-edaa50e3ae51'
    )
) AND
users.state = 'active'
ORDER BY users.created_at
```

Note the cross join on users_to_org and org; there are a lot of users in
organisations on preview, as one is added on every functional test run.
Even though they're all filtered out and the query returns the correct
data, it still does the nested loop under the hood.

new query:

```sql
SELECT users.*
FROM users
JOIN user_to_organisation AS user_to_organisation_1 ON users.id = user_to_organisation_1.user_id
JOIN organisation ON organisation.id = user_to_organisation_1.organisation_id
WHERE organisation.id = '63b9557c-22ea-42ac-bcba-edaa50e3ae51' AND users.state = 'active' ORDER BY users.created_at;
```

Much more straightforward.
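The fix amounts to spelling the joins out instead of relying on the relationship's EXISTS; roughly (model and attribute names assumed):

```python
from app.models import Organisation, User, user_to_organisation  # assumed


def get_organisation_users(organisation_id):
    return (
        User.query
        .join(user_to_organisation, User.id == user_to_organisation.c.user_id)
        .join(Organisation, Organisation.id == user_to_organisation.c.organisation_id)
        .filter(
            Organisation.id == organisation_id,
            User.state == "active",
        )
        .order_by(User.created_at)
        .all()
    )
```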
2020-03-30 17:47:15 +01:00
Leo Hemsted
885f3122bd fix purge functional test data task
* it doesn't delete service email reply-tos, letter contacts, or
  contact lists. I don't think the contact lists will ever be an issue,
  but it doesn't hurt to add them to the list of things to remove
* it doesn't remove users from organisations before deleting the users

there may be more tables that link to Service that should be deleted,
but for now I've just added the ones I could spot.
2020-03-30 17:45:29 +01:00
Rebecca Law
c81388d004 Merge pull request #2774 from alphagov/fix-report-and-delete-tasks
Fix report and delete tasks
2020-03-27 16:30:16 +00:00
Rebecca Law
5d0830d7f7 Rename function for clarity 2020-03-27 15:48:54 +00:00
Rebecca Law
8da73510c2 Delete all notification_status types.
This will ensure the whole day has been deleted. The stats table could get the wrong updates if there is partial data for a day.
2020-03-27 12:14:30 +00:00
Chris Hill-Scott
5fe0fafadf Archive, don’t delete contact lists
So that we keep a record of who first uploaded a list, it's better to
archive a list than to delete it completely.

The list in the database doesn’t contain any recipient info so this
isn’t a change to what data we’re retaining.

This means updating the endpoints that get contact lists to exclude ones
that are archived.
2020-03-27 09:51:54 +00:00
Chris Hill-Scott
b50dbab8fd Add an endpoint to delete a contact list
This was one of the things we de-scoped when we first shipped this feature.

In order to safely delete a list, we first need to make sure any jobs
aren’t referencing it.
2020-03-26 17:42:38 +00:00
Rebecca Law
e5b6fa2143 Reduce log messages. 2020-03-25 08:08:33 +00:00
Rebecca Law
31f83bb9c4 Increase default query limit to 50,000 from 10,000 2020-03-24 17:04:38 +00:00
Rebecca Law
69102db4cb Remove the hourly loop and replace it with a while loop, which checks whether we still have things to delete for the day.
We are already limiting the query by a query limit.
2020-03-24 14:54:06 +00:00
Rebecca Law
3ca73a35ed Change as per code review
- fix test name
- make the query filters consistent with each other
- add a comment for clarity
- add an inner loop to continue to insert and delete notifications while the delete count is greater than 0
2020-03-24 14:17:47 +00:00
Rebecca Law
f04210ba7d Refactor delete_notifications_older_than_retention_by_type to use a new strategy.
The insert_notification_history_delete_notifications function uses a temp table to store the data to insert and delete. This saves extra queries while performing the insert and delete operations.
The function is written in such a way that if the task is stopped while processing, it will pick up where it left off when it's started up again.

I've made a decision to delete all test data in one query; I don't anticipate a problem with that.

The performance of this might also be better than last night's test, because we are inserting everything we need for the NotificationHistory insert, so we don't need the join to Notifications to perform the insert.
2020-03-24 12:21:28 +00:00
Rebecca Law
80845e1abb Transaction to insert history and delete notifications.
We need to test the performance of this function; then we can call it from the task.
- Create a temporary table holding the ids of the desired rows, limited to 10K (might be too low).
- Insert into NotificationHistory select from notifications where id in temp table
- Delete from Notifications where id in temp table
- Drop the temp table

We should be able to iterate on this. The query stats for the query that creates the temp table are very good: 17ms.
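The strategy sketched as the SQL it implies (statement text and column lists are reconstructed from the description above, not copied from the repo):

```python
from sqlalchemy import text


def insert_history_and_delete_batch(engine, service_id, cutoff, limit=10_000):
    """One batch: stage ids in a temp table, copy them to history, delete
    them from notifications. Returns rows deleted, so callers can loop
    until it hits 0. Column lists here are illustrative."""
    with engine.begin() as conn:  # one transaction per batch
        conn.execute(
            text("""
                CREATE TEMP TABLE notifications_to_move ON COMMIT DROP AS
                SELECT id FROM notifications
                WHERE service_id = :service_id AND created_at < :cutoff
                ORDER BY created_at
                LIMIT :limit
            """),
            {"service_id": service_id, "cutoff": cutoff, "limit": limit},
        )
        conn.execute(text("""
            INSERT INTO notification_history (id, service_id, created_at, notification_status)
            SELECT id, service_id, created_at, notification_status
            FROM notifications
            WHERE id IN (SELECT id FROM notifications_to_move)
            ON CONFLICT (id) DO NOTHING
        """))
        deleted = conn.execute(text("""
            DELETE FROM notifications
            WHERE id IN (SELECT id FROM notifications_to_move)
        """))
        return deleted.rowcount


# Safe to interrupt and re-run: each pass re-selects whatever still needs moving.
# while insert_history_and_delete_batch(engine, sid, cutoff) > 0:
#     pass
```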
2020-03-23 16:33:06 +00:00
Leo Hemsted
8ea09be1ff order the insert/update by created_at
you should always order by when doing a limit/offset, to guarantee that
the second time you run that query, the order hasn't changed and you
aren't just repeating the task with an overlap of notifications.
Luckily, in this case we haven't lost any data, because:

* we have an on conflict do update statement so when we returned
  duplicate rows it would just do an update
* when we delete, we cross-reference with the notification history, so if
  a row ever got missed, we won't delete it.

This resulted in, for example, govuk email still having a handful of
notifications in the table from the 9th despite the task running
successfully every day until the 18th of March.

order by created_at ascending so that we start with oldest notifications
first, in case it's interrupted part way through.
2020-03-20 19:11:24 +00:00
Leo Hemsted
dc5b56ff78 Change sql to chunk by hour to remove old notifications
insert/update, and then delete notifications in hourly batches. This
means that if the task gets interrupted part-way through, we'll have at
least something to show for it. Previously we would insert and update
into the history table but might not delete from the notification table
properly.

Keeping the offsets and limits for confidence around reliability and
queries timing out.

Keeping the join to notification_history to ensure we don't delete
anything prematurely while our DB is in a bit of a weird state with lots
of these tasks failing over the last week.
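A sketch of the hourly chunking (the real work happens in SQL; the window arithmetic is the point here):

```python
from datetime import datetime, timedelta


def hourly_windows(day_start):
    """Yield 24 (start, end) pairs covering one day, oldest first, so an
    interrupted run still leaves each completed hour fully moved."""
    for hour in range(24):
        start = day_start + timedelta(hours=hour)
        yield start, start + timedelta(hours=1)


for window_start, window_end in hourly_windows(datetime(2020, 3, 1)):
    # insert/update into notification_history for this window, then delete
    # from notifications for the same window: one self-contained unit of work
    pass
```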
2020-03-20 19:07:08 +00:00
Rebecca Law
3bf18d0ac3 Endpoint to return a ServiceContactList for a given id. 2020-03-13 17:21:59 +00:00
Rebecca Law
654e6fc657 New table and endpoints for service contact lists.
- Table to store metadata for the emergency contact list for a service.
- Endpoint for fetching contact lists for service
- Endpoint for saving contact list for service.

The list will be stored in S3. The service will then be able to send emergency announcements to staff.
2020-03-13 12:11:16 +00:00
Pea Tyczynska
8b60e69157 Finding only jobs and letters within data retention now works and is tested 2020-03-10 15:53:56 +00:00
Chris Hill-Scott
851435701f Merge pull request #2737 from alphagov/returned-letter-statistics
Add an endpoint to return returned letter stats
2020-03-05 16:11:09 +00:00