Commit Graph

367 Commits

Pea Tyczynska
e0c73ac342 Send daily email with letter and sheet volumes to DVLA 2021-02-23 15:13:19 +00:00
Pea Tyczynska
c8ffebcce8 Query to get letter and sheet volumes
So we can send a daily email with these volumes to DVLA.
2021-02-23 15:13:18 +00:00
David McDonald
e35ea57ba2 Do not delete letters if not in final state
A few weeks ago, we deleted some pdf letters that had reached their
retention period. However, these letters were in the 'created' state, so
arguably we should not have deleted them: we were expecting to resend
them and, once they were deleted, were unable to. Part of the reason for
this is that we marked the letters back to `created` as the status but
we did not nullify the `sent_at` timestamp, meaning the check on
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch it. Regardless of that check, which controls whether the
files were removed from S3, they were also archived into the
`notification_history` table, as they are by default.

This commit changes our code such that letters that are not in
their final state do not go through our retention process. This could
mean they violate their retention policy but that is likely the lesser
of two evils (the other being we delete them and are unable to resend
them).

Note, `sending` letters have been included in those not to be removed
because there is a risk that we give the letter to DVLA and put it in
`sending` but then they come back to us later telling us they've had
problems and require us to resend.
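
A minimal sketch of the shape of the new filter (the status list here is
an assumption, not the real constant):

    # Hypothetical sketch: only letters in a final state are picked up
    # by the retention query; 'created' and 'sending' are excluded.
    FINAL_LETTER_STATES = {
        'delivered', 'permanent-failure', 'temporary-failure',
        'returned-letter', 'cancelled',
    }

    letters_to_delete = query.filter(
        Notification.status.in_(FINAL_LETTER_STATES)
    )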
2020-12-16 10:50:11 +00:00
David McDonald
47e146f010 Move variable out of loop it didn't need to be in 2020-12-16 10:40:30 +00:00
David McDonald
1bf9b29905 Document behaviour of s3 letter deleting
The behaviour was a bit opaque so I have added tests around it so it's
clear what it is doing and why. No functionality has changed.
2020-12-16 10:39:31 +00:00
Leo Hemsted
3bc3ed88b3 use yield_per instead of limit
limit means we only return 50k letters; if there are more than that for
a service, we'll skip them and they won't be picked up until the next
day.

If you remove the limit, sqlalchemy prefetches query results so it can
build up ORM results, for example collapsing joined rows into single
objects with children. SQLAlchemy streams the data into a buffer, and
normally will still prefetch the entire result set so it can ensure
integrity of the session (so that if you modify one result that is
duplicated further down in the results, both rows are updated in the
session, for example). However, we don't care about that, but we do care
about preventing the result set taking up too much memory. We can use
`yield_per` to yield from sqlalchemy to the iterator (in this case the
`for letter in letters_awaiting_sending` loop in letters_pdf_tasks.py) -
this means every time we hit 10000 rows, we go back to the database to
get the next 10k. This way, we only ever need 10k rows in memory at a
time.
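
A minimal sketch of the pattern, assuming a SQLAlchemy 1.3 `Query`;
the model and filter names are illustrative rather than the real query:

    # Fetch rows in chunks of 10k: SQLAlchemy goes back to the database
    # for the next chunk each time the iterator exhausts the current one.
    letters_awaiting_sending = (
        db.session.query(Notification)
        .filter(
            Notification.notification_type == 'letter',
            Notification.status == 'created',
        )
        .yield_per(10000)
    )

    for letter in letters_awaiting_sending:
        process(letter)  # only ~10k ORM objects in memory at a time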

This has some caveats, mostly around how we handle the data the query
returns. They're a bit hard to parse but I'm pretty sure the notable
limitations are:

* It's dangerous to modify ORM objects returned by yield_per queries
* It's dangerous to join in a yield_per query if you think there will be
  more than one row per item (for example, if you join from notification
  to service, there'll be multiple result rows containing the same
  service, and if these are split over different yield chunks, then we
  may experience undefined behaviour).

These two limitations are focused around there being no guarantee of
having one unique row per item.

For more reading:
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
https://www.mail-archive.com/sqlalchemy@googlegroups.com/msg12443.html
2020-10-26 13:01:34 +00:00
Leo Hemsted
ed182c2a22 return just the columns we need for collating letters
previously we were returning the entire ORM object. Returning columns
has a couple of benefits:

* Means we can join on to services there and then, avoiding second
  queries to get the crown status of the service later in the collate
  flow.
* Massively reduces the amount of data we return - particularly free
  text fields like personalisation that could be potentially quite big.
  5 columns rather than 26 columns.
* Minor thing, but will skip some CPU cycles as sqlalchemy will no
  longer construct an ORM object and try and keep track of changes. We
  know this function doesn't change any of the values to persist them
  back, so this is an unnecessary step from sqlalchemy.

Disadvantages are:

* The dao_get_letters_to_be_printed return interface is now much more
  tightly coupled to the get_key_and_size_of_letters_to_be_sent_to_print
  function that calls it.
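
A rough sketch of the after state (the exact columns are assumptions):

    # Plain column tuples plus the service join, instead of full ORM
    # objects - no personalisation blob, no session change-tracking.
    letters = db.session.query(
        Notification.id,
        Notification.created_at,
        Notification.postage,
        Notification.billable_units,
        Service.crown,
    ).join(Service, Notification.service_id == Service.id)

    for letter_id, created_at, postage, sheets, crown in letters:
        collate(letter_id, created_at, postage, sheets, crown)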
2020-10-23 20:01:18 +01:00
Leo Hemsted
f8fbaba77d limit collate task to 50,000 messages
we've seen issues where tasks mysteriously hang and do not process
large volumes of letters - in this case >150k letters in created state.

to try and get at least some letters out of the door, limit the query to
only return 50k letters per postage type. We may need to run the task
multiple times, or letters may get delayed until the next day when
they'd be picked up (provided there's enough capacity then). The task
should only be re-run AFTER the ftp tasks have all finished, and updated
the letters to sending, or we run the risk of sending the same letters
twice.

For context, the largest ever letter day we've sent is ~65k in March of
this year.
2020-10-23 15:28:34 +01:00
Leo Hemsted
4b61060d32 stream notifications when collating zip files
we had issues where we had 150k 2nd class notifications, and the collate
task never ran properly, presumably because the volume of data being
returned was too big.

to try and help with this, we can switch to streaming rather than using
`.all` and building up lists of data. This should help, though the
initial query may be a problem still.
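
A sketch of the difference (names illustrative):

    # Before: .all() materialises every row as an ORM object in one list
    notifications = query.all()

    # After: iterating the query builds objects one at a time as rows
    # are consumed, so we never hold the whole list in memory at once
    # (the raw result set may still be buffered - the later yield_per
    # change deals with that)
    for notification in query:
        add_to_zip(notification)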
2020-10-23 12:20:26 +01:00
Pea Tyczynska
9ac65ee95c Start sending letters from insolvency service again 2020-10-21 16:56:18 +01:00
Pea Tyczynska
30bd311eb1 Temporarily do not send letters from Insolvency Service
to DVLA. This is a temporary measure over the weekend so that
DVLA can catch up with all other letters.

We should revert this on Monday 19.10.2020
2020-10-16 16:13:32 +01:00
Rebecca Law
b2ff4277c9 Adding service_id to the sort order for the letters being sent to print.
We have had a few instances where letters have caused problems. Particularly for precompiled letters, the issue often comes from the same service.
The hope is that by adding a sort order this will help the print provider narrow down the problem.

There is a small degradation of the performance of the query, but it's not enough to concern me.
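
A sketch of the ordering change (the other sort column is an assumption):

    # Group letters from the same service together in the print run so
    # a problem service's letters are easy to isolate.
    query = query.order_by(Notification.service_id, Notification.created_at)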
2020-10-15 09:39:07 +01:00
Leo Hemsted
1e928a926a rename sending_date to created_at
we don't name letters based on the day we send them on, rather, the day
we create them on. If we process a letter for a second time for whatever
reason, even if it's a couple of days later, it'll still go in a folder
based on the created_at timestamp. There's still a slight confusion,
however - if the timestamp is after 5:30pm, the folder will be for the
day after. But that's still relative to the day of creation, so I think
created_at still makes the most sense.

Remove the term `sending_date` to try and make this relationship more
apparent.
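
A sketch of the folder rule described above (the signature and the lack
of timezone handling are illustrative, not the real helper):

    from datetime import time, timedelta

    def folder_date(created_at):
        # letters created after the 5:30pm cutoff go in the next day's
        # folder, but the folder is still derived from created_at
        if created_at.time() >= time(17, 30):
            created_at += timedelta(days=1)
        return created_at.strftime('%Y-%m-%d')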
2020-09-21 14:40:22 +01:00
Leo Hemsted
bb33927b3d rename letter get_folder_name args
`_now`? why would we ever use a different _now? instead say created_at,
because that's what it'll always be set to, even if we're replaying old
letters. We always set the folder name based on when the letter was
created, or we might not know where to look to find it.

`dont_use_sending_date` doesn't really tell us what might happen if we
don't use it - the answer is we return an empty string and ignore the
folder entirely - so let's call it that.

Also, remove use of freezegun in the tests, to prove that we don't use
the current time in any calculations. Also add an assert to a mock in
the get_pdf_for_templated_letter test, because we were mocking but not
asserting before, so the tests didn't fail when the function signature
changed.
2020-09-21 14:32:57 +01:00
Chris Hill-Scott
0e1f6f31e4 Use constant for notification type
Co-authored-by: Katie Smith <klssmith@users.noreply.github.com>
2020-09-09 11:12:06 +01:00
Chris Hill-Scott
cfda289746 Allow international letters to be cancelled
Our code was assuming that any notifications with `international` set to
`True` were text messages. It was then trying to look up delivery
information for a notification which wasn’t sent to a phone number,
causing an exception.
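
A sketch of the kind of guard the fix implies (the constant and helper
names here are assumptions):

    # Only look up phone delivery information for SMS notifications;
    # an international *letter* must not take this path.
    if notification.notification_type == SMS_TYPE and notification.international:
        delivery_info = get_sms_delivery_info(notification)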
2020-09-09 10:55:55 +01:00
Pea M. Tyczynska
fbdfa6416f Merge pull request #2921 from alphagov/remove-statsd-http-api-decorators
Remove statsd http api decorators and turn statsd back on for celery apps
2020-07-14 10:16:44 +01:00
Pea M. Tyczynska
9186083ea7 Merge pull request #2796 from alphagov/split-letters-into-zips-based-on-postage
Split letters into zips based on postage
2020-07-08 11:49:21 +01:00
Pea Tyczynska
9c4205c7c6 Remove statsd decorators from dao functions
This is done so that we do not use statsd on our http endpoints.
We decided we do not need the metrics that this gave us. If we
change our minds, we will add Prometheus-friendly decorators
instead in the future.
2020-07-07 18:02:24 +01:00
Pea Tyczynska
61de908c5d Simplify putting letters in right postage folders 2020-07-02 16:26:44 +01:00
David McDonald
12f460adc5 Turn off statsd wrapper for synchronous statsd calls during POSTs
This commit turns off StatsD metrics for the following
- the `dao_create_notification` function
- the `user-agent` metric
- the response times and response codes per flask endpoint

This has been done with the purpose of not having the creation of text
messages or emails call out to StatsD during the request process. These
are the three metrics currently called during the processing of one of
those requests and so they have been removed from the API.

The reason for removing the calls out to StatsD when processing a
request to send a notification is that we have seen two incidents
recently affected by DNS resolution for StatsD (either by a slow down in
resolution time or a failure to resolve). These POST requests are our
most critical code path and we don't want them to be affected by any
potential unforeseen trouble with StatsD DNS resolution.

We are not going to miss the removal of these metrics.
- the `dao_create_notification` metric is rarely/never looked at
- the `user-agent` metric is rarely/never looked at and can be got from
  our logs if we want it
- the response times and response codes per flask endpoint are already
  exposed using the gds metrics python library

I did not remove the statsd metrics from any other parts of the API
because
- As the POST notification endpoints are the main source of web traffic,
  we should have already removed most calls to StatsD which should
  greatly reduce the chance of there being any further issues with
  DNS resolution
- Some of the other metrics still provide value so no point deleting
  them if we don't need to
- The metrics on celery tasks will not affect any HTTP requests from
  users as they are async and also we do not currently have the
  infrastructure set up to replace them with a prometheus HTTP endpoint that
  can be scraped so this would require more work
2020-06-29 12:40:22 +01:00
Rebecca Law
ce32e577b7 Remove the use of scheduled_for in post_notifications.
Years ago we started to implement a way to schedule a notification. We hit a problem but we never came up with a good solution and the feature never made it back to the top of the priority list.

This PR removes the code for scheduled_for. There will be another PR to drop the scheduled_notifications table and remove the schedule_notifications service permission

Unfortunately, I don't think we can remove the `scheduled_for` attribute from the notification.serialized method because our clients might fail if something is missing. For now I have left it in but defaulted the value to None.
2020-06-24 14:54:40 +01:00
Pea Tyczynska
c96142ba5e Change function and variable names for readability and consistency 2020-06-01 12:44:49 +01:00
Chris Hill-Scott
80fc5e6600 Paginate search results for notifications
The standard way that we indicate that there are more results than can
be returned is by paginating. So even though we don’t intend to paginate
the search results in the admin app, it can still use the presence or
absence of a ‘next’ link to determine whether or not to show a message
about only showing the first 50 results.
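
A sketch of the idea, assuming a Flask-SQLAlchemy pagination object
(field and helper names are assumptions):

    # The admin app only checks for the presence of a 'next' link, so a
    # paginated response is enough to signal 'more than 50 results'.
    response = {
        'notifications': [n.serialize() for n in page.items],
        'links': {},
    }
    if page.has_next:
        response['links']['next'] = url_for('.search', page=page.page + 1)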
2020-05-06 12:13:00 +01:00
Chris Hill-Scott
625aad97c9 Limit search by recipient to 50 results
Things could get ugly if you use a short search string on a service with
lots of notifications…
2020-05-06 11:44:35 +01:00
Chris Hill-Scott
1e20cb6be9 Stop searching the to field
We were doing this temporarily while the `normalised_to` field was not
populated for letters. Once we have a week of data in the
`normalised_to` field we can stop looking in the `to` field. This should
improve performance because:
- it’s one `WHERE` clause not two
- we had to do `ilike` on the `to` field, because we don’t lowercase its
  contents – even if the two where clauses are somehow parallelized, it’s
  slower than a `like` comparison, which is case-sensitive
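
A sketch of the before and after (model and field names as used above):

    from sqlalchemy import or_

    # before: two where clauses, one of them a case-insensitive ilike
    # on the raw `to` field
    query.filter(or_(
        Notification.normalised_to.like(search_term),
        Notification.to.ilike(search_term),
    ))

    # after: a single case-sensitive like on the lowercased column
    query.filter(Notification.normalised_to.like(search_term))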

Depends on:
- [ ] Tuesday 5 May (7 full days after deploying https://github.com/alphagov/notifications-api/pull/2814)
2020-04-30 10:51:30 +01:00
Chris Hill-Scott
26793899d4 Merge pull request #2814 from alphagov/search-letters
Let users search for letters
2020-04-27 15:06:32 +01:00
Chris Hill-Scott
64674be817 Make allowed notification types explicit in search
Not having a catch-all else makes it clearer what we’re expecting. And
then we think it’s worth adding a comment explaining why
we normalise as we do for letters and the `None` case.
2020-04-23 16:06:34 +01:00
Chris Hill-Scott
7897ae70ce Let users search for letters
Like we have search by email address or phone number, finding an
individual letter is a common task. At the moment users are having to
click through pages and pages of letters to find the one they’re looking
for.

We have to search in the `to` and `normalised_to` fields for now because
we’re not populating the `normalised_to` column for letters at the
moment.
2020-04-21 16:25:37 +01:00
Pea Tyczynska
9782cbc5b5 Check failure code on failure only.
We should check the error code on failure only, because if we get an
error code with a pending status, we should still set the notification
to pending.
2020-04-21 13:43:45 +01:00
Pea Tyczynska
385d249787 Take into consideration Firetext 000 code - no error. Also change logs from error to warning 2020-04-21 12:06:28 +01:00
Pea Tyczynska
b962dcac5d Check Firetext code first to determine status and log the reason
If there is no code, fall back to the old way of determining status.
If the code is not recognised, log an error so we are notified
and can look into it.
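
A sketch of the decision flow across these Firetext commits (apart from
'000' meaning no error, the codes and the status_from_legacy helper are
assumptions):

    from flask import current_app

    FIRETEXT_CODE_TO_STATUS = {
        '000': 'delivered',          # 000 means no error
        '101': 'permanent-failure',  # hypothetical example code
        '102': 'temporary-failure',  # hypothetical example code
    }

    def get_notification_status(code, legacy_status):
        if code is None:
            # no code supplied: fall back to the old status mapping
            return status_from_legacy(legacy_status)
        if code not in FIRETEXT_CODE_TO_STATUS:
            # unrecognised code: log so we are notified, then fall back
            current_app.logger.error('Unknown Firetext code: %s', code)
            return status_from_legacy(legacy_status)
        return FIRETEXT_CODE_TO_STATUS[code]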
2020-04-20 18:33:19 +01:00
Pea Tyczynska
470d00c26e Make comment clearer following review 2020-04-17 10:58:25 +01:00
Pea Tyczynska
a4507dbb3f Use firetext response code, if available, to see if failure is temporary or permanent 2020-04-17 10:58:25 +01:00
Katie Smith
bfd40a843c Delete send-scheduled-notifications task
This was added a few years ago, but never used.
2020-04-02 14:45:18 +01:00
Rebecca Law
ac8fc49539 Merge pull request #2770 from alphagov/increase-qry-limit
Increase default query limit to 50,000 from 10,000
2020-03-31 17:15:12 +01:00
Rebecca Law
1444150d48 Add team key to the notifications to delete/insert into notifications/notification_history 2020-03-31 09:23:41 +01:00
Rebecca Law
5d0830d7f7 Rename function for clarity 2020-03-27 15:48:54 +00:00
Rebecca Law
8da73510c2 Delete all notification_status types.
This will ensure the whole day has been deleted. The stats table could get the wrong updates if there is partial data for a day.
2020-03-27 12:14:30 +00:00
Rebecca Law
e5b6fa2143 Reduce log messages. 2020-03-25 08:08:33 +00:00
Rebecca Law
31f83bb9c4 Increase default query limit to 50,000 from 10,000 2020-03-24 17:04:38 +00:00
Rebecca Law
69102db4cb Remove the hourly loop and replace it with a while loop, which checks if we still have things to delete for the day.
We are already limiting the query by a query limit.
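
A sketch of the loop shape (the arguments are illustrative; the helper
name comes from the refactor commit below):

    # Keep taking query-limit-sized bites out of the day's notifications
    # until a pass deletes nothing.
    deleted = 1
    while deleted > 0:
        deleted = insert_notification_history_delete_notifications(
            notification_type=notification_type,
            service_id=service_id,
            timestamp_to_delete_backwards_from=day_to_delete,
        )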
2020-03-24 14:54:06 +00:00
Rebecca Law
3ca73a35ed Change as per code review
- fix test name
- make query filters consistent with each other
- add comment for clarity
- add inner loop to continue to insert and delete notifications while the delete count is greater than 0
2020-03-24 14:17:47 +00:00
Rebecca Law
f04210ba7d Refactor delete_notifications_older_than_retention_by_type to use a new strategy.
The insert_notification_history_delete_notifications function uses a temp table to store the data to insert and delete. This will save extra queries while performing the insert and delete operations.
The function is written in such a way that if the task is stopped while processing, when it's started up again it will just pick up where it left off.

I've made a decision to delete all test data in one query; I don't anticipate a problem with that.

The performance of this might also be better than last night's test because we are inserting everything we need for the NotificationHistory insert, so we don't need the join to Notifications to perform the insert.
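
A sketch of the temp-table strategy in raw SQL run from SQLAlchemy
(column handling is simplified here; the real code would list the
history columns explicitly):

    from sqlalchemy import text

    # 1. stash the ids of the batch we want to move
    db.session.execute(text("""
        CREATE TEMP TABLE notifications_to_move AS
        SELECT id FROM notifications
        WHERE notification_type = :type AND created_at < :before
        ORDER BY created_at
        LIMIT :qry_limit
    """), {'type': 'letter', 'before': before, 'qry_limit': 50000})

    # 2. copy those rows into history; ON CONFLICT DO NOTHING makes a
    # re-run after an interruption pick up where it left off
    db.session.execute(text("""
        INSERT INTO notification_history
        SELECT * FROM notifications
        WHERE id IN (SELECT id FROM notifications_to_move)
        ON CONFLICT (id) DO NOTHING
    """))

    # 3. delete them from the live table, then drop the temp table
    db.session.execute(text(
        'DELETE FROM notifications'
        ' WHERE id IN (SELECT id FROM notifications_to_move)'
    ))
    db.session.execute(text('DROP TABLE notifications_to_move'))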
2020-03-24 12:21:28 +00:00
Rebecca Law
80845e1abb Transaction to insert history and delete notifications.
Need to test the performance of this function, then we can call it from the task.
- Create a temporary table to insert ids of the desired rows, limit by 10K (might be too low).
- Insert into NotifcationHistory select from notification where id in temp table
- Delete from Notifications where id in temp table
- drop temp table

We should be able to iterate of this. The query stats for the query to create the temp table are very good, 17ms.
2020-03-23 16:33:06 +00:00
Leo Hemsted
8ea09be1ff order the insert/update by created_at
you should always order by when doing a limit/offset, to guarantee that
the second time you run that query, the order hasn't changed and you
aren't just repeating the task with an overlap of notifications.
Luckily, in this case we haven't lost any data, because:

* we have an on conflict do update statement so when we returned
  duplicate rows it would just do an update
* when we delete, we cross-reference with the notification history so if
  a row got missed, we won't delete it.

This resulted in, for example, govuk email still having a handful of
notifications in the table from the 9th, despite the task running
successfully every day until the 18th of March.

order by created_at ascending so that we start with oldest notifications
first, in case it's interrupted part way through.
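
A sketch of the rule (names illustrative):

    # Always pair LIMIT/OFFSET with a deterministic ORDER BY, oldest
    # first, so interrupted or repeated runs see the same batches.
    batch = (
        db.session.query(Notification.id)
        .order_by(Notification.created_at.asc())
        .limit(qry_limit)
    )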
2020-03-20 19:11:24 +00:00
Leo Hemsted
dc5b56ff78 Change sql to chunk by hour to remove old notifications
insert/update, and then delete notifications in hourly batches. This
means that if the task gets interrupted part-way through, we'll have at
least something to show for it. Previously we would insert and update
into the history table but might not delete from the notification table
properly.

Keeping the offsets and limits for confidence around reliability and
queries timing out.

Keeping the join to notification_history to ensure we don't delete
anything prematurely while our DB is in a bit of a weird state with lots
of these tasks failing over the last week.
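
A sketch of the hourly batching shape (the helper names are hypothetical):

    from datetime import timedelta

    hour = day_start
    while hour < day_end:
        # insert/update history for this hour first, then delete the
        # originals, so an interruption still leaves completed hours done
        insert_history_for_window(hour, hour + timedelta(hours=1))
        delete_notifications_for_window(hour, hour + timedelta(hours=1))
        hour += timedelta(hours=1)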
2020-03-20 19:07:08 +00:00
David McDonald
148a5ab456 Refactor dates being passed around
I believe this way is nicer to read; we don't have to change between
datetimes and strings and back.
2020-02-21 15:01:19 +00:00
David McDonald
6226d9e122 Don't send test letters to dvla to print 2020-02-21 15:01:19 +00:00
David McDonald
dc9bf757a8 Change which letters we want to be sent to look at all days
Previously, when running the `collate_letter_pdfs_for_day` task, we
would only send letters that were created between 5:30pm yesterday and
5:30pm today.

Now we send letters that were created before 5:30pm today and that are
still waiting to be sent. This will help us automatically attempt to
send letters that may have fallen through the gaps and not been sent the
previous day when they should have been.

Previously we solved the problem of letters that had fallen through the
gap by running the task with a date parameter, for example
`collate_letter_pdfs_for_day('2020-02-18')`. We no longer need this date
parameter as we will always look back across previous days too for
letters that still need sending.

Note, we have to change from using the paginated `list_objects_v2` call
to instead fetching each individual notification from s3. We reduce load
by using `HEAD` rather than `GET` but this will still greatly increase
the number of API calls. We acknowledge there will be a small cost to
this, say 50p for 5000 letters, and think this is tolerable. Boto3 also
handles retries itself, so if there is a networking blip when making one
of the many HEAD requests, it should be retried automatically for us.
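
A sketch of the per-letter lookup (bucket and key names are illustrative):

    import boto3

    s3 = boto3.client('s3')

    # HEAD returns the object's metadata, including its size, without
    # downloading the PDF body; boto3 retries transient failures itself.
    response = s3.head_object(
        Bucket='letters-pdf-bucket',
        Key='2020-02-20/NOTIFY.REF.D.2.C.PDF',
    )
    size_in_bytes = response['ContentLength']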
2020-02-21 15:01:19 +00:00