table. Also add a created_by_api_key_id column that will supersede the
api_key_id column.
We do the migration and the code change in separate PRs to make it easier to
downgrade if anything goes wrong.
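A minimal sketch of what that migration could look like (the table name and revision ids below are placeholders, not taken from this change):

```
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql

# Placeholder revision ids -- the real migration will have its own.
revision = "xxxx_add_created_by_api_key_id"
down_revision = "xxxx_previous_migration"


def upgrade():
    # Nullable for now: existing rows and the current code keep working,
    # and the code that populates the column lands in a separate PR.
    op.add_column(
        "some_table",  # placeholder -- the real table name isn't shown here
        sa.Column("created_by_api_key_id", postgresql.UUID(as_uuid=True), nullable=True),
    )


def downgrade():
    op.drop_column("some_table", "created_by_api_key_id")
```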
Changes:
53.0.0
---
* `notifications_utils.columns.Columns` has moved to
`notifications_utils.insensitive_dict.InsensitiveDict`
* `notifications_utils.columns.Rows` has moved to
`notifications_utils.recipients.Rows`
* `notifications_utils.columns.Cell` has moved to
`notifications_utils.recipients.Cell`
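In practice that means imports need updating along these lines:

```
# Before (notifications-utils < 53.0.0)
from notifications_utils.columns import Cell, Columns, Rows

# After (notifications-utils >= 53.0.0)
from notifications_utils.insensitive_dict import InsensitiveDict
from notifications_utils.recipients import Cell, Rows
```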
52.0.0
---
* Deprecate the following unused `redis_client` functions:
- `redis_client.increment_hash_value`
- `redis_client.decrement_hash_value`
- `redis_client.get_all_from_hash`
- `redis_client.set_hash_and_expire`
- `redis_client.expire`
51.3.1
---
* Bump govuk-bank-holidays to cache holidays for next year.
Add a check constraint to the migration script that created_by_id must not
be null unless created_by_api_key_id is not null. It is
already in the models file.
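A sketch of how the constraint could be expressed in the migration (the table and constraint names here are placeholders):

```
op.create_check_constraint(
    "ck_created_by_id_or_created_by_api_key_id",  # placeholder name
    "some_table",  # placeholder table name
    "created_by_id is not null or created_by_api_key_id is not null",
)
```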
Also remove check constraint for cancelled_by_id from models, as
this field would only be filled for broadcasts with cancelled
status.
Also add some spacing in that migration script so it is easier
to read.
This follows the pattern for invite emails where the admin app tells the
API which domain to use when generating the link.
This will start working once this admin change is merged:
- [ ] https://github.com/alphagov/notifications-admin/pull/4150/files
It won’t break anything if it’s merged before the admin change.
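For illustration only, the invite-email pattern referred to looks roughly like this on the admin side (the endpoint, field and config names here are assumptions, not taken from this change):

```
# Purely illustrative: the admin app passes the host it is serving from,
# and the API uses it (rather than a hard-coded domain) when building the
# link that goes into the email.
data = {
    "email_address": email_address,
    "invite_link_host": current_app.config["ADMIN_BASE_URL"],  # assumed field/config names
}
api_client.post(url=f"/service/{service_id}/invite", data=data)
```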
This means that if the environment variable can't be set (for example,
if you don't have aws-cli installed) then there's a suitable error
message early on.
As a team we primarily develop locally. However, we've been experiencing
issues with pycurl, a subdependency of celery, which is notoriously
difficult to install on Mac. On top of the existing issues, we're also
seeing it conflict with pyproj in bizarre ways (the order of
imports between pyproj and pycurl results in different configurations of
dynamically linked C libraries being loaded).
You are encouraged to attempt to install pycurl locally, following these
instructions: https://github.com/alphagov/notifications-manuals/wiki/Getting-Started#pycurl
However, if you aren't having any luck, you can instead now run celery
in a docker container.
`make run-celery-with-docker`
This will build a container, install the dependencies, and run celery
(with the default of four concurrent workers).
It will pull AWS variables from your AWS configuration as boto normally
would, and it will attempt to connect to your local database with the
user `postgres`. If your local database is configured differently (for
example, with a different user, or on a different port), then you can
set SQLALCHEMY_DATABASE_URI locally to override that.
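For example, you could run `export SQLALCHEMY_DATABASE_URI=postgresql://myuser@localhost:5433/notification_api` (the connection string shown here is illustrative) before running `make run-celery-with-docker`.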
This caught me out the other day when I was testing. I thought antivirus
was enabled, and seeing this message didn't help with that assumption. Plus
there is no point in showing the log if we don't actually call the task.
Previously we thought this setting was necessary to avoid a memory
leak [1], but it's unclear if this is still an issue:
- We've advanced two major versions of Celery.
- Some of the tasks are now quicker and leaner.
Restarting worker sub-processes after each task is a big problem
for performance, as we move towards parallelising our reporting.
This is something of a test to see if we can manage without this
setting. Note that we need to unset the variable manually:
cf unset-env notify-delivery-worker-reporting CELERYD_MAX_TASKS_PER_CHILD
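For context, a sketch of how a variable like this typically maps onto Celery configuration (the exact wiring in our app may differ):

```
import os

from celery import Celery

app = Celery("notifications")

# Only apply the per-child task limit if the environment variable is present,
# so unsetting it disables the worker restarts entirely.
max_tasks = os.environ.get("CELERYD_MAX_TASKS_PER_CHILD")
if max_tasks:
    # worker_max_tasks_per_child is the Celery 4+/5 name for this setting:
    # each worker sub-process is recycled after this many tasks.
    app.conf.worker_max_tasks_per_child = int(max_tasks)
```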
In the worst case we can always re-run any failed tasks. To check
the worker is still behaving as expected, we can:
- Monitor CPU / memory graphs for it.
- Check `cf events` for unexpected restarts / crashes.
- Compare numbers of task completion logs to previous days.
- Check the number of new billing / status rows looks right.
[1]: ad419f7592
We try to delete for lots of services. This includes services that
don't actually have anything to delete that day. That might be because
they have a custom data retention, so we always go and check them, or
because they only sent test notifications (which we'll delete but not
include in the count in the log line). We don't really need to see log
lines saying that we didn't delete anything for a service - that's
just a long list of boring log messages that hides the actually
interesting stuff: which services we did delete content for.
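A sketch of the intended behaviour (the dao function name below is illustrative, not the real one):

```
from flask import current_app

deleted_count = delete_old_notifications_for_service(service_id)  # illustrative name
if deleted_count > 0:
    # Only emit the per-service log line when something was actually deleted.
    current_app.logger.info(
        "Deleted %s notifications for service %s", deleted_count, service_id
    )
```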
## The existing situation
To support multiple processes and eventlets recording metrics in
parallel, prometheus uses files to store metrics. When you write a
metric from a multiprocess app, it writes to a file.
Prometheus identifies whether your app is multiprocess by looking for
the existence of a `prometheus_multiproc_dir` environment var (in either
case). Prometheus reads this variable at module level (ie: at import
time). Assuming it will always be used within a web server, the gds_metrics
library auto-sets this to `/tmp` on import, to ensure that prometheus
will always be set up correctly.
We also have a variety of metrics set up when we create the app. These
are generally sensible metrics such as counting the number of database
connections in use by measuring sqlalchemy connection events.
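Roughly, the behaviour the client library relies on looks like this (a simplified sketch, not the gds_metrics code):

```
import os

# If the env var is set, the prometheus client writes metrics to files in
# that directory (one per process) and a MultiProcessCollector aggregates
# them at scrape time; if it is unset, metrics just live in process memory.
if os.environ.get("prometheus_multiproc_dir"):
    from prometheus_client import CollectorRegistry, multiprocess

    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
```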
## The problem
We have seen problems with our notify-delivery-worker-reporting app running
out of space. The CELERYD_MAX_TASKS_PER_CHILD flag is set on that app,
which restarts each worker process every time a task runs (to avoid
memory issues); however, we've recently massively decreased the size and
increased the number of tasks to parallelise nightly tasks. Each time a
worker process restarts it writes a new file to disk. This meant
that we quickly ran out of disk space, and then the entire app instance
was killed.
The big rub is that we don't log prometheus metrics from our worker
apps! They don't expose an endpoint, so there's no way to scrape them, and
we aren't getting any value from prometheus anyway. But because they use
the same codebase they import gds_metrics and get all of this anyway.
## The solution
gds_metrics sets the multiproc env var; however, by importing prometheus
FIRST we ensure that the env var is unset at that point, and thus
prometheus will harmlessly store the metrics in memory.
To ensure that when we run notify-api it still has the env var set,
so the stats are shared across all the gunicorn processes, we put this
import as the first thing in run_celery.py.
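A sketch of the ordering in run_celery.py (not the full file; the exact app imports may differ):

```
# The bare prometheus_client import has to come before anything that imports
# gds_metrics, so prometheus_multiproc_dir is still unset when prometheus
# decides how to store values, and the worker's metrics stay in memory
# instead of being written to files in /tmp.
import prometheus_client  # noqa: F401  -- must be the very first import

from app import create_app  # importing the app pulls in gds_metrics
```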
This is because that function is used both when broadcast status
is updated via API and via admin, so it's a shared resource.
Also move and update tests for updating broadcast message status
so things are tested at source and repetition is avoided.
The top-level task didn't run successfully after this was deployed
because the worker was killed due to heavy disk usage. While the
more parallel version does log much more, it doesn't totally explain
the disk behaviour. Nonetheless, reverting it is sensible to give us
the time we need to investigate more.
We previously pinned cryptography to versions less than 3.4 because, after
that point, cryptography started using Rust as a dependency. This isn't
an issue if you install from a wheel, but we found that the version of pip
bundled with the Python buildpack was too old to support this. However,
since upgrading from Python 3.6 to Python 3.9, the pip version has been
bumped and we no longer need to pin cryptography as it installs
correctly.
If the reference from the cancel CAP XML we received via the API does not
match any existing broadcast, return 404.
Do the same if the service id doesn't match.
Also refactor the code that cancels a broadcast out into a separate function.
It should be a separate function that is only called by the create_broadcast
function. This will prevent create_broadcast from becoming too
big and complex and doing too many things.
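A sketch of the behaviour described above (the function name and dao helpers here are assumptions):

```
from flask import abort


def cancel_broadcast_from_cap_xml(service_id, reference):
    broadcast = find_broadcast_by_reference(reference)  # illustrative dao helper
    if broadcast is None or str(broadcast.service_id) != str(service_id):
        # Unknown reference, or a reference belonging to a different service.
        abort(404)
    return set_broadcast_status_to_cancelled(broadcast)  # illustrative helper
```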
1. The number of letters that we send to DVLA will not be correct (see 20ead82463/app/celery/letters_pdf_tasks.py (L136)).
This may raise an alert with DVLA when they find we have sent them fewer letters than we have reported.
2. When we get the PDF from S3 we will get a file-not-found error (20ead82463/app/celery/letters_pdf_tasks.py (L244)).
The error will not prevent the collate task from completing, but we will see an alert email for the exception and it will raise questions.
Although this situation is very unlikely, because we have a 15 minute window between the last letter deadline and the time we kick off the collate task, we should still mitigate these issues. I updated the queries to only return letters with billable_units > 0; all valid letters should have at least 1 billable unit.
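A sketch of the query change (assuming the model and column names match the codebase; the rest of the filter is elided):

```
from app.models import Notification  # assumed import path

letters_to_collate = Notification.query.filter(
    Notification.notification_type == "letter",
    Notification.billable_units > 0,  # skip letters that never produced a valid PDF
    # ... existing date / status conditions ...
)
```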
It is possible that the personalisation for a templated letter can make the letter exceed 10 pages or 5 sheets. We are not validating the letters posted via the API for this validation error. It is only possible to validate the letter once we create the PDF in notifications-template-preview. This means that the letter can only get a validation-failed status after the client has received a 201 from the POST to /v2/notifications.
NOTE: we only validate the preview row of a CSV for this validation error, so this change means that it is possible for a letter to be marked as validation-failed after a successful file upload.
A new task to update the notification to `validation-failed` has been added to the API. If we find that the letter is too long once we have created the PDF we call the `update-validation-failed-for-templated-letter` task rather than `update-billable-units-for-letter` task.
New workflow for a letter, in brief:
API - receives POST /v2/notifications
  :: save to db
  :: put CREATE_LETTERS_PDF task on queue for template preview to consume
TEMPLATE-PREVIEW - consumes CREATE_LETTERS_PDF task
  :: create PDF
  :: count pages of PDF
  :: IF page count exceeds 10 pages
       put PDF in the letters-invalid-pdf S3 bucket with metadata (similar to the precompiled letters)
       put `update-validation-failed-for-templated-letter` task on the queue for the API to consume
     ELSE
       put PDF in the `letters-pdf` bucket
       put `update-billable-units-for-letter` task on the queue
API - consumes `update-billable-units-for-letter` OR `update-validation-failed-for-templated-letter` task
  :: IF `update-billable-units-for-letter` task:
       update billable units for the notification as usual
  :: ELSE `update-validation-failed-for-templated-letter`:
       update notification_status = `validation-failed`
ADMIN - view notification page for letter
  :: show validation error for the templated letter
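A sketch of the new API task (the task name is from the description above; the decorator, imports and dao helpers are assumptions about the codebase):

```
from flask import current_app

from app import notify_celery  # assumed import path


@notify_celery.task(name="update-validation-failed-for-templated-letter")
def update_validation_failed_for_templated_letter(notification_id, page_count):
    notification = get_notification_by_id(notification_id)  # assumed dao helper
    notification.status = "validation-failed"
    dao_update_notification(notification)  # assumed dao helper
    current_app.logger.info(
        "Letter %s exceeded the page limit (%s pages); marked validation-failed",
        notification_id,
        page_count,
    )
```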
There will be 3 PRs in order to make this change: one each for the API, template-preview and the admin app.
Deployment plan
Deploy Admin first
Deploy API
Deploy template-preview
Related PRs:
- alphagov/notifications-template-preview#619
- alphagov/notifications-admin#4107
- https://www.pivotaltracker.com/story/show/169209742
This field caused some confusion and lots of unnecessary work
for our colleagues because of its unclear name.
The field was named sms_fragments, when in fact the value of
the field is the number of sms fragments above the free allowance,
multiplied by the rate multiplier.
The new name was chosen through consultation with the colleagues
who use the billing report the most.