Commit Graph

4911 Commits

Author SHA1 Message Date
David McDonald
1d8fafcdf4 Remove unused functions
Can't see these being used anywhere so lets get rid of them
2022-02-07 15:58:04 +00:00
Leo Hemsted
4018fcba28 Merge pull request #3442 from alphagov/celery-docker
add script to run celery from within docker
2022-02-03 12:51:01 +00:00
Chris Hill-Scott
07f584e1d5 Allow admin app to specify domain for password reset
This follows the pattern for invite emails where the admin app tells the
API which domain to use when generating the link.

This will starting working once this admin change is merged:
- [ ] https://github.com/alphagov/notifications-admin/pull/4150/files

It won’t break anything if it’s merged before the admin change.
2022-02-02 17:15:09 +00:00
Pea Tyczynska
ac5967bc5a Merge pull request #3430 from alphagov/rename_billing_report_column_sms_fragments
Rename sms_fragments to sms_chargeable_units
2022-02-02 10:00:30 +00:00
Rebecca Law
c58246f558 Merge pull request #3441 from alphagov/move-log-message
Moved log message inside the if statement where it actually happens.
2022-02-02 08:31:07 +00:00
Rebecca Law
09c8fbe982 Merge pull request #3418 from alphagov/letters-too-long
Mark letters as validation-failed if the templated letter is too long.
2022-02-02 08:30:50 +00:00
Leo Hemsted
1f3785a7a3 add script to run celery from within docker
as a team we primarily develop locally. However, we've been experiencing
issues with pycurl, a subdependency of celery, that is notoriously
difficult to install on mac. On top of the existing issues, we're also
seeing it conflict with pyproj in bizarre ways (where the order of
imports between pyproj and pycurl result in different configurations of
dynamically linked C libraries being loaded.

You are encouraged to attempt to install pycurl locally, following these
instructions: https://github.com/alphagov/notifications-manuals/wiki/Getting-Started#pycurl

However, if you aren't having any luck, you can instead now run celery
in a docker container.

`make run-celery-with-docker`

This will build a container, install the dependencies, and run celery
(with the default of four concurrent workers).

It will pull aws variables from your aws configuration as boto would
normally, and it will attempt to connect to your local database with the
user `postgres`. If your local database is configured differently (for
example, with a different user, or on a different port), then you can
set the SQLALCHEMY_DATABASE_URI locally to override that.
2022-02-01 16:29:08 +00:00
Rebecca Law
8bf245f63a Moved log message inside the if statement where it actually happens.
This caught me out the other day when I was testing. I thought antivirus
was enable, seeing this message didn't help with that assumption. Plus
there is no point in showing the log if we don't actually call the task.
2022-01-27 15:05:07 +00:00
Chris Hill-Scott
df28f253fa Merge pull request #3427 from alphagov/remove-name-uniqueness-endpoints
Remove endpoints for checking name uniqueness
2022-01-27 13:17:41 +00:00
Ben Thorner
7ad0c4103a Stop killing reporting processes after each task
Previously we think this setting was necessary to avoid a memory
leak [1], but it's unclear if this is still an issue:

- We've advanced two major versions of Celery.
- Some of the tasks are now quicker and leaner.

Restarting worker sub-processes after each task is a big problem
for performance, as we move towards parallelising our reporting.

This is something of a test to see if we can manage without this
setting. Note that we need to unset the variable manually:

   cf unset-env notify-delivery-worker-reporting CELERYD_MAX_TASKS_PER_CHILD

In the worst case we can always re-run any failed tasks. To check
the worker is still behaving as expected, we can:

- Monitor CPU / memory graphs for it.
- Check `cf events` for unexpected restarts / crashes.
- Compare numbers of task completion logs to previous days.
- Check the number of new billing / status rows looks right.

[1]: ad419f7592
2022-01-24 12:52:52 +00:00
Rebecca Law
c01c81326c Update log message to something a little easier to read and query for. 2022-01-24 12:25:53 +00:00
Pea Tyczynska
4a90cde701 Merge pull request #3429 from alphagov/cancel_alert_via_api
Cancel broadcast via API
2022-01-21 14:04:35 +00:00
Leo Hemsted
246016a894 don't log if we dont delete anything for a service
we try and delete for lots of services. this includes services that
don't actually have anything to delete that day. that might be because
they had a custom data retention so we always go to check them, or
because they only sent test notifications (which we'll delete but not
include in the count in the log line). we don't really need to see log
lines saying that we didn't delete anything for that service - that's
just a long list of boring log messages that will hide the actual
interesting stuff - which services we did delete content for.
2022-01-21 11:04:37 +00:00
Pea Tyczynska
52dbdb7518 Move validate_and_update_broadcast_message_status to a utils file
This is because that function is used both when broadcast status
is updated via API and via admin, so it's a shared resource.

Also move and update tests for updating broadcast message status
so things are tested at source and repetition is avoided.
2022-01-20 18:14:41 +00:00
Pea Tyczynska
c9afb2f038 Remove unnecessary error handling
The context here should be enough for the users, custom error
message is not needed.
2022-01-20 18:14:40 +00:00
Pea Tyczynska
c2a389e81a Move updating user validation out of validate_and_update_broadcast_message_status
As only 1 of 2 functions calling it needs that check, it's better
to perform it inside that 1 function.
2022-01-20 18:14:39 +00:00
Ben Thorner
0f6dea0deb Revert running status aggregation in parallel
The top-level task didn't run successfully after this was deployed
due to the worker being killed due to heavy disk usage. While the
more parallel version does log much more, it doesn't totally explain
the disk behaviour. Nonetheless, reverting it is sensible to give us
the time we need to investigate more.
2022-01-20 12:22:33 +00:00
Pea Tyczynska
a4c20e8ba6 Return 404 if reference from cancel message does not match
If the reference from cancel CAP XML we received via API does not
match with any existing broadcast, return 404.

Do the same if service id doesn't match.

Also refactor code to cancel broadcast out into separate function

It should be a separate function that is only called by create_broadcast
function. This will prevent create_broadcast from becoming too
big and complex and doing too many things.
2022-01-19 15:42:27 +00:00
Pea Tyczynska
3b4a9d8942 Cancel broadcast via API
When a service sends us an XML CAP broadcast message with Cancel
status, and that broadcast is in broadcasting state, we cancel it.
2022-01-19 15:42:26 +00:00
Pea Tyczynska
940126abfb Reject unapproved broadcast upon cancel API request
When a service sends us a cancel broadcast XML via API, if that
broadcast was not approved yet, reject it.
2022-01-19 15:41:38 +00:00
Ben Thorner
0a88724ff5 Merge pull request #3428 from alphagov/remove-dup-column
Remove duplicate declaration for reference column
2022-01-19 13:49:44 +00:00
Ben Thorner
6be489daa7 Merge pull request #3425 from alphagov/parallelise-ft-status-180693991
Parallelise status aggregation by service and day
2022-01-19 13:49:28 +00:00
Rebecca Law
6cd7a23d3c If there is an invalid letter that has not been updated to validation-failed because the update-validation-failed-for-templated-letter has not been picked up off the letter-tasks queue and the collate-letter-pdfs-to-be-sent has started.
1. The number of letters that we send to DVLA will be not be correct (see 20ead82463/app/celery/letters_pdf_tasks.py (L136))
This may raise an alert with DVLA when they find we have sent them fewer letter than we have reported.
2. When we get the PDF from S3 we will get a file not found 20ead82463/app/celery/letters_pdf_tasks.py (L244)
The error will not prevent the collate task from completing but we will see an alert email for the exception and raise questions.

Although this situation is very unlikely because we have a 15 minute window between the last letter deadline date and the time we kick off the collate task we should still mitigate these issues. I updated the queries to only return letters with billable_units > 0, all valid letters should have at least 1 billable unit.
2022-01-19 08:31:19 +00:00
Rebecca Law
841a4fc22f Mark letters as validation-failed if the templated letter is too long.
It is possible that the personalisation for a templated letter can make the letter exceed 10 pages or 5 sheets. We are not validating the letters posted via the API for this validation error. It is only possible to validate the letter once we create the PDF in notifications-template-preview. This means that the letter can only get a validation-failed status after the client has received a 201 from the POST to /v2/notifications.
NOTE: we only validate the preview row of a CSV for this validation error, this change will mean that it is possible for a letter to be marked as validation-failed after a successful file upload.

A new task to update the notification to `validation-failed` has been added to the API. If we find that the letter is too long once we have created the PDF we call the `update-validation-failed-for-templated-letter` task rather than `update-billable-units-for-letter` task.

New work flow for a letter in brief:
API - receives POST /v2/notifications
:: save to db
:: put CREATE_LETTERS_PDF task on queue for template preview to consume
TEMPLATE-PREVIEW - consumes task CREATE_LETTERS_PDF
:: create PDF
:: count pages of PDF
:: IF page count exceeds 10 pages
	 put in the letters-invalid-pdf S3 bucket with metadata (similar to the precompiled letters)
	 put `update-validation-failed-for-templated-letter` task on the queue for the API to consume
   ELSE
     put PDF in the `letters-pdf` bucket
     put `update-billable-units-for-letter` task on the queue
API - consumes `update-billable-units-for-letter` OR `update-validation-failed-for-templated-letter` task
:: IF `update-billable-units-for-letter` task:
   	update billable units for notification as usual
:: ELSE `update-validation-failed-for-templated-letter`:
   	update notification_status = `validation-failed`
ADMIN - view notification page for letter
:: show validation letter for templated letter

There will be 3 PRs in order to make this change, one for the API, template-preview and the admin app.

Deployment plan

Deploy Admin first
Deploy API
Deploy template-preview

Related PRs:
alphagov/notifications-template-preview#619
alphagov/notifications-admin#4107

https://www.pivotaltracker.com/story/show/169209742
2022-01-19 08:29:48 +00:00
Pea Tyczynska
d94517d379 Rename sms_fragments to sms_chargeable_units
This field caused some confusion and lots of unnecessary work
    to our colleague because of unclear name.

    The field was named sms_fragments, where in fact the value of
    the field is: those sms fragments that go above free allowance
    multiplied by the rate multiplier.

    The new name was chosen through consultation with colleagues
    who use billing report the most.
2022-01-18 18:03:16 +00:00
Ben Thorner
9686595fa8 Minor tweaks to address comments on the PR
To address:

- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786867994
- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786853329
- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786848793
- https://github.com/alphagov/notifications-api/pull/3425#discussion_r786214794
2022-01-18 16:56:53 +00:00
Ben Thorner
cfa6284af7 Remove duplicate declaration for reference column
This is identical to the declaration a few lines above.
2022-01-17 12:03:14 +00:00
Chris Hill-Scott
a4347a5165 Remove endpoints for checking name uniqueness
The code which called these endpoints was removed:
- for services in https://github.com/alphagov/notifications-admin/pull/4084/files
- for organisations in https://github.com/alphagov/notifications-admin/pull/4128/files

Therefore these endpoints are no longer needed.
2022-01-13 17:12:40 +00:00
Ben Thorner
086f0f50a6 Remove unnecessary extra method in status DAO
This makes it easier to see what is being queried.
2022-01-12 15:48:00 +00:00
Ben Thorner
9182ebf4e5 Parallelise status aggregation by service and day
This follows a similar approach as [1]. Recently we've seen lots
of errors from this task, which we think are a consequence of it
doing too much work and tripping Celery's visibility timeout.

While we can optimise the query [2], it's likely the errors will
return as the number of live services grows. Parallelising the
aggregation now will make it more futureproof.

[1]: https://github.com/alphagov/notifications-api/pull/3397
[2]: https://github.com/alphagov/notifications-api/pull/3417
2022-01-12 15:47:59 +00:00
Ben Thorner
c3da139e9c Remove redundant migration tasks (esp. for status)
These were added long ago [1][2] and aren't referenced in runbooks,
so it should be safe to delete them.

[1]: 13f3662051
[2]: b9953dd005
2022-01-12 15:47:58 +00:00
Ben Thorner
d772ae6b46 Standardise logs for status aggregation tasks
This will make it easier to parallelise by service later on.
2022-01-12 15:47:57 +00:00
Ben Thorner
4feed950c4 DRY-up loops to kick off status aggregation tasks
This will make it easier to parallelise by service in the following
commits, since we only have one loop to change.
2022-01-12 15:47:56 +00:00
Ben Thorner
ddbf556486 Rewrite task to aggregate status by service
This is a step towards parallelising the task by service and day.
2022-01-12 15:47:53 +00:00
Katie Smith
ed725c1513 Add endpoint to allow org team members to be removed
This is similar to the corresponding endpoint for services. However,
it is a little simpler since we don't need to worry about always having
at least one team member for an organisation.

The new dao function added, `dao_remove_user_from_organisation`, is also
simpler than `dao_remove_user_from_service` since we don't have any
organisation permissions to deal with.
2022-01-11 15:20:48 +00:00
Ben Thorner
081e0cab88 Merge pull request #3417 from alphagov/optimise-status-query-180693991
Optimise query to populate notification statuses
2022-01-11 14:18:36 +00:00
Ben Thorner
63b5204fb0 Optimise query to populate notification statuses
Investigation with EXPLAIN and EXPLAIN ANALYZE for the notification
history table shows this is another instance of [1] but for the key
type column. Swapping "!=" for "IN" solves the problem.

[1]: https://github.com/alphagov/notifications-api/pull/3360
2022-01-11 13:22:04 +00:00
Ben Thorner
e4dcea5396 Merge pull request #3421 from alphagov/explain-status-task-180693991
Add comment to explain status aggregation approach
2022-01-11 12:33:38 +00:00
Rebecca Law
2257cae398 Fix bug in organisation report for its services and usages.
If a service has not sent any SMS for the financial year the free allowance was showing up as 0 rather than the number in annual billing. The query has been updated to use an outer join so that the free allow will be returned when there is no ft_billing.

There is a potential performance enhancement to only return the data for the services of the organisation in the `fetch_sms_free_allowance_remainder_until_date` subquery. I will investigate in a subsequent PR.
2022-01-11 10:04:36 +00:00
Ben Thorner
a7b39a930c Add comment to explain status aggregation approach
This relates to the performance optimisation work we're doing [1].
Before optimising the task, it's worth asking if we can do less -
the comment explains why it has to be this way.

Some references to back up the comment:

- We do status updates in either table [2].
- We don't allow duplicate receipts for emails [3].
- We don't allow duplicate receipts for SMS [4].
- We don't expect duplicate receipts for letters.

This is something we would need to revisit if we want to support
additional status updates - we could reject based on the age of the
notification, rather than the status.

[1]: https://github.com/alphagov/notifications-api/pull/3417
[2]: 20ead82463/app/dao/notifications_dao.py (L538)
[3]: 20ead82463/app/celery/process_ses_receipts_tasks.py (L58)
[4]: 20ead82463/app/dao/notifications_dao.py (L129-L135)
2022-01-10 18:15:54 +00:00
Katie Smith
0b7410818e Allow get_all_notifications_for_service to accept POST requests
We want admin to send a POST request to this route if the data contains
a message recipient (a phone number or email address) so that this does
not show in the logs. This changes the route to accept both GET and POST
requests.
2022-01-04 14:04:03 +00:00
Rebecca Law
603acc8b1e Increase the SQL timeout for the notify-delivery-worker-reporting app.
When running the night reporting tasks we are seeing that some tasks are failing because the query is timing out. We need to revisit how to optimise the query but this will at least let the process finish.
2021-12-23 11:41:49 +00:00
David McDonald
3a214da379 Merge pull request #3408 from alphagov/db-connection-close
Close DB connection whilst making HTTP to SMS providers
2021-12-22 11:02:13 +00:00
David McDonald
2584946823 Close DB connection whilst making HTTP to SMS providers
At the moment, when we are processing and sending an SMS we open
a DB connection at the start of the celery task and then close it
at the end of the celery task. Nice and simple.

However, during that celery task we make an HTTP call out to our
SMS providers. If our SMS providers have problems or response times
start to slow then it means we have an open DB connection sat waiting
for our SMS providers to respond which could take seconds. If our
SMS providers grind to a halt, this would cause all of the
celery tasks to hold on to their connections and we would run out
of DB connections and Notify would fall over.

We think we can solve this by closing the DB session which releases
the DB connection back to the pool.

Note, we've seen this happen in staging during load testing if our
SMS provider stub has fallen over. We've never seen it in production
and it may be less unlikely to happen as we are balancing traffic
across two providers and they generally have very good uptime.

One downside to be aware of is there could be a slight increase in
time spent to send an SMS as we will now spend a bit of extra time
closing the DB session and then reopening it again after the HTTP
request is done.

Note, there is no reason this approach couldn't be copied for our
email provider too if it appears successful.
2021-12-21 17:45:53 +00:00
Pea Tyczynska
32cd7a0eb6 Merge pull request #3395 from alphagov/fix_org_usage_report
Fix calculating remaining free allowance for SMS
2021-12-21 15:02:54 +00:00
Ben Thorner
e55b654a0b Merge pull request #3407 from alphagov/downgrade-inbound-log
Downgrade log about orphaned inbound SMS
2021-12-21 13:36:10 +00:00
Ben Thorner
f65fb519c7 Merge pull request #3404 from alphagov/remove-redundant-conditional-180477467
Remove redundant conditional for letter branding
2021-12-21 13:35:59 +00:00
Ben Thorner
3d30965193 Downgrade log about orphaned inbound SMS
We can't control who might be sending messages on inbound numbers
that we own i.e. this log isn't an actionable error. Looks like it
used to represent something that _was_ an error [1], but that's not
the case anymore, so it seems reasonable to downgrade it.

[1]: d99ab329eb (diff-80d123d9abb40f80a221979940657a2751cc7cb33f255aa8f352a8324023e022L125)
2021-12-21 12:49:00 +00:00
Ben Thorner
de9ae08ecc Downgrade log for letter deletion exceptions
If the S3 object is missing [1], then that's what we want, so we
don't need such a severe log for it, but we still want to know as
it's not expected. This is separate to more general "ClientError"
exceptions, which could mean anything.

There weren't any tests to cover missing S3 objects, so I've added
one. I don't think we need a test for ClientErrors:

- If there was no handler, the task would fail and we'd learn about
it that way.

- The scope of the calling task is now much smaller, so it matters
less than it used to [2].

[1]: 81a79e56ce/app/letters/utils.py (L52)
[2]: f965322f25
2021-12-20 12:45:48 +00:00
Ben Thorner
76da31c32a Remove redundant conditional for letter branding
This is no longer used when creating a service [1]. It was likely
added at a migration point when Admin _did_ specify branding.

[1]: 50c3c3e10c/app/main/views/add_service.py (L15-L22)
2021-12-16 17:54:33 +00:00