Commit Graph

8690 Commits

Author SHA1 Message Date
Ben Thorner
a69d1635a1 Update FactStatus table in bulk for each service
Previously we were looping over data from the Notifications/History
table and then shovelling it into the status table, one row at a time
- plus an extra delete to clean up any existing data.

This replaces that with a batch insertion, similar to how we archive
notifications [1], but using a simple subquery (via "from_select" [2])
instead of a temporary table.

To make the select compatible with the insert, I've used "literal"
to inject the constant pieces of data, so each row has everything it
needs to go into the status table.

[1]: 9ce6d2fe92/app/dao/notifications_dao.py (L295)
[2]: https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.from_select
2022-02-16 13:40:05 +00:00
Ben Thorner
efde271c5a Rewrite FactStatus updates to be an upsert
This is consistent with the way we do billing updates [1] and is a
bit less clunky. Functionally it should be the same - note that the
tests already cover the "overwriting" behaviour if a row exists.

[1]: 9ce6d2fe92/app/dao/fact_billing_dao.py (L522)
2022-02-16 13:39:08 +00:00
Ben Thorner
7bc037f1ce Merge pull request #3459 from alphagov/fix-status-task-logs-180693991
Fix task name and action in status task logs
2022-02-16 13:38:38 +00:00
Ben Thorner
5206844a95 Merge pull request #3438 from alphagov/lower-query-timeout-180693991
Revert increased timeout for reporting worker
2022-02-16 13:38:29 +00:00
Ben Thorner
ef231d5de7 Fix task name and action in status task logs 2022-02-16 11:45:45 +00:00
Ben Thorner
8aeee7ce40 Merge pull request #3457 from alphagov/bump-beat-memory
Bump memory for Celery Beat to 512M
2022-02-14 14:22:47 +00:00
Ben Thorner
2d6ee2eb72 Bump memory for Celery Beat to 512M
This fixes the app crashing - apparently due to a lack of free
memory. Looking over the last month, we can see it's been hovering
at ~90% utilisation. 512M seems like a good compromise to avoid
this happening again vs paying for what we use.

The limit was last changed 5 years ago:

https://github.com/alphagov/notifications-api/pull/882
2022-02-14 14:13:21 +00:00
Chris Hill-Scott
d21c09f0fa Merge pull request #3456 from alphagov/error-if-cap-missing-references
Return 400 if references missing from cancel broadcast
2022-02-14 13:15:46 +00:00
Chris Hill-Scott
bbc444699a Return 400 if references missing from cancel broadcast
If someone tries to cancel a broadcast but the references don’t match
and existing broadcast we correctly return a 404.

If they don’t provide any references then we get an exception. This
commit catches the missing references and returns a 400. I think this
is more appropriate because it’s malformed request, rather than a
well-formed request that doesn’t match our data. It also lets us write a
more specific and helpful error message.
2022-02-14 12:34:09 +00:00
Ben Thorner
5a87d8c7d7 Merge pull request #3437 from alphagov/retry-parallel-status-180693991
Attempt 2 of parallelising status aggregation more
2022-02-14 10:50:26 +00:00
Pea Tyczynska
e5b1a65f1e Merge pull request #3448 from alphagov/drop_api_key_id_column_from_broadcast_message_table
Drop api_key_id column from broadcast_message table
2022-02-11 14:01:25 +00:00
Pea Tyczynska
66dba8a7c6 Merge pull request #3452 from alphagov/revert-3449-revert-3440-audit-api-key-id-when-cancelling-broadcast-via-api
Audit api key id when cancelling broadcast via api - second try
2022-02-11 12:13:33 +00:00
Pea Tyczynska
c55d917b28 Drop api_key_id column from broadcast_message table
This column has been superseded by a new column named
created_by_api_key_id.

Also create constraint checking that we know who created broadcast

Also move data so that constraint is met before instatiating it.
2022-02-11 12:12:00 +00:00
Pea Tyczynska
3dc1907321 Audit api key id when cancelling broadcast via api 2022-02-11 12:01:56 +00:00
Pea Tyczynska
87ab81912c Merge pull request #3453 from alphagov/audit-cancel-alert-via-api-just-first-migration
Migration adding cancelled_by_api_key_id to broadcast_message
2022-02-11 10:34:12 +00:00
Ben Thorner
966c4db8c6 Fix getting service IDs for status aggregation
Addresses [1].

Previously the query would always use UTC midnight, even after we
had switched to BST (+1h). We store timestamps as naive UTC in our
DB - without a timezone - but we want the query to work in terms
of GMT / BST so we adjust for that - BST midnight is 11PM in UTC.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r791998690
2022-02-10 10:51:45 +00:00
Ben Thorner
6e8f121548 Standardise how we query midnight-to-midnight
Partially addresses [1] (lots more detail to read in the comment).
I've also added some tests for the status DAO function to confirm
it behaves as expected across timezones.

[1]: https://github.com/alphagov/notifications-api/pull/3437#discussion_r802634913
2022-02-10 10:51:27 +00:00
Ben Thorner
7f4b140f97 Rename function to make it consistent
This is consistent with the new "on_date" function. It was going
off the edge of my screen before in some parts of the code.
2022-02-09 17:39:08 +00:00
Ben Thorner
1213463b8e Only aggregate status when necessary for a service
This takes a similar approach to the nightly deletion task so that
we only create sub-tasks when there are actually notifications to
aggregate for a given type and day [1].

We're making this change to stop the duplication errors we're getting
at the moment and ensure the task can scale to more messages and more
services. There are two parts to this:

- Each subtask should now run within the 5 minute visibility timeout.
However, they may still be duplicated if the parent task overruns [2].

- The parent task creates a mininal number of subtasks, and the query
to determine this is very fast for a normal process day (milliseconds).

Since all tasks will run quickly, there should be no more duplication.

In order to test this more nuanced task, I rewrote the tests:

- One test checks the subtask is called correctly.
- One test checks we create all the right subtasks.

[1]: https://github.com/alphagov/notifications-api/pull/3381
[2]: https://docs.google.com/document/d/1MaP6Nyy3nJKkuh_4lP1wuDm19X8LZITOLRd9n3Ax-xg/edit#heading=h.q3intzwqhfzl
2022-02-09 17:39:07 +00:00
Ben Thorner
c8db58d0e8 Reorder loops for creation status agg sub tasks
This will help tailor the innermost loop on services.
2022-02-09 17:39:06 +00:00
Ben Thorner
d6678b6a70 Remove unnecessary logs from status aggreagtion
These can be inferred elsewhere:

- Task creation is obvious from task execution. If we're concerned
about a specific service, we can check the updated times on the DB
records, since all records are recreated each time this runs.

- Task starting is already logged.

- Task completion is already logged. The number of rows updated can
also be inferred from the DB.

The log I've found useful is the one about fetching the data, and
I've also added another to time how long it takes to insert the data,
as both could be sources of poor performance.

Arguably we should use metrics for this sort of thing, but logs are
easier in practice for the metric systems we have.
2022-02-09 17:39:05 +00:00
Ben Thorner
018a253b6f Revert "Revert running status aggregation in parallel"
This reverts commit 0f6dea0deb.
2022-02-09 17:39:00 +00:00
Pea Tyczynska
00b63ca5c7 Migration adding cancelled_by_api_key_id to broadcast_message
table. Also add created_by_api_key_id column that will supersede
api_key_id_column.

We do migration and code in separate PRs to make it easier to
downgrade if anything goes wrong.
2022-02-09 17:26:37 +00:00
Pea Tyczynska
9ce6d2fe92 Merge pull request #3449 from alphagov/revert-3440-audit-api-key-id-when-cancelling-broadcast-via-api
Revert "Audit api key id when cancelling broadcast via api"
2022-02-09 11:12:28 +00:00
Pea Tyczynska
a780933893 Revert "Audit api key id when cancelling broadcast via api" 2022-02-09 11:01:39 +00:00
Pea Tyczynska
d05bff9efc Merge pull request #3440 from alphagov/audit-api-key-id-when-cancelling-broadcast-via-api
Audit api key id when cancelling broadcast via api
2022-02-09 10:15:03 +00:00
Pea Tyczynska
f2bef7392e api_key_id column will have to be dropped in a separate PR
This is to avoid situation where we deploy migration already
but don't deploy code yet and they are not compatible for a while
which leads to errors.
2022-02-08 15:57:35 +00:00
Chris Hill-Scott
0614a70764 Merge pull request #3445 from alphagov/bump-utils-53.0.0
Bump utils to 53.0.0
2022-02-08 09:56:18 +00:00
Chris Hill-Scott
7f72d3a60f Bump utils to 53.0.0
Changes:

53.0.0
---

* `notifications_utils.columns.Columns` has moved to
  `notifications_utils.insensitive_dict.InsensitiveDict`
* `notifications_utils.columns.Rows` has moved to
  `notifications_utils.recipients.Rows`
* `notifications_utils.columns.Cell` has moved to
  `notifications_utils.recipients.Cell`

52.0.0
---

* Deprecate the following unused `redis_client` functions:
  - `redis_client.increment_hash_value`
  - `redis_client.decrement_hash_value`
  - `redis_client.get_all_from_hash`
  - `redis_client.set_hash_and_expire`
  - `redis_client.expire`

51.3.1
---

* Bump govuk-bank-holidays to cache holidays for next year.
2022-02-08 09:45:10 +00:00
David McDonald
79766aeabd Merge pull request #3447 from alphagov/remove-unused-functions
Remove unused functions
2022-02-07 16:20:43 +00:00
David McDonald
1d8fafcdf4 Remove unused functions
Can't see these being used anywhere so lets get rid of them
2022-02-07 15:58:04 +00:00
David McDonald
4dfdb837bb Merge pull request #3446 from alphagov/where-to-get-creds
Update where to get creds for sms providers
2022-02-07 12:09:49 +00:00
David McDonald
d82c84c358 Update where to get creds for sms providers
Due to change in https://github.com/alphagov/notifications-credentials/pull/261
2022-02-07 10:46:05 +00:00
Leo Hemsted
4018fcba28 Merge pull request #3442 from alphagov/celery-docker
add script to run celery from within docker
2022-02-03 12:51:01 +00:00
Chris Hill-Scott
19704a3e49 Merge pull request #3444 from alphagov/allow-admin-to-specify-domain-for-password-reset
Allow admin app to specify domain for password reset
2022-02-03 09:15:03 +00:00
Pea Tyczynska
0737eceddb Include check constraint in migration script
Add check constraint that created_by_id should not be null, unless
created_by_api_key_id is not null to the migration script. It is
already in the models file.

Also remove check constraint for cancelled_by_id from models, as
this field would only be filled for broadcasts with cancelled
status.

Also add some spacing in that migration script so it is easier
to read.
2022-02-02 17:21:58 +00:00
Chris Hill-Scott
07f584e1d5 Allow admin app to specify domain for password reset
This follows the pattern for invite emails where the admin app tells the
API which domain to use when generating the link.

This will starting working once this admin change is merged:
- [ ] https://github.com/alphagov/notifications-admin/pull/4150/files

It won’t break anything if it’s merged before the admin change.
2022-02-02 17:15:09 +00:00
Pea Tyczynska
ac5967bc5a Merge pull request #3430 from alphagov/rename_billing_report_column_sms_fragments
Rename sms_fragments to sms_chargeable_units
2022-02-02 10:00:30 +00:00
Rebecca Law
c58246f558 Merge pull request #3441 from alphagov/move-log-message
Moved log message inside the if statement where it actually happens.
2022-02-02 08:31:07 +00:00
Rebecca Law
09c8fbe982 Merge pull request #3418 from alphagov/letters-too-long
Mark letters as validation-failed if the templated letter is too long.
2022-02-02 08:30:50 +00:00
Leo Hemsted
39848e6df0 move environment variables to their own lines and set -eu
this means that if the environment variable can't be set (for example,
if you don't have aws-cli installed) then there's a suitable error
message early on.
2022-02-01 16:29:09 +00:00
Leo Hemsted
1f3785a7a3 add script to run celery from within docker
as a team we primarily develop locally. However, we've been experiencing
issues with pycurl, a subdependency of celery, that is notoriously
difficult to install on mac. On top of the existing issues, we're also
seeing it conflict with pyproj in bizarre ways (where the order of
imports between pyproj and pycurl result in different configurations of
dynamically linked C libraries being loaded.

You are encouraged to attempt to install pycurl locally, following these
instructions: https://github.com/alphagov/notifications-manuals/wiki/Getting-Started#pycurl

However, if you aren't having any luck, you can instead now run celery
in a docker container.

`make run-celery-with-docker`

This will build a container, install the dependencies, and run celery
(with the default of four concurrent workers).

It will pull aws variables from your aws configuration as boto would
normally, and it will attempt to connect to your local database with the
user `postgres`. If your local database is configured differently (for
example, with a different user, or on a different port), then you can
set the SQLALCHEMY_DATABASE_URI locally to override that.
2022-02-01 16:29:08 +00:00
Rebecca Law
8bf245f63a Moved log message inside the if statement where it actually happens.
This caught me out the other day when I was testing. I thought antivirus
was enable, seeing this message didn't help with that assumption. Plus
there is no point in showing the log if we don't actually call the task.
2022-01-27 15:05:07 +00:00
Chris Hill-Scott
df28f253fa Merge pull request #3427 from alphagov/remove-name-uniqueness-endpoints
Remove endpoints for checking name uniqueness
2022-01-27 13:17:41 +00:00
Pea Tyczynska
82f08f230c Save api key id when cancelling broadcast by API call
This is so that we can audit who cancelled the broadcast if
there are any issues.
2022-01-26 17:26:58 +00:00
Pea Tyczynska
e1a8219eb1 DB Migration to allow auditing api key id when cancelling broadcast
via API.
2022-01-26 17:26:05 +00:00
Ben Thorner
0d71ee69f0 Revert increased timeout for reporting worker
This reverts commit 603acc8b1e +
This reverts commit edad1c9a21.

The cause of the slowness was fixed in [1] and since [2] we now
have data to prove it: each query to get the data is taking under
5 minutes, so it's safe to lower the timeout again.

[1]: https://github.com/alphagov/notifications-api/pull/3417
[2]: https://github.com/alphagov/notifications-api/pull/3437
2022-01-25 12:50:43 +00:00
Leo Hemsted
19a11e57d2 Merge pull request #3432 from alphagov/cryptography
unpin cryptography
2022-01-24 15:19:31 +00:00
Ben Thorner
68981a8b0d Merge pull request #3436 from alphagov/stop-restart-reporting-180693991
Stop killing reporting processes after each task
2022-01-24 13:39:13 +00:00
Ben Thorner
7ad0c4103a Stop killing reporting processes after each task
Previously we think this setting was necessary to avoid a memory
leak [1], but it's unclear if this is still an issue:

- We've advanced two major versions of Celery.
- Some of the tasks are now quicker and leaner.

Restarting worker sub-processes after each task is a big problem
for performance, as we move towards parallelising our reporting.

This is something of a test to see if we can manage without this
setting. Note that we need to unset the variable manually:

   cf unset-env notify-delivery-worker-reporting CELERYD_MAX_TASKS_PER_CHILD

In the worst case we can always re-run any failed tasks. To check
the worker is still behaving as expected, we can:

- Monitor CPU / memory graphs for it.
- Check `cf events` for unexpected restarts / crashes.
- Compare numbers of task completion logs to previous days.
- Check the number of new billing / status rows looks right.

[1]: ad419f7592
2022-01-24 12:52:52 +00:00