Commit Graph

1356 Commits

Author SHA1 Message Date
Rebecca Law
b170b5ed80 This change is a temporary fix to allow users of high volume services to use the admin app.
The trouble is that the aggregate query returning the big blue numbers on the dashboard and the /notifications/{notification_type} page is taking too long.
I have some ideas on how to improve the query, but it will take some time to do more research and testing. In the meantime, let's just ignore today's total numbers for the high volume services. There are only two services that this will affect.
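A sketch of what the guard might look like; the constant and function names here are illustrative, not the real code:

HIGH_VOLUME_SERVICE_IDS = {
    "high-volume-service-id-1",  # placeholder ids, not the real ones
    "high-volume-service-id-2",
}

def dao_fetch_todays_stats_for_service(service_id):
    ...  # the slow aggregate query over today's notifications

def todays_stats_for_service(service_id):
    # Temporary: skip the expensive aggregate for the two high volume
    # services and show no "today" totals instead.
    if str(service_id) in HIGH_VOLUME_SERVICE_IDS:
        return []
    return dao_fetch_todays_stats_for_service(service_id)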
2021-06-02 10:31:38 +01:00
Rebecca Law
ed5e3b3d9c Removed the end date in the filter.
It's always going to be in the future anyway.
After some analysis the query does perform better without it.
I'll make a note to update other queries where we get today's
notification data to remove the end date filter in a separate PR.
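A sketch of the changed filter, with the column passed in and the start time already converted to UTC (names assumed):

from datetime import timedelta

def todays_created_at_filters(created_at_col, start_of_today, include_end=False):
    # The end date bound is dropped by default: for today's data it is
    # always in the future, and the query performs better without it.
    filters = [created_at_col >= start_of_today]
    if include_end:  # the old shape, kept only for comparison
        filters.append(created_at_col < start_of_today + timedelta(days=1))
    return filters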
2021-05-26 13:47:53 +01:00
Rebecca Law
782514a0f1 Update the dao_fetch_todays_stats_for_service query.
We have an index on Notifications(service_id, created_at). By updating the query to filter on a created_at range rather than date(created_at), the query can use the index, changing the plan from a parallel sequential scan to a bitmap index scan; see the query plans below.
This query is still rather slow but is improved by this update.

https://www.pivotaltracker.com/story/show/178263480

explain analyze
SELECT notification_type, notification_status, count(id)
FROM notifications
WHERE service_id = 'e791dbd4-09ea-413a-b773-ead8728ddb09'
AND date(created_at) = '2021-05-23'
AND key_type != 'test'
GROUP BY notification_type, notification_status;
                                                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=6326816.31..6326926.48 rows=24 width=22) (actual time=91666.805..91712.976 rows=10 loops=1)
   Group Key: notification_type, notification_status
   ->  Gather Merge  (cost=6326816.31..6326925.88 rows=48 width=22) (actual time=91666.712..91712.962 rows=30 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=6325816.28..6325920.31 rows=24 width=22) (actual time=91662.907..91707.027 rows=10 loops=3)
               Group Key: notification_type, notification_status
               ->  Sort  (cost=6325816.28..6325842.23 rows=10379 width=30) (actual time=91635.890..91676.225 rows=270884 loops=3)
                     Sort Key: notification_type, notification_status
                     Sort Method: external merge  Disk: 10584kB
                     Worker 0:  Sort Method: external merge  Disk: 10648kB
                     Worker 1:  Sort Method: external merge  Disk: 10696kB
                     ->  Parallel Seq Scan on notifications  (cost=0.00..6325123.93 rows=10379 width=30) (actual time=0.036..91513.985 rows=270884 loops=3)
                           Filter: (((key_type)::text <> 'test'::text) AND (service_id = 'e791dbd4-09ea-413a-b773-ead8728ddb09'::uuid) AND (date(created_at) = '2021-05-23'::date))
                           Rows Removed by Filter: 16191366
 Planning Time: 0.760 ms
 Execution Time: 91714.500 ms
(17 rows)

explain analyze
SELECT notification_type, notification_status, count(id)
FROM notifications
WHERE service_id = 'e791dbd4-09ea-413a-b773-ead8728ddb09'
AND created_at  >= '2021-05-22 23:00'
AND created_at < '2021-05-23 23:00'
AND key_type != 'test'
GROUP BY notification_type, notification_status;
                                                                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=2114273.37..2114279.57 rows=24 width=22) (actual time=21032.076..21035.725 rows=10 loops=1)
   Group Key: notification_type, notification_status
   ->  Gather Merge  (cost=2114273.37..2114278.97 rows=48 width=22) (actual time=21032.056..21035.703 rows=30 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=2113273.35..2113273.41 rows=24 width=22) (actual time=21029.261..21029.265 rows=10 loops=3)
               Sort Key: notification_type, notification_status
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 25kB
               Worker 1:  Sort Method: quicksort  Memory: 25kB
               ->  Partial HashAggregate  (cost=2113272.56..2113272.80 rows=24 width=22) (actual time=21029.228..21029.230 rows=10 loops=3)
                     Group Key: notification_type, notification_status
                     ->  Parallel Bitmap Heap Scan on notifications  (cost=114455.71..2111695.14 rows=210322 width=30) (actual time=4983.790..20960.581 rows=271217 loops=3)
                           Recheck Cond: ((service_id = 'e791dbd4-09ea-413a-b773-ead8728ddb09'::uuid) AND (created_at >= '2021-05-22 23:00:00'::timestamp without time zone) AND (created_at < '2021-05-23 23:00:00'::timestamp without time zone))
                           Rows Removed by Index Recheck: 1456269
                           Filter: ((key_type)::text <> 'test'::text)
                           Heap Blocks: exact=12330 lossy=123418
                           ->  Bitmap Index Scan on ix_notifications_service_created_at  (cost=0.00..114329.51 rows=543116 width=0) (actual time=4973.139..4973.140 rows=813671 loops=1)
                                 Index Cond: ((service_id = 'e791dbd4-09ea-413a-b773-ead8728ddb09'::uuid) AND (created_at >= '2021-05-22 23:00:00'::timestamp without time zone) AND (created_at < '2021-05-23 23:00:00'::timestamp without time zone))
 Planning Time: 0.191 ms
 Execution Time: 21035.770 ms
(21 rows)
2021-05-25 08:00:24 +01:00
Leo Hemsted
c190886bfe tweak webauthn rest errors
simplify logic by changing the dao function to require a user id and a
webauthn cred id. Note that this changes the response from a 400 to a
404 if the cred is for a different user than the supplied id.

give a minimum length to the text fields in POSTs to create/update a
credential, to avoid surprising edge cases involving empty string
names etc.
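A rough sketch of the dao shape this describes, assuming a Flask-SQLAlchemy WebauthnCredential model (names assumed):

from sqlalchemy.orm.exc import NoResultFound  # surfaced as a 404 by the API layer

def dao_get_webauthn_credential_by_user_and_id(user_id, webauthn_credential_id):
    # Scoping the lookup by both ids means a credential that belongs to
    # a different user is simply not found (404), rather than a 400.
    return WebauthnCredential.query.filter_by(
        user_id=user_id,
        id=webauthn_credential_id,
    ).one()  # raises NoResultFound if no match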
2021-05-12 17:48:38 +01:00
Leo Hemsted
e62e050963 add webauthn crud endpoints
added some simple validation to the delete endpoint for sanity, but
generally my assumption is that more validation will happen on the admin
side.

notably I'm not checking whether the credentials are duplicated, nor is
there a uniqueness constraint in the database - I'm not sure the
credential blob will always be reliably equivalent, and I believe the
browser should take care of dupes.
2021-05-12 17:48:37 +01:00
Katie Smith
1767535def Allow service.allowed_broadcast_provider to be "all"
We want to replace the value `None` for
service.allowed_broadcast_provider with the value of "all". As a first
step, we need to allow both values. Once notifications-admin has been
changed to pass through "all" and all the data in the database has been
updated, we can update the code to stop supporting both values.
2021-05-06 15:32:02 +01:00
Katie Smith
46fe3fca23 Merge pull request #3230 from alphagov/zipfile-names
Change letter zip file names for Insolvency Service letters
2021-05-06 13:57:18 +01:00
Katie Smith
8a34dccda0 Remove redundant join
This was left over from when we needed to tell if a notification was
sent by a crown or non-crown service.
2021-05-06 09:34:46 +01:00
Ben Thorner
bd45d788c0 Increase warning threshold for SMS failures
Second attempt [1]. This increases the threshold so:

- It's a more substantial amount of money lost (£16).

- It's 10% of the minimum free allowance for a service.

- It's greater than the threshold we have for TV numbers (500).

Having a higher threshold for this alert will help prevent wasted
effort investigating more negligible failures, and reduces the
ambiguity of whether we should take action: we should.

[1]: https://github.com/alphagov/notifications-api/pull/3221
2021-05-05 17:54:43 +01:00
Rebecca Law
4f196316aa Change the query to get the services to purge to use query on the db.Model rather than db.session.query.
`service_ids_to_purge` is a list of `row` objects rather than a list of `UUID`s.

NOTE: db.session.query(Service).filter(Service.id.notin_(services_with_data_retention)).all() would also have worked. It seems that selecting only attributes from the db.Model is what caused the change.
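For illustration, the two query shapes (assumed from the note above):

# Selecting a column via db.session.query returns row tuples:
service_ids = db.session.query(Service.id).filter(
    Service.id.notin_(services_with_data_retention)
).all()  # -> [(UUID,), ...], not plain UUIDs

# Querying the model returns Service instances, so .id is a plain UUID:
services_to_purge = Service.query.filter(
    Service.id.notin_(services_with_data_retention)
).all()
service_ids_to_purge = [service.id for service in services_to_purge]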
2021-04-29 13:32:36 +01:00
Rebecca Law
68d28aa83b The update to SQLAlchemy 1.4.10 has caused some conflicts in our code. This PR fixes most of those conflicts.
- sqlalchemy.sql.expression.case must include an else statement (sketched below).
- clearly define the list of columns for the inbound_sms_history insert; getting the list from InboundSmsHistory.__table__.c was causing data type errors.
- remove relationships when not needed; the foreign key relationship is established in the creation of the column. This will get rid of the warnings referenced here: http://sqlalche.me/e/14/qzyx.
- update queries now that the user relationship in the ServiceUser db model has been removed.
- move the check that a template is archived to the view instead of the dao method. The check was clearing the session before the version history could be done.

Deleting notifications in the night tasks still needs to be
investigated. The raw sql is causing an error.
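A sketch of the case() change (SQLAlchemy 1.4 positional whens; the column is assumed):

from sqlalchemy import case

# 1.3 style, without an else:
# case([(Notification.status == "created", "pending")])

# 1.4 style, with the explicit else_ our queries now need:
status_label = case(
    (Notification.status == "created", "pending"),
    else_=Notification.status,
)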
2021-04-29 13:32:36 +01:00
Rebecca Law
85895a9e8b Revert "Scheduled weekly dependency update for week 16" 2021-04-28 10:17:16 +01:00
Rebecca Law
f941768d8c Change the query to get the services to purge to use query on the db.Model rather than db.session.query.
`service_ids_to_purge` is a list of `row` objects rather than a list of `UUID`s.

NOTE: db.session.query(Service).filter(Service.id.notin_(services_with_data_retention)).all() would also have worked. It seems that selecting only attributes from the db.Model is what caused the change.
2021-04-27 08:36:34 +01:00
Rebecca Law
1b070d69a1 The update to SQLAlchemy 1.4.10 has caused some conflicts in our code. This PR fixes most of those conflicts.
- sqlalchemy.sql.expression.case must include an else statement.
- clearly define the list of columns for the inbound_sms_history insert; getting the list from InboundSmsHistory.__table__.c was causing data type errors.
- remove relationships when not needed; the foreign key relationship is established in the creation of the column. This will get rid of the warnings referenced here: http://sqlalche.me/e/14/qzyx.
- update queries now that the user relationship in the ServiceUser db model has been removed.
- move the check that a template is archived to the view instead of the dao method. The check was clearing the session before the version history could be done.

Deleting notifications in the night tasks still needs to be
investigated. The raw sql is causing an error.
2021-04-26 11:50:30 +01:00
Rebecca Law
ae57521b39 Simplify get_free_sms_fragment_limit for the case when the row is
missing, by setting the free allowance to the default.
2021-04-19 13:29:04 +01:00
Rebecca Law
d4009ffc52 Rename database management functions.
Rename @transactional to @autocommit.
Rename nested_transaction to transaction.
2021-04-19 10:56:00 +01:00
Rebecca Law
93908bacda New strategy for transaction management.
Introduce a context manager function to handle exceptions and nested
transactions. Using nested_transaction will start a
nested transaction with `db.session.begin_nested`; once the nested
transaction is complete, the commit will happen.
`@transactional` has been updated to commit unless in a nested
transaction.
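A minimal sketch of that context manager, assuming `db` is the Flask-SQLAlchemy handle (the real implementation may differ):

from contextlib import contextmanager

@contextmanager
def nested_transaction():
    try:
        with db.session.begin_nested():  # opens a savepoint
            yield  # savepoint is released on clean exit
        db.session.commit()
    except Exception:
        db.session.rollback()
        raise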
2021-04-14 07:04:17 +01:00
Rebecca Law
cf35135605 Adding @nested_transactional for transactions that require more than one
db update/insert.

Using a savepoint for the multiple transactions allows us to roll back if
there is an error when executing the second db transaction.
However, this does add a bit of complexity. Developers need to manage
the db session when calling multiple nested transactions.

Unit tests have been added to test this functionality, and some end to
end tests have been done to make sure all transactions are rolled back if
there is an exception while executing the transaction.
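Hypothetical usage, with invented dao calls, using a context manager like the one sketched above:

with nested_transaction():
    dao_update_service(service)         # first db update
    dao_insert_annual_billing(service)  # an exception here rolls back
                                        # the first update too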
2021-04-14 07:03:57 +01:00
Rebecca Law
69e5ddae4f When a service is associated with an organisation, set the free allowance to
the default free allowance for the organisation type.

The update/insert for the default free allowance is done in a separate
transaction. Updates to services need to happen in a transaction to
trigger the insert into the ServicesHistory table. For that reason the
call to set_default_free_allowance_for_service is done after the service
is updated.
I've added a try/except around the set_default_free_allowance_for_service call so we still get the update to the service, but log an exception if the update to annual_billing fails. I believe it's important to preserve the update to the service in the unlikely event that the annual_billing upsert fails.
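The ordering reads roughly like this (set_default_free_allowance_for_service is named above; the surrounding dao call is invented):

from flask import current_app

def update_service_and_allowance(service, **kwargs):
    dao_update_service(service, **kwargs)  # committed first: ServicesHistory row is written
    try:
        set_default_free_allowance_for_service(service)
    except Exception:
        # Preserve the service update; just log if the annual_billing
        # upsert fails.
        current_app.logger.exception(
            "Failed to set default free allowance for service %s", service.id
        )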
2021-04-14 07:03:57 +01:00
Rebecca Law
da8a7a8db1 Avoid key errors by setting the year_start to 2020 or 2021
Remove db.create_service_with_organisation method
Update comment in command
2021-03-30 09:08:04 +01:00
Rebecca Law
7da5abc17b The free sms allowances are changing for the financial year starting
April 1 2021.

In this PR there is a command to set annual_billing for all active
services with the new defaults.

The new method `set_default_free_allowance_for_service` will also be
called in a PR to follow that will set a service's free allowance to the
default if the organisation for the service is changed.
2021-03-29 13:32:00 +01:00
Rebecca Law
057c4e4568 Quick fix to ensure that billing doesn't fail if the crown is not set
for the service.

The letter rates for crown and non-crown are the same. It would be nice
to remove the need for crown, but for now this is a quick fix.
2021-03-25 08:42:46 +00:00
Pea Tyczynska
4c3d70fd55 Update usage endpoint with billing details for orgs and services 2021-03-19 16:49:48 +00:00
Ben Thorner
c76e789f1e Reduce extra S3 ops when working with letter PDFs
Previously we did some unnecessary work:

- Collate task. This had one S3 request to get a summary of the object,
which was then used in another request to get the full object. We only
need the size of the object, which is included in the summary [1].

- Archive task. This had one S3 request to get a summary of the object,
which was then used to make another request to delete it. We still need
both requests, but we can remove the S3.Object in the middle.

[1]: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#objectsummary
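For the collate case, the saved request looks roughly like this (bucket and key are placeholders):

import boto3

s3 = boto3.resource("s3")

def letter_pdf_size(bucket_name, key):
    # ObjectSummary already carries the size, so there is no need to
    # fetch the full object just to measure it.
    return s3.ObjectSummary(bucket_name, key).size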
2021-03-16 12:53:13 +00:00
Ben Thorner
ff7eebc90a Simplify deleting old letters
Previously we made a call to S3 to list objects for a letter, even
though we already had the precise key of the single object to hand.
This removes the one usage of "get_s3_bucket_objects" and uses the
filename directly in the call to remove the object.
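A sketch of the simplified delete (bucket and filename are placeholders):

import boto3

s3 = boto3.resource("s3")

def remove_old_letter(bucket_name, filename):
    # No list call needed: delete the object at the known key directly.
    s3.Object(bucket_name, filename).delete()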
2021-03-15 17:18:20 +00:00
Leo Hemsted
6784ae62a6 Raise Exception if letter PDF not in S3
Previously, the function would just return a presumed filename. Now that
it actually checks s3, if the file doesn't exist it'll raise an
exception. By default that's a StopIteration at the end of the bucket
iterator, which isn't ideal as this will get supressed if the function
is called within a generator loop further up or anything.

There are a couple of places where we expect the file may not exist, so
we define a custom exception to rescue specifically here. I did consider
subclassing boto's ClientError, but this wasn't straightforward as the
constructor expects to know the operation that failed, which for me is a
signal that it's not an appropriate (re-)use of the class.
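A sketch of that exception handling (names assumed; bucket is a boto3 Bucket resource):

class LetterPDFNotFound(Exception):
    """Raised when no letter PDF exists for the expected prefix."""

def get_letter_pdf_key(bucket, prefix):
    try:
        return next(iter(bucket.objects.filter(Prefix=prefix))).key
    except StopIteration:
        # Translate iterator exhaustion into a domain exception so it
        # can't be silently swallowed by an enclosing generator.
        raise LetterPDFNotFound(prefix)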
2021-03-15 17:18:11 +00:00
Ben Thorner
b43a367d5f Relax lookup of letter PDFs in S3 buckets
Previously we generated the filename we expected a letter PDF to be
stored at in S3, and used that to retrieve it. However, the generated
filename can change over the course of a notification's lifetime e.g.
if the service changes from crown ('.C.') to non-crown ('.N.').

The prefix of the filename is stable: it's based on properties of the
notification - reference and creation - that don't change. This commit
changes the way we interact with letter PDFs in S3:

- Uploading uses the original method to generate the full file name.
The method is renamed to 'generate_' to distinguish it from the new one.

- Downloading uses a new 'find_' method to get the filename using just
its prefix, which makes it agnostic to changes in the filename suffix.

Making this change helps to decouple our code from the requirements DVLA
have on the filenames. While it means more traffic to S3, we rely on S3
in any case to download the files. From experience, we know S3 is highly
reliable and performant, so don't anticipate any issues.

In the tests we favour using moto to mock S3, so that the behaviour is
realistic. There are a couple of places where we just mock the method,
since what it returns isn't important for the test.

Note that, since the new method requires a notification object, we need
to change a query in one place, the columns of which were only selected
to appease the original method to generate a filename.
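A sketch of the generate_/find_ split (the filename format here is illustrative, not DVLA's exact spec):

def generate_letter_pdf_filename(reference, created_at, crown):
    # Full name, including the mutable crown/non-crown marker.
    crown_flag = "C" if crown else "N"
    return f"{created_at:%Y-%m-%d}/NOTIFY.{reference}.D.2.{crown_flag}.pdf"

def find_letter_pdf_filename(bucket, reference, created_at):
    # Look up by the stable prefix only, so suffix changes (e.g. crown
    # to non-crown) don't break retrieval.
    prefix = f"{created_at:%Y-%m-%d}/NOTIFY.{reference}"
    return next(obj.key for obj in bucket.objects.filter(Prefix=prefix))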
2021-03-15 13:55:44 +00:00
David McDonald
41d95378ea Remove everything for the performance platform
We will no longer send them any stats, so we don't need the code:
- the code to work out the nightly stats
- the performance platform client
- any configuration for the client
- any nightly tasks that kick off the sending of the stats

We will require a change in cronitor: as we will no longer have this task
run, we need to delete the cronitor check.
2021-03-15 12:04:53 +00:00
Leo Hemsted
ebd4eda8bd remove duplicate dao invite fns and improve naming 2021-03-12 13:56:05 +00:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/ 2021-03-12 11:45:45 +00:00
David McDonald
a2cc0df5a7 Merge pull request #3167 from alphagov/broadcast_services_history
Add service versioning to broadcast account type change
2021-03-11 17:07:24 +00:00
Rebecca Law
19f7a6ce38 Refactor method for deciding the failure type 2021-03-10 14:39:55 +00:00
Rebecca Law
a7a504a599 Merge pull request #3173 from alphagov/performance-platform-endpoints
Add an endpoint to return all the data required for the performance platform page
2021-03-10 13:27:08 +00:00
Rebecca Law
11d10d5293 Rename to performance_dashboard
Fix totals to return totals for all time rather than for the date range.
Added more test data
2021-03-10 13:16:25 +00:00
David McDonald
8cf32d6f22 Add service versioning to broadcast account type change
We are using the `set_broadcast_service_type` route to make changes to
service objects. However, we had forgotten to add the `version_class`
decorator to it. Adding it means the change of a service going from
training to live mode will also be recorded in the services_history
table for free. Whilst not essential, this easy change makes things more
consistent with how we update other services.
2021-03-08 14:09:24 +00:00
David McDonald
6b535fe946 Merge pull request #3166 from alphagov/email-auth-broadcast-bug
Email auth broadcast bug
2021-03-05 09:59:59 +00:00
Rebecca Law
b06850e611 Add an endpoint to return all the data required for the performance
platform page.
2021-03-05 09:59:03 +00:00
David McDonald
0ce539704e Fix bug with removing email auth for broadcast service
We were accidentally removing the ability for a service to do email auth
if it was a broadcast service with email auth. This fixes it.

Note, it might be up for debate later whether we let broadcast services
use email auth (I think we should) so this might change in time, but we
will fix this bug regardless.

Note, it's worth glancing at `SERVICE_PERMISSION_TYPES`, which contains a
list of permissions that a service might have, to make sure I haven't
missed any others. The one that looks potentially dodgy is the
`EDIT_FOLDER_PERMISSIONS` permission, but I can't see it being used
anywhere in the API or the admin app, so I think it is likely now defunct
(and a user-level permission anyway), so we don't need to worry about it.
2021-03-03 18:34:24 +00:00
Rebecca Law
0849070eca Add created_at and updated_at columns to ft_processing_time 2021-02-26 07:49:49 +00:00
Rebecca Law
21edf7bfdd Persist the processing time statistics to the database.
The performance platform is going away soon. The only stat that we do not have in our database is the processing time; to be precise, it is the only statistic we don't have that we can query efficiently. Any queries on notification_history are too inefficient to use on a web page.
Processing time = per whole day, the total number of normal/team emails and text messages, plus the number of those messages that have gone from created to sending within 10 seconds. We can then easily calculate the percentage of messages that were marked as sending in under 10 seconds.
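The derived percentage is then simple arithmetic (names illustrative):

def processing_time_percentage(messages_total, messages_within_10_secs):
    # e.g. 950,000 of 1,000,000 messages reaching "sending" within
    # 10 seconds gives 95.0
    return round(100 * messages_within_10_secs / messages_total, 1)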
2021-02-26 07:49:49 +00:00
Pea Tyczynska
e0c73ac342 Send daily email with letter and sheet volumes to DVLA 2021-02-23 15:13:19 +00:00
Pea Tyczynska
c8ffebcce8 Query to get letter and sheet volumes
So we can send a daily email with these volumes to DVLA.
2021-02-23 15:13:18 +00:00
David McDonald
abb3b3307c Fix flake8 2021-02-16 10:31:12 +00:00
David McDonald
6fcda6debb Make set_as_broadcast_service use a single DB commit
We don't want things in a half state if there is an error during the
method. Therefore, we move it all into a single function that is wrapped
in a transaction.

Note, we copy the approach of
https://github.com/alphagov/notifications-api/blob/master/app/dao/services_dao.py#L293
by having a single new dao function that does all the DB work.
2021-02-16 10:31:11 +00:00
David McDonald
4f7afa3fbe Set provider restriction 2021-02-16 10:31:08 +00:00
David McDonald
3b5d86c854 Add endpoint to set broadcast service channel 2021-02-16 10:31:01 +00:00
Leo Hemsted
4f89be6944 Revert "Merge pull request #3125 from alphagov/revert-retry"
This reverts commit 6b9a50beff, reversing
changes made to 33f93dfea2.
2021-02-09 17:01:04 +00:00
Leo Hemsted
bee0059e53 Revert "Merge pull request #3101 from alphagov/retry-broadcasts"
This reverts commit 1bd99c779d, reversing
changes made to d390eb2cac.
2021-02-08 11:02:34 +00:00
Leo Hemsted
bbae209200 check provider message status etc when sending rather than when retrying
previously, the decision whether to retry or not was made at retry time,
which meant that future events wouldn't have context of what the task
was doing. We'd run into issues with not knowing which references to
include when updating/cancelling in future events.

Instead of deciding whether to retry or not, always retry. When any
event sends, regardless of whether it is a first attempt or a retry,
check the status of previous events for that broadcast message. There
are a few things that will mean we don't send (sketched after this
list):

* If the finishes_at time has already elapsed (ie: we have been trying
  to resend this message and haven't had any luck and now the data is
  obsolete)
* A previous event has no provider message (this means that we never
  picked the previous event off the queue for some reason)
* A previous event has a provider message that has anything other than
  an ack response. This includes sending (the old message is currently
  being sent), and technical-failure/returned-error (the old message is
  currently in the retry loop, having experienced issues).
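A sketch of those checks, with plain-dict stand-ins for the ORM models (names assumed):

def ok_to_send(previous_events, finishes_at, now):
    # The finishes_at time has already elapsed: the data is obsolete.
    if finishes_at <= now:
        return False
    for event in previous_events:
        message = event.get("provider_message")
        # No provider message: the earlier event was never picked off
        # the queue, so don't leapfrog it.
        if message is None:
            return False
        # Anything other than an ack (still sending, or retrying after
        # an error) also blocks sending.
        if message["status"] != "ack":
            return False
    return True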
2021-02-03 18:11:52 +00:00
Leo Hemsted
96a0935d1c update broadcast provider message status on success/error
so we can distinguish erroring messages that are currently retrying
from those that sent successfully.
2021-02-03 18:03:16 +00:00