Commit Graph

3329 Commits

Author SHA1 Message Date
Leo Hemsted
f5198bf71d remove unnecessary job_types arg from remove_csv_files celery tasks 2019-01-22 10:31:37 +00:00
Leo Hemsted
e1760adcd3 suppress cronitor request errors 2019-01-18 15:36:53 +00:00
Leo Hemsted
754c65a6a2 create cronitor decorator that alerts if tasks fail
make a decorator that pings cronitor before and after each task run.
Designed for use with nightly tasks, so we have visibility if they
fail. We have a bunch of cronitor monitors set up - 5 character keys
that go into a URL that we then make a GET to with a self-explanatory
url path (run/fail/complete).

the cronitor URLs are defined in the credentials repo as a dictionary
of celery task names to URL slugs. If the name passed in to the
decorator  isn't in that dict, it won't run.

to use it, all you need to do is call `@cronitor(my_task_name)`
instead of `@notify_celery.task`, and make sure that the task name and
the matching slug are included in the credentials repo (or locally,
json dumped and stored in the CRONITOR_KEYS environment variable)
2019-01-18 15:36:53 +00:00
Leo Hemsted
d3d56a3224 separate nightly tasks and other scheduled tasks.
other tasks is anything that is run on a different frequency than
nightly
2019-01-18 15:36:53 +00:00
Rebecca Law
39c9b86c35 Merge pull request #2310 from alphagov/remove-used-query
Remove unused query
2019-01-18 10:47:45 +00:00
Pea (Malgorzata Tyczynska)
64191a93c2 Merge pull request #2308 from alphagov/precompiled_postage_response
Return notification postage in response for .post_precompiled_letter_notification
2019-01-18 10:19:46 +00:00
Rebecca Law
6ac1f39fd0 Remove dao_fetch_monthly_historical_stats_by_template, a query using NotificationHistory that is no longer used. 2019-01-17 17:20:21 +00:00
Rebecca Law
8d36d72494 Merge pull request #2306 from alphagov/drop-stats_template_usage_by_month
Drop stats_template_usage_by_month table as it is no longer needed.
2019-01-17 15:39:35 +00:00
Alexey Bezhan
90dc69b6bc Merge pull request #2303 from alphagov/ft-status-template-statistics
Change template statistics endpoint to use fact_notification_status_dao
2019-01-17 15:04:20 +00:00
Pea Tyczynska
9ab97d3481 Return notification postage in response for .post_precompiled_letter_notification 2019-01-16 16:57:57 +00:00
Pea (Malgorzata Tyczynska)
276a9a3828 Merge pull request #2293 from alphagov/choose_postage_for_precompiled
Choose postage on POST request for precompiled letters
2019-01-16 14:13:26 +00:00
Rebecca Law
e148eca6ff Drop stats_template_usage_by_month table as it is no longer needed. 2019-01-15 16:55:56 +00:00
Rebecca Law
a4d89359c5 Adding a filter to exclude test keys for the template monthly usage query.
Added a test.
2019-01-15 16:13:38 +00:00
Pea Tyczynska
ac3832a918 Remove old redis template cache 2019-01-15 14:46:40 +00:00
Pea Tyczynska
d36c4d8a78 Remove now unused methods that populated template usage redis cache 2019-01-15 14:38:45 +00:00
Pea Tyczynska
3ce0024eec Remove unused functions for getting template statistics 2019-01-15 12:15:20 +00:00
Pea Tyczynska
52831813d8 Change template statistics endpoint to use fact_notification_status_dao 2019-01-15 11:55:45 +00:00
Pea Tyczynska
5ebeb9937a Avoid call to database to get template in persist_notifications 2019-01-14 17:53:06 +00:00
Alexey Bezhan
876346f469 Add an option to group notification stats for 7 days by template
Currently, admin app requests service statistics (with notification
counts grouped by status) and template statistics (with counts by
template) in order to display the service dashboard.

Service statistics are gathered from FactNotificationStatus table
(counts for the last 7 days) combined with Notification (counts for
today).

Template statistics are currently gathered from redis cache, which
contains a separate counter per template per day. It's hard for us
to maintain consistency between redis and DB counts. Currently it
doesn't update the count for cancelled letters, counter resets in
the middle of the day might produce a wrong result for the rest of
the week and cleared redis cache can't be repopulated for services
with low data retention periods).

Since FactNotificationStatus already contains separate counts for
each template_id we can use the existing logic with some additional
filters to get separate counts for each template and status combination,
which would allow us to populate the service dashboard page from one
query response.
2019-01-14 16:58:57 +00:00
Rebecca Law
efad58edd8 There is no need to have a separate table to store template monthly statistics. It's easy enough to aggregate the stats from ft_notification_status.
This removes the nightly task, and all the dao methods.
The next PR will remove the table.
2019-01-14 16:30:36 +00:00
Rebecca Law
79f49ebdc2 Merge pull request #2298 from alphagov/use-ft-notification-statu-for-monthly-template-usage
Use ft_notification_status for monthly template usage
2019-01-14 15:43:41 +00:00
Rebecca Law
b5a3ef9576 Added order by.
Added more unit tests.
Remove comments.
2019-01-14 15:28:26 +00:00
Katie Smith
4f917067a5 Merge pull request #2295 from alphagov/move-unopenable-precompiled-letters
Move letters which can't be opened to invalid PDF bucket
2019-01-14 12:56:45 +00:00
Alexey Bezhan
5336a72d0f Merge pull request #2291 from alphagov/set-db-statement-timeout
Set statement timeout on all DB connections
2019-01-14 10:24:07 +00:00
Rebecca Law
92460fbd38 Merge branch 'master' into use-ft-notification-statu-for-monthly-template-usage 2019-01-11 17:11:23 +00:00
Rebecca Law
c3c9d1eac9 Add unit tests.
Fix data types in result set.
2019-01-11 17:09:42 +00:00
Katie Smith
a9b755b08c Move letters which can't be opened to invalid PDF bucket
If a precompiled letter can't be opened (e.g. because it isn't a valid
PDF) we were setting its billable units to 0, but not moving it to the
invalid PDF bucket. If a precompiled letter failed sanitisation, we were
moving it to the invalid PDF bucket but not setting its billable units
to 0.

This commit makes sure that we always set the billable units to 0
and move the PDF to the right bucket if it fails sanitisation or can't be
opened.
2019-01-11 16:59:07 +00:00
Pea (Malgorzata Tyczynska)
cfb55daa12 Merge pull request #2292 from alphagov/exclude_cancelled_from_requested
Remove cancelled from requested statuses in service statistics
2019-01-11 15:01:20 +00:00
Pea Tyczynska
bd5126481c Remove cancelled from requested statuses in service statistics 2019-01-11 14:27:54 +00:00
Pea Tyczynska
685bff40d1 Stop validate function from being too complex by moving subfunctions out of it 2019-01-10 17:31:32 +00:00
Rebecca Law
507138cc94 Create a new query for template monthly stats. 2019-01-10 16:24:51 +00:00
Leo Hemsted
20fe055ac9 remove unused usage functions 2019-01-10 16:08:15 +00:00
Pea Tyczynska
5a1094b6fd Throw error if postage parameter for precompiled POST request incorrect 2019-01-10 16:04:06 +00:00
Pea (Malgorzata Tyczynska)
c5e5c86982 Merge pull request #2290 from alphagov/its_not_a_failure_to_cancel
Cancelled notifications don't show as failures in statistics
2019-01-10 15:09:30 +00:00
Pea Tyczynska
56bae2b077 Allow users to set postage per precompiled letter 2019-01-09 17:49:19 +00:00
Alexey Bezhan
4a26ee1813 Set statement timeout on all DB connections
A recent issue with a long-running query (#2288) highlighted the
fact that even though the original HTTP connection might be closed
(for example after gorouter timeout of 15 minutes, which returns a
504 response to the client), the request worker will not be stopped.

This means that the worker is spending time and potentially DB
resources generating a response that will never be delivered.

Gunicorn's timeout setting only applies to sync workers and there
doesn't seem to be an option to interrupt individual requests in
gevent/eventlet deployments.

Since the most likely (and potentially most dangerous) scenario for
this is a long-running DB query, we can set a statement timeout on
our DB connections. This will raise a sqlalchemy.exc.OperationalError
(wrapping psycopg2.extensions.QueryCanceledError), interrupting the
request after the given timeout has been reached.

This is a Postgres client setting, so the database itself will abort
the transaction when it reaches the set timeout.

Since this will also apply to our celery tasks (including potentially
long-running nightly tasks) we set a timeout of 20 minutes to begin
with.

This can potentially be split in the future to set a different value
for each app, so that we could limit API requests even more.
2019-01-09 14:36:50 +00:00
Alexey Bezhan
1719f31909 Merge pull request #2284 from alphagov/add-count-pages-flag-to-service-notifications
Don't return pagination links for API Message log requests
2019-01-09 14:36:36 +00:00
Leo Hemsted
3e21f57481 fix platform admin stats row-order bug
now that we're reading from two tables (ft_notification_status and
notifications) for stats, we'll get a couple of rows for each
notification type. If a service doesn't have any rows in one of those
tables, the query will return a row with nulls for the notification
types and counts. Some services will have history but no stats from
today, others will have data from today but no history.

This commit acknowledges that any row might have nulls, not just the
first row.
2019-01-09 11:43:40 +00:00
Leo Hemsted
5d838415d3 fix filter to look at right table
a query for notifications was filtering on FtNotificationStatus - we
aren't joining to that table in the query, so sqlalchemy added a cross
join between ft_notification_status (3.7k rows) and Notifications (3.9m
rows), resulting in a 1.3 trillion row materialised table. This query
took 17 hours and pending.

Also, remove orders from querys other than the outer one, since we're
grouping anyway.
2019-01-09 11:26:08 +00:00
Pea Tyczynska
e3a79e80c9 Cancelled notifications don't show as failures in statistics 2019-01-08 17:50:34 +00:00
Alexey Bezhan
47c403f6ab Don't return pagination links for API Message log requests
Flask-SQLAlchemy paginate function issues a separate query to get
the total count of rows for a given filter. This query (with
filters used by the API integration Message log page) is slow for
services with large number of notifications.

Since Message log page doesn't actually allow users to paginate
through the response (it only shows the last 50 messages) we can
use limit instead of paginate, which requires passing in another
flag from admin to the dao method.

`count` flag has been added to `paginate` in March 2018, however
there was no release of flask-sqlalchemy since then, so we need
to pull the dev version of the package from Github.
2019-01-08 13:22:27 +00:00
Rebecca Law
8fbe60bb90 Remove unused query 2019-01-07 15:37:26 +00:00
Rebecca Law
4522547753 Merge branch 'master' into optimise-platform-admin-live-services 2019-01-07 15:25:50 +00:00
Rebecca Law
bd9a6352fd Optimise the query for getting the platform statistics for all services. The page should render for all time after this change.
This is one step closer to eliminating the need to read from NotificationHistory.
2019-01-04 16:45:39 +00:00
Leo Hemsted
021625abb3 make sure log line works if notification still in created 2019-01-03 17:08:17 +00:00
Leo Hemsted
2355ee011f log more info when we receive multiple delivery callbacks for one notification
Previously, we logged a warning containing the notification reference
and new status. However it wasn't a great message - this new one
includes the notification id, the old status, the time difference and
more.

This separates out logs for callbacks for notifications we don't know
(error level) and duplicates (info level).
2019-01-03 17:08:16 +00:00
Pea (Malgorzata Tyczynska)
60a384e687 Merge pull request #2269 from alphagov/choose_postage_on_template
Choose postage on template
2019-01-02 14:16:38 +00:00
Rebecca Law
39963d9784 Created a query to get the notification status counts per notification type and service for all service for a given date range.
The query follows the same pattern as the other queries, getting the statistics from the fact_notification_status table for dates older than today and union that with today.
Tests required.
2018-12-31 16:08:08 +00:00
Rebecca Law
941e14f71a Added the limit to the query for the services with data retention.
Also did a bit of refactoring.
2018-12-27 14:00:53 +00:00
Pea Tyczynska
4929a6ac08 Include postage in checking if template changed 2018-12-21 16:37:52 +00:00