Commit Graph

3246 Commits

Author SHA1 Message Date
David McDonald
946ba993b5 Catch TokenAlgorithmError
Instead of letting it go uncaught and causing an error, we now show the
user an appropriate error message.
2019-12-12 10:23:28 +00:00
David McDonald
c17c9ad1c6 Merge pull request #2673 from alphagov/billing-bug
Bill `NOTIFICATION_PENDING` notifications
2019-12-11 13:35:25 +00:00
Pea Tyczynska
c00f82b81b Co-Authored-By: Chris Hill-Scott <me@quis.cc>
Use .format instead of concatenation to avoid type issues

Trying to concatenate uuid onto a string was throwing an error.

Also, it seems it is not possible to use uuid in parametrize statements,
as it interferes with running tests on multiple threads
2019-12-11 11:18:42 +00:00
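
A minimal sketch of the type issue described in c00f82b81b, assuming a hypothetical `build_url` helper and URL pattern (neither is taken from the repository). Concatenating a `uuid.UUID` onto a `str` raises `TypeError`, whereas `.format` converts the value to its string form:

```python
import uuid


def build_url(service_id):
    # "/service/" + service_id would raise TypeError when service_id is a UUID;
    # str.format calls str() on the value, so any id type is accepted.
    # The URL pattern is a hypothetical example, not the project's real route.
    return "/service/{}/notifications".format(service_id)


example_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(build_url(example_id))
```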
David McDonald
5438a4c126 Remove duplicate test
This test is testing the same things as
`test_fetch_billing_data_for_day_bills_correctly_for_status`
2019-12-10 10:18:00 +00:00
David McDonald
fc8a9c184b Bill NOTIFICATION_PENDING notifications
SMS and emails may be marked as `NOTIFICATION_PENDING`. These will be
billed as they will have been sent to the provider and will eventually
turn to a final state such as `NOTIFICATION_DELIVERED` or
`NOTIFICATION_PERMANENT_FAILURE`.

This change will fix a discrepancy on the billing page where the number
of messages being billed was less than the number of messages reported
as sent on a service's dashboard when some of those messages were in a
pending state.

In reality, I don't think this bug would have caused any lasting
incorrect billing, as messages would not stay in the pending state for
too long and billing calculations would happen after that point.
2019-12-10 10:07:56 +00:00
Leo Hemsted
6ac4595224 process letters for 10 days when updating ft_notification_status
sms and emails have a very predictable 72 hour lifecycle. letters, on
the other hand, have ridiculously complex lifecycles - they might not
get sent because it's a weekend, they might not get sent because they're
second class and are only processed on alternate days, they might not
get sent because a different letter in the same batch had an error that
we didn't know about. Either way, it's apparent that four days is
definitely not enough time to guarantee that letters have gone from
sending to delivered.

Extend the amount of days we process for letters to 10 days. Keep emails
and sms down at 4 to keep run-times shorter

We're deliberately not thinking about returned letters here at all.
2019-12-09 16:02:43 +00:00
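
The per-type window from 6ac4595224 could be sketched roughly as follows. The names `DAYS_TO_PROCESS` and `days_to_process` are hypothetical, and starting the window from yesterday is an assumption; only the 10-day and 4-day figures come from the commit message.

```python
from datetime import date, timedelta

# Window sizes from the commit message; the dict itself is illustrative.
DAYS_TO_PROCESS = {"sms": 4, "email": 4, "letter": 10}


def days_to_process(today, notification_type):
    """Return the dates to process for a notification type, most recent first.

    Assumption: the window starts at yesterday, since today is still in progress.
    """
    window = DAYS_TO_PROCESS[notification_type]
    return [today - timedelta(days=offset) for offset in range(1, window + 1)]


letter_days = days_to_process(date(2019, 12, 9), "letter")
sms_days = days_to_process(date(2019, 12, 9), "sms")
```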
Leo Hemsted
884cb24bfa remove day_start from create nightly notification status
it makes less sense once we introduce different start dates for letters
and emails. Also, we never use it, since we just call the day tasks
ourselves from commands.py
2019-12-09 16:02:21 +00:00
Pea M. Tyczynska
2019070536 Merge pull request #2667 from alphagov/warn-team-about-high-failure-rates
Warn team about high failure rates
2019-12-09 11:28:25 +00:00
Pea Tyczynska
08b12a6443 Test that test key notifications are excluded from tv numbers query 2019-12-06 17:05:43 +00:00
Pea Tyczynska
87bc86efa7 Reference dev runbook for instructions in the zendesk ticket 2019-12-06 17:05:43 +00:00
Pea Tyczynska
1b7b26bf24 Query directly for services with high failure rate 2019-12-06 16:57:56 +00:00
Pea Tyczynska
b8de67ae54 Update error message to include a url to offending service 2019-12-06 16:57:54 +00:00
Pea Tyczynska
339b6c0ec7 Refactor a test so it doesn't do query that's tested elsewhere 2019-12-06 16:57:54 +00:00
Pea Tyczynska
cfbb080f57 Simplify failure rate by building separate query 2019-12-06 16:57:44 +00:00
Pea Tyczynska
53efd87e28 Check for services sending sms messages to tv numbers 2019-12-06 16:57:34 +00:00
Pea Tyczynska
d72ab4f4a6 Send zendesk ticket when services found with high failure rates 2019-12-06 16:57:04 +00:00
David McDonald
396108313a Merge pull request #2670 from alphagov/uploads-endpoint
Uploads endpoint
2019-12-06 14:40:15 +00:00
Rebecca Law
921b90cdec Add type=int to request.args.get: if the arg is an int it's returned, else None. This means we ignore the arg if it's the wrong data type and we don't need to handle the error. 2019-12-06 13:10:38 +00:00
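
The behaviour described in 921b90cdec can be illustrated with a stdlib-only stand-in. `get_arg` below is a hypothetical helper that mimics a lookup with a conversion callable; it is not Flask's or Werkzeug's implementation, and the `limit_days` key is just an example name.

```python
def get_arg(args, key, default=None, convert=None):
    """Look up key in args, optionally converting the value.

    If the key is missing or the conversion fails, return the default
    instead of raising, so callers never have to handle the error.
    """
    try:
        value = args[key]
    except KeyError:
        return default
    if convert is not None:
        try:
            value = convert(value)
        except ValueError:
            return default
    return value


valid = get_arg({"limit_days": "7"}, "limit_days", convert=int)
wrong_type = get_arg({"limit_days": "abc"}, "limit_days", convert=int)
missing = get_arg({}, "limit_days", convert=int)
```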
David McDonald
203e19bef3 Add uploads blueprint, the endpoint returns a combination of uploaded letters and jobs. The endpoint returns data about the uploaded letter or job, including notification statistics for the upload. The data is ordered by scheduled for and created_at.
It is likely this endpoint will need additional data for the UI to display; for the first iteration this will enable the /uploads page to show both letters and jobs. Only letters uploaded by the UI are included in the resultset.

Add file name to resultset.
2019-12-06 09:54:51 +00:00
Leo Hemsted
0448bca542 make create_nightly_notification_status_for_day take notification_type
the nightly task won't be affected, it'll just trigger three times more
sub-tasks.

this doesn't need to be a two-part deploy because we only trigger this
overnight, so as long as the deploy completes in daytime we don't need
to worry about celery task signatures
2019-12-05 14:43:33 +00:00
Leo Hemsted
30f361d318 fix flaky test 2019-12-05 14:43:33 +00:00
Leo Hemsted
8d160303a1 add transactional wrapper
and add case to get_notification_table_to_use test
2019-12-04 15:26:26 +00:00
Leo Hemsted
dd57468147 remove notification_type and service from create_ft_billing
they can both be inferred from the template, and specifying them just
leads to unnecessary risk of errors
2019-12-03 17:02:58 +00:00
Leo Hemsted
ed5a52fe0d make create_ft_billing ensure data is correct
that is, the template must belong to the named service, and the
template's template_type must match a provided notification_type
2019-12-03 16:05:48 +00:00
Leo Hemsted
d457db4164 make has_delete_task_run non-optional
just to ensure people think about the value of it when using the function
2019-12-03 14:19:14 +00:00
Leo Hemsted
d83827579e make ft billing nightly task only look at one table
follows same logic as the create_nightly_notification_status task, see previous commit
for logic
2019-12-03 14:19:13 +00:00
Leo Hemsted
913cf5e12d work out which table to get notification status data from
previously we checked notifications table, and if the results were
zero, checked the notification history table to see if there's data
in there. When we know that data isn't in notifications, we're still
checking. These queries take half a second per service, and we're
doing at least ten for each of the five thousand services we have in
notify. Most of these services have no data in either table for any
given day, and we can reduce the amount of queries we do by only
checking one table.

Check the data retention for a service, and then if the date is older
than the retention, get from history table.

NOTE: This requires that the delete tasks haven't run yet for the day!
If your retention is three days, this will look in the Notification
table for data from three days ago - expecting that shortly after the
task finishes, we'll delete that data.
2019-11-29 15:27:56 +00:00
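
The table choice described in 913cf5e12d might look something like the sketch below. The function name, its signature, and the string return values are assumptions made for illustration; the boundary follows the note in the commit message, where data exactly `retention_days` old is still read from the notifications table.

```python
from datetime import date, timedelta


def get_table_to_use(process_day, retention_days, today):
    """Pick the table to query for a given day (hypothetical helper).

    Relies on the delete tasks not having run yet for the day, so data
    exactly retention_days old is still present in the notifications table.
    """
    oldest_retained_day = today - timedelta(days=retention_days)
    if process_day < oldest_retained_day:
        return "notification_history"
    return "notifications"


today = date(2019, 11, 29)
three_days_ago = get_table_to_use(date(2019, 11, 26), 3, today)
four_days_ago = get_table_to_use(date(2019, 11, 25), 3, today)
```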
Leo Hemsted
6b9afa358f update utils to bring in full welsh diacritics range
note: this includes updating the MMG api url to their v2a api. Their
previous API doesn't include support for capital o with grave accent
(Ò)
2019-11-28 15:12:52 +00:00
Leo Hemsted
f7fbd6de5b make 500s change priorities quicker
it's not acceptable for a constantly failing provider to take 50 minutes
to drain (5x reducing priority by 10). But similarly, we need _some_
delay, or a handful of concurrent failures will completely turn off a
provider, rendering the whole exercise kinda pointless. Setting the
delay before it tries to reduce priority again to one minute is nice
because it means that if one request times out and returns 502, then any
other requests that are in flight at that time will time out before the
one minute is up and not switch, but any requests made after the switch
that take sixty seconds to time out will affect it.
2019-11-28 13:29:39 +00:00
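
The one-minute delay reasoned about in f7fbd6de5b amounts to a guard like the one below. `may_reduce_priority` and `MIN_INTERVAL` are hypothetical names; the sketch only shows the timing rule, not how the real code stores or reads the last change time.

```python
from datetime import datetime, timedelta

# Delay from the commit message: one minute between priority reductions.
MIN_INTERVAL = timedelta(minutes=1)


def may_reduce_priority(last_changed_at, now):
    """Return True if enough time has passed since the last priority change."""
    return now - last_changed_at >= MIN_INTERVAL


switched_at = datetime(2019, 11, 28, 13, 0, 0)
# A request already in flight at the switch fails 30 seconds later: ignored.
in_flight_failure = may_reduce_priority(switched_at, switched_at + timedelta(seconds=30))
# A request made after the switch times out sixty seconds later: counted.
later_failure = may_reduce_priority(switched_at, switched_at + timedelta(seconds=65))
```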
Leo Hemsted
2d7bf664f5 set updated_at manually to avoid ORM overwriting my changes
when ORM level changes are made (eg `my_model.my_column = my_value`),
the ORM will read the column definition to see if it should apply any
defaults. The updated_at columns that we use all define
`onupdate=datetime.datetime.utcnow`. We can't patch this out as the
function pointer to the original function has already been grabbed by
this at import time - so freezegun or `mocker.patch` won't work.

So we have to use the query syntax to set the `updated_at` timestamp in
the DB without going through the ORM layer.
2019-11-28 13:29:39 +00:00
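
The import-time capture described in 2d7bf664f5 can be reproduced without an ORM. In this sketch `clock` and `Column` are hypothetical stand-ins (not SQLAlchemy classes): the column stores a reference to the original function when it is defined, so patching the name afterwards does not change what the column calls.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for a module exposing a "now" function.
clock = SimpleNamespace(utcnow=lambda: "real time")


class Column:
    """Toy column that remembers its onupdate callable, like a column default."""

    def __init__(self, onupdate):
        self.onupdate = onupdate


# The reference to the original function is grabbed here, at definition time.
updated_at = Column(onupdate=clock.utcnow)

with mock.patch.object(clock, "utcnow", return_value="frozen time"):
    patched_lookup = clock.utcnow()       # goes through the patched name
    column_value = updated_at.onupdate()  # still calls the original function
```

Here `patched_lookup` sees the mock while `column_value` does not, which is why the commit sets `updated_at` explicitly through a query rather than relying on patching.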
Leo Hemsted
cfe82f8f4a make 500 error provider switches also check for recent changes
moving the logic and the test from switch provider on slow delivery to
dao reduce sms provider priority
2019-11-28 13:29:39 +00:00
Leo Hemsted
2a392e7137 update switch provider scheduled task
it now looks at both providers and works out whether to deprioritise
one, rather than binary switching from one to the other. If anything
has altered the priorities in the last ten minutes it won't take any
action. If both providers are slow it also won't take any action.
2019-11-28 13:29:38 +00:00
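
The decision rules listed in 2a392e7137 could be sketched as below. `provider_to_deprioritise` and its arguments are hypothetical; how slowness is measured is outside the scope of this sketch.

```python
from datetime import datetime, timedelta

RECENT_CHANGE_WINDOW = timedelta(minutes=10)


def provider_to_deprioritise(slow_by_provider, last_changed_at, now):
    """Return the name of the provider to deprioritise, or None for no action.

    slow_by_provider maps provider name -> whether it is currently slow.
    """
    if now - last_changed_at < RECENT_CHANGE_WINDOW:
        return None  # something altered the priorities recently
    slow = [name for name, is_slow in slow_by_provider.items() if is_slow]
    if len(slow) != 1:
        return None  # neither provider is slow, or both are
    return slow[0]


now = datetime(2019, 11, 28, 13, 30)
long_ago = now - timedelta(hours=1)
one_slow = provider_to_deprioritise({"mmg": True, "firetext": False}, long_ago, now)
both_slow = provider_to_deprioritise({"mmg": True, "firetext": True}, long_ago, now)
```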
Leo Hemsted
3d87096353 add tests for send to providers 2019-11-28 13:29:02 +00:00
Leo Hemsted
992e211a8d update history and created_by when adjusting provider priorities
making sure that we don't close the transaction early, because we need
to keep the transaction open as it has the with_for_update clause on the
select to lock the table.

also make sure the tests clean up after themselves as they're adding
history rows etc
2019-11-28 13:29:02 +00:00
Leo Hemsted
4a6a228cc2 clear up where we use restore_provider_details
it now only needs to be used when you're:
* updating providers in ways that will create history (eg through
  regular api calls)
* altering more than just priority in test setup (eg setting inactive,
  deleting, or adding a provider)
2019-11-28 13:29:02 +00:00
Leo Hemsted
3c63ccb159 move from dao_toggle_sms_provider to dao_reduce_sms_provider_priority 2019-11-28 13:29:02 +00:00
Leo Hemsted
52a33f220b make tests use mmg 100% of the time by default
we randomly choose between sms providers now - this means that tests may
sometimes send firetext and sometimes mmg, so we'd need to patch out
different HTTP calls, expect different values in sent_by, etc etc.

To ensure tests are consistent, add a new fixture that is always used by
notify_db_session, which sets the priorities of the sms providers to
100% mmg 0% firetext. if you need to test other values, then you should
set the values manually in the test file
2019-11-28 13:29:02 +00:00
Leo Hemsted
e29546cb65 flake8 2019-11-28 13:29:02 +00:00
Leo Hemsted
28da190a1c remove get_current_provider
the function no longer makes sense now that we send through both at
the same time. mostly just used in old tests that we'll end up rewriting
shortly anyway
2019-11-28 13:29:02 +00:00
Leo Hemsted
8524bdb4bf clean up provider details rest test
we don't actually care what order the providers are returned in, so
no point making a flaky test that asserts the order. This is just seen
by platform admin and we do processing of the list on the admin side
anyway.
2019-11-28 13:29:02 +00:00
Leo Hemsted
8fa7cde593 remove unused provider dao functions 2019-11-28 13:29:01 +00:00
Leo Hemsted
fa7e0a1e84 add dao_reduce_sms_provider_priority function
retrieve the sms providers from the DB, and decrease the chosen
provider's priority by 10, while increasing the other by 10.

add a check in to ensure we never decrease below 0 or increase above 100
- this is per provider, we don't check that the two add up to 100 or
  anything. If the values are outside of this range (eg: set via the UI)
then they'll probably* fix themselves at some point - we've added tests
to document these cases.

Use with_for_update to ensure that the method can only run once at a
time - other invocations of the function will be held on that line until
the currently running one ends and commits the transaction. This doesn't
affect anyone doing things from the UI.
2019-11-28 13:29:01 +00:00
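
The arithmetic in fa7e0a1e84 can be sketched on a plain dict. `adjust_priorities` is a hypothetical helper: it leaves out the database access and the `with_for_update` locking entirely, and only shows the step of 10 with the per-provider floor of 0 and ceiling of 100.

```python
def adjust_priorities(priorities, provider_to_reduce, step=10):
    """Return new priorities with one provider reduced and the others increased.

    Each provider is bounded independently: a reduced provider never drops
    below 0 and an increased provider never rises above 100. The two values
    are not forced to add up to 100.
    """
    adjusted = {}
    for name, priority in priorities.items():
        if name == provider_to_reduce:
            adjusted[name] = max(0, priority - step)
        else:
            adjusted[name] = min(100, priority + step)
    return adjusted


balanced = adjust_priorities({"mmg": 50, "firetext": 50}, "mmg")
at_limits = adjust_priorities({"mmg": 5, "firetext": 95}, "mmg")
```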
Leo Hemsted
6f38cbbcf1 randomly choose from providers based on priority
todo: make sure if they don't add up to 100 we do something sensible,
especially if they're both 0.
2019-11-28 13:29:01 +00:00
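
A minimal sketch of priority-weighted selection as described in 6f38cbbcf1, using the standard library's `random.choices`. `choose_provider` is a hypothetical name, and the uniform fallback when every priority is 0 is an assumption; the commit leaves that case as a todo.

```python
import random


def choose_provider(priorities, rng=random):
    """Pick a provider name at random, weighted by its priority."""
    names = list(priorities)
    weights = [priorities[name] for name in names]
    if sum(weights) <= 0:
        # Assumption: with no usable weights, fall back to a uniform choice
        # rather than letting random.choices fail on all-zero weights.
        return rng.choice(names)
    return rng.choices(names, weights=weights, k=1)[0]


always_mmg = choose_provider({"mmg": 100, "firetext": 0})
either = choose_provider({"mmg": 0, "firetext": 0})
```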
Rebecca Law
4fd6f33af2 Merge pull request #2658 from alphagov/fix-letters-in-created-status
Alert if a letter doesn't make it past created status
2019-11-27 13:38:51 +00:00
Pea Tyczynska
c17100af37 Bump utils version and improve error message content 2019-11-26 11:19:01 +00:00
Pea Tyczynska
f4ba82225b Use new Template method .is_message_empty()
This method has been now added to Template subclasses
used by sms, emails and letters, so we can use it to validate that the
message is not empty.

Use new template method .is_message_empty()

Refactor function name and add a test
2019-11-26 11:18:00 +00:00
Pea Tyczynska
9c804f701b Validate against messages with no content 2019-11-26 11:17:59 +00:00
Rebecca Law
e0b4b258aa Shortened the length of time to check for messages with the wrong state.
There is a chance that there is an outstanding retry task that has yet to run, but the tasks that are replayed here protect against the task running twice. So this just means it might get sent sooner rather than later.
2019-11-21 15:51:27 +00:00
Rebecca Law
5d6886242b Check that the request payload data is valid json.
By adding `force=True` to request.get_json() the mime type is ignored. If the data is not valid json the method will raise a `BadRequestError`; we catch that and throw our own error with a clear error message "Invalid JSON supplied in POST data".
If the json is valid return the json data or an empty dict if None is passed in.

This PR improves the error message if the json is invalid. Previously, the error was a "None object type" message, which is not very helpful.
2019-11-21 15:23:11 +00:00
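
The validation flow from 5d6886242b can be sketched with the standard `json` module in place of Flask's request object. `InvalidJSONError` and `get_valid_json` are hypothetical names; only the error message text and the empty-dict fallback come from the commit message.

```python
import json


class InvalidJSONError(Exception):
    """Hypothetical stand-in for the application's own error type."""


def get_valid_json(raw_body):
    """Parse a request body as JSON regardless of its declared content type."""
    try:
        data = json.loads(raw_body)
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError.
        raise InvalidJSONError("Invalid JSON supplied in POST data") from err
    # A JSON null parses to None; return an empty dict in that case.
    return {} if data is None else data


parsed = get_valid_json('{"reference": "abc"}')
empty = get_valid_json("null")
```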
Leo Hemsted
c78a5d8536 Merge pull request #2662 from alphagov/utils-bump
Utils bump
2019-11-21 15:23:11 +00:00