it's not acceptable for a constantly failing provider to take 50 minutes to drain (5x reducing the priority by 10). But similarly, we need _some_ delay, or a handful of concurrent failures will completely turn off a provider, rendering the whole exercise somewhat pointless. Setting the delay before it tries to reduce the priority again to one minute is nice: if one request times out and returns a 502, any other requests already in flight will also time out before the minute is up and won't trigger another switch, but any requests made after the switch that take sixty seconds to time out will still count against it.
when ORM-level changes are made (eg `my_model.my_column = my_value`), the ORM will read the column definition to see if it should apply any defaults. The updated_at columns that we use all define `onupdate=datetime.datetime.utcnow`. We can't patch this out, as the reference to the original function was already captured at import time - so freezegun or `mocker.patch` won't work.
So we have to use the query syntax to set the `updated_at` timestamp in the DB without going through the ORM layer.
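A minimal sketch of that, assuming a Flask-SQLAlchemy `db` session and a `ProviderDetailsHistory` model (both names are illustrative):

```python
def set_updated_at_without_orm_defaults(history_id, timestamp):
    # updating via the query API writes the value we pass as-is, so the
    # column's onupdate default never fires and freezegun isn't needed
    db.session.query(ProviderDetailsHistory).filter(
        ProviderDetailsHistory.id == history_id
    ).update({'updated_at': timestamp}, synchronize_session=False)
    db.session.commit()
```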
it now looks at both providers and works out whether to deprioritise one, rather than switching from one to the other outright. If anything has altered the priorities in the last ten minutes it won't take any action. If both providers are slow it also won't take any action.
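Roughly the shape of that decision, with made-up attribute and helper names:

```python
from datetime import datetime, timedelta

ADJUSTMENT_COOLDOWN = timedelta(minutes=10)

def choose_provider_to_deprioritise(providers, now=None):
    # `providers` is assumed to be the two active sms providers, each with
    # an `updated_at` timestamp and an `is_slow` flag computed elsewhere
    now = now or datetime.utcnow()

    # don't touch anything if priorities were altered recently
    if any(now - p.updated_at < ADJUSTMENT_COOLDOWN for p in providers):
        return None

    slow = [p for p in providers if p.is_slow]

    # if both (or neither) are slow there's nothing sensible to do
    if len(slow) != 1:
        return None

    return slow[0]
```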
making sure that we don't close the transaction early - we need to keep the transaction open because the select has a with_for_update clause on it to lock the table.
also make sure the tests clean up after themselves as they're adding
history rows etc
it now only needs to be used when you're:
* updating providers in ways that will create history (eg through
regular api calls)
* altering more than just priority in test setup (eg setting inactive,
deleting, or adding a provider)
we randomly choose between sms providers now - this means that tests may sometimes send via firetext and sometimes via mmg, so we'd need to patch out different HTTP calls, expect different values in sent_by, and so on.
To ensure tests are consistent, add a new fixture that is always used by notify_db_session, which sets the priorities of the sms providers to 100% mmg / 0% firetext. If you need to test other values, set them manually in the test file.
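A sketch of what such a fixture could look like (the real one is wired in via `notify_db_session`; `get_provider_details_by_identifier` is an assumed helper):

```python
import pytest

@pytest.fixture
def mmg_provider_priorities(notify_db_session):
    # pin priorities so tests deterministically pick mmg over firetext
    mmg = get_provider_details_by_identifier('mmg')
    firetext = get_provider_details_by_identifier('firetext')
    mmg.priority = 100
    firetext.priority = 0
    notify_db_session.add_all([mmg, firetext])
    notify_db_session.commit()
```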
the function no longer makes sense now that we send through both at the same time. It's mostly just used in old tests that we'll end up rewriting shortly anyway.
we don't actually care what order the providers are returned in, so there's no point in a flaky test that asserts the order. This list is only seen by platform admins, and we process it on the admin side anyway.
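For illustration, an order-insensitive assertion along these lines (endpoint and response shape are assumptions):

```python
def test_get_provider_details_returns_all_providers(client):
    # compare as a set so the assertion doesn't depend on row ordering
    response = client.get('/provider-details')
    identifiers = {
        p['identifier'] for p in response.get_json()['provider_details']
    }
    assert identifiers == {'mmg', 'firetext', 'ses'}
```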
retrieve the sms providers from the DB, and decrease the chosen provider's priority by 10 while increasing the other's by 10.
add a check to ensure we never decrease below 0 or increase above 100 - this is per provider; we don't check that the two add up to 100 or anything. If the values are outside of this range (eg set via the UI) they'll probably fix themselves at some point - we've added tests to document these cases.
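Roughly, the adjustment with clamping (function and attribute names are illustrative):

```python
def adjust_provider_priorities(chosen, other, amount=10):
    # deprioritise the chosen (slow) provider and boost the other, clamping
    # each value independently to 0-100; we don't force them to sum to 100
    chosen.priority = max(chosen.priority - amount, 0)
    other.priority = min(other.priority + amount, 100)
```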
Use with_for_update to ensure that only one invocation of the method can run at a time - other invocations will be held on that line until the currently running one finishes and commits the transaction. This doesn't affect anyone doing things from the UI.
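The locking part looks roughly like this: the SELECT ... FOR UPDATE holds row locks until the transaction commits, so concurrent callers queue up on the query (import paths and model name are assumptions):

```python
from app import db
from app.models import ProviderDetails  # assumed import paths

def dao_adjust_sms_provider_priorities(chosen_identifier):
    # any other transaction running this query blocks here until the
    # current one commits and releases the row locks
    providers = db.session.query(ProviderDetails).filter_by(
        notification_type='sms'
    ).with_for_update().all()

    # ... adjust the priorities as sketched above ...

    db.session.commit()  # committing releases the locks
```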
This method has now been added to the Template subclasses used by sms, email and letters, so we can use it to validate that the message is not empty.
Use new template method .is_message_empty()
Refactor function name and add a test
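A rough sketch of the check at the call site (`BadRequestError` stands in for whatever error the API raises; the template classes come from notifications-utils):

```python
def check_message_is_not_empty(template):
    # `template` is one of the Template subclasses (sms, email or letter)
    # with the personalisation already applied
    if template.is_message_empty():
        raise BadRequestError('Your message is empty.')
```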
There is a chance that there is an outstanding retry task that has yet to run, but the tasks replayed here protect against the same task running twice. So this just means a notification might get sent sooner rather than later.
By adding `force=True` to request.get_json() the mime type is ignored. If the data is not valid JSON the method raises a `BadRequestError`; we catch that and throw our own error with a clear message: "Invalid JSON supplied in POST data".
If the JSON is valid, return the JSON data, or an empty dict if None is passed in.
This PR improves the error messages when the JSON is invalid; previously the error message was "None object type", which is not very helpful.
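A minimal sketch of the parsing helper (`InvalidJsonError` stands in for the app's own error class):

```python
from flask import request
from werkzeug.exceptions import BadRequest

class InvalidJsonError(Exception):
    # stand-in for the app's own error class
    pass

def get_valid_json():
    try:
        # force=True ignores the Content-Type header when parsing the body
        data = request.get_json(force=True)
    except BadRequest:
        raise InvalidJsonError('Invalid JSON supplied in POST data')
    # a body of literal `null` parses to None - treat it as an empty dict
    return data if data is not None else {}
```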
The assumption was that S3 would throw an exception if the object was uploaded twice. That's not the case: the default behaviour is that if a file already exists it will be overwritten. So it is completely safe to run the task from the alert.
It also means that we don't need to wait 4 hours 15 minutes. Shall I decrease the amount of time before restarting the task?
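For reference, this is the S3 behaviour being relied on - putting an object at an existing key just overwrites it, with no exception (bucket and key names are made up):

```python
import boto3

s3 = boto3.client('s3')

# uploading to the same key twice silently replaces the first object,
# so re-running the upload task is safe
s3.put_object(Bucket='example-bucket', Key='example-key.txt', Body=b'first')
s3.put_object(Bucket='example-bucket', Key='example-key.txt', Body=b'second')
```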
When we upload a CSV for a job, we add the sender_id as metadata to the file that is uploaded on S3.
There is more than one place where we process rows from that CSV.
- process_job
- scheduled_job
- check_for_missing_rows_in_completed_jobs
- check_job_status
All of these places need to use the sender_id; now the sender_id is always read from the file metadata.
In a subsequent PR we can remove the optional sender_id parameter from process_job task.
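A sketch of reading the sender_id back off the object, using boto3 directly (bucket and key naming is illustrative):

```python
import boto3

s3 = boto3.client('s3')

def get_sender_id_for_job(bucket_name, job_key):
    # the metadata was attached when the CSV was uploaded; head_object
    # returns it without downloading the file body
    response = s3.head_object(Bucket=bucket_name, Key=job_key)
    return response['Metadata'].get('sender_id')
```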
The group by for the query was wrong, which would result in 2 rows with different totals but the same unique key, so the second row would overwrite the first. This meant we had incorrect numbers for the billing data.
Because some of the data had null in the sent_by column, the select coerced null to 'dvla', but that same coalesce was not used in the group by. So any time we had missing sent_by data we would end up with 2 rows, one of which would overwrite the other.
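A sketch of the fix, using the same coalesce expression in both the select and the group by (model and column names are illustrative):

```python
from sqlalchemy import func

sent_by = func.coalesce(Notification.sent_by, 'dvla')

query = db.session.query(
    Notification.service_id,
    sent_by.label('sent_by'),
    func.sum(Notification.billable_units).label('billable_units'),
).group_by(
    Notification.service_id,
    # grouping on the raw column instead of the coalesced expression is
    # what produced two rows (NULL and 'dvla') with the same displayed key
    sent_by,
)
```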
This is only necessary because there is currently a job that is old but had 1 row created a couple of days later, so now there is 1 notification remaining for the job where the rest have been purged.