And support this change across our code. Note, this is a halfway step
where it is not a list rather than a string but still only supports a
single secret, ie one item in the list.
These alerts are sent to our postal provider. And it usually arrives as they are getting ready to go home for the day or the weekend.
Which means they get missed/overlooked. They have agreed to get the alert an hour earlier, perhaps that will improved the response time.
Added the queue and task names for the new template preview task to the
config. Also added the new bucket name that template preview will use
for the sanitised letters to the config for all environments.
we generally aim to share the load between the two providers equally
(more or less). When one provider has struggled, we deprioritise them,
this commit adds a function that gradually restores balance. It checks
every five minutes, if it's been more than an hour since the providers
were last changed then it adjusts them towards a 50/50 split. Except
it's not quite 50/50 due to #reasons (we want to slightly favour MMG),
it's actually 60/40. That's defined in a new dict in config.py.
these URLs never change, and it lead to surprising issues where an
updated default MMG_URL wasn't actually respected on PaaS. These urls
aren't private and don't need to be stored in credentials.
By not defining them in the manifest, we expect them to use the default
unless `cf set-env` has been specifically used to modify them in an app.
we don't use it since we wrote our own provider stubs for performance
tests.
this removes it from the api - it's still in the DB and will be
retrieved by queries, but is set to disabled on prod
the nightly tasks need to run after the create nightly notification
status task - so that test notifications are still there to record
stats for, and to stop the risk of deleting notificaitons part-way
through recording stats for them.
There's a bug in pysftp that appears to cause quadratic performance loss. See https://github.com/paramiko/paramiko/issues/1141 for more details.
As a temporary band-aid fix, lower the size of the files we're sending.
* The `_should_record_notification_in_history_table` function stopped being
used in this commit: c23ae15f32
* `NOTIFICATIONS_ALERT` stopped being used in this commit: 5aa37f09b6
we build up one personalisation dict, and then pass it in to all the
different templates - so be careful editing things. also of note, we
check if the agreement_signed_on_behalf_of is set, and send a different
template with slightly different wording to the person who clicked the
confirm button.
Added a scheduled task to run once a day and check if there were any
letters from before 17.30 that still have a status of 'created'. This
logs an exception instead of trying to fix the error because the fix
will be different depending on which bucket the letter is in.
Added a task which runs twice a day on weekdays and checks for letters that have
been in the state of `pending-virus-check` for over 90 minutes. This is
just logging an exception for now, not trying to fix things, since we
will need to manually check where the issue was.
Setting `STATSD_HOST` for an env variable allows us to switch to a
local statsd_exporter on a per-app basis.
This also changes `STATSD_ENABLED` to be on when `STATSD_HOST` is set,
avoiding the need to set it separately.
Similar to MMG, there's a new env variable FIRETEXT_URL that can be
used to override the Firetext api URL.
This will be used to stub out both providers during the load test or
can be used to run a local API against a fake provider endpoint.
antivirus is sometimes tough to get running locally - now in dev
antivirus is skipped unless `ANTIVIRUS_ENABLED=1` is set on the command
line. on all other environments it is always enabled.
make a decorator that pings cronitor before and after each task run.
Designed for use with nightly tasks, so we have visibility if they
fail. We have a bunch of cronitor monitors set up - 5 character keys
that go into a URL that we then make a GET to with a self-explanatory
url path (run/fail/complete).
the cronitor URLs are defined in the credentials repo as a dictionary
of celery task names to URL slugs. If the name passed in to the
decorator isn't in that dict, it won't run.
to use it, all you need to do is call `@cronitor(my_task_name)`
instead of `@notify_celery.task`, and make sure that the task name and
the matching slug are included in the credentials repo (or locally,
json dumped and stored in the CRONITOR_KEYS environment variable)
A recent issue with a long-running query (#2288) highlighted the
fact that even though the original HTTP connection might be closed
(for example after gorouter timeout of 15 minutes, which returns a
504 response to the client), the request worker will not be stopped.
This means that the worker is spending time and potentially DB
resources generating a response that will never be delivered.
Gunicorn's timeout setting only applies to sync workers and there
doesn't seem to be an option to interrupt individual requests in
gevent/eventlet deployments.
Since the most likely (and potentially most dangerous) scenario for
this is a long-running DB query, we can set a statement timeout on
our DB connections. This will raise a sqlalchemy.exc.OperationalError
(wrapping psycopg2.extensions.QueryCanceledError), interrupting the
request after the given timeout has been reached.
This is a Postgres client setting, so the database itself will abort
the transaction when it reaches the set timeout.
Since this will also apply to our celery tasks (including potentially
long-running nightly tasks) we set a timeout of 20 minutes to begin
with.
This can potentially be split in the future to set a different value
for each app, so that we could limit API requests even more.
We don't use FUNCTIONAL_TEST_PROVIDER_SERVICE_ID or
UNCTIONAL_TEST_PROVIDER_SMS_TEMPLATE_ID anymore so we can safely
delete them from config and tests.
We don't seem to use recorded queries or modification tracking anywhere
in the app, and both features potentially increase memory usage.
This removes deprecated SQLALCHEMY_COMMIT_ON_TEARDOWN options. It's
been removed from the docs and the default matches the value we set
anyway.
Bumped notifications-utils to 3.7.0. Version 3.7.0 includes the
`convert_utc_to_bst` and `convert_bst_to_utc` functions and the
`LETTER_PROCESSING_DEADLINE` constant, so these have been removed from
this repo and anywhere using these has now been updated to get these
from `notifications-utils`.
Also bumped pytest by a patch version to bring in a bug fix.
When we first built letters you could only send them via a CSV upload, initially we needed a way to send those files to dvla per job.
We since stopped using this page. So let's delete it!
This was done so when notification is timed out from sending/pending
to temporary_failure, this change has to always be caught
in the ft_notification_status
Timeout sending notifications updates up to 4 days of notifications
older than 72 hours to correct failure status. It needs to run before
we update ft_notifications_status table. Otherwise the changes
don't get picked up. Notifications deletion tasks have to run
after those jobs in case our users set short data retention
policy.
- pass new, sanitised pdf for sending
- move invalid pdfs to a newly created bucket
- set status fro notifications that failed pdf validation to a new status validation-failed
- adjust existing tests