you can still use this flag locally but we have it enabled for all
environments and it doesn't need to be toggleable from credentials as it
isn't a secret value.
If we wish to turn redis off for a specific environment we can create a
PR to change the config.
It's worth noting this will break the Admin page to edit provider
priorities, which currently works in units of 10. We already have
work planned to fix this [^1] and it's not an immediate problem.
The automatic adjustment algorithm will continue to work properly
as it can cope with increments smaller than 10 [^2].
[^1]: https://www.pivotaltracker.com/story/show/181681739
[^2]: b145a29935/app/dao/provider_details_dao.py (L122)
This works in conjunction with the new SMS provider stub [^1].
Local testing:
- Run the migrations to add Reach as an inactive provider.
- Activate the Reach provider locally and deactivate the others.
update provider_details set priority = 100, active = false where notification_type = 'sms';
update provider_details set active = true where identifier = 'reach';
- Tweak your local environment to point at the SMS stub.
export REACH_URL="http://host.docker.internal:6300/reach"
- Start / restart Celery to pick up the config change.
- Send a SMS via the Admin app and see the stub log it.
- Reset your environment so you can send normal SMS.
update provider_details set active = true where notification_type = 'sms';
update provider_details set active = false where identifier = 'reach';
[^1]: https://github.com/alphagov/notifications-sms-provider-stub/pull/10
Currently "test_send_letter_notification_via_api" fails at the final
stage in create-fake-letter-response-file [^1]:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6011): Max retries exceeded with url: /notifications/letter/dvla (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff95ffc460>: Failed to establish a new connection: [Errno 111] Connection refused'))
This only applies when running in Docker so the default should still
be "localhost" for the Flask app itself.
[^1]: 5093064533/app/celery/research_mode_tasks.py (L57)
This changes the scheduled task to raise an alert if letters are still
sending from 1530 to 1700. DVLA have reported that our "monitoring is
executing just before we actually mark them as ‘despatched’ and send
you the feedback files." and asked us to make the check a little later.
We don't actually contact DVLA until the morning after the alert anyway,
so this won't affect the process of getting in touch with them.
This change will require Cronitor to be updated for the new time.
This makes a few changes to:
- Make local development consistent with our other apps. It's now
faster to start Celery locally since we don't try to build the
image each time - this is usually quick, but unnecessary.
- Add support for connecting to a local Redis instance. Note that
the previous suggestion of "REDIS = True" was incorrect as this
would be turned into the literal string "True".
I've also co-located and extended the recipes in the Makefile to
make them a bit more visible.
as a team we primarily develop locally. However, we've been experiencing
issues with pycurl, a subdependency of celery, that is notoriously
difficult to install on mac. On top of the existing issues, we're also
seeing it conflict with pyproj in bizarre ways (where the order of
imports between pyproj and pycurl result in different configurations of
dynamically linked C libraries being loaded.
You are encouraged to attempt to install pycurl locally, following these
instructions: https://github.com/alphagov/notifications-manuals/wiki/Getting-Started#pycurl
However, if you aren't having any luck, you can instead now run celery
in a docker container.
`make run-celery-with-docker`
This will build a container, install the dependencies, and run celery
(with the default of four concurrent workers).
It will pull aws variables from your aws configuration as boto would
normally, and it will attempt to connect to your local database with the
user `postgres`. If your local database is configured differently (for
example, with a different user, or on a different port), then you can
set the SQLALCHEMY_DATABASE_URI locally to override that.
Previously we think this setting was necessary to avoid a memory
leak [1], but it's unclear if this is still an issue:
- We've advanced two major versions of Celery.
- Some of the tasks are now quicker and leaner.
Restarting worker sub-processes after each task is a big problem
for performance, as we move towards parallelising our reporting.
This is something of a test to see if we can manage without this
setting. Note that we need to unset the variable manually:
cf unset-env notify-delivery-worker-reporting CELERYD_MAX_TASKS_PER_CHILD
In the worst case we can always re-run any failed tasks. To check
the worker is still behaving as expected, we can:
- Monitor CPU / memory graphs for it.
- Check `cf events` for unexpected restarts / crashes.
- Compare numbers of task completion logs to previous days.
- Check the number of new billing / status rows looks right.
[1]: ad419f7592
When running the night reporting tasks we are seeing that some tasks are failing because the query is timing out. We need to revisit how to optimise the query but this will at least let the process finish.
they share a lot with the reporting tasks (creating ft_billing and
ft_notification_status), in that they're run nightly, take a long time,
and we see error messages if they get run multiple times (due to
visibility timeout).
The periodic app has two concurrent processes - previously there was
just one delete task, which would use one of those processes, while the
other process would pick up anything else on the queue (at that time of
night, the regular provider switch checks and scheduled job checks).
However, when we switched to running the three delete notification types
separately, we saw visibility timeout issues - three tasks would be
created, all three would be picked up by one celery instance, the two
worker processes would start on two of them, and the third would sit on
the box, wait longer than the visibility timeout to be picked up (and
acknowledged), and so SQS would assume the task was lost and replay it.
it's queues all the way down!
By putting them on the reporting worker we can take advantage of tuning
that app (for example setting the prefetch multiplier to one) which is
designed to run large tasks. We've also got more concurrent workers on
this box, so we can run all three tasks at once.
We have made it so that gov.uk/alerts shows a ‘1 planned test’ banner
for the whole of the day when there has been an operator test on that
day.
We need to remove the banner when the day is over.
The most straightforward way to do this is to republish the site at the
start of every day. The gov.uk/alerts code[1] will work out if there are
or aren’t any planned tests to show that day.
1. 5a274af6d0/app/models/alerts.py (L38-L44)
This adds a task which is designed to be used if we want to recreate the
PDF for a precompiled letter (either one that has been created using the
API or one that has been uploaded through the website).
The task takes the `notification_id` of the letter and passes template
preview the details it needs in order to sanitise the original file and
then replace the version in the letters-pdf bucket with the freshly
sanitised version.