The standard way that we indicate that there are more results than can
be returned is by paginating. So even though we don’t intend to paginate
the search results in the admin app, it can still use the presence or
absence of a ‘next’ link to determine whether or not to show a message
about only showing the first 50 results.
we're seeing issues with email clients sniffing links, and causing them
to expire before the user gets a chance to click on them. Temporarily
disable the expiry while we work on a more permanent solution.
The link will still expire after half an hour, and sms codes aren't
affected by this change
If a service has permission to send international letters then it should
tell template preview, so that template preview knows what rule to
apply when it’s validating the address of the letter.
Depends on:
- [ ] https://github.com/alphagov/notifications-template-preview/pull/445
`allow_international_letters` is a new, required argument, so the tests
that make some `Row` objects need to provide that.
There are now 8 possible address columns (7 plus postcode) so the tests
need to expect that. But this won’t have any user-facing impact.
For services that have permission to send international letters we
should not reject letters that are addressed to another country. We
should still reject letters that are badly-addressed.
The '/service/monthly-data-by-service` endpoint which is used for the
'Monthly notification statuses for live services' Platform Admin report
did not including `pending` notifications. This updates the DAO function
that the endpoint calls to group `pending` and `sending` notifications together.
We were doing this temporarily while the `normalised_to` field was not
populated for letters. Once we have a week of data in the
`normalised_to` field we can stop looking in the `to` field. This should
improve performance because:
- it’s one `WHERE` clause not two
- we had to do `ilike` on the `to` field, because we don’t lowercase its
contents – even if the two where clauses are somehow paralleized it’s
is slower than a `like` comparison, which is case-sensitive
Depends on:
- [ ] Tuesday 5 May (7 full days after deploying https://github.com/alphagov/notifications-api/pull/2814)
We have seen the reporting app run out of memory multiple times when
dealing with overnight tasks. The app runs 11 worker threads and we
reduce this to 2 worker threads to put less pressure on a single
instance.
The number 2 was chosen as most of the tasks processed by the reporting
app only take a few minutes and only one or two usually take more than
an hour. This would mean with 2 processes across our current 2
instances, a long running task should hopefully only wait behind a few
short running tasks before being picked up and therefore we shouldn't
see large increase in overall time taken to run all our overnight
reporting tasks.
On top of reducing the concurrency for the reporting app, we also set
CELERYD_PREFETCH_MULTIPLIER=1. We do this as suggested by the celery
docs because this app deals with long running tasks.
https://docs.celeryproject.org/en/3.1/userguide/optimizing.html#optimizing-prefetch-limit
The chance in prefetch multiplier should again optimise the overall time
it takes to process our tasks by ensuring that tasks are given to
instances that have (or will soon have) spare workers to deal with them,
rather than committing to putting all the tasks on certain workers in
advance.
Note, another suggestion for improving suggested by the docs for
optimising is to start setting `ACKS_LATE` on the long running tasks.
This setting would effectively change us from prefetching 1 task per
worker to prefetching 0 tasks per worker and further optimise how we
distribute our tasks across instances. However, we decided not to try
this setting as we weren't sure whether it would conflict with our
visibility_timeout. We decided not to spend the time investigating but
it may be worth revisiting in the future, as long as tasks are
idempotent.
Overall, this commit takes us from potentially having all 18 of our
reporting tasks get fetched onto a single instance to now having a
process that will ensure tasks are distributed more fairly across
instances based on when they have available workers to process the
tasks.
What this setting does is best described in
https://medium.com/@taylorhughes/three-quick-tips-from-two-years-with-celery-c05ff9d7f9eb#d7ec
This should be useful for the reporting app because tasks run by this
app are long running (many seconds). Ideally this code change will
mean that we are quicker to process the overnight reporting tasks,
so they all finish earlier in the morning (although are not individually quicker).
This is only being set on the reporting celery app because this
change trying to do the minimum possible to improve the reliability and speed
of our overnight reporting tasks. It may very well be useful to set this
flag on all our apps, but this should be done with some more
consideration as some of them will deal with much faster tasks (sub
0.5s) and so it may be still be appropriate or may not. Proper
investigation would be needed.
Note, the celery docs on this are also worth a read:
https://docs.celeryproject.org/en/3.1/userguide/optimizing.html#optimizing-prefetch-limit.
However, the language can confuse this with setting with the prefetch
limit. The distinction is that prefetch grabs items off the queue, whereas the
-Ofair behaviour is to do with when items have already been prefetched
and then whether the master celery process straight away gives them to
the child (worker) processes or not.
Note, this behaviour is default for celery version 4 and above but we
are still on version 3.1.26 so we have to enable it ourselves.
We've seen only some of these reporting tasks happen but with no log
messages to indicate what happened and no app crashes. This hopefully
will give us a better picture of a timeline.
Note, I've tried to make our message format very consistent and good for
searching for in kibana so I've changed that across this whole file for
consistency.
The reporting worker tasks fetch large amounts of data from the db, do
some processing then store back in the database. As the reporting worker
only processes the create nightly billing/stats table tasks, which
aren't high performance or high volume, we're fine with the performance
hit from restarting the worker between every task (which based on
limited local testing takes about a second or so).
This causes some real funky shit with the app_context (used for
accessing current_app.logger). To access flask's global state we use the
standard way of importing `from flask import current_app`. However,
processes after the first one don't have the current_app available on
shut down (they're fine during the actual task running), and are unable
to call `with current_app.app_context()` to create it. They _are_ able
to call `with app.app_context()` to create it, where `app` is the
initial app that we pass in to `NotifyCelery.init_app`.
NotifyCelery.init_app is only called once, in the master process - I
think the application state is then stored and passed to the celery
workers. But then it looks like the teardown might clear it, but it
never gets set up again for the new workers? Unsure.
To fix this, store a copy of the initial flask app on the NotifyCelery
object and then use that from within the shutdown signal logging
function.
Nothing's ever easy ¯\_(ツ)_/¯
By not having a catch-all else, it makes it clearer what we’re
expecting. And then we think it’s worth adding a comment explaining why
we normalise as we do for letters and the `None` case.
create-letters-pdf-tasks is for contacting template preview.
letters-tasks is for doing other things around letters (including putting things on template preview queue)
At the moment we’re not consistent:
Precompiled (API and one-off):
`to` has the whole address
`normalised_to` has nothing
Templated (API, CSV and one off):
`to` has the first line of the address
`normalised_to` has nothing
This commit makes us consistently store the whole address in the `to`
field. We think that people might want to search by postcode, not just
first line of the address.
This commit also starts to populate the normalised_to field with the
address lowercased and with all spaces removed, to make it easier to
search on.
Like we have search by email address or phone number, finding an
individual letter is a common task. At the moment users are having to
click through pages and pages of letters to find the one they’re looking
for.
We have to search in the `to` and `normalised_to` fields for now because
we’re not populating the `normalised_to` column for letters at the
moment.