Previously "Result not found" would be returned when the id is not a valid uuid, which does not make sense.
Now the message says "notification_id is not a valid UUID", this should be a clearer message for the client service.
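A minimal sketch of the idea, validating the id up front; the `InvalidRequest` error type here is an illustrative stand-in, not the app's actual one:

```python
import uuid


class InvalidRequest(Exception):
    """Illustrative stand-in for the app's own request-error type."""

    def __init__(self, message, status_code=400):
        super().__init__(message)
        self.message = message
        self.status_code = status_code


def validate_notification_id(notification_id):
    # Reject malformed ids up front with a clear message, instead of
    # falling through to a generic "Result not found" at lookup time.
    try:
        return uuid.UUID(notification_id)
    except ValueError:
        raise InvalidRequest("notification_id is not a valid UUID")
```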
The command takes a service id and a day, grabs the historical data for
that day (potentially out of notification_history), and puts it in
redis with an eight-day expiry, the same as if it had been written by the normal path.
Also, prefix the template usage key with "service" to make clear that it's
a service id, and not an individual template id.
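A sketch of what the backfill command could do, assuming a redis-py client and that the per-day usage rows have already been pulled out of notification_history; all names here are illustrative:

```python
from datetime import timedelta

EXPIRY_SECONDS = int(timedelta(days=8).total_seconds())


def rebuild_template_usage(redis_client, service_id, day, rows):
    # `rows` is assumed to be (template_id, count) pairs for the given
    # service and day, taken from notification_history.
    key = f"service-{service_id}-template-usage-{day:%Y-%m-%d}"
    pipeline = redis_client.pipeline()
    for template_id, count in rows:
        pipeline.hset(key, str(template_id), count)
    # Same eight-day expiry as the normal write path.
    pipeline.expire(key, EXPIRY_SECONDS)
    pipeline.execute()
```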
We've run into issues with redis expiring keys while we try to write
to them - short-lived redis TTLs aren't really sustainable for keys
where we mutate the state. Template usage is a hash held in redis
where we increment a count keyed by template_id each time a message is
sent for that template. But if the key expires, hincrby (the redis command
for incrementing a value in a hash) will silently re-create an empty hash.
This is no good, as we need the hash to be populated with the last
seven days' worth of data, which we then increment further. We can't
tell whether the hincrby created the key, so a different approach
entirely was needed (see the sketch after the list):
* New redis key: <service_id>-template-usage-<YYYY-MM-DD>. Note: this
YYYY-MM-DD is BST time so it lines up nicely with the ft_billing table.
* Incremented from process_notification - if it doesn't exist yet,
it'll be created then.
* Expiry set to 8 days every time it's incremented.
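A minimal sketch of that write path with redis-py; the function name is an assumption, and the date is shown in UTC for brevity where the real key uses a BST-aligned date:

```python
from datetime import datetime, timedelta

EXPIRY_SECONDS = int(timedelta(days=8).total_seconds())


def increment_template_usage(redis_client, service_id, template_id):
    # One hash per service per day. hincrby creates the key if it does
    # not exist, which is safe here: a fresh key only ever accumulates
    # that single day's counts, so nothing is silently lost.
    day = datetime.utcnow().strftime("%Y-%m-%d")
    key = f"service-{service_id}-template-usage-{day}"
    pipeline = redis_client.pipeline()
    pipeline.hincrby(key, str(template_id), 1)
    # Refresh the eight-day expiry on every increment.
    pipeline.expire(key, EXPIRY_SECONDS)
    pipeline.execute()
```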
Then, at read time, we'll just read the last eight days of keys from
Redis, and sum them up. This works because we're only ever incrementing
from that one place - never setting wholesale, never recreating the
data from scratch. So we know that if the data is in redis, then it is
good and accurate data.
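And the read side, again as a sketch (assuming a redis-py client created with decode_responses=True):

```python
from datetime import datetime, timedelta


def get_template_usage(redis_client, service_id, days=8):
    # Sum the per-day hashes for the last `days` days; a missing day's
    # key simply contributes nothing to the totals.
    totals = {}
    today = datetime.utcnow().date()
    for offset in range(days):
        day = today - timedelta(days=offset)
        key = f"service-{service_id}-template-usage-{day:%Y-%m-%d}"
        for template_id, count in redis_client.hgetall(key).items():
            totals[template_id] = totals.get(template_id, 0) + int(count)
    return totals
```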
One thing we *don't* know and *cannot* reason about is what no key in
redis means. It could be either of:
* This is the first message that the service has sent today.
* The key was deleted from redis for some reason.
Since we set such a long TTL, we'll never be writing to a key that
previously expired. But if there is a redis (or operator) error and the
key is deleted, then we'll have bad data - after any data loss we'll
have to rebuild the data.
Added the notification id to the logging message so that the
notification can be traced through the logging system, making it
easier to debug. Also changed to raise an exception so that alerts are
generated; this way we should get an email to say that there has been
an error.
At the same time, decrease the number of workers from 5 to 4.
Effect on max db connections will be the same - although with a higher
"resting" number of connections.
Before:
12 (instances) * 5 (workers) * 20 (10 permanent + 10 overflow) = 1200
After:
12 (instances) * 4 (workers) * 25 (15 permanent + 10 overflow) = 1200
Sometimes, when a test using one of the set_config[_values] context managers
failed or raised an exception, the context manager was unable to revert
its config changes, resulting in a 'spooky action at a distance' where
unrelated tests would start to fail for non-obvious reasons.
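The standard fix is to restore the old values in a finally block so the revert runs even when the test body raises; a sketch of the pattern, assuming a Flask-style app object:

```python
from contextlib import contextmanager


@contextmanager
def set_config_values(app, updates):
    # Remember the previous values so they can always be put back.
    old_values = {key: app.config.get(key) for key in updates}
    app.config.update(updates)
    try:
        yield
    finally:
        # Runs even if the test raises, so config changes can no
        # longer leak into unrelated tests.
        app.config.update(old_values)
```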
The main driver behind this is to allow us to enable http healthchecks on
the `/_status` endpoint. The healthcheck requests hit the instances
directly, without going through the proxy that would set the header
properly.
In any case, endpoints like `/_status` should be generally accessible by
anything without requiring any form of authorization.
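One way to express that, sketched with a Flask before-request hook; the header name and config key are invented for illustration:

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Paths that must stay reachable without the proxy-set header.
UNAUTHENTICATED_PATHS = {"/_status"}


@app.before_request
def check_proxy_header():
    if request.path in UNAUTHENTICATED_PATHS:
        return  # healthchecks hit instances directly, so skip the check
    if request.headers.get("X-Proxy-Secret") != app.config.get("PROXY_SECRET"):
        abort(403)


@app.route("/_status")
def status():
    return jsonify(status="ok")
```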
Refactored move_scanned_pdf_to_test_or_live_pdf_bucket and
move_failed_pdf to consolidate some code, so it is easier to maintain in
future and so that _move_s3_object can be used for any new methods.
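A sketch of what such a helper might look like with boto3 (S3 has no native move, so it's a copy followed by a delete); the signature is an assumption:

```python
import boto3


def _move_s3_object(source_bucket, source_key, target_bucket, target_key):
    # S3 has no rename/move primitive: copy the object to its new
    # location, then delete the original.
    s3 = boto3.resource("s3")
    s3.Object(target_bucket, target_key).copy_from(
        CopySource={"Bucket": source_bucket, "Key": source_key}
    )
    s3.Object(source_bucket, source_key).delete()
```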
If the Anti-virus app fails due to s3 errors or ClamAV errors, and so
does not scan the file at all (even after retries), an error needs
to be raised and the notification set to technical-failure.
Files should be moved to separate 'folders': one for ERROR and one for FAILURE.
* Added new letter task to process the error
* Added a new method to letter utils.py to move a file into an error or
failure folder based on the input (see the sketch after this list)
* Added tests to test the task and the utils.py method
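A sketch of that utils method, building on the `_move_s3_object` helper sketched above; the folder names and signature are assumptions:

```python
SCAN_ERROR_FOLDER = "ERROR/"      # scanning itself errored (s3 or ClamAV problems)
SCAN_FAILURE_FOLDER = "FAILURE/"  # the file was scanned and failed the scan


def move_scan_to_invalid_pdf_bucket(source_bucket, filename, scanning_errored):
    # Pick the destination 'folder' (key prefix) based on why the file
    # could not be processed, then move the object under that prefix.
    folder = SCAN_ERROR_FOLDER if scanning_errored else SCAN_FAILURE_FOLDER
    _move_s3_object(
        source_bucket=source_bucket,
        source_key=filename,
        target_bucket=source_bucket,
        target_key=folder + filename,
    )
```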