previously we'd see an error message in the logs:
`AttributeError: 'NoneType' object has no attribute 'status_code'`
because we were assuming the requests exception would always have a
response - it won't have a response if it wasn't able to create a
connection at all.
We can add a request_id for tasks that are not spawned by an HTTP request, for example scheduled or nightly tasks. That means you can match up all the tasks spawned by a single task, for example, create-night-billing spawns 4 tasks, those would all have the same idea. Not sure if that is helpful or not. Also it might be confusing to have a request_id for logs that were not started from a request so I have left it out.
Flake8 Bugbear checks for some extra things that aren’t code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is having mutable default function arguments, which get
shared between every call to the function and therefore mutating a value
in one place can unexpectedly cause it to change in another.
This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
some SQLAlchemy decorators (eg `declared_attr`) make things that
aren’t formally class methods take a class not an instance as their
first argument
It disables:
- _B306: BaseException.message is removed in Python 3_ because I think
our exceptions have a custom structure that means the `.message`
attribute is still present
Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
The request_id is set is the task is created from a http request, if that task then calls through to another task this will set the request_id from the global context. We should then be able to follow the creation of a notification all the way from the original http request to the sending task.
We have been getting log lines of the following:
`API POST request on
https://api.notifications.service.gov.uk/notifications/sms/mmg failed
with None`
It's not clear what error caused the request to fail because the value
of `api_error.response` is always `None`.
There appears to be something wrong with this logging.
`raise_for_status` will raise an `HTTPError`, so then there should be no
reason to then pass that error into another `HTTPError` (which is
causing the response to be lost).
We can instead simply catch the `HTTPError` and log it's status
code.
This might not be perfect, but it's definitely an improvement and should
give us some more context about why these requests occasionally fail.
A few weeks ago, we deleted some pdf letters that had reached their
retention period. However, these letters were in the 'created' state so
it's very arguable that we should not have deleted them because we were
expecting to resend them and were unable to. Part of the reason for this
is that we marked the letters back to `created` as the status but we did
not nullify the `sent_at` timestamp, meaning the check on
ebb43082d5/app/dao/notifications_dao.py (L346)
did not catch it. Regardless of that check, which controls whether the
files were removed from S3, they were also archived into the
`notification_history` table as by default.
This commit does changes our code such that letters that are not in
their final state do not go through our retention process. This could
mean they violate their retention policy but that is likely the lesser
of two evils (the other being we delete them and are unable to resend
them).
Note, `sending` letters have been included in those not to be removed
because there is a risk that we give the letter to DVLA and put it in
`sending` but then they come back to us later telling us they've had
problems and require us to resend.
In IBAG format for broadcasts, we need to give sequential number
of previous message, and it needs to be formatted as a hex padded
with zeroes to be 8 character long.
This commit adds the necessary formatting.
This is for the IBAG format (similar to CAP format, but proprietary)
used in the XMLs that we exchange with broadcast providers (specifically
Vodafone).
some services only send to one provider. This is a platform admin
setting to allow us to test integrations and providers manually without
affecting other broadcasts from different services.
one-to-one - a service can either send to all as normal, or send to only
one provider.
Now that we’re grouping jobs sent from contact lists within their
parent, they shouldn’t also be listed on the jobs page at the top level.
The jobs page uses the uploads API, not the jobs API, so this commit
makes sure that filtering is happening in the proper place.
`service.id` is a uuid so will not be matched to anything in
`current_app.config.get('HIGH_VOLUME_SERVICE')` because that is a list
of strings.
This is why we are never falling into the first if statement and having
any metrics for high volume services on our dashboards at the moment.
Note, I had taken the existing line from the `post_notification`
endpoint, but that is using a serialised service which already has the
UUID converted to a string.
Changes the high volume and not high volume metrics to both only include
non test notifications. This is because when looking at the grafana
metrics, it was impossible to tell what affect the high volume/non high
volume effect was having vs the test/live notification effect.
This leaves us with no break down of high volume/not high volume sending
times for test notifications but I don't think we really need that.
We currently measure the sending time for all. This commit then breaks
it down into
- test keys and non test keys
- high volume services and non high volume services
Breaking it down into test keys and non test keys is important because
we don't care as much about sending test notifications within 10
seconds, only non test keys so we don't want our graphs to reflect poor
performance if it's just test keys affecting this
Breaking it down into high volume and non high volume will allow us to
easily debug issues with slow sending if they are high volume or non
high volume issues
previously we made some incorrect assumptions about set-up on staging
and prod - they currently don't have any cbc_proxy aws creds at all.
We shoudn't be attempting canaries or link tests when there's no AWS
infrastructure to connect to.
We also shouldn't bother writing a row into the database at all for the
broadcast_provider_message since we're not even attempting to send, and
we shouldn't get confused between messages that failed and messages we
never wanted to send at all.
At the moment we log everytime we get a bounce from SES, however we
don't link it to a particular notification so it's hard to know for what
sub reason a notifcation did not deliver by looking at the logs.
This commit changes this by now looking the bounce reason after we have
found the notification ID and including them together. So if you know
search for a notification ID in Kibana, you will see full logs for why
it failed to deliver.
this is a pretty big and convoluted refactor unfortunately.
Previously:
There was one global `cbc_proxy_client` object in apps. This class has
the information about how to invoke the bt-ee lambda, and handles all
calls to lambda. This includes calls to the canary too (which is a
separate lambda).
The future:
There's one global `cbc_proxy_client`. This knows about the different
provider functions and lambdas, and you'll need to ask this client for a
proxy for your chosen provider. call cbc_proxy_client.get_proxy('ee')`
and it'll return you a proxy that knows what ee's lambda function is,
how to transform any content in a way that is exclusive to ee, and in
future how to parse any response from ee.
The present:
I also cleaned up some duplicate tests.
I'm really not sure about the names of some of these variables - in
particular `cbc_proxy_client` isn't a client - it's more of a java style
factory, where you call a function on it to get the client of your
choice.