eventlet works by monkey-patching core IO libraries (such as ssl) to be
non-blocking. However, there's currently a bug: In the normal socket
library it may throw a timeout error as a `socket.timeout` exception.
However eventlet.green.ssl's patch raises an ssl.SSLError('timed out',)
instead. redispy handles socket.timeout but not ssl.SSLError, so we
solve this by monkey patching the monkey patching code to raise the
correct exception type 😱
Note: This code should _only_ be called when we're using eventlets, or
we'll run into issues with regular code failing with max recursion
errors. With that in mind we put this code in gunicorn_config, as that
isn't imported when we run celery or run flask locally.
This is to support a migration from Redislabs to PaaS native Redis,
allowing us to toggle between old and new using the env vars for
the instance - without needing to change the code.
This was inconsistent with the source data for the fixture being
overidden in some of the tests. We only need to set it in the env
once, so it makes sense to just put the code there.
Fixes this error (in Admin):
File "/home/vcap/app/app/notify_client/notification_api_client.py", line 74, in send_precompiled_letter
return self.post(url='/service/{}/send-pdf-letter'.format(service_id), data=data)
File "/home/vcap/app/app/notify_client/__init__.py", line 59, in post
return super().post(*args, **kwargs)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_python_client/base.py", line 48, in post
return self.request("POST", url, data=data)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_python_client/base.py", line 64, in request
response = self._perform_request(method, url, kwargs)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_python_client/base.py", line 118, in _perform_request
raise api_error
notifications_python_client.errors.HTTPError: 500 - Internal server error
Due to this error (in API):
File "/home/vcap/app/app/service/send_notification.py", line 178, in send_pdf_letter_notification
raise e
File "/home/vcap/app/app/service/send_notification.py", line 173, in send_pdf_letter_notification
letter = utils_s3download(current_app.config['TRANSIENT_UPLOADED_LETTERS'], file_location)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_utils/s3.py", line 53, in s3download
raise S3ObjectNotFound(error.response, error.operation_name)
notifications_utils.s3.S3ObjectNotFound: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
I checked the DB to verify the letter does actually exist i.e. it
is an instance of the problem we're fixing here.
We previously decided against this [^1] but the lambda alerts are
not a replacement for this one.
Even if both sets of alerts go off, we expect PagerDuty to aggregate
them into a single notification.
[^1]: 65c21f694c
It's worth noting this will break the Admin page to edit provider
priorities, which currently works in units of 10. We already have
work planned to fix this [^1] and it's not an immediate problem.
The automatic adjustment algorithm will continue to work properly
as it can cope with increments smaller than 10 [^2].
[^1]: https://www.pivotaltracker.com/story/show/181681739
[^2]: b145a29935/app/dao/provider_details_dao.py (L122)
The new alert happens earlier but is otherwise the same:
- We only create a ticket in Production.
- We only create a ticket on approval.
I took this opportunity to refactor the alert as a private function
and test this specifically in detail to avoid lots of repetitive
mocks, which are required when calling the main "update" function.
One test I haven't preserved was for when the "names" array is empty,
as this was added for a legacy data integrity scenario [^1].
[^1]: bf0bf4e31c
This was creating data in the Notifications table but the funcetion
under test was - now the timestamps are in the past - looking in the
NotificationHistory table. Freezing the time of the test fixes that.
We should lock these tests to run in a particular year [^1].
Fixes e.g.
> assert annual_billing[0].free_sms_fragment_limit == 150000
E assert 40000 == 150000
E + where 40000 = <AnnualBilling 8851fd4b-6316-4a53-b0e0-8202c803ae97>.free_sms_fragment_limit
[^1]: 8402e7c97b (diff-adcd90bfdc6b7777fdf309037ca6948bef4f4b858e22f8b2e46b71865d580fbaR60)
In response to: [^1].
The stacktrace conveys the same and more information. We don't do
anything different for each exception class, so there's no value
in having three of them over one exception.
I did think about DRYing-up the duplicate exception behaviour into
the base class one. This isn't ideal because the base class would
be making assumptions about how inheriting classes make requests,
which might change with future providers. Although it might be nice
to have more info in the top-level message, we'll still get it in
the stacktrace e.g.
ValueError: Expected 'code' to be '0'
During handling of the above exception, another exception occurred:
app.clients.sms.SmsClientResponseException: SMS client error (Invalid response JSON)
requests.exceptions.ReadTimeout
During handling of the above exception, another exception occurred:
app.clients.sms.SmsClientResponseException: SMS client error (Request failed)
[^1]: https://github.com/alphagov/notifications-api/pull/3493#discussion_r837363717
This works in conjunction with the new SMS provider stub [^1].
Local testing:
- Run the migrations to add Reach as an inactive provider.
- Activate the Reach provider locally and deactivate the others.
update provider_details set priority = 100, active = false where notification_type = 'sms';
update provider_details set active = true where identifier = 'reach';
- Tweak your local environment to point at the SMS stub.
export REACH_URL="http://host.docker.internal:6300/reach"
- Start / restart Celery to pick up the config change.
- Send a SMS via the Admin app and see the stub log it.
- Reset your environment so you can send normal SMS.
update provider_details set active = true where notification_type = 'sms';
update provider_details set active = false where identifier = 'reach';
[^1]: https://github.com/alphagov/notifications-sms-provider-stub/pull/10
This avoids duplicating it as we add a new provider and means we
can test it all in one place (although it wasn't tested before).
I'm not sure why the previous code did "super(..)__init__" in a
non-init function - it's a bit late! - so I've just replaced it
with a call to the new "init_app" function in the parent class.
The provider tests are coupled to actual data in the DB, but we
shouldn't have to overhaul the tests when this changes.
Assuming we don't delete old providers, just testing a subset of
the fixture data should give us enough confidence in the code.