Something is currently breaking our deployment; this reverts recent changes in an attempt to get things working again.
Signed-off-by: Carlo Costino <carlo.costino@gsa.gov>
This changeset undoes the recent changes we tried after the Python 3.13 update as they had no bearing on the SSL cert validation errors. Back to the drawing board!
Signed-off-by: Carlo Costino <carlo.costino@gsa.gov>
eventlet works by monkey-patching core IO libraries (such as ssl) to be
non-blocking. However, there's currently a bug: the normal socket
library raises timeouts as `socket.timeout` exceptions, whereas
eventlet.green.ssl's patch raises `ssl.SSLError('timed out')` instead.
redispy handles socket.timeout but not ssl.SSLError, so we solve this
by monkey-patching the monkey-patching code to raise the correct
exception type 😱
Note: This code should _only_ be called when we're using eventlet, or
we'll run into issues with regular code failing with max recursion
errors. With that in mind we put this code in gunicorn_config, as that
isn't imported when we run celery or run flask locally.
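A minimal sketch of the patch, assuming eventlet.green.ssl exposes the
exception it raises on timeout as a module-level `timeout_exc`
attribute (worth re-checking against the pinned eventlet version):

    # gunicorn_config.py (sketch) - only safe under eventlet, per the note above
    import socket

    def fix_ssl_monkeypatching():
        # point eventlet.green.ssl's timeout exception back at socket.timeout
        # so redispy's existing timeout handling keeps working
        import eventlet.green.ssl

        eventlet.green.ssl.timeout_exc = socket.timeout

    fix_ssl_monkeypatching()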
Flake8 Bugbear checks for some extra things that aren't code style
errors, but are likely to introduce bugs or unexpected behaviour. A
good example is mutable default function arguments, which are shared
between every call to the function, so mutating the value in one place
can unexpectedly cause it to change in another.
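For illustration, roughly the pattern that Bugbear's mutable-default
check catches (names are made up):

    # B006: mutable default argument - the list is created once, at function
    # definition time, and shared by every call
    def add_recipient(recipient, recipients=[]):
        recipients.append(recipient)
        return recipients

    add_recipient("a@example.com")  # ["a@example.com"]
    add_recipient("b@example.com")  # ["a@example.com", "b@example.com"] - surprise!

    # the usual fix: default to None and create the list inside the function
    def add_recipient_fixed(recipient, recipients=None):
        if recipients is None:
            recipients = []
        recipients.append(recipient)
        return recipients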
This commit enables all the extra warnings provided by Flake8 Bugbear,
except for:
- the line length one (because we already lint for that separately)
- B903 Data class should either be immutable or use `__slots__` because
this seems to false-positive on some of our custom exceptions
- B902 Invalid first argument 'cls' used for instance method because
some SQLAlchemy decorators (e.g. `declared_attr`) make things that
aren't formally class methods take a class rather than an instance as
their first argument (see the sketch at the end of this message)
It disables:
- _B306: BaseException.message is removed in Python 3_ because I think
our exceptions have a custom structure that means the `.message`
attribute is still present
Matches the work done in other repos:
- https://github.com/alphagov/notifications-admin/pull/3172/files
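For reference, the kind of code that trips B902 here looks roughly like
this (a sketch; the mixin and its contents are made up):

    # SQLAlchemy's declared_attr passes the *class* as the first argument,
    # even though this looks like an instance method - hence the B902
    # false positive
    from sqlalchemy.ext.declarative import declared_attr

    class TableNameMixin:
        @declared_attr
        def __tablename__(cls):
            return cls.__name__.lower()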
We moved from sending statsd metrics to Hosted Graphite to sending them
to a statsd instance running on the PaaS. Therefore we no longer need
to send statsd metrics with a particular prefix, as the statsd app now
only receives metrics from our own apps (not from other users, as would
have been the case with Hosted Graphite).
This should change no behaviour, as the only place the environment
variable was being used was in the gunicorn config, where it was an
empty string, which is the default behaviour anyway as per:
https://docs.gunicorn.org/en/stable/settings.html#statsd-prefix
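The change in the gunicorn config is roughly this (a sketch;
`STATSD_PREFIX` is a stand-in for the removed environment variable):

    # gunicorn_config.py (sketch)
    import os

    # before this change:
    statsd_prefix = os.environ.get("STATSD_PREFIX", "")

    # after: the line above is simply deleted - gunicorn's statsd_prefix
    # setting already defaults to "", so behaviour is unchanged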
All our responses identify the web server that served them via the
`Server` response header. This opens us up to potential attackers who
might be crawling the internet looking for specific versions with known
vulnerabilities. As our dependencies are open source, this doesn't do
anything against targeted attacks, since anyone can just look at our
repos on GitHub, but it should theoretically give a marginal security
improvement.
Regardless, the header isn't useful [1], we're not the first people to
want to get rid of it, and gunicorn are in the process of at least
amending it to remove the version information [2].
This shouldn't have any impact on us, though an empty string will be
passed through to the debug information in the event of a crash. That's
fine though, as we already know what version we're running.
[1] https://www.fastly.com/blog/headers-we-dont-want
[2] https://github.com/benoitc/gunicorn/issues/825
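One way to do this, sketched below, assumes gunicorn builds the
`Server:` header (and the server string in its debug output) from the
module-level SERVER_SOFTWARE constant, which is why an empty string
ends up in crash debug information:

    # gunicorn_config.py (sketch)
    import gunicorn

    gunicorn.SERVER_SOFTWARE = ""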
- Gunicorn can give us some interesting stats about what it is doing
that might prove useful when trying to work out
gunicorn/eventlet/sqlalchemy bottlenecking issues (see the sketch after
this list)
- We already ship these env vars to the app so this is a pretty easy
change to make
- Uses statsd, which is UDP-based, so it shouldn't have any performance
impact on gunicorn
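A sketch of the gunicorn setting involved (assuming `STATSD_HOST` is
the env var already shipped to the app, and that the statsd endpoint
listens on the conventional UDP port 8125):

    # gunicorn_config.py (sketch)
    import os

    # gunicorn sends its internal metrics (request counts/timings, worker
    # state, etc.) to this host over UDP
    statsd_host = "{}:8125".format(os.environ["STATSD_HOST"])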
At the same time, decrease the number of workers from 5 to 4.
The effect on max DB connections will be the same, although with a
higher "resting" number of connections.
Before:
12 (instances) * 5 (workers) * 20 (10 permanent + 10 overflow) = 1200
After:
12 (instances) * 4 (workers) * 25 (15 permanent + 10 overflow) = 1200
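A sketch of the two knobs involved (setting names assumed: the worker
count lives in gunicorn_config.py, the pool sizes are SQLAlchemy pool
options set via Flask config):

    # gunicorn_config.py (sketch)
    workers = 4  # was 5

    # Flask config (sketch)
    SQLALCHEMY_POOL_SIZE = 15     # "permanent" connections per worker, was 10
    SQLALCHEMY_MAX_OVERFLOW = 10  # burst connections per worker, unchanged

    # per instance: 4 workers * (15 + 10) = 100 connections
    # across 12 instances: 1200, the same ceiling as before, but more
    # connections held open at rest (12 * 4 * 15 = 720 vs 12 * 5 * 10 = 600)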