There are several reasons why we might get an `InvalidParameterValue`
from the SES API. One, as correctly identified before in
https://github.com/alphagov/notifications-api/pull/713/files
is if we allow an email address on our side that SES rejects.
However, there are other types of errors that could cause an
`InvalidParameterValue`. One example is a `Header too long: 'Subject'`
error that we have seen happen in production. This shouldn't raise an
`InvalidEmailError` as that is not appropriate.
Therefore, we introduce a new exception
`EmailClientNonRetryableException`, that represents any exception back
from an email client that we can use whenever we get a
`InvalidParameterValue` error.
Note, I chose `EmailClientNonRetryableException` rather than
`SESClientNonRetryableException` as our code needs to catch this
exception and it shouldn't be aware of what email client is being used,
it just needs to know that it came from one of the email clients (if in
time we have more than one).
In time, we may wish to extend the approach of having generic
`EmailClient` exceptions and `SMSClient` exceptions as this should be
the most extendable pattern and a good abstraction.
We shouldn't be logging PII so we should not log email addresses. We
remove the email address and just log the normal exception message.
Note, this meant before that you could see the email address and more
easily track down the notification ID in the database. Now instead, you
will need to search in the DB for notifications that have gone into
technical failure at the time of the log message (as we still don't
log the notification ID alongside the failure).
We have hit throttling limits from SES approximately once a week during
a spike of traffic from GOV.UK. The rate limiting usually only lasts a
couple of minutes but generates enough exceptions to cause a p1 but with
no potential action for the responder.
Therefore we downgrade the warning for this case to a warning and assume
traffic will level back out such that the problem resolves itself.
Note, we will still get exceptions if we go over our daily limit, rather
than our per minute sending limit, which does require immediate action
by someone responding.
If we were to continually go over our per second sending rate for a long
continous period of time, then there is a chance we may not be aware but
given the risk of this happening is low I think it's an acceptable risk
for the moment.
amazon SES only accepts domains encoded in punycode, an encoding that
converts utf-8 into an ascii encoding that browsers and mailservers
recognise.
We currently just send through emails as we store them (in full
unicode), which means any rogue characters break SES and cause us to
put the email in technical-failure. Most of these appear to be typos
and rogue control characters, but there is a small chance that it could
be a real domain (eg https://🅂𝖍𝐤ₛᵖ𝒓.ⓜ𝕠𝒃𝓲/🆆🆃🅵/).
We should encode to and reply-to-address emails as punycode to make
sure that they can always be sent. The chance that anyone actually uses
a unicode domain name for their email is probably pretty low, but we
should allow it for completeness.
brings in a fix to InvalidEmail/Phone/AddressExceptions not being
instantiated correctly. `exception.message` is not a python standard,
so we shouldn't be relying on it to transmit exception reasons -
rather we should be using `str(exception)` instead. This involved a
handful of small changes to the schema validation
If the notification has a status == sending then update the status otherwise do not update the status.
In other words do not change the status more than once.
- when a provider callback occurs and we update the status of the notification, also update the statistics table
Adds:
- Mapping object to the clients to handle mapping to various states from the response codes, this replaces the map.
- query lookup in the DAO to get the query based on response type / template type
Tests around rest class and dao to check correct updating of stats
Missing:
- multiple client callbacks will keep incrementing the counts of success/failure. This edge case needs to be handle in a future story.