To check whether it was our congestion point for the soak test.
(The email stub is a bit slower to respond than Amazon SES, and
the difference could lead to us having too many database connections
open while we wait for responses from the stub.)
If we set an environment variable, we can stub out calls to SES and send
them to our own stub app. If the environment variable is not set, things
work as normal.
To be used alongside
https://github.com/alphagov/notifications-email-provider-stub
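A minimal sketch of how the switch could look, assuming a hypothetical `SES_STUB_URL` environment variable and a boto3 SES client; the real variable name and client setup may differ:

```python
import os

import boto3


def get_ses_client():
    stub_url = os.environ.get("SES_STUB_URL")  # hypothetical variable name
    if stub_url:
        # Point the SES client at the local email provider stub instead.
        return boto3.client("ses", region_name="eu-west-1", endpoint_url=stub_url)
    # Environment variable not set: talk to Amazon SES as normal.
    return boto3.client("ses", region_name="eu-west-1")
```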
This may be due to us not recognising a code, or the provider not
having sent the detailed status.
It seems like Firetext is sometimes sending us permanent-failure
without a detailed status. It could be due to:
- them really not sending any detailed status
- them sending a status code we don't recognise
- them sending a 000 code that means 'no errors', which we ignore
To see which one it is, and to debug such issues more quickly in the
future, this PR adds status and detailed status codes to the logs.
Also log the detailed delivery status for firetext in the same place,
in addition to it being logged from notifications_dao.
Logging detailed delivery statuses will help us see why messages
fail to deliver. In the future we could persist detailed delivery
status in the database.
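A rough sketch of the kind of log line this adds, with illustrative names rather than the real callback handler:

```python
import logging

logger = logging.getLogger(__name__)


def log_firetext_status(notification_id, status, detailed_status_code):
    # Log both the status and the detailed status code, so we can tell apart
    # 'no detailed status sent', 'code we don't recognise' and the 000 code.
    logger.info(
        "Firetext callback for notification %s: status %s, detailed status code %s",
        notification_id,
        status,
        detailed_status_code,
    )
```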
https://github.com/alphagov/notifications-utils/pull/742
We had been seeing phone numbers containing this no-break space making
it through in jobs, which then caused errors because we could not
process the job: the number was treated as invalid. This should solve
the problem by stripping that special character.
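The actual fix lives in the notifications-utils PR linked above; as a rough illustration of the idea (the helper name and exact normalisation are assumptions):

```python
def normalise_phone_number(phone_number):
    # Strip the no-break space (U+00A0) as well as ordinary spaces, so a
    # number pasted from a spreadsheet isn't rejected as invalid.
    return phone_number.replace("\u00a0", "").replace(" ", "")


assert normalise_phone_number("07700\u00a0900 123") == "07700900123"
```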
We're using statsd to monitor how long provider requests are taking.
However, there's a lot of busy work that happens inside our statsd
metrics timing window: things like JSON dumping and loading, building
headers, exception handling, etc.
For firetext/mmg, the response object from requests has an elapsed
property [1], which captures the time from sending the raw data to
parsing the response headers. For SES it's a bit trickier, but boto3
exposes a few event hooks [2]. It's hard to find them without stepping
through the code, but the interesting ones are before-call, after-call,
after-call-error, request-created, and response-received. The
before-call and after-call events involve some marshalling, built-in
retrying, etc, while request-created and response-received are much
lower level. They might be called more than once per SES request, if
boto3 itself retries the request on 5xx, 429 and low-level socket
errors [3].
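A sketch of where the raw timings could come from, assuming a generic statsd client; the provider URL, metric names, and the start-time bookkeeping are illustrative:

```python
import time

import boto3
import requests


def time_firetext_request(payload):
    # requests already measures the HTTP exchange itself, from sending the
    # raw data until the response headers are parsed.
    response = requests.post("https://api.example-provider.test/send", data=payload)
    # statsd_client.timing("firetext.raw-request-time", response.elapsed.total_seconds())
    return response


# For SES, bracket the raw request with boto3/botocore event hooks. Handlers
# registered for "request-created" / "response-received" also fire for the
# more specific per-service, per-operation event names.
ses = boto3.client("ses", region_name="eu-west-1")
_start = {}


def on_request_created(**kwargs):
    _start["t"] = time.monotonic()


def on_response_received(**kwargs):
    elapsed = time.monotonic() - _start.pop("t", time.monotonic())
    # statsd_client.timing("ses.raw-request-time", elapsed)


ses.meta.events.register("request-created", on_request_created)
ses.meta.events.register("response-received", on_response_received)
```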
Add these as new `raw-request-time` metrics, rather than overwriting
the existing ones, to avoid changing the meaning of an existing metric
and to let us compare the two and see whether there's a noticeable
difference at all.
[1] https://requests.readthedocs.io/en/master/api/#requests.Response.elapsed
[2] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/events.html
[3] https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#legacy-retry-mode
This copies what we do to a user's email address when archiving the user
by prefixing it with `_archived_{date}`. We already prefixed the
service name and email_reply_to with `_archived`, but this didn't allow
a service with the same name to be archived more than once.
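As an illustration of the change (the helper name and exact format are assumptions, mirroring what is already done for the user's email address):

```python
from datetime import date


def archived_db_value(value):
    # Include the archive date in the prefix so the same service name can be
    # archived again later without clashing with the previous archived copy.
    return f"_archived_{date.today().isoformat()}_{value}"


# e.g. "Passport renewals" -> "_archived_2021-06-01_Passport renewals"
```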
The assumption that a commit is issued in between migration files was
false. This will force the db connection to commit, so the ALTER TABLE
commands complete and the Alembic version is updated.
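A minimal sketch of what forcing the commit looks like in an Alembic migration; the table and column names are placeholders:

```python
from alembic import op


def upgrade():
    op.execute("ALTER TABLE some_table ALTER COLUMN some_column TYPE text")
    # Explicitly commit rather than relying on a commit happening between
    # migration files, so the ALTER TABLE completes before the next migration.
    op.execute("COMMIT")
```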
The `notifications`, `notification_history`, `templates` and `templates_history`
tables all had a check constraint on the postage column which specified
that the postage had to be `first` or `second` if the notification or
template was a letter. We now have two more options for postage -
`europe` and `rest-of-world`.
It's not possible to alter a check constraint, so the constraints have
to be dropped then recreated. We are not recreating the constraint on
the `notification_history` table since values here are always copied
from the `notifications` table.
The constraints get added as `NOT VALID` at first - this stage locks
the tables, so the `notifications` table and the `templates` and
`templates_history` tables are updated in separate migrations so that
we don't lock all the tables at the same time. In a third migration we
then run `VALIDATE CONSTRAINT` for all tables - this locks a row at a
time, not the whole table.
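A sketch of the two stages for one of the tables, assuming a constraint named `chk_templates_postage`; the real constraint name and definition may differ:

```python
from alembic import op


def upgrade():
    op.execute("ALTER TABLE templates DROP CONSTRAINT IF EXISTS chk_templates_postage")
    # Add the new constraint as NOT VALID so existing rows aren't checked yet.
    op.execute("""
        ALTER TABLE templates
        ADD CONSTRAINT chk_templates_postage
        CHECK (
            template_type != 'letter'
            OR postage IN ('first', 'second', 'europe', 'rest-of-world')
        )
        NOT VALID
    """)


# In the follow-up migration, validate the constraint row by row:
# op.execute("ALTER TABLE templates VALIDATE CONSTRAINT chk_templates_postage")
```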
When running the purge command I found about 4 users who could not be
deleted because their user id was still referenced in the services
table: they had created the service but were no longer a member of it.
I have fixed this by checking whether a user created a service they are
no longer a member of, and if so deleting that service for them too.
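A rough sketch of the extra check, with illustrative model and column names:

```python
from app import db
from app.models import Service


def delete_services_created_by(user):
    # If the user created a service but is no longer a member of it, the
    # created_by foreign key would block deleting the user - so delete the
    # service too.
    for service in Service.query.filter_by(created_by_id=user.id):
        if user not in service.users:
            db.session.delete(service)
    db.session.commit()
```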
Note: I've followed the previous convention of no tests for this
function. I've run it locally and executed the code path, so there
should be no major flaws in the code. There is a small chance I wasn't
able to exactly replicate the state that existed in preview on my local
machine, but hopefully it was close enough to be accurate.
This PR tries to parse the date; if that throws an error, we return
now as the DateReceived. This will at least allow the message to be
persisted. Typically the DateReceived, provider_date, and the
created_at date in the inbound_sms table are within a second of each
other.
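A minimal sketch of the fallback, with an illustrative date format:

```python
from datetime import datetime


def parse_date_received(date_received):
    try:
        return datetime.strptime(date_received, "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        # If the provider sends something we can't parse, fall back to now so
        # the inbound message can still be persisted.
        return datetime.utcnow()
```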