This slowed me down when making changes to the APIs. As well as
being unnecessary given the structural focus of these tests, the
way the test data was generated was quite confusing.
Before:
time pytest tests/app/billing/test_rest.py
...
pytest tests/app/billing/test_rest.py 4.16s user 0.43s system 55% cpu 8.355 total
After:
time pytest tests/app/billing/test_rest.py
...
pytest tests/app/billing/test_rest.py 2.16s user 0.25s system 70% cpu 3.413 total
This test should be focussed on the structural properties of the
API. We can leave more detailed testing of rate multipliers, etc.
to lower-level DAO tests.
This slowed me down when making changes to the DAO functions. It's
really not necessary to do 3 * 367 DB insertions for both tests.
Before:
time pytest tests/app/dao/test_ft_billing_dao.py -k for_year
...
pytest tests/app/dao/test_ft_billing_dao.py -k for_year 3.95s user 0.40s system 62% cpu 6.971 total
After:
time pytest tests/app/dao/test_ft_billing_dao.py -k for_year
...
pytest tests/app/dao/test_ft_billing_dao.py -k for_year 1.84s user 0.25s system 69% cpu 3.006 total
This is unnecessary since we now have separate "_variable_rates"
tests that explicitly check the behaviour with multiple rates for
the two notification types it affects.
This can be calculated from the "free_allowance_used" field and the
"chargeable_units" field, but having it included separately is more
convenient as it can be used directly in Admin [^1].
[^1]: 417e7370bb/app/templates/views/usage.html (L38-L39)
This represents the number of chargeable_units that were actually
free due to the free allowance - they won't be included in "cost".
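As a rough worked example (the numbers, the "charged_units" name
and the exact relationship are illustrative, inferred from the
field descriptions above):

# Illustrative numbers only.
chargeable_units = 10
free_allowance_used = 4   # units covered by the free allowance
rate = 0.0158

charged_units = chargeable_units - free_allowance_used  # derived field
cost = charged_units * rate  # free units never contribute to cost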
Although the existing calculations in Admin [^1][^2] will still be
correct if SMS rates change - it's only cost that's a problem - it
makes sense to have all the knowledge about calculating usage
consistently in these two APIs.
Note that the Integer casting is covered by the API-level tests in
test_rest.
[^1]: 474d7dfda8/app/main/views/dashboard.py (L490)
[^2]: c63660d56d/app/main/views/dashboard.py (L350)
This will replace the manual calculations in Admin [^1][^2] for SMS
and also in API [^3] for annual letter costs.
Doing the calculation here also means we correctly attribute free
allowance to the earliest rows in the billing table - Admin doesn't
know when a given rate was applied so can't do this without making
assumptions about when we change our rates.
Since the calculation now depends on annual billing, we need to
change all the tests to make sure a suitable row exists. I've also
adjusted the test data to match the assumption that there can only
be one SMS rate per bst_date.
Note about "OVER" clause
========================
Using "rows=" ("ROWS BETWEEN") makes more sense than "range=" as
we want the remainder to be incremental within each group in a
"GROUP BY" clause, as well as between groups i.e
# ROWS BETWEEN (arbitrary numbers to illustrate)
date=2021-04-03, units=3, cost=3.29
date=2021-04-03, units=2, cost=4.17
date=2021-04-04, units=2, cost=5.10
vs.
# RANGE BETWEEN
date=2021-04-03, units=3, cost=4.17
date=2021-04-03, units=2, cost=4.17
date=2021-04-04, units=2, cost=5.10
See [^4] for more details and examples.
[^1]: https://github.com/alphagov/notifications-admin/blob/master/app/templates/views/usage.html#L60
[^2]: 072c3b2079/app/billing/billing_schemas.py (L37)
[^3]: 474d7dfda8/app/templates/views/usage.html (L98)
[^4]: https://learnsql.com/blog/difference-between-rows-range-window-functions/
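Following on from the note above, a minimal SQLAlchemy sketch of
the windowed running total (simplified: the real query's grouping
and allowance handling are omitted; FactBilling is the billing
model):

from sqlalchemy import func

from app.models import FactBilling

chargeable_units = FactBilling.billable_units * FactBilling.rate_multiplier

# With rows=, two rows sharing a bst_date still accumulate one after
# the other, so the free allowance is spent incrementally within a
# date too; with range_=, peer rows on the same date would all see
# the same total.
cumulative_units = func.sum(chargeable_units).over(
    partition_by=FactBilling.service_id,
    order_by=FactBilling.bst_date,
    rows=(None, 0),  # UNBOUNDED PRECEDING to CURRENT ROW
)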
There is no such thing as a "billing unit". The data this field
contained was also a confusing mixture of two types:
- For emails and letters, it was just "notifications_sent".
- For SMS, it was the "chargeable_units" (billable * multiplier).
This replaces the single, ambiguous "billing_units" field with
"chargeable_units" and "notifications_sent" in both usage APIs.
Once Admin is using them we can remove the old field.
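For illustration (values made up), an SMS row for 2 notifications
with a rate multiplier of 2 changes shape like this:

old_row = {"billing_units": 4}  # ambiguous: chargeable units, but only for SMS
new_row = {"notifications_sent": 2, "chargeable_units": 4}  # explicit in both APIs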
We want to query for service usage in the BST financial year:
2022-04-01T00:00:00+01:00 to 2023-03-31T23:59:59+01:00 =>
2022-04-01 to 2023-03-31 # bst_date
Previously we were only doing this explicitly for the monthly API
and it seemed like the yearly usage API was incorrectly querying:
2022-03-31T23:00:00+00:00 to 2023-03-30T23:00:00+00:00 =>
2022-03-31 to 2023-03-30 # "bst_date"
However, it turns out this isn't a problem for two reasons:
1. We've been lucky that none of our rates have changed since 2017,
which is long ago enough that no one would care.
2. There's a quirk somewhere in Sqlalchemy / Postgres that has been
compensating for the lack of explicit BST conversion.
To help ensure we do this consistently in future I've DRYed-up the
BST conversion into a new utility. I could have just hard-coded the
dates but it seemed strange to have the knowledge twice.
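A minimal sketch of what such a utility could look like (the real
helper's name and location may differ):

from datetime import date

def get_financial_year_bst_dates(year):
    # The BST financial year runs 1 April to 31 March. bst_date
    # values are already stored in BST, so the bounds are plain
    # dates and no timezone arithmetic is needed at query time.
    return date(year, 4, 1), date(year + 1, 3, 31)

assert get_financial_year_bst_dates(2022) == (date(2022, 4, 1), date(2023, 3, 31))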
I've also adjusted the tests so they detect if we accidentally use
data from a different financial year. (2) is why none of the test
assertions actually need changing and users won't be affected.
Sqlalchemy / Postgres quirk
===========================
The following queries were run on the same data but results differ:
FactBilling.query.filter(FactBilling.bst_date >= datetime(2021,3,31,23,0), FactBilling.bst_date <= '2021-04-05').order_by(FactBilling.bst_date).first().bst_date
datetime.date(2021, 4, 1)
FactBilling.query.filter(FactBilling.bst_date >= '2021-03-31 23:00:00', FactBilling.bst_date <= '2021-04-05').order_by(FactBilling.bst_date).first().bst_date
datetime.date(2021, 3, 31)
Looking at the actual query for the first item above still suggests
the results should be the same, but for the use of "timestamp".
SELECT ...
FROM ft_billing
WHERE ft_billing.service_id = '16b60315-9dab-45d3-a609-e871fbbf5345'::uuid
  AND ft_billing.bst_date >= '2016-03-31T23:00:00'::timestamp
  AND ft_billing.bst_date <= '2017-03-31T22:59:59.999999'::timestamp
  AND ft_billing.notification_type IN ('email', 'letter')
GROUP BY ft_billing.rate, ft_billing.notification_type
UNION ALL
SELECT sum(ft_billing.notifications_sent) AS notifications_sent,
  sum(ft_billing.billable_units * ft_billing.rate_multiplier) AS billable_units,
  ft_billing.rate AS ft_billing_rate,
  ft_billing.notification_type AS ft_billing_notification_type
FROM ft_billing
WHERE ft_billing.service_id = '16b60315-9dab-45d3-a609-e871fbbf5345'::uuid
  AND ft_billing.bst_date >= '2016-03-31T23:00:00'::timestamp
  AND ft_billing.bst_date <= '2017-03-31T22:59:59.999999'::timestamp
  AND ft_billing.notification_type = 'sms'
GROUP BY ft_billing.rate, ft_billing.notification_type
) AS anon_1
ORDER BY anon_1.notification_type, anon_1.rate
If we try some manual queries with and without '::timestamp' we get:
select distinct(bst_date) from ft_billing where bst_date >= '2022-04-20T23:00:00' order by bst_date desc;
bst_date
------------
2022-04-21
2022-04-20
select distinct(bst_date) from ft_billing where bst_date >= '2022-04-20T23:00:00'::timestamp order by bst_date desc;
bst_date
------------
2022-04-21
2022-04-20
It looks like this is happening because all client connections are
aware of the local timezone, and naive datetimes are interpreted as
being in UTC - not necessarily true, but it saves us here!
The monthly API datetimes were pre-converted to dates, so none of
this was relevant for deciding exactly which date to use.
These tests weren't checking anything structural about the APIs
beyond what's covered by the other tests. They represent edge
cases that we can check at a lower level instead.
It was also unclear what these tests were actually testing, as
the term "all cases" is vague. Looking at the test data, there
are variations in rates, multipliers and billable units for SMS
and letters, which I've summarised as "variable rates".
Note: I've removed part of the test data - for the first class
letter rate - as it's not clearly adding anything.
It was unclear why we had both of these tests when the one for
the financial year is more comprehensive, as it checks data both
in and beyond the specified financial year.
The only thing we lose in this file is checking multiple SMS
rates, which we will fix in the next commit when we import some
tests that are specific to variable rates.
This avoids a bunch of boilerplate and makes it easier to see what
is being passed in the request.
Note that the "invalid_schema" test is slightly different because
it was previously passing an invalid JSON object - "{}" - instead
of a valid JSON but invalid by the schema - "\"{}\"". The new test
seems more valuable than the old one.
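To illustrate the distinction in plain Python (not the test code
itself):

import json

json.loads('{}')    # an empty JSON object
json.loads('"{}"')  # the JSON *string* "{}": well-formed JSON, but an
                    # object schema rejects it, exercising validation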
`charset-normalizer` is now used by default instead of `chardet` if
it is installed (https://pyup.io/changelogs/beautifulsoup4/#4.11.0).
We do have `charset-normalizer` installed because it's a
subdependency of the requests library, so it is being used.
This caused the `test_content_too_long_returns_400` test to fail,
since it now thought that the encoding of `ŵ` was `{'encoding':
'Big5', 'language': 'Chinese', 'confidence': 1.0}`.
There are two options for fixing this:
- change the test content so that it doesn't just contain a single
letter - the docs state that you shouldn't run character detection on
very tiny content
- add `chardet` as a requirement, so that the code functions exactly the
same as before
I've chosen the first option, since this avoids adding a dependency
and we should never have messages consisting of a single character.
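To illustrate the misdetection (a sketch using charset-normalizer's
chardet-compatible detect helper; exact output may vary):

from charset_normalizer import detect

# A single multi-byte character gives detection almost nothing to go
# on, so it can be guessed as an unrelated encoding entirely.
print(detect("ŵ".encode("utf-8")))
# e.g. {'encoding': 'Big5', 'language': 'Chinese', 'confidence': 1.0}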
We have three different ways of checking the formats of datetimes.
1. The built-in way that comes with the jsonschema package ("date-time")
2. A new way we added for broadcasts ("datetime") 61a5730596
3. An old way we defined in
"/tests/app/public_contracts/schemas/v0/definitions.json"
In order to simplify things and make it clearer how datetimes are
being validated, this replaces the few places where we were using
option 3 with option 1. Option 3 was only being used to validate
code that is no longer used: the initial version of the API.
The big breaking change for our code (not mentioned in the changelog) is
that the built-in validator for the `date-time` format now requires the
`rfc3339-validator` package instead of the `strict-rfc3339` package.
This updates the requirements file to use `rfc3339-validator`. Without
this change, wrong `date-time` formats would always silently pass validation.
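A small sketch of the new behaviour (standard jsonschema usage; the
schema is illustrative):

from jsonschema import Draft7Validator, FormatChecker

validator = Draft7Validator(
    {"type": "string", "format": "date-time"},
    format_checker=FormatChecker(),
)

# With rfc3339-validator installed this reports an error; without it,
# the "date-time" format check is silently skipped and validation passes.
errors = list(validator.iter_errors("not-a-datetime"))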
We were using the Draft4Validator in one place, so this updates it to
the Draft7Validator instead.
The schemas were mostly using draft 4 of the JSON schema, though
there were a couple of schemas that were already on version 7. This
updates them all to version 7, which is the latest version fully
supported by the jsonschema Python package. There are some breaking
changes in the newer version of the schema, but I could not see
anywhere these would affect us. Some of these schemas were not valid
in version 4 but are now valid in version 7, because `"required": []`
was not allowed in earlier versions.
This is to support a migration from Redislabs to PaaS native Redis,
allowing us to toggle between old and new using the env vars for
the instance - without needing to change the code.
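For example (names illustrative, not necessarily the real config
keys):

import os

# Switching Redis instances is then a config change, not a code change.
REDIS_URL = os.environ.get("REDIS_URL")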
This was inconsistent, as the source data for the fixture was being
overridden in some of the tests. We only need to set it in the env
once, so it makes sense to just put the code there.
Fixes this error (in Admin):
File "/home/vcap/app/app/notify_client/notification_api_client.py", line 74, in send_precompiled_letter
return self.post(url='/service/{}/send-pdf-letter'.format(service_id), data=data)
File "/home/vcap/app/app/notify_client/__init__.py", line 59, in post
return super().post(*args, **kwargs)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_python_client/base.py", line 48, in post
return self.request("POST", url, data=data)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_python_client/base.py", line 64, in request
response = self._perform_request(method, url, kwargs)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_python_client/base.py", line 118, in _perform_request
raise api_error
notifications_python_client.errors.HTTPError: 500 - Internal server error
Due to this error (in API):
File "/home/vcap/app/app/service/send_notification.py", line 178, in send_pdf_letter_notification
raise e
File "/home/vcap/app/app/service/send_notification.py", line 173, in send_pdf_letter_notification
letter = utils_s3download(current_app.config['TRANSIENT_UPLOADED_LETTERS'], file_location)
File "/home/vcap/deps/0/python/lib/python3.9/site-packages/notifications_utils/s3.py", line 53, in s3download
raise S3ObjectNotFound(error.response, error.operation_name)
notifications_utils.s3.S3ObjectNotFound: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
I checked the DB to verify the letter does actually exist, i.e. it
is an instance of the problem we're fixing here.
The new alert happens earlier but is otherwise the same:
- We only create a ticket in Production.
- We only create a ticket on approval.
I took this opportunity to refactor the alert as a private function
and test this specifically in detail to avoid lots of repetitive
mocks, which are required when calling the main "update" function.
One test I haven't preserved was for when the "names" array is empty,
as this was added for a legacy data integrity scenario [^1].
[^1]: bf0bf4e31c
This was creating data in the Notifications table, but the function
under test was - now that the timestamps are in the past - looking
in the NotificationHistory table. Freezing the time in the test
fixes that.
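A sketch of the fix (test name and timestamp illustrative, assuming
freezegun, which the test suite already uses):

from freezegun import freeze_time

@freeze_time("2021-01-01 11:00:00")
def test_finds_recent_notifications():
    # With time frozen to when the data is created, the rows are
    # recent enough to still live in the Notifications table, so the
    # function under test finds them where the test puts them.
    ...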