This can happen in the following scenario (primarily for letters):
1. A service has a mixture of "delivered" and "sending" letters,
which the status task aggregates into two rows:
sending | 123
delivered | 456
2. After the 7 day retention has passed, only the "delivered" letters
will be archived [^1].
3. The status task now looks at the history table [^2], which means
it only sees the "delivered" letters.
4. The "sending" letters are eventually "delivered" and archived (before
the 10 day aggregation cutoff).
5. But the status aggregation task doesn't run.
This commit fixes (5).
[^1]: https://github.com/alphagov/notifications-api/pull/3063
[^2]: f87ebb094d/app/dao/fact_notification_status_dao.py (L51)
We had the `only` defined in the Meta class, and this wasn't working -
any extra fields were also being loaded. This moves it to the point
where the class is instantiated, which now works.
We have a lot of cases in the schemas where we're excluding a field that
doesn't actually exist on the schema anyway. This is often because a
model has been deleted, and the schema has not been updated. These
excluded fields have no effect at the moment, but Marshmallow 3 does
raise an error if you try and exclude non-existent fields.
There should be no change to what gets (de)serialized after this change.
* notify_db fixture creates the database connection and ensures the test
db exists and has migrations applied etc. It will run once per session
(test run).
* notify_db_session fixture runs after your test finishes and deletes
all non static (eg type table) data.
In unit tests that hit the database (ie: most of them), 99% of the time
we will need to use notify_db_session to ensure everything is reset. The
only time we don't need to use it is when we're querying things such as
"ensure get X works when database is empty". This is such a low
percentage of tests that it's easier for us to just use
notify_db_session every time, and ensure that all our tests run much
more consistently, at the cost of a small bit of performance when
running tests.
We used to use notify_db to access the session object for manually
adding, committing, etc. To dissuade usage of that fixture I've moved
that to the `notify_db_session`. I've then removed all uses of notify_db
that I could find in the codebase.
As a note, if you're writing a test that uses a `sample_x` fixture, all
of those fixtures rely on notify_db_session so you'll get the teardown
functionality for free. If you're just calling eg `create_x` db.py
functions, then you'll need to make you add notify_db_session fixture to
your test, even if you aren't manually accessing the session.
this was added five years ago but never used. if we want to bring back
variable rates per client we might as well get a fresh start since a lot
has changed since then.
an API build failed because one of the tests expected the database to be
empty, but it actually wasn't. This is probably because another test run
earlier on that worker did not clear down properly.
I took the opportunity to refresh all of these tests to ensure they all
correctly tear down, and also use the more modern admin_request fixture
ratehr than the old client one that required us to specify headers and
do json parsing etc.
This slowed me down when making changes to the APIs. As well as
being unnecessary given the structural focus of these tests, I
found the way the test data was generated was quite confusing.
Before:
time pytest tests/app/billing/test_rest.py
...
pytest tests/app/billing/test_rest.py 4.16s user 0.43s system 55% cpu 8.355 total
After:
time pytest tests/app/billing/test_rest.py
...
pytest tests/app/billing/test_rest.py 2.16s user 0.25s system 70% cpu 3.413 total
This test should be focussed on the structural properties of the
API. We can leave more detailed testing of rate multipliers, etc.
to lower-level DAO tests.
This slowed me down when making changes to the DAO functions. It's
really not necessary to do 3 * 367 DB insertions for both tests.
Before:
time pytest tests/app/dao/test_ft_billing_dao.py -k for_year
...
pytest tests/app/dao/test_ft_billing_dao.py -k for_year 3.95s user 0.40s system 62% cpu 6.971 total
After:
time pytest tests/app/dao/test_ft_billing_dao.py -k for_year
...
pytest tests/app/dao/test_ft_billing_dao.py -k for_year 1.84s user 0.25s system 69% cpu 3.006 total
This is unnecessary since we now have separate "_variable_rates"
tests that check the behaviour for multiple rates explicitly for
the two types of notifications it affects.
This can be calculated from the "free_allowance_used" field and the
"chargeable_units" field, but having it included separately is more
convenient as it can be used directly in Admin [^1].
[^1]: 417e7370bb/app/templates/views/usage.html (L38-L39)
This represents the number of chargeable_units that were actually
free due to the free allowance - they won't be included in "cost".
Although the existing calculations in Admin [^1][^2] will still be
correct with a change in SMS rates - it's cost that's the problem
- it makes sense to have all the knowledge about calculating usage
consistently in these two APIs.
Note that the Integer casting is covered by the API-level tests in
test_rest.
[^1]: 474d7dfda8/app/main/views/dashboard.py (L490)
[^2]: c63660d56d/app/main/views/dashboard.py (L350)
This will replace the manual calculations in Admin [^1][^2] for SMS
and also in API [^3] for annual letter costs.
Doing the calculation here also means we correctly attribute free
allowance to the earliest rows in the billing table - Admin doesn't
know when a given rate was applied so can't do this without making
assumptions about when we change our rates.
Since the calculation now depends on annual billing, we need to
change all the tests to make sure a suitable row exists. I've also
adjusted the test data to match the assumption that there can only
be one SMS rate per bst_date.
Note about "OVER" clause
========================
Using "rows=" ("ROWS BETWEEN") makes more sense than "range=" as
we want the remainder to be incremental within each group in a
"GROUP BY" clause, as well as between groups i.e
# ROWS BETWEEN (arbitrary numbers to illustrate)
date=2021-04-03, units=3, cost=3.29
date=2021-04-03, units=2, cost=4.17
date=2021-04-04, units=2, cost=5.10
vs.
# RANGE BETWEEN
date=2021-04-03, units=3, cost=4.17
date=2021-04-03, units=2, cost=4.17
date=2021-04-04, units=2, cost=5.10
See [^4] for more details and examples.
[^1]: https://github.com/alphagov/notifications-admin/blob/master/app/templates/views/usage.html#L60
[^2]: 072c3b2079/app/billing/billing_schemas.py (L37)
[^3]: 474d7dfda8/app/templates/views/usage.html (L98)
[^4]: https://learnsql.com/blog/difference-between-rows-range-window-functions/
There is no such thing as a "billing unit". The data this field
contained was also a confusing mixture of two types:
- For emails and letters, it was just "notifications_sent".
- For SMS, it was the "chargeable_units" (billable * multiplier).
This replaces the single, ambiguous "billing_units" field with
"chargeable_units" and "notifications_sent" in both usage APIs.
Once Admin is using them we can remove the old field.
This makes it easier to extend each function with costs and free
allowances - especially for SMS.
I've chosen to duplicate the "WHERE" clause in each subquery vs.
the top-level query. This will make more sense in later commits
where we start adding free allowance calculations, which need to
be done on a yearly basis - knowledge the subqueries should have.
We want to query for service usage in the BST financial year:
2022-04-01T00:00:00+01:00 to 2023-03-31T23:59:59+01:00 =>
2022-04-01 to 2023-03-31 # bst_date
Previously we were only doing this explicitly for the monthly API
and it seemed like the yearly usage API was incorrectly querying:
2022-03-31T23:00:00+00:00 to 2023-03-30T23:00:00+00:00 =>
2022-03-31 to 2023-03-30 # "bst_date"
However, it turns out this isn't a problem for two reasons:
1. We've been lucky that none of our rates have changed since 2017,
which is long ago enough that no one would care.
2. There's a quirk somewhere in Sqlalchemy / Postgres that has been
compensating for the lack of explicit BST conversion.
To help ensure we do this consistently in future I've DRYed-up the
BST conversion into a new utility. I could have just hard-coded the
dates but it seemed strange to have the knowledge twice.
I've also adjusted the tests so they detect if we accidentally use
data from a different financial year. (2) is why none of the test
assertions actually need changing and users won't be affected.
Sqlalchemy / Postgres quirk
===========================
The following queries were run on the same data but results differ:
FactBilling.query.filter(FactBilling.bst_date >= datetime(2021,3,31,23,0), FactBilling.bst_date <= '2021-04-05').order_by(FactBilling.bst_date).first().bst_date
datetime.date(2021, 4, 1)
FactBilling.query.filter(FactBilling.bst_date >= '2021-03-31 23:00:00', FactBilling.bst_date <= '2021-04-05').order_by(FactBilling.bst_date).first().bst_date
datetime.date(2021, 3, 31)
Looking at the actual query for the first item above still suggests
the results should be the same, but for the use of "timestamp".
SELECT ...
FROM ft_billing
WHERE ft_billing.service_id = '16b60315-9dab-45d3-a609-e871fbbf5345'::uuid AND ft_billing.bst_date >= '2016-03-31T23:00:00'::timestamp AND ft_billing.bst_date <= '2017-03-31T22:59:59.999999'::timestamp AND ft_billing.notification_type IN ('email', 'letter') GROUP BY ft_billing.rate, ft_billing.notification_type UNION ALL SELECT sum(ft_billing.notifications_sent) AS notifications_sent, sum(ft_billing.billable_units * ft_billing.rate_multiplier) AS billable_units, ft_billing.rate AS ft_billing_rate, ft_billing.notification_type AS ft_billing_notification_type
FROM ft_billing
WHERE ft_billing.service_id = '16b60315-9dab-45d3-a609-e871fbbf5345'::uuid AND ft_billing.bst_date >= '2016-03-31T23:00:00'::timestamp AND ft_billing.bst_date <= '2017-03-31T22:59:59.999999'::timestamp AND ft_billing.notification_type = 'sms' GROUP BY ft_billing.rate, ft_billing.notification_type) AS anon_1 ORDER BY anon_1.notification_type, anon_1.rate
If we try some manual queries with and without '::timestamp' we get:
select distinct(bst_date) from ft_billing where bst_date >= '2022-04-20T23:00:00' order by bst_date desc;
bst_date
------------
2022-04-21
2022-04-20
select distinct(bst_date) from ft_billing where bst_date >= '2022-04-20T23:00:00'::timestamp order by bst_date desc;
bst_date
------------
2022-04-21
2022-04-20
It looks like this is happening because all client connections are
aware of the local timezone, and naive datetimes are interpreted as
being in UTC - not necessarily true, but saves us here!
The monthly API datetimes were pre-converted to dates, so none of
this was relevant for deciding exactly which date to use.