The limit means we only return 50k letters; if a service has more than that, we'll skip the rest and they won't be picked up until the next day. If you remove the limit, SQLAlchemy prefetches the full query result so it can build up ORM objects, for example collapsing joined rows into single objects with children. SQLAlchemy streams the data into a buffer, but will normally still prefetch the entire result set so it can ensure the integrity of the session (so that, for example, if you modify one result that is duplicated further down in the results, both rows are updated in the session).

We don't care about that, but we do care about stopping the result set from taking up too much memory. We can use `yield_per` to yield rows from SQLAlchemy to the iterator (in this case the `for letter in letters_awaiting_sending` loop in letters_pdf_tasks.py) - every time we exhaust 10,000 rows, we go back to the database for the next 10k. This way, we only ever need 10k rows in memory at a time (see the sketch at the end of this note).

This has some caveats, mostly around how we handle the data the query returns. They're a bit hard to parse, but I'm pretty sure the notable limitations are:

* It's dangerous to modify ORM objects returned by a `yield_per` query.
* It's dangerous to join in a `yield_per` query if there could be more than one result row per item (for example, if you join from notification to service, there'll be multiple result rows containing the same service, and if these are split across different yield chunks, we may experience undefined behaviour).

Both limitations come down to there being no guarantee of one unique row per item.

For more reading:

https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
https://www.mail-archive.com/sqlalchemy@googlegroups.com/msg12443.html
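As an illustration, here's a minimal sketch of the pattern, assuming a Flask-SQLAlchemy session and a `Notification` model; the DAO function name, filter columns, and `print_run_deadline` parameter are assumptions for this example, not necessarily the app's actual code.

```python
# Minimal sketch of the yield_per pattern described above. The function
# name, filter columns, and deadline parameter are illustrative
# assumptions, not the app's real schema.
from app import db
from app.models import Notification


def dao_get_letters_awaiting_sending(print_run_deadline):
    return (
        db.session.query(Notification)
        .filter(
            Notification.notification_type == "letter",
            Notification.status == "created",
            Notification.created_at < print_run_deadline,
        )
        .order_by(Notification.created_at)
        # Materialise 10,000 ORM objects per database round trip instead
        # of prefetching the whole result set. Per the caveats above:
        # don't modify the yielded objects, and don't add joins that could
        # spread duplicate parent rows across chunks.
        .yield_per(10000)
    )


# Consumed lazily in letters_pdf_tasks.py, so at most one 10k chunk of
# Notification objects is held in memory at a time:
#
#     letters_awaiting_sending = dao_get_letters_awaiting_sending(deadline)
#     for letter in letters_awaiting_sending:
#         process(letter)
```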