mirror of https://github.com/GSA/notifications-api.git synced 2026-02-28 05:50:27 -05:00


Leo Hemsted 3bc3ed88b3 use yield_per instead of limit

limit means we only return 50k letters, if there are more than that for
a service we'll skip them and they won't be picked up until the next
day.

If you remove the limit, SQLAlchemy prefetches query results so it can
build up ORM results, for example collapsing joined rows into single
objects with children. SQLAlchemy streams the data into a buffer, and
normally will still prefetch the entire result set so it can ensure
integrity of the session (so that if you modify one result that is
duplicated further down in the results, both rows are updated in the
session, for example). However, we don't care about that, but we do care
about preventing the result set taking up too much memory. We can use
`yield_per` to yield from sqlalchemy to the iterator (in this case the
`for letter in letters_awaiting_sending` loop in letters_pdf_tasks.py) -
this means every time we hit 10000 rows, we go back to the database to
get the next 10k. This way, we only ever need 10k rows in memory at a
time.
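
As a rough illustration of the batching idea described above, here is a
minimal sketch. It does not use SQLAlchemy: it swaps in the standard
library's `sqlite3` and `cursor.fetchmany`, which is an analogous
chunked-fetch technique rather than `Query.yield_per` itself. The
`letters` table, the `iter_in_batches` helper and the batch size of 10
are all hypothetical stand-ins.

```python
import sqlite3


def iter_in_batches(cursor, batch_size):
    """Yield rows one at a time, fetching at most batch_size rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield from rows


# Hypothetical stand-in table; the real query lives in the application code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO letters (status) VALUES (?)",
    [("created",) for _ in range(25)],
)

cursor = conn.execute("SELECT id, status FROM letters ORDER BY id")

# The consuming loop sees every row, but only one batch is held at a time.
processed = [row[0] for row in iter_in_batches(cursor, batch_size=10)]
```

Unlike a `limit`, nothing is skipped: the loop still visits all 25 rows,
it just never asks for more than 10 at once.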

This has some caveats, mostly around how we handle the data the query
returns. They're a bit hard to parse but I'm pretty sure the notable
limitations are:

* It's dangerous to modify ORM objects returned by yield_per queries
* It's dangerous to join in a yield_per query if you think there will be
  more than one row per item (for example, if you join from notification
  to service, there'll be multiple result rows containing the same
  service, and if these are split over different yield chunks, then we
  may experience undefined behaviour).

These two limitations are focused around there being no guarantee of
having one unique row per item.
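
To make the join caveat above concrete, here is a small sketch in plain
Python (no SQLAlchemy involved). The `joined_rows` data and the `chunked`
helper are hypothetical; they only mimic what happens when joined rows
for the same service land in different chunks.

```python
from itertools import islice


def chunked(rows, size):
    """Split an iterable of rows into lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical joined rows: (notification_id, service_id).
joined_rows = [
    (1, "service-a"),
    (2, "service-a"),
    (3, "service-a"),
    (4, "service-b"),
]

chunks = list(chunked(joined_rows, size=2))
services_per_chunk = [{service_id for _, service_id in chunk} for chunk in chunks]

# Services whose rows are split over both chunks: no single chunk sees
# all of their rows, which is the situation the caveat warns about.
split_services = services_per_chunk[0] & services_per_chunk[1]
```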

For more reading:
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
https://www.mail-archive.com/sqlalchemy@googlegroups.com/msg12443.html

2020-10-26 13:01:34 +00:00

app

use yield_per instead of limit

2020-10-26 13:01:34 +00:00

docker

clean up docker and makefile

2019-10-11 13:55:21 +01:00

migrations

Revert "Remove the upload letters permission"

2020-10-23 15:14:37 +01:00

paas-failwhale

add api-paas-failwhale

2020-05-12 16:04:18 +01:00

scripts

run migrations if app is down

2020-06-26 15:28:28 +01:00

test_csv_files

…

tests

use yield_per instead of limit

2020-10-26 13:01:34 +00:00

.cfignore

…

.flake8

…

.gitignore

bump utils

2019-08-02 12:41:03 +01:00

.pyup.yml

…

application.py

Revert "testing build failure"

2019-05-16 17:06:34 +01:00

deploy-exclude.lst

…

gunicorn_config.py

Add GDSMetrics package

2020-04-20 18:39:45 +01:00

LICENSE

…

Makefile

add api-paas-failwhale

2020-05-12 16:04:18 +01:00

manifest.yml.j2

manifest: add cbc proxy env vars

2020-10-20 16:59:50 +01:00

Procfile

…

pytest.ini

don't expire email sign in codes on use

2020-05-04 12:01:57 +01:00

README.md

Tidy up Readme

2020-01-07 10:26:07 +00:00

requirements_for_test.txt

add more friendly datetime validator to jsonschema

2020-07-09 14:19:58 +01:00

requirements-app.txt

Bump utils to 42.2.1

2020-09-29 12:34:44 +01:00

requirements.txt

Bump utils to 42.2.1

2020-09-29 12:34:44 +01:00

run_celery.py

Revert "testing build failure"

2019-05-16 17:06:34 +01:00

runtime.txt

clean up docker and makefile

2019-10-11 13:55:21 +01:00

setup.cfg

…

statsd_mapping.yml

Add statsd exporter metric mapping configuration file

2019-04-24 13:50:13 +01:00

README.md

GOV.UK Notify API

Contains:

the public-facing REST API for GOV.UK Notify, which teams can integrate with using our clients
an internal-only REST API built using Flask to manage services, users, templates, etc (this is what the admin app talks to)
asynchronous workers built using Celery to put things on queues and read them off to be processed, sent to providers, updated, etc

Setting Up

Python version

At the moment we run Python 3.6 in production. You will run into problems if you try to use Python 3.5 or older, or Python 3.7 or newer.

AWS credentials

To run the API you will need appropriate AWS credentials. You should receive these from whoever administers your AWS account. Make sure you've got both an access key id and a secret access key.

Your AWS credentials should be stored in a folder located at ~/.aws. Follow Amazon's instructions for storing them correctly.

Virtualenv

mkvirtualenv -p /usr/local/bin/python3 notifications-api

`environment.sh`

Creating the environment.sh file: replace [unique-to-environment] with something unique to the environment. Your AWS credentials should be set up for notify-tools (the development/CI AWS account).

Create a local environment.sh file containing the following:

echo "
export NOTIFY_ENVIRONMENT='development'

export MMG_API_KEY='MMG_API_KEY'
export FIRETEXT_API_KEY='FIRETEXT_ACTUAL_KEY'
export NOTIFICATION_QUEUE_PREFIX='YOUR_OWN_PREFIX'

export FLASK_APP=application.py
export FLASK_DEBUG=1
export WERKZEUG_DEBUG_PIN=off
"> environment.sh

NOTES:

Replace the placeholder key and prefix values as appropriate
The SECRET_KEY and DANGEROUS_SALT should match those in the notifications-admin app.
The unique prefix for the queue names prevents clashes with other people's queues in a shared Amazon environment and enables filtering by queue name in the SQS interface.
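
After sourcing environment.sh, a quick check along these lines can confirm the variables are visible to Python. This is only a sketch: the `missing_env_vars` helper is hypothetical and not part of this repo, and treating every name from the template above as required is an assumption.

```python
import os

# Names taken from the environment.sh template above; treating all of them
# as required is an assumption of this sketch.
EXPECTED_VARS = [
    "NOTIFY_ENVIRONMENT",
    "MMG_API_KEY",
    "FIRETEXT_API_KEY",
    "NOTIFICATION_QUEUE_PREFIX",
    "FLASK_APP",
]


def missing_env_vars(environ=None):
    """Return the expected variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in EXPECTED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```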

Postgres

Install Postgres.app. You will need admin on your machine to do this.

Choose the version with Additional Releases - you want 9.6. Once you run the app, open the sidebar, remove the default v11 server and create and initialise a v9.6 server.

Redis

To switch Redis on you'll need to install it locally. On OSX we've used brew for this. To use Redis caching you need to switch it on by changing the config for development:

    REDIS_ENABLED = True

To run the application

First, run scripts/bootstrap.sh to install dependencies and create the databases.

You need to run the api application and a local celery instance.

There are two run scripts for running all the necessary parts.

scripts/run_app.sh

scripts/run_celery.sh

Optionally you can also run this script to run the scheduled tasks:

scripts/run_celery_beat.sh

To test the application

First, ensure that scripts/bootstrap.sh has been run, as it creates the test database.

Then simply run

make test

That will run flake8 for code analysis and our unit test suite. If you wish to run our functional tests, instructions can be found in the notifications-functional-tests repository.

To update application dependencies

The requirements.txt file is generated from requirements-app.txt in order to pin versions of all nested dependencies. If requirements-app.txt has been changed (or we want to update the unpinned nested dependencies), requirements.txt should be regenerated with

make freeze-requirements

requirements.txt should be committed alongside requirements-app.txt changes.

To run one-off tasks

Tasks are run through the flask command - run flask --help for more information. There are two sections we need to care about: flask db contains alembic migration commands, and flask command contains all of our custom commands. For example, to purge all dynamically generated functional test data, do the following:

Locally

flask command purge_functional_test_data -u <functional tests user name prefix>

On the server

cf run-task notify-api "flask command purge_functional_test_data -u <functional tests user name prefix>"

All commands and command options have a --help command if you need more information.

To create a new worker app

You need to:

Create new entries for your app in manifest.yml.j2 and scripts/paas_app_wrapper.sh (example)
Update the jenkins deployment job in the notifications-aws repo (example)
Add the new worker's log group to the list of log groups that we get alerts about and ship to Kibana (example)
Optionally add it to the autoscaler (example)

Important:

Before pushing the deployment change on jenkins, read below about the first time deployment.

First time deployment of your new worker

Our deployment flow requires that the app is present in order to proceed with the deployment.

This means that the first deployment of your app must happen manually.

To do this:

Ensure your code is backwards compatible
From the root of this repo run CF_APP=<APP_NAME> make <cf-space> cf-push

Once this is done, you can push your deployment changes to jenkins to have your app deployed on every deployment.
