Changed the '/service/unique' endpoint to optionally accept the
service_id parameter. It now doesn't matter if a user tries to change
the capitalization or add punctuation to their own service name. But
there should still be an error if a user tries to change the punctuation
or capitalization of another service.
service_id needs to be allowed to be None until notifications-admin is
updated to always pass in the service_id.
Flask-SQLAlchemy sets up a connection pool with 5 connections and will
create up to 10 additional connections if all the pool ones are in use.
If all connections in the pool and all overflow connections are in
use, SQLAlchemy will block new DB sessions until a connection becomes
available. If a session can't acquire a connections for a specified
time (we set it to 30s) then a TimeoutError is raised.
By default db.session is deleted with the related context object
(so when the request is finished or app context is discarded).
This effectively limits the number of concurrent requests/tasks with
multithreaded gunicorn/celery workers to the maximum DB connection pool
size. Most of the time these limits are fine since the API requests are
relatively quick and are mainly interacting with the database anyway.
Service callbacks however have to make an HTTP request to a third party.
If these requests start taking a long time and the number of threads is
larger than the number of DB connections then remaining threads will
start blocking and potentially failing if it takes more than 30s to
acquire a connection. For example if a 100 threads start running tasks
that take 20s each with a max DB connection pool size of 10 then first 10
threads will acquire a connection right away, next 10 tasks will block for
20 seconds before the initial connections are released and all other tasks
will raise a TimeoutError after 30 seconds.
To avoid this, we perform all database operations at the beginning of
the task and then explicitly close the DB session before sending the
HTTP request to the service callback URL. Closing the session ends
the transaction and frees up the connection, making it available for
other tasks. Making calls to the DB after calling `close` will acquire
a new connection. This means that tasks are still limited to running
at most 15 queries at the same time, but can have a lot more concurrent
HTTP requests in progress.
Celery tasks require an active app context in order to use app logger,
database or app configuration values. Since there's no builtin support
we create this context by using a custom celery task class NotifyTask.
However, NotifyTask was using `current_app`, which itself is only
available within an app context so the code was pushing an initial
context in run_celery.py.
This works with the default prefork pool implementation, but raises
a "working outside of application context" error with an eventlet
pool since that initial application context is local to a thread
and eventlet celery worker pool will create multiple green threads.
To avoid this, we bind NotifyTask to the app variable with a closure
and use that variable instead of `current_app` to create the context
for executing the task. This avoids any issues caused by shared
initial app context being lost when spawning additional worker threads.
We still need to keep the context push in run_celery.py for prefork
workers since it's required for logging events outside of tasks
(eg logging during `worker_process_shutdown` signal processing).
Similar to https://github.com/alphagov/notifications-api/pull/1515
This lets the admin app pass in a domain to use for email auth links,
so that when it’s running on a different URL users who try to sign in
will get an email auth link for the domain they sign in on, not the
default admin domain for the environment in which the API is running.
- Changed the organisation DAO update method to only make 1 query
- Updated the update rest endpoint to not return an organisation when
the update is successful
- If the SMS client sends a status code that we do not recognize raise a ClientException and set the notification status to technical-failure
- Simplified the code in process_client_response, using a simple map.
they were not included in nightly task since that runs off
NotificationHistory, which doesn't include test keys. However, when you
load the page we top up the nightly stats with today's data from the
Notifications table, which *does* include test data.
notable things that have been kept until migration is complete:
* passing in `organisation` to update_service will update email branding
* both `/email-branding` and `/organisation` hit the same code
* service endpoints still return organisation as well as email branding
logging at info level and as such no longer prints out the celery task
timing which are found to be use to find out if a tasks has been called
but also the timing for the task. Added an extra timing message for
celery tasks so that it can be determined if the these are less frequent
than the API calls and provide more useful information
This PR is a proposal to reduce the average messages we see for a single notification from about 7 messages to 2.
Messaging would change to something like this:
February 2nd 2018, 15:39:05.885 Full delivery response from Firetext for notification: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
{'status': ['0'], 'reference': ['8eda51d5-cd82-4569-bfc9-d5570cdf2126'], 'time': ['2018-02-02 15:39:01'], 'code': ['000']}
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:57.727 SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to provider firetext at 2018-02-02 15:38:56.716814
February 2nd 2018, 15:38:56.727 Starting sending SMS 8eda51d5-cd82-4569-bfc9-d5570cdf2126 to provider at 2018-02-02 15:38:56.408181
February 2nd 2018, 15:38:56.727 Firetext request for 8eda51d5-cd82-4569-bfc9-d5570cdf2126 finished in 0.30376038211397827
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 sent to the priority-tasks queue for delivery
To somthing like this:
February 2nd 2018, 15:39:05.885 Firetext callback return status of 0 for reference: 8eda51d5-cd82-4569-bfc9-d5570cdf2126
February 2nd 2018, 15:38:49.449 sms 8eda51d5-cd82-4569-bfc9-d5570cdf2126 created at 2018-02-02 15:38:48.439113
The NOTIFY_ENVIRONMENT variable is set to `production` from the run_paas_app script, but that is overwritten with `live` in the create_app function when starting an application.
Although this is confusing and it would be good to resolve that. It is a larger piece of work. For now I have included booth strings in the if condition, that way when we do migrate the code we will not have an issue with these two methods.
if the international_billing_rates.yml has `dlr: null`, that means we
don't know what delivery receipts they provide - they might not provide
any. So if we do get an update, we don't know for sure that the message
was actually delivered - lets not update it.