Commit Graph

343 Commits

Author SHA1 Message Date
Ben Thorner
015152bab2 Add boilerplate for sending SMS via Reach
This works in conjunction with the new SMS provider stub [^1].

Local testing:

- Run the migrations to add Reach as an inactive provider.
- Activate the Reach provider locally and deactivate the others.

      update provider_details set priority = 100, active = false where notification_type = 'sms';
      update provider_details set active = true where identifier = 'reach';

- Tweak your local environment to point at the SMS stub.

      export REACH_URL="http://host.docker.internal:6300/reach"

- Start / restart Celery to pick up the config change.
- Send an SMS via the Admin app and see the stub log it.
- Reset your environment so you can send normal SMS.

      update provider_details set active = true where notification_type = 'sms';
      update provider_details set active = false where identifier = 'reach';

[^1]: https://github.com/alphagov/notifications-sms-provider-stub/pull/10
2022-03-30 13:38:46 +01:00
Ben Thorner
3fab7a0ca9 Fix letter functional tests to work in Docker
Currently "test_send_letter_notification_via_api" fails at the final
stage in create-fake-letter-response-file [^1]:

        requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6011): Max retries exceeded with url: /notifications/letter/dvla (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff95ffc460>: Failed to establish a new connection: [Errno 111] Connection refused'))

This only applies when running in Docker so the default should still
be "localhost" for the Flask app itself.

[^1]: 5093064533/app/celery/research_mode_tasks.py (L57)
2022-03-09 11:07:50 +00:00
Katie Smith
d53ef27b7f Make letters still sending check later
This changes the scheduled task to raise an alert if letters are still
sending from 1530 to 1700. DVLA have reported that our "monitoring is
executing just before we actually mark them as ‘despatched’ and send
you the feedback files." and asked us to make the check a little later.

We don't actually contact DVLA until the morning after the alert anyway,
so this won't affect the process of getting in touch with them.

This change will require Cronitor to be updated for the new time.
2022-02-28 11:50:55 +00:00
Ben Thorner
c9a9640a4b Iterate local development with Docker
This makes a few changes to:

- Make local development consistent with our other apps. It's now
faster to start Celery locally since we don't try to build the
image each time - this is usually quick, but unnecessary.

- Add support for connecting to a local Redis instance. Note that
the previous suggestion of "REDIS = True" was incorrect as this
would be turned into the literal string "True".

I've also co-located and extended the recipes in the Makefile to
make them a bit more visible.
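
The "REDIS = True" pitfall can be illustrated with a minimal sketch. The variable name `EXAMPLE_REDIS_ENABLED` and the helper `env_flag` are hypothetical and not taken from the app's config; the only point is that environment values always arrive as strings, so any check has to be written against a string.

```python
import os


def env_flag(name, env=os.environ, default=False):
    """Interpret an environment variable as a boolean (hypothetical helper)."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


# Simulated environment: the value is the literal string "True".
example_env = {"EXAMPLE_REDIS_ENABLED": "True"}
raw_value = example_env["EXAMPLE_REDIS_ENABLED"]

# A config that only accepts "1" would silently treat this as disabled.
enabled_if_compared_to_one = raw_value == "1"

# Parsing the string explicitly gives the intended result.
enabled_if_parsed = env_flag("EXAMPLE_REDIS_ENABLED", env=example_env)
```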
2022-02-24 17:15:41 +00:00
Ben Thorner
5206844a95 Merge pull request #3438 from alphagov/lower-query-timeout-180693991
Revert increased timeout for reporting worker
2022-02-16 13:38:29 +00:00
Leo Hemsted
1f3785a7a3 add script to run celery from within docker
as a team we primarily develop locally. However, we've been experiencing
issues with pycurl, a subdependency of celery, that is notoriously
difficult to install on mac. On top of the existing issues, we're also
seeing it conflict with pyproj in bizarre ways (where the order of
imports between pyproj and pycurl results in different configurations of
dynamically linked C libraries being loaded).

You are encouraged to attempt to install pycurl locally, following these
instructions: https://github.com/alphagov/notifications-manuals/wiki/Getting-Started#pycurl

However, if you aren't having any luck, you can instead now run celery
in a docker container.

`make run-celery-with-docker`

This will build a container, install the dependencies, and run celery
(with the default of four concurrent workers).

It will pull aws variables from your aws configuration as boto would
normally, and it will attempt to connect to your local database with the
user `postgres`. If your local database is configured differently (for
example, with a different user, or on a different port), then you can
set the SQLALCHEMY_DATABASE_URI locally to override that.
2022-02-01 16:29:08 +00:00
Ben Thorner
0d71ee69f0 Revert increased timeout for reporting worker
This reverts commit 603acc8b1e.
This reverts commit edad1c9a21.

The cause of the slowness was fixed in [1] and since [2] we now
have data to prove it: each query to get the data is taking under
5 minutes, so it's safe to lower the timeout again.

[1]: https://github.com/alphagov/notifications-api/pull/3417
[2]: https://github.com/alphagov/notifications-api/pull/3437
2022-01-25 12:50:43 +00:00
Ben Thorner
7ad0c4103a Stop killing reporting processes after each task
We think this setting was previously necessary to avoid a memory
leak [1], but it's unclear if this is still an issue:

- We've advanced two major versions of Celery.
- Some of the tasks are now quicker and leaner.

Restarting worker sub-processes after each task is a big problem
for performance, as we move towards parallelising our reporting.

This is something of a test to see if we can manage without this
setting. Note that we need to unset the variable manually:

   cf unset-env notify-delivery-worker-reporting CELERYD_MAX_TASKS_PER_CHILD

In the worst case we can always re-run any failed tasks. To check
the worker is still behaving as expected, we can:

- Monitor CPU / memory graphs for it.
- Check `cf events` for unexpected restarts / crashes.
- Compare numbers of task completion logs to previous days.
- Check the number of new billing / status rows looks right.

[1]: ad419f7592
2022-01-24 12:52:52 +00:00
Rebecca Law
603acc8b1e Increase the SQL timeout for the notify-delivery-worker-reporting app.
When running the nightly reporting tasks we are seeing that some tasks are failing because the query is timing out. We need to revisit how to optimise the query, but this will at least let the process finish.
2021-12-23 11:41:49 +00:00
Leo Hemsted
f6d210f1e6 put delete tasks on the reporting worker
they share a lot with the reporting tasks (creating ft_billing and
ft_notification_status), in that they're run nightly, take a long time,
and we see error messages if they get run multiple times (due to
visibility timeout).

The periodic app has two concurrent processes - previously there was
just one delete task, which would use one of those processes, while the
other process would pick up anything else on the queue (at that time of
night, the regular provider switch checks and scheduled job checks).
However, when we switched to running the three delete notification types
separately, we saw visibility timeout issues - three tasks would be
created, all three would be picked up by one celery instance, the two
worker processes would start on two of them, and the third would sit on
the box, wait longer than the visibility timeout to be picked up (and
acknowledged), and so SQS would assume the task was lost and replay it.

it's queues all the way down!

By putting them on the reporting worker we can take advantage of tuning
that app (for example setting the prefetch multiplier to one) which is
designed to run large tasks. We've also got more concurrent workers on
this box, so we can run all three tasks at once.
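
The visibility timeout problem described above can be sketched with a toy model. The durations, process counts and timeout are made-up numbers, and `start_times` / `replayed` are hypothetical helpers rather than anything from Celery or SQS. The model assumes one instance has fetched every task and that a task is only acknowledged when a worker process starts it.

```python
import heapq


def start_times(durations, processes):
    """Start time of each task when one instance holds them all and runs
    them in order on a fixed number of worker processes."""
    free_at = [0] * processes  # time at which each process becomes free
    heapq.heapify(free_at)
    starts = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available process
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


def replayed(durations, processes, visibility_timeout):
    """Indexes of tasks that wait longer than the timeout before starting,
    and so would be assumed lost and replayed by the queue."""
    return [
        index
        for index, start in enumerate(start_times(durations, processes))
        if start > visibility_timeout
    ]
```

With three 60-minute tasks, two processes and a 30-minute timeout, the third task is replayed; with three processes none are.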
2021-12-03 13:28:16 +00:00
Chris Hill-Scott
0236318189 Republish gov.uk/alerts every night to clear down planned tests
We have made it so that gov.uk/alerts shows a ‘1 planned test’ banner
for the whole of the day when there has been an operator test on that
day.

We need to remove the banner when the day is over.

The most straightforward way to do this is to republish the site at the
start of every day. The gov.uk/alerts code[1] will work out if there are
or aren’t any planned tests to show that day.

1. 5a274af6d0/app/models/alerts.py (L38-L44)
2021-11-15 14:23:32 +00:00
Katie Smith
3d4796c924 Add task to resanitise and replace a PDF for precompiled letter
This adds a task which is designed to be used if we want to recreate the
PDF for a precompiled letter (either one that has been created using the
API or one that has been uploaded through the website).

The task takes the `notification_id` of the letter and passes template
preview the details it needs in order to sanitise the original file and
then replace the version in the letters-pdf bucket with the freshly
sanitised version.
2021-11-10 09:51:31 +00:00
Richard Baker
e10f45b3a7 Cast Celery worker_max_tasks_per_child to int or None
We use this config option when running workers that process non-memory-safe tasks to restart the worker after n tasks.

Celery 5 requires this to be passed as an int or None.
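
A minimal sketch of the cast, assuming the value arrives via the `CELERYD_MAX_TASKS_PER_CHILD` environment variable; the function name is hypothetical and the real config may read the value differently.

```python
import os


def max_tasks_per_child(env=os.environ):
    """Return the limit as an int, or None when the variable is unset or empty."""
    raw = env.get("CELERYD_MAX_TASKS_PER_CHILD")
    return int(raw) if raw else None
```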

Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
2021-11-05 11:09:09 +00:00
Ben Thorner
d0550533a7 Remove redundant polling_interval setting
This appeared without explanation in [1], but it's the same as the
default value [2] so we don't need to specify it - doing so gives
the impression we made a decision, but that's not clear here.

[1]: https://github.com/alphagov/notifications-api/pull/2142/files#diff-84f1a9419471e289c6b6e2b0209b329e20df6cef81d1f7f0a193ddc2fc6ad69dR153
[2]: https://docs.celeryproject.org/en/stable/getting-started/backends-and-brokers/sqs.html#polling-interval
2021-11-01 09:54:07 +00:00
Ben Thorner
44b3b42aba Rewrite config to fix deprecation warnings
The new format was introduced in Celery 4 [1] and the old format is due
for removal in Celery 6 [2], hence the warnings e.g.

    [2021-10-26 14:31:57,588: WARNING/MainProcess] /Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/utils.py:206: CDeprecationWarning:
        The 'CELERY_TIMEZONE' setting is deprecated and scheduled for removal in
        version 6.0.0. Use the timezone instead

      alternative=f'Use the {_TO_NEW_KEY[setting]} instead')

This rewrites the config to match our other apps [3][4]. Some of the
settings have been removed entirely:

- "CELERY_ENABLE_UTC = True" - this has been enabled by default since
  Celery 3 [5].

- "CELERY_ACCEPT_CONTENT = ['json']", "CELERY_TASK_SERIALIZER = 'json'"
  - these are the default settings since Celery 4 [6][7].

Finally, this removes a redundant (and broken) bit of development config
- NOTIFICATION_QUEUE_PREFIX - that should be set in environment.sh [8].
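
The rename and the removals can be sketched as a small transformation over a plain dictionary. This is an illustration only: `rewrite_config` is a hypothetical helper, and the actual commit edits the config by hand rather than converting it at runtime.

```python
# Old uppercase names from this commit mapped to Celery's lowercase names.
OLD_TO_NEW = {
    "CELERY_TIMEZONE": "timezone",
    "CELERY_ENABLE_UTC": "enable_utc",
    "CELERY_ACCEPT_CONTENT": "accept_content",
    "CELERY_TASK_SERIALIZER": "task_serializer",
}

# Values that match Celery's defaults, so they can be dropped entirely.
REDUNDANT_DEFAULTS = {
    "enable_utc": True,
    "accept_content": ["json"],
    "task_serializer": "json",
}


def rewrite_config(old_config):
    """Rename old keys and drop settings that only restate a default."""
    new_config = {}
    for key, value in old_config.items():
        new_key = OLD_TO_NEW.get(key, key)
        if new_key in REDUNDANT_DEFAULTS and REDUNDANT_DEFAULTS[new_key] == value:
            continue
        new_config[new_key] = value
    return new_config
```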

[1]: https://docs.celeryproject.org/en/stable/history/whatsnew-4.0.html#lowercase-setting-names
[2]: https://docs.celeryproject.org/en/stable/history/whatsnew-5.0.html#step-2-update-your-configuration-with-the-new-setting-names
[3]: 252ad01d39/app/config.py (L27)
[4]: 03df0d9252/app/__init__.py (L33)
[5]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-enable_utc
[6]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-task_serializer
[7]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-accept_content
[8]: 2edbdec4ee/README.md (environmentsh)
2021-11-01 09:54:05 +00:00
Leo Hemsted
19394ab9dd construct celery queues once in the base config
previously, we were confusing things by appending to CELERY_QUEUES in
both dev and test configs - these are executed at import time, so the
list contained all queues twice, regardless of what config you're
actually using.

Fortunately, the -Q command that we supply the workers with overrides
this config option, so other environments weren't affected. Given that,
we can tidy up this code by just declaring it in the base config every
time
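
The import-time duplication can be reproduced in a few lines. The class and queue names here are hypothetical stand-ins for the real config classes; the behaviour shown is plain Python, since class bodies run when the module is imported and the list on the base class is shared.

```python
QUEUE_NAMES = ["example-a", "example-b"]  # hypothetical queue names


class BrokenConfig:
    QUEUES = []


class BrokenDevelopment(BrokenConfig):
    # Runs at import time and mutates the list shared by every subclass.
    for name in QUEUE_NAMES:
        BrokenConfig.QUEUES.append(name)


class BrokenTest(BrokenConfig):
    # Runs at import time too, so every queue is now listed twice.
    for name in QUEUE_NAMES:
        BrokenConfig.QUEUES.append(name)


class FixedConfig:
    # Declared once in the base config; subclasses simply inherit it.
    QUEUES = list(QUEUE_NAMES)


class FixedDevelopment(FixedConfig):
    pass


class FixedTest(FixedConfig):
    pass
```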
2021-11-01 09:54:04 +00:00
Katie Smith
04bfd6bfdb Trigger task to publish alerts when sending or cancelling alert
When we send or cancel a broadcast message, we now trigger a task
in govuk-alerts repo that polls our API for alerts and
publishes a fresh list of alerts.

Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>
2021-10-18 08:41:24 +01:00
Chris Hill-Scott
544bfbf569 Add separate config item for failed login count
It’s confusing that changing `MAX_VERIFY_CODE_COUNT` also limits the
number of failed login attempts that a user of text messages 2FA can
make.

This makes the parameters independent, and adds a test to make sure any
future changes which affect the limit of failed login attempts are
covered.
2021-10-04 10:45:07 +01:00
Chris Hill-Scott
786893d920 Reduce max concurrent 2 factor codes
I was doing some analysis and saw that in the last 24 hours the most
codes that anyone had in a 15 minute window was 3.

So I think we can safely reduce this to 5 to get a bit more security
with enough headroom to not have any negative impact to the user.
2021-10-04 10:45:06 +01:00
Ben Thorner
e1dec3f9b8 Switch to per-app secrets from internal APIs
Relates to: [1]

[1]: https://github.com/alphagov/notifications-credentials/pull/231
2021-08-05 17:24:56 +01:00
Ben Thorner
4b7ad89f6a Add pretend authenticated API for govuk-alerts
We can define the API properly in future work. I've used a separate
blueprint from "broadcasts" since this API is purely internal, and
it's helpful to make it clear it's specific to govuk-alerts.
2021-08-03 15:58:28 +01:00
Ben Thorner
3e32fc99b8 Rename ADMIN_CLIENT_USER_NAME to say CLIENT_ID
"user name" implies we're doing basic auth, which we're not. We
should use the standard terminology for bearer tokens.
2021-08-03 15:58:27 +01:00
Ben Thorner
49455d9890 Support granular API auth for internal apps
Previously we just had a single array of API keys / secrets, any of
which could be used to get past the "requires_admin_auth" check.
While multiple keys are necessary to allow for rotation, we should
avoid giving other apps access this way (too much privilege).

This converts the existing config vars into a new dictionary, keyed
by client_id. We can then use the dictionary to scope auth for new
API consumers like gov.uk/alerts to just the endpoints they need to
access, while maintaining existing access for the Admin app.

Once the new dictionary is available as a JSON environment variable,
we'll be able to remove the old credentials / config. In the next
commits, we'll look at more tests for the new functionality.
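
A rough sketch of the idea, under loud assumptions: the client IDs, secrets, endpoint groups and the `is_authorised` helper are all hypothetical, and the sketch compares secrets directly instead of modelling how the real API verifies its bearer tokens.

```python
import hmac

# Hypothetical config: client_id -> secrets (several allowed, for rotation).
INTERNAL_CLIENT_API_KEYS = {
    "example-admin": ["admin-secret-old", "admin-secret-new"],
    "example-govuk-alerts": ["alerts-secret"],
}

# Hypothetical scoping: which endpoint groups each client may call.
ALLOWED_ENDPOINT_GROUPS = {
    "example-admin": {"admin", "govuk-alerts"},
    "example-govuk-alerts": {"govuk-alerts"},
}


def is_authorised(client_id, secret, endpoint_group):
    """True if the secret belongs to the client and the client may use the group."""
    known_secrets = INTERNAL_CLIENT_API_KEYS.get(client_id, [])
    secret_ok = any(hmac.compare_digest(secret, known) for known in known_secrets)
    allowed = ALLOWED_ENDPOINT_GROUPS.get(client_id, set())
    return secret_ok and endpoint_group in allowed
```

Keying by client_id is what makes the scoping possible: with a flat list of secrets there is no way to tell which consumer is calling.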
2021-07-29 12:53:02 +01:00
Ben Thorner
5e9d8e5fa0 Auto expire old broadcast messages
Since the expiry is sent as part of the message payload, we don't
need to invoke the CBC proxies (and indeed there's no way to do so
for an expired alert). In future we plan to extend this task so it
triggers the regeneration of content on gov.uk/alerts.

It's worth noting that 'finishes_at' can theoretically be None, in
which case it's unclear when the alert should expire. While alerts
from the Admin app should always have an expiry [1], we have many
in the DB that don't, so it's worth checking for this scenario.
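
A minimal sketch of the check, assuming an alert with no 'finishes_at' is simply left alone; the function name and that handling of None are assumptions, not necessarily what the task does.

```python
from datetime import datetime, timezone


def should_expire(finishes_at, now):
    """True only when an expiry time exists and has already passed."""
    return finishes_at is not None and finishes_at <= now
```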

[1]: 078ac10c8d/app/models/broadcast_message.py (L255)
2021-07-21 13:05:11 +01:00
David McDonald
f194231d87 Make check-if-letters-still-in-created run at 7am
If this alert goes off in the morning, it usually means we need to do
something, ideally quite quickly as it indicates a potential problem
with the sending of letters over to DVLA the night before.

Given this goes off at 9am at the moment, but some people start work
earlier, alerting at 7am means it will likely be looked at earlier in
the day and we can potentially fix any problems with letters sooner
rather than later.
2021-04-27 11:26:18 +01:00
Rebecca Law
f3fdd3b09b Add international api key for firetext.
We want to start using Firetext for sending international SMS. They
require us to use a different API key for international SMS because it
requires a new code path to switch the sender ID to something that the
country will accept.
This PR does not include switching the sender of international SMS to
Firetext but sets us up to do so.
2021-04-20 13:58:55 +01:00
David McDonald
514afeb6f3 Set CBC_PROXY_ENABLED per environment, not dynamically
Previously we looked at whether an environment was given AWS access keys
to decide if the `CBC_PROXY_ENABLED` setting was true. Given that all
environments (apart from development) are currently hooked up to our AWS
cell broadcast accounts, it doesn't feel too useful to have a dynamic
switch when we can just hardcode it.

On top of that, this lays the groundwork for having `CBC_PROXY_ENABLED`
to be True even if an individual application doesn't have the CBC PROXY
aws access keys as in future only the broadcasts worker will have the
AWS keys but all the other apps will know that cell broadcasting is
indeed turned on for that environment.
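
The difference between the two approaches can be sketched as follows. The environment variable name, the class names and `dynamic_flag` are hypothetical, and the per-environment values are inferred from the description above rather than copied from the config.

```python
def dynamic_flag(env):
    # Old approach: infer the setting from the presence of AWS credentials.
    return bool(env.get("EXAMPLE_CBC_PROXY_AWS_ACCESS_KEY_ID"))


class Config:
    # New approach: state the setting explicitly per environment.
    CBC_PROXY_ENABLED = True


class Development(Config):
    CBC_PROXY_ENABLED = False


# An app without the AWS keys still knows broadcasting is on for its environment.
app_env_without_keys = {}
inferred = dynamic_flag(app_env_without_keys)
declared = Config.CBC_PROXY_ENABLED
```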
2021-04-09 11:56:00 +01:00
Katie Smith
c3d9aca43a Remove redundant comment
We no longer have a noop client
2021-04-09 11:54:32 +01:00
David McDonald
6d410daae4 Remove the emergency alerts canary
See https://github.com/alphagov/notifications-broadcasts-infra/pull/197
for why we no longer need this and we get to delete some code!
2021-03-26 18:31:53 +00:00
David McDonald
41d95378ea Remove everything for the performance platform
We will no longer send them any stats, so we don't need:
- the code to work out the nightly stats
- the performance platform client
- any configuration for the client
- any nightly tasks that kick off the sending of the stats

We will require a change in cronitor as we no longer will have this task
run meaning we need to delete the cronitor check.
2021-03-15 12:04:53 +00:00
David McDonald
8325431462 Move saving of processing time into separate task
We currently do this as part of send-daily-performance-platform-stats but
now this moves it into its own separate task. This is for two reasons
- we will shortly get rid of the send-daily-performance-platform-stats
  task as we no longer will need to send anything to performance
  platform
- even if we did decide to keep the task
  send-daily-performance-platform-stats and remove the specific bits
  that relate to the performance platform, it's probably nicer to
  rewrite the new task from scratch to make sure it's all clear and easy
  to understand
2021-03-15 11:44:01 +00:00
Ben Thorner
a91fde2fda Run auto-correct on app/ and tests/
2021-03-12 11:45:45 +00:00
Richard Baker
2e4ac1f09c Enable EE Cell Broadcasts in production environment
Removes the configuration override for Live, so the base configuration is
used, enabling cell broadcasting for all MNOs.

Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
2021-03-02 09:34:35 +00:00
Rebecca Law
3df334d099 Simplify config and add json loads
2021-02-26 12:19:03 +00:00
Rebecca Law
acfb759cb9 Change DVLA_EMAIL_ADDRESS to a list
2021-02-26 11:21:16 +00:00
Pea Tyczynska
f3e0cfc727 Pull DVLA address from credentials on staging
So that we can test this flow on staging.
2021-02-24 11:34:29 +00:00
Pea Tyczynska
5c22c926b0 Stub DVLA email for all envs except prod
In prod we will get it from Credentials. In other envs, we don't
really want to send real email.
2021-02-23 15:13:52 +00:00
Pea Tyczynska
e0c73ac342 Send daily email with letter and sheet volumes to DVLA
2021-02-23 15:13:19 +00:00
David McDonald
c03ad82227 Turn on o2 and three mnos in prod
Supporting infrastructure is ready for these two mnos
2021-02-23 13:56:47 +00:00
Katie Smith
c59e0091ee Stop emailing Notify when an MOU is signed
We've decided we don't get any value from these emails any more, so this
stops us (Notify support) receiving them. We still let teams know an MOU
has been signed.
2021-02-18 09:07:19 +00:00
Rebecca Law
77b76ea0a4 Rename variable, it's a better name now.
2021-02-17 13:15:29 +00:00
Rebecca Law
e77534fb17 Send text messages to international numbers from a number rather than "Notify"
Update `send_user_2fa_code` to send from number when recipient is international
Update `update_user_attribute` to send from number when recipient is international
2021-02-17 12:14:47 +00:00
David McDonald
75f8db19eb Merge pull request #3120 from alphagov/update-service-broadcast-settings
Update service broadcast settings
2021-02-16 14:50:18 +00:00
Leo Hemsted
bbab7437f4 flake8
2021-02-16 12:23:02 +00:00
David McDonald
9f4b82f074 Make service a member of the broadcast organisation
We will use this to easily identify all our broadcast services. There
could be other ways to deal with finding and seeing all broadcast
services but this is a good and easy way to start.
2021-02-16 10:31:06 +00:00
Leo Hemsted
3e82691818 enable cell broadcast on prod for vodafone only
nb: will need cbc aws key to be set in credentials before deploy
2021-02-15 17:38:50 +00:00
David McDonald
a1e539e785 Merge pull request #3132 from alphagov/created-letters-runbook
Improvements to our letter checking tasks
2021-02-12 16:30:42 +00:00
David McDonald
5526c89c34 Rename task and function for clarity
This doesn't just relate to precompiled letters, it's actually just
checking that there are not any letters still waiting for a virus check
that should not be. This change to the naming makes it more accurate
and therefore easy to understand
2021-02-10 15:23:53 +00:00
David McDonald
1b9d8252ec Rename task and function for clarity
This doesn't just relate to templated letters, it's actually just
checking that there are not any letters still in created that should not
be. This change to the naming makes it more accurate and therefore easy
to understand
2021-02-10 15:23:52 +00:00
Katie Smith
5eebcf6452 Put service callback retries on a different queue
At the moment, if a service callback fails, it will get put on the retry queue.
This causes a potential problem though:

If a service's callback server goes down, we may generate a lot of retries and
this may then put a lot of items on the retry queue. The retry queue is also
responsible for other important parts of Notify such as retrying message
delivery and we don't want a service's callback server going down to have an
impact on the rest of Notify.

Putting the retries on a different queue means that tasks get processed
faster than if they were put back on the same 'service-callbacks' queue.
2021-02-09 13:31:16 +00:00