notifications-api

mirror of https://github.com/GSA/notifications-api.git synced 2025-12-10 07:12:20 -05:00

Author	SHA1	Message	Date
Ben Thorner	015152bab2	Add boilerplate for sending SMS via Reach This works in conjunction with the new SMS provider stub [^1]. Local testing: - Run the migrations to add Reach as an inactive provider. - Activate the Reach provider locally and deactivate the others. update provider_details set priority = 100, active = false where notification_type = 'sms'; update provider_details set active = true where identifier = 'reach'; - Tweak your local environment to point at the SMS stub. export REACH_URL="http://host.docker.internal:6300/reach" - Start / restart Celery to pick up the config change. - Send a SMS via the Admin app and see the stub log it. - Reset your environment so you can send normal SMS. update provider_details set active = true where notification_type = 'sms'; update provider_details set active = false where identifier = 'reach'; [^1]: https://github.com/alphagov/notifications-sms-provider-stub/pull/10	2022-03-30 13:38:46 +01:00
Ben Thorner	3fab7a0ca9	Fix letter functional tests to work in Docker Currently "test_send_letter_notification_via_api" fails at the final stage in create-fake-letter-response-file [^1]: requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=6011): Max retries exceeded with url: /notifications/letter/dvla (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff95ffc460>: Failed to establish a new connection: [Errno 111] Connection refused')) This only applies when running in Docker so the default should still be "localhost" for the Flask app itself. [^1]: `5093064533/app/celery/research_mode_tasks.py (L57)`	2022-03-09 11:07:50 +00:00
Katie Smith	d53ef27b7f	Make letters still sending check later This changes the scheduled task to raise an alert if letters are still sending from 1530 to 1700. DVLA have reported that our "monitoring is executing just before we actually mark them as ‘despatched’ and send you the feedback files." and asked us to make the check a little later. We don't actually contact DVLA until the morning after the alert anyway, so this won't affect the process of getting in touch with them. This change will require Cronitor to be updated for the new time.	2022-02-28 11:50:55 +00:00
Ben Thorner	c9a9640a4b	Iterate local development with Docker This makes a few changes to: - Make local development consistent with our other apps. It's now faster to start Celery locally since we don't try to build the image each time - this is usually quick, but unnecessary. - Add support for connecting to a local Redis instance. Note that the previous suggestion of "REDIS = True" was incorrect as this would be turned into the literal string "True". I've also co-located and extended the recipes in the Makefile to make them a bit more visible.	2022-02-24 17:15:41 +00:00
Ben Thorner	5206844a95	Merge pull request #3438 from alphagov/lower-query-timeout-180693991 Revert increased timeout for reporting worker	2022-02-16 13:38:29 +00:00
Leo Hemsted	1f3785a7a3	add script to run celery from within docker as a team we primarily develop locally. However, we've been experiencing issues with pycurl, a subdependency of celery, that is notoriously difficult to install on mac. On top of the existing issues, we're also seeing it conflict with pyproj in bizarre ways (where the order of imports between pyproj and pycurl result in different configurations of dynamically linked C libraries being loaded). You are encouraged to attempt to install pycurl locally, following these instructions: https://github.com/alphagov/notifications-manuals/wiki/Getting-Started#pycurl However, if you aren't having any luck, you can instead now run celery in a docker container. `make run-celery-with-docker` This will build a container, install the dependencies, and run celery (with the default of four concurrent workers). It will pull aws variables from your aws configuration as boto would normally, and it will attempt to connect to your local database with the user `postgres`. If your local database is configured differently (for example, with a different user, or on a different port), then you can set the SQLALCHEMY_DATABASE_URI locally to override that.	2022-02-01 16:29:08 +00:00
Ben Thorner	0d71ee69f0	Revert increased timeout for reporting worker This reverts commit `603acc8b1e` + This reverts commit `edad1c9a21`. The cause of the slowness was fixed in [1] and since [2] we now have data to prove it: each query to get the data is taking under 5 minutes, so it's safe to lower the timeout again. [1]: https://github.com/alphagov/notifications-api/pull/3417 [2]: https://github.com/alphagov/notifications-api/pull/3437	2022-01-25 12:50:43 +00:00
Ben Thorner	7ad0c4103a	Stop killing reporting processes after each task Previously we thought this setting was necessary to avoid a memory leak [1], but it's unclear if this is still an issue: - We've advanced two major versions of Celery. - Some of the tasks are now quicker and leaner. Restarting worker sub-processes after each task is a big problem for performance, as we move towards parallelising our reporting. This is something of a test to see if we can manage without this setting. Note that we need to unset the variable manually: cf unset-env notify-delivery-worker-reporting CELERYD_MAX_TASKS_PER_CHILD In the worst case we can always re-run any failed tasks. To check the worker is still behaving as expected, we can: - Monitor CPU / memory graphs for it. - Check `cf events` for unexpected restarts / crashes. - Compare numbers of task completion logs to previous days. - Check the number of new billing / status rows looks right. [1]: `ad419f7592`	2022-01-24 12:52:52 +00:00
Rebecca Law	603acc8b1e	Increase the SQL timeout for the `notify-delivery-worker-reporting` app. When running the night reporting tasks we are seeing that some tasks are failing because the query is timing out. We need to revisit how to optimise the query but this will at least let the process finish.	2021-12-23 11:41:49 +00:00
Leo Hemsted	f6d210f1e6	put delete tasks on the reporting worker they share a lot with the reporting tasks (creating ft_billing and ft_notification_status), in that they're run nightly, take a long time, and we see error messages if they get run multiple times (due to visibility timeout). The periodic app has two concurrent processes - previously there was just one delete task, which would use one of those processes, while the other process would pick up anything else on the queue (at that time of night, the regular provider switch checks and scheduled job checks). However, when we switched to running the three delete notification types separately, we saw visibility timeout issues - three tasks would be created, all three would be picked up by one celery instance, the two worker processes would start on two of them, and the third would sit on the box, wait longer than the visibility timeout to be picked up (and acknowledged), and so SQS would assume the task was lost and replay it. it's queues all the way down! By putting them on the reporting worker we can take advantage of tuning that app (for example setting the prefetch multiplier to one) which is designed to run large tasks. We've also got more concurrent workers on this box, so we can run all three tasks at once.	2021-12-03 13:28:16 +00:00
Chris Hill-Scott	0236318189	Republish gov.uk/alerts every night to clear down planned tests We have made it so that gov.uk/alerts shows a ‘1 planned test’ banner for the whole of the day when there has been an operator test on that day. We need to remove the banner when the day is over. The most straightforward way to do this is to republish the site at the start of every day. The gov.uk/alerts code[1] will work out if there are or aren’t any planned tests to show that day. 1. `5a274af6d0/app/models/alerts.py (L38-L44)`	2021-11-15 14:23:32 +00:00
Katie Smith	3d4796c924	Add task to resanitise and replace a PDF for precompiled letter This adds a task which is designed to be used if we want to recreate the PDF for a precompiled letter (either one that has been created using the API or one that has been uploaded through the website). The task takes the `notification_id` of the letter and passes template preview the details it needs in order to sanitise the original file and then replace the version in the letters-pdf bucket with the freshly sanitised version.	2021-11-10 09:51:31 +00:00
Richard Baker	e10f45b3a7	Cast Celery worker_max_tasks_per_child to int or None We use this config option when running workers that process non-memory-safe tasks to restart the worker after n tasks. Celery 5 requires this to be passed as an int or None. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-11-05 11:09:09 +00:00
Ben Thorner	d0550533a7	Remove redundant polling_interval setting This appeared without explanation in [1], but it's the same as the default value [2] so we don't need to specify it - doing so gives the impression we made a decision, but that's not clear here. [1]: https://github.com/alphagov/notifications-api/pull/2142/files#diff-84f1a9419471e289c6b6e2b0209b329e20df6cef81d1f7f0a193ddc2fc6ad69dR153 [2]: https://docs.celeryproject.org/en/stable/getting-started/backends-and-brokers/sqs.html#polling-interval	2021-11-01 09:54:07 +00:00
Ben Thorner	44b3b42aba	Rewrite config to fix deprecation warnings The new format was introduced in Celery 4 [1] and the old format is due for removal in Celery 6 [2], hence the warnings e.g. [2021-10-26 14:31:57,588: WARNING/MainProcess] /Users/benthorner/.pyenv/versions/notifications-api/lib/python3.6/site-packages/celery/app/utils.py:206: CDeprecationWarning: The 'CELERY_TIMEZONE' setting is deprecated and scheduled for removal in version 6.0.0. Use the timezone instead alternative=f'Use the {_TO_NEW_KEY[setting]} instead') This rewrites the config to match our other apps [3][4]. Some of the settings have been removed entirely: - "CELERY_ENABLE_UTC = True" - this has been enabled by default since Celery 3 [5]. - "CELERY_ACCEPT_CONTENT = ['json']", "CELERY_TASK_SERIALIZER = 'json'" - these are the default settings since Celery 4 [6][7]. Finally, this removes a redundant (and broken) bit of development config - NOTIFICATION_QUEUE_PREFIX - that should be set in environment.sh [8]. [1]: https://docs.celeryproject.org/en/stable/history/whatsnew-4.0.html#lowercase-setting-names [2]: https://docs.celeryproject.org/en/stable/history/whatsnew-5.0.html#step-2-update-your-configuration-with-the-new-setting-names [3]: `252ad01d39/app/config.py (L27)` [4]: `03df0d9252/app/__init__.py (L33)` [5]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-enable_utc [6]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-task_serializer [7]: https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-accept_content [8]: `2edbdec4ee/README.md (environmentsh)`	2021-11-01 09:54:05 +00:00
Leo Hemsted	19394ab9dd	construct celery queues once in the base config previously, we were confusing things by appending to CELERY_QUEUES in both dev and test configs - these are executed at import time, so the list contained all queues twice, regardless of what config you're actually using. Fortunately, the -Q command that we supply the workers with overrides this config option, so other environments weren't affected. Given that, we can tidy up this code by just declaring it in the base config every time	2021-11-01 09:54:04 +00:00
Katie Smith	04bfd6bfdb	Trigger task to publish alerts when sending or cancelling alert When we send or cancel a broadcast message, we now trigger a task in govuk-alerts repo that polls our API for alerts and publishes a fresh list of alerts. Co-authored-by: Pea Tyczynska <pea.tyczynska@digital.cabinet-office.gov.uk>	2021-10-18 08:41:24 +01:00
Chris Hill-Scott	544bfbf569	Add separate config item for failed login count It’s confusing that changing `MAX_VERIFY_CODE_COUNT` also limits the number of failed login attempts that a user of text messages 2FA can make. This makes the parameters independent, and adds a test to make sure any future changes which affect the limit of failed login attempts are covered.	2021-10-04 10:45:07 +01:00
Chris Hill-Scott	786893d920	Reduce max concurrent 2 factor codes I was doing some analysis and saw that in the last 24 hours the most codes that anyone had in a 15 minute window was 3. So I think we can safely reduce this to 5 to get a bit more security with enough headroom to not have any negative impact to the user.	2021-10-04 10:45:06 +01:00
Ben Thorner	e1dec3f9b8	Switch to per-app secrets from internal APIs Relates to: [1] [1]: https://github.com/alphagov/notifications-credentials/pull/231	2021-08-05 17:24:56 +01:00
Ben Thorner	4b7ad89f6a	Add pretend authenticated API for govuk-alerts We can define the API properly in future work. I've used a separate blueprint from "broadcasts" since this API is purely internal, and it's helpful to make it clear it's specific to govuk-alerts.	2021-08-03 15:58:28 +01:00
Ben Thorner	3e32fc99b8	Rename ADMIN_CLIENT_USER_NAME to say CLIENT_ID "user name" implies we're doing basic auth, which we're not. We should use the standard terminology for bearer tokens.	2021-08-03 15:58:27 +01:00
Ben Thorner	49455d9890	Support granular API auth for internal apps Previously we just had a single array of API keys / secrets, any of which could be used to get past the "requires_admin_auth" check. While multiple keys are necessary to allow for rotation, we should avoid giving other apps access this way (too much privilege). This converts the existing config vars into a new dictionary, keyed by client_id. We can then use the dictionary to scope auth for new API consumers like gov.uk/alerts to just the endpoints they need to access, while maintaining existing access for the Admin app. Once the new dictionary is available as a JSON environment variable, we'll be able to remove the old credentials / config. In the next commits, we'll look at more tests for the new functionality.	2021-07-29 12:53:02 +01:00
Ben Thorner	5e9d8e5fa0	Auto expire old broadcast messages Since the expiry is sent as part of the message payload, we don't need to invoke the CBC proxies (and indeed there's no way to do so for an expired alert). In future we plan to extend this task so it triggers the regeneration of content on gov.uk/alerts. It's worth noting that 'finishes_at' can theoretically be None, in which case it's unclear when the alert should expire. While alerts from the Admin app should always have an expiry [1], we have many in the DB that don't, so it's worth checking for this scenario. [1]: `078ac10c8d/app/models/broadcast_message.py (L255)`	2021-07-21 13:05:11 +01:00
David McDonald	f194231d87	Make check-if-letters-still-in-created run at 7am If this alert goes off in the morning, it usually means we need to do something, ideally quite quickly as it indicates a potential problem with the sending of letters over to DVLA the night before. Given this goes off at 9am at the moment, but actually some people start work earlier, if we alert at 7am it means it will likely be looked at earlier in the day and we can potentially fix any problems with letters sooner than later.	2021-04-27 11:26:18 +01:00
Rebecca Law	f3fdd3b09b	Add international api key for firetext. We want to start using Firetext for sending international SMS. They require us to use a different API key for international SMS because it requires a new code path to switch the sender ID to something that the country will accept. This PR does not include switching the sender of international SMS to Firetext but sets us up to do so.	2021-04-20 13:58:55 +01:00
David McDonald	514afeb6f3	Set `CBC_PROXY_ENABLED` per environment, not dynamically Previously we looked at whether an environment was given AWS access keys to decide if the `CBC_PROXY_ENABLED` setting was true. Given that all environments (apart from development) are currently hooked up to our AWS cell broadcast accounts, it doesn't feel too useful to have a dynamic switch when we can just hardcode it. On top of that, this lays the groundwork for having `CBC_PROXY_ENABLED` to be True even if an individual application doesn't have the CBC PROXY aws access keys as in future only the broadcasts worker will have the AWS keys but all the other apps will know that cell broadcasting is indeed turned on for that environment.	2021-04-09 11:56:00 +01:00
Katie Smith	c3d9aca43a	Remove redundant comment We no longer have a noop client	2021-04-09 11:54:32 +01:00
David McDonald	6d410daae4	Remove the emergency alerts canary See https://github.com/alphagov/notifications-broadcasts-infra/pull/197 for why we no longer need this and we get to delete some code!	2021-03-26 18:31:53 +00:00
David McDonald	41d95378ea	Remove everything for the performance platform We no longer will send them any stats so therefore don't need the code - the code to work out the nightly stats - the performance platform client - any configuration for the client - any nightly tasks that kick off the sending of the stats We will require a change in cronitor as we no longer will have this task run meaning we need to delete the cronitor check.	2021-03-15 12:04:53 +00:00
David McDonald	8325431462	Move saving of processing time into separate task We currently do this as part of send-daily-performance-platform-stats but now this moves it into its own separate task. This is for two reasons - we will shortly get rid of the send-daily-performance-platform-stats task as we no longer will need to send anything to performance platform - even if we did decide to keep the task send-daily-performance-platform-stats and remove the specific bits that relate to the performance platform, it's probably nicer to rewrite the new task from scratch to make sure it's all clear and easy to understand	2021-03-15 11:44:01 +00:00
Ben Thorner	a91fde2fda	Run auto-correct on app/ and tests/	2021-03-12 11:45:45 +00:00
Richard Baker	2e4ac1f09c	Enable EE Cell Broadcasts in production environment Removes the configuration override for Live, so the base configuration is used, enabling cell broadcasting for all MNOs. Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>	2021-03-02 09:34:35 +00:00
Rebecca Law	3df334d099	Simplify config and add json loads	2021-02-26 12:19:03 +00:00
Rebecca Law	acfb759cb9	Change DVLA_EMAIL_ADDRESS to a list	2021-02-26 11:21:16 +00:00
Pea Tyczynska	f3e0cfc727	Pull DVLA address from credentials on staging So that we can test this flow on staging.	2021-02-24 11:34:29 +00:00
Pea Tyczynska	5c22c926b0	Stub DVLA email for all envs except prod In prod we will get it from Credentials. In other envs, we don't really want to send real email.	2021-02-23 15:13:52 +00:00
Pea Tyczynska	e0c73ac342	Send daily email with letter and sheet volumes to DVLA	2021-02-23 15:13:19 +00:00
David McDonald	c03ad82227	Turn on o2 and three mnos in prod Supporting infrastructure is ready for these two mnos	2021-02-23 13:56:47 +00:00
Katie Smith	c59e0091ee	Stop emailing Notify when an MOU is signed We've decided we don't get any value from these emails any more, so this stops us (Notify support) receiving them. We still let teams know an MOU has been signed.	2021-02-18 09:07:19 +00:00
Rebecca Law	77b76ea0a4	Rename variable, it's a better name now.	2021-02-17 13:15:29 +00:00
Rebecca Law	e77534fb17	Send text messages that are to an international number from a number rather than "Notify" Update `send_user_2fa_code` to send from number when recipient is international Update `update_user_attribute` to send from number when recipient is international	2021-02-17 12:14:47 +00:00
David McDonald	75f8db19eb	Merge pull request #3120 from alphagov/update-service-broadcast-settings Update service broadcast settings	2021-02-16 14:50:18 +00:00
Leo Hemsted	bbab7437f4	flake8	2021-02-16 12:23:02 +00:00
David McDonald	9f4b82f074	Make service a member of the broadcast organisation We will use this to easily identify all our broadcast services. There could be other ways to deal with finding and seeing all broadcast services but this is a good and easy way to start.	2021-02-16 10:31:06 +00:00
Leo Hemsted	3e82691818	enable cell broadcast on prod for vodafone only nb: will need cbc aws key to be set in credentials before deploy	2021-02-15 17:38:50 +00:00
David McDonald	a1e539e785	Merge pull request #3132 from alphagov/created-letters-runbook Improvements to our letter checking tasks	2021-02-12 16:30:42 +00:00
David McDonald	5526c89c34	Rename task and function for clarity This doesn't just relate to precompiled letters, it's actually just checking that there are not any letters still waiting for a virus check that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:53 +00:00
David McDonald	1b9d8252ec	Rename task and function for clarity This doesn't just relate to templated letters, it's actually just checking that there are not any letters still in created that should not be. This change to the naming makes it more accurate and therefore easy to understand	2021-02-10 15:23:52 +00:00
Katie Smith	5eebcf6452	Put service callback retries on a different queue At the moment, if a service callback fails, it will get put on the retry queue. This causes a potential problem though: If a service's callback server goes down, we may generate a lot of retries and this may then put a lot of items on the retry queue. The retry queue is also responsible for other important parts of Notify such as retrying message delivery and we don't want a service's callback server going down to have an impact on the rest of Notify. Putting the retries on a different queue means that tasks get processed faster than if they were put back on the same 'service-callbacks' queue.	2021-02-09 13:31:16 +00:00
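
Commit e10f45b3a7 above says Celery 5 requires `worker_max_tasks_per_child` to be an int or None, and commit 7ad0c4103a names the `CELERYD_MAX_TASKS_PER_CHILD` environment variable. A minimal sketch of such a cast follows. The helper name `max_tasks_per_child` and the treatment of an empty string as "unset" are assumptions for illustration, not the repository's actual code.

```python
import os
from typing import Mapping, Optional


def max_tasks_per_child(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return the per-child task limit as an int, or None when it is not set.

    Hypothetical helper: environment values are always strings, so the raw
    value has to be converted before it is handed to Celery.
    """
    raw = env.get("CELERYD_MAX_TASKS_PER_CHILD")
    # Assumption: an unset or blank variable means "no limit".
    if raw is None or raw.strip() == "":
        return None
    # int() raises ValueError on non-numeric input, which surfaces bad config early.
    return int(raw)


# Illustrative config fragment using the lowercase setting name from the commit.
CELERY = {"worker_max_tasks_per_child": max_tasks_per_child()}
```

With the variable unset (as after the `cf unset-env` step in 7ad0c4103a), the setting resolves to None and worker sub-processes are not recycled after a fixed number of tasks.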


343 Commits