Commit Graph

7928 Commits

Author SHA1 Message Date
Rebecca Law
dda7f0d47f Revert "Improve sender task" 2021-02-16 10:19:53 +00:00
Katie Smith
8e91eccc94 Merge pull request #3140 from alphagov/fix-flake8
Fix flake8
2021-02-16 09:28:23 +00:00
Rebecca Law
681ad6db56 Merge pull request #3134 from alphagov/improve-sender-task
Improve sender task
2021-02-16 09:26:04 +00:00
Katie Smith
6b8ebb3421 Fix linting errors 2021-02-16 09:03:38 +00:00
Katie Smith
64ed50da17 Fix flake8 config
Having the `select` meant that flake8 was only checking the `B901` rule
and ignoring all others.
2021-02-16 09:02:45 +00:00
Rebecca Law
d67f9fcfd6 Use from_id_and_service_id method from SerialisedTemplate.
Minor updates as per requested from review
2021-02-15 12:41:50 +00:00
Leo Hemsted
fed0d4c40e Merge pull request #3137 from alphagov/revert-revert-revert
Bring back retry logic
2021-02-15 12:21:13 +00:00
Chris Hill-Scott
c83ffbe9b3 Merge pull request #3138 from alphagov/no-cancel-or-update-broadcast-api
Don’t accept cancel or update via broadcast API
2021-02-15 11:19:38 +00:00
Chris Hill-Scott
e8a79f5413 Don’t accept cancel or update via broadcast API
We don’t support these methods at the moment. Instead we were just
ignoring the `msgType` field, so issuing one of these commands would
cause a new alert to be broadcast 🙃

We might want to support `Cancel` in the future, but for now let’s
reject anything that isn’t `Alert` (CAP terminology for the initial
broadcast).
2021-02-15 09:32:33 +00:00
Leo Hemsted
62cf9f60a9 catch boto exceptions
these will happen if, for example, you have issues connecting to AWS or
permission issues.

Still failover if we get one of these exceptions, as I think it might be
possible to have a problem only related to one of the lambdas.
2021-02-12 19:48:32 +00:00
Leo Hemsted
baa0d3e5d5 allow self approval on development 2021-02-12 17:02:16 +00:00
David McDonald
a1e539e785 Merge pull request #3132 from alphagov/created-letters-runbook
Improvements to our letter checking tasks
2021-02-12 16:30:42 +00:00
Katie Smith
9b940637a5 Merge pull request #3135 from alphagov/cbc-proxy-refactor
Refactor the CBC proxy classes
2021-02-12 09:58:39 +00:00
Katie Smith
bc4a2005d4 Refactor the CBC proxy classes
By creating a new `CBCProxyOne2ManyClient` class for the three One2Many
clients to inherit from. These three clients are the same apart from the
`lambda_name` and the `failover_lambda_name`.
2021-02-11 16:56:15 +00:00
Rebecca Law
61af203ad6 User cache for service in send_to_provider methods.
This will remove a call to the db if the service exists in the cache.
2021-02-11 16:45:52 +00:00
Rebecca Law
200f8aad81 Use the cached template object.
By adding SerialisedTemplate we can avoid a database call for the template. This is useful when sending many many emails/sms for the same template/version.
2021-02-11 16:45:52 +00:00
Rebecca Law
2270832873 Remove validate_and_format for email and phone numbers, using normalised_to instead because that function has already happened when persisting the notificaiton.
Remove 2 extra select queries after the update and commit. Once a transaction is committed SQLAlchemy will query for the db model if referenced after a commit.
2021-02-11 16:45:46 +00:00
David McDonald
8f99da525d Merge pull request #3133 from alphagov/bump-migration-rev
Bump the migration revision ID by 1
2021-02-10 17:52:47 +00:00
David McDonald
35ffb048e5 Bump the migration revision ID by 1
Concourse had passed the tests before someone elses migration had beaten
me to getting merged. So I merged and got an error.

This should fix it
2021-02-10 17:40:56 +00:00
David McDonald
5f58f6c698 Merge pull request #3128 from alphagov/move-provider-restriction
Move provider restriction into broadcast service settings table
2021-02-10 17:32:11 +00:00
Katie Smith
4be3af5410 Merge pull request #3119 from alphagov/new-callbacks-q
Put service callback retries on a different queue
2021-02-10 15:56:46 +00:00
David McDonald
5526c89c34 Rename task and function for clarity
This doesn't just relate to precompiled letters, it's actually just
checking that there are not any letters still waiting for a virus check
that should not be. This change to the naming makes it more accurate
and therefore easy to understand
2021-02-10 15:23:53 +00:00
David McDonald
1b9d8252ec Rename task and function for clarity
This doesn't just relate to templated letters, it's actually just
checking that there are not any letters still in created that should not
be. This change to the naming makes it more accurate and therefore easy
to understand
2021-02-10 15:23:52 +00:00
David McDonald
3c0e609cc9 Add link to runbook for created letter alert
We've got the entry in the runbook, this will make it clear to go and
look at it.
2021-02-10 15:23:51 +00:00
Pea Tyczynska
df00e16100 Merge pull request #3127 from alphagov/stubbed-not-nullable
Make stubbed column on broadcast_message non-nullable
2021-02-10 14:02:17 +00:00
Rebecca Law
13fb92c92b Merge pull request #3130 from alphagov/update-utils-with-address-validation
Update notifications-utils version.
2021-02-10 10:49:30 +00:00
Rebecca Law
87cf3afdc9 Update notifications-utils version.
Postal address validation now includes `< >` in the invalid characters allowed at the start of an address line.
2021-02-10 10:26:00 +00:00
Leo Hemsted
4f89be6944 Revert "Merge pull request #3125 from alphagov/revert-retry"
This reverts commit 6b9a50beff, reversing
changes made to 33f93dfea2.
2021-02-09 17:01:04 +00:00
Pea Tyczynska
6e1c5a70c0 Make stubbed column on broadcast_message non-nullable
But first set any null values there to False.
2021-02-09 16:33:11 +00:00
David McDonald
b2213dad19 Move provider restriction into broadcast settings
This means we will have a much easier way of knowing what the settings
are for a broadcast service.

Note, we can just move data directly into the newer table as there is
nothing on the API or admin app that is putting data in the
`service_broadcast_provider_restriction` table, this was being done
manually for the few services that needed it.
2021-02-09 15:40:32 +00:00
Pea Tyczynska
c2191d3a0e Merge pull request #3129 from alphagov/v2-set-stubbed
Set broadcast message to stubbed when posting broadcast via API
2021-02-09 15:37:01 +00:00
Katie Smith
5eebcf6452 Put service callback retries on a different queue
At the moment, if a service callback fails, it will get put on the retry queue.
This causes a potential problem though:

If a service's callback server goes down, we may generate a lot of retries and
this may then put a lot of items on the retry queue. The retry queue is also
responsible for other important parts of Notify such as retrying message
delivery and we don't want a service's callback server going down to have an
impact on the rest of Notify.

Putting the retries on a different queue means that tasks get processed
faster than if they were put back on the same 'service-callbacks' queue.
2021-02-09 13:31:16 +00:00
Pea Tyczynska
3037bf5fff Set broadcast message to stubbed when posting broadcast via API 2021-02-09 10:41:36 +00:00
Pea Tyczynska
e0ddb5a39e Merge pull request #3126 from alphagov/fix-cryptography-build-problem
Pin cryptography to a version < 3.4
2021-02-08 17:25:44 +00:00
Pea Tyczynska
7cc8371c7f Pin cryptography to a version < 3.4
One of our dependencies has a dependency on cryptography, which has
recently released version 3.4.

This version introduced a circular import error
(pyca/cryptography#5756) which was fixed in
3.4.1.

However, 3.4.1 has a different error where it fails because it cannot
find a rust compiler.

The suggested
solutions are:

Install a newer version of pip which will install a pre-compiled
cryptography wheel OR
Have rust installed and available on our PATH so that it can be used
to build the package.
Since we can't change the buildpack's pip version and we cannot install
rust ourselves, the only we're left with is to avoid upgrading to 3.4 -
at least until PaaS updates their python buildpacks.
2021-02-08 17:05:46 +00:00
Pea Tyczynska
f8b4c9151c Merge pull request #3122 from alphagov/add-billing-details-orgs
Add billing details for organisation
2021-02-08 16:43:08 +00:00
Leo Hemsted
6b9a50beff Merge pull request #3125 from alphagov/revert-retry
Revert cell broadcast retry logic
2021-02-08 11:16:48 +00:00
Leo Hemsted
bee0059e53 Revert "Merge pull request #3101 from alphagov/retry-broadcasts"
This reverts commit 1bd99c779d, reversing
changes made to d390eb2cac.
2021-02-08 11:02:34 +00:00
Leo Hemsted
49e6ec1ead Revert "Merge pull request #3123 from alphagov/retry-loop-fix"
This reverts commit 541a765811, reversing
changes made to 6a9ac654a6.
2021-02-08 11:01:33 +00:00
Pea Tyczynska
bbc8cffb5b No need to alter the columns in services, they are already the right type 2021-02-08 10:45:28 +00:00
Chris Hill-Scott
33f93dfea2 Merge pull request #3124 from alphagov/handle-xml-with-declaration
Handle XML files that have a declaration
2021-02-08 10:29:11 +00:00
Chris Hill-Scott
dec16a98f6 Handle XML files that have a declaration
`lxml` wants its input in bytes:

> XML is explicitly defined as a stream of bytes. It's not Unicode text.
> […] rule number one: do not decode your XML data yourself.

– https://lxml.de/FAQ.html#why-can-t-lxml-parse-my-xml-from-unicode-strings

It will accept strings unless, unless the document contains a
declaration[1] with an `encoding` attribute. Then it will refuse to
parse the document and raises a `ValueError`[2].

We can get fix this by passing `lxml` the bytes from the request, rather
than the decoded text.

1. > XML documents may begin with an XML declaration that describes some
   > information about themselves. An example is
   > `<?xml version="1.0" encoding="UTF-8"?>`.
   – https://en.wikipedia.org/wiki/XML#XML_declaration
2. See an example of this exception being raised in production here:
   https://kibana.logit.io/s/9423a789-282c-4113-908d-0be3b1bc9d1d/app/kibana#/doc/logstash-*/logstash-2021.02.05/syslog?id=AXdzfZVz5ZSa5DKpJiYd&_g=()
2021-02-08 08:51:14 +00:00
Leo Hemsted
541a765811 Merge pull request #3123 from alphagov/retry-loop-fix
broadcast event retry loop fix
2021-02-05 14:56:45 +00:00
Pea Tyczynska
aa7bc3d9b4 Serialise org notes and billing details 2021-02-05 14:44:43 +00:00
Pea Tyczynska
02bc87c096 Add billing details and notes for organisation table 2021-02-05 14:44:42 +00:00
Leo Hemsted
d582e35471 dont try and send broadccast event if it's already in technical-failure
this gives us an option to manually set a status in the database and
avoid things being stuck in a retry loop forever
2021-02-05 12:52:37 +00:00
Leo Hemsted
0ddebc63a8 reduce broadcast retry delay to 4 mins and drop prefetch.
### The facts

* Celery grabs up to 10 tasks from an SQS queue by default
* Each broadcast task takes a couple of seconds to execute, or double
  that if it has to go to the failover proxy
* Broadcast tasks delay retry exponentially, up to 300 seconds.
* Tasks are acknowledged when celery starts executing them.
* If a task is not acknowledged before its visibility timeout of 310
  seconds, sqs assumes the celery app has died, and puts it back on the
  queue.

### The situation

A task stuck in a retry loop was reaching its visbility timeout, and as
such SQS was duplicating it. We're unsure of the exact cause of reaching
its visibility timeout, but there were two contributing factors: The
celery prefetch and the delay of 300 seconds. Essentially, celery grabs
the task, keeps an eye on it locally while waiting for the delay ETA to
come round, then gives the task to a worker to do. However, that worker
might already have up to ten tasks that it's grabbed from SQS. This
means the worker only has 10 seconds to get through all those tasks and
start working on the delayed task, before SQS moves the task back into
available.

(Note that the delay of 300 seconds is translated into a timestamp based
on the time you called self.retry and put the task back on the queue.
Whereas the visibility timeout starts ticking from the time that a
celery worker picked up the task.)

### The fix

#### Set the max retry delay for broadcast tasks to 240 seconds

Setting the max delay to 240 seconds means that instead of a 10 second
buffer before the visibility timeout is tripped, we've got a 70 second
buffer.

#### Set the prefetch limit to 1 for broadcast workers

This means that each worker will have up to 1 currently executing task,
and 1 task pending execution. If it has these, it won't grab any more
off the queue, so they can sit there without their visibility timeout
ticking up.

Setting a prefetch limit to 1 will result in more queries to SQS and a
lower throughput. This might be relevant in, eg, sending emails. But the
broadcast worker is not hyper-time critical.

https://docs.celeryproject.org/en/3.1/getting-started/brokers/sqs.html?highlight=acknowledge#caveats
https://docs.celeryproject.org/en/3.1/userguide/optimizing.html?highlight=prefetch#reserve-one-task-at-a-time
2021-02-05 12:49:51 +00:00
Leo Hemsted
6a9ac654a6 Merge pull request #3121 from alphagov/dont-update-finishes-at
dont update finishes_at when cancelling broadcast
2021-02-04 14:37:49 +00:00
Leo Hemsted
eff0119f5c dont update finishes at when cancelling broadcast
otherwise we run into issues where we dont issue the cancel as we say
"oh look the expiry time just passed, so we shouldnt send this message
as it's already been removed from the cbc".
2021-02-04 14:25:38 +00:00
Leo Hemsted
1bd99c779d Merge pull request #3101 from alphagov/retry-broadcasts
retry sending broadcasts
2021-02-04 12:01:06 +00:00