Commit Graph

82 Commits

Author SHA1 Message Date
Ben Thorner
d2784d0d8a Rename "parents" methods to "ancestors"
Resolves: https://github.com/alphagov/notifications-admin/pull/3980#discussion_r694002952

A grandparent is not a parent, so the return values of these methods
were misleading. This makes it clearer.
2021-08-23 16:50:18 +01:00
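A minimal sketch of why "ancestors" is the more accurate name, assuming a hypothetical `Area` class with a `parent` attribute (the real model's attributes are not shown in this log). The generator walks all the way up the hierarchy, so it yields grandparents as well as parents:

```python
class Area:
    """Hypothetical stand-in for an area in a hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def ancestors(self):
        # Walk upwards: parent first, then grandparent, and so on.
        area = self.parent
        while area is not None:
            yield area
            area = area.parent


county = Area("county")
district = Area("district", parent=county)
ward = Area("ward", parent=district)

ancestor_names = [area.name for area in ward.ancestors]
```

Because the loop stops before yielding `None`, callers have nothing to filter out.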
Ben Thorner
1923c5edb1 Remove redundant 'filter' and return value
'None' is the implicit return value. Since the filter was operating
on a yield that never yields 'None', it was redundant.
2021-08-23 16:35:38 +01:00
Chris Hill-Scott
b273037462 Use str.join to build query
This avoids the nasty slice operator to trim the trailing comma.
2021-08-06 13:28:41 +01:00
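A sketch of the `str.join` approach from b273037462. The table name and the `build_in_query` helper are hypothetical; the actual query is not shown in this log:

```python
def build_in_query(ids):
    # One "?" placeholder per id. Joining means there is no trailing
    # comma that would need slicing off afterwards.
    placeholders = ", ".join("?" for _ in ids)
    return f"SELECT * FROM areas WHERE id IN ({placeholders})"


query = build_in_query(["a", "b", "c"])
```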
Chris Hill-Scott
de364bba3c Make overlapping_areas a cached property
It’s quite expensive to calculate and there’s no guarantee we’ll only use it once.
2021-08-06 13:28:41 +01:00
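A sketch of the caching behaviour from de364bba3c, assuming the standard library's `functools.cached_property` (Python 3.8+). The codebase may use a different decorator, and `CustomArea` here is a hypothetical stand-in with a fake calculation:

```python
from functools import cached_property


class CustomArea:
    """Hypothetical stand-in, not the real class."""

    def __init__(self):
        self.calculations = 0

    @cached_property
    def overlapping_areas(self):
        # Pretend this is the expensive overlap calculation.
        self.calculations += 1
        return ["ward-1", "ward-2"]


area = CustomArea()
first = area.overlapping_areas
second = area.overlapping_areas  # served from the cache, no recalculation
```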
Chris Hill-Scott
5e1b96a3a7 Remove argument unpacking from get_areas
Making it only callable in one way is just less stuff to understand.
2021-08-06 13:28:40 +01:00
Chris Hill-Scott
775954da9d Avoid doing a single SQL query per overlapping area
To count phones in a custom polygon we need to work out the percentage
of overlap with each known area. This means we need to get each known
area from the database to compare it.

At the moment we do this by running:
- one SQLite query to get the details of all matching areas
- a loop, which performs one SQLite query *per area* to get the polygons

This commit reduces the number of SQLite queries to one, which uses a
`JOIN` to get both the details of the areas and their polygons.

This gives a speed increase of about 25% for a big area like
Lincolnshire.
2021-08-06 13:28:40 +01:00
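A sketch of the single-query approach from 775954da9d, using an in-memory SQLite database. The table and column names are assumptions made for illustration, not the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE areas (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE area_polygons (id TEXT PRIMARY KEY, polygons TEXT);
    INSERT INTO areas VALUES ('w1', 'Ward one');
    INSERT INTO areas VALUES ('w2', 'Ward two');
    INSERT INTO area_polygons VALUES ('w1', '[[0, 0], [0, 1], [1, 1]]');
    INSERT INTO area_polygons VALUES ('w2', '[[1, 1], [1, 2], [2, 2]]');
""")

ids = ["w1", "w2"]
placeholders = ", ".join("?" for _ in ids)

# One query returns the details and the polygons together, instead of
# one query for the details plus one more query per area.
rows = conn.execute(
    f"""
    SELECT areas.id, areas.name, area_polygons.polygons
    FROM areas
    JOIN area_polygons ON area_polygons.id = areas.id
    WHERE areas.id IN ({placeholders})
    ORDER BY areas.id
    """,
    ids,
).fetchall()
```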
Chris Hill-Scott
e7ec77c5bb Make calculating overlapping areas faster
By using the simplified polygons instead of the full-resolution ones
we:
- query less data from SQLite
- pass less data around
- give Shapely a less complicated shape to do its calculations on

This makes it faster to calculate how much of each electoral ward a
custom area overlaps.

For the two areas in our tests:

Place represented by custom area | Before | After
---------------------------------|--------|--------
Bristol                          | 0.07s  | 0.02s
Skye                             | 0.02s  | 0.01s
2021-08-06 13:28:40 +01:00
Ben Thorner
297ab3e5ae Rename demo area to match govuk-alerts
Relates to: https://github.com/alphagov/notifications-govuk-alerts/pull/152

I ran the "create-broadcast-areas-db.py" script to regenerate the
SQLite DB. Existing alerts with the old naming still appear correctly,
and since we don't (yet) store this text in the DB, there's nothing
more to update.
2021-08-02 15:34:55 +01:00
Chris Hill-Scott
a766324559 Make the max polygon point count a constant
And document it in context.
2021-07-06 17:00:51 +01:00
Chris Hill-Scott
e4ca78634d Bump utils to bring in new polygon simplification
We’ve changed our simplification a bit so:
- polygons have slightly more points (see https://github.com/alphagov/notifications-utils/pull/873)
- the individual points have less precision (see https://github.com/alphagov/notifications-utils/pull/872)

Overall this reduces the size of the data we’re storing from 74MB to
63MB, and should make any pages where we are rendering lots of
coordinates load a bit quicker.
2021-07-06 17:00:50 +01:00
Chris Hill-Scott
5a378fe51f Use CustomBroadcastArea to estimate phones in bleed area
Our current assumption is that the bleed area has the same population
density as the broadcast area.

This is particularly naïve when:
- the bleed area overlaps the sea – no-one lives in the sea
- the broadcast area is a village and the bleed area is the surrounding
  countryside
- the broadcast area is adjacent to a densely populated area like a city

We can be smarter about this now that we have a way of determining the
number of phones in an arbitrary area, based on the known areas that we
have population data about.

Calculating the population in an overlap is a slightly more intensive
calculation. So we only do it for areas which are small enough that
it doesn’t slow things down too much. For larger areas we still use the
more naïve algorithm.
2021-07-02 10:36:25 +01:00
Chris Hill-Scott
b47d04fbf6 Check that the simplification process hasn’t introduced bad data
This is a good bit of future proofing against unintended mistakes in the
simplification code.
2021-06-24 18:28:33 +01:00
Chris Hill-Scott
72cdad14d9 Run app/broadcast_areas/create-broadcast-areas-db.py 2021-06-24 18:28:33 +01:00
Chris Hill-Scott
779ac74fc7 Manually remove a coordinate from Bathavon South
This is the only way I can think to stop this shape self-intersecting
without drastically changing its area (i.e. filling the hole in the
donut).

This is the only area in our library which is a genuine donut and
presents this problem.
2021-06-24 18:28:21 +01:00
Chris Hill-Scott
62a2c524ab Fix invalid polygons while importing geographic data
Some of the polygons in our source data are invalid. An invalid polygon
is one that self-intersects, in other words has a point which causes
the boundary of the shape to cross itself.

This doesn’t cause an exception until we try to perform certain
operations on one of these polygons, like intersecting them with another
polygon. This is why we haven’t spotted that they are invalid until now.

This commit adds checks so that as we import the polygons we make sure
they are valid.

If they are not valid, we can automatically fix them by just looking at
the exterior boundary of the shape, and ignore any holes created by
self-intersection.
2021-06-24 18:10:50 +01:00
Ben Thorner
fba8d09875 Move broadcast model code into an explicit module
Previously this was hidden away in an anonymous __init__.py file.
I did think about splitting the models into individual files, like
we do with the top-level models for the app. Since the models are
only imported in one place - i.e. are all used together - it didn't
seem worth the hassle, so I've kept them in one file.
2021-06-10 15:05:38 +01:00
Chris Hill-Scott
c9611e1cf7 Add another area to the library of test polygons 2021-05-10 16:09:02 +01:00
sakisv
bfa8dfe95e Fix import order 2021-04-13 16:31:06 +03:00
Chris Hill-Scott
e7aad61220 Use pure Python Rtree library
The Python rtree library we are using to build RTrees has a dependency
on the C package libspatialindex. This package is not installed on PaaS,
so it’s hard for us to use it.

This commit changes the code to use a library called rtreelib instead.

rtreelib doesn’t have a built in way to serialise the index it builds,
so I’ve had to implement that using pickle.
2021-04-13 12:43:28 +01:00
Chris Hill-Scott
83c521915c Estimate number of phones in an arbitrary polygon
We want to know how many phones are in a user-supplied polygon, so we
can show the impact of a broadcast, in the same way that we do when
users pick areas from our library.

We already know how many phones are in each electoral ward. But there
are challenges with an arbitrary polygon:
- where it does overlap a ward, the overlap could be partial
- it could overlap more than one ward
- finding out which wards it overlaps by brute force (looping through
  all the wards and seeing which ones intersect with our polygon) would
  be way too slow to do in real time

Instead we can use a data structure called an R-tree[1] to build an
index which provides a much, much faster way of looking up which
polygons overlap another. We can build this tree in advance and save it
somewhere, which means there’s a lot of computation we don’t need to do
in real time.

The R-tree returns a set of objects (ward IDs) which we can go and look
up in our library of electoral wards. These wards will be the ones that
might have some overlap with our custom polygon.

Once we have this small set of wards which might overlap our polygon, we
can look at the size of the area of overlap (relative to the size of the
whole ward) and multiply that by the known count of phones in that ward
to get an approximation of the count of phones in the overlap area.
Summing these approximations gives an estimate for the whole area of the
custom polygon.

1. https://en.wikipedia.org/wiki/R-tree
2021-04-12 15:45:48 +01:00
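A simplified sketch of the area-weighted estimate described in 83c521915c. To stay self-contained it makes two loud assumptions: axis-aligned rectangles `(min_x, min_y, max_x, max_y)` stand in for real polygons, and a plain loop over all wards stands in for the R-tree lookup. All function names and figures are hypothetical:

```python
def area(box):
    min_x, min_y, max_x, max_y = box
    # A "negative" width or height means the box is empty.
    return max(0, max_x - min_x) * max(0, max_y - min_y)


def intersection(a, b):
    # The overlap of two rectangles is itself a (possibly empty) rectangle.
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def estimate_phones(custom_box, wards):
    total = 0.0
    for ward_box, count_of_phones in wards:
        overlap = area(intersection(custom_box, ward_box))
        # Share of the ward that is covered, times the phones in that ward.
        total += count_of_phones * overlap / area(ward_box)
    return total


wards = [
    ((0, 0, 10, 10), 1000),    # half covered by the custom area
    ((10, 0, 20, 10), 400),    # half covered by the custom area
    ((50, 50, 60, 60), 999),   # no overlap at all
]
estimate = estimate_phones((5, 0, 15, 10), wards)
```

The estimate assumes phones are spread evenly within each ward, which is the same approximation the commit message describes.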
Richard Baker
02600d76bd Create additional non-UK broadcast test polygons
This allows MNOs to test delivery to multiple non-adjacent cells without
risk of sending a broadcast on the public network. This will also support
testing of multiple polygon geometries in a single message.

Test polygons are all non-UK (northern Finland).

Signed-off-by: Richard Baker <richard.baker@digital.cabinet-office.gov.uk>
2021-03-31 10:00:39 +01:00
Chris Hill-Scott
fc75d60f65 Refactor BroadcastAreas to reuse common methods
This commit makes an abstract base class for broadcast areas, so that
methods and properties which are common between `BroadcastArea`s (those
which come from our library) and `CustomBroadcastArea`s (those supplied
via the API) can be shared.
2021-03-22 11:07:43 +00:00
Chris Hill-Scott
57aa994ce9 Add docstring 2021-03-19 15:47:18 +00:00
Chris Hill-Scott
a74db6eaa7 Handle areas which don’t have population data
If an area has a `count_of_phones` value of `0` it means we don’t have
data about the population.

This means we can’t do the maths to work out the estimated bleed. So we
should return the default amount of bleed of 1,500m instead, which is
something in between what we’d expect for a built up area and a rural
area.
2021-03-19 15:47:18 +00:00
Chris Hill-Scott
4367908269 Add limits to max/min bleed
This prevents us from giving unrealistically large or small bleed
estimates in case we have areas which are more dense or less dense than
the most/least dense areas we currently have.

Also means we don’t have to treat City of London as a special case.
2021-03-19 15:47:18 +00:00
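A sketch of how the limits and the missing-data default from the two commits above could fit together. The 1,500m default is stated in this log; the 500m and 5,000m limits are assumptions taken from the lowest and highest bleed figures quoted elsewhere in this log, and the function name is hypothetical:

```python
DEFAULT_BLEED_METRES = 1_500  # default quoted for areas without population data
MIN_BLEED_METRES = 500        # assumed lower limit
MAX_BLEED_METRES = 5_000      # assumed upper limit


def limited_bleed(raw_estimate_metres, count_of_phones):
    # A count of 0 means "no population data", so the maths can't be done.
    if count_of_phones == 0:
        return DEFAULT_BLEED_METRES
    # Otherwise keep the estimate within the allowed range.
    return max(MIN_BLEED_METRES, min(MAX_BLEED_METRES, raw_estimate_metres))
```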
Chris Hill-Scott
738ac1d818 Vary bleed amount based on population density
There are basically two kinds of 4G masts:

Frequency | Range        | Bandwidth
----------|--------------|----------------------------------
800MHz    | Long (5km)   | Low (can handle a bit of traffic)
1800MHz   | Short (500m) | High (can handle lots of traffic)

The 1800MHz masts are better in terms of how much traffic they can
handle and how fast a connection they provide. But because they have
quite short range, it’s only economical to install them in very built up
areas†.

In more rural areas the 800MHz masts are better because they cover a
wider area, and have enough bandwidth for the lower population density.

The net effect of this is that cell broadcasts in rural areas are likely
to bleed further, because the masts they are being broadcast from are
less precise.

We can use population density as a proxy for how likely it is to be
covered by 1800MHz masts, and therefore how much bleed we should expect.
So this commit varies the amount of bleed shown based on the population
density.

I came up with the formula based on 3 fixed points:
- The most remote areas (for example the Scottish Highlands) should have
  the highest average bleed, estimated at 5km
- A town, like Crewe, should have about the same bleed as we were
  estimating before (1.5km) – Pete D thinks this is about right based on
  his knowledge of the area around his office in Crewe
- The most built up areas, like London boroughs, could have as little as
  500m of bleed

Based on these three figures I came up with the following formula, which
roughly gives the right bleed distance (`b`) for each of their population
densities (`d`):
```
b = 5900 - (log10(d) × 1_250)
```

Plotted on a curve it looks like this:

This is based on averages – remember that the UI shows where is _likely_
to receive the alert, based on bleed, not where it’s _possible_ to
receive the alert.

Here’s what it looks like on the map:

---

†There are some additional subtleties which make this not strictly true:
- The 800MHz masts are also used in built up areas to fill in the gaps
  between the areas covered by the 1800MHz masts
- Switching between masts is inefficient, so if you’re moving fast
  through a built up area (for example on a train) your phone will only
  use the 800MHz masts so that you have to hand off from one mast to
  another less often
2021-03-18 09:37:23 +00:00
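The formula from 738ac1d818 can be sketched in Python as below. The function names are hypothetical, and the unit of population density is not stated in the commit message, so `density` is treated as a plain positive number. The inverse function is added here only as a way to check the three fixed points:

```python
import math


def bleed_metres(density):
    # b = 5900 - (log10(d) * 1250), with d > 0
    return 5900 - math.log10(density) * 1250


def density_for_bleed(bleed):
    # The same formula solved for d.
    return 10 ** ((5900 - bleed) / 1250)


fixed_points = {bleed: density_for_bleed(bleed) for bleed in (5000, 1500, 500)}
```

Under this formula the three fixed points (5km, 1.5km and 500m) correspond to densities of roughly 5, 3,300 and 21,000 respectively, and every tenfold increase in density reduces the bleed by 1,250m.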
David McDonald
3e80ba4734 Fix flake8 and isort errors
Note, isort now has default behaviour of searching recursively so we no
longer need the `-rc` flag
2021-03-08 18:48:56 +00:00
Chris Hill-Scott
f55a8bf4b8 Add library of test areas
This is a temporary addition so we can test out some functionality.
2021-02-19 11:35:51 +00:00
Chris Hill-Scott
769b85ff25 Replace polygons module with the one from utils
We moved it in https://github.com/alphagov/notifications-utils/pull/818/files
2021-02-12 14:52:53 +00:00
Chris Hill-Scott
60aa2d2b42 Display areas that aren’t in the library 2021-01-26 10:49:47 +00:00
Chris Hill-Scott
76f83f7d2a Merge pull request #3652 from alphagov/updated-bristol-boundaries
Update local authority district GeoJSON to bring in fixes for Bristol
2020-09-29 13:32:32 +01:00
Chris Hill-Scott
04e53c72b3 Update shapes to bring in fixes for Bristol
I emailed the Geography team at the ONS:

> Hi geography team,
>
> I work on GOV.UK Notify, which is a service run by Government Digital Service (part of the Cabinet Office). I was given your email address by [redacted] who’s been helping answer some of my questions on the cross-government Slack.
>
> We’re using some of the boundary datasets from the Open Geography Portal, and mostly they’ve been excellent.
>
> In the abstract, the problem we’re trying to solve is, given a point outside an area, what is the minimum distance to a point within that area. So, for example, if a crow was somewhere in Cardiff, what’s the shortest distance it would have to fly to reach somewhere in the Bristol local authority district?
>
> We’ve noticed some problems with the data that means our calculations would be wrong. We’ve noticed this around Torquay, Norwich and Bristol. Here are some screenshots of Bristol, from the generalised and full resolution boundaries:
>
> The artefacts I’ve highlighted are closer to Cardiff than any actual part of the land area of Bristol. They are either:
> - in the sea
> - land that’s part of North Somerset
>
> I suspect that this is being caused by the process of clipping the actual region of Bristol (which, unusually, extends into the water) to the mean high water line.
>
> I’ve worked around this by filtering out any polygons that are smaller than ~7,500m². It’s a bit hacky because parts of the Scilly Isles start disappearing. That’s not a problem for what I’m working on, but it would be nice to not need the hack.
>
> So my questions would be:
>
> - Is there a better way to remove these artefacts than filtering by area?
> - Is there a plan to remove these artefacts from the data in future releases?
>
> Thanks in advance,
> Chris

They emailed back to say:

> Hi Chris
>
> Thank you for your enquiry.
>
> We have completed the amendments to the LAD MAY 2020 BFC and BGC boundaries as mentioned so you should be able to download them from the portal now.
>
> Hope this helps.
>
> Kind regards
> [redacted]

This commit brings in the files they’ve updated. We still have to do
some filtering (but now at a higher resolution) because they haven’t
fixed Norwich yet. I’ll email them separately about that.
2020-09-25 12:24:23 +01:00
Chris Hill-Scott
e7169ad902 Add instructions for converting Shapefiles 2020-09-24 13:19:27 +01:00
Chris Hill-Scott
f50ef84c0d Suggest previously-used areas when adding new area
If you’re adding another area to your broadcast it’s likely to be close
to one of the areas you’ve already added.

But we make you start by choosing a library, then you have to find the
local authority again from the long list. This is clunky, and it
interrupts the task the user is trying to complete.

We thought about redirecting you somewhere deep into the hierarchy,
perhaps by sending you to either:
- the parent of the last area you’d chosen
- the common ancestor of all the areas you’d chosen

This approach would however mean you’d need a way to navigate back up
the hierarchy if we’d dropped you in the wrong place. And we don’t have
a pattern for that at the moment.

So instead this commit adds some ‘shortcuts’ to the choose library page,
giving you a choice of all the parents of the areas you’ve currently
selected. In most cases this will be one (unitary authority) or two
(county and district) choices, but it will scale to adding areas from
multiple different authorities.

It does mean an extra click compared to the redirect approach, but this
is still fewer, easier clicks compared to now.

This meant a couple of under-the-hood changes:
- making `BroadcastArea`s hashable so it’s possible to do
  `set([BroadcastArea(…), BroadcastArea(…), BroadcastArea(…)])`
- making `BroadcastArea`s aware of which library they live in, so we can
  link to the correct _Choose area_ page
2020-09-22 17:33:04 +01:00
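A sketch of what making areas hashable allows, as described in f50ef84c0d. The class below is a simplified stand-in rather than the real `BroadcastArea`, and hashing on the area's ID alone is an assumption:

```python
class BroadcastArea:
    """Simplified stand-in, not the real class."""

    def __init__(self, area_id, library_id):
        self.id = area_id
        self.library_id = library_id  # lets us link to the right library page

    def __eq__(self, other):
        return isinstance(other, BroadcastArea) and self.id == other.id

    def __hash__(self):
        # Must agree with __eq__: equal areas need equal hashes.
        return hash(self.id)


# Two selected districts share a county, so the county appears only once.
parents = set([
    BroadcastArea("county-1", "counties"),
    BroadcastArea("county-1", "counties"),
    BroadcastArea("unitary-2", "counties"),
])
```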
Chris Hill-Scott
dd8ce7d5bd Merge pull request #3631 from alphagov/delete-plot-areas
Delete plot-areas.py
2020-09-17 11:41:16 +01:00
Chris Hill-Scott
8a413bec91 Merge pull request #3617 from alphagov/population-estimates
Give estimates of the number of phones in a broadcast area
2020-09-17 11:41:00 +01:00
Chris Hill-Scott
76244d8c07 Handle areas with missing data
At the moment there are some areas which have:
- a `count_of_phones` value of `None`
- no sub-areas

This is wrong, but until we fix the data the phone counting code needs
to handle this.

This commit:
- adds the `or 0` in the right place (where it will catch these areas
  with missing data)
- adds a test which checks these areas, and compares them to other kinds
  of areas
2020-09-17 11:02:22 +01:00
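A sketch of where the `or 0` from 76244d8c07 might sit, assuming a hypothetical `Area` class in which an area's count is the sum of its sub-areas' counts when it has any:

```python
class Area:
    """Hypothetical stand-in for an area with optional sub-areas."""

    def __init__(self, count_of_phones=None, sub_areas=()):
        self._count_of_phones = count_of_phones
        self.sub_areas = list(sub_areas)

    @property
    def count_of_phones(self):
        if self.sub_areas:
            return sum(area.count_of_phones for area in self.sub_areas)
        # Areas with missing data have None here and no sub-areas to
        # fall back on, so treat them as 0 rather than breaking the sum.
        return self._count_of_phones or 0


authority = Area(sub_areas=[Area(100), Area(None), Area(50)])
```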
Chris Hill-Scott
49195cb0d3 Rename constants to populations
This is a better name for the module because it’s:
- not just constants, there’s a method in here now
- only stuff to do with populations, not other kinds of constants
2020-09-16 14:45:45 +01:00
Chris Hill-Scott
3047af2c13 Refactor to make testing easier 2020-09-16 11:33:57 +01:00
Chris Hill-Scott
b9f75218d1 Add tests to ensure all areas have a count 2020-09-16 11:20:22 +01:00
Chris Hill-Scott
6b3fe3c5c5 Delete plot-areas.py
We don’t need this now that the admin app can show areas while running locally.
2020-09-16 09:11:01 +01:00
Chris Hill-Scott
ce35200453 Rename variable to be clearer
Better name than `population`, and
`smartphone_ownership_for_area_by_age_range` matches with
`SMARTPHONE_OWNERSHIP_BY_AGE_RANGE`
2020-09-16 08:46:59 +01:00
Leo Hemsted
c2e737b323 Merge pull request #3618 from alphagov/fix-broadcast-area-count
generate library summary in python
2020-09-14 16:47:36 +01:00
Chris Hill-Scott
8ea3f0141c Give estimates of the number of phones in a broadcast area
We need to give people a better feel for the consequences of
broadcasting an alert. We’ve seen in research that some users will
assume it is subscription based, or opt-in, rather than going to every
phone in the area.

I reckon that the most effective way to communicate this is to put some
numbers next to the areas, to give people an idea of how many people
will get alerted.

We can estimate how many phones are in an area by:
- taking the population of all electoral wards in that area
- multiplying it by the percentage of people who own an internet
  connected phone[1]

The Office for National Statistics publish both these datasets.

The number of people who own an internet-connected phone varies a lot by
age. Since the population data for each ward is broken down by age we
can factor this in. Simplified, the calculation looks like this:
- take the _Abbey_ ward of _Barking and Dagenham_
- in this ward there are 26 people aged 80
- 40% of people over 65 have an internet-connected phone
- therefore 10 of these 80-year-olds would be likely to receive a
  broadcast
- (repeat for all other ages)

These numbers won’t be exact, but should be enough to give people a feel
for the severity of what they’re about to do. We can see if they achieve
this aim in user research.

1. This is a proxy for the number of people who are likely to have a 4G
   capable phone, because only 4G capable phones will be receiving
   broadcasts to begin with
2020-09-14 16:26:09 +01:00
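The worked example from 8ea3f0141c can be sketched as below. Only the 40% figure for people over 65 and the 26 people aged 80 come from the commit message; the other ownership rate, the age boundaries and the names are placeholders:

```python
OWNERSHIP_BY_AGE_RANGE = {
    (0, 65): 0.9,    # placeholder figure, not from the ONS data
    (66, 120): 0.4,  # the 40% quoted for people over 65
}


def ownership_for_age(age):
    for (youngest, oldest), rate in OWNERSHIP_BY_AGE_RANGE.items():
        if youngest <= age <= oldest:
            return rate
    return 0.0


def estimate_phones(population_by_age):
    # Multiply the number of people of each age by the ownership
    # rate for that age, then add everything up.
    return sum(
        people * ownership_for_age(age)
        for age, people in population_by_age.items()
    )


eighty_year_olds = estimate_phones({80: 26})
```

Rounded to a whole number, this gives the 10 phones mentioned in the example.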
Leo Hemsted
ef0564f046 generate library summary in python
much simpler than sqlite.

also remove oxford commas

Co-authored-by: Chris Hill-Scott <me@quis.cc>
2020-09-14 15:25:04 +01:00
Chris Hill-Scott
858d1ee197 Increase threshold for minimum polygon size
We filter out very small polygons from the original data to remove
glitches. These glitches are caused by trying to subtract the water from
a polygon that includes some land and some water, but using two
different definitions or resolutions of mean high water line.

If we don’t do this then we end up with a bunch of very small polygons
which lie far outside the understood area of a place, causing large
overspill.

We need to increase the threshold for this process because we’re still
seeing this problem around Bristol and Norwich.

This does mean we lose a few very small polygons in places like Shetland
and the Scilly Isles, but not in such a way that we would avoid
broadcasting to them (because they’d still be caught by the
simplification and overspill).
2020-09-14 11:32:02 +01:00
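A sketch of filtering out very small polygons, as described in 858d1ee197. It assumes coordinates are already in metres on a flat plane (real boundary data would need projecting first), and the threshold value is a placeholder because the commit does not state the new figure:

```python
MIN_AREA_SQUARE_METRES = 7_500  # placeholder threshold


def polygon_area(points):
    # Shoelace formula over consecutive pairs of points, wrapping
    # around from the last point back to the first.
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def remove_small_polygons(polygons, threshold=MIN_AREA_SQUARE_METRES):
    return [polygon for polygon in polygons if polygon_area(polygon) >= threshold]


mainland = [(0, 0), (200, 0), (200, 100), (0, 100)]  # 20,000 square metres
glitch = [(0, 0), (10, 0), (10, 10), (0, 10)]        # 100 square metres
kept = remove_small_polygons([mainland, glitch])
```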
Chris Hill-Scott
5e579ed45c Merge pull request #3595 from alphagov/map-key
Add a key to the map
2020-09-09 16:03:27 +01:00
Leo Hemsted
d654323eb8 remove unused fn 2020-09-09 14:39:13 +01:00
Leo Hemsted
9e132263d2 make tests pass (acknowledge that code is wrong)
i really don't want to fix this right now but that total isn't quite right
2020-09-09 14:39:13 +01:00
Leo Hemsted
bc7d3710ab make sure countries library still returns values
to recap the previous commit, in the ward->local authority->county
library we want to return all local authorities and counties. We do this
by excluding anything that doesn't have children.

However, in the countries library, none of the four countries have
children.

I can't think of a generic way to separate these so just filter on the
library id
2020-09-09 14:39:13 +01:00