We have a bunch of stuff for doing lat/long transformation in the
`BroadcastMessage` class. This is not a good separation of concerns, now
that we have a separate class for dealing with polygons and coordinates.
This commit does two things:
- uses our new polygon-simplifying library to process the polygons
before storing them, rather than processing them in real time
- stores only the polygons in the database, rather than the whole
GeoJSON feature, because we don’t need any of the other information
about the feature
Simplifying polygons means reducing the number of points used to render
them. This commit implements simplification such that, for any given
input polygons, the combined point count of the simplified polygons is
less than 100.
When simplifying the polygons we are trying to get the smallest number
of points while meeting these two rules:
1. No part of the area the user has chosen can be cut off
2. The area of the simplified polygon should be as small as possible
This commit introduces two techniques we weren’t using before:
1. Dilating and eroding the area to fill in concave details of the
shape, like inlets and harbours[1]
2. Making the simplification threshold proportionate to the perimeter of
all polygons, so bigger and crinklier polygons get more
simplification applied
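A minimal sketch of the second technique, assuming a plain Douglas-Peucker simplifier written in pure Python. The names `simplify_all` and `ratio` are hypothetical and are not the API of our polygon library. The sketch only shows how the threshold scales with the combined perimeter. It does not attempt the dilate-and-erode step, and Douglas-Peucker on its own can cut corners off, so it does not satisfy rule 1 above.

```python
import math


def perimeter(ring):
    """Length of a closed ring whose first and last points are the same."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (a closed ring starts and ends at one point)
        return math.dist(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    distances = [
        _distance_to_segment(p, points[0], points[-1]) for p in points[1:-1]
    ]
    index, furthest = max(enumerate(distances, start=1), key=lambda pair: pair[1])
    if furthest <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


def simplify_all(polygons, ratio=0.01):
    """Simplify every ring with one threshold derived from the total perimeter.

    `ratio` is an illustrative value, not the one used in this commit.
    """
    threshold = ratio * sum(perimeter(ring) for ring in polygons)
    return [douglas_peucker(ring, threshold) for ring in polygons]
```

Because the threshold is a fraction of the perimeter, the same small bump is removed from a large polygon but kept on a small one.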
It also shows the estimated bleed as a separate polygon. This lets us
make it bigger (so it’s closer to the approximate bleed) without
having to send a bigger area to the CBC and compounding the amount of
actual bleed.
1. Inspired by this blog post about ‘removing the crinkley bits’ from
Vancouver Island:
http://blog.cleverelephant.ca/2010/11/removing-complexities.html
It’s been superseded by the ‘Local’ library (formerly ‘Electoral wards
in the United Kingdom’).
The latter is better because:
- it covers all 4 nations, not just England and Wales
- it has electoral wards as well as local authorities which group them,
so there’s more flexibility when choosing an area to broadcast to
We’ve observed people using ‘national’ and ‘local’ during user research.
These terms have less tongue-twisting ambiguity than county vs country.
But we think that maybe just getting rid of ‘counties’ is enough to
disambiguate them. So this commit just takes the ‘local’ concept.
This commit also gives the libraries and areas new IDs, which means if
we want to rename them in the future it won’t be a breaking change.
Broadcasting is not a precise technology, because:
- cell towers are directional
- their range varies depending on whether they are 2, 3, 4, or 5G
(the higher the bandwidth the shorter the range)
- in urban areas the towers are more densely packed, so a phone is
likely to have a greater choice of tower to connect to, and will
favour a closer one (which has a stronger signal)
- topography and even weather can affect the range of a tower
So it’s good for us to visually indicate that the broadcast is not as
precise as the boundaries of the area, because it gives the person
sending the message an indication of how the technology works.
At the same time we have a restriction on the number of polygons we
think an area can have, so we’ve done some work to make versions of
polygons which are simplified and buffered (see
https://github.com/alphagov/notifications-utils/pull/769 for context).
Serendipitously, the simplified and buffered polygons are larger and
smoother than the detailed polygons we’ve got from the GeoJSON files. So
they naturally give the impression of covering an area which is wider
and less precise.
So this commit takes those simple polygons and uses them to render the
blue fill. This makes the blue fill extend outside the black stroke,
which is still using the detailed polygons direct from the GeoJSON.
It made for a good early demo to show how we could have different
libraries, but we don’t think there’s a strong user need for being
able to broadcast to a region of England.
Regions also have the problem that:
- they are ambiguous – both England and Scotland have a region called
‘South east’
- Northern Ireland doesn’t have formal regions
This commit removes the regions library.
If a library has lots of items then the first 3 should be shown, with
a count of how many more there are, for a total of 4 list items:
> a, b, c, and 23 more
If the library only has 4 items then all 4 should be shown, with
consistent use of conjunction and Oxford comma[1]:
> a, b, c, and d
This keeps the lengths of the examples nice and consistent.
1. We use an Oxford comma because it helps disambiguate when an area
itself has a comma or ‘and’ in it, for example ‘Armagh City, Banbridge
and Craigavon’
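The rule above could be sketched like this. The function name `format_examples` is hypothetical, and the behaviour for fewer than 4 items is an assumption, since the rule only covers libraries with 4 or more.

```python
def format_examples(names, max_items=4):
    """Join area names into an example string of at most `max_items` parts."""
    names = list(names)
    if not names:
        return ""
    if len(names) > max_items:
        # Show the first 3 and summarise the rest as the 4th list item
        shown = names[: max_items - 1]
        last = f"{len(names) - len(shown)} more"
    else:
        shown, last = names[:-1], names[-1]
    if not shown:
        return last
    if len(shown) == 1:
        # Assumption: no comma is needed when there are only two parts
        return f"{shown[0]} and {last}"
    return ", ".join(shown) + ", and " + last
```

With 26 names this gives the ‘a, b, c, and 23 more’ form, and with exactly 4 names all of them are listed.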
When you click through to the page for a library you see the available
areas in alphabetical order. The examples given for each library should
match this.
The given examples should match the choices offered when you visit the
next page. The choices offered on the next page are either the areas
(when a library is not grouped) or the groups (when a library is
grouped).
This commit makes the examples match the choices by excluding sub-areas,
ie those that have a grouping ID.
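As a rough illustration, assuming a simplified stand-in for an area with a hypothetical `group_id` field (the real attribute names may differ):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Area:
    # Hypothetical, simplified stand-in for a broadcast area
    id: str
    name: str
    group_id: Optional[str] = None  # set only on sub-areas


def example_names(areas):
    """Names of the choices shown on the next page, in alphabetical order.

    Sub-areas (anything with a grouping ID) are left out, so what remains
    is either the ungrouped areas or the groups themselves.
    """
    return sorted(area.name for area in areas if area.group_id is None)
```

The alphabetical sort is included so the examples also line up with the order of the areas on the library page.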
Now that the data needed to create a `BroadcastArea` is pretty
lightweight because it doesn’t include the GeoJSON we can go back to
putting it in memory when we start up the app, to make the pages load
really fast.
Rough estimate for the size of this dataset:
> 10,000 areas
> Average length of area name = 20 characters
> Average length of area id = 20 characters
> Size of one area in bytes = 20 + 20 = 40
> Size of dataset = 40 * 10,000 = 400,000 bytes = 400 kB
I think that even with good indexes, querying the area names from one
table is always going to be slow because there’s so much GeoJSON to scan
past.
This commit splits the data into two tables, one for the names and
grouping IDs and one for the blobs of GeoJSON. So for most pages the app
will never even be looking at the table where the GeoJSON is held.
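A sketch of the split, using SQLite only because it ships with Python. The table names, column names and helper functions are hypothetical and not necessarily the ones in this commit.

```python
import json
import sqlite3


def create_schema(conn):
    """One small table for names and grouping, one for the heavy blobs."""
    conn.executescript(
        """
        CREATE TABLE areas (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            library_id TEXT NOT NULL,
            group_id TEXT
        );
        CREATE TABLE area_features (
            area_id TEXT PRIMARY KEY REFERENCES areas(id),
            feature TEXT NOT NULL
        );
        """
    )


def add_area(conn, area_id, name, library_id, feature, group_id=None):
    conn.execute(
        "INSERT INTO areas VALUES (?, ?, ?, ?)",
        (area_id, name, library_id, group_id),
    )
    conn.execute(
        "INSERT INTO area_features VALUES (?, ?)",
        (area_id, json.dumps(feature)),
    )


def area_names(conn, library_id):
    """Most pages only need this query, which never touches the blobs."""
    rows = conn.execute(
        "SELECT name FROM areas WHERE library_id = ? ORDER BY name",
        (library_id,),
    )
    return [name for (name,) in rows]


def feature_for(conn, area_id):
    """Only the map pages would need to read the second table."""
    row = conn.execute(
        "SELECT feature FROM area_features WHERE area_id = ?", (area_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```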
I don’t know if this is a proper, normalised way of structuring the
data, but it does go brrr.
Rather than querying all the features whenever we look up area(s) let’s
only get them when we need them.
The features are really big blobs of data to pass around, so there’s a
significant performance gain to be had from doing this.
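One way to defer the lookup, sketched with `functools.cached_property`. The class `LazyArea` and the `load_features` callable are hypothetical, and this is not necessarily how the commit does it.

```python
from functools import cached_property


class LazyArea:
    """Area that fetches its features on first access, then keeps them."""

    def __init__(self, area_id, name, load_features):
        self.id = area_id
        self.name = name
        # Hypothetical callable that takes an area ID and returns features
        self._load_features = load_features

    @cached_property
    def features(self):
        # Runs once per instance; later accesses reuse the stored value
        return self._load_features(self.id)
```

Code that only reads `name` never triggers the load, so listing areas stays cheap.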