fixing v3 code and adding normalisation section

Jeremy Kidwell 2024-02-27 12:09:10 +00:00
parent 7997afed6f
commit c00400a5a0


@@ -87,7 +87,7 @@ We could go a bit further with ggplot(), but for this chapter, we're going to pr
tm_shape(uk) + tm_borders()
```
In the example above, shown in @figure-tmap1a, you can see we've just added a polygon with a border. We can do something similar with point data and dots, as shown in @figure-tmap1b:
```{r, results = 'hide'}
#| label: figure-tmap1b
@@ -96,10 +96,12 @@ In the example above shown in @figure-tmap1a you can see we've just added a poly
tm_shape(os_openmap_pow) + tm_dots()
```
From either of these basic starting points (or both), we stack on additional instructions defining the different visual attributes or aesthetics, just like in `ggplot`. If you want to fill polygons with colour, add `tm_fill`, and if you want to adjust the default lines on your polygons, define this with `tm_borders`, as we have in @figure-tmap2 below with alpha and line width (`lwd`) instructions. We can also add more shapes on top with an additional `tm_shape` instruction and a follow-on `tm_borders` instruction. To add a bit of flourish, you can drop on a scale bar (`tm_scalebar`) or share licensing information with prospective readers (`tm_credits`) and add a figure label or title.
Let's see how those layers get added on with an example (@figure-tmap2):
[You can read more about the various visual customisations available, ["here"](https://r-tmap.github.io/tmap/reference/tm_credits.html).]{.aside}
```{r}
#| label: figure-tmap2
#| fig-cap: "UK and local authority borders with tmap"
@@ -108,16 +110,11 @@ tm_shape(uk) +
tm_borders(alpha=.5, lwd=0.1) +
tm_shape(local_authorities) +
tm_borders(lwd=0.6) +
tm_scalebar(position = c("right", "bottom")) +
tm_style("gray")
```
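To share licensing information on the figure itself, you can stack on a `tm_credits` layer in the same way; a minimal sketch (the wording and placement here are just one option):
```{r, eval = FALSE}
# add a small attribution note in the lower-left corner
tm_shape(uk) +
tm_borders(alpha=.5, lwd=0.1) +
tm_credits("Data: UK Data Service (OGL)\n& Jeremy H. Kidwell,\nGraphic is CC-by-SA 4.0",
size = 0.4,
position = c("left", "bottom"))
```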
That's a quick orientation to some of the kinds of visual elements we can produce with `tmap`.
Our next step here will be to add all the churches to our map, but there's a problem we need to address first: there are a lot of churches in that dataset. As you may have noticed in @figure-churches above, there are so many dots that some parts of the map are just blocks of grey. Let's have a look at how things are with `tmap`:
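A minimal sketch of that overplotted dot map, reusing the `uk` and `os_openmap_pow` layers from above (the styling is illustrative):
```{r, eval = FALSE}
# layer all the church points over the UK outline;
# at this scale the dots merge into solid blocks of grey
tm_shape(uk) +
tm_borders(alpha=.5, lwd=0.1) +
tm_shape(os_openmap_pow) +
tm_dots()
```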
@@ -151,11 +148,11 @@ length(os_openmap_pow$classification)
At nearly 50k points, we're going to need to find an alternative if we want to help someone visualise this data clearly. Luckily, we can use R to do some computation for us and produce a different kind of map called a choropleth map. You'll probably have seen many of these before without realising what they're called. Think of it as a kind of heatmap, like we used with our scatterplot before, except in this case the shapes being coloured in come from a set of polygons we specify. Our administrative map shape data is perfect for this kind of use.
```{r}
uk_countries$churches_count <- lengths(st_covers(uk_countries, os_openmap_pow))
uk_countries$churches_percent <- prop.table(uk_countries$churches_count)
```
The `sf` library has a host of tools for geospatial data analysis, including the `st_covers()` function, which filters a dataset based on whether points (or shapes) are located inside polygons from another dataset. I'll walk you through what we've done above. First, we want to add a new column with our totals to the administrative shapes. I've used `lengths()` to fill this column with a simple count of the number of items in a new dataset, which in turn consists of a simple calculation (using `st_covers`) of how many points (from `os_openmap_pow`) fall inside each polygon in the `uk_countries` dataset. Sometimes it's nice to have proportions close to hand, so I've added another column, `churches_percent`, using the very handy `prop.table` command (multiply by 100 if you want true percentages). To see what `prop.table` is doing, try it on a toy vector:
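```{r}
# prop.table() divides each element by the vector's sum,
# turning raw counts into proportions that add up to 1
prop.table(c(2, 3, 5))
```
We can do the same thing for any set of polygons, including our `local_authorities` data: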
```{r}
local_authorities$churches_count <- lengths(st_covers(local_authorities, os_openmap_pow))
@@ -168,23 +165,15 @@ Now let's visualise this data using tmap, which (now that we have that new colum
#| label: figure-tmap5
#| fig-cap: "From dots to choropleth"
tm_shape(uk_countries) +
tm_borders(alpha=.5, lwd=0.4) +
tm_fill(fill = "churches_count", fill.scale = tm_scale_intervals(breaks = c(0, 30000, 40000, 50000)), fill.legend = tm_legend(title = "Concentration of churches"))
```
There are some issues here with normalising data, just like we've observed in previous chapters. However, normalising geospatial data is a bit more complex. For this section, it's worth asking: should we assume that the frequency of church buildings simply tracks population? Probably not, but even if we did, which period should we draw that population data from, given that many of these buildings were erected more than a century ago? A bit further down, we'll explore some potential ways to think about normalising buildings data using a few different examples.
[You can read more about the various customisations available using `tm_fill`, ["here"](https://r-tmap.github.io/tmap/reference/tm_polygons.html).]{.aside}
```{r}
#| label: figure-tmap6
#| fig-cap: "Choropleth with pretty interval breaks"
tm_shape(uk_countries) + tm_fill(fill = "churches_count", fill.scale = tm_scale_intervals(style = "pretty"))
```
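The `style` argument here accepts the interval classification methods from the `classInt` package; as a sketch (not one of the figures in this chapter), quantile breaks would put roughly equal numbers of polygons in each colour bin:
```{r, eval = FALSE}
# quantile breaks: roughly equal numbers of polygons per colour bin
tm_shape(uk_countries) + tm_fill(fill = "churches_count", fill.scale = tm_scale_intervals(style = "quantile"))
```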
We can do the same for our more granular local authorities data, and this is already a bit more comprehensible, showing not just that the concentration is in England rather than Wales or Scotland, but that it clusters in specific high-population regions:
```{r}
#| label: figure-tmap7
@@ -192,14 +181,18 @@ We can do the same for our more granular local authorities data:
tm_shape(local_authorities) +
tm_borders(alpha=.5, lwd=0.4) +
tm_fill(fill = "churches_count", fill.legend = tm_legend(title = "Concentration of churches"))
```
If we're looking for visual outliers, e.g. places where there are more or fewer of a feature than we might expect, we need to think carefully about the baseline that we're using to set that expectation. We can definitely normalise using population data; here is a sketch of what that might look like, assuming a hypothetical `population` column has been joined onto `local_authorities` (e.g. from census data):
```{r}
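#| eval: false
# a sketch only: assumes a hypothetical `population` column has been
# joined onto local_authorities (e.g. from census data)
local_authorities$churches_per_10k <-
  10000 * local_authorities$churches_count / local_authorities$population
tm_shape(local_authorities) +
tm_fill(fill = "churches_per_10k", fill.legend = tm_legend(title = "Churches per 10k residents"))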
```
But I wonder if it's more interesting to compare this type of building, e.g. a place of worship, to other kinds of buildings, like grocery stores or pubs. Let's draw in some data we can work with here:
```{r}
# subsetting ordnance survey openmap data for measuring clusters and proximity
os_openmap_important_buildings <- st_read(here("example_data", "os_openmap_important_buildings.gpkg"), quiet = TRUE)