ggplot(uk_census_2021_religion_wmids, aes(x = key, y = value)) +
-geom_bar(stat = "identity")
ggplot(uk_census_2021_religion_wmids, aes(x = key, y = value)) + geom_bar(stat = "identity")
- 2 @@ -670,6 +669,7 @@ div.csl-indent {
If you’re looking closely, you will notice that I’ve added two elements to our previous ggplot. First, I’ve asked ggplot to fill in the columns with reference to the dataset column we’ve just created. Second, I’ve set position="dodge", which places the bars side by side rather than stacking them on top of one another. You can try the chart without this instruction to see the difference. We will use stacked bars in a later chapter, so remember this feature.
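The two additions described above can be tried out on a small table. This is a minimal sketch, not the chart from this chapter: the data frame `toy_religion` and its values are invented for illustration, and only the `key`, `value` and `dataset` column names follow the text.

```r
library(ggplot2)

# Hypothetical toy counts, NOT real census figures
toy_religion <- data.frame(
  key     = rep(c("Christian", "Muslim", "No religion"), times = 2),
  value   = c(60, 10, 30, 45, 20, 35),
  dataset = rep(c("national", "regional"), each = 3)
)

# fill = dataset colours each bar by its source;
# position = "dodge" places the two sources side by side
dodged_plot <- ggplot(toy_religion, aes(x = key, y = value, fill = dataset)) +
  geom_bar(stat = "identity", position = "dodge")

# Leaving out position = "dodge" falls back to the default, which stacks the bars
stacked_plot <- ggplot(toy_religion, aes(x = key, y = value, fill = dataset)) +
  geom_bar(stat = "identity")
```

Printing `dodged_plot` and then `stacked_plot` shows the same data in the two layouts.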
If you inspect our chart, you can see that we’re getting closer, but comparing raw totals isn’t really that helpful. What we need are percentages that can be compared side by side. This is easy to do using another dplyr feature, mutate:
<- uk_census_2021_religion_totals %>%
@@ -748,29 +748,32 @@ div.csl-indent {
uk_census_2021_religion_totals
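As a rough illustration of the percentage step, here is a self-contained sketch on invented numbers. The table `toy_totals`, its values and the `perc` column name are assumptions made for this example. Grouping by `dataset` first is also an assumption: it makes each percentage relative to its own dataset's total, which is what a side-by-side comparison needs.

```r
library(dplyr)

# Hypothetical toy counts, NOT real census figures
toy_totals <- data.frame(
  key     = rep(c("Christian", "Muslim", "No religion"), times = 2),
  value   = c(600, 100, 300, 45, 20, 35),
  dataset = rep(c("national", "regional"), each = 3)
)

# mutate() adds a new column; within each group, sum(value) is that group's total
toy_totals <- toy_totals %>%
  group_by(dataset) %>%
  mutate(perc = value / sum(value) * 100) %>%
  ungroup()
```

Although the national counts are far larger than the regional ones, the `perc` column puts both on the same 0 to 100 scale.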
2.6 Multifactor Visualisation
One element of R data analysis of census datasets that can get really interesting is working with multiple variables. Above we’ve looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we’ve already made a basic entry into working with multiple variables, but this can get much more interesting. Adding a further variable into the mix (data with two variables is known as bivariate data) can generate a lot more information, but we then have to think about visualising it in ways that still communicate clearly in spite of the additional visual noise that is inevitable with greater complexity. Let’s have a look at the way that religion in England and Wales breaks down by ethnicity.
For the UK, census data is made available for programmatic research like this via an organisation called NOMIS. Luckily for us, there is an R library you can use to access nomis directly, which greatly simplifies the process of pulling data down from the platform. It’s worth noting that if you’re not in the UK, there are similar options for other countries. Nearly every R textbook I’ve ever seen works with USA census data, so you’ll find plenty of documentation on the tools you can use for US Census data. The same goes for the EU, Canada, Australia, etc.
If you want to draw some data from the nomis platform yourself in R, have a look at the nomis script in our companion cookbook repository. For now, we’ll provide some data extracts for you to use.
Let’s start by loading in some of the enhanced tables from nomis with the 2021 religion / ethnicity tables:
nomis_extract_census2021 <- readRDS(file = (here("example_data", "nomis_extract_census2021.rds")))
I’m hoping that readers of this book will feel free to pause along the way and “hack” the code to explore questions of their own, perhaps in this case probing the NOMIS data for answers to their own questions. If I tidy things up too much, however, you’re likely to be surprised when you get to the real life data sets. So that you can use the code in this book in a reproducible way, I’ve started this exercise with what is a more or less raw dump from NOMIS. This means that the data is a bit messy and needs to be filtered down quite a bit so that it only includes the basic stuff that we’d like to examine for this particular question. The upside of this is that you can modify this code to draw in different columns etc.
-uk_census_2021_religion_ethnicity <- select(nomis_extract_census2021, GEOGRAPHY_NAME, C2021_RELIGION_10_NAME, C2021_ETH_8_NAME, OBS_VALUE)
uk_census_2021_religion_ethnicity <- filter(uk_census_2021_religion_ethnicity, GEOGRAPHY_NAME=="England and Wales" & C2021_RELIGION_10_NAME != "Total" & C2021_ETH_8_NAME != "Total")
uk_census_2021_religion_ethnicity <- filter(uk_census_2021_religion_ethnicity, C2021_ETH_8_NAME != "White: English, Welsh, Scottish, Northern Irish or British" & C2021_ETH_8_NAME != "White: Irish" & C2021_ETH_8_NAME != "White: Gypsy or Irish Traveller, Roma or Other White")
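If you want to adapt these filters to other columns, it helps to list the labels that are actually present before writing any conditions. The sketch below uses an invented miniature table, `toy_extract`, that borrows the column names from the extract but not its contents. The `%in%` form is one possible shorthand for a chain of `!=` tests, not the approach used above.

```r
library(dplyr)

# Hypothetical miniature of the extract: same column names, invented values
toy_extract <- data.frame(
  GEOGRAPHY_NAME         = c("England and Wales", "England and Wales",
                             "England and Wales", "Wales"),
  C2021_RELIGION_10_NAME = c("Total", "Christian", "Christian", "Christian"),
  C2021_ETH_8_NAME       = c("Total", "White", "White: Irish", "White"),
  OBS_VALUE              = c(100, 60, 5, 10)
)

# List the labels present so filter conditions can be copied exactly
unique(toy_extract$C2021_ETH_8_NAME)

# %in% tests membership in a vector, so one negated test can replace several != tests
drop_labels <- c("Total", "White: Irish")
toy_filtered <- toy_extract %>%
  filter(GEOGRAPHY_NAME == "England and Wales",
         C2021_RELIGION_10_NAME != "Total",
         !(C2021_ETH_8_NAME %in% drop_labels))
```

Separating conditions with commas inside `filter()` combines them in the same way as `&`.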
@@ -795,16 +798,32 @@ What is Nomis?
uk_census_2021_religion_ethnicity
The trouble with using grouped bars here, as you can see, is that there are quite sharp disparities which make it hard to compare in meaningful ways. We could use logarithmic rather than linear scaling as an option, but this is hard for many general public audiences to appreciate without guidance. One alternative quick fix is to extract data from “white” respondents which can then be placed in a separate chart with a different scale.
-Content TBD.
+uk_census_2021_religion_ethnicity_white <- filter(uk_census_2021_religion_ethnicity, C2021_ETH_8_NAME == "White")
uk_census_2021_religion_ethnicity_nonwhite <- filter(uk_census_2021_religion_ethnicity, C2021_ETH_8_NAME != "White")
ggplot(uk_census_2021_religion_ethnicity_nonwhite, aes(fill=C2021_ETH_8_NAME, x=C2021_RELIGION_10_NAME, y=OBS_VALUE)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2021 Census of England and Wales") + xlab("") + ylab("") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
As you’ll notice, this is a bit better, but it still doesn’t render with as much visual clarity as I’d like. For a better look, we can use a technique in R called “faceting” to create a series of small charts which can be viewed alongside one another. This is just intended to whet your appetite for faceted plots, so I won’t break down all the separate elements in great detail; there are other guides which will walk you through the full details of this technique if you want to do a deep dive. For now, you’ll want to observe that we’ve augmented the ggplot with a new element called facet_wrap, which takes the ethnicity data column as the basis for rendering separate charts.
ggplot(uk_census_2021_religion_ethnicity_nonwhite, aes(x=C2021_RELIGION_10_NAME, y=OBS_VALUE)) + geom_bar(position="dodge", stat ="identity", colour = "black") + facet_wrap(~C2021_ETH_8_NAME, ncol = 2) + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2021 Census of England and Wales") + xlab("") + ylab("") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
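To experiment with faceting without loading the NOMIS extract, a small sketch along the following lines may be useful. The table `toy_facets`, its column names and its values are all invented. The `scales = "free_y"` argument is an addition that does not appear in the chart above: it gives each panel its own y-axis, which can help when groups differ greatly in size, at the cost of making bar heights harder to compare across panels.

```r
library(ggplot2)

# Hypothetical toy counts, NOT real census figures
toy_facets <- data.frame(
  religion  = rep(c("Christian", "Muslim", "No religion"), times = 2),
  ethnicity = rep(c("Group A", "Group B"), each = 3),
  count     = c(500, 50, 300, 20, 90, 10)
)

# facet_wrap() draws one small chart per value of the ethnicity column;
# ncol sets how many panels sit in each row
facet_plot <- ggplot(toy_facets, aes(x = religion, y = count)) +
  geom_bar(stat = "identity", colour = "black") +
  facet_wrap(~ethnicity, ncol = 2, scales = "free_y")
```

Removing `scales = "free_y"` returns to the default shared axis, which matches the behaviour of the chart above.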