@@ -578,7 +579,7 @@ i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all
2
-We’ll re-order the column by size.
+We’ll re-order the column by size.
@@ -601,19 +602,19 @@ i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all
1
-First, remove the column with region names and the totals for the regions as we want just integer data.
+First, remove the column with region names and the totals for the regions as we want just integer data.
2
-Second calculate the totals. In this example we use the tidyverse library dplyr(), but you can also do this using base R with colsums() like this: uk_census_2021_religion_totals <- colSums(uk_census_2021_religion_totals, na.rm = TRUE). The downside with base R is that you’ll also need to convert the result into a dataframe for ggplot like this: uk_census_2021_religion_totals <- as.data.frame(uk_census_2021_religion_totals)
+Second, calculate the totals. In this example we use the tidyverse package dplyr, but you can also do this in base R with colSums() like this: uk_census_2021_religion_totals <- colSums(uk_census_2021_religion_totals, na.rm = TRUE). The downside with base R is that you’ll also need to convert the result into a data frame for ggplot like this: uk_census_2021_religion_totals <- as.data.frame(uk_census_2021_religion_totals)
3
-In order to visualise this data using ggplot, we need to shift this data from wide to long format. This is a quick job using gather()
+In order to visualise this data using ggplot, we need to shift this data from wide to long format. This is a quick job using gather()
4
-Now plot it out and have a look!
+Now plot it out and have a look!
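The four steps above can be sketched end to end on a small invented table. This is only an illustration under stated assumptions: the data frame `toy_religion`, its column names and its counts are all made up, and the `summarise(across(...))` call is one plausible way of computing the totals with dplyr, not necessarily the one used in the chapter's own script.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the census table (invented counts, hypothetical column names)
toy_religion <- data.frame(
  geography   = c("Region A", "Region B", "Region C"),
  total       = c(600, 900, 1500),
  no_religion = c(100, 200, 300),
  christian   = c(400, 500, 600),
  muslim      = c(100, 200, 600)
)

# 1. Drop the region names and the per-region totals so only counts remain
toy_counts <- select(toy_religion, -geography, -total)

# 2. Sum every remaining column (dplyr version; an assumption, see above)
toy_totals <- summarise(toy_counts, across(everything(), ~ sum(.x, na.rm = TRUE)))

# Base R alternative: colSums() returns a named vector, so convert it afterwards
base_totals <- colSums(toy_counts, na.rm = TRUE)
base_totals <- as.data.frame(base_totals)

# 3. Reshape from wide (one column per religion) to long (one row per religion)
toy_long <- gather(toy_totals, key = "religion", value = "count")

# 4. Plot the totals
p_totals <- ggplot(toy_long, aes(x = religion, y = count)) +
  geom_bar(stat = "identity")
p_totals
```

Note that the base R route leaves the religion names in the row names rather than in a column, which is why the dplyr and gather() route is usually more convenient for ggplot.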
@@ -691,8 +692,12 @@ i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all
Change orientation of X axis labels + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Relabel fields Simplify y-axis labels Add percentage text to bars (or maybe save for next chapter?)
-
-
2.6 Multifactor Visualisation
+
+
2.6 Making our script reproducible
+
Let’s take a moment to review our hacker code. I’ve just spent some time addressing how we can be truthful in our data science work. We haven’t yet said much about reproducibility.
+
+
+
2.7 Multifactor Visualisation
One element of R data analysis that can get really interesting is working with multiple variables. Above we’ve looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we’ve already made a basic entry into working with multiple variables, but this can get much more interesting. Adding an additional variable into the mix (data with two variables is known as bivariate data) can generate a lot more information, and we have to think about visualising it in ways that still communicate clearly in spite of the additional visual noise that is inevitable with increased complexity. Let’s have a look at the way that religion in England and Wales breaks down by ethnicity.
# Filter down to simplified dataset with England / Wales and percentages without totals
uk_census_2011_religion_ethnicitity <- filter(uk_census_2011_religion_ethnicitity, GEOGRAPHY_NAME == "England and Wales" & C_RELPUK11_NAME != "All categories: Religion" & C_ETHPUK11_NAME != "All categories: Ethnic group")
# Simplify data to only include general totals and omit subcategories
-uk_census_2011_religion_ethnicitity <- uk_census_2011_religion_ethnicitity %>%filter(grepl('Total', C_ETHPUK11_NAME))
+uk_census_2011_religion_ethnicitity <- uk_census_2011_religion_ethnicitity %>%filter(grepl('Total', C_ETHPUK11_NAME))
+
+ggplot(uk_census_2011_religion_ethnicitity, aes(fill = C_ETHPUK11_NAME, x = C_RELPUK11_NAME, y = OBS_VALUE)) + geom_bar(position = "dodge", stat = "identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2011 Census of England and Wales") + xlab("") + ylab("") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
+
+
+
+
+
The trouble with using grouped bars here, as you can see, is that there are quite sharp disparities between the groups, which make it hard to compare them in meaningful ways. We could use logarithmic rather than linear scaling as an option, but this is hard for many general public audiences to appreciate without guidance. One alternative quick fix is to separate out the data for “White” respondents, which can then be placed in a separate chart with a different scale.
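To give a sense of what the logarithmic option looks like, here is a minimal sketch using ggplot2's scale_y_log10(). The data frame `toy_counts` and its values are invented for illustration and are not census figures. Bear in mind that on a log axis the length of a bar is no longer proportional to the count it represents, which is part of why this option needs careful explanation for a general audience.

```r
library(ggplot2)

# Invented counts with a large gap between groups (not census data)
toy_counts <- data.frame(
  religion  = rep(c("Christian", "Muslim", "No religion"), times = 2),
  ethnicity = rep(c("Group A", "Group B"), each = 3),
  count     = c(30000000, 200000, 13000000, 90000, 2000000, 150000)
)

# Same grouped bar chart as above, but with a log10 y axis
p_log <- ggplot(toy_counts, aes(fill = ethnicity, x = religion, y = count)) +
  geom_bar(position = "dodge", stat = "identity", colour = "black") +
  scale_y_log10(labels = scales::comma) +
  scale_fill_brewer(palette = "Set1") +
  xlab("") +
  ylab("Count (log scale)")
p_log
```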
+
+
# Split the data into one dataset for "White: Total" respondents and one for all other groups
+uk_census_2011_religion_ethnicitity_white <- filter(uk_census_2011_religion_ethnicitity, C_ETHPUK11_NAME == "White: Total")
+uk_census_2011_religion_ethnicitity_nonwhite <- filter(uk_census_2011_religion_ethnicitity, C_ETHPUK11_NAME != "White: Total")
+
+ggplot(uk_census_2011_religion_ethnicitity_nonwhite, aes(fill = C_ETHPUK11_NAME, x = C_RELPUK11_NAME, y = OBS_VALUE)) + geom_bar(position = "dodge", stat = "identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2011 Census of England and Wales") + xlab("") + ylab("") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
+
+
+
+
+
This still doesn’t quite render with as much visual clarity and communication as I’d like. For a better look, we can use a technique in R called “faceting” to create a series of small charts which can be viewed alongside one another.
+
+
ggplot(uk_census_2011_religion_ethnicitity_nonwhite, aes(x = C_RELPUK11_NAME, y = OBS_VALUE)) + geom_bar(position = "dodge", stat = "identity", colour = "black") + facet_wrap(~C_ETHPUK11_NAME, ncol = 2) + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2011 Census of England and Wales") + xlab("") + ylab("") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
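One possible variation on the faceted chart, sketched here on invented data, is to let each panel have its own y axis with the `scales = "free_y"` argument of facet_wrap(). The data frame `toy_facets` and its values are hypothetical. The trade-off is that panels are then no longer directly comparable by bar height, so this suits cases where the pattern within each group matters more than the comparison between groups.

```r
library(ggplot2)

# Invented counts for two groups of very different size (not census data)
toy_facets <- data.frame(
  religion  = rep(c("Christian", "Muslim", "No religion"), times = 2),
  ethnicity = rep(c("Group A", "Group B"), each = 3),
  count     = c(500000, 80000, 200000, 9000, 1500, 4000)
)

# One small chart per group, each with its own y-axis range
p_free <- ggplot(toy_facets, aes(x = religion, y = count)) +
  geom_bar(stat = "identity", colour = "black") +
  facet_wrap(~ethnicity, ncol = 2, scales = "free_y") +
  xlab("") +
  ylab("") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p_free
```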