mirror of https://github.com/kidwellj/hacking_religion_textbook.git
synced 2025-06-30 23:24:09 +00:00
updating headings hierarchy in ch1
parent 5116c12449
commit dd514ea06d
10 changed files with 56 additions and 62 deletions
@ -1,5 +1,3 @@
# Preamble
We'll get to the good stuff in a moment, but first we need to do a bit of setup. The code provided here sets up your workspace and is also needed by `quarto`, the application we use to build this book, which blends together text and blocks of code. You can ignore most of it for now, though if you're running the code as we go along, you'll definitely want to include these lines: they create the directories where your files will go as you create charts and extract data below, and they tell R where to find those files:
```{r}
@ -22,11 +20,11 @@ if (dir.exists("derivedData") == FALSE) {
}
```
# The 2021 UK Census
# Introducing the 2021 UK Census
For our first exercise in this book, we're going to work with a census dataset. As you'll see by contrast in chapter 2, census data is intended to represent as fully as possible the demographic features of a specific community, in this case, the United Kingdom. We might assume that a large-scale survey given to 1000 or more respondents and distributed appropriately across a variety of demographics will approximate the results of a census, but there's really no substitute for a survey which has been given to (nearly) the entire population. This also allows us to compare a number of different subsets, as we'll explore further below. The big question that we're confronting in this chapter is how best to represent religious belonging and participation at such a large scale. Along the way, we'll flag up some of the hidden limitations in this seemingly comprehensive dataset.
## Getting started with UK Census data
# Getting started with UK Census data
Let's start by importing some data into R. Because R is what is called an object-oriented programming language, we'll always take our information and give it a home inside a named object. There are many different kinds of objects, and you can specify which kind you want, but usually R will assign the type that seems to fit best. Often this is a table of data that looks a bit like a spreadsheet, called a `dataframe`.
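To make the idea of a named object concrete, here is a minimal sketch using base R only. The object name `toy_counts` and the values inside it are invented for illustration and are not part of the census data:

```{r}
# A small table built by hand; these numbers are made up for illustration
toy_counts <- data.frame(
  region = c("North", "South", "East"),
  respondents = c(120, 95, 143)
)

# R has assigned a type to the whole object for us
class(toy_counts)              # "data.frame"

# Dimensions: number of rows, then number of columns
dim(toy_counts)

# Each column also carries its own type
class(toy_counts$respondents)  # "numeric"
```

Nothing here depends on the census file, so it can be run in a fresh R session to see how an object is created and inspected.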
@ -39,7 +37,7 @@ In the example below, we're going to begin by reading in data from a comma separ
uk_census_2021_religion <- read.csv(here("example_data", "census2021-ts030-rgn.csv"))
```
## Examining data:
# Examining data:
What's in the table? You can take a quick look at either the top or the bottom of the data frame using one of the following commands:
@ -59,7 +57,7 @@ You can see how I've nested the previous command inside the `kable` command. For
knitr::kable(tail(uk_census_2021_religion))
```
## Parsing and Exploring your data
# Parsing and Exploring your data
The first thing you're going to want to do is to take a smaller subset of a large data set by filtering out certain columns or rows. Let's say we want to work with just the data from the West Midlands, and we'd like to omit some of the columns. We can choose a specific range of columns using `select`, like this:
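As a rough illustration of subsetting, here is a sketch on a made-up table. It assumes the `dplyr` package is installed; the object `toy_census`, its column names, and its values are all hypothetical stand-ins, not the structure of the real census file:

```{r}
library(dplyr)

# Hypothetical toy table standing in for the census data
toy_census <- data.frame(
  geography = c("Region A", "Region B"),
  total = c(100, 200),
  no_religion = c(40, 70),
  christian = c(50, 110),
  other = c(10, 20)
)

# Keep only the rows for one region
toy_subset <- filter(toy_census, geography == "Region A")

# Keep a contiguous range of columns, from no_religion through other
toy_subset <- select(toy_subset, no_religion:other)

toy_subset
```

The `first:last` form inside `select` picks every column between the two named ones, which is convenient when the columns you want sit next to each other.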
@ -77,7 +75,7 @@ Now we'll use select in a different way to narrow our data to specific columns t
In keeping with my goal to demonstrate data science through examples, we're going to move on to producing some snappy-looking charts for this data.
## Making your first data visualisation: the humble bar chart
# Making your first data visualisation: the humble bar chart
We've got a nice lean set of data, so now it's time to visualise it. We'll start by making a bar chart:
@ -89,7 +87,7 @@ uk_census_2021_religion_wmids <- gather(uk_census_2021_religion_wmids)
There are two basic ways to do visualisations in R. You can work with basic functions in R, often called "base R", or you can work with an alternative library called ggplot:
### Base R
## Base R
```{r}
df <- uk_census_2021_religion_wmids[order(uk_census_2021_religion_wmids$value,decreasing = TRUE),]
@ -97,7 +95,7 @@ barplot(height=df$value, names=df$key)
```
### GGPlot
## GGPlot
```{r}
ggplot(uk_census_2021_religion_wmids, aes(x = key, y = value)) + geom_bar(stat = "identity") # <1>
@ -180,7 +178,7 @@ We can fine tune a few other visual features here as well, like adding a title w
```{r}
ggplot(uk_census_2021_religion_merged, aes(fill=fct_reorder(dataset, value), x=reorder(key,-value),value, y=perc)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the UK: 2021") + xlab("") + ylab("")
```
## Is your chart accurate? Telling the truth in data science
# Telling the truth in data science: Is your chart accurate?
If you've been following along up until this point, you'll now have produced a fairly complete data visualisation for the UK census. There is some technical work yet to be done fine-tuning the visualisation of our chart here, but I'd like to pause for a moment and consider an ethical question drawn from the principles I outlined in the introduction: is the title of this chart truthful and accurate?
@ -206,8 +204,7 @@ So if we are going to fine-tune our visuals to ensure they comport with our hack
ggplot(uk_census_2021_religion_merged, aes(fill=fct_reorder(dataset, value), x=reorder(key,-value),value, y=perc)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2021 Census of England and Wales") + xlab("") + ylab("")
```
## Multifactor Visualisation
# Multifactor Visualisation
One element of R data analysis of census datasets that can get really interesting is working with multiple variables. Above we've looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we've already made a basic entry into working with multiple variables, but this can get much more interesting. Adding an additional quantitative variable into the mix (data with *two* variables is known as bivariate data) generates a lot more information, however, and we have to think about how to visualise it in ways that still communicate clearly in spite of the visual noise that inevitably comes with added complexity. Let's have a look at the way that religion in England and Wales breaks down by ethnicity.