mirror of
https://github.com/kidwellj/hacking_religion_textbook.git
synced 2025-01-09 00:42:23 +00:00
added content to chapter 1
This commit is contained in:
parent
54fe76ceff
commit
76b43ea1bf
|
@ -15,8 +15,7 @@ setwd("/Users/kidwellj/gits/hacking_religion_textbook/hacking_religion")
|
|||
library(here) # much better way to manage working paths in R across multiple instances
|
||||
library(tidyverse)
|
||||
here::i_am("chapter_1.qmd")
|
||||
|
||||
religion_uk <- read.csv(here("example_data", "census2021-ts030-rgn.csv"))
|
||||
uk_census_2021_religion <- read.csv(here("example_data", "census2021-ts030-rgn.csv"))
|
||||
```
|
||||
|
||||
### Examining data:
|
||||
|
@ -24,19 +23,21 @@ religion_uk <- read.csv(here("example_data", "census2021-ts030-rgn.csv"))
|
|||
What's in the table? You can take a quick look at either the top of the data frame, or the bottom using one of the following commands:
|
||||
|
||||
```{r .column-page}
|
||||
head(religion_uk)
|
||||
head(uk_census_2021_religion)
|
||||
summarise(uk_census_2021_religion)
|
||||
rowSums(uk_census_2021_religion)
|
||||
```
|
||||
|
||||
This is actually a fairly ugly table, so I'll use an R tool called kable to give you prettier tables in the future, like this:
|
||||
|
||||
```{r}
|
||||
knitr::kable(head(religion_uk))
|
||||
knitr::kable(head(uk_census_2021_religion))
|
||||
```
|
||||
|
||||
You can see how I've nested the previous command inside the `kable` command. For reference, in some cases when you're working with really complex scripts with many different libraries and functions, they may end up with functions that have the same name. You can specify the library where the function is meant to come from by preceding it with :: as we've done `knitr::` above. The same kind of output can be gotten using `tail`:
|
||||
|
||||
```{r}
|
||||
knitr::kable(tail(religion_uk))
|
||||
knitr::kable(tail(uk_census_2021_religion))
|
||||
```
|
||||
|
||||
### Parsing and Exploring your data
|
||||
|
@ -47,7 +48,7 @@ You can use the `filter` command to do this. To give an example, `filter` can pi
|
|||
|
||||
|
||||
```{r}
|
||||
wmids_data <- religion_uk %>%
|
||||
uk_census_2021_religion_wmids <- uk_census_2021_religion %>%
|
||||
filter(geography=="West Midlands")
|
||||
```
|
||||
|
||||
|
@ -63,8 +64,8 @@ In keeping with my goal to demonstrate data science through examples, we're goin
|
|||
We've got a nice lean set of data, so now it's time to visualise this. We'll start by making a pie chart:
|
||||
|
||||
```{r}
|
||||
wmids_data <- wmids_data %>% select(no_religion:no_response)
|
||||
wmids_data <- gather(wmids_data)
|
||||
uk_census_2021_religion_wmids <- uk_census_2021_religion_wmids %>% select(no_religion:no_response)
|
||||
uk_census_2021_religion_wmids <- gather(uk_census_2021_religion_wmids)
|
||||
```
|
||||
|
||||
|
||||
|
@ -73,7 +74,7 @@ There are two basic ways to do visualisations in R. You can work with basic func
|
|||
#### Base R
|
||||
|
||||
```{r}
|
||||
df <- wmids_data[order(wmids_data$value,decreasing = TRUE),]
|
||||
df <- uk_census_2021_religion_wmids[order(uk_census_2021_religion_wmids$value,decreasing = TRUE),]
|
||||
barplot(height=df$value, names=df$key)
|
||||
```
|
||||
|
||||
|
@ -81,20 +82,61 @@ barplot(height=df$value, names=df$key)
|
|||
#### GGPlot
|
||||
|
||||
```{r}
|
||||
# unsorted
|
||||
ggplot(wmids_data, aes(x = key, y = value)) +
|
||||
geom_bar(stat = "identity")
|
||||
|
||||
# with sorting added in
|
||||
ggplot(wmids_data, aes(x= reorder(key,-value),value)) + geom_bar(stat ="identity")
|
||||
ggplot(uk_census_2021_religion_wmids, aes(x = key, y = value)) + # <1>
|
||||
geom_bar(stat = "identity") # <1>
|
||||
ggplot(uk_census_2021_religion_wmids, aes(x= reorder(key,-value),value)) + geom_bar(stat ="identity") # <2>
|
||||
```
|
||||
|
||||
Clean up chart features
|
||||
1. First we'll plot the data using `ggplot` and then...
|
||||
2. We'll re-order the column by size.
|
||||
|
||||
Let's assume we're working with a data set that doesn't include a "totals" column and that we might want to get sums for each column. This is pretty easy to do in R:
|
||||
|
||||
```{r}
|
||||
|
||||
uk_census_2021_religion_totals <- uk_census_2021_religion %>% select(no_religion:no_response) # <1>
|
||||
uk_census_2021_religion_totals <- uk_census_2021_religion_totals %>%
|
||||
summarise(across(everything(), ~ sum(., na.rm = TRUE))) # <2>
|
||||
uk_census_2021_religion_totals <- gather(uk_census_2021_religion_totals) # <3>
|
||||
ggplot(uk_census_2021_religion_totals, aes(x= reorder(key,-value),value)) + geom_bar(stat ="identity") # <4>
|
||||
```
|
||||
|
||||
1. First, remove the column with region names and the totals for the regions as we want just integer data.
|
||||
2. Second calculate the totals. In this example we use the tidyverse library `dplyr()`, but you can also do this using base R with `colsums()` like this: `uk_census_2021_religion_totals <- colSums(uk_census_2021_religion_totals, na.rm = TRUE)`. The downside with base R is that you'll also need to convert the result into a dataframe for `ggplot` like this: `uk_census_2021_religion_totals <- as.data.frame(uk_census_2021_religion_totals)`
|
||||
3. In order to visualise this data using ggplot, we need to shift this data from wide to long format. This is a quick job using gather()
|
||||
4. Now plot it out and have a look!
|
||||
|
||||
You might have noticed that these two dataframes give us somewhat different results. But with data science, it's much more interesting to compare these two side-by-side in a visualisation. We can join these two dataframes and plot the bars side by side using `bind()` - which can be done by columns with cbind() and rows using rbind():
|
||||
|
||||
```{r}
|
||||
uk_census_2021_religion_merged <- rbind(uk_census_2021_religion_totals, uk_census_2021_religion_wmids)
|
||||
```
|
||||
|
||||
Do you notice there's going to be a problem here? How can we tell one set from the other? We need to add in something idenfiable first! This isn't too hard to do as we can simply create a new column for each with identifiable information before we bind them:
|
||||
|
||||
```{r}
|
||||
uk_census_2021_religion_totals$dataset <- c("totals")
|
||||
uk_census_2021_religion_wmids$dataset <- c("wmids")
|
||||
uk_census_2021_religion_merged <- rbind(uk_census_2021_religion_totals, uk_census_2021_religion_wmids)
|
||||
```
|
||||
|
||||
Now we're ready to plot out our data as a grouped barplot:
|
||||
|
||||
```{r}
|
||||
ggplot(uk_census_2021_religion_merged, aes(fill=dataset, x= reorder(key,-value), value)) + geom_bar(position="dodge", stat ="identity")
|
||||
```
|
||||
If you're looking closely, you will notice that I've added two elements to our previous ggplot. I've asked ggplot to fill in the columns with reference to the `dataset` column we've just created. Then I've also asked ggplot to alter the `position="dodge"` which places bars side by side rather than stacked on top of one another. You can give it a try without this instruction to see how this works. We will use stacked bars in a later chapter, so remember this feature.
|
||||
|
||||
If you inspect our chart, you can see that we're getting closer, but it's not really that helpful to compare the totals. What we need to do is get percentages that can be compared side by side. This is easy to do using another `dplyr` feature `mutate`:
|
||||
|
||||
```{r}
|
||||
uk_census_2021_religion_totals <- uk_census_2021_religion_totals %>%
|
||||
dplyr::mutate(perc = scales::percent(value / sum(value), accuracy = 0.1, trim = FALSE)) # <3>
|
||||
uk_census_2021_religion_wmids <- uk_census_2021_religion_wmids %>%
|
||||
dplyr::mutate(perc = scales::percent(value / sum(value), accuracy = 0.1, trim = FALSE)) # <3>
|
||||
uk_census_2021_religion_merged <- rbind(uk_census_2021_religion_totals, uk_census_2021_religion_wmids)
|
||||
ggplot(uk_census_2021_religion_merged, aes(fill=dataset, x=key, y=perc)) + geom_bar(position="dodge", stat ="identity")
|
||||
```
|
||||
Now you can see a very rough comparison
|
||||
|
||||
Add time series data for 2001 and 2011 census, change to grouped bar plot:
|
||||
|
||||
|
@ -104,30 +146,10 @@ https://r-graphics.org/recipe-bar-graph-grouped-bar#discussion-8
|
|||
Reference on callout box syntax here: https://quarto.org/docs/authoring/callouts.html
|
||||
-->
|
||||
|
||||
::: {.callout-tip}
|
||||
## What is Religion?
|
||||
Content tbd
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
::: {.callout-tip}
|
||||
## Hybrid Religious Identity
|
||||
Content tbd
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
::: {.callout-tip}
|
||||
## What is Secularisation?
|
||||
Content tbd
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
# References {.unnumbered}
|
||||
|
||||
|
|
|
@ -47,404 +47,62 @@ You'll find that many surveys will only use one of these forms of question and i
|
|||
::: {.callout-tip}
|
||||
## So *who's* religious?
|
||||
|
||||
Content tbd
|
||||
As I've already hinted in the previous chapter, measuring religiosity is complicated. I suspect some readers may be wondering something like, "what's the right question to ask?" here. Do we get the most accurate representation by asking people to self-report their religious affiliation? Or is it more accurate to ask individuals to report on how religious they are? Is it, perhaps, better to assume that the indirect query about practice, e.g. how frequently one attends services at a place of worship may be the most reliable proxy?
|
||||
|
||||
Highlight challenges of various approaches pointing to literature.
|
||||
|
||||
:::
|
||||
|
||||
Let's dive into the data and see how this all works out:
|
||||
|
||||
Let's dive into the data and see how this all works out. We'll start with the question 56 data, around religious affiliation:
|
||||
|
||||
```{r}
|
||||
|
||||
# Load some new libraries used by functions below
|
||||
|
||||
library(RColorBrewer)
|
||||
library(hrbrthemes) # Used for ipsum theme etc.
|
||||
library(ggeasy) # used for easy_center_title() which is not strictly necessary, but tidier than theme(plot.title = element_text(hjust = 0.5))
|
||||
|
||||
# Define colour palettes
|
||||
# TODO: confirm final colour scheme for charts and normalise across usage of different themes
|
||||
coul3 <- brewer.pal(3, "RdYlBu") # Using RdYlBu range to generate 3 colour palette: https://colorbrewer2.org/#type=diverging&scheme=RdYlBu&n=5
|
||||
coul4 <- brewer.pal(4, "RdYlBu")
|
||||
coul5 <- brewer.pal(5, "RdYlBu")
|
||||
coul6 <- brewer.pal(6, "RdYlBu")
|
||||
coul7 <- brewer.pal(7, "RdYlBu")
|
||||
coul4_reversed <- c("#2C7BB6", "#ABD9E9", "#FDAE61", "#D7191C")
|
||||
coul6_reversed <- c("#4575B4", "#91BFDB" , "#E0F3F8" , "#FEE090", "#FC8D59", "#D73027")
|
||||
white <- "#ffffff"
|
||||
purple <- "#590048"
|
||||
ochre <- "#B18839"
|
||||
ochre_12 <- wheel(ochre, num = 12)
|
||||
purple_12 <- wheel(purple, num = 12)
|
||||
|
||||
# Reusable Functions ------------------------------------------------------
|
||||
|
||||
# Importing code for colortools() now deprecated and removed from CRAN here. Some minor modifications to update code, but generally all credit here goes to Gaston Sanchez
|
||||
|
||||
setColors <- function(color, num) {
|
||||
# convert to RGB
|
||||
rgb_col = col2rgb(color)
|
||||
# convert to HSV
|
||||
hsv_col = rgb2hsv(rgb_col)[,1]
|
||||
# get degree
|
||||
hue = hsv_col[1]
|
||||
sat = hsv_col[2]
|
||||
val = hsv_col[3]
|
||||
cols = seq(hue, hue + 1, by=1/num)
|
||||
cols = cols[1:num]
|
||||
cols[cols > 1] <- cols[cols > 1] - 1
|
||||
# get colors with hsv
|
||||
colors = hsv(cols, sat, val)
|
||||
# transparency
|
||||
if (substr(color, 1, 1) == "#" && nchar(color) == 9)
|
||||
({
|
||||
alpha = substr(color, 8, 9)
|
||||
colors = paste(colors, alpha, sep="")
|
||||
})
|
||||
colors
|
||||
}
|
||||
|
||||
complementary <- function(color, plot=TRUE, bg="white", labcol=NULL, cex=0.8, title=TRUE) {
|
||||
tmp_cols = setColors(color, 12)
|
||||
comp_colors <- tmp_cols[c(1, 7)]
|
||||
|
||||
# plot
|
||||
if (plot)
|
||||
({
|
||||
# labels color
|
||||
if (is.null(labcol))
|
||||
({
|
||||
lab_col = rep("", 12)
|
||||
if (mean(col2rgb(bg)) > 127)
|
||||
({
|
||||
lab_col[c(1, 7)] <- "black"
|
||||
lab_col[c(2:6,8:12)] <- col2HSV(bg)
|
||||
}) else ({
|
||||
lab_col[c(1, 7)] <- "white"
|
||||
lab_col[c(2:6,8:12)] <- col2HSV(bg)
|
||||
})
|
||||
}) else ({
|
||||
lab_col = rep(labcol, 12)
|
||||
if (mean(col2rgb(bg)) > 127)
|
||||
({
|
||||
lab_col[c(1, 7)] <- labcol
|
||||
lab_col[c(2:6,8:12)] <- col2HSV(bg)
|
||||
}) else ({
|
||||
lab_col[c(1, 7)] <- labcol
|
||||
lab_col[c(2:6,8:12)] <- col2HSV(bg)
|
||||
})
|
||||
})
|
||||
# hide non-adjacent colors
|
||||
tmp_cols[c(2:6,8:12)] <- paste(substr(tmp_cols[c(2:6,8:12)],1,7), "0D", sep="")
|
||||
pizza(tmp_cols, labcol=lab_col, bg=bg, cex=cex)
|
||||
# title
|
||||
if (title)
|
||||
title(paste("Complementary (opposite) color of: ", tmp_cols[1]),
|
||||
col.main=lab_col[1], cex.main=0.8)
|
||||
})
|
||||
# result
|
||||
comp_colors
|
||||
}
|
||||
|
||||
sequential <- function(color, percentage=5, what="saturation", s=NULL, v=NULL, alpha=NULL, fun="linear", plot=TRUE, verbose=TRUE) {
|
||||
# convert to HSV
|
||||
col_hsv = rgb2hsv(col2rgb(color))[,1]
|
||||
# transparency
|
||||
if (is.null(alpha))
|
||||
alpha = 1
|
||||
if (substr(color, 1, 1) == "#" && nchar(color) == 9)
|
||||
alpha = substr(color, 8, 9)
|
||||
# get hue, saturation, and value
|
||||
hue = col_hsv[1]
|
||||
if (is.null(s)) s = col_hsv[2]
|
||||
if (is.null(v)) v = col_hsv[3]
|
||||
# sequence function
|
||||
getseq = switch(fun,
|
||||
linear = seq(0, 1, by=percentage/100),
|
||||
sqrt = sqrt(seq(0, 1, by=percentage/100)),
|
||||
log = log1p(seq(0, 1, by=percentage/100)),
|
||||
log10 = log10(seq(0, 1, by=percentage/100))
|
||||
)
|
||||
# what type of sequence?
|
||||
if (what == "saturation") ({
|
||||
sat = getseq
|
||||
fixed = paste("v=", round(v,2), " and alpha=", alpha, sep="")
|
||||
if (is.numeric(alpha))
|
||||
seq_col = hsv(hue, s=sat, v=v, alpha=alpha)
|
||||
if (is.character(alpha)) ({
|
||||
seq_col = hsv(hue, s=sat, v=v)
|
||||
seq_col = paste(seq_col, alpha, sep="")
|
||||
})
|
||||
})
|
||||
if (what == "value") ({
|
||||
val = getseq
|
||||
fixed = paste("s=", round(s,2), " and alpha=", alpha, sep="")
|
||||
if (is.numeric(alpha))
|
||||
seq_col = hsv(hue, s=s, v=val, alpha=alpha)
|
||||
if (is.character(alpha)) ({
|
||||
seq_col = hsv(hue, s=s, v=val)
|
||||
seq_col = paste(seq_col, alpha, sep="")
|
||||
})
|
||||
})
|
||||
if (what == "alpha") ({
|
||||
alpha = getseq
|
||||
fixed = paste("s=", round(s,2), " and v=", round(v,2), sep="")
|
||||
seq_col = hsv(hue, s=s, v=v, alpha=alpha)
|
||||
})
|
||||
# if plot TRUE
|
||||
if (plot)
|
||||
({
|
||||
n = length(seq(0, 1, by=percentage/100))
|
||||
fx = unlist(fixed)
|
||||
#dev.new()
|
||||
plot(0, 0, type="n", xlim=c(0,1), ylim=c(0,1), axes=FALSE, xlab="", ylab="")
|
||||
rect(0:(n-1)/n, 0, 1:n/n, 1, col=seq_col, border="lightgray")
|
||||
mtext(seq_col, side=1, at=0.5:(n)/n, cex=0.8, las=2)
|
||||
title(paste("Sequential colors based on ", what, "\n with fixed ", fx, sep=""),
|
||||
cex.main=0.9)
|
||||
})
|
||||
# result
|
||||
if (verbose)
|
||||
seq_col
|
||||
}
|
||||
|
||||
wheel <- function(color, num=12, bg="gray95", border=NULL, init.angle=105, cex=1, lty=NULL, main=NULL, verbose=TRUE, ...) {
|
||||
if (!is.numeric(num) || any(is.na(num) | num < 0))
|
||||
stop("\n'num' must be positive")
|
||||
x <- rep(1, num)
|
||||
x <- c(0, cumsum(x)/sum(x))
|
||||
dx <- diff(x)
|
||||
nx <- length(dx)
|
||||
# set colors
|
||||
col = setColors(color, num)
|
||||
labels = col
|
||||
# labels color
|
||||
labcol = ifelse( mean(col2rgb(bg)) > 127, "black", "white")
|
||||
# prepare plot window
|
||||
par(bg = bg)
|
||||
plot.new()
|
||||
pin <- par("pin")
|
||||
xlim <- ylim <- c(-1, 1)
|
||||
if (pin[1L] > pin[2L])
|
||||
xlim <- (pin[1L]/pin[2L]) * xlim
|
||||
else ylim <- (pin[2L]/pin[1L]) * ylim
|
||||
dev.hold()
|
||||
on.exit(dev.flush())
|
||||
plot.window(xlim, ylim, "", asp = 1)
|
||||
# get ready to plot
|
||||
if (is.null(border[1])) ({
|
||||
border <- rep(bg, length.out = nx)
|
||||
}) else ({
|
||||
border <- rep(border, length.out = nx)
|
||||
})
|
||||
if (!is.null(lty))
|
||||
lty <- rep(NULL, length.out = nx)
|
||||
angle <- rep(45, length.out = nx)
|
||||
radius = seq(1, 0, by=-1/num)[1:num]
|
||||
twopi <- -2 * pi
|
||||
t2xy <- function(t, rad) ({
|
||||
t2p <- twopi * t + init.angle * pi/180
|
||||
list(x = rad * cos(t2p), y = rad * sin(t2p))
|
||||
})
|
||||
# plot colored segments
|
||||
for (i in 1L:nx)
|
||||
({
|
||||
n <- max(2, floor(200 * dx[i]))
|
||||
P <- t2xy(seq.int(x[i], x[i + 1], length.out = n), rad=radius[1])
|
||||
polygon(c(P$x, 0), c(P$y, 0), angle = angle[i],
|
||||
border = border[i], col = col[i], lty = lty[i])
|
||||
P <- t2xy(mean(x[i + 0:1]), rad=radius[1])
|
||||
lab <- labels[i]
|
||||
if (!is.na(lab) && nzchar(lab)) ({
|
||||
adjs = 0.5
|
||||
if (P$x > 1e-08) adjs <- 0
|
||||
if (P$x < -1e-08) adjs <- 1
|
||||
lines(c(1, 1.05) * P$x, c(1, 1.05) * P$y)
|
||||
text(1.1 * P$x, 1.1 * P$y, labels[i], xpd = TRUE,
|
||||
adj = adjs, cex=cex, col=labcol, ...)
|
||||
})
|
||||
})
|
||||
# add title
|
||||
title(main = main, ...)
|
||||
# return color names
|
||||
if (verbose)
|
||||
col
|
||||
}
|
||||
|
||||
# function to produce horizontal bar chart, colours drawn from "ochre" colour wheel defined above to match report
|
||||
plot_horizontal_bar <- function(x) {
|
||||
## code if a specific palette is needed for matching
|
||||
fill = wheel(ochre, num = as.integer(count(x[1])))
|
||||
#fill = scale_fill_brewer()
|
||||
# make plot
|
||||
ggplot(x, aes(x = n, y = response, fill = fill)) +
|
||||
geom_col(colour = "white") +
|
||||
## add percentage labels
|
||||
geom_text(aes(label = perc),
|
||||
## make labels left-aligned and white
|
||||
hjust = 1, nudge_x = -.5, colour = "black", size=3) +
|
||||
## reduce spacing between labels and bars
|
||||
scale_fill_identity(guide = "none") +
|
||||
## get rid of all elements except y axis labels + adjust plot margin
|
||||
theme_ipsum_rc() +
|
||||
theme(plot.margin = margin(rep(15, 4))) +
|
||||
easy_center_title()
|
||||
}
|
||||
|
||||
qualtrics_process_single_multiple_choice <- function(x) {
|
||||
# create separate data frame
|
||||
df <- as.data.frame(x)
|
||||
# make column names coherent and simplified
|
||||
names(df) <- c("response")
|
||||
# filter out NA values
|
||||
df <- filter(df, !is.na(response))
|
||||
# generate new dataframe with sums per category and sort in descending order
|
||||
sums <- df %>%
|
||||
dplyr::count(response, sort = TRUE) %>%
|
||||
dplyr::mutate(
|
||||
response = forcats::fct_rev(forcats::fct_inorder(response))
|
||||
)
|
||||
# add new column with percentages for each sum
|
||||
sums <- sums %>%
|
||||
dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = 1, trim = FALSE))
|
||||
}
|
||||
|
||||
qualtrics_process_single_multiple_choice_unsorted_streamlined <- function(x) {
|
||||
# create separate data frame
|
||||
df <- as.data.frame(as_factor(x))
|
||||
# make column names coherent and simplified
|
||||
names(df) <- c("response")
|
||||
# filter out NA values
|
||||
df <- filter(df, !is.na(response))
|
||||
# generate new dataframe with sums per category and sort in descending order
|
||||
sums <- df %>%
|
||||
dplyr::count(response, sort = FALSE)
|
||||
# add new column with percentages for each sum
|
||||
sums <- sums %>%
|
||||
dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = 1, trim = FALSE))
|
||||
}
|
||||
|
||||
qualtrics_process_single_multiple_choice_basic <- function(x) {
|
||||
# create separate data frame
|
||||
df <- as_factor(x)
|
||||
# make column names coherent and simplified
|
||||
names(df) <- c("response")
|
||||
# filter out NA values
|
||||
df <- filter(df, !is.na(response))
|
||||
# generate new dataframe with sums per category and sort in descending order
|
||||
sums <- df %>%
|
||||
dplyr::count(response, sort = FALSE)
|
||||
# add new column with percentages for each sum
|
||||
sums <- sums %>%
|
||||
dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = 1, trim = FALSE))
|
||||
}
|
||||
|
||||
qualtrics_process_single_multiple_choice_unsorted <- function(x) {
|
||||
# create separate data frame
|
||||
df <- as.data.frame(x)
|
||||
# make column names coherent and simplified
|
||||
names(df) <- c("response")
|
||||
# filter out NA values
|
||||
df <- filter(df, !is.na(response))
|
||||
# generate new dataframe with sums per category and sort in descending order
|
||||
sums <- df %>%
|
||||
dplyr::count(response, sort = FALSE) %>%
|
||||
dplyr::mutate(
|
||||
response = forcats::fct_rev(forcats::fct_inorder(response))
|
||||
)
|
||||
# add new column with percentages for each sum
|
||||
sums <- sums %>%
|
||||
dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = 1, trim = FALSE))
|
||||
}
|
||||
|
||||
# function to produce a summary table of results for a single column using flextable
|
||||
|
||||
chart_single_result_flextable <- function(.data, var) {
|
||||
table <- table(.data)
|
||||
# add calculations and convert to a flextable object
|
||||
table %>%
|
||||
prop.table %>% # turn this into a table of proportions
|
||||
# flextable requires a dataframe
|
||||
as.data.frame() %>%
|
||||
set_names(c("Variable", "Count")) %>%
|
||||
# arrange in descending order
|
||||
arrange({{ var }}) %>%
|
||||
# convert table object to a flextable()
|
||||
flextable(defaults = TRUE) %>%
|
||||
# adjust column widths automatically to fit widest values
|
||||
style(part = 'body', pr_t=fp_text(font.family='Roboto')) %>%
|
||||
style(part = 'header', pr_t=fp_text(font.family='Roboto')) %>%
|
||||
# note, likert also uses set_caption() so need to specify flextable:: here
|
||||
flextable::set_caption(caption, style = "Table Caption", autonum = run_autonum(seq_id = "tab", bkm = "figures", bkm_all = TRUE)) %>%
|
||||
autofit() %>%
|
||||
theme_vanilla() %>%
|
||||
# format numbers in count column as rounded percentages
|
||||
set_formatter( table, Count = function(x) sprintf( "%.1f%%", x*100 ))
|
||||
}
|
||||
|
||||
chart_single_result_flextable_unsorted <- function(.data, var) {
|
||||
table <- table(.data)
|
||||
# add calculations and convert to a flextable object
|
||||
table %>%
|
||||
prop.table %>% # turn this into a table of proportions
|
||||
# flextable requires a dataframe
|
||||
as.data.frame() %>%
|
||||
set_names(c("Variable", "Count")) %>%
|
||||
# convert table object to a flextable()
|
||||
flextable(defaults = TRUE) %>%
|
||||
# adjust column widths automatically to fit widest values
|
||||
style(part = 'body', pr_t=fp_text(font.family='Roboto')) %>%
|
||||
style(part = 'header', pr_t=fp_text(font.family='Roboto')) %>%
|
||||
# note, likert also uses set_caption() so need to specify flextable:: here
|
||||
flextable::set_caption(caption, style = "Table Caption", autonum = run_autonum(seq_id = "tab", bkm = "figures", bkm_all = TRUE)) %>%
|
||||
autofit() %>%
|
||||
theme_vanilla() %>%
|
||||
# format numbers in count column as rounded percentages
|
||||
set_formatter( table, Count = function(x) sprintf( "%.1f%%", x*100 ))
|
||||
}
|
||||
religious_affiliation <- as_tibble(as_factor(climate_experience_data$Q56)) # <1>
|
||||
names(religious_affiliation) <- c("response") # <2>
|
||||
religious_affiliation <- filter(religious_affiliation, !is.na(response)) # <3>
|
||||
```
|
||||
|
||||
There are few things we need to do here to get the data into initial proper shape. This might be called "cleaning" the data:
|
||||
|
||||
1. Because we imported this data from an SPSS `.sav` file format using the R `haven()` library, we need to start by adapting the data into a format that our visualation engine ggplot can handle (a dataframe).
|
||||
2. Next we'll rename the columns so these names are a bit more useful.
|
||||
3. We need to omit non-responses so these don't mess with the counting (these are `NA` in R)
|
||||
|
||||
If we pause at this point to view the data, you'll see it's basically just a long list of survey responses. What we need is a count of each unique response (or `factor`). This will take a few more steps:
|
||||
|
||||
```{r}
|
||||
|
||||
# religious_affiliation
|
||||
# migrate haven data into separate data frame
|
||||
religious_affiliation <- as_tibble(as_factor(climate_experience_data$Q56))
|
||||
# make column names coherent and simplified
|
||||
names(religious_affiliation) <- c("response")
|
||||
# filter out NA values
|
||||
religious_affiliation <- filter(religious_affiliation, !is.na(response))
|
||||
# generate new dataframe with sums per category and sort in descending order
|
||||
religious_affiliation_sums <- religious_affiliation %>%
|
||||
dplyr::count(response, sort = TRUE) %>%
|
||||
dplyr::mutate(response = forcats::fct_rev(forcats::fct_inorder(response)))
|
||||
# add new column with percentages for each sum
|
||||
religious_affiliation_sums <- religious_affiliation %>%
|
||||
dplyr::count(response, sort = TRUE) %>% # <1>
|
||||
dplyr::mutate(response = forcats::fct_rev(forcats::fct_inorder(response))) # <2>
|
||||
religious_affiliation_sums <- religious_affiliation_sums %>%
|
||||
dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = 1, trim = FALSE))
|
||||
dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = .1, trim = FALSE)) # <3>
|
||||
```
|
||||
|
||||
1. First we generate new a dataframe with sums per category and
|
||||
2. ...sort in descending order
|
||||
3. Then we add new column with percentages based on the sums you've just generated
|
||||
|
||||
That should give us a tidy table of results, which you can see if you view the contents of our new `religious_affiliation_sums` dataframe:
|
||||
|
||||
```{r}
|
||||
head(religious_affiliation_sums)
|
||||
```
|
||||
|
||||
|
||||
# TODO: use mutate to put "prefer not to say" at the bottom
|
||||
# Info here: https://r4ds.had.co.nz/factors.html#modifying-factor-levels
|
||||
|
||||
caption <- "Religious Affiliation"
|
||||
|
||||
```{r}
|
||||
# make plot
|
||||
ggplot(religious_affiliation_sums, aes(x = n, y = response)) +
|
||||
geom_col(colour = "white") +
|
||||
## add percentage labels
|
||||
geom_text(aes(label = perc),
|
||||
## make labels left-aligned and white
|
||||
hjust = 1, nudge_x = -.5, colour = "black", size=3)
|
||||
|
||||
religious_affiliation_plot <- religious_affiliation_plot + labs(caption = caption, x = "", y = "")
|
||||
religious_affiliation_plot
|
||||
ggsave("figures/q56_religious_affiliation.png", width = 20, height = 10, units = "cm")
|
||||
hjust = 1, nudge_x = -.5, colour = "white", size=3)
|
||||
```
|
||||
|
||||
Now let's make a table
|
||||
|
||||
```{r}
|
||||
religious_affiliation_table <- chart_single_result_flextable(climate_experience_data$Q56, Variable)
|
||||
religious_affiliation_table
|
||||
save_as_docx(religious_affiliation_table, path = "./figures/q56_religious_affiliation.docx")
|
||||
|
||||
Add colours
|
||||
Use mutate to put "prefer not to say" at the bottom
|
||||
# Info here: https://r4ds.had.co.nz/factors.html#modifying-factor-levels
|
||||
|
||||
|
||||
# Q56 follow-ups
|
||||
|
@ -559,10 +217,16 @@ df %>%
|
|||
guides(fill = guide_legend(title = NULL))
|
||||
ggsave("figures/q59_faceted.png", width = 30, height = 10, units = "cm")
|
||||
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Comparing with attitudes surrounding climate change
|
||||
|
||||
```{r}
|
||||
|
||||
# Q6
|
||||
|
||||
q6_data <- qualtrics_process_single_multiple_choice_unsorted_streamlined(climate_experience_data$Q6)
|
||||
|
@ -597,11 +261,17 @@ q6_data_plot <- ggplot(q6_data, aes(x = n, y = response, fill = fill)) +
|
|||
q6_data_plot
|
||||
|
||||
ggsave("figures/q6.png", width = 18, height = 12, units = "cm")
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Subsetting
|
||||
|
||||
```{r}
|
||||
## Q57 subsetting based on Religiosity --------------------------------------------------------------
|
||||
climate_experience_data <- climate_experience_data %>%
|
||||
mutate(
|
||||
|
@ -641,9 +311,29 @@ climate_experience_data <- climate_experience_data %>%
|
|||
TRUE ~ "medium"
|
||||
) %>% factor(levels = c("low", "medium", "high"))
|
||||
)
|
||||
```
|
||||
|
||||
|
||||
::: {.callout-tip}
|
||||
## What is Religion?
|
||||
Content tbd
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
::: {.callout-tip}
|
||||
## Hybrid Religious Identity
|
||||
Content tbd
|
||||
:::
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
::: {.callout-tip}
|
||||
## What is Secularisation?
|
||||
Content tbd
|
||||
:::
|
||||
|
||||
# References {.unnumbered}
|
||||
|
||||
|
|
|
@ -4,7 +4,8 @@ Guides to geographies:
|
|||
https://rconsortium.github.io/censusguide/
|
||||
https://ocsi.uk/2019/03/18/lsoas-leps-and-lookups-a-beginners-guide-to-statistical-geographies/
|
||||
|
||||
|
||||
Extact places of worship from Ordnance survey open data set
|
||||
Calculate proximity to pubs
|
||||
|
||||
# References {.unnumbered}
|
||||
|
||||
|
|
Loading…
Reference in a new issue