# Getting into the nitty-gritty details

In this chapter, we'll explore the diverse variety of ways you can frame collecting data around religion. Before we dive into that all, however, you might be wondering, why does it all really matter? Can't we just use the census data and assume that's a reasonably accurate approximation? I'll explore the importance of getting the framing right, or better yet, working with data that seeks to unpack religious belonging, identity, and beliefs (or unbelief) in a variety of ways, but an example might serve to explain why this is important.

The 2016 presidential election result in the USA came as a surprise to many data analysts and pollsters. As the dust settled, a number of analysis scrambled to make sense of things and identify some hidden factor that might have tipped the balance away from the expected winner Hilary Clinton. One of the most widely circulated data points was the role of white evangelical Christians in supporting Trump. Exit polls reported that 81% of this constituency voted for Trump and many major media outlets reported this figure prominently, with public commentary from many religious leaders on the meaning this figure had the social direction of evangelical Christianity.

Far too few observers paused to ask what those exit polls were measuring and a closer look at that information reveals some interesting nuances. There is only a single firm that runs exit polling in the USA, Edison Research, who is contracted to do this work by a consortium of major media news outlets ("the National Election Pool"), which represents ABC News, Associated Press, CBS News, CNN, Fox News, and NBC News. It's not a process driven by slow, nuanced, scholarly researchers strapped for funding, it's a rapid high-stakes data collection exercise meant to provide data which can feed into the election week news cycle. The poll doesn't ask respondents simply if they are "evangelical" it uses a broader proxy question to do this: "Would you describe yourself as a born-again or evangelical Christian?" This term "born-again" can be a useful proxy, but it can also prove misleading. When asked if they are "born again" people who identify with a number of non-Christian religions, and people who might describe themselves as non-religious will also often answer "yes". This is particularly salient given the 2016 exit survey asked this question before asking specifically what a person's religion was, so as Pew Research reported, "everyone who takes the exit poll (including religious “nones” and adherents of non-Christian faiths) has the opportunity to identify as a born-again or evangelical Christian." 

While the "born-again" Christian category tends to correlate to high levels of attendance at worship services, in this case some researchers found that white protestant Christian voters for Trump tended to have low levels of participation in activities. We don't have access to the underlying data, and ultimately the exit polling was quite limited in scope (in some instances respondents weren't even asked about religion), so we'll never really have a proper understanding of what happened demographically in that election. But it's an interesting puzzle to consider how different ways to record participation in religion might fit together, or even be in tension with one another. For this chapter, we're going to take a look at another dataset which gives us exactly this kind of opportunity, to see how different kinds of measurement might reinforce or relate with one another.


# Survey Data: Spotlight Project

In the last chapter we explored some high level data about religion in the UK. This was a census sample, which usually refers to an attempt to get as comprehensive a sample as possible. But this is actually fairly unusual in practice. Depending on how complex a subject is and how representative we want our data to be, it's much more common to use selective sampling, that is survey responses at n=100 or n=1000 at a maximum. The advantage of a census sample is that you can explore how a wide range of other factors - particularly demographics - intersect with your question. And this can be really valuable in the study of religion, particularly as you will see as we go along that responses to some questions are more strongly correlated to things like economic status or educational attainment than they are to religious affiliation. It can be hard to tell if this is the case unless you have enough of a sample to break down into a number of different kinds of subsets. But census samples are complex and expensive to gather, so they're quite rare in practice. 

For this chapter, I'm going to walk you through a data set that a colleague (Charles Ogunbode) and I collected in 2021. Another problem with smaller, more selective samples is that researchers can often undersample minoritised ethnic groups. This is particularly the case with climate change research. Until the time we conducted this research, there had not been a single study investigating the specific experiences of people of colour in relation to climate change in the UK. Past researchers had been content to work with large samples, and assumed that if they had done 1000 surveys and 50 of these were completed by people of colour, they could "tick" the box. But 5% is actually well below levels of representation in the UK generally, and even more sharply the case for specific communities and regions in the UK. And if we bear in mind that non-white respondents are (of course!) a highly heterogenous group, we're even more behind in terms of collecting data that can improve our knowledge. Up until recently researchers just haven't been paying close enough attention to catch the significant neglect of the empirical field that this represents.

While I've framed my comments above in terms of climate change research, it is also the case that, especially in diverse societies like the USA, Canada, the UK etc., paying attention to non-majority groups and people and communities of colour automatically draws in a strongly religious sample. This is highlighted in one recent study done in the UK, the "[Black British Voices Report](https://www.cam.ac.uk/stories/black-british-voices-report)" in which the researchers observed that "84% of respondents described themselves as religious and/or spiritual". My comments above in terms of controlling for other factors remains important here - these same researchers also note that "despire their significant important to the lives of Black Britons, only 7% of survey respondents reported that their religion was more defining of their identity than their race".

We've decided to open up access to our data and I'm highlighting it in this book because it's a unique opportunitiy to explore a dataset that emphasises diversity from the start, and by extension, provides some really interesting ways to use data science techniques to explore religion in the UK.

## Loading in some data

```{r, results = 'hide'}
# R Setup -----------------------------------------------------------------
setwd("/Users/kidwellj/gits/hacking_religion_textbook/hacking_religion")
library(here)  |> suppressPackageStartupMessages()
library(tidyverse)  |> suppressPackageStartupMessages()
# used for importing SPSS .sav files
library(haven)   |> suppressPackageStartupMessages()
here::i_am("chapter_2.qmd")
climate_experience_data <- read_sav(here("example_data", "climate_experience_data.sav"))
```

The first thing to note here is that we've drawn in a different type of data file, this time from an `.sav` file, which is usually produced by the statistics software package SPSS. This uses a different R Library (I use `haven` for this). The upside is that in some cases where you have survey data with both a code like "very much agree" which corresponds to a value (like "1") this package will preserve both in the R dataframe that is created. Now that you've loaded in data, you have a new R dataframe called "climate_experience_data" with a lot of columns with just under 1000 survey responses.

## How can you ask about religion?

One of the challenges we faced when running this study is how to gather responsible data from surveys regarding religious identity. We'll dive into this in depth as we do analysis and look at some of the agreements and conflicts in terms of respondent attribution. Just to set the stage, we used the following kinds of question to ask about religion and spirituality:


### "What is your religion?"

The first, and perhaps most obvious question (Question 56 in the dataset) asks respondents simply, "What is your religion?" and then provides a range of possible answers. We included follow-up questions regarding denomination for respondents who indicated they were "Christian" or "Muslim". For respondents who ticked "Christian" we asked, "What is your denomination?" and for respondents who ticked "Muslim" we asked "Which of the following would you identify with?" and then left a range of possible options which could be ticked such as "Sunni," "Shia," "Sufi" etc.

This is one way of measuring religion, that is, to ask a person if they consider themselves formally affiliated with a particular group. This kind of question has some (serious) limitations, but we'll get to that in a moment.


### "How religious would you say you are?"

We also asked respondents (Q57): "Regardless of whether you belong to a particular religion, how religious would you say you are?" and then provided a sliding scale from 0 (not religious at all) to 10 (very religious). Seen in this way, we had a tradition-neutral measurement of religious intensity.


### Participation in Worship

We included some classic indicators about how often respondents go to worship (Q58): "Apart from weddings, funerals and other special occasions, how often do you attend religious services?" and (Q59): "Apart from when you are at religious services, how often do you pray?"

- More than once a week  (1) 
- Once a week  (2) 
- At least once a month  (3) 
- Only on special holy days  (4) 
- Never  (5)

Each of these measures a particular kind of dimension, and it is interesting to note that sometimes there are stronger correlations between how often a person attends worship services (weekly versus once a year) and a particular view (in the case of our survey on environmental issues), than there is between their affiliation (if they are Christian or Pagan). We'll do some exploratory work shortly to see how this is the case in our sample. 

### Spirituality

We also included a series of questions about spirituality in Q52 and used a slightly overlapping nature relatedness scale Q51. 

You'll find that many surveys will only use one of these forms of question and ignore the rest. I think this is a really bad idea as religious belonging, identity, and spirituality are far too complex to work off a single form of response. We can also test out how these different attributions relate to other demographic features, like interest in politics, economic attainment, etc.

::: {.callout-tip}
### So *Who's* Religious?

As I've already hinted in the previous chapter, measuring religiosity is complicated. I suspect some readers may be wondering something like, "what's the right question to ask?" here. Do we get the most accurate representation by asking people to self-report their religious affiliation? Or is it more accurate to ask individuals to report on how religious they are? Is it, perhaps, better to assume that the indirect query about practice, e.g. how frequently one attends services at a place of worship may be the most reliable proxy?

Highlight challenges of various approaches pointing to literature. 

:::

## Exploring data around religious affiliation:

Let's dive into the data and see how this all works out. We'll start with the question 56 data, around religious affiliation:

```{r}
religious_affiliation <- as_tibble(as_factor(climate_experience_data$Q56))        # <1>
names(religious_affiliation) <- c("response")  # <2>
religious_affiliation <- filter(religious_affiliation, !is.na(response)) # <3>
```

There are few things we need to do here to get the data into initial proper shape. This is often referred to as "cleaning" the data:

1. Because we imported this data from an SPSS `.sav` file format using the R `haven()` library, we need to start by adapting the data into a format that our visualation engine `ggplot` can handle (a dataframe).
2. Next we'll rename the columns so these names are a bit more useful.
3. We need to omit non-responses so these don't mess with the counting (these are `NA` in R)

As in the previous chapter, I've provided sample data here that needs a bit of work. This gives you a chance to see what this all looks like in practice, and offers some examples you can apply later to your own datasets.

If we pause at this point to view the data, you'll see it's basically just a long list of survey responses. What we need is a count of each unique response (these are sometimes called a `factor`). This will take a few more steps:

```{r}
religious_affiliation_sums <- religious_affiliation %>%  
  dplyr::count(response, sort = TRUE) %>% # <1>
  dplyr::mutate(response = forcats::fct_rev(forcats::fct_inorder(response)))    # <2>
religious_affiliation_sums <- religious_affiliation_sums %>% 
  dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = .1, trim = FALSE)) # <3>
```

1. First we generate new a dataframe with sums per category and 
2. ...sort in descending order
3. Then we add new column with percentages based on the sums you've just generated

That should give us a tidy table of results, which you can see if you view the contents of our new `religious_affiliation_sums` dataframe:

```{r}
head(religious_affiliation_sums)
```

We can view this as a bar chart using `ggplot` in ways that are similar to the exercises in the last chapter:

```{r}
# make plot
ggplot(religious_affiliation_sums, aes(x = n, y = response)) +
  geom_col(colour = "white") + 
  ## add percentage labels
  geom_text(aes(label = perc),
            ## make labels left-aligned and white
            hjust = 1, nudge_x = -.5, colour = "white", size=3)
```
You may notice that I've added one feature to our chart that wasn't in the bar charts in chapter 1, text labels with the actual value on each bar using `geom_text`.

You may be thinking about the plots we've just finished in chapter 1 and wondering how they compare. Let's use the same facet approach that we've just used to render this data in a subsetted way.

```{r}
# First we need to add in data on ethnic self-identification from our respondents:
df <- select(climate_experience_data, Q56, Q0)
religious_affiliation_ethnicity <- as_tibble(as_factor(df))
names(religious_affiliation_ethnicity) <- c("Religion", "Ethnicity")

religious_affiliation_ethnicity_sums <- religious_affiliation_ethnicity %>%  
  group_by(Ethnicity) %>%
  dplyr::count(Religion, sort = TRUE) %>% # <1>
  dplyr::mutate(Religion = forcats::fct_rev(forcats::fct_inorder(Religion)))

religious_affiliation_ethnicity_plot <- ggplot(religious_affiliation_ethnicity_sums, aes(x = n, y = Religion)) + geom_col(colour = "white") + facet_wrap(~Ethnicity, scales = "free_x", labeller = label_wrap_gen(width = 24)) + theme(strip.text.x = element_text(size = 8)) + theme(strip.text.y = element_text(size = 6))

religious_affiliation_ethnicity_plot

ggsave("figures/spotlight_religious_affiliation_ethnicity.png", plot=religious_affiliation_ethnicity_plot, width = 8, height = 10, units=c("in"))
```
You'll notice that I've tweaked the display of facet titles a bit here so that the text wraps using `labeller = label_wrap_gen(width = 24)`, since there are a lot of facets here, which are all interesting, I've also reduced the size of text for x- and y- axes using `theme(strip.text.x = element_text()`.


## Working With a Continum: Religiosity and Spirituality

So far we've just worked with bar plots, but there are a lot of other possible visualisations and types of data which demand them.

As I've mentioned above, on this survey we also asked respondents to tell us on by rating themselves on a scale of 0-10 with 0 being "not religious at all" and 10 being "very religious" in response to the question, "Regardless of whether you belong to a particular religion, how religious would you say you are?"

We'll recycle some code from our previous import to bring in the Q57 data:

```{r}
religiosity <- as_tibble(as_factor(climate_experience_data$Q57_1))
names(religiosity) <- c("response")
religiosity <- filter(religiosity, !is.na(response))
religiosity_sums <- religiosity %>%  
  dplyr::count(response) %>% # <1>
  dplyr::mutate(response = forcats::fct_rev(forcats::fct_inorder(response)))
religiosity_sums <- religiosity_sums %>% 
  dplyr::mutate(perc = scales::percent(n / sum(n), accuracy = .1, trim = FALSE))
```

1. Note: we have removed `sort = TRUE` in the above statement as it will enforce sorting the data by quantities rather than the factor order. It wouldn't really make sense to plot this chart in the order of response.

Now, let's plot that data:
```{r}
caption <- "Respondent Religiosity"
ggplot(religiosity_sums, aes(x = response, y = n, color=response)) +
  geom_col(colour = "white", aes(fill = response)) +   # <1>
  ## get rid of all elements except y axis labels + adjust plot margin
  coord_flip() +  # <2>
  theme(plot.margin = margin(rep(15, 4))) +
  labs(caption = caption)
```

1. We've added colors, because colours are fun.
2. Also new here is `coord_flip` to rotate the chart so we have bars going horizontally

### Quick excursus: making things pretty with themes

Since we're thinking about how things look just now, let's play with themes for a minute. `ggplot` is a really powerful tool for visualising information, but it also has some quite nice features for making things look pretty. 

[If you'd like to take a proper deep dive on all this theme stuff, R-Charts has a great set of examples showing you how a number of different theme packages look in practice, ["R-Charts on  Themes"](https://r-charts.com/ggplot2/themes/).]{.aside}

R has a number of built-in themes, but these are mostly driven by functional concerns, such as whether you might want to print your chart or have a less heavy look overall. So for example you might use `theme_light()` in the following way:

```{r}
ggplot(religiosity_sums, aes(x = response, y = n, color=response)) +
  geom_col(colour = "white", aes(fill = response)) +   # <1>
  ## get rid of all elements except y axis labels + adjust plot margin
  coord_flip() +  # <1>
  theme(plot.margin = margin(rep(15, 4))) +
  labs(caption = caption) +
  theme_light()
```

You can also use additional packages like `ggthemes()` or `hrbrthemes()` so for example we might want to try the `pander` theme which has it's own special (and very cheerful) colour palette. 

```{r}
library(ggthemes)  |> suppressPackageStartupMessages()
ggplot(religiosity_sums, aes(x = response, y = n, color=response)) + geom_col(colour = "white", aes(fill = response)) + coord_flip() + theme(plot.margin = margin(rep(15, 4))) + labs(caption = caption) + 
  theme_pander() +
  scale_fill_pander()
```

Or, you might try the well-crafted typgraphy from `hbrthemes` in the `theme_ipsum_pub` theme:

Note: this library will expect your system to have certain fonts installed and available for RStudio to use. You may want to run the following command to import your system fonts to R so they are available: `extrafont::font_import()`. This will take a bit of time to run. The package will also save fonts to a folder on your PC so you can install them if you don't already have them, you can run `import_public_sans()` to get the path for these files and install in R.

```{r}
library(hrbrthemes)  |> suppressPackageStartupMessages()
ggplot(religiosity_sums, aes(x = response, y = n, color=response)) + geom_col(colour = "white", aes(fill = response)) + coord_flip() + theme(plot.margin = margin(rep(15, 4))) + labs(caption = caption) + 
  theme_ipsum_pub() +
  scale_fill_pander()
```
We're going to come back to this chart, but let's set it to one side for a moment and build up a visualisation of an adjacent measure we used in this study which focussed on spirituality.

::: {.callout-tip}
### What is the difference between Spirituality and Religion?

Though the terms can tend to be used interchangeable in many cases, some scholars in religious studies and psychology have sought to develop the concept (and measurement of) spirituality as a counterpoint to religion. In some cases, scholars argue that religion is extrinsic (something outside us that we participate in) and spirituality is intrinsic (something inside ourselves that we engage with). Another way of contrasting the two concepts is to suggest that religion is social whereas spirituality is personal. As Hodge puts it, “spirituality refers to an individual’s relationship with God (or perceived Transcendence), while religion is defined as a particular set of beliefs, practices, and rituals that have been developed in community by people who share similar exis- tential experiences of transcendent reality.” Of course, as you'll have noticed, there are many people who think of themselves as religious, but are opposed to participation in a formal religious tradition, or a social institution like a church, mosque, or denomination. So these differentiations can't be sharply made in a conclusive way. And it's likely that many respondents will have their own way to relate to these terms, whether it is affection or aversion.

:::

For our study, we made use of a six-item intrinsic spirituality scale that was developed by David R. Hodge which is based on another instrument intended to measure "intrinsic religion" by Allport and Ross (1967). These researchers developed a series of questions which they asked respondents in a survey. The advantage here is that you're getting at the question of spirituality from a lot of different angles and then you combine the scores from all the questions to get a mean "spirituality score". There are many other ways that psychologists have developed to measure intrinsic religion or spirituality, and we'd encourage you to try them out (there are some references to get you started in Appendix B).

```{r}
### Spirituality scale  --------------------------------------------------------------
# Calculate overall mean spirituality score based on six questions:
climate_experience_data$Q52_score <- rowMeans(select(climate_experience_data, Q52a_1:Q52f_1))
# Calculate overall mean nature relatedness score based on six questions:
climate_experience_data$Q51_score <- rowMeans(select(climate_experience_data, Q51_heritage:Q51_remote_vacation))

```


Like we did in chapter 1, let's start by exploring the data and get a bit of a sense of the character of the responses overall. One good place to start is to find out the mean response for our two continum questions. We can start with religiosity:

```{r}
mean(climate_experience_data$Q57_1)
```
Now let's compare this with the overall mean score for our whole survey pool around spirituality:

```{r}
mean(climate_experience_data$Q52_score)
```


```{r}
# t_testing and means

# Calculate the Pearson correlation coefficient, 
# Positive correlation: If r is close to +1, it indicates a strong positive linear relationship.
# Negative correlation: If r is close to -1, it indicates a strong negative linear relationship.
# No correlation: If r is close to 0, there is no linear relationship.
# The closer the value of r is to -1 or +1, the stronger the correlation.

# Religious intensity to happiness - minimal positive
cor(climate_experience_data$Q57_1, climate_experience_data$Q49)
# Religious intensity to life satisfaction - minimal positive
cor(climate_experience_data$Q57_1, climate_experience_data$Q50)
# Religious intensity to spirituality - strong positive
cor(climate_experience_data$Q57_1, climate_experience_data$Q52_score)
# Religious intensity to interest in nature relatedness/spirituality - minimal positive
cor(climate_experience_data$Q57_1, climate_experience_data$Q51_spirituality)
# Religious intensity to politics - strong positive
cor(climate_experience_data$Q57_1, climate_experience_data$Q54)
# Religious intensity to participation in services - strong positive (because reverse in scales)
cor(climate_experience_data$Q57_1, climate_experience_data$Q58)
# Religious intensity to participation in activity - even stronger positive (because reverse in scales)
cor(climate_experience_data$Q57_1, climate_experience_data$Q59)

cor.test(climate_experience_data$Q57_1, climate_experience_data$Q59)
p_value <- result$p.value
# Format the p-value without scientific notation
format(p_value, scientific = FALSE)

# Religious intensity to happiness - minimal positive
cor(climate_experience_data$Q52_score, climate_experience_data$Q49)
# Religious intensity to life satisfaction - minimal positive
cor(climate_experience_data$Q52_score, climate_experience_data$Q50)
# Religious intensity to spirituality - strong positive
cor(climate_experience_data$Q52_score, climate_experience_data$Q57_1)
# Religious intensity to interest in politics - very minimal positive
cor(climate_experience_data$Q52_score, climate_experience_data$Q51_spirituality)
# Religious intensity to nature relatedness - strong positive
cor(climate_experience_data$Q52_score, climate_experience_data$Q54)
# Religious intensity to participation in services - strong positive (because reverse in scales)
cor(climate_experience_data$Q52_score, climate_experience_data$Q58)
# Religious intensity to participation in activity - even stronger positive (because reverse in scales)
cor(climate_experience_data$Q57_1, climate_experience_data$Q59)


religiosity - Q57_1
spirituality - Q52_score
nature relatedness - Q51 
attendance at worship - Q58
prayer - Q59 - from never = 5 to lots=1

sample_size <- length(climate_experience_data$Q57_1)
t_score <- correlation_coefficient * sqrt(sample_size - 2) / sqrt(1 - correlation_coefficient^2)
pt(t_score, df = sample_size - 2, lower.tail = FALSE) * 2

# Assuming you have already performed the t-test
result <- t.test(climate_experience_data$Q57_1, climate_experience_data$Q58)

# Extract the p-value
p_value <- result$p.value

# Format the p-value without scientific notation
formatted_p_value <- format(p_value, scientific = FALSE)

# Print the formatted p-value
print(formatted_p_value)


# Spirituality scale
library(rstatix)
religiosity_stats <- as.tibble(climate_experience_data$Q57_1) 
spirituality_stats <- as.tibble(climate_experience_data$Q52_score)

plot(religiosity_stats ~ spirituality_stats, data=CO2)

stats %>% get_summary_stats(value, type="mean_sd")


# JK note to self: need to fix stat_summary plot here
# stat_summary(climate_experience_data$Q52_score)


# Q57 Regardless of whether you belong to a particular religion, how religious would you say you are?
# 0-10, Not religious at all => Very religious; mean=5.58

mean(climate_experience_data$Q57_1) # religiosity

# Q58 Apart from weddings, funerals and other special occasions, how often do you attend religious services?
# coded at 1-5, lower value = stronger mean=3.439484

mean(climate_experience_data$Q58) # service attendance

# Q59 Apart from when you are at religious services, how often do you pray?
# coded at 1-5, lower = stronger mean=2.50496

mean(climate_experience_data$Q59)
```

Now let's try out some visualisations:

```{r}
## Q52 Spirituality data ------------------------

q52_data <- select(climate_experience_data, Q52a_1:Q52f_1)
# Data is at wide format, we need to make it 'tidy' or 'long'
q52_data <- q52_data %>% 
  gather(key="text", value="value") %>%
  # rename columns
  mutate(text = gsub("Q52_", "",text, ignore.case = TRUE)) %>%
  mutate(value = round(as.numeric(value),0))

# Change names of rows to question text
q52_data <- q52_data %>% 
  gather(key="text", value="value") %>%
  # rename columns
  mutate(text = gsub("Q52a_1", "In terms of questions I have about my life, my spirituality answers...",text, ignore.case = TRUE)) %>%
  mutate(text = gsub("Q52b_1", "Growing spiritually is important...",text, ignore.case = TRUE)) %>%
  mutate(text = gsub("Q52c_1", "When I’m faced with an important decision, spirituality plays a role...",text, ignore.case = TRUE)) %>%
  mutate(text = gsub("Q52d_1", "Spirituality is part of my life...",text, ignore.case = TRUE)) %>%
  mutate(text = gsub("Q52e_1", "When I think of things that help me grow and mature as a person, spirituality has an effect on my personal growth...",text, ignore.case = TRUE)) %>%
  mutate(text = gsub("Q52f_1", "My spiritual beliefs affect aspects of my life...",text, ignore.case = TRUE))

# Plot
# Used for gradient colour schemes, as with violin plots
library(viridis) 

q52_plot <- q52_data %>%
  mutate(text = fct_reorder(text, value)) %>% # Reorder data
  ggplot( aes(x=text, y=value, fill=text, color=text)) +
  geom_boxplot() +
  scale_fill_viridis(discrete=TRUE, alpha=0.8) +
  geom_jitter(color="black", size=0.2, alpha=0.2) +
  theme_ipsum() +
  theme(legend.position="none", axis.text.y = element_text(size = 8)) +
  coord_flip() + # This switch X and Y axis and allows to get the horizontal version
  xlab("") +
  ylab("Spirituality scales") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 45))

# using gridExtra to specify explicit dimensions for printing
q52_plot
ggsave("figures/q52_boxplot.png", width = 20, height = 10, units = "cm")
```

There's an enhanced version of this plot we can use, called `ggstatsplot()` to get a different view:

```{r}
# As an alternative trying ggstatsplot:
library(rstantools)
library(ggstatsplot)
q52_plot_alt <- ggbetweenstats(
  data = q52_data,
  x = text,
  y = value,
  outlier.tagging  = TRUE,
  title = "Intrinsic Spirituality Scale Responses"
) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 30)) +
  # Customizations
  theme(
    # Change fonts in the plot
    text = element_text(family = "Helvetica", size = 8, color = "black"),
    plot.title = element_text(
      family = "Abril Fatface", 
      size = 20,
      face = "bold",
      color = "#2a475e"
    ),
    # Statistical annotations below the main title
    plot.subtitle = element_text(
      family = "Helvetica", 
      size = 12, 
      face = "bold",
      color="#1b2838"
    ),
    plot.title.position = "plot", # slightly different from default
    axis.text = element_text(size = 10, color = "black"),
    axis.text.x = element_text(size = 7),
    axis.title = element_text(size = 12),
    axis.line = element_line(colour = "grey50"),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid = element_line(color = "#b4aea9"),
    panel.grid.major.y = element_line(linetype = "dashed"),
    panel.background = element_rect(fill = "#fbf9f4", color = "#fbf9f4"),
    plot.background = element_rect(fill = "#fbf9f4", color = "#fbf9f4")
  )

q52_plot_alt
ggsave("figures/q52_plot_alt.png", width = 20, height = 12, units = "cm")

```

One thing that might be interesting to test here is whether spirituality and religiosity are similar for our respondents.

```{r}
ggplot(climate_experience_data, aes(x=Q52_score, y=Q57_1)) + labs(x="Spirituality Scale Score", y = "How Religious?") +
  geom_point(size=1, alpha=0.3) + geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)

# using http://sthda.com/english/wiki/ggplot2-scatter-plots-quick-start-guide-r-software-and-data-visualization

ggplot(climate_experience_data, aes(x=Q52_score, y=Q57_1)) +
  labs(x="Spirituality Scale Score", y = "How Religious?") +
  geom_point(size=1, alpha=0.3) + stat_density_2d(aes(fill = ..level..), geom="polygon", alpha=0.3)+
  scale_fill_gradient(low="blue", high="red") +
  theme_minimal()
```

Because the responses to these two questions, about spirituality and religiosity are on a continuum, we can also use them, like we did in previous charts, to subset other datasets. A simple way of doing this is to separate our respondents into "high," "medium," and "low" bins for the two questions. Rather than working with hard values, like assigning 0-3, 4-6 and 7-10 for low medium and high, we'll work with the range of values that respondents actually chose. This is particularly appropriate as the median answer to these questions was not "5". So we'll use the statistical concept of standard deviation, which R can calculate almost magically for us, in the following way:

```{r}
# Create low/med/high bins based on Mean and +1/-1 Standard Deviation
climate_experience_data <- climate_experience_data %>%
  mutate(
    Q52_bin = case_when(
      Q52_score > mean(Q52_score) + sd(Q52_score) ~ "high",
      Q52_score < mean(Q52_score) - sd(Q52_score) ~ "low",
      TRUE ~ "medium"
    ) %>% factor(levels = c("low", "medium", "high"))
  )


## Q57 subsetting based on Religiosity --------------------------------------------------------------
climate_experience_data <- climate_experience_data %>%
  mutate(
    Q57_bin = case_when(
      Q57_1 > mean(Q57_1) + sd(Q57_1) ~ "high",
      Q57_1 < mean(Q57_1) - sd(Q57_1) ~ "low",
      TRUE ~ "medium"
    ) %>% factor(levels = c("low", "medium", "high"))
  )

```


As in the previous chapter, it's useful to explore multiple factors when possible. So I'd like us to take the data about political affiliation to visualise alongside our religion and spirituality data. this will help us to see where effects are more or less significant and give us a point of comparison.

```{r}
## Q53 subsetting based on Political LR orientation --------------------------------------------------------------
# Generate low/med/high bins based on Mean and SD
climate_experience_data <- climate_experience_data %>%
  mutate(
    Q53_bin = case_when(
      Q53_1 > mean(Q53_1) + sd(Q53_1) ~ "high",
      Q53_1 < mean(Q53_1) - sd(Q53_1) ~ "low",
      TRUE ~ "medium"
    ) %>% factor(levels = c("low", "medium", "high"))
  )

```


Now let's use those bins to explore some of the responses about attitudes towards climate change:

```{r}
# Faceted plot working with 3x3 grid
df <- select(climate_experience_data, Q52_bin, Q53_bin, Q57_bin, Q58)
names(df) <- c("Q52_bin", "Q53_bin", "Q57_bin", "response")
facet_names <- c(`Q52_bin` = "Spirituality", `Q53_bin` = "Politics L/R", `Q57_bin` = "Religiosity", `low`="low", `medium`="medium", `high`="high")
facet_labeller <- function(variable,value){return(facet_names[value])}
df$response <- factor(df$response, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))
df$response <- fct_recode(df$response, "More than once a week" = "1", "Once a week" = "2", "At least once a month" = "3", "Only on special holy days" = "4", "Never" = "5")
df %>% 
  # we need to get the data including facet info in long format, so we use pivot_longer()
  pivot_longer(!response, names_to = "bin_name", values_to = "b") %>% 
  # add counts for plot below
  count(response, bin_name, b) %>%
  group_by(bin_name,b) %>%
  mutate(perc=paste0(round(n*100/sum(n),1),"%")) %>% 
  # run ggplot
  ggplot(aes(x = n, y = "", fill = response)) +
  geom_col(position=position_fill(), aes(fill=response)) +
  geom_text(aes(label = perc), position = position_fill(vjust=.5), size=2) +
  scale_fill_brewer(palette = "Dark2", type = "qual") +
  scale_x_continuous(labels = scales::percent_format()) +
  facet_grid(vars(b), vars(bin_name), labeller=as_labeller(facet_names)) + 
  labs(caption = caption, x = "", y = "") + 
  guides(fill = guide_legend(title = NULL))
ggsave("figures/q58_faceted.png", width = 30, height = 10, units = "cm")
```

<!--
Use mutate to put "prefer not to say" at the bottom
# Info here: https://r4ds.had.co.nz/factors.html#modifying-factor-levels


# Q56 follow-ups
caption <- "Christian Denomination"
# TODO: copy plot above for Q56 to add two additional plots using climate_experience_data_named$Q56b and climate_experience_data_named$Q56c
# Religious Affiliation b - Christian Denomination Subquestion
christian_denomination <- qualtrics_process_single_multiple_choice(climate_experience_data_named$Q56b)
christian_denomination_table <- chart_single_result_flextable(climate_experience_data_named$Q56b, desc(Count))
christian_denomination_table
save_as_docx(christian_denomination_table, path = "./figures/q56_religious_affiliation_xn_denomination.docx")

christian_denomination_hi <- filter(climate_experience_data_named, Q56 == "Christian", Q57_bin == "high")
christian_denomination_hi <- qualtrics_process_single_multiple_choice(christian_denomination_hi$Q56b)
christian_denomination_hi

# Religious Affiliation c - Muslim Denomination Subquestion
caption <- "Islamic Identity"
# Should the label be different than income since the data examined is the Affiliation?
# TODO: adjust plot to factor using numbered responses on this question (perhaps also above)
religious_affiliationc <- qualtrics_process_single_multiple_choice(climate_experience_data_named$Q56c)
religious_affiliationc_plot <- plot_horizontal_bar(religious_affiliationc)
religious_affiliationc_plot <- religious_affiliationc_plot + labs(caption = caption, x = "", y = "")
religious_affiliationc_plot
ggsave("figures/q56c_religious_affiliation.png", width = 20, height = 10, units = "cm")
religious_affiliationc_table <- chart_single_result_flextable(climate_experience_data_named$Q56c, Count)
religious_affiliationc_table
save_as_docx(religious_affiliationc_table, path = "./figures/q56_religious_affiliation_islam.docx")


# Q58

caption <- "Respondent Attendance of Religious Services"
religious_service_attend <- qualtrics_process_single_multiple_choice(climate_experience_data_named$Q58)
religious_service_attend_plot <- plot_horizontal_bar(religious_service_attend)
religious_service_attend_plot <- religious_service_attend_plot + labs(title = caption, x = "", y = "")
religious_service_attend_plot
ggsave("figures/q58_religious_service_attend.png", width = 20, height = 10, units = "cm")
religious_service_attend_table <- chart_single_result_flextable(climate_experience_data_named$Q58, Count)
religious_service_attend_table
save_as_docx(religious_service_attend_table, path = "./figures/q58_religious_service_attend.docx")

# Faceted plot working with 3x3 grid
df <- select(climate_experience_data, Q52_bin, Q53_bin, Q57_bin, Q58)
names(df) <- c("Q52_bin", "Q53_bin", "Q57_bin", "response")
facet_names <- c(`Q52_bin` = "Spirituality", `Q53_bin` = "Politics L/R", `Q57_bin` = "Religiosity", `low`="low", `medium`="medium", `high`="high")
facet_labeller <- function(variable,value){return(facet_names[value])}
df$response <- factor(df$response, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))
df$response <- fct_recode(df$response, "More than once a week" = "1", "Once a week" = "2", "At least once a month" = "3", "Only on special holy days" = "4", "Never" = "5")
df %>% 
  # we need to get the data including facet info in long format, so we use pivot_longer()
  pivot_longer(!response, names_to = "bin_name", values_to = "b") %>% 
  # add counts for plot below
  count(response, bin_name, b) %>%
  group_by(bin_name,b) %>%
  mutate(perc=paste0(round(n*100/sum(n),1),"%")) %>% 
  # run ggplot
  ggplot(aes(x = n, y = "", fill = response)) +
  geom_col(position=position_fill(), aes(fill=response)) +
  geom_text(aes(label = perc), position = position_fill(vjust=.5), size=2) +
  scale_fill_brewer(palette = "Dark2", type = "qual") +
  scale_x_continuous(labels = scales::percent_format()) +
  facet_grid(vars(b), vars(bin_name), labeller=as_labeller(facet_names)) + 
  labs(caption = caption, x = "", y = "") + 
  guides(fill = guide_legend(title = NULL))
ggsave("figures/q58_faceted.png", width = 30, height = 10, units = "cm")

# Q59

caption <- "Respondent Prayer Outside of Religious Services"
prayer <- qualtrics_process_single_multiple_choice(climate_experience_data_named$Q59)
prayer_plot <- plot_horizontal_bar(prayer)
prayer_plot <- prayer_plot + labs(caption = caption, x = "", y = "")
prayer_plot
ggsave("figures/q59_prayer.png", width = 20, height = 10, units = "cm")
prayer_table <- chart_single_result_flextable(climate_experience_data_named$Q59, Count)
prayer_table
save_as_docx(prayer_table, path = "./figures/q59_prayer.docx")

# Faceted plot working with 3x3 grid
df <- select(climate_experience_data, Q52_bin, Q53_bin, Q57_bin, Q59)
names(df) <- c("Q52_bin", "Q53_bin", "Q57_bin", "response")
facet_names <- c(`Q52_bin` = "Spirituality", `Q53_bin` = "Politics L/R", `Q57_bin` = "Religiosity", `low`="low", `medium`="medium", `high`="high")
facet_labeller <- function(variable,value){return(facet_names[value])}
df$response <- factor(df$response, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))
df$response <- fct_recode(df$response, "More than once a week" = "1", "Once a week" = "2", "At least once a month" = "3", "Only on special holy days" = "4", "Never" = "5")
df %>% 
  # we need to get the data including facet info in long format, so we use pivot_longer()
  pivot_longer(!response, names_to = "bin_name", values_to = "b") %>% 
  # add counts for plot below
  count(response, bin_name, b) %>%
  group_by(bin_name,b) %>%
  mutate(perc=paste0(round(n*100/sum(n),1),"%")) %>% 
  # run ggplot
  ggplot(aes(x = n, y = "", fill = response)) +
  geom_col(position=position_fill(), aes(fill=response)) +
  geom_text(aes(label = perc), position = position_fill(vjust=.5), size=2) +
  scale_fill_brewer(palette = "Dark2", type = "qual") +
  scale_x_continuous(labels = scales::percent_format()) +
  facet_grid(vars(b), vars(bin_name), labeller=as_labeller(facet_names)) + 
  labs(caption = caption, x = "", y = "") + 
  guides(fill = guide_legend(title = NULL))
ggsave("figures/q59_faceted.png", width = 30, height = 10, units = "cm")


# Comparing with attitudes surrounding climate change


# Q6

q6_data <- qualtrics_process_single_multiple_choice_unsorted_streamlined(climate_experience_data$Q6)

title <- "Do you think the climate is changing?"

level_order <- c("Don’t know",
                 "Definitely not changing",
                 "Probably not changing", 
                 "Probably changing",
                 "Definitely changing")
## code if a specific palette is needed for matching
fill = wheel(ochre, num = as.integer(count(q6_data[1])))
# make plot
q6_data_plot <- ggplot(q6_data, aes(x = n, y = response, fill = fill)) +
  geom_col(colour = "white") +
  ## add percentage labels
  geom_text(aes(label = perc),
            ## make labels left-aligned and white
            hjust = 1, colour = "black", size=4) + # use nudge_x = 30, to shift position
  ## reduce spacing between labels and bars
  scale_fill_identity(guide = "none") +
  ## get rid of all elements except y axis labels + adjust plot margin
  theme_ipsum_rc() +
  theme(plot.margin = margin(rep(15, 4))) +
  easy_center_title() + 
  # with thanks for helpful info on doing wrap here: https://stackoverflow.com/questions/21878974/wrap-long-axis-labels-via-labeller-label-wrap-in-ggplot2
  scale_y_discrete(labels = wrap_format(30), limits = level_order) + 
  theme(plot.title = element_text(size =18, hjust = 0.5), axis.text.y = element_text(size =16)) +
  labs(title = title, x = "", y = "")

q6_data_plot 

ggsave("figures/q6.png", width = 18, height = 12, units = "cm")


# Subsetting

## Q57 subsetting based on Religiosity --------------------------------------------------------------
climate_experience_data <- climate_experience_data %>%
  mutate(
    Q57_bin = case_when(
      Q57_1 > mean(Q57_1) + sd(Q57_1) ~ "high",
      Q57_1 < mean(Q57_1) - sd(Q57_1) ~ "low",
      TRUE ~ "medium"
    ) %>% factor(levels = c("low", "medium", "high"))
  )

## Subsetting based on Spirituality --------------------------------------------------------------

### Nature relatedness --------------------------------------------------------------
# Calculate overall mean nature-relatedness score based on six questions:
climate_experience_data$Q51_score <- rowMeans(select(climate_experience_data, Q51_remote_vacation:Q51_heritage))

# Create low/med/high bins based on Mean and +1/-1 Standard Deviation
climate_experience_data <- climate_experience_data %>%
  mutate(
    Q51_bin = case_when(
      Q51_score > mean(Q51_score) + sd(Q51_score) ~ "high",
      Q51_score < mean(Q51_score) - sd(Q51_score) ~ "low",
      TRUE ~ "medium"
    ) %>% factor(levels = c("low", "medium", "high"))
  )


-->

::: {.callout-tip}
### What is Religion?
Content tbd
:::


::: {.callout-tip}
### Hybrid Religious Identity
Content tbd
:::


::: {.callout-tip}
### What is Secularisation?
Content tbd
:::

## References {.unnumbered}

::: {#refs}
:::