From 152c1562d8292241c27baf858d6bfc93e2bde519 Mon Sep 17 00:00:00 2001 From: Jeremy Kidwell Date: Mon, 2 Oct 2023 11:28:06 +0100 Subject: [PATCH] added intro to ch. 2 --- hacking_religion/appendix_b.qmd | 1 + hacking_religion/chapter_1.qmd | 7 +++++++ hacking_religion/chapter_2.qmd | 8 ++++++++ hacking_religion/intro.qmd | 5 +---- 4 files changed, 17 insertions(+), 4 deletions(-) diff --git a/hacking_religion/appendix_b.qmd b/hacking_religion/appendix_b.qmd index 6d1f91b..8d24721 100644 --- a/hacking_religion/appendix_b.qmd +++ b/hacking_religion/appendix_b.qmd @@ -9,6 +9,7 @@ ## Python Data Science Books: - [Intro to Cultural Analytics and Python](https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html) - [The Hitchhiker's Guide to Python](https://docs.python-guide.org/) +- (https://pysd-cookbook.readthedocs.io/en/latest/index.html) ## Reproducible Research: - [Anna Krystalli, Putting the into Reproducible Research: Directors Cut](https://annakrystalli.me/talks/r-in-repro-research-dc.html) diff --git a/hacking_religion/chapter_1.qmd b/hacking_religion/chapter_1.qmd index 11f8ea0..5dafbb8 100644 --- a/hacking_religion/chapter_1.qmd +++ b/hacking_religion/chapter_1.qmd @@ -81,18 +81,25 @@ barplot(height=df$value, names=df$key) #### GGPlot ```{r} +# unsorted ggplot(wmids_data, aes(x = key, y = value)) + geom_bar(stat = "identity") +# with sorting added in ggplot(wmids_data, aes(x= reorder(key,-value),value)) + geom_bar(stat ="identity") ``` Clean up chart features ```{r} + ``` +Add time series data for 2001 and 2011 census, change to grouped bar plot: + +https://r-graphics.org/recipe-bar-graph-grouped-bar#discussion-8 + diff --git a/hacking_religion/chapter_2.qmd b/hacking_religion/chapter_2.qmd index 8e15426..1f9000d 100644 --- a/hacking_religion/chapter_2.qmd +++ b/hacking_religion/chapter_2.qmd @@ -1,5 +1,13 @@ # Survey Data: Spotlight Project +In the last chapter we explored some high level data about religion in the UK. 
This was a census sample, which usually refers to an attempt to get as comprehensive a sample as possible. But this is actually fairly unusual in practice. Depending on how complex a subject is, and how representative we want our data to be, it's much more common to use selective sampling, that is, survey responses at n=100, or n=1000 at a maximum. The advantage of a census sample is that you can explore how a wide range of other factors - particularly demographics - intersect with your question. And this can be really valuable in the study of religion, particularly because, as you will see as we go along, responses to some questions are more strongly correlated with things like economic status or educational attainment than they are with religious affiliation. It can be hard to tell if this is the case unless you have a large enough sample to break down into a number of different kinds of subsets. But census samples are complex and expensive to gather, so they're quite rare in practice. + +For this chapter, I'm going to walk you through a data set that a colleague (Charles Ogunbode) and I collected in 2021. Another problem with smaller, more selective samples is that researchers can often undersample minoritised ethnic groups. This is particularly the case with climate change research. Until the time we conducted this research, there had not been a single study investigating the specific experiences of people of colour in relation to climate change in the UK. Past researchers had been content to work with large samples, and assumed that if they had done 1000 surveys and 50 of these were completed by people of colour, they could "tick" the box. But 5% is actually well below levels of representation in the UK generally, and the shortfall is even sharper for specific communities. And if we bear in mind that non-white respondents are (of course!) a highly heterogeneous group, we're even further behind in terms of collecting data that can improve our knowledge. 
Up until recently, researchers just haven't been paying close enough attention to catch the significant neglect of the empirical field that this represents. + +While I've framed my comments above in terms of climate change research, it is also the case that, especially in diverse societies like the USA, Canada, the UK etc., paying attention to non-majority groups and people and communities of colour automatically draws in a strongly religious sample. This is highlighted in one recent study done in the UK, the "[Black British Voices Report](https://www.cam.ac.uk/stories/black-british-voices-report)", in which the researchers observed that "84% of respondents described themselves as religious and/or spiritual". My comments above about controlling for other factors remain important here - these same researchers also note that "despite their significant importance to the lives of Black Britons, only 7% of survey respondents reported that their religion was more defining of their identity than their race". + +We've decided to open up access to our data and I'm highlighting it in this book because it's a unique opportunity to explore a dataset that emphasises diversity from the start and, by extension, provides some really interesting ways to use data science techniques to explore religion in the UK. + diff --git a/hacking_religion/intro.qmd b/hacking_religion/intro.qmd index 12918b0..785a75a 100644 --- a/hacking_religion/intro.qmd +++ b/hacking_religion/intro.qmd @@ -28,10 +28,7 @@ This isn't just a book about data analysis, I'm proposing an approach which migh ## Learning to code: my way - -Explain accelerated approach in this book, working from examples and providing exposure to concepts in a streamlined way, pointing to other resources - -Point to other guides, +This guide is a little different from other textbooks targeting learning to code. 
I remember when I was first starting out, I went through a fair few guides, and they all tended to spend about 200 pages on various theoretical bits (how you form an integer, data structures, subroutines, or whatever), such that it was weeks before I got to actually *do* anything. I know some people may prefer this approach, but I dramatically prefer a problem-focussed approach to learning. Give me something that is broken, or a problem to solve, which engages the things I want to figure out, and the motivation for learning just comes much more naturally. And we know from research in cognitive science that these kinds of problem-focussed approaches tend to facilitate faster learning and better retention, so it's not just my personal preference, but also justified! It will be helpful for you to be aware of this approach when you get into the book, as it explains some of the editorial choices I've made and the way I've structured things. Each chapter focusses on a *problem* which is particularly salient for the use of data science to conduct research into religion. That problem will be my focal point, guiding choices of specific aspects of programming to introduce to you as we work our way around the chapter's data set and some of the crucial questions that arise in terms of how we handle it. If you find this approach unsatisfying, luckily there are a number of really terrific guides which lay things out slowly and methodically, and I will explicitly signpost some of these along the way so that you can do a "deep dive" when you feel like it. Otherwise, I'll take an accelerated approach to this introduction to data science in R. I expect that you will identify adjacent resources and perhaps even come up with your own creative approaches along the way, which incidentally is how real data science tends to work in practice. There is a range of terrific textbooks out there which cover all these elements in greater depth and more slowly. 
In particular, many readers will want to check out Hadley Wickham's "R for Data Science" book. I'll include marginal notes in this guide pointing to sections of that book, and a few others, which unpack the basic mechanics of R in more detail.