working on chapter 1

2025-07-18 22:54:10 +00:00 · 2023-09-30 13:37:41 +01:00 · 2023-09-30 13:37:41 +01:00 · 286f263700
commit 286f263700
parent 11e330e3ba
2 changed files with 70 additions and 11 deletions
--- a/hacking_religion/chapter_1.qmd
+++ b/hacking_religion/chapter_1.qmd
@ -1,21 +1,62 @@
 # The 2021 UK Census

-## How to get data
-
-## What is data?
-
 ## Your first project: building a pie chart

-Importing data from a CSV file
+Let's start by importing some data into R. Because R treats everything it works with as an object, we'll always take our information and give it a home inside a named object. There are many different kinds of objects, and you can specify the kind you want, but usually R will assign a type that seems to fit best.

-Examining data:
+[If you'd like to explore this all in a bit more depth, you can find a very helpful summary in R for Data Science, chapter 8, ["data import"](https://r4ds.hadley.nz/data-import#reading-data-from-a-file).]{.aside}

+In the example below, we're going to read in data from a comma-separated values ("CSV") file, a text file that holds each row of information on a separate line, with the columns separated by commas. This is one of the standard plain text file formats. R has a function called `read.csv` that you can use to import this efficiently. Each line of code in R usually starts with the object, and then follows with instructions on what we're going to put inside it, where that comes from, and how to format it:
+
+
+```{r.hidden}
+# R Setup -----------------------------------------------------------------
+setwd("/Users/kidwellj/gits/hacking_religion_textbook/hacking_religion")
+library(here) # much better way to manage working paths in R across multiple instances
+library(tidyverse)
+here::i_am("chapter_1.qmd")
+
+religion_uk <- read.csv(here("example_data", "census2021-ts030-rgn.csv")) 
 ```
-dat
-head
-tail
+
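+A minimal sketch of the same import pattern, which runs without the census file: it writes a tiny made-up table to a temporary CSV file and reads it back in. The object name `example_data`, the column names, and all of the figures are invented for illustration and are not census results.
+
+```{r}
+# Write a small, made-up CSV file to a temporary location
+csv_path <- tempfile(fileext = ".csv")
+writeLines(
+  c("geography,no_religion,christian",
+    "Region A,100,150",
+    "Region B,200,250"),
+  csv_path
+)
+
+# object <- function(where the data comes from)
+example_data <- read.csv(csv_path)
+
+class(example_data) # "data.frame": R picked this type for us
+str(example_data)   # one line per column, with the type R assigned
+```
+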
+### Examining data:
+
+What's in the table? You can take a quick look at either the top of the data frame, or the bottom using one of the following commands:
+
+```{r .column-page}
+head(religion_uk)
 ```

+This is actually a fairly ugly table, so I'll use an R tool called kable to give you prettier tables in the future, like this:
+
+```{r}
+knitr::kable(head(religion_uk))
+```
+
+You can see how I've nested the previous command inside the `kable` command. For reference, when you're working with really complex scripts that load many different libraries, you may end up with functions from different libraries that share the same name. You can specify the library a function is meant to come from by preceding it with `::`, as we've done with `knitr::` above. The same kind of output can be produced using `tail`:
+
+```{r}
+knitr::kable(tail(religion_uk))
+```
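+
+Both `head` and `tail` show six rows by default and accept an `n` argument to change that. The sketch below uses a small made-up data frame (`example_df` is a hypothetical name and the numbers are invented) and also shows the `::` prefix on a base R function:
+
+```{r}
+# A made-up data frame with eight rows
+example_df <- data.frame(
+  geography = paste("Region", LETTERS[1:8]),
+  no_religion = seq(100, 800, by = 100)
+)
+
+head(example_df, n = 3) # first three rows
+tail(example_df, n = 2) # last two rows
+
+# head() lives in the utils package, so this is the same call written in full
+utils::head(example_df, n = 3)
+```
+
+A common real-world clash is `filter`, which exists in both the stats and dplyr packages; writing `dplyr::filter` makes it unambiguous which one you mean.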
+
+We use `filter` to pick a single row, in the following way:
+
+```{r}
+# wmids_data <- filter(religion_uk, geography == "West Midlands")
+```
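+
+As a sketch of how `filter` behaves, assuming the dplyr package (loaded as part of the tidyverse) and a made-up data frame; the `geography` column name and all of the figures here are invented for illustration:
+
+```{r}
+library(dplyr)
+
+# Made-up figures, not census results
+example_df <- data.frame(
+  geography = c("North East", "West Midlands", "London"),
+  no_religion = c(100, 200, 300),
+  christian = c(150, 250, 350)
+)
+
+# Keep only the rows where the condition is TRUE
+wmids_example <- filter(example_df, geography == "West Midlands")
+wmids_example
+```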
+
+Now let's say we want to just work with the data from the West Midlands, and we'd like to omit some of the columns. We can choose a specific range of columns using `select`, like this:
+
+[Some readers will want to pause here and check out Hadley Wickham's "R For Data Science" book, in the section, ["Data visualisation"](https://r4ds.hadley.nz/data-visualize#introduction) to get a fuller explanation of how to explore your data.]{.aside}
+
+```{r}
+wmids_data <- select(religion_uk, no_religion:other)
+```
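+
+To see what a column range does, and how it could be combined with a row filter, here is a small sketch under the same assumptions as before: dplyr is installed, and the data frame, its column names, and its figures are invented for illustration.
+
+```{r}
+library(dplyr)
+
+# Made-up figures, not census results
+example_df <- data.frame(
+  geography = c("North East", "West Midlands"),
+  total = c(260, 470),
+  no_religion = c(100, 200),
+  christian = c(150, 250),
+  other = c(10, 20)
+)
+
+# Columns from no_religion through other, in the order they appear
+select(example_df, no_religion:other)
+
+# Rows first, then columns: one region, three columns
+wmids_rows <- filter(example_df, geography == "West Midlands")
+wmids_example <- select(wmids_rows, no_religion:other)
+wmids_example
+```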
+
+
+In keeping with my goal to demonstrate data science through examples, we're going to move on to producing some snappy looking charts for this data.
+
+

 <!--
 Reference on callout box syntax here: https://quarto.org/docs/authoring/callouts.html
--- a/hacking_religion/intro.qmd
+++ b/hacking_religion/intro.qmd
@ -21,14 +21,32 @@
 5. Learn by doing


-## Using the R programming language
+## Why programmatic data science?
+
+This isn't just a book about data analysis. I'm proposing an approach which might be thought of as research-as-code, where you write out instructions to execute the various steps of work. The upside of this is that other researchers can learn from your work, correct it, and build on it as part of the commons. It takes a bit more time to learn and set things up, but in return you'll gain access to a set of tools and a research philosophy which is much more powerful.
+
+
+## Learning to code: my way

-Why R? 

 Explain accelerated approach in this book, working from examples and providing exposure to concepts in a streamlined way, pointing to other resources

 Point to other guides, 

+There is a range of terrific textbooks out there which cover all these elements in greater depth and at a slower pace. In particular, I'd recommend Hadley Wickham's "R For Data Science" book. I'll include marginal notes in this guide pointing to sections of that book, and to a few others which unpack the basic mechanics of R in more detail.
+
+
+
+## Getting set up
+
+Every single tool, programming language and data set we refer to in this book is free and open source. These tools have been produced by professionals and volunteers who are passionate about data science and research and want to share it with the world, and in order to do this (and following the "hacker way") they've made these tools freely available. This also means that you aren't restricted to a specific proprietary, expensive, or unavailable piece of software to do this work. I'll make a few opinionated recommendations here based on my own preferences and experience, but it's really up to your own style and approach. In fact, given that this is an open source textbook, you can even propose additions to this chapter explaining other tools you've found that you want to share with others.
+
+There are, right now, primarily two languages that statisticians and data scientists use for this kind of programmatic data science: Python and R. Each language has its merits and I won't rehash the debates between various factions. For this book, we'll be using the R language. This is, in part, because the R user community and libraries tend to scale a bit better for the work that I'm commending in this book. However, it's entirely possible that one could use Python for all these exercises, and perhaps in the future we'll have volume two of this book outlining Python approaches to the same operations.
+
+Bearing this in mind, the first step you'll need to take is to download and install R. You can find instructions and installation packages for a wide range of hardware on the Comprehensive R Archive Network (or "CRAN"): https://cran.rstudio.com. Once you've installed R, you've got some choices to make about the kind of programming environment you'd like to use. You can just use a plain text editor like TextEdit to write your code and then execute your programs using the R software you've just installed. However, most users, myself included, tend to use an integrated development environment (or "IDE"). This is usually another software package with a graphical user interface and some visual elements that make it faster to write and test your code. Some IDE packages have built-in reference tools so you can look up options for libraries you use in your code, allow you to visualise the results of your code execution, and, perhaps most important of all, enable you to execute your programs line by line so you can spot errors more quickly (we call this "debugging"). The two most popular IDE platforms for R coding at the time of writing this textbook are RStudio and Visual Studio Code. You should download and try out both and stick with your favourite, as the differences are largely aesthetic. I use a combination of RStudio and an enhanced plain text editor, Sublime Text, for my coding.
+
+Once you have R and your pick of an IDE, you are ready to go! Proceed to the next chapter and we'll dive right in and get started!
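+
+If you'd like to confirm that everything is in place first, the following is one possible check, not a required step. It assumes you want the three packages loaded in the next chapter's examples (tidyverse, here and knitr); the object names `needed` and `installed` are just illustrative.
+
+```{r}
+# Packages loaded in the next chapter's examples
+needed <- c("tidyverse", "here", "knitr")
+
+# TRUE for each package that is already installed
+installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
+installed
+
+# Uncomment to install whatever is missing:
+# install.packages(needed[!installed])
+
+R.version.string # the version of R you are running
+```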
+
 ## Other useful guides:

 [R For Data Science 2e](https://r4ds.hadley.nz/)