mirror of
https://github.com/kidwellj/hacking_religion_textbook.git
synced 2024-11-01 01:12:20 +00:00
# Introduction: Hacking Religion
## Who this book is for
## Why this book?
## The hacker way
1. Tell the truth
2. Do not deceive using beauty
3. Work transparently: research as open code using open data
4. Draw others in: produce reproducible research
5. Learn by doing
## Why programmatic data science?
This isn't just a book about data analysis. I'm proposing an approach which might be thought of as research-as-code, where you write out instructions to execute the various steps of your work. The upside is that other researchers can learn from your work, correct it, and build on it as part of the commons. It takes a bit more time to learn and set things up, but in return you'll gain access to a set of tools and a research philosophy which are much more powerful.
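
As a rough illustration of what research-as-code can look like, here is a minimal sketch in R. The data frame, its column names, and the values in it are all invented for this example and are not drawn from any real survey. The point is only that each step (recording the data, analysing it, reporting a result) is written down so that someone else can re-run it.

```r
# A minimal research-as-code sketch: every step is written out and can be re-run.
# NOTE: the data below are made up purely for illustration.

# Step 1: record the data (in real work this would be read from an open data file)
responses <- data.frame(
  respondent = 1:6,
  attends_weekly = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Step 2: analyse the data (the mean of a logical column is the share of TRUE values)
weekly_share <- mean(responses$attends_weekly)

# Step 3: report the result
cat("Share attending weekly:", weekly_share, "\n")
```

Because every step lives in the script, a reader who spots a mistake can correct the relevant line and re-run the whole analysis, rather than guessing at what was done by hand.
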
## Learning to code: my way
This book takes an accelerated approach: we work from examples and introduce concepts in a streamlined way, pointing to other resources along the way.

There is a range of terrific textbooks out there which cover all these elements in greater depth and more slowly. In particular, many readers will want to check out Hadley Wickham's "R for Data Science" book. I'll include marginal notes in this guide pointing to sections of that book, and to a few others which unpack the basic mechanics of R in more detail.
## Getting set up
Every single tool, programming language and data set we refer to in this book is free and open source. These tools have been produced by professionals and volunteers who are passionate about data science and research and want to share them with the world, and in order to do this (and following the "hacker way") they've made these tools freely available. This also means that you aren't tied to a specific proprietary, expensive, or unavailable piece of software to do this work. I'll make a few opinionated recommendations here based on my own preferences and experience, but it's really up to your own style and approach. In fact, given that this is an open source textbook, you can even propose additions to this chapter explaining other tools you've found and want to share with others.

There are, right now, primarily two languages that statisticians and data scientists use for this kind of programmatic data science: Python and R. Each language has its merits and I won't rehash the debates between the various factions. For this book, we'll be using the R language. This is, in part, because the R user community and libraries tend to be a bit better suited to the work that I'm commending in this book. However, it's entirely possible to use Python for all these exercises, and perhaps in the future we'll have a volume two of this book outlining Python approaches to the same operations.

Bearing this in mind, the first step you'll need to take is to download and install R. You can find instructions and installers for a wide range of hardware on the Comprehensive R Archive Network (or "CRAN"): https://cran.rstudio.com. Once you've installed R, you've got some choices to make about the kind of programming environment you'd like to use. You can just use a plain text editor like `textedit` to write your code and then execute your programs using the R software you've just installed. However, most users, myself included, tend to use an integrated development environment (or "IDE"). This is usually another software package with a graphical user interface and some visual elements that make it faster to write and test your code. Some IDE packages have built-in reference tools so you can look up options for the libraries you use in your code, allow you to visualise the results of your code execution, and, perhaps most important of all, enable you to execute your programs line by line so you can spot errors more quickly (we call this "debugging"). The two most popular IDE platforms for R coding at the time of writing this textbook are RStudio and Visual Studio Code. You should download and try out both and stick with your favourite, as the differences are largely aesthetic. I use a combination of RStudio and an enhanced plain text editor, Sublime Text, for my coding.
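
One quick way to confirm that the installation worked is to run a couple of lines of R, either in your IDE's console or from a saved script. The sketch below is only a suggestion and is not part of any official setup procedure. The variable name `total` is arbitrary, and if you save the code to a file, any file name will do (for example a hypothetical `check_setup.R`).

```r
# Confirm which version of R is running
print(R.version.string)

# A tiny calculation to confirm that code executes:
# add up the whole numbers from 1 to 10 and print the result
total <- sum(1:10)
print(total)
```

If both lines print something without an error message, R is working. In an IDE you can run these lines one at a time, which is the same line-by-line execution described above. From a terminal, a saved script can be run with `Rscript` followed by the file name.
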

Once you have R and your pick of an IDE, you are ready to go! Proceed to the next chapter and we'll dive right in.