<span class="menu-text"><span class="chapter-number">2</span> <span class="chapter-title">Different ways to measure religion using data science</span></span></a>
<span class="menu-text"><span class="chapter-number">3</span> <span class="chapter-title">Mapping churches: geospatial data science</span></span></a>
<span class="menu-text"><span class="chapter-number">4</span> <span class="chapter-title">Data scraping, corpus analysis and wordclouds</span></span></a>
<li><a href="#learning-to-code-my-way" id="toc-learning-to-code-my-way" class="nav-link" data-scroll-target="#learning-to-code-my-way">Learning to code: my way</a></li>
<li><a href="#getting-set-up" id="toc-getting-set-up" class="nav-link" data-scroll-target="#getting-set-up">Getting set up</a></li>
<h2 class="anchored" data-anchor-id="why-this-book">Why this book?</h2>
<p>Data science is quickly consolidating as a new field, with new tools and user communities emerging every week. At the same time, academic research has opened up into new interdisciplinary vistas, with experts crossing over into new fields, transgressing disciplinary boundaries and deploying tools in new and unexpected ways to develop knowledge. There are many gaps yet to be filled, but one which I found to be particularly glaring is the lack of applied data science documentation around the subject of religion. On one hand, scholars who are working with cutting-edge theory seldom pick up these emerging tools of data science. On the other hand, data scientists rarely go beyond dabbling in religious themes, leaving quite a lot of really interesting theoretical research untouched. This book aims to bring these two worlds together: introducing the tools of data science in an applied way, whilst introducing some of the complexities and cutting-edge theories which help us to conceptualise and frame our understanding of religion in the world around us.</p>
<p>It’s worth emphasising at the outset that this isn’t meant to be a generic data science book. My own training as a researcher lies in the field of religious ethics, and my engagement with digital technology has, from the very start, been a context for exploring matters of personal values and social action. A fair bit of ink has been spilled in books, magazines, blogs and zines unpacking what exactly it means to be a “hacker”. Pressing beyond some of the more superficial cultural stereotypes, I want to explain a bit here about how hacking can be a much more substantial vision for ethical engagement with technology and social transformation.</p>
<p>Back in the 1980s, Steven Levy tried to capture some of this in his book “Hackers: Heroes of the Computer Revolution”. As Levy put it, the “hacker ethic” included: (1) sharing, (2) openness, (3) decentralisation, (4) free access to computers and (5) world improvement. The key point here is that hacking isn’t just about writing and breaking code, or testing and finding weaknesses in computer systems and networks. There is often a more substantial underpinning ethical code which dovetails with on-the-surface matters of curiosity and craft.</p>
<p>This emphasis on ethics is especially important when we’re doing data science because this kind of research work will put you in positions of influence. You might think this seems a bit overstated, but it never ceases to amaze me how much a bar chart which succinctly shows a social trend can sway a conversation or decision-making process. There is something unusually persuasive that comes with the combination of aesthetics, data and storytelling. I’ve met many people who have come to data science out of a desire to bring about social transformation in some sphere of life. People want to use technology and communication to make the world better. However, this can quickly get out of hand. With this in mind, I’ve found that it can be important to have a clear sense of the convictions that guide your work in this field: a “hacker code” of sorts. Here are the principles that I have settled on in my own practice of “hacking” religion:</p>
<ol type="1">
<li><em>Tell the truth</em>: be candid about your limits, use visualisation responsibly</li>
<li><em>Work transparently</em>: open data, open code</li>
<li><em>Work in community</em>: draw others in by producing reproducible research</li>
<li><em>Work with reality</em> and learn by doing</li>
</ol>
<p>I am regularly surprised by how often people think that, when they’re working for something they think is important, it is acceptable to conceal bad news or amplify good or compelling information beyond its real scope. There are always consequences, eventually. When people realise you’ve been misleading or manipulating them, your platform and credibility will evaporate. Good work mixed with bad will all get tossed out. And sometimes our convictions can lead us beyond an accurate and true apprehension of the situation we are focussed on in research.</p>
<p>An argument presented through “facts” can become unnaturally compelling. Wrapping those facts up in something that uses colour, line and shape in a way that is aesthetically pleasing, even beautiful, enhances this allure even further. As you craft your own set of hacker principles, it’s vitally important that you always strive to tell the truth. This includes a willingness to acknowledge the limits of your information, and to share the whole set of information. The easiest way to do this is to work with visualisation in a responsible way (I’ll get into this a bit more in Chapter 1) and to open up your data and code to scrutiny. By allowing others to try, criticise, edit, and reappropriate your code and data in their own ways, you contribute to knowledge and help to build up a community of accountability. The upside of this is that it’s also a lot more fun and interesting to work alongside others.</p>
<p>Far too often, scholarly research (and theology) has been criticised for being disconnected from reality, making abstract pie-in-the-sky claims about how life should be lived. When exposed to the uncomfortable pressures of reality, these claims can crumble, or even turn sinister. One of the upsides of working with empirical research is that you have a chance to engage with the real world. For this reason, I love to do ethics in a way that arises, bottom-up, from real-world experiences and relationships. There’s also the potential (at least in the best case scenario) that when we make choices based on reliable information drawn from everyday reality, our policy and culture can be more resilient and accountable. This also fits well with the hacker ethos of “learning by doing”, which guides my approach in this book. This isn’t just a book about data analysis: I’m proposing an approach which might be thought of as research-as-code, where you write out instructions to execute the various steps of work. This means that other researchers can learn from your work, correct it and build on it as part of an intellectual commons. It takes a bit more time to learn and set things up, but in return you’ll gain access to a set of tools and a research philosophy which is much more powerful.</p>
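<p>To make the idea of research-as-code a little more concrete, here is a minimal sketch of what writing out the steps of an analysis can look like in R. It assumes the <code>swiss</code> dataset that ships with R (provincial indicators for French-speaking Switzerland around 1888, including the percentage of Catholics); that dataset is chosen purely for convenience and is an assumption for illustration, not a dataset drawn from this book. The variable names are likewise hypothetical.</p>

```r
# A sketch of research-as-code: every step is written down so that
# someone else can re-run, check and adapt it.

# Step 1: load a dataset bundled with R (illustrative choice, no download needed)
data(swiss)

# Step 2: inspect the structure so the limits of the data are visible
str(swiss)

# Step 3: summarise the percentage of Catholics across provinces
catholic_summary <- summary(swiss$Catholic)
print(catholic_summary)

# Step 4: count provinces where Catholics form a majority
# (the 50% cut-off is an assumption made for this example)
majority_catholic <- sum(swiss$Catholic > 50)
print(majority_catholic)
```

<p>Because each step is recorded as an instruction rather than performed by hand, anyone with R installed can reproduce the same summary, question the choice of cut-off, or swap in a different dataset.</p>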
<p>I’ll return to these principles periodically as we work through the coding and data in this book.</p>
<h2 class="anchored" data-anchor-id="learning-to-code-my-way">Learning to code: my way</h2>
<p>Alongside these guiding principles, it’s also worth saying a bit about how I like to design teaching and learning. I remember when I was first starting out, and gathered coding manuals to read and learn from. They all tended to spend the first several hundred pages on theory: how you form an integer, data structures, subroutines, the logical structure of algorithms and so on. It was usually weeks of reading before I got to actually <em>do</em> anything. I know some people may prefer this approach, but I prefer a problem-focussed approach to learning. Give me something that is broken, or a problem to solve, which engages the things I want to figure out, and the motivation for learning comes much more naturally. And we know from research in cognitive science that these kinds of problem-focussed approaches tend to facilitate faster learning and better retention. It will be helpful for you to be aware of this approach when you get into the book, as it explains some of the editorial choices I’ve made and the way I’ve structured things.</p>
<p>Each chapter focusses on a series of <em>problems</em> which are particularly salient for the use of data science to conduct research into religion. These problems will be my focal point, guiding my choice of which aspects of programming to introduce as we work our way around a dataset and some of the crucial questions that arise in terms of how we handle it. If you find this approach unsatisfying, luckily there are a number of really terrific guides which lay things out slowly and methodically, and I will explicitly signpost some of these along the way so that you can do a “deep dive” when you feel like it. You can also find a list of resources in Appendix B to this book. Otherwise, I’ll take an accelerated approach to this introduction to data science in R. I expect that you will identify adjacent resources and perhaps even come up with your own creative approaches along the way, which incidentally is how real data science tends to work in practice.</p>
<p>There are a range of terrific textbooks which cover all these elements in greater depth and more slowly. In particular, I’d recommend that readers check out Hadley Wickham’s “<a href="https://r4ds.hadley.nz/">R For Data Science</a>” book. I’ll include marginal notes in this guide pointing to sections of that book, and a few others which unpack the basic mechanics of R in more detail.</p>
</section>
<section id="getting-set-up" class="level2">
<h2 class="anchored" data-anchor-id="getting-set-up">Getting set up</h2>
<p>Every single tool, programming language and data set we refer to in this book is free and open source. These tools have been produced by professionals and volunteers who are passionate about data science and research and want to share it with the world, and in order to do this (and following the “hacker way”) they’ve made these tools freely available. This also means that you aren’t restricted to a specific proprietary, expensive, or unavailable piece of software to do this work. I’ll make a few opinionated recommendations here based on my own preferences and experience, but it’s really up to your own style and approach. In fact, given that this is an open source textbook, you can even propose additions to this chapter online to share other tools you’ve found with other readers.</p>
<p>There are, right now, primarily two languages that statisticians and data scientists use for this kind of programmatic data science: Python and R. Each language has its merits and I won’t rehash the debates between various factions. For this book, we’ll be using the R language. This is, in part, because I’ve found that the R user community and libraries tend to scale a bit better for the work that I’m commending in this book. However, it’s entirely possible to use Python for all these exercises, and I’ll release a future version of this volume outlining Python approaches to hacking religion.</p>
<p>Bearing this in mind, the first step you’ll need to take is to download and install R. You can find instructions and installers for a wide range of hardware on a key online resource for R programmers, the Comprehensive R Archive Network (or “CRAN”): https://cran.rstudio.com. Once you’ve installed R, you’ve got some choices to make about the kind of programming environment you’d like to use. You can just use a plain text editor like <code>textedit</code> to write your code and then execute your programs using the R software you’ve just installed. However, most users, myself included, tend to use an integrated development environment (or “IDE”). This is usually another software package with a graphical user interface and some visual elements that make it faster to write and test your code. Some IDE packages have built-in reference tools so you can look up options for the libraries you use in your code. They also allow you to visualise the results of your code and, perhaps most important of all, enable you to execute your programs line by line so you can spot errors more quickly (we call this “debugging”). The two most popular IDE platforms for R coding at the time of writing this textbook are RStudio and Visual Studio Code. You should download and try out both and stick with your favourite, as the differences are largely aesthetic. I use a combination of RStudio and an enhanced plain text editor, “Sublime Text”, for my coding.</p>
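<p>If you’d like to confirm that your installation is working before moving on, a few lines like the following can be pasted into the R console or run from your IDE. This is only a rough sketch of a sanity check, not an official installation test; the variable name <code>total</code> is made up for the example, and the exact version string and session details you see will depend on your own system.</p>

```r
# Print the version of R that is currently running
print(R.version.string)

# Run a trivial calculation to confirm that code executes
total <- sum(1:10)
print(total)  # 55

# Show the platform and the packages loaded in this session
print(sessionInfo())
```

<p>If all three commands produce output without an error message, R is installed and able to execute code.</p>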
<p>Once you have R and your pick of an IDE, you are ready to go! Proceed to the next chapter and we’ll dive right in and get started!</p>
<a href="./chapter_1.html" class="pagination-link" aria-label="<span class='chapter-number'>1</span> <span class='chapter-title'>Set up local workspace:</span>">
<span class="nav-page-text"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></span><i class="bi bi-arrow-right-short"></i>