updating headings hierarchy in ch1
|
@ -7,7 +7,7 @@
|
|||
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
|
||||
|
||||
|
||||
<title>Hacking Religion: TRS & Data Science in Action - 1 Preamble</title>
|
||||
<title>Hacking Religion: TRS & Data Science in Action - 1 Set up local workspace:</title>
|
||||
<style>
|
||||
code{white-space: pre-wrap;}
|
||||
span.smallcaps{font-variant: small-caps;}
|
||||
|
@ -134,7 +134,7 @@ div.csl-indent {
|
|||
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
|
||||
<i class="bi bi-layout-text-sidebar-reverse"></i>
|
||||
</button>
|
||||
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./chapter_1.html"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></a></li></ol></nav>
|
||||
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./chapter_1.html"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></a></li></ol></nav>
|
||||
<a class="flex-grow-1" role="button" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
|
||||
</a>
|
||||
<button type="button" class="btn quarto-search-button" aria-label="" onclick="window.quartoOpenSearch();">
|
||||
|
@ -168,7 +168,7 @@ div.csl-indent {
|
|||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="./chapter_1.html" class="sidebar-item-text sidebar-link active">
|
||||
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></span></a>
|
||||
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
|
@ -217,19 +217,17 @@ div.csl-indent {
|
|||
<h2 id="toc-title">Table of contents</h2>
|
||||
|
||||
<ul>
|
||||
<li><a href="#the-2021-uk-census" id="toc-the-2021-uk-census" class="nav-link active" data-scroll-target="#the-2021-uk-census"><span class="header-section-number">2</span> The 2021 UK Census</a>
|
||||
<li><a href="#introducing-the-2021-uk-census" id="toc-introducing-the-2021-uk-census" class="nav-link active" data-scroll-target="#introducing-the-2021-uk-census"><span class="header-section-number">2</span> Introducing the 2021 UK Census</a></li>
|
||||
<li><a href="#getting-started-with-uk-census-data" id="toc-getting-started-with-uk-census-data" class="nav-link" data-scroll-target="#getting-started-with-uk-census-data"><span class="header-section-number">3</span> Getting started with UK Census data</a></li>
|
||||
<li><a href="#examining-data" id="toc-examining-data" class="nav-link" data-scroll-target="#examining-data"><span class="header-section-number">4</span> Examining data:</a></li>
|
||||
<li><a href="#parsing-and-exploring-your-data" id="toc-parsing-and-exploring-your-data" class="nav-link" data-scroll-target="#parsing-and-exploring-your-data"><span class="header-section-number">5</span> Parsing and Exploring your data</a></li>
|
||||
<li><a href="#making-your-first-data-visulation-the-humble-bar-chart" id="toc-making-your-first-data-visulation-the-humble-bar-chart" class="nav-link" data-scroll-target="#making-your-first-data-visulation-the-humble-bar-chart"><span class="header-section-number">6</span> Making your first data visualisation: the humble bar chart</a>
|
||||
<ul class="collapse">
|
||||
<li><a href="#getting-started-with-uk-census-data" id="toc-getting-started-with-uk-census-data" class="nav-link" data-scroll-target="#getting-started-with-uk-census-data"><span class="header-section-number">2.1</span> Getting started with UK Census data</a></li>
|
||||
<li><a href="#examining-data" id="toc-examining-data" class="nav-link" data-scroll-target="#examining-data"><span class="header-section-number">2.2</span> Examining data:</a></li>
|
||||
<li><a href="#parsing-and-exploring-your-data" id="toc-parsing-and-exploring-your-data" class="nav-link" data-scroll-target="#parsing-and-exploring-your-data"><span class="header-section-number">2.3</span> Parsing and Exploring your data</a></li>
|
||||
<li><a href="#making-your-first-data-visulation-the-humble-bar-chart" id="toc-making-your-first-data-visulation-the-humble-bar-chart" class="nav-link" data-scroll-target="#making-your-first-data-visulation-the-humble-bar-chart"><span class="header-section-number">2.4</span> Making your first data visualisation: the humble bar chart</a>
|
||||
<ul class="collapse">
|
||||
<li><a href="#base-r" id="toc-base-r" class="nav-link" data-scroll-target="#base-r"><span class="header-section-number">2.4.1</span> Base R</a></li>
|
||||
<li><a href="#ggplot" id="toc-ggplot" class="nav-link" data-scroll-target="#ggplot"><span class="header-section-number">2.4.2</span> GGPlot</a></li>
|
||||
</ul></li>
|
||||
<li><a href="#is-your-chart-accurate-telling-the-truth-in-data-science" id="toc-is-your-chart-accurate-telling-the-truth-in-data-science" class="nav-link" data-scroll-target="#is-your-chart-accurate-telling-the-truth-in-data-science"><span class="header-section-number">2.5</span> Is your chart accurate? Telling the truth in data science</a></li>
|
||||
<li><a href="#multifactor-visualisation" id="toc-multifactor-visualisation" class="nav-link" data-scroll-target="#multifactor-visualisation"><span class="header-section-number">2.6</span> Multifactor Visualisation</a></li>
|
||||
<li><a href="#base-r" id="toc-base-r" class="nav-link" data-scroll-target="#base-r"><span class="header-section-number">6.1</span> Base R</a></li>
|
||||
<li><a href="#ggplot" id="toc-ggplot" class="nav-link" data-scroll-target="#ggplot"><span class="header-section-number">6.2</span> GGPlot</a></li>
|
||||
</ul></li>
|
||||
<li><a href="#telling-the-truth-in-data-science-is-your-chart-accurate" id="toc-telling-the-truth-in-data-science-is-your-chart-accurate" class="nav-link" data-scroll-target="#telling-the-truth-in-data-science-is-your-chart-accurate"><span class="header-section-number">7</span> Telling the truth in data science: Is your chart accurate?</a></li>
|
||||
<li><a href="#multifactor-visualisation" id="toc-multifactor-visualisation" class="nav-link" data-scroll-target="#multifactor-visualisation"><span class="header-section-number">8</span> Multifactor Visualisation</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
</div>
|
||||
|
@ -238,7 +236,7 @@ div.csl-indent {
|
|||
|
||||
<header id="title-block-header" class="quarto-title-block default">
|
||||
<div class="quarto-title">
|
||||
<h1 class="title"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></h1>
|
||||
<h1 class="title"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></h1>
|
||||
</div>
|
||||
|
||||
|
||||
|
@ -264,22 +262,22 @@ div.csl-indent {
|
|||
<div class="cell-output cell-output-stderr">
|
||||
<pre><code>here() starts at /Users/kidwellj/gits/hacking_religion_textbook/hacking_religion</code></pre>
|
||||
</div>
|
||||
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Set up local workspace:</span></span>
|
||||
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"data"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
|
||||
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"data"</span>) </span>
|
||||
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>}</span>
|
||||
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"figures"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
|
||||
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"figures"</span>) </span>
|
||||
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>}</span>
|
||||
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"derivedData"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
|
||||
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"derivedData"</span>)</span>
|
||||
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"data"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
|
||||
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"data"</span>) </span>
|
||||
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>}</span>
|
||||
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"figures"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
|
||||
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"figures"</span>) </span>
|
||||
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>}</span>
|
||||
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"derivedData"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
|
||||
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"derivedData"</span>)</span>
|
||||
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
<section id="the-2021-uk-census" class="level1 page-columns page-full" data-number="2">
|
||||
<h1 data-number="2"><span class="header-section-number">2</span> The 2021 UK Census</h1>
|
||||
<section id="introducing-the-2021-uk-census" class="level1" data-number="2">
|
||||
<h1 data-number="2"><span class="header-section-number">2</span> Introducing the 2021 UK Census</h1>
|
||||
<p>For our first exercise in this book, we’re going to work with a census dataset. As you’ll see by contrast in chapter 2, census data is intended to represent as fully as possible the demographic features of a specific community, in this case, the United Kingdom. We might assume that a large-scale survey given to 1000 or more respondents and distributed appropriately across a variety of demographics will approximate the results of a census, but there’s really no substitute for a survey which has been given to (nearly) the entire population. This also allows us to compare a number of different subsets, as we’ll explore further below. The big question that we’re confronting in this chapter is how best to represent religious belonging and participation at such a large scale, and to flag up some of the hidden limitations in this seemingly comprehensive dataset.</p>
|
||||
<section id="getting-started-with-uk-census-data" class="level2 page-columns page-full" data-number="2.1">
|
||||
<h2 data-number="2.1" class="anchored" data-anchor-id="getting-started-with-uk-census-data"><span class="header-section-number">2.1</span> Getting started with UK Census data</h2>
|
||||
</section>
|
||||
<section id="getting-started-with-uk-census-data" class="level1 page-columns page-full" data-number="3">
|
||||
<h1 data-number="3"><span class="header-section-number">3</span> Getting started with UK Census data</h1>
|
||||
<p>Let’s start by importing some data into R. Because R is what is called an object-oriented programming language, we’ll always take our information and give it a home inside a named object. There are many different kinds of objects, which you can specify, but usually R will assign a type that seems to fit best, often a table of data which looks a bit like a spreadsheet which is called a <code>dataframe</code>.</p>
|
||||
<div class="page-columns page-full"><p></p><div class="no-row-height column-margin column-container"><span class="margin-aside">If you’d like to explore this all in a bit more depth, you can find a very helpful summary in R for Data Science, chapter 8, <a href="https://r4ds.hadley.nz/data-import#reading-data-from-a-file">“data import”</a>.</span></div></div>
|
||||
<p>In the example below, we’re going to begin by reading in data from a comma separated value file (“csv”) which has rows of information on separate lines in a text file with each column separated by a comma. This is one of the standard plain text file formats. R has a function you can use to import this efficiently called <code>read.csv</code>. Each line of code in R usually starts with the object, and then follows with instructions on what we’re going to put inside it, where that comes from, and how to format it:</p>
|
||||
|
@ -287,8 +285,8 @@ div.csl-indent {
|
|||
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>uk_census_2021_religion <span class="ot"><-</span> <span class="fu">read.csv</span>(<span class="fu">here</span>(<span class="st">"example_data"</span>, <span class="st">"census2021-ts030-rgn.csv"</span>)) </span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="examining-data" class="level2" data-number="2.2">
|
||||
<h2 data-number="2.2" class="anchored" data-anchor-id="examining-data"><span class="header-section-number">2.2</span> Examining data:</h2>
|
||||
<section id="examining-data" class="level1" data-number="4">
|
||||
<h1 data-number="4"><span class="header-section-number">4</span> Examining data:</h1>
|
||||
<p>What’s in the table? You can take a quick look at either the top of the data frame, or the bottom using one of the following commands:</p>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(uk_census_2021_religion)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
|
@ -550,8 +548,8 @@ div.csl-indent {
|
|||
</div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="parsing-and-exploring-your-data" class="level2 page-columns page-full" data-number="2.3">
|
||||
<h2 data-number="2.3" class="anchored" data-anchor-id="parsing-and-exploring-your-data"><span class="header-section-number">2.3</span> Parsing and Exploring your data</h2>
|
||||
<section id="parsing-and-exploring-your-data" class="level1 page-columns page-full" data-number="5">
|
||||
<h1 data-number="5"><span class="header-section-number">5</span> Parsing and Exploring your data</h1>
|
||||
<p>The first thing you’re going to want to do is to take a smaller subset of a large data set, either by filtering out certain columns or rows. Now let’s say we want to just work with the data from the West Midlands, and we’d like to omit some of the columns. We can choose a specific range of columns using <code>select</code>, like this:</p>
|
||||
<p>You can use the <code>filter</code> command to do this. To give an example, <code>filter</code> can pick a single row in the following way:</p>
|
||||
<div class="cell">
|
||||
|
@ -561,16 +559,16 @@ div.csl-indent {
|
|||
<div class="page-columns page-full"><p></p><div class="no-row-height column-margin column-container"><span class="margin-aside">Some readers will want to pause here and check out Hadley Wickham’s “R For Data Science” book, in the section, <a href="https://r4ds.hadley.nz/data-visualize#introduction">“Data visualisation”</a> to get a fuller explanation of how to explore your data.</span></div></div>
|
||||
<p>In keeping with my goal to demonstrate data science through examples, we’re going to move on to producing some snappy looking charts for this data.</p>
|
||||
</section>
|
||||
<section id="making-your-first-data-visulation-the-humble-bar-chart" class="level2 page-columns page-full" data-number="2.4">
|
||||
<h2 data-number="2.4" class="anchored" data-anchor-id="making-your-first-data-visulation-the-humble-bar-chart"><span class="header-section-number">2.4</span> Making your first data visualisation: the humble bar chart</h2>
|
||||
<section id="making-your-first-data-visulation-the-humble-bar-chart" class="level1 page-columns page-full" data-number="6">
|
||||
<h1 data-number="6"><span class="header-section-number">6</span> Making your first data visualisation: the humble bar chart</h1>
|
||||
<p>We’ve got a nice lean set of data, so now it’s time to visualise this. We’ll start by making a bar chart:</p>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>uk_census_2021_religion_wmids <span class="ot"><-</span> uk_census_2021_religion_wmids <span class="sc">%>%</span> <span class="fu">select</span>(no_religion<span class="sc">:</span>no_response)</span>
|
||||
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>uk_census_2021_religion_wmids <span class="ot"><-</span> <span class="fu">gather</span>(uk_census_2021_religion_wmids)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
</div>
|
||||
<p>There are two basic ways to do visualisations in R. You can work with basic functions in R, often called “base R”, or you can work with an alternative library called ggplot:</p>
|
||||
<section id="base-r" class="level3" data-number="2.4.1">
|
||||
<h3 data-number="2.4.1" class="anchored" data-anchor-id="base-r"><span class="header-section-number">2.4.1</span> Base R</h3>
|
||||
<section id="base-r" class="level2" data-number="6.1">
|
||||
<h2 data-number="6.1" class="anchored" data-anchor-id="base-r"><span class="header-section-number">6.1</span> Base R</h2>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>df <span class="ot"><-</span> uk_census_2021_religion_wmids[<span class="fu">order</span>(uk_census_2021_religion_wmids<span class="sc">$</span>value,<span class="at">decreasing =</span> <span class="cn">TRUE</span>),]</span>
|
||||
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="fu">barplot</span>(<span class="at">height=</span>df<span class="sc">$</span>value, <span class="at">names=</span>df<span class="sc">$</span>key)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
|
@ -583,8 +581,8 @@ div.csl-indent {
|
|||
</div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="ggplot" class="level3 page-columns page-full" data-number="2.4.2">
|
||||
<h3 data-number="2.4.2" class="anchored" data-anchor-id="ggplot"><span class="header-section-number">2.4.2</span> GGPlot</h3>
|
||||
<section id="ggplot" class="level2 page-columns page-full" data-number="6.2">
|
||||
<h2 data-number="6.2" class="anchored" data-anchor-id="ggplot"><span class="header-section-number">6.2</span> GGPlot</h2>
|
||||
<div class="cell">
|
||||
<div class="sourceCode cell-code" id="annotated-cell-10"><pre class="sourceCode r code-annotation-code code-with-copy"><code class="sourceCode r"><span id="annotated-cell-10-1"><a href="#annotated-cell-10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(uk_census_2021_religion_wmids, <span class="fu">aes</span>(<span class="at">x =</span> key, <span class="at">y =</span> value)) <span class="sc">+</span> <span class="fu">geom_bar</span>(<span class="at">stat =</span> <span class="st">"identity"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-annotation">
|
||||
|
@ -725,8 +723,8 @@ div.csl-indent {
|
|||
</div>
|
||||
</section>
|
||||
</section>
|
||||
<section id="is-your-chart-accurate-telling-the-truth-in-data-science" class="level2" data-number="2.5">
|
||||
<h2 data-number="2.5" class="anchored" data-anchor-id="is-your-chart-accurate-telling-the-truth-in-data-science"><span class="header-section-number">2.5</span> Is your chart accurate? Telling the truth in data science</h2>
|
||||
<section id="telling-the-truth-in-data-science-is-your-chart-accurate" class="level1" data-number="7">
|
||||
<h1 data-number="7"><span class="header-section-number">7</span> Telling the truth in data science: Is your chart accurate?</h1>
|
||||
<p>If you’ve been following along up until this point, you’ll now have produced a fairly complete data visualisation for the UK census. There is some technical work yet to be done fine-tuning the visualisation of our chart here, but I’d like to pause for a moment and consider an ethical question drawn from the principles I outlined in the introduction: is the title of this chart truthful and accurate?</p>
|
||||
<p>On one hand, it is a straightforward reference to the nature of the question asked on the 2021 census survey instrument, e.g. something like “what is your religious affiliation”. However, as you will see in the next chapter, large data sets from the same year which asked a fairly similar question yield different results. Part of this could be attributed to the amount of non-response to this specific question which, in the 2021 census, is between 5% and 6% across many demographics. It’s possible (though perhaps unlikely) that all those non-responses were Sikh respondents who felt uncomfortable identifying themselves on such a survey. If even half of the non-responses were of this nature, this would dramatically shift the results, especially in comparison to other minority groups. So there is some work for us to do here in representing non-response as a category on the census.</p>
|
||||
<p>It’s equally possible that someone might feel uncertain when answering, but nonetheless land on a particular decision marking “Christian” when they wondered if they should instead tick “no religion”. Some surveys attempt to capture uncertainty in this way, asking respondents to mark how confident they are about their answers, or allowing respondents to choose multiple answers, but the census hasn’t captured this so we simply don’t know. It’s possible that a large portion of respondents in the “Christian” category were hovering between this and another response, and they might shift their answers when responding on a different day or in the context of a particular experience like a good or bad day attending church, or perhaps having just had a conversation with a friend which shifted their thinking.</p>
|
||||
|
@ -748,8 +746,8 @@ div.csl-indent {
|
|||
</div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="multifactor-visualisation" class="level2 page-columns page-full" data-number="2.6">
|
||||
<h2 data-number="2.6" class="anchored" data-anchor-id="multifactor-visualisation"><span class="header-section-number">2.6</span> Multifactor Visualisation</h2>
|
||||
<section id="multifactor-visualisation" class="level1 page-columns page-full" data-number="8">
|
||||
<h1 data-number="8"><span class="header-section-number">8</span> Multifactor Visualisation</h1>
|
||||
<p>One element of R data analysis of census datasets that can get really interesting is working with multiple variables. Above we’ve looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we’ve already made a basic entry into working with multiple variables, but this can get much more interesting. Adding an additional quantitative variable into the mix (data with <em>two</em> variables is known as bivariate data) generates a lot more information, however, and we have to think about visualising it in ways which still communicate with visual clarity in spite of the additional visual noise which is inevitable with enhanced complexity. Let’s have a look at the way that religion in England and Wales breaks down by ethnicity.</p>
|
||||
<div class="callout callout-style-default callout-tip callout-titled">
|
||||
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
|
||||
|
@ -1018,7 +1016,6 @@ Statistics 101: Logarithmic Visualisation
|
|||
</div>
|
||||
|
||||
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</main> <!-- /main -->
|
||||
|
|
|
@ -149,7 +149,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
|||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="./chapter_1.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></span></a>
|
||||
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
|
@ -1440,8 +1440,8 @@ window.document.addEventListener("DOMContentLoaded", function (event) {
|
|||
</script>
|
||||
<nav class="page-navigation">
|
||||
<div class="nav-page nav-page-previous">
|
||||
<a href="./chapter_1.html" class="pagination-link aria-label=" <span="">
|
||||
<i class="bi bi-arrow-left-short"></i> <span class="nav-page-text"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></span>
|
||||
<a href="./chapter_1.html" class="pagination-link aria-label=" <span="" up="" local="" workspace:<="" span>"="">
|
||||
<i class="bi bi-arrow-left-short"></i> <span class="nav-page-text"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></span>
|
||||
</a>
|
||||
</div>
|
||||
<div class="nav-page nav-page-next">
|
||||
|
|
|
@ -116,7 +116,7 @@ ul.task-list li input[type="checkbox"] {
|
|||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="./chapter_1.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></span></a>
|
||||
<span class="menu-text"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
|
@ -642,8 +642,8 @@ window.document.addEventListener("DOMContentLoaded", function (event) {
|
|||
<div class="nav-page nav-page-previous">
|
||||
</div>
|
||||
<div class="nav-page nav-page-next">
|
||||
<a href="./chapter_1.html" class="pagination-link" aria-label="<span class='chapter-number'>1</span> <span class='chapter-title'>Preamble</span>">
|
||||
<span class="nav-page-text"><span class="chapter-number">1</span> <span class="chapter-title">Preamble</span></span> <i class="bi bi-arrow-right-short"></i>
|
||||
<a href="./chapter_1.html" class="pagination-link" aria-label="<span class='chapter-number'>1</span> <span class='chapter-title'>Set up local workspace:</span>">
|
||||
<span class="nav-page-text"><span class="chapter-number">1</span> <span class="chapter-title">Set up local workspace:</span></span> <i class="bi bi-arrow-right-short"></i>
|
||||
</a>
|
||||
</div>
|
||||
</nav>
|
||||
|
|
|
@ -1,5 +1,3 @@
|
|||
# Preamble
|
||||
|
||||
We'll get to the good stuff in a moment, but first we need to do a bit of setup. The code provided here is intended to set up your workspace and is also necessary for the `quarto` application we use to build this book. Quarto is an application which blends together text and blocks of code. You can ignore most of it for now, though if you're running the code as we go along, you'll definitely want to include these lines, as they create directories where your files will go as you create charts and extract data below and tells R where to find those files:
|
||||
|
||||
```{r}
|
||||
|
@ -22,11 +20,11 @@ if (dir.exists("derivedData") == FALSE) {
|
|||
}
|
||||
```
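The three near-identical checks in the setup chunk can also be written as a loop. This is only a sketch of an equivalent form, assuming the same three directory names used above; it is not how the book's own setup chunk is written.

```r
# Create each working directory only if it does not already exist.
workspace_dirs <- c("data", "figures", "derivedData")

for (d in workspace_dirs) {
  if (!dir.exists(d)) {
    dir.create(d)
  }
}
```

Running this a second time changes nothing, because each directory is only created when it is missing.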
|
||||
|
||||
# The 2021 UK Census
|
||||
# Introducing the 2021 UK Census
|
||||
|
||||
For our first exercise in this book, we're going to work with a census dataset. As you'll see by contrast in chapter 2, census data is intended to represent as fully as possible the demographic features of a specific community, in this case, the United Kingdom. We might assume that a large-scale survey given to 1000 or more respondents and distributed appropriately across a variety of demographics will approximate the results of a census, but there's really no substitute for a survey which has been given to (nearly) the entire population. This also allows us to compare a number of different subsets, as we'll explore further below. The big question that we're confronting in this chapter is how best to represent religious belonging and participation at such a large scale, and to flag up some of the hidden limitations in this seemingly comprehensive dataset.
|
||||
|
||||
## Getting started with UK Census data
|
||||
# Getting started with UK Census data
|
||||
|
||||
Let's start by importing some data into R. Because R is what is called an object-oriented programming language, we'll always take our information and give it a home inside a named object. There are many different kinds of objects, which you can specify, but usually R will assign a type that seems to fit best, often a table of data which looks a bit like a spreadsheet which is called a `dataframe`.
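As a minimal sketch of what it means to give information "a home inside a named object", the lines below build a tiny data frame by hand and ask R which type it assigned. The object name `toy_census` and every value in it are made up for illustration; none of it comes from the census file.

```r
# Hypothetical toy data, invented for illustration only.
toy_census <- data.frame(
  geography   = c("Region A", "Region B"),
  no_religion = c(120, 80)
)

class(toy_census)  # the type R assigned to the object
str(toy_census)    # a compact summary of its columns
```

`class()` reports the kind of object you are holding, and `str()` lists each column with its type, which is a quick way to check that an import did what you expected.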
|
||||
|
||||
|
@ -39,7 +37,7 @@ In the example below, we're going to begin by reading in data from a comma separ
|
|||
uk_census_2021_religion <- read.csv(here("example_data", "census2021-ts030-rgn.csv"))
|
||||
```
|
||||
|
||||
## Examining data:
|
||||
# Examining data:
|
||||
|
||||
What's in the table? You can take a quick look at either the top of the data frame, or the bottom using one of the following commands:
|
||||
|
||||
|
@ -59,7 +57,7 @@ You can see how I've nested the previous command inside the `kable` command. For
|
|||
knitr::kable(tail(uk_census_2021_religion))
|
||||
```
|
||||
|
||||
## Parsing and Exploring your data
|
||||
# Parsing and Exploring your data
|
||||
|
||||
The first thing you're going to want to do is to take a smaller subset of a large data set, either by filtering out certain columns or rows. Now let's say we want to just work with the data from the West Midlands, and we'd like to omit some of the columns. We can choose a specific range of columns using `select`, like this:
|
||||
|
||||
|
@ -77,7 +75,7 @@ Now we'll use select in a different way to narrow our data to specific columns t
|
|||
In keeping with my goal to demonstrate data science through examples, we're going to move on to producing some snappy looking charts for this data.
|
||||
|
||||
|
||||
## Making your first data visualisation: the humble bar chart
|
||||
# Making your first data visualisation: the humble bar chart
|
||||
|
||||
We've got a nice lean set of data, so now it's time to visualise this. We'll start by making a bar chart:
|
||||
|
||||
|
@ -89,7 +87,7 @@ uk_census_2021_religion_wmids <- gather(uk_census_2021_religion_wmids)
|
|||
|
||||
There are two basic ways to do visualisations in R. You can work with basic functions in R, often called "base R", or you can work with an alternative library called `ggplot2`:
|
||||
|
||||
### Base R
|
||||
## Base R
|
||||
|
||||
```{r}
|
||||
df <- uk_census_2021_religion_wmids[order(uk_census_2021_religion_wmids$value,decreasing = TRUE),]
|
||||
|
@ -97,7 +95,7 @@ barplot(height=df$value, names=df$key)
|
|||
```
|
||||
|
||||
|
||||
### GGPlot
|
||||
## GGPlot
|
||||
|
||||
```{r}
|
||||
ggplot(uk_census_2021_religion_wmids, aes(x = key, y = value)) + geom_bar(stat = "identity") # <1>
|
||||
|
@ -180,7 +178,7 @@ We can fine tune a few other visual features here as well, like adding a title w
|
|||
```{r}
|
||||
ggplot(uk_census_2021_religion_merged, aes(fill=fct_reorder(dataset, value), x=reorder(key,-value), y=perc)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the UK: 2021") + xlab("") + ylab("")
|
||||
```
|
||||
## Is your chart accurate? Telling the truth in data science
|
||||
# Telling the truth in data science: Is your chart accurate?
|
||||
|
||||
If you've been following along up until this point, you'll now have produced a fairly complete data visualisation for the UK census. There is some technical work yet to be done fine-tuning the visualisation of our chart here, but I'd like to pause for a moment and consider an ethical question drawn from the principles I outlined in the introduction: is the title of this chart truthful and accurate?
|
||||
|
||||
|
@ -206,8 +204,7 @@ So if we are going to fine-tune our visuals to ensure they comport with our hack
|
|||
ggplot(uk_census_2021_religion_merged, aes(fill=fct_reorder(dataset, value), x=reorder(key,-value), y=perc)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2021 Census of England and Wales") + xlab("") + ylab("")
|
||||
```
|
||||
|
||||
|
||||
## Multifactor Visualisation
|
||||
# Multifactor Visualisation
|
||||
|
||||
One element of R data analysis of census datasets that can get really interesting is working with multiple variables. Above we've looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we've already made a basic entry into working with multiple variables, but this can get much more interesting. Adding an additional quantitative variable into the mix (data with *two* variables is known as bivariate data) can generate a lot more information, however, and we have to think about visualising it in ways that still communicate clearly in spite of the visual noise that inevitably comes with added complexity. Let's have a look at the way that religion in England and Wales breaks down by ethnicity.
|
||||
|
||||
|
|