updating headings hierarchy in ch1

Jeremy Kidwell 2024-02-15 12:36:03 +00:00
parent 5116c12449
commit dd514ea06d
10 changed files with 56 additions and 62 deletions

View file

@ -7,7 +7,7 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>Hacking Religion: TRS &amp; Data Science in Action - 1&nbsp; Preamble</title>
<title>Hacking Religion: TRS &amp; Data Science in Action - 1&nbsp; Set up local workspace:</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
@ -134,7 +134,7 @@ div.csl-indent {
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
<i class="bi bi-layout-text-sidebar-reverse"></i>
</button>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./chapter_1.html"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></a></li></ol></nav>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="./chapter_1.html"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></a></li></ol></nav>
<a class="flex-grow-1" role="button" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
</a>
<button type="button" class="btn quarto-search-button" aria-label="" onclick="window.quartoOpenSearch();">
@ -168,7 +168,7 @@ div.csl-indent {
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./chapter_1.html" class="sidebar-item-text sidebar-link active">
<span class="menu-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></span></a>
<span class="menu-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></span></a>
</div>
</li>
<li class="sidebar-item">
@ -217,19 +217,17 @@ div.csl-indent {
<h2 id="toc-title">Table of contents</h2>
<ul>
<li><a href="#the-2021-uk-census" id="toc-the-2021-uk-census" class="nav-link active" data-scroll-target="#the-2021-uk-census"><span class="header-section-number">2</span> The 2021 UK Census</a>
<li><a href="#introducing-the-2021-uk-census" id="toc-introducing-the-2021-uk-census" class="nav-link active" data-scroll-target="#introducing-the-2021-uk-census"><span class="header-section-number">2</span> Introducing the 2021 UK Census</a></li>
<li><a href="#getting-started-with-uk-census-data" id="toc-getting-started-with-uk-census-data" class="nav-link" data-scroll-target="#getting-started-with-uk-census-data"><span class="header-section-number">3</span> Getting started with UK Census data</a></li>
<li><a href="#examining-data" id="toc-examining-data" class="nav-link" data-scroll-target="#examining-data"><span class="header-section-number">4</span> Examining data:</a></li>
<li><a href="#parsing-and-exploring-your-data" id="toc-parsing-and-exploring-your-data" class="nav-link" data-scroll-target="#parsing-and-exploring-your-data"><span class="header-section-number">5</span> Parsing and Exploring your data</a></li>
<li><a href="#making-your-first-data-visulation-the-humble-bar-chart" id="toc-making-your-first-data-visulation-the-humble-bar-chart" class="nav-link" data-scroll-target="#making-your-first-data-visulation-the-humble-bar-chart"><span class="header-section-number">6</span> Making your first data visualisation: the humble bar chart</a>
<ul class="collapse">
<li><a href="#getting-started-with-uk-census-data" id="toc-getting-started-with-uk-census-data" class="nav-link" data-scroll-target="#getting-started-with-uk-census-data"><span class="header-section-number">2.1</span> Getting started with UK Census data</a></li>
<li><a href="#examining-data" id="toc-examining-data" class="nav-link" data-scroll-target="#examining-data"><span class="header-section-number">2.2</span> Examining data:</a></li>
<li><a href="#parsing-and-exploring-your-data" id="toc-parsing-and-exploring-your-data" class="nav-link" data-scroll-target="#parsing-and-exploring-your-data"><span class="header-section-number">2.3</span> Parsing and Exploring your data</a></li>
<li><a href="#making-your-first-data-visulation-the-humble-bar-chart" id="toc-making-your-first-data-visulation-the-humble-bar-chart" class="nav-link" data-scroll-target="#making-your-first-data-visulation-the-humble-bar-chart"><span class="header-section-number">2.4</span> Making your first data visualisation: the humble bar chart</a>
<ul class="collapse">
<li><a href="#base-r" id="toc-base-r" class="nav-link" data-scroll-target="#base-r"><span class="header-section-number">2.4.1</span> Base R</a></li>
<li><a href="#ggplot" id="toc-ggplot" class="nav-link" data-scroll-target="#ggplot"><span class="header-section-number">2.4.2</span> GGPlot</a></li>
</ul></li>
<li><a href="#is-your-chart-accurate-telling-the-truth-in-data-science" id="toc-is-your-chart-accurate-telling-the-truth-in-data-science" class="nav-link" data-scroll-target="#is-your-chart-accurate-telling-the-truth-in-data-science"><span class="header-section-number">2.5</span> Is your chart accurate? Telling the truth in data science</a></li>
<li><a href="#multifactor-visualisation" id="toc-multifactor-visualisation" class="nav-link" data-scroll-target="#multifactor-visualisation"><span class="header-section-number">2.6</span> Multifactor Visualisation</a></li>
<li><a href="#base-r" id="toc-base-r" class="nav-link" data-scroll-target="#base-r"><span class="header-section-number">6.1</span> Base R</a></li>
<li><a href="#ggplot" id="toc-ggplot" class="nav-link" data-scroll-target="#ggplot"><span class="header-section-number">6.2</span> GGPlot</a></li>
</ul></li>
<li><a href="#telling-the-truth-in-data-science-is-your-chart-accurate" id="toc-telling-the-truth-in-data-science-is-your-chart-accurate" class="nav-link" data-scroll-target="#telling-the-truth-in-data-science-is-your-chart-accurate"><span class="header-section-number">7</span> Telling the truth in data science: Is your chart accurate?</a></li>
<li><a href="#multifactor-visualisation" id="toc-multifactor-visualisation" class="nav-link" data-scroll-target="#multifactor-visualisation"><span class="header-section-number">8</span> Multifactor Visualisation</a></li>
</ul>
</nav>
</div>
@ -238,7 +236,7 @@ div.csl-indent {
<header id="title-block-header" class="quarto-title-block default">
<div class="quarto-title">
<h1 class="title"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></h1>
<h1 class="title"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></h1>
</div>
@ -264,22 +262,22 @@ div.csl-indent {
<div class="cell-output cell-output-stderr">
<pre><code>here() starts at /Users/kidwellj/gits/hacking_religion_textbook/hacking_religion</code></pre>
</div>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Set up local workspace:</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"data"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"data"</span>) </span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"figures"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"figures"</span>) </span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"derivedData"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"derivedData"</span>)</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"data"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"data"</span>) </span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"figures"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"figures"</span>) </span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (<span class="fu">dir.exists</span>(<span class="st">"derivedData"</span>) <span class="sc">==</span> <span class="cn">FALSE</span>) {</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(<span class="st">"derivedData"</span>)</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>}</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<section id="the-2021-uk-census" class="level1 page-columns page-full" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> The 2021 UK Census</h1>
<section id="introducing-the-2021-uk-census" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> Introducing the 2021 UK Census</h1>
<p>For our first exercise in this book, we’re going to work with a census dataset. As you’ll see by contrast in chapter 2, census data is intended to represent as fully as possible the demographic features of a specific community, in this case, the United Kingdom. We might assume that a large-scale survey given to 1000 or more respondents and distributed appropriately across a variety of demographics will approximate the results of a census, but there’s really no substitute for a survey which has been given to (nearly) the entire population. This also allows us to compare a number of different subsets, as we’ll explore further below. The big question that we’re confronting in this chapter is how best to represent religious belonging and participation at such a large scale, and to flag up some of the hidden limitations in this seemingly comprehensive dataset.</p>
<section id="getting-started-with-uk-census-data" class="level2 page-columns page-full" data-number="2.1">
<h2 data-number="2.1" class="anchored" data-anchor-id="getting-started-with-uk-census-data"><span class="header-section-number">2.1</span> Getting started with UK Census data</h2>
</section>
<section id="getting-started-with-uk-census-data" class="level1 page-columns page-full" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Getting started with UK Census data</h1>
<p>Let’s start by importing some data into R. Because R is what is called an object-oriented programming language, we’ll always take our information and give it a home inside a named object. There are many different kinds of objects, which you can specify, but usually R will assign a type that seems to fit best, often a table of data which looks a bit like a spreadsheet which is called a <code>dataframe</code>.</p>
<div class="page-columns page-full"><p></p><div class="no-row-height column-margin column-container"><span class="margin-aside">If you’d like to explore this all in a bit more depth, you can find a very helpful summary in R for Data Science, chapter 8, <a href="https://r4ds.hadley.nz/data-import#reading-data-from-a-file">“data import”</a>.</span></div></div>
<p>In the example below, we’re going to begin by reading in data from a comma separated value file (“csv”) which has rows of information on separate lines in a text file with each column separated by a comma. This is one of the standard plain text file formats. R has a function you can use to import this efficiently called <code>read.csv</code>. Each line of code in R usually starts with the object, and then follows with instructions on what we’re going to put inside it, where that comes from, and how to format it:</p>
@ -287,8 +285,8 @@ div.csl-indent {
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>uk_census_2021_religion <span class="ot">&lt;-</span> <span class="fu">read.csv</span>(<span class="fu">here</span>(<span class="st">"example_data"</span>, <span class="st">"census2021-ts030-rgn.csv"</span>)) </span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
</section>
<section id="examining-data" class="level2" data-number="2.2">
<h2 data-number="2.2" class="anchored" data-anchor-id="examining-data"><span class="header-section-number">2.2</span> Examining data:</h2>
<section id="examining-data" class="level1" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> Examining data:</h1>
<p>What’s in the table? You can take a quick look at either the top of the data frame, or the bottom using one of the following commands:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">head</span>(uk_census_2021_religion)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@ -550,8 +548,8 @@ div.csl-indent {
</div>
</div>
</section>
<section id="parsing-and-exploring-your-data" class="level2 page-columns page-full" data-number="2.3">
<h2 data-number="2.3" class="anchored" data-anchor-id="parsing-and-exploring-your-data"><span class="header-section-number">2.3</span> Parsing and Exploring your data</h2>
<section id="parsing-and-exploring-your-data" class="level1 page-columns page-full" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> Parsing and Exploring your data</h1>
<p>The first thing you’re going to want to do is to take a smaller subset of a large data set, either by filtering out certain columns or rows. Now let’s say we want to just work with the data from the West Midlands, and we’d like to omit some of the columns. We can choose a specific range of columns using <code>select</code>, like this:</p>
<p>You can use the <code>filter</code> command to do this. To give an example, <code>filter</code> can pick a single row in the following way:</p>
<div class="cell">
@ -561,16 +559,16 @@ div.csl-indent {
<div class="page-columns page-full"><p></p><div class="no-row-height column-margin column-container"><span class="margin-aside">Some readers will want to pause here and check out Hadley Wickham’s “R For Data Science” book, in the section, <a href="https://r4ds.hadley.nz/data-visualize#introduction">“Data visualisation”</a> to get a fuller explanation of how to explore your data.</span></div></div>
<p>In keeping with my goal to demonstrate data science through examples, we’re going to move on to producing some snappy looking charts for this data.</p>
</section>
<section id="making-your-first-data-visulation-the-humble-bar-chart" class="level2 page-columns page-full" data-number="2.4">
<h2 data-number="2.4" class="anchored" data-anchor-id="making-your-first-data-visulation-the-humble-bar-chart"><span class="header-section-number">2.4</span> Making your first data visualisation: the humble bar chart</h2>
<section id="making-your-first-data-visulation-the-humble-bar-chart" class="level1 page-columns page-full" data-number="6">
<h1 data-number="6"><span class="header-section-number">6</span> Making your first data visualisation: the humble bar chart</h1>
<p>We’ve got a nice lean set of data, so now it’s time to visualise this. We’ll start by making a bar chart:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>uk_census_2021_religion_wmids <span class="ot">&lt;-</span> uk_census_2021_religion_wmids <span class="sc">%&gt;%</span> <span class="fu">select</span>(no_religion<span class="sc">:</span>no_response)</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>uk_census_2021_religion_wmids <span class="ot">&lt;-</span> <span class="fu">gather</span>(uk_census_2021_religion_wmids)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>There are two basic ways to do visualisations in R. You can work with basic functions in R, often called “base R” or you can work with an alternative library called ggplot:</p>
<section id="base-r" class="level3" data-number="2.4.1">
<h3 data-number="2.4.1" class="anchored" data-anchor-id="base-r"><span class="header-section-number">2.4.1</span> Base R</h3>
<section id="base-r" class="level2" data-number="6.1">
<h2 data-number="6.1" class="anchored" data-anchor-id="base-r"><span class="header-section-number">6.1</span> Base R</h2>
<div class="cell">
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>df <span class="ot">&lt;-</span> uk_census_2021_religion_wmids[<span class="fu">order</span>(uk_census_2021_religion_wmids<span class="sc">$</span>value,<span class="at">decreasing =</span> <span class="cn">TRUE</span>),]</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="fu">barplot</span>(<span class="at">height=</span>df<span class="sc">$</span>value, <span class="at">names=</span>df<span class="sc">$</span>key)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@ -583,8 +581,8 @@ div.csl-indent {
</div>
</div>
</section>
<section id="ggplot" class="level3 page-columns page-full" data-number="2.4.2">
<h3 data-number="2.4.2" class="anchored" data-anchor-id="ggplot"><span class="header-section-number">2.4.2</span> GGPlot</h3>
<section id="ggplot" class="level2 page-columns page-full" data-number="6.2">
<h2 data-number="6.2" class="anchored" data-anchor-id="ggplot"><span class="header-section-number">6.2</span> GGPlot</h2>
<div class="cell">
<div class="sourceCode cell-code" id="annotated-cell-10"><pre class="sourceCode r code-annotation-code code-with-copy"><code class="sourceCode r"><span id="annotated-cell-10-1"><a href="#annotated-cell-10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(uk_census_2021_religion_wmids, <span class="fu">aes</span>(<span class="at">x =</span> key, <span class="at">y =</span> value)) <span class="sc">+</span> <span class="fu">geom_bar</span>(<span class="at">stat =</span> <span class="st">"identity"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-annotation">
@ -725,8 +723,8 @@ div.csl-indent {
</div>
</section>
</section>
<section id="is-your-chart-accurate-telling-the-truth-in-data-science" class="level2" data-number="2.5">
<h2 data-number="2.5" class="anchored" data-anchor-id="is-your-chart-accurate-telling-the-truth-in-data-science"><span class="header-section-number">2.5</span> Is your chart accurate? Telling the truth in data science</h2>
<section id="telling-the-truth-in-data-science-is-your-chart-accurate" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Telling the truth in data science: Is your chart accurate?</h1>
<p>If you’ve been following along up until this point, you’ll now have produced a fairly complete data visualisation for the UK census. There is some technical work yet to be done fine-tuning the visualisation of our chart here, but I’d like to pause for a moment and consider an ethical question drawn from the principles I outlined in the introduction: is the title of this chart truthful and accurate?</p>
<p>On one hand, it is a straightforward reference to the nature of the question asked on the 2021 census survey instrument, e.g.&nbsp;something like “what is your religious affiliation”. However, as you will see in the next chapter, large data sets from the same year which asked a fairly similar question yield different results. Part of this could be attributed to the amount of non-response to this specific question which, in the 2021 census, is between 5-6% across many demographics. It’s possible (though perhaps unlikely) that all those non-responses were Sikh respondents who felt uncomfortable identifying themselves on such a survey. If even half of the non-responses were of this nature, this would dramatically shift the results especially in comparison to other minority groups. So there is some work for us to do here in representing non-response as a category on the census.</p>
<p>It’s equally possible that someone might feel uncertain when answering, but nonetheless land on a particular decision marking “Christian” when they wondered if they should instead tick “no religion”. Some surveys attempt to capture uncertainty in this way, asking respondents to mark how confident they are about their answers, or allowing respondents to choose multiple answers, but the census hasn’t captured this so we simply don’t know. It’s possible that a large portion of respondents in the “Christian” category were hovering between this and another response, and they might shift their answers when responding on a different day or in the context of a particular experience like a good or bad day attending church, or perhaps having just had a conversation with a friend which shifted their thinking.</p>
@ -748,8 +746,8 @@ div.csl-indent {
</div>
</div>
</section>
<section id="multifactor-visualisation" class="level2 page-columns page-full" data-number="2.6">
<h2 data-number="2.6" class="anchored" data-anchor-id="multifactor-visualisation"><span class="header-section-number">2.6</span> Multifactor Visualisation</h2>
<section id="multifactor-visualisation" class="level1 page-columns page-full" data-number="8">
<h1 data-number="8"><span class="header-section-number">8</span> Multifactor Visualisation</h1>
<p>One element of R data analysis of census datasets that can get really interesting is working with multiple variables. Above we’ve looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we’ve already made a basic entry into working with multiple variables but this can get much more interesting. Adding an additional quantitative variable (also known as bivariate data when you have <em>two</em> variables) into the mix, however can also generate a lot more information and we have to think about visualising it in different ways which can still communicate with visual clarity in spite of the additional visual noise which is inevitable with enhanced complexity. Let’s have a look at the way that religion in England and Wales breaks down by ethnicity.</p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
@ -1018,7 +1016,6 @@ Statistics 101: Logarithmic Visualisation
</div>
</section>
</section>
</main> <!-- /main -->

View file

@ -149,7 +149,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./chapter_1.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></span></a>
<span class="menu-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></span></a>
</div>
</li>
<li class="sidebar-item">
@ -1440,8 +1440,8 @@ window.document.addEventListener("DOMContentLoaded", function (event) {
</script>
<nav class="page-navigation">
<div class="nav-page nav-page-previous">
<a href="./chapter_1.html" class="pagination-link aria-label=" &lt;span="">
<i class="bi bi-arrow-left-short"></i> <span class="nav-page-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></span>
<a href="./chapter_1.html" class="pagination-link aria-label=" &lt;span="" up="" local="" workspace:&lt;="" span&gt;"="">
<i class="bi bi-arrow-left-short"></i> <span class="nav-page-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></span>
</a>
</div>
<div class="nav-page nav-page-next">

Binary file not shown. Before: 232 KiB. After: 232 KiB.

Binary file not shown. Before: 286 KiB. After: 290 KiB.

Binary file not shown. Before: 304 KiB. After: 300 KiB.

Binary file not shown. Before: 395 KiB. After: 396 KiB.

Binary file not shown. Before: 191 KiB. After: 191 KiB.

View file

@ -116,7 +116,7 @@ ul.task-list li input[type="checkbox"] {
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./chapter_1.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></span></a>
<span class="menu-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></span></a>
</div>
</li>
<li class="sidebar-item">
@ -642,8 +642,8 @@ window.document.addEventListener("DOMContentLoaded", function (event) {
<div class="nav-page nav-page-previous">
</div>
<div class="nav-page nav-page-next">
<a href="./chapter_1.html" class="pagination-link" aria-label="<span class='chapter-number'>1</span>&nbsp; <span class='chapter-title'>Preamble</span>">
<span class="nav-page-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Preamble</span></span> <i class="bi bi-arrow-right-short"></i>
<a href="./chapter_1.html" class="pagination-link" aria-label="<span class='chapter-number'>1</span>&nbsp; <span class='chapter-title'>Set up local workspace:</span>">
<span class="nav-page-text"><span class="chapter-number">1</span>&nbsp; <span class="chapter-title">Set up local workspace:</span></span> <i class="bi bi-arrow-right-short"></i>
</a>
</div>
</nav>

View file

@ -1,5 +1,3 @@
# Preamble
We'll get to the good stuff in a moment, but first we need to do a bit of setup. The code provided here is intended to set up your workspace and is also necessary for the `quarto` application we use to build this book. Quarto is an application which blends together text and blocks of code. You can ignore most of it for now, though if you're running the code as we go along, you'll definitely want to include these lines, as they create directories where your files will go as you create charts and extract data below, and tell R where to find those files:
```{r}
@ -22,11 +20,11 @@ if (dir.exists("derivedData") == FALSE) {
}
```
# The 2021 UK Census
# Introducing the 2021 UK Census
For our first exercise in this book, we're going to work with a census dataset. As you'll see by contrast in chapter 2, census data is intended to represent as fully as possible the demographic features of a specific community, in this case, the United Kingdom. We might assume that a large-scale survey given to 1000 or more respondents and distributed appropriately across a variety of demographics will approximate the results of a census, but there's really no substitute for a survey which has been given to (nearly) the entire population. This also allows us to compare a number of different subsets, as we'll explore further below. The big question that we're confronting in this chapter is how best to represent religious belonging and participation at such a large scale, and to flag up some of the hidden limitations in this seemingly comprehensive dataset.
## Getting started with UK Census data
# Getting started with UK Census data
Let's start by importing some data into R. Because R is what is called an object-oriented programming language, we'll always take our information and give it a home inside a named object. There are many different kinds of objects, which you can specify, but usually R will assign a type that seems to fit best, often a table of data which looks a bit like a spreadsheet which is called a `dataframe`.
@ -39,7 +37,7 @@ In the example below, we're going to begin by reading in data from a comma separ
uk_census_2021_religion <- read.csv(here("example_data", "census2021-ts030-rgn.csv"))
```
## Examining data:
# Examining data:
What's in the table? You can take a quick look at either the top of the data frame, or the bottom using one of the following commands:
@ -59,7 +57,7 @@ You can see how I've nested the previous command inside the `kable` command. For
knitr::kable(tail(uk_census_2021_religion))
```
## Parsing and Exploring your data
# Parsing and Exploring your data
The first thing you're going to want to do is to take a smaller subset of a large data set, either by filtering out certain columns or rows. Now let's say we want to just work with the data from the West Midlands, and we'd like to omit some of the columns. We can choose a specific range of columns using `select`, like this:
@ -77,7 +75,7 @@ Now we'll use select in a different way to narrow our data to specific columns t
In keeping with my goal to demonstrate data science through examples, we're going to move on to producing some snappy looking charts for this data.
## Making your first data visualisation: the humble bar chart
# Making your first data visualisation: the humble bar chart
We've got a nice lean set of data, so now it's time to visualise this. We'll start by making a bar chart:
@ -89,7 +87,7 @@ uk_census_2021_religion_wmids <- gather(uk_census_2021_religion_wmids)
There are two basic ways to do visualisations in R. You can work with basic functions in R, often called "base R" or you can work with an alternative library called ggplot:
### Base R
## Base R
```{r}
df <- uk_census_2021_religion_wmids[order(uk_census_2021_religion_wmids$value,decreasing = TRUE),]
@ -97,7 +95,7 @@ barplot(height=df$value, names=df$key)
```
### GGPlot
## GGPlot
```{r}
ggplot(uk_census_2021_religion_wmids, aes(x = key, y = value)) + geom_bar(stat = "identity") # <1>
@ -180,7 +178,7 @@ We can fine tune a few other visual features here as well, like adding a title w
```{r}
ggplot(uk_census_2021_religion_merged, aes(fill=fct_reorder(dataset, value), x=reorder(key,-value),value, y=perc)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the UK: 2021") + xlab("") + ylab("")
```
## Is your chart accurate? Telling the truth in data science
# Telling the truth in data science: Is your chart accurate?
If you've been following along up until this point, you'll now have produced a fairly complete data visualisation for the UK census. There is some technical work yet to be done fine-tuning the visualisation of our chart here, but I'd like to pause for a moment and consider an ethical question drawn from the principles I outlined in the introduction: is the title of this chart truthful and accurate?
@ -206,8 +204,7 @@ So if we are going to fine-tune our visuals to ensure they comport with our hack
ggplot(uk_census_2021_religion_merged, aes(fill=fct_reorder(dataset, value), x=reorder(key,-value),value, y=perc)) + geom_bar(position="dodge", stat ="identity", colour = "black") + scale_fill_brewer(palette = "Set1") + ggtitle("Religious Affiliation in the 2021 Census of England and Wales") + xlab("") + ylab("")
```
## Multifactor Visualisation
# Multifactor Visualisation
One element of R data analysis of census datasets that can get really interesting is working with multiple variables. Above we've looked at the breakdown of religious affiliation across the whole of England and Wales (Scotland operates an independent census), and by placing this data alongside a specific region, we've already made a basic entry into working with multiple variables but this can get much more interesting. Adding an additional quantitative variable (also known as bivariate data when you have *two* variables) into the mix, however can also generate a lot more information and we have to think about visualising it in different ways which can still communicate with visual clarity in spite of the additional visual noise which is inevitable with enhanced complexity. Let's have a look at the way that religion in England and Wales breaks down by ethnicity.