R code and data relating to analysis of a survey deployed in 2021 to L11-13 pupils in the UK regarding their preferences for University study

L11-13 Admissions Preferences Survey 2021

Why Reproducible Research?

If you're new to GitHub and reproducible research, welcome! It's nice to have you here. GitHub is ordinarily a place where software developers working on open source projects deposit their code as they write software collaboratively. In recent years, however, a number of scholarly researchers, especially people whose research involves a digital component (including me!), have begun to deposit their papers in these same repositories. The idea is that you can download all of the source code and data used in this paper alongside the actual text, run it yourself and "reproduce" the results. This can serve as a useful safeguard, a layer of research transparency, and a cool teaching tool for others interested in doing similar work. It is particularly helpful in subject areas that are only just starting to get involved in the digital humanities, like religious studies, where there is a dearth of work of this nature and few examples of practice that can be reused, or at least consulted.

Eschewing proprietary, expensive and unreliable software like Microsoft Word, I write in a combination of two languages: (1) Markdown, which is intended to be as close as possible to plain text while still allowing for things like boldfaced type, headings and footnotes; and (2) a programming language called R, which does all the data analysis. R supports object-oriented programming and was designed specifically for statistical analysis. It's also great fun to tinker with. As you look through this paper, you'll see that R code is integrated into the text of the document. Each block of code is marked off by a series of three backticks (```). There is a formal specification for this combination, now at a mature stage of development, called R Markdown. You can read a semi-official specification for it here.

To read a bit more on these things and start on your own path towards plain text reproducible research, I highly recommend:

The other advantage of putting this paper here is that readers and reviewers can suggest changes and point out errors in the document. To do this, I recommend that you create a GitHub issue by clicking on the green "New issue" button here. If you must, you can also send me an email. More about the project lead, Jeremy, can be found here.

Now for...

The technical version

The code and the paper here are written in R Markdown, for the most part using the conventions outlined by Kieran Healy here. It is best viewed (I think) in RStudio, though it will be reasonably comprehensible to anyone using a Markdown editor. If I'm not working in RStudio, I'm probably in Sublime Text, FYI. Co-authors and collaborators take note: generally, I use Hadley Wickham's venerable R Style Guide.

I'd be extremely happy if someone found errors, or imagined a more efficient means of analysis, and either reported them as an issue on this GitHub repository or sent me an email.

Paths in this folder are used mostly for R processing. I'm using a "project"-oriented workflow, which you can read more about in a blog post by Jenny Bryan here. This uses the R package here. To this end, folders have the following significance:

  • data contains datasets used for analysis.
  • derived_data contains files which are modified forms of the files in data.
  • figures contains images and visualisations (graphic files) which are generated by R for the final form of the document.
  • cache isn't included in GitHub but is usually used for working files.

Note: none of the contents of the above folders are included in the GitHub repository unless they are unavailable from an external repository.
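
Because most of these folders are not shipped with the repository, they may need to be created locally before the analysis will run. The following is a minimal sketch of how that could be done, not code taken from the paper. The helper names `project_root()` and `ensure_project_dirs()` are hypothetical, and the sketch assumes the package referred to above is `here`, falling back to the working directory when it is not installed.

```r
# Hypothetical helper: resolve the project root, preferring the 'here' package
# when it is installed and falling back to the current working directory.
project_root <- function() {
  if (requireNamespace("here", quietly = TRUE)) {
    here::here()
  } else {
    getwd()
  }
}

# Folder names taken from the list above.
project_dirs <- c("data", "derived_data", "figures", "cache")

# Hypothetical helper: create any missing folders under `root` and return
# their full paths as a character vector named by folder.
ensure_project_dirs <- function(root = project_root(), dirs = project_dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  stats::setNames(paths, dirs)
}
```

Calling `ensure_project_dirs()` once at the top of a session returns the named paths, so later code can refer to, say, `paths[["figures"]]` rather than hard-coding a location.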

Prerequisites for reproducing this codebase

We've tried to follow best practices in setting up this script for reproducibility, but some setup is required before execution will be successful.

These steps are:

  1. Acquire a working installation of R (and RStudio). Probably the easiest way to complete this task is to use the Docker container I have produced, which replicates the environment I used to execute this script.
  2. Install platform-appropriate prerequisites for ...
  3. Clone or download the code from this repository.
  4. Set up a proper R/RStudio working environment. I use the renv package to manage the working environment; it takes snapshots and stores them in renv.lock. If you run renv::restore() in R after loading this code, it will install the necessary libraries at the proper versions.
  5. Obtain the data. Nearly all of the data used in this study is open, with one exception: the Ordnance Survey PointX data product. This is available to most UK academics via the EDINA service, so you will need to download it manually and place it in the /data/ directory.
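
Steps 4 and 5 can be checked from the R console. What follows is a rough sketch under stated assumptions rather than part of the project's own code: `restore_environment()` and `missing_data_files()` are hypothetical helper names, and the file names passed to `missing_data_files()` must be supplied by the user, since the name of the downloaded file is not specified here.

```r
# Hypothetical helper: restore the package library recorded in renv.lock.
# Returns TRUE when a restore was run, FALSE when it could not be attempted.
restore_environment <- function(project = getwd()) {
  lockfile <- file.path(project, "renv.lock")
  if (!file.exists(lockfile)) {
    message("No renv.lock found in ", project)
    return(FALSE)
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    message("Install renv first, e.g. install.packages(\"renv\")")
    return(FALSE)
  }
  # prompt = FALSE skips the interactive confirmation.
  renv::restore(project = project, prompt = FALSE)
  TRUE
}

# Hypothetical helper: report which expected files are absent from the data
# folder. `required` is a character vector of file names chosen by the user.
missing_data_files <- function(required,
                               data_dir = file.path(getwd(), "data")) {
  required[!file.exists(file.path(data_dir, required))]
}
```

Running `restore_environment()` from the root of the cloned repository, then `missing_data_files()` with the names of the manually downloaded files, gives a quick indication of whether the environment and the data are in place before the document is rendered.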

Contributing

Please note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

The content of any research papers in this repository is licensed under the Creative Commons Attribution-ShareAlike 4.0 International Public License, and the underlying source code used to generate the paper is licensed under the GNU AGPLv3 license. Underlying datasets designed as part of this research have their own licenses, which are specified in their respective repositories.