jeremykidwell.info/static/files/bookdown/data_ethics-law_course/search_index.json

[
["index.html", "what can I do with stuff I find online? Chapter 1 Introduction to the module", " what can I do with stuff I find online? Alex Fenlon and Jeremy H. Kidwell 2016-12-02 Chapter 1 Introduction to the module This module has been designed to enable independent learners to acquire knowledge about contemporary legal and methodological issues surrounding digital data and research, and to provoke some thinking on the big questions that underpin some of the more practical issues we discuss here. Ideally, a student should expect to invest at least four hours in independent learning activities (reading articles, writing reflections, etc.) over the course of a week. It should also be possible to work through this a bit more slowly and in more depth. We have tried to point to a range of resources, both those which are accessible and those which are challenging. It is also worth noting that this module is part of a broader effort underway at the University of Birmingham, “Birmingham Digital”, which has other modules working alongside this one. For the week, we have divided the content into five parts. Included here (and below) is a brief introduction to the course. There are four additional sessions, each with associated activities which explore more specific areas within the topic. Now let’s dive into the big issues together! We will explain more as we go along. "],
["can-you-use-stuff-online-for-research.html", "Chapter 2 Can you use stuff online for research? 2.1 iframe / video here 2.2 Video Transcript:", " Chapter 2 Can you use stuff online for research? In this video, we discuss the overarching question that occupies this module. You can watch the video, and read the transcript below. 2.1 iframe / video here 2.2 Video Transcript: JK: In this module, we’d like to focus on a particular question: AF: “Can you use stuff online for research?” JK: This is one of those questions which might seem simple at first hearing, but if recent controversies are any indication, it is anything but. AF: As we see it, the word “can” works in two different ways - “can you use stuff” might be framed as a “legal” question… JK: Or as an “ethical” one. Lucky for you, I’m an ethicist and he’s a lawyer. So we’re going to try and combine our expertise as we think together about the issues that arise around digital data and research. AF: The legal issues get complicated pretty fast: do we have the legal right or ability to use content found online? What are the legal aspects of using online materials for research? Can we simply use it, or do we need permission or a licence from the platform or website, from the publisher, or from the author, creator, or uploader of the content? What about the new data protection regime? What is our legal basis for processing personal data? Is the individual concerned aware of this? Do they need to be? JK: Of course, legal concerns overlap with ethical ones - the original lawyers (and some of the current ones) were philosophers after all. But there are some different emphases when we consider this question from an ethical perspective… the word at the start of our question changes a bit, perhaps from “can” to “should we use content we find online”? 
The content may be OK from the legal aspect, and it may be free from any copyright restrictions, but there may be ethical issues that arise in using the content or data for research purposes. In subsequent sessions, we’ll be talking about the idea of informed consent and how it gets more complicated in the digital realm, especially with the rise of social media. AF: For each of our sessions, we’ll be looking at examples and case studies from recent events to explore how legal and ethical aspects intersect to influence research using digital content. We will talk about a selfie monkey who raised some fundamental questions about the nature of copyright and ownership. We’ll explore the recent Cambridge Analytica case and the sometimes unexpected (or unauthorised) uses that social media data might be exposed to. We’ll also talk about the rise of “big data” and the problems surrounding “anonymous” data in this era of cloud computing. JK: We hope you’ll stick around for the whole module and work your way through the materials we’ve provided here. You’ll find that there is a bit of something for everyone, whether you’re completely new to the idea of research ethics, or you’re an old hand and want to dig deeper into some of the complexities of the issues. "],
["exploring-the-world-of-user-generated-data.html", "Chapter 3 Exploring the world of user-generated data 3.1 Video 3.2 Transcript (Jeremy Kidwell speaking)", " Chapter 3 Exploring the world of user-generated data 3.1 Video 3.2 Transcript (Jeremy Kidwell speaking) You’ve probably heard by now of the company “Cambridge Analytica”, recently renamed to Emerdata. As several media outlets reported in 2017, a little-known firm called Cambridge Analytica surprised many by claiming that their “revolutionary approach to data-driven communication has played such an integral part in President-elect Trump’s extraordinary win.” As details emerged, it became clear that this was not mere bluster, but that this firm had managed to amass a trove of personal data about individuals (as the Washington Post suggested, up to 5000 pieces of data on each American citizen) and then sought to nudge or manipulate voting behaviours by creating highly targeted content, including ads on major social media platforms and so-called “fake news” stories. Data ethics is always easier in hindsight, but I’d nonetheless like to look into the structure of this data collection to raise some issues about how data gets “out there” in the first place. Facebook is a central character in this story about data, and this isn’t surprising given their dominance of internet communication in recent years. In some surveys, more respondents claim to be using Facebook than the internet. While this is logically impossible (Facebook is merely a service which sits on top of the internet, at least for now), it points to the ubiquity of Facebook use. Given this centrality, it is sensible to begin here and see how things stand in terms of data. The story of privacy and data protection on Facebook is, to be generous, an evolving one. Much of the data that users put on Facebook was completely public until 2012, including the complete catalogue of your “likes”. 
For a company like Cambridge Analytica, this information was pure gold - enabling them to build up what psychologists call a “psychometric” profile using this data. If this information was on the internet in plain sight, could any user have assumed that their activity on Facebook was private? Should they have? Since likes were made private, Facebook has had a number of “gaffes” in which new features or bugs have forced this data back out into the public. Much of the reporting of the Cambridge Analytica scandal has referred to their access to data as a “breach”, implying that Facebook had been trying in good faith to keep the data that users generated private and that this company had found improper or possibly even illegal ways to harvest it, but this is actually quite misleading. Companies like CA (and it is worth noting that there are probably hundreds of other similar operations which have been harvesting similarly massive datasets) can put together millions of tiny pieces of information scattered across the internet - the number of contacts you have on a social network platform, or the number of profile pictures you’ve cycled through, can hint at personality traits. The controversial part that some persons are (in my opinion inaccurately) calling a breach relates to another approach that CA took up shortly after Facebook began to make its data privacy policies a bit less free-wheeling. They used Amazon’s Mechanical Turk platform, where companies can hire consultants to do tiny tasks for small sums of money, and paid people sometimes just a single pound (or dollar in this case) to answer a personality survey. Over 200k persons answered this survey, which had a hidden gem at the end - users were asked to share their Facebook profiles, with their (now private) likes and friends. Thousands complied unwittingly. Some people who took the survey complained to Amazon that this violated Amazon’s terms of service, but Amazon didn’t discontinue the surveys until more than a year later. 
Is this kind of data collection ethical? Well, I’ll get into these kinds of questions from the perspective of a researcher a bit later, after we hear about the monkey selfie. For now, I want us to start thinking about ourselves as generators of data. This is a good ethical exercise, to place ourselves in a situation and see how we feel - so that when we turn this dynamic around and begin to think of ourselves as collectors (and not producers) of data, we have some sensitivity to how things might be a bit complex. For this session, we’d like to have you try a few exercises which will get you acquainted with the idea of “terms and conditions”. You’ve likely seen dozens of T&Cs, as they’re called, by now, but because they’re all in legalese and often dozens of pages long, we hardly ever read them. In fact, the Guardian reported in 2011 that fewer than 7% of Britons ever read T&Cs and that one in ten would rather read the whole phone book. Another more recent study found that only 1 in 4 students take the time to read terms and conditions. Jonathan Obar at York University did a study which found that it would take the average user 40 minutes a day to actually read through the privacy and T&C documents in which they’re implicated. Yep, that’s 40 minutes out of every single day. Whether this situation is deliberate, as some scholars have suggested, or merely an unfortunate accident, there’s a problem here relating to user literacy of data policies. So we’re going to ask you to actually read through one of these documents and then to debrief on how this knowledge changes your perspective on putting your data on social network platforms. We’re also going to ask you to do an informal study of a digital chat. We hope you’ll find this exercise illuminating, and we look forward to telling you about that monkey selfie in our next session. "],
["exercise-1.html", "Chapter 4 Exercise 1", " Chapter 4 Exercise 1 As we mentioned in the video, we’d like you to begin by reading the Terms and Conditions for Facebook. Before you begin reading, write some notes on what you expect to find and how you think it will be structured. Open the T&Cs for Facebook here: (https://www.facebook.com/terms) When you’re finished, take a moment to reflect on your reactions to the T&Cs. Write at least two paragraphs summarising your experience of reading, what you found surprising, what you expected but didn’t find there, and generally what you took away from the experience of reading. Now, read over one of the following summaries which have been written about Facebook’s T&Cs: Vice, “I Asked a Privacy Lawyer What Facebook’s New Terms and Conditions Will Mean for You”; Huffington Post, “Didn’t Read Facebook’s Fine Print? Here’s Exactly What It Says”. It is worth noting that these two documents detail the terms and conditions in 2014-2015. A very important part of the story about online T&Cs, however, is that they are constantly changing (read more about this here). Taking the Facebook T&Cs document as an example for your own posting on various online platforms (or imagining that you do), how does this knowledge change your perspective on putting your data on social network platforms? If you’re participating in a course at Birmingham University, please share this reflection on the canvas forum and react in writing to what at least two other course participants have written. "],
["exercise-2-documentary-analysis.html", "Chapter 5 Exercise 2 - documentary analysis", " Chapter 5 Exercise 2 - documentary analysis For your next exercise, we’d like to stick with the Facebook theme and have you conduct an informal study of a digital chat in an online forum. Facebook has a range of privacy settings for their groups (you can read more about public, closed and secret fb groups here and here), but for this exercise, we’d like you to focus on one of the large public groups, for reasons which will become clear below. If you have a Facebook account, and don’t mind using it (we will completely understand if you don’t or prefer not to - see below for more on this), navigate to the Facebook group directory: (https://www.facebook.com/groups/?category=discover). For this exercise, you need to avoid groups which are “closed” or “private” even if you’re already a member. Facebook does not have a feature which you can use to search for a public (as opposed to private or closed) group, so your best bet is to run a search for the term “public group”. Be careful, some groups may have chat with offensive content. Try to avoid groups which centre around sensitive topics. For this example, we ran a search for “public group” and in the list which came up, one was a group called “Tea & Empathy (PUBLIC GROUP)” which the description tells us is “… a national, informal, peer-to-peer support network aiming to foster a compassionate and supportive atmosphere throughout the NHS.” Search for a different group, but make sure it has at least 500 members, so that there will be an active chat for the exercise below. Now, click on “discussion” on the bar to the left to filter just discussion postings. Spend an hour reading through the threads and taking some notes towards a documentary analysis. What types of information are people sharing in the group? Are any of the persons contributing data to this group sharing personal information? 
Now that you have a level of understanding of the Facebook T&Cs, can you find any data here that a person might not want to post if they had the same level of understanding? Do you think that it would be ethical for a scholarly researcher or marketing consultant to make use of this data? Write up a brief report (but at least two paragraphs - 8-10 sentences). This should be strictly anonymised, i.e. don’t mention any specific users in your report. Imagine that you are a consultant who has been hired to detect user literacy about platform T&Cs. Summarise what you’ve found and your appraisal of the nature of this online chat. Note: if you’d prefer to avoid using Facebook, here’s an alternative approach: check out one of the major alternative chat platforms. We’d recommend you start with something like StackExchange and browse to their directory of sites. Scroll through the Q&A and see if you can answer the questions as above. "],
["exercise-3-reading.html", "Chapter 6 Exercise 3 - reading!", " Chapter 6 Exercise 3 - reading! Based on what you’ve found provocative or interesting so far, spend some time reading further about the content we’ve discussed. There are a range of options we’ve highlighted below. "],
["reading.html", "Chapter 7 Reading:", " Chapter 7 Reading: Survey re: facebook users https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election https://www.fastcompany.com/40550423/how-facebook-blew-it https://www.npr.org/2016/08/23/491024846/do-you-read-terms-of-service-contracts-not-many-do-research-shows https://www.theatlantic.com/technology/archive/2012/03/reading-the-privacy-policies-you-encounter-in-a-year-would-take-76-work-days/253851/ http://journals.uic.edu/ojs/index.php/fm/article/view/7350 "],
["other-media.html", "Chapter 8 Other media:", " Chapter 8 Other media: Film: “Terms and Conditions May Apply” summarised in the Huffington Post "],
["copyright-licenses-and-data-as-property.html", "Chapter 9 Copyright, licenses, and data as property?", " Chapter 9 Copyright, licenses, and data as property? iframe / video here transcript below: Selfie monkey UK copyright talks about the author of a work being the person that creates the copyright work: the person that holds the pen or paintbrush, or the musician strumming the guitar. If the creator is employed or commissioned to produce the work, their contract may say something different. Copyright applies to written text-based works, images, music, film and various other media. It arises as soon as the content is created and does not require any registration, or expression of the right. It just exists. Copyright applies equally in the physical and digital world and typically lasts for 70 years after the death of the author. Copyright in a journal article is the same in the print and the e-version. Webpages and content found online are, then, subject to the usual copyright rules. Copyright requires no registration and it doesn’t matter if something has the copyright symbol on it or not - it is copyright protected. If it has copyright then someone owns it. If someone owns copyright, that gives them the right to control who has access to the work as well as if and how they can use it. Just because something is online and open for access, it does not necessarily mean that it is free to be used. Being in the public domain in the openly available, “out there” sense does not mean a work is Public Domain, that is, free from any rights. Copyright does allow some uses without the permission or consent of the author or owner. What this means is that although the owner can control who accesses and uses their work, the public can still use it in certain limited situations. These permitted acts include non-commercial research, and criticism and review. More details on these exceptions in a research context are provided in the articles complementing this section. 
Some of you may have heard of a photographer named David Slater who spent some time in the jungles of Indonesia. He found a troop of Crested Black Macaques and started taking pictures. Such is their nature that they were curious and started to explore the equipment, and ended up taking lots and lots of images. (insert Monkey selfie image & reference) I’m sure you’ll agree that the photo is fantastic, and David thought so too. When he got home he began emailing publishers and newspapers and eventually it was published in the UK press - that’s when the trouble started. The image was picked up by other blogs and newspapers and eventually by Wikipedia, who used it on their definition of the species. (Interestingly, Wikipedia also now has a page on the dispute itself.) David complained, claiming his copyright and licensing fees over the image. Is he right? Who owns the copyright? The photographer or the macaque? These are questions that have rumbled on since 2011, when David first released the image and subsequently when Wikipedia used the image. They found themselves at the heart of a public debate in the mainstream media when PETA sued David on behalf of the macaque, claiming the copyright on its behalf. Even Mock the Week got in on the action. While the copyright debate continues to puzzle seasoned Intellectual Property professionals and international media alike, David himself says the financial and personal cost to him has been huge. Speaking in July 2017, David said that he was unable to represent himself in the PETA case and was struggling financially, despite the image being internationally famous. Regardless of the copyright aspect, we need to be aware that our actions can impact on the livelihood of individuals and their dependents. What this all means, then, is that copyright in the digital world is a complex but important part of research literacy. 
Researchers and professionals should factor in copyright as an important element when considering what content they might use in a research context. We can see that copyright is not the only or most important consideration, despite what I might think. It should be balanced with other components of the research activity. We’d now like you to go through a couple of exercises looking at copyright in the digital world. Look back at the Facebook terms and think about who owns the data you post on Facebook. What permission does Facebook have to use your content? Your pictures? Were you aware of this? Following that, we’ll get you to look at copyright licences and consider how you might license your content if you were a copyright owner. We’ll also get you to explore the nuance of the Monkey Selfie in some more detail. "],
["confidentiality-anonymity-consent-and-privacy.html", "Chapter 10 Confidentiality, anonymity, consent, and privacy", " Chapter 10 Confidentiality, anonymity, consent, and privacy iframe / video here "],
["transcript-of-the-video.html", "Chapter 11 Transcript of the video:", " Chapter 11 Transcript of the video: Data is used in a huge variety of ways; in fact, with the rise of digital platforms, social media, and big data, as we highlighted in our second session, few aspects of our lives are “off limits”. To be fair, it’s not as if platforms like Google and Facebook have invented the idea of “data”; for decades scholarly researchers have been collecting and analysing data in ways that they hope will improve the quality of life for countless individual people. Let me give you one example of what I see as a benevolent kind of data collection. In recent years, DNA sequencing has become incredibly fast, efficient, and also inexpensive. Many issues with human health and disease have genetic components, and access to information about a range of human genomes can really help researchers to accelerate the pace of research and look for new areas of impact. Though there are some movements encouraging people to share personal genetic data openly (especially given the way some of these genomic databases are privatised and expensive to access), these are expectedly rare. Instead, in most cases people who contribute their personal genomic data to research do so on the expectation that their contributions will be made confidential or anonymous. I’m sure that some of you will already be aware of research concepts like confidentiality, but it’s helpful to pause and highlight what are three fundamental aspects of empirical research before we continue this discussion. The first “core concept” in research ethics is informed consent: regardless of privacy concerns, it is a fundamental right of any person participating in research that they consent to participation in a given study. Though we may imagine that consent can be implied sometimes, our ability to “read” people simply isn’t that trustworthy. Consent cannot be assumed. 
This means that research subjects must be informed of the study, how it will work, and what the data will be used for - and then given the opportunity to choose whether to participate or not. The research arm of Facebook conducted a study in 2012, the results of which were published in the journal PNAS. For a brief period, they deliberately skewed the news feed for 689,003 people to emphasise either positive emotional content or negative emotional content. Researchers wanted to see if this had an impact on the way users used the platform. What they found was not surprising: it did have an impact. What did surprise this research team was the outcry by a wide range of academic researchers suggesting (rightly) that they had not observed the principle of informed consent. You can read the eventual expression of “editorial concern” by PNAS editors to see their repentance for publishing this article. Informed consent isn’t the whole picture, however. In many cases, as I’ve already suggested above, users might not be willing to participate in a study if their responses were to be made public. So, one of the fundamental ways that researchers approach data ethics is to remove personally identifiable data. In some cases, identities are known to researchers but made confidential, such that the readers of research publications cannot identify individual persons. In other stricter cases, research subjects are anonymised, so that even researchers in their notes do not preserve the specific identity of the people contributing data. They are just “subject 1”, “subject 2” and so on. [lower thirds information with key terms here in videography] In 2013 a team of scientists from MIT released a study with shocking results. Through some clever data analysis techniques, they had found a way to de-anonymise a major genomic database. You can read more about this later. There have also been subsequent studies by other research teams de-anonymising other anonymised research databases. 
Much like Cambridge Analytica, these researchers gathered together millions of small fragments of data and then tested for compatibility against these databases. In the case of the work by this team at MIT, they were able to de-anonymise 50 out of 1000 subjects in that study. In other cases the rate has been much worse. So what does this mean for research ethics? Simply put, in this new internet age, we can’t just use “subject 1” and “subject 2”. Our computers are powerful enough, and our lives public enough, that clever people can reverse engineer them. There are two possible ways to respond to this new context for digital data. First, researchers can try to lock down their data sets, ensuring that they are stored in secure digital repositories. What makes a digital repository secure? A whole lot of things - good passwords (and not ones which need to be changed every six months), strong and consistent network security and data privacy policies, clear ways to classify the levels of sensitivity of a given data-set, well-trained staff… etc. I’ve included some articles you can read here if you want to learn a bit more about how to set up a proper data vault. But as the range of high-profile hacks in recent years indicates, organisations very rarely check all these boxes, and so even secured data can be at risk of breach. This continued risk has led some researchers to take up another option: to pursue alternative research methods which do not emphasise privacy and confidentiality, but rather active participation by research subjects. To be fair, this will only work in certain contexts, definitely not in the context of highly sensitive data or in studies conducted with vulnerable people. But many researchers are having innovative and surprising results with participatory studies. We can involve research subjects in the design of a study, enable them to help us interpret the results, and make their voices known. 
The advantage here is that a study turns from a faceless mass into a specific group of unique people. This is a brave new digital world that we’re working in, to be sure. And it can be easy to ignore the risks associated with data generation and use in our excitement for new research and horizons to be explored. Research is fun, exciting, and often empowering. I like to say that the task of a researcher is to help people tell their stories. We need to be sure that we do this in a way that empowers those individuals and not just ourselves. And this means that we need to make responsible choices about research design and data management. In the tasks for this session, we’re going to have you do a deep dive into some of the studies we’ve mentioned briefly here and then write up a brief reflection on what you’d like to make your own research design ethic. In the next session, we’re going to wrap up and try to explore some of the ways that we can move forward as researchers in the midst of these very complicated scenarios that we’ve shared with you so far. "],
["resources.html", "Chapter 12 Resources:", " Chapter 12 Resources: “Genealogy Databases Enable Naming of Anonymous DNA Donors” http://science.sciencemag.org/content/339/6117/262 The full technical report, “Identifying Personal Genomes by Surname Inference” http://www.pnas.org/content/pnas/111/24/8788.full.pdf PNAS Editorial expression of concern: https://www.hhs.gov/ohrp/regulations-and-policy/regulations/common-rule/index.html "],
["final-steps-how-do-we-decide-what-to-do.html", "Chapter 13 Final Steps - How do we decide what to do?", " Chapter 13 Final Steps - How do we decide what to do? "]
]