# Exploring the world of user-generated data

## Video

## Transcript

(Jeremy Kidwell speaking)

You've probably heard by now of the company "Cambridge Analytica" ([recently renamed to Emerdata](https://www.theregister.co.uk/2018/05/02/cambridge_analytica_shutdown/)). As several media outlets reported in 2017, a little-known firm called Cambridge Analytica surprised many by claiming that their "revolutionary approach to data-driven communication has played such an integral part in President-elect Trump's extraordinary win." As details emerged, it became clear that this was not mere bluster: the firm had managed to amass a trove of personal data about individuals (as the Washington Post suggested, up to 5,000 pieces of data on each American citizen) and then sought to nudge or manipulate voting behaviours by creating highly targeted content, including ads on major social media platforms and so-called "fake news" stories. Data ethics is always easier in hindsight, but I'd nonetheless like to look into the structure of this data collection to raise some issues about how data gets "out there" in the first place.

Facebook is a central character in this story about data, and this isn't surprising given their dominance of internet communication in recent years. In some surveys, more respondents claim to use Facebook than claim to use the internet. While this is logically impossible (Facebook is merely a service which sits on top of the internet, at least for now), it points to the ubiquity of Facebook use. Given this centrality, it is sensible to begin our look here to see how things stand in terms of data.

The story of privacy and data protection on Facebook is, to be generous, an evolving one. Much of the data that users put on Facebook was completely public until 2012, including the complete catalogue of your "likes".
For a company like Cambridge Analytica, this information was pure gold, enabling them to build up what psychologists call a "psychometric" profile. If this information was on the internet in plain sight, could any user have assumed that their activity on Facebook was private? Should they have? Since likes were made private, Facebook has had a number of "gaffes" in which new features or [bugs](https://www.cnbc.com/2018/06/07/facebook-bug-made-private-posts-of-up-to-14-million-users-public.html) have forced this data back out into the public.

Much of the reporting of the Cambridge Analytica scandal has referred to their access of data as a "breach", implying that Facebook had been trying in good faith to keep user-generated data private and that this company had found improper or possibly even illegal ways to harvest it. This is actually quite misleading. Companies like CA (and it is worth noting that there are probably hundreds of [other similar operations](https://www.zdnet.com/article/data-firm-leaks-48-million-user-profiles-it-scraped-from-facebook-linkedin-others/) which have been harvesting similarly massive datasets) can put together millions of tiny pieces of information scattered across the internet. The number of contacts you have on a social network platform, or the number of profile pictures you've cycled through, can hint at personality traits.

The controversial part that some people are (in my opinion inaccurately) calling a breach relates to another approach that CA took, shortly after Facebook began to make its data privacy policies a bit less free-wheeling. They used [Amazon's Mechanical Turk platform](https://www.fastcompany.com/40548348/how-amazon-helped-cambridge-analytica-harvest-americans-facebook-data), where companies can hire people to do tiny tasks for small sums of money, and paid respondents sometimes just a single pound (or dollar, in this case) to answer a personality survey.
More than 200,000 people answered this survey, which had a hidden gem at the end: users were asked to share their Facebook profiles, with their (now private) likes and friends. Thousands complied unwittingly. Some people who took the survey complained to Amazon that this violated Amazon's terms of service, but Amazon didn't discontinue the surveys until more than a year later.

Is this kind of data collection ethical? Well, I'll get into these kinds of questions from the perspective of a researcher a bit later, after we hear about the monkey selfie. For now, I want us to start thinking about ourselves as generators of data. This is a good ethical exercise: we place ourselves in a situation and see how we feel, so that when we turn this dynamic around and begin to think of ourselves as collectors (and not producers) of data, we have some sensitivity to how complex things might be.

For this session, we'd like to have you try a few exercises which will get you acquainted with the idea of "terms and conditions". You've likely seen dozens of T&Cs, as they're called, by now, but because they're all in legalese and often dozens of pages long, we hardly ever read them. In fact, the Guardian reported in 2011 that [less than 7% of Britons](https://www.theguardian.com/money/2011/may/11/terms-conditions-small-print-big-problems) ever read T&Cs and that one in ten would rather read the whole phone book. Another [more recent study](https://www.theguardian.com/technology/2017/mar/03/terms-of-service-online-contracts-fine-print) found that only 1 in 4 students take the time to read terms and conditions. Jonathan Obar at York University did a study which found that it would take the average user 40 minutes a day to actually read through the privacy and T&C documents in which they're implicated. Yep, that's 40 minutes out of every single day.
Whether this situation is deliberate, as some scholars have suggested, or merely an unfortunate accident, there's a problem here relating to user literacy of data policies. So we're going to ask you to actually read through one of these documents and then to debrief on how this knowledge changes your perspective on putting your data on social network platforms. We're also going to ask you to do an informal study of a digital chat. We hope you'll find this exercise illuminating, and we look forward to telling you about that monkey selfie in our next session.

# Exercise 1

As we mentioned in the video, we'd like you to begin by reading the Terms and Conditions for Facebook. Before you begin reading, write some notes on what you expect to find and how you think it will be structured.

Open the T&Cs for Facebook here: <https://www.facebook.com/terms>

When you're finished, take a moment to reflect on your reactions to the T&Cs. Write at least two paragraphs summarising your experience of reading: what you found surprising, what you expected but didn't find there, and generally what you took away from the experience.

Now, read over one of the following summaries which have been written about Facebook's T&Cs:

- Vice, ["I Asked a Privacy Lawyer What Facebook's New Terms and Conditions Will Mean for You"](https://www.vice.com/en_us/article/kwp5vx/i-asked-a-lawyer-how-facebooks-new-terms-will-affect-my-online-life-183)
- Huffington Post, ["Didn't Read Facebook's Fine Print? Here's Exactly What It Says"](https://www.huffingtonpost.co.uk/entry/facebook-terms-condition_n_5551965?guccounter=1)

It is worth noting that these two documents detail the terms and conditions as they stood in 2014-2015. A very important part of the story about online T&Cs, however, is that they are constantly changing ([read more about this here](https://www.wired.com/story/facebook-a-history-of-mark-zuckerberg-apologizing/?mbid=BottomRelatedStories)).
Taking the Facebook T&Cs document as an example, and thinking about your own posting on various online platforms (or imagining that you post), how does this knowledge change your perspective on putting your data on social network platforms? If you're participating in a course at Birmingham University, please share this reflection on the Canvas forum and react in writing to what at least two other course participants have written.

# Exercise 2 - documentary analysis

For your next exercise, we'd like to stick with the Facebook theme and have you conduct an informal study of a digital chat in an online forum. Facebook has a range of privacy settings for their groups ([you can read more about public, closed and secret fb groups here](https://www.eff.org/deeplinks/2017/06/understanding-public-closed-and-secret-facebook-groups) and [here](https://www.lifewire.com/facebook-groups-4103720)), but for this exercise, we'd like you to focus on one of the large public groups, for reasons which will become clear below.

If you have a Facebook account, and don't mind using it (we will completely understand if you don't or prefer not to; see below for more on this), navigate to the Facebook group directory: <https://www.facebook.com/groups/?category=discover>

For this exercise, you need to avoid groups which are "closed" or "private", even if you're already a member. Facebook does not have a feature which you can use to search for a public (as opposed to private or closed) group, so your best bet is to run a search for the term "public group". Be careful: some groups may have chat with offensive content. Try to avoid groups which centre around sensitive topics.

For this example, we ran a search for "public group" and, in the list which came up, one was a group called "Tea & Empathy (PUBLIC GROUP)", which the description tells us is "... a national, informal, peer-to-peer support network aiming to foster a compassionate and supportive atmosphere throughout the NHS."
Search for a different group, but make sure it has at least 500 members, so that there will be an active chat for the exercise below.

Now, click on "discussion" on the bar to the left to filter just the discussion postings. Spend an hour reading through the threads and taking some notes towards a documentary analysis. What types of information are people sharing in the group? Are any of the people contributing to this group sharing personal information? Now that you have a level of understanding of the Facebook T&Cs, can you find any data here that a person might not want to post if they had the same level of understanding? Do you think that it would be ethical for a scholarly researcher or marketing consultant to make use of this data?

Write up a brief report (but at least two paragraphs, 8-10 sentences). This should be strictly anonymised, i.e. don't mention any specific users in your report. Imagine that you are a consultant who has been hired to detect user literacy about platform T&Cs. Summarise what you've found and your appraisal of the nature of this online chat.

Note: if you'd prefer to avoid using Facebook, here's an alternative approach: check out one of the major alternative chat platforms. We'd recommend you start with something like StackExchange and browse to their [directory of sites](https://stackexchange.com/sites#lifearts). Scroll through the Q&A and see if you can answer the same questions as above.

# Exercise 3 - reading!

Based on what you've found provocative or interesting so far, spend some time reading further about the content we've discussed. There is a range of options, which we've highlighted below.
# Reading:

- Survey re: Facebook users
- https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win
- https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election
- https://www.fastcompany.com/40550423/how-facebook-blew-it
- https://www.npr.org/2016/08/23/491024846/do-you-read-terms-of-service-contracts-not-many-do-research-shows
- https://www.theatlantic.com/technology/archive/2012/03/reading-the-privacy-policies-you-encounter-in-a-year-would-take-76-work-days/253851/
- http://journals.uic.edu/ojs/index.php/fm/article/view/7350

# Other media:

- Film: ["Terms and Conditions May Apply"](http://tacma.net/) summarised in the [Huffington Post](https://www.huffingtonpost.com/mark-weinstein/terms-and-conditions-may-_b_3692883.html)