Facebook recently announced that it is interested in recruiting sociologists to conduct research for its ever-growing empire. The particular article I am referring to is found here. The article notes that the two most enticing factors for sociologists are Facebook’s large data sets and the naturalness of the data. The latter claim is unsubstantiated and is the point of interest for this article. The article did not mention any particular types of data or research questions, so I assume that it was referring to all Facebook data. Below is an excerpt from the VentureBeat article.
The data set available at Facebook is incredible. One reason is just the sheer scale of the data. While sociologists usually don’t have the resources to interview or survey millions of people, Facebook has data generated every day by its 802 million daily active users.
The second reason is the naturalness of the data. Sociologists typically use interview, survey, and ethnography to collect data.
“So I give you a survey you fill it out, which is very artificial,” said Laura Nelson, PhD candidate in Sociology at UC Berkeley, in an interview with VentureBeat.
“Whereas ethnography, as soon as you walk into the room, you change that room, because you are a foreign presence. There’s a scientist in the room. People get self-conscious. They don’t act naturally.”
In comparison, Facebook data is not influenced by the presence of a social science researcher. “It has no artificially construct, you are not bringing people to the lab,” Nelson said. “So you are recording social interaction in real time as it occurs completely naturally.”
I agree that Facebook has access to large amounts of data, which is great for social scientists, but that data is by no means more natural than survey data, and it is hardly natural itself. Nelson claims that with Facebook you are recording “social interaction in real time as it occurs completely naturally.” If the goal is to use Facebook data to make generalizations about society outside of the internet, then this statement could not be further from the truth. The only naturalness about Facebook data is that it is naturally artificial. As of this moment, without improvements to Facebook’s coding and data collection methods, the only thing Facebook data is good for is telling you how people use Facebook. The data is not appropriate for telling you how typical variables (e.g., race, gender, sexuality, income, education) affect one another outside of Facebook.
Most of Facebook's data fits the definition of artificial (the opposite of natural) to a T. Merriam-Webster defines artificial as “not happening or existing naturally: created or caused by people.” Even the more "natural" data Facebook gathers from spying on the websites we visit, what we buy, who we talk to, and the like is incomplete, because it only captures what we do while on the internet. Facebook only knows what Facebook sees, and for now that is limited to internet activities. If researchers are asking research questions outside of the scope of online interaction, then here is why Facebook data as of 2014, generally speaking, is artificial and not natural.
Facebook gives users a space to put forth their best self-image. Erving Goffman, one of the most cited sociologists, studied the presentation of self throughout his career and once wrote:
When an individual appears before others, his actions will influence the definition of the situation which they come to have. Sometimes the individual will act in a thoroughly calculating manner, expressing himself in a given way solely in order to give the kind of impression to others that is likely to evoke from them a specific response he is concerned to obtain (Goffman, 1959).
Goffman called this process face-work. He defines face as “…the positive social value a person effectively claims for himself by the line others assume he has taken during a particular contact. Face is an image of self delineated in terms of approved social attributes.” In simpler terms, we can refer to face as one’s self-image. Not only do users manage their impressions online (as they do in person), but they also have more time to carefully construct their performance before clicking the post button. Herein lies the problem with Facebook data: it is heavily influenced by others.
Knowing that potentially hundreds of others will be forming opinions about your character influences what you share on Facebook. Facebook users often embellish their best qualities and even fabricate qualities or achievements. How is this not artificial? For example, Facebook users are not likely to update their employment status if they are fired or their education status if they drop out, because doing so would give a bad impression of them. How then can we say that Facebook data is not artificial when users’ actions are highly influenced by others’ perceptions? At least survey participants can fill out their surveys anonymously and not be influenced by hundreds or thousands of watchful friends.
While this article only scratches the surface of the theory and research that address social networking and social media, it demonstrates that Facebook data is anything but natural and that social scientists and Facebook should consider whether this is a worthy venture or a waste of time. I have a few ideas as to how Facebook can improve its data collection, but why would anyone want to help a corporation invade people’s privacy any further than it already has? Social scientists beware.