For social scientists, Amazon’s Mechanical Turk (MTurk) has been a gold mine, allowing rapid data collection for thousands of social science studies. This past summer, this gold mine seemed to be collapsing. Data quality appeared to have suddenly plummeted, with some researchers reporting high rates of inconsistent and even random responses. Until now, researchers have had to guess why.
By analyzing respondents’ IP addresses, we were able to identify one of the major culprits: respondents, many from Venezuela, using virtual private servers (VPSs) to fraudulently join studies. While the problem is not new, our analysis suggests that it may have spiked in the past several months, potentially undermining hundreds or even thousands of studies. We’ve developed tools to identify fraudulent respondents and to block them from future studies.
Here’s the background
MTurk is an Amazon online platform that allows people to recruit workers (called “Turkers”) from all over the world for brief paid tasks. (Amazon’s founder, Jeffrey P. Bezos, also owns The Washington Post.) Researchers post a request. Turkers accept it, complete the task and are paid a small amount. While MTurk did not start as a platform for conducting social science surveys, researchers increasingly use it because they can rapidly recruit survey participants at a fraction of the cost of traditional platforms. MTurk offers a fairly diverse though still limited sample.
Early studies suggested that using Turkers resulted in similar- or better-quality data than laboratory studies, student samples and even costly survey firm panels. So it’s little surprise that searching Google Scholar turns up over 15,000 studies mentioning MTurk since 2014.
And so researchers panicked when, this past August, a University of Minnesota graduate student, Max Hui Bai, posted a query on a Facebook group for psychology researchers, asking whether anyone else had noticed a drop in data quality. In short: Yes, overwhelmingly so. Scholars from around the world reported abnormalities in their data: Respondents providing inconsistent responses to questions, nonsensical responses to open-ended questions and well-established experiments showing aberrant results. The discussion spread across social media, making it into New Scientist and Wired.
Some researchers suspected that the problems were coming from “bots” (computer code that automatically answers questions) or “cyborgs” (code that automatically answers some questions and has humans answer others). But soon three researchers, Sean A. Dennis, Brian M. Goodson and Chris Pearson, determined that international respondents were answering questions intended only for U.S. respondents. These “survey farmers” falsified data and used virtual private servers (VPSs) to make it seem as though they were answering questions from inside the United States. These respondents, some of whom did not speak English well, were more likely to give random responses. All this warped inferences about U.S. voting behavior, public opinion or cultural norms, and seriously so: at least one study found that about 25 percent of its respondents were suspicious.
Here’s how we did our research
We conducted our own audit of 30 U.S. studies fielded on MTurk since 2013, encompassing 21,418 respondents. Using an IP traceback service (IP Hub), we found a severe spike in non-U.S. and VPS respondents since February 2018. This is bad news on two fronts. We found not only that almost 25 percent of our respondents in a September 2018 study had come from a non-U.S. IP address or a VPS, but also that far more studies than previously thought may have been affected.
Percent of fraudulent respondents in MTurk surveys from 2013 to 2018
Of course, some U.S. respondents may use a VPS to protect their privacy or when they are traveling, rather than to fill out surveys fraudulently. To test how much VPS use is problematic, we fielded two new surveys, covering 2,010 respondents. In both surveys we found that VPS users failed more attention and quality checks than any other group, including those responding from a foreign IP address.
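The comparison described above amounts to grouping respondents by how they connected and computing each group's failure rate on attention checks. Here is a minimal Python sketch of that tally. The group labels (`us`, `foreign`, `vps`), the field names and the sample records are all hypothetical and purely illustrative; this is not the authors' analysis code or data.

```python
from collections import defaultdict


def failure_rates(respondents):
    """Return {group: share of that group's respondents who failed the check}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in respondents:
        totals[r["group"]] += 1
        if not r["passed_check"]:
            failures[r["group"]] += 1
    return {group: failures[group] / totals[group] for group in totals}


# Made-up records for illustration only; not results from the surveys above.
sample = [
    {"group": "us", "passed_check": True},
    {"group": "us", "passed_check": True},
    {"group": "us", "passed_check": True},
    {"group": "us", "passed_check": False},
    {"group": "foreign", "passed_check": True},
    {"group": "foreign", "passed_check": False},
    {"group": "vps", "passed_check": False},
    {"group": "vps", "passed_check": False},
    {"group": "vps", "passed_check": True},
    {"group": "vps", "passed_check": False},
]

rates = failure_rates(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} failed")
```

In a real audit the groups would come from IP metadata, and a researcher would likely want a significance test rather than raw shares alone.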
Some researchers suggest that the fraudulent respondents were based in India. We looked at the number of international respondents who apparently forgot to turn on their VPSs, allowing us to see where they were connecting from. While a number of connections were from India, making up about 12 percent of the international IPs, even more (almost 18 percent) came from Venezuela.
We also combed online forums for MTurk users and found several Venezuelan Turkers who bragged about subverting the restrictions on international users. One detailed how he would acquire Amazon.com credit from MTurk, use that credit to purchase cellphones, and then have a friend in Miami ship him the cellphones.
All of this suggests that the MTurk crisis may have a surprising root: Venezuela’s economic crisis, with inflation heading toward 1 million percent. Some desperate Venezuelans are using online games to win virtual goods that they can sell for real money. Something similar seems to be happening on MTurk.
What can be done?
First, researchers must follow the standard best practices when conducting MTurk research. That includes setting the “HIT [an MTurk term for ‘task’] Approval Rate (%)” above 95 percent and the “Number of HITs Approved” to at least 100, which substantially improves data quality.
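For researchers who post tasks through code rather than the MTurk website, these two thresholds can be expressed as qualification requirements. The sketch below only builds the requirement list in the shape that the `QualificationRequirements` parameter of the boto3 MTurk client's `create_hit` call expects; it does not contact Amazon. The function name and its defaults are our own illustration, and the two system qualification IDs are the ones listed in Amazon's API documentation, so check them against the current docs before use.

```python
# System qualification type IDs as listed in Amazon's MTurk API documentation.
# Assumption: these are still current; verify before relying on them.
PERCENT_APPROVED_ID = "000000000000000000L0"   # HIT Approval Rate (%)
NUMBER_APPROVED_ID = "00000000000000000040"    # Number of HITs Approved


def quality_requirements(approval_rate_above=95, min_hits_approved=100):
    """Build a requirement list (hypothetical helper, not part of boto3)."""
    return [
        {
            # Approval rate strictly above the threshold.
            "QualificationTypeId": PERCENT_APPROVED_ID,
            "Comparator": "GreaterThan",
            "IntegerValues": [approval_rate_above],
        },
        {
            # At least this many approved HITs.
            "QualificationTypeId": NUMBER_APPROVED_ID,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_hits_approved],
        },
    ]


requirements = quality_requirements()
print(requirements)
```

The resulting list would be passed along with the other task settings when the HIT is created.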
But that may not be sufficient. We recommend that scholars audit IP address metadata to identify international and VPS users. Researchers should warn workers that their IP location will be checked and that “farmers” may not be paid.
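An audit of this kind boils down to labeling each respondent from IP metadata and counting how many are flagged. The Python sketch below assumes each IP address has already been looked up with a geolocation service, and that the lookup yields a country code and a VPS flag. The field names `worker_id`, `country_code` and `is_vps` are hypothetical, the records are made up, and this is not the interface of the authors' package.

```python
def classify(record, home_country="US"):
    """Label one respondent as 'vps', 'foreign' or 'ok' from IP metadata."""
    # A VPS flag takes precedence, because a VPS can hide the true location.
    if record["is_vps"]:
        return "vps"
    if record["country_code"] != home_country:
        return "foreign"
    return "ok"


def audit(records, home_country="US"):
    """Return ({worker_id: label}, share of respondents flagged)."""
    labels = {r["worker_id"]: classify(r, home_country) for r in records}
    flagged = [w for w, label in labels.items() if label != "ok"]
    share = len(flagged) / len(records) if records else 0.0
    return labels, share


# Made-up lookup results for illustration only.
lookups = [
    {"worker_id": "W1", "country_code": "US", "is_vps": False},
    {"worker_id": "W2", "country_code": "US", "is_vps": True},
    {"worker_id": "W3", "country_code": "VE", "is_vps": False},
    {"worker_id": "W4", "country_code": "US", "is_vps": False},
]

labels, flagged_share = audit(lookups)
print(labels, flagged_share)
```

Because some legitimate U.S. respondents use a VPS, a flag is better treated as a reason for closer inspection than as proof of fraud.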
Our group has developed a software package, available on GitHub, on CRAN or as an online app, that enables MTurk researchers to audit their own data. We have also developed a Qualtrics protocol for screening out such respondents.
MTurk is too valuable to abandon. Many researchers, especially graduate students, cannot afford other options. But these measures should make a difference.
Ryan Kennedy (@RyanKennedy7) is an associate professor of political science at the University of Houston, director of the Machine-Assisted Human Decision-making (MAHD) Lab and associate director for analytics for the Initiative on Sustainable Energy Policy (ISEP).
Scott Clifford (@ScottClif) is an associate professor of political science at the University of Houston.
Tyler Burleigh (@tylerburleigh) is a psychology research scientist at Data Cubed.
Philip Waggoner (@philipdwaggoner) is a visiting assistant professor of government and a faculty affiliate at the Social Science Research Methods Center at the College of William & Mary, and a research associate of the MAHD Lab.
Ryan Jewell (@RyMJewell) is a graduate student in political science at the University of Houston and a research associate of the MAHD Lab.
This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2017-17061500006. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. government. The U.S. government is authorized to reproduce and distribute reprints for governmental purposes, notwithstanding any copyright annotation therein.