August 2022

IZA DP No. 15478: We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments

Amazon Mechanical Turk is a very widely used tool in business and economics research, but how trustworthy are the results of well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020, we find evidence of widespread p-hacking, publication bias and over-reliance on results from plausibly under-powered studies. Even setting aside questions about the characteristics and behaviors of study recruits, the conduct of the research community itself substantially erodes the credibility of these studies' conclusions. The extent of the problems varies across the business, economics, management and marketing research fields, with marketing especially afflicted. The problems are not getting better over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility.