We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments

IZA Discussion Paper No. 15478

August 2022

Abel Brodeur, Nikolai Cook, Anthony Heyes

Amazon Mechanical Turk is a very widely used tool in business and economics research, but how trustworthy are results from well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020, we find evidence of widespread p-hacking, publication bias and over-reliance on results from plausibly under-powered studies. Even ignoring questions arising from the characteristics and behaviors of study recruits, the conduct of the research community itself substantially erodes the credibility of these studies' conclusions. The extent of the problems varies across the business, economics, management and marketing research fields (with marketing especially afflicted). The problems are not getting better over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility.

Keywords

online crowd-sourcing platforms, Amazon Mechanical Turk, p-hacking, publication bias, statistical power, research credibility

JEL Codes

B41, C13, C40, C90