Earning While Learning: How to Run Batched Bandit Experiments

IZA Discussion Paper No. 18429

February 2026

Jan Kemper, Davud Rostam-Afschar

Researchers typically collect experimental data sequentially, allowing early outcome observations and adaptive treatment assignment to reduce exposure to inferior treatments. This article reviews multi-armed-bandit adaptive experimental designs that balance exploration and exploitation. Because data collected adaptively through bandit algorithms violate standard asymptotics, inference is challenging. We implement an estimator that yields valid heteroskedasticity-robust confidence intervals in batched bandit designs and compare coverage in Monte Carlo simulations. We introduce bbandits for Stata, a tool for designing experiments via simulation, running interactive bandit experiments, and implementing and analyzing adaptively collected data. bbandits includes three common assignment algorithms (ε-first, ε-greedy, and Thompson sampling) and supports estimation, inference, and visualization.
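To make the three assignment rules named in the abstract concrete, here is a minimal Python sketch of a batched bandit with binary outcomes. It is an illustration under stated assumptions, not the bbandits implementation: the function names (`assignment_probs`, `run_experiment`), the uniform first batch, the Beta(1, 1) priors, and the Monte Carlo approximation of the Thompson probabilities are all hypothetical choices made for this example, and the paper's definitions may differ in detail. The sketch covers assignment only and does not implement the paper's estimator or confidence intervals.

```python
import random


def best_arm(successes, failures):
    """Index of the arm with the highest observed success rate (ties: lowest index)."""
    rates = [s / (s + f) if s + f > 0 else 0.0 for s, f in zip(successes, failures)]
    return max(range(len(rates)), key=rates.__getitem__)


def assignment_probs(rule, successes, failures, batch_index, rng,
                     eps=0.1, explore_batches=1, draws=2000):
    """Per-arm assignment probabilities for the next batch (illustrative only)."""
    k = len(successes)
    uniform = [1.0 / k] * k
    if batch_index == 0:
        return uniform  # assumption: no data yet, so randomize equally
    if rule == "eps_first":
        # Explore uniformly for the first batches, then commit to the leader.
        if batch_index < explore_batches:
            return uniform
        probs = [0.0] * k
        probs[best_arm(successes, failures)] = 1.0
        return probs
    if rule == "eps_greedy":
        # Share eps is spread over all arms; the rest goes to the current leader.
        probs = [eps / k] * k
        probs[best_arm(successes, failures)] += 1.0 - eps
        return probs
    if rule == "thompson":
        # Probability that each arm is best under Beta(1 + s, 1 + f) posteriors,
        # approximated by repeated posterior draws.
        wins = [0] * k
        for _ in range(draws):
            sample = [rng.betavariate(1 + s, 1 + f)
                      for s, f in zip(successes, failures)]
            wins[max(range(k), key=sample.__getitem__)] += 1
        return [w / draws for w in wins]
    raise ValueError(f"unknown rule: {rule}")


def run_experiment(rule, true_rates, n_batches=10, batch_size=100, seed=0, **kwargs):
    """Simulate a batched experiment; probabilities update only between batches."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for b in range(n_batches):
        probs = assignment_probs(rule, successes, failures, b, rng, **kwargs)
        arms = rng.choices(range(k), weights=probs, k=batch_size)
        for arm in arms:  # outcomes are revealed after the batch is assigned
            if rng.random() < true_rates[arm]:
                successes[arm] += 1
            else:
                failures[arm] += 1
    return successes, failures
```

Calling, for example, `run_experiment("thompson", [0.2, 0.5, 0.8])` returns per-arm success and failure counts, from which the share of units assigned to each arm can be compared across the three rules. The key batched feature is that assignment probabilities stay fixed within a batch and change only once that batch's outcomes are in.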

Keywords

randomized controlled trial, causal inference, multi-armed bandits, experimental design, machine learning

JEL Codes

C1, C11, C12, C13, C15, C18, C8, C87, C88, C9, D83