TY - RPRT
AU - Kemper, Jan
AU - Rostam-Afschar, Davud
TI - Earning While Learning: How to Run Batched Bandit Experiments
PY - 2026/Feb/
PB - Institute of Labor Economics (IZA)
CY - Bonn
T2 - IZA Discussion Paper
IS - 18429
UR - https://www.iza.org/publications/dp18429
AB - Researchers typically collect experimental data sequentially, allowing early outcome observations and adaptive treatment assignment to reduce exposure to inferior treatments. This article reviews multi-armed-bandit adaptive experimental designs that balance exploration and exploitation. Because experimental data collected adaptively through bandit algorithms violate standard asymptotics, inference is challenging. We implement an estimator that yields valid heteroskedasticity-robust confidence intervals in batched bandit designs and compare coverage in Monte Carlo simulations. We introduce bbandits for Stata, a tool for designing experiments via simulation, running interactive bandit experiments, and analyzing adaptively collected data. bbandits includes three common assignment algorithms (ε-first, ε-greedy, and Thompson sampling) and supports estimation, inference, and visualization.
KW - randomized controlled trial
KW - causal inference
KW - multi-armed bandits
KW - experimental design
KW - machine learning
ER -