A large body of research across management, psychology, accounting, and economics shows that subjective performance evaluations are systematically biased: ratings cluster near the midpoint of scales and are often excessively lenient. As organizations increasingly adopt large language models (LLMs) for evaluative tasks, little is known about how these systems perform when assessing human performance. We document that, in the absence of clear objective standards and when individuals are rated independently, LLMs reproduce the familiar patterns of human raters. However, LLMs generate greater dispersion and accuracy when evaluating multiple individuals simultaneously. With noisy but objective performance signals, LLMs provide substantially more accurate evaluations than human raters, as they (i) are less subject to biases arising from concern for the evaluated employee and (ii) make fewer mistakes in information processing, closely approximating rational Bayesian benchmarks.
Rilke, R. M. & Sliwka, D. (2026). When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases? IZA Discussion Paper, 18371.
Chicago
Rainer Michael Rilke and Dirk Sliwka. "When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases?" IZA Discussion Paper, No. 18371 (2026).
Harvard
Rilke, R. M. and Sliwka, D., 2026. When Algorithms Rate Performance: Do Large Language Models Replicate Human Evaluation Biases? IZA Discussion Paper, 18371.