IZA@LISER Network

IZA@LISER Network | June 3, 2026

New research reveals that while AI mimics the human habit of being "too nice" in subjective reviews, it significantly outperforms us when evaluations are grounded in objective data.

AI is often assumed to be entirely objective. A recent IZA Discussion Paper by Rainer Michael Rilke and Dirk Sliwka provides the first systematic evidence on how large language models (LLMs) behave when evaluating human performance, and on whether they replicate or reduce the well-known biases observed when human managers rate employees.

Why AI hesitates to give low ratings

The authors show that when performance information is subjective or ambiguous, LLMs tend to behave much like human supervisors: they avoid the lowest rating categories, cluster heavily around the midpoint of the scale, and display a clear tendency toward leniency. This becomes especially visible when the model is asked to rate S&P 500 CEOs. Even when instructed to assign 20 percent of CEOs to each rating category, the LLM almost never uses the lowest category, mirroring the reluctance of human evaluators to issue very negative assessments.
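
The forced-distribution check described above can be made concrete with a minimal Python sketch. This is not the authors' code: the function name `category_shares`, the five-point scale, and the sample ratings are illustrative assumptions. It tallies how often each category is used and compares that with the 20 percent target.

```python
from collections import Counter


def category_shares(ratings, n_categories=5):
    """Return the share of ratings falling in each category 1..n_categories."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total = len(ratings)
    return {c: counts.get(c, 0) / total for c in range(1, n_categories + 1)}


# Hypothetical ratings (1 = lowest, 5 = highest); not data from the paper.
sample = [3, 3, 4, 4, 5, 3, 2, 4, 3, 5]
shares = category_shares(sample)

# Under a forced distribution, every category should receive 20 percent.
target = 1 / 5
gaps = {c: round(share - target, 2) for c, share in shares.items()}
```

In this made-up sample the lowest category is never used and the middle categories are overused, which is the kind of pattern the article describes.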

Judging groups vs. individuals

The researchers also tested whether LLMs become more discerning when they evaluate several individuals at once rather than one at a time. The results mirror decades of psychological research on human raters: the model differentiates more when assessing groups of three or five CEOs simultaneously. Ratings spread out more, and relative differences become clearer. Yet the fundamental leniency persists, suggesting that the model’s learned habits, shaped by overwhelmingly positive or neutral human-written texts, continue to dominate whenever objective standards are missing.

The job application experiment

To introduce clearer benchmarks, the researchers also tested the AI on job applications whose quality levels were artificially constructed. An LLM evaluated these applications without knowing their true quality. Once again, individual evaluations show strong leniency and limited use of the lower categories. Comparative evaluations, however, lead to more variation and better alignment with the intended distribution, especially when the rating scale explicitly ties each score to a percentile range. Still, the model remains hesitant to classify any application as belonging to the bottom 20 percent, even when prompted to do so.
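
A rating scale that ties each score to a percentile range can be sketched as follows. This is a hedged illustration only: the function names, the five equal 20-percent bands, and the tie-breaking by sort order are assumptions, not details taken from the paper.

```python
def percentile_to_rating(percentile, n_categories=5):
    """Map a percentile in [0, 100] to a rating 1..n_categories of equal width."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must lie between 0 and 100")
    width = 100.0 / n_categories
    # The 100th percentile would land one band too high, so cap it.
    return min(int(percentile // width) + 1, n_categories)


def forced_distribution(scores, n_categories=5):
    """Rank items by score and assign ratings so each band gets an equal share."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    ratings = [0] * n
    for rank, idx in enumerate(order):
        ratings[idx] = rank * n_categories // n + 1
    return ratings


# Hypothetical quality scores for five applications.
ratings = forced_distribution([50, 10, 90, 30, 70])
```

With such a rule, the weakest fifth of a comparison set always receives the lowest rating by construction, which is exactly the assignment the model remains reluctant to make.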

The power of objective data

The most decisive evidence comes from a controlled experiment in which human raters evaluated workers based on noisy but objective performance signals. Here, the LLM receives exactly the same information as the human evaluators. In this setting, the model performs remarkably well. It produces ratings that are substantially more accurate than those of human raters, shows no leniency bias, and closely approximates the mathematical ideal that represents the best possible use of the available information. Unlike humans, the LLM is unaffected by whether its rating influences a worker’s bonus, indicating that it does not display the social concerns or favoritism that often distort human evaluations.
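
The article does not spell out the mathematical ideal. One standard candidate for "best possible use" of a noisy signal is the Bayesian posterior mean, sketched below under the assumption that true performance and measurement noise are both normally distributed. The function name and all parameter values are hypothetical and are not taken from the paper.

```python
def posterior_mean(signal, prior_mean, prior_var, noise_var):
    """Posterior mean of true performance given one noisy signal.

    Assumes true performance ~ Normal(prior_mean, prior_var) and
    signal = true performance + noise, with noise ~ Normal(0, noise_var).
    """
    if prior_var < 0 or noise_var < 0 or prior_var + noise_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # Weight on the signal: high when the signal is precise, low when noisy.
    weight = prior_var / (prior_var + noise_var)
    return prior_mean + weight * (signal - prior_mean)


# Hypothetical example: average performance 5, observed signal 10,
# and the signal is as noisy as performance is dispersed.
estimate = posterior_mean(signal=10.0, prior_mean=5.0, prior_var=1.0, noise_var=1.0)
```

Under these assumptions the ideal rating shrinks a noisy signal toward the average, and shrinks it more the noisier the signal is. A lenient rater, by contrast, would sit systematically above this benchmark.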

What this means for management

Taken together, the findings reveal a clear pattern. When performance is subjective and evaluators must rely on general impressions, LLMs reproduce familiar human biases. When performance information is structured, comparable, and at least partly objective, LLMs can significantly outperform human raters. They process information more consistently and without social or emotional distortions. The results highlight both the promise and the limitations of using LLMs in organizational performance management. They are not a remedy for the challenges of subjective evaluation, but they can meaningfully improve accuracy in settings where objective signals exist and can be systematically interpreted.

Download the full paper here.

The IZA@LISER Network is a global community of scholars dedicated to excellence in labor economics and related fields, now coordinated at the Luxembourg Institute of Socio-Economic Research (LISER) following its transition from Bonn.
