Artificial Jagged Intelligence: When AI Benchmarks Misstate Deployment Value

Joshua S. Gans

doi:10.3386/w34712

Working Paper 34712

Issue Date January 2026

Revision Date June 2026

Organisations increasingly select and deploy artificial intelligence systems on the strength of public benchmarks. A benchmark, however, scores a system on a single distribution of tasks, whereas each organisation meets its own. Because AI performance is uneven across tasks, a property called artificial jagged intelligence, these distributions diverge, and a system that looks reliable on average can fail on the tasks a given workflow uses most. We model this gap and show that it is not noise but a predictable exposure effect: deployment loss exceeds benchmark loss exactly when the tasks an organisation uses most are those the system handles worst. This single mechanism links managerial choices usually studied in isolation. It governs when to roll out a system, where to direct scarce reliability investment, whether to audit one’s own task mix before committing, and when to verify outputs after deployment. Better information about the workflow redirects investment towards targeted fixes whose value a public benchmark hides. The same logic explains why a single benchmark score is not enough: providers should report performance by task category so that organisations can reweight it for their own use.
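The exposure effect described above can be illustrated with a small reweighting exercise. The following is a minimal sketch, not the paper's model: it assumes that loss is a usage-weighted average of per-category error rates, and the category names, error rates, and task mixes are all hypothetical values chosen for illustration.

```python
def weighted_loss(error_by_category, weights):
    """Expected error rate under a given task mix (weights must sum to 1)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * error_by_category[c] for c in weights)


# Hypothetical per-category error rates of the kind a provider might report.
error_by_category = {"drafting": 0.02, "summarising": 0.05, "calculation": 0.30}

# Hypothetical task mixes: the benchmark weights categories equally,
# while this organisation's workflow leans on the weakest category.
benchmark_mix = {"drafting": 1 / 3, "summarising": 1 / 3, "calculation": 1 / 3}
workflow_mix = {"drafting": 0.1, "summarising": 0.2, "calculation": 0.7}

benchmark_loss = weighted_loss(error_by_category, benchmark_mix)
deployment_loss = weighted_loss(error_by_category, workflow_mix)

# Positive gap: the workflow is over-exposed to the tasks handled worst.
exposure_gap = deployment_loss - benchmark_loss

print(f"benchmark loss:  {benchmark_loss:.3f}")
print(f"deployment loss: {deployment_loss:.3f}")
print(f"exposure gap:    {exposure_gap:.3f}")
```

Under these assumed numbers the system looks acceptable on the benchmark mix yet performs markedly worse on the workflow mix, because usage is concentrated where errors are highest. Reversing the workflow weights would make the gap negative, which is why a per-category report is more useful to an organisation than a single score.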

This paper replaces an earlier version of the paper titled “A Model of Artificial Jagged Intelligence.” Thanks to Tom Cunningham for helpful comments. Research assistance from Refine.ink, Claude Opus 4.8, and ChatGPT 5.5 is acknowledged. Funding from the SSHRC is gratefully acknowledged. Responsibility for all errors remains my own. The views expressed herein are those of the author and do not necessarily reflect the views of the National Bureau of Economic Research.

Joshua S. Gans
Joshua Gans has drawn on the findings of his research for both compensated speaking engagements and consulting engagements. He has written the books Prediction Machines, Power & Prediction, and Innovation + Equality on the economics of AI, for which he receives royalties. He is also chief economist of the Creative Destruction Lab, a University of Toronto-based program that helps seed-stage companies, from which he receives compensation. He consults on antitrust and intellectual property matters through his association with Keystone Strategy and through Core Economic Research Ltd., which he owns. He also has equity and advisory relationships with a number of startup firms. Joshua is also a co-founder of All Day TA.

Joshua S. Gans, "Artificial Jagged Intelligence: When AI Benchmarks Misstate Deployment Value," NBER Working Paper 34712 (2026), https://doi.org/10.3386/w34712.

- January 13, 2026
