2025

Why Language Models Hallucinate

Edwin Zhang


AI summary

This paper argues that hallucinations in language models stem from training and evaluation procedures that reward guessing over acknowledging uncertainty. It connects generative errors to binary classification errors and proposes rescoring existing benchmarks so that they no longer penalize expressions of uncertainty, with the aim of more trustworthy AI systems.

Main Contributions

  • Identifies that language models hallucinate due to training and evaluation procedures rewarding guessing over acknowledging uncertainty.
  • Analyzes hallucinations as errors in binary classification, showing they arise from natural statistical pressures in pretrained language models.
  • Argues that hallucinations persist because most evaluations grade models like test-takers: guessing when uncertain improves the score, so models are optimized to guess.
  • Proposes a socio-technical mitigation: modifying the scoring of existing misaligned benchmarks to stop penalizing uncertain responses, rather than introducing new hallucination evaluations.
  • Shows how a novel connection between supervised and unsupervised learning explains the origin of hallucinations, even when the training data includes "I don't know" (IDK) responses.

Abstract

Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such "hallucinations" persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline. Hallucinations need not be mysterious—they originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist due to the way most evaluations are graded—language models are optimized to be good test-takers, and guessing when uncertain improves test performance. This "epidemic" of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems.
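The abstract's claim that guessing when uncertain improves test performance can be illustrated with a small expected-score calculation. The sketch below is an illustration under stated assumptions, not the paper's scoring rule: the function names, the per-question scoring (1 point for a correct answer, 0 for abstaining, minus `wrong_penalty` for a wrong answer), and the example penalty values are all hypothetical.

```python
def expected_guess_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score for answering a question the model gets right with
    probability p_correct. Assumed scoring: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Pick the action with the higher expected score.
    Abstaining ("I don't know") is assumed to score exactly 0."""
    if expected_guess_score(p_correct, wrong_penalty) > 0.0:
        return "guess"
    return "abstain"


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence above which guessing beats abstaining.
    Solves p - (1 - p) * wrong_penalty = 0 for p."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # Binary grading (no penalty): guessing wins at any nonzero confidence.
    print(best_action(0.1, wrong_penalty=0.0))
    # Penalized grading: the same low-confidence guess now loses to abstaining.
    print(best_action(0.1, wrong_penalty=1.0))
    print(break_even_confidence(1.0))
```

With no penalty for wrong answers, the break-even confidence is 0, so a score-maximizing model never benefits from abstaining. Any positive penalty raises that threshold, which is the kind of scoring change the abstract argues for in existing benchmarks.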


