AI Models Learn to Admit Ignorance, Curbing Overconfidence and Hallucinations

South Korean researchers have developed a new training method that enables artificial intelligence models to acknowledge when they do not know something, a behaviour that mirrors human cognition. This breakthrough, achieved by scientists at the Korea Advanced Institute of Science and Technology (KAIST), could significantly enhance the reliability of AI systems used in high-stakes fields such as autonomous driving and medical diagnosis.

The Problem of AI Overconfidence

Previous studies have highlighted AI overconfidence as a major risk, particularly when these tools are employed for decision-making in critical areas like healthcare. Widely used models, including OpenAI's ChatGPT, are known to "hallucinate"—generating fabricated facts—because they are incentivised to guess rather than admit ignorance. This overconfidence stems from the way artificial neural networks, the backbone of AI, learn from initial data. Small errors introduced during this phase can propagate and amplify through subsequent training, leading to significant inaccuracies.

New Training Method Inspired by the Human Brain

To address this issue, the KAIST team looked to the human brain for inspiration. In humans, brain signals are generated even before birth, without external input, which helps the brain manage uncertainty. Mimicking this process, the researchers developed a system where the neural network undergoes brief pre-training with random noise inputs before actual learning begins. This warm-up phase allows the AI to establish a baseline of uncertainty, setting its initial confidence to a low level—close to chance—and substantially reducing overconfidence bias.
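
The study's exact implementation is not reproduced here, but the core idea can be sketched in a few lines. The PyTorch snippet below is a minimal, hypothetical illustration: before any real training, a small classifier is briefly optimised on pure random noise with uniform (chance-level) targets, so that its starting confidence sits near 1/N for N classes. The architecture, step count, and learning rate are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of "warm-up on noise": briefly pre-train on
# random inputs with uniform targets so the network begins from a
# low-confidence, near-chance state. All hyperparameters are assumptions.

NUM_CLASSES = 10
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_CLASSES),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Uniform target distribution: the network should "know nothing yet".
uniform = torch.full((1, NUM_CLASSES), 1.0 / NUM_CLASSES)

for step in range(500):  # brief warm-up phase before real learning
    noise = torch.randn(64, 1, 28, 28)  # random noise inputs, no real data
    log_probs = F.log_softmax(model(noise), dim=1)
    # Pull predictions toward the uniform (chance-level) distribution.
    loss = F.kl_div(log_probs, uniform.expand(64, -1), reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Ordinary supervised training on real data would follow this warm-up.
```

On this reading, the warm-up does not teach the network anything about the task; it only ensures that training starts from calibrated ignorance rather than from arbitrary, potentially overconfident initial predictions.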

"While conventional models tend to give incorrect answers with high confidence even for data they have not encountered during training, models with warm-up training showed a clear improvement in their ability to lower confidence and recognise that they 'do not know'," the researchers explained. This approach helps the AI first learn the state of "I don't know anything yet," enabling it to distinguish between what it knows and what it does not know.

Implications for AI Reliability

The study, published in the journal Nature Machine Intelligence, demonstrates that incorporating key principles of brain development can make AI more human-like in recognising its own knowledge state. "This is important because it helps AI understand when it is uncertain or might be mistaken, not just improve how often it gives the right answer," said Se-Bum Paik, an author of the study. The method could lead to more trustworthy AI systems, particularly in applications where overconfidence could have serious consequences.
