ChatGPT Goblin Obsession Reveals AI Training Flaws

OpenAI has tracked down a peculiar bug that led ChatGPT to develop an unusual fixation on goblins and other mythical creatures. Over the past six months, mentions of the word 'goblin' skyrocketed in ChatGPT's responses, even when users posed unrelated queries. The phenomenon prompted an investigation by OpenAI researchers, who determined that the issue "crept in subtly" following the release of a new ChatGPT model last November.

The Rise of the Goblin

The new model, designed to be "smarter and more conversational" than its predecessors, introduced personality settings such as 'Nerdy', 'Candid', and 'Quirky'. Shortly after its debut, users and researchers began noticing a recurring pattern: ChatGPT repeatedly mentioned goblins, gremlins, and other fantasy creatures in its responses.

According to OpenAI's blog post, "Starting with GPT-5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors." The company explained that "we unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread."
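OpenAI has not published the reward code involved, but a toy sketch illustrates how such a skew could arise. Everything below is invented for illustration: the point is simply that if the scoring signal quietly pays a bonus whenever a metaphor features a creature, the model being trained learns to produce more creature metaphors without anyone intending it.

```python
# Illustrative sketch only; these names and weights are invented, not OpenAI's.
CREATURE_WORDS = {"goblin", "gremlin", "imp", "troll"}

def playfulness_reward(response: str) -> float:
    """Score a response for 'playful metaphor' quality.

    Suppose the rating process pays a hidden bonus whenever a metaphor
    features a mythical creature. The model under training never sees
    this rule; it only observes that creature metaphors earn more reward.
    """
    words = {w.strip(".,!?").lower() for w in response.split()}
    base = 1.0 if {"like", "as"} & words else 0.0       # crude proxy for "uses a metaphor"
    creature_bonus = 0.5 * len(CREATURE_WORDS & words)  # the unintended skew
    return base + creature_bonus

print(playfulness_reward("Debugging is like untangling string."))             # 1.0
print(playfulness_reward("Debugging is like chasing a goblin in the code."))  # 1.5
```

Under reinforcement learning, that small gap is all it takes: responses with creatures consistently out-earn equally apt ones without, so the trained model gradually shifts toward goblins.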


Training Gone Awry

Safety researchers at OpenAI reported a 175 per cent increase in mentions of the word 'goblin' following the release of GPT-5.1, attributing this to the model being incentivised to use playful metaphors. The training method was not corrected for subsequent models, and when GPT-5.4 launched in March, goblin mentions surged nearly 4,000 per cent in the Nerdy personality type, with similar increases across other models.

OpenAI noted, "The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data."
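The mechanism OpenAI describes, outputs rewarded in one setting being recycled into broader training data, can be pictured with a toy data pipeline. The structures below are hypothetical; OpenAI's actual pipeline is not public.

```python
# Hypothetical sketch of scope leakage; none of this reflects OpenAI's pipeline.

# Preference data gathered while the model ran in the 'Nerdy' persona,
# where creature metaphors were (accidentally) over-rewarded:
nerdy_preferences = [
    {"prompt": "Explain caching.", "chosen": "A cache is a goblin hoarding bytes."},
]

# Shared fine-tuning set used for every persona:
general_sft = [
    {"prompt": "Explain caching.", "completion": "A cache stores recent results."},
]

# If highly rated responses are recycled into the shared set, the style tic
# escapes the persona that produced it:
general_sft += [
    {"prompt": ex["prompt"], "completion": ex["chosen"]} for ex in nerdy_preferences
]
# Every persona now trains on goblin metaphors, not just 'Nerdy'.
```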

Broader Implications

While this particular glitch was relatively harmless, it underscores a broader weakness in leading artificial intelligence models and their training methodologies: reinforcement learning reward signals can push a model's behavior in unexpected and unintended directions. OpenAI stated that its research and safety team has developed new methods for investigating rogue patterns and will conduct more audits of model behavior in the future.
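OpenAI has not said what those audits look like. One simple form such a check could take is tracking how often particular words appear in sampled outputs from one release to the next; the sketch below uses invented data to show the idea.

```python
# Hypothetical audit sketch; OpenAI has not described its actual tooling.
from collections import Counter

def term_rate(outputs: list[str], term: str) -> float:
    """Mentions of `term` per 1,000 words across a sample of model outputs."""
    words = [w.strip(".,!?").lower() for text in outputs for w in text.split()]
    return 1000 * Counter(words)[term.lower()] / max(len(words), 1)

baseline = ["The fix is straightforward once you see the pattern."]
candidate = ["The bug hides like a goblin in the cache.",
             "Think of each metaphor as a gremlin with a goblin on its back."]

old, new = term_rate(baseline, "goblin"), term_rate(candidate, "goblin")
if new > max(old * 5, 1.0):  # flag a large jump relative to the last release
    print(f"flag 'goblin': {old:.1f} -> {new:.1f} mentions per 1,000 words")
```

A real audit would sample far more text and track many terms at once, but even a crude rate comparison like this would have surfaced a 4,000 per cent jump.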
