
Are Newer LLMs Hallucinating More? Ways to Solve AI Hallucinations

Apr 28, 2025

15 min read

Are OpenAI's newest models hallucinating more than before?

Hallucinations have always been one of the biggest issues plaguing AI deployment, and the problem now seems to be getting worse - not better - with newer AI models. It has been widely reported that newer SOTA models - especially OpenAI's powerful new o-series reasoning models - hallucinate more than ever before.

It seems like newer, more powerful models aren't going to hallucinate less on their own - so we have collected some techniques we use with customers to reduce hallucinations in real-life deployments.

But First - What Are LLM Hallucinations?

Hallucinations are outputs that sound plausible - sometimes even authoritative - but are factually inaccurate, entirely fabricated, or logically inconsistent. They may take the form of invented references, non-existent events, subtly warped facts, or imaginary data. Ever seen ChatGPT invent a fact that doesn't exist (very convincingly), or watched GitHub Copilot import an NPM package that doesn't exist? Those are hallucinations.

Unfortunately, hallucinations are not a "bug" per se, but rather a byproduct of how AI models are built and trained. LLMs are statistical models: they predict the next token that will sound the most correct - not the one that necessarily is the most correct. If you've ever seen the movie "Catch Me If You Can", you can think of AI models like Leonardo DiCaprio's character - great at sounding right, even when they aren't.
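To make the "plausible over true" point concrete, here is a minimal, self-contained sketch of next-token selection with toy numbers (not outputs from any real model): the model scores candidate tokens, softmax turns the scores into probabilities, and the most probable continuation wins - whether or not it happens to be true.

```python
import math

# Toy logits a model might assign to candidate next tokens after the prompt
# "The first person to walk on the Moon was ..."
# The numbers are illustrative only, not from a real model.
candidates = {
    "Neil": 4.1,   # correct continuation
    "Buzz": 3.9,   # plausible but wrong (Aldrin was second)
    "Yuri": 2.2,   # plausible-sounding, also wrong
}

def softmax(logits):
    exps = {tok: math.exp(v) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

probs = softmax(candidates)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {p:.2f}")

# The model simply emits the most probable token. If the training data (or the
# prompt) makes a wrong continuation look likely, the model states it just as
# confidently - that is a hallucination, not a bug in the sampler.
```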

What Causes Hallucinations?

Hallucinations stem from the very nature of LLMs and how they’re trained:

  • Lack of Grounding: LLMs use probabilistic reasoning, not real-time fact-checking. They don’t access databases of truth on demand.

  • Training Data Limitations: The massive text datasets used for training contain errors, outdated facts, and misinformation, which models inherit and replicate.

  • Overgeneralization: When facing ambiguous or unfamiliar prompts, models “fill in” gaps with plausible-sounding stories, regardless of real-world validity.

  • Prompt Sensitivity: Minor changes in input phrasing can drastically shift answer quality, especially for vague prompts.

  • Lack of Feedback: LLMs don’t know when they’re wrong unless explicitly trained with correction signals, such as in reinforcement learning.

Why Are Hallucinations Hard to Fix in Model Training?

Efforts to eliminate hallucinations during training run up against deep-seated challenges:

  • Statistical Objective vs. Factual Accuracy: Models optimize for likely-sounding next words, not ground-truth facts.

  • Ambiguity in Training Data: The internet (a common source) is a noisy mix of truths and errors - automatically distinguishing between them is non-trivial.

  • No Discrete Memory Store: LLMs compress knowledge into billions of parameters, making surgical fact correction difficult.

  • Scale and Complexity: Large models are brittle - curing one hallucination often introduces another.

  • Confidence Calibration: Models routinely state hallucinations with high confidence; teaching them to express uncertainty is still an open research area.

Are Newer LLMs Really Hallucinating More?

Counterintuitively, as models become larger and more complex, emerging research and anecdotal reports suggest that hallucination frequency can increase - particularly on long-form, open-ended, or ambiguous tasks. Why?

New models, when trained to be more “helpful,” generalize more aggressively rather than resorting to “I don’t know” or refusals. This broader generalization lets them spin convincing answers to unfamiliar or underspecified prompts, raising the risk of fabrication. At the same time, their scale unlocks powerful mitigation tools (like retrieval augmentation and self-verification) - tools that must be used intentionally to keep hallucinations in check.

How to Solve Increasing LLM Hallucinations

No single solution suffices. The most reliable approach is layered defense, incorporating grounding, in-place verification, and post-hoc checks. Here are the main families of solutions making the biggest impact today:

1. Grounding the Model in Verifiable Context

Retrieval-Augmented Generation (RAG)

Before the LLM generates text, the user’s query is used to fetch relevant documents (via vector search, keyword analysis, etc.), which are then included in the prompt. The model is incentivized - sometimes required - to construct answers only using these supporting snippets. Properly implemented, RAG can drastically reduce hallucinations, as the model is steered to “stay within the lines” of verified content. However, weak retrieval (low recall or irrelevant results) simply moves the hallucination risk to the search layer. (More on RAG)
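Here is a rough sketch of that flow. The `vector_search` and `call_llm` callables are hypothetical stand-ins for your retrieval layer and model client - this illustrates the prompt construction, not any particular library's API.

```python
# Minimal RAG sketch. `vector_search` and `call_llm` are hypothetical stand-ins
# for your retrieval layer and model client.
from typing import Callable

def answer_with_rag(
    question: str,
    vector_search: Callable[[str, int], list[str]],
    call_llm: Callable[[str], str],
    top_k: int = 4,
) -> str:
    # 1. Retrieve the passages most relevant to the question.
    passages = vector_search(question, top_k)

    # 2. Build a prompt that confines the model to the retrieved context.
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using ONLY the sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, "
        "reply exactly: \"I don't know.\"\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 3. Generate. Hallucination risk now largely depends on retrieval quality.
    return call_llm(prompt)
```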

Knowledge Graph Lookups

For domains with structured data (e.g., pharmaceuticals, product catalogs), inject factual “triples” directly and instruct the model to limit answers to these facts. When applied to medical QA, this targeted retrieval slashes hallucination rates - by over 30% in some studies. (Study on KGR)
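A minimal sketch of the idea, assuming facts are stored as (subject, predicate, object) triples; the pharmaceutical facts and helper below are illustrative, not a real knowledge base:

```python
# Hypothetical knowledge-graph lookup: facts live as (subject, predicate, object)
# triples and are injected verbatim into the prompt as the only allowed facts.
TRIPLES = [
    ("Aspirin", "drug_class", "NSAID"),
    ("Aspirin", "typical_adult_dose", "325-650 mg every 4 hours"),
    ("Aspirin", "contraindicated_with", "active peptic ulcer"),
]

def kg_prompt(question: str, entity: str) -> str:
    facts = "\n".join(f"- {s} | {p} | {o}" for s, p, o in TRIPLES if s == entity)
    return (
        "Answer using ONLY the facts listed below. Do not add information "
        "that is not in the list; say \"Not in the knowledge base\" instead.\n\n"
        f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"
    )

print(kg_prompt("What class of drug is aspirin, and what is a typical dose?", "Aspirin"))
```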

2. Prompt-Time Self-Checking

Chain-of-Verification (CoVe)

Ask the model to draft an answer, generate a list of fact-check questions about that answer, answer them independently, and update its final output only if everything checks out. This “think-check-fix” loop has cut hallucinations by 15–20 percentage points in open-ended and structured datasets. (CoVe Paper)
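A simplified version of the loop might look like this - `call_llm` is again a hypothetical client, and the prompts are illustrative rather than the exact templates from the CoVe paper:

```python
# Sketch of a Chain-of-Verification (CoVe) loop. `call_llm` is a hypothetical
# client; in practice each step can use its own prompt template and temperature.
from typing import Callable

def chain_of_verification(question: str, call_llm: Callable[[str], str]) -> str:
    # 1. Draft an initial answer.
    draft = call_llm(f"Answer concisely:\n{question}")

    # 2. Plan verification questions about the draft's individual claims.
    plan = call_llm(
        "List short fact-check questions, one per line, that would verify "
        f"every claim in this answer:\n{draft}"
    )
    checks = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently (without the draft),
    #    so the model cannot simply repeat its own mistake.
    verdicts = [f"Q: {q}\nA: {call_llm(q)}" for q in checks]

    # 4. Revise the draft using the verification answers.
    return call_llm(
        f"Original question: {question}\nDraft answer: {draft}\n\n"
        "Verification results:\n" + "\n".join(verdicts) + "\n\n"
        "Rewrite the answer, correcting or removing anything the verification "
        "results do not support."
    )
```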

Reflect-then-Answer

A hidden prompt instructs the model: “Think step-by-step, spot possible errors, and fix them before finalizing your reply.” Even without external facts, this leverages the model’s own prior knowledge for internal consistency and can reduce hallucinations in complex tasks by ~17%. (Reflect Paper)
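At its simplest this is just a hidden system instruction. The wording below is an assumption, not the prompt from the paper; the messages use the common role/content chat format:

```python
# Reflect-then-answer as a hidden system instruction. The wording is illustrative;
# the key idea is that the reflection happens before the final reply is emitted.
REFLECT_SYSTEM_PROMPT = (
    "Before replying: think step by step, list any claims you are not sure "
    "about, correct or drop them, then output only the final answer."
)

def build_messages(user_question: str) -> list[dict]:
    return [
        {"role": "system", "content": REFLECT_SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]
```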

Refusal Triggers & Temperature Control

By setting confidence thresholds (based on token log probabilities) or penalizing unsupported statements during training, models can be pushed to say “I don’t know” more often instead of fabricating. New methods like GRAIT adjust these refusal triggers automatically. (GRAIT Paper)
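A crude sketch of a confidence-based refusal trigger, assuming your API exposes per-token log probabilities; the 0.6 threshold and the geometric-mean scoring are illustrative choices, not recommended values:

```python
import math

# Sketch of a log-probability refusal trigger. `token_logprobs` would come from
# the model API's logprob output; the 0.6 threshold is an illustrative choice.
def answer_or_refuse(answer_text: str, token_logprobs: list[float],
                     min_avg_prob: float = 0.6) -> str:
    # Geometric-mean token probability as a crude confidence score.
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    if avg_prob < min_avg_prob:
        return "I don't know - I'm not confident enough in this answer."
    return answer_text

print(answer_or_refuse("Paris is the capital of France.",
                       [-0.05, -0.02, -0.1, -0.03, -0.01]))    # confident -> keep
print(answer_or_refuse("The Eiffel Tower was built in 1923.",
                       [-1.2, -0.9, -2.3, -1.5, -0.8]))         # shaky -> refuse
```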

3. Fine-Tuning & Reinforcement Learning

Truthfulness-Weighted RLHF / RLFH

Reward the model for fact-level agreement with trusted sources, not just overall helpfulness. Methods like RLFH decompose each answer into atomic facts and check them externally, giving granular feedback. Recent results show up to a 40% reduction in hallucinations on QA benchmarks. (RLFH Paper)
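The reward-shaping idea, roughly (this is not the RLFH paper's exact formulation): decompose the answer into atomic facts, check each against a trusted source, and turn the balance of supported vs. unsupported facts into a scalar reward. `extract_facts` and `is_supported` are hypothetical components:

```python
# Rough sketch of fact-level reward shaping (not the exact RLFH formulation).
# `extract_facts` and `is_supported` are hypothetical components: a claim
# extractor (often another LLM) and a checker against a trusted source.
from typing import Callable

def truthfulness_reward(
    answer: str,
    extract_facts: Callable[[str], list[str]],
    is_supported: Callable[[str], bool],
    unsupported_penalty: float = 1.0,
) -> float:
    facts = extract_facts(answer)
    if not facts:
        return 0.0
    supported = sum(1 for f in facts if is_supported(f))
    unsupported = len(facts) - supported
    # Reward supported facts, penalize unsupported ones; this scalar is what the
    # RL step optimizes alongside the usual helpfulness reward.
    return (supported - unsupported_penalty * unsupported) / len(facts)
```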

Segment-Level Corrective Feedback

Human annotators flag only the specific span of hallucination. The model’s fine-tuner then targets just those regions during optimization. Reported hallucination drops: 34% with fewer than 1.5k annotated examples. (RLHF-V Paper)
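RLHF-V itself uses a dense preference-optimization objective, but the underlying idea can be sketched as a span-masked fine-tuning loss: compute the loss only on the token positions annotators flagged and corrected.

```python
import torch
import torch.nn.functional as F

# Illustrative only - RLHF-V uses a dense preference-optimization objective.
# This shows the core idea: the fine-tuning loss is computed only on the token
# positions that annotators flagged and corrected.
def span_weighted_loss(logits: torch.Tensor,       # (seq_len, vocab)
                       target_ids: torch.Tensor,   # (seq_len,) corrected tokens
                       flagged_mask: torch.Tensor  # (seq_len,) 1 = flagged span
                       ) -> torch.Tensor:
    per_token = F.cross_entropy(logits, target_ids, reduction="none")
    # Zero out everything outside the hallucinated-and-corrected spans.
    masked = per_token * flagged_mask.float()
    return masked.sum() / flagged_mask.float().sum().clamp(min=1.0)
```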

4. Post-Generation Verification Pipelines

  • Fact-Checker Models: Pass outputs through small, fast models trained to flag unsupported claims.

  • Model Ensembles/Self-consistency: Generate multiple answers and accept content only if a majority “agree” (see the sketch below).

  • Semantic Entropy Detectors: Algorithms measure answer consistency across paraphrased prompts, flagging high-risk hallucinations with up to 80% accuracy. (Algorithm News)

When flagged, options include: rerunning RAG, replacing answers with “Unknown,” or sending to a human reviewer.
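Here is a minimal version of the self-consistency check from the list above. `call_llm` is a hypothetical client; real pipelines compare answers semantically (embeddings, NLI) rather than by exact string match, and sample at a non-zero temperature to get diverse drafts:

```python
from collections import Counter
from typing import Callable

# Minimal self-consistency / majority-vote check.
def self_consistent_answer(question: str, call_llm: Callable[[str], str],
                           n_samples: int = 5, min_agreement: float = 0.6) -> str:
    samples = [call_llm(question).strip() for _ in range(n_samples)]
    answer, votes = Counter(samples).most_common(1)[0]
    if votes / n_samples < min_agreement:
        return "Unknown"   # flag for a RAG re-run or human review instead of guessing
    return answer
```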

5. Operational & Product-Level Guards

A robust production stack tackles hallucinations beyond the model itself:

  • Domain-specific fine-tuning on clean, up-to-date data. Regularly re-train or adapter-tune the model on your own high-quality corpus so it speaks the “native language” of your domain and doesn’t have to invent missing facts.

  • Strict output schemas. When the answer must follow a JSON schema or contain only enumerated values, the decoder has far less freedom to fabricate. Narrow, validated formats are an easy win - particularly for API calls, internal tools, and data pipelines (see the sketch after this list).

  • User-visible citations and confidence cues. Requiring the system to cite sources (or emit confidence scores) makes unsupported claims stand out to both users and downstream verifiers, reducing the cost of manual review.

  • Continuous evaluation dashboards. Track hallucination metrics (e.g., factual-precision@K, claim-support rate) in real time across production traffic. Alerts let you catch spikes caused by upstream data drift, new model versions, or prompt changes before they reach end-users.
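As a concrete example of a strict output schema, here is a sketch that validates model output against a JSON Schema using the jsonschema library; the order-status schema and field names are hypothetical:

```python
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

# Strict output schema for a hypothetical order-status endpoint: the model may
# only return one of the enumerated statuses, so it cannot invent a new one.
ORDER_STATUS_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "status": {"enum": ["pending", "shipped", "delivered", "cancelled"]},
    },
    "required": ["order_id", "status"],
    "additionalProperties": False,
}

def parse_model_output(raw: str) -> dict | None:
    try:
        data = json.loads(raw)
        validate(instance=data, schema=ORDER_STATUS_SCHEMA)
        return data
    except (json.JSONDecodeError, ValidationError):
        return None   # reject and retry or escalate instead of passing it along

print(parse_model_output('{"order_id": "A-17", "status": "shipped"}'))    # accepted
print(parse_model_output('{"order_id": "A-17", "status": "teleported"}')) # None
```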

Layering these operational controls on top of retrieval, self-verification, and fine-tuning measures gives you a defense-in-depth posture: if one layer misses, the next catches, and your overall hallucination rate keeps trending down instead of creeping back up.

6. A Pragmatic Recipe for Teams

  1. Start with RAG wherever factual accuracy matters.

  2. Add a self-verification step (CoVe, reflect-then-answer) and keep model temperature low (≤0.3).

  3. Log all claims and their supporting sources. Pass outputs to a lightweight fact-checker and systematically flag unsupported statements.

  4. Fine-tune on flagged cases with RLHF, RLFH, or refusal-aware objectives.

  5. Continuously monitor hallucination rates using evolving, timestamped benchmarks - since the outside world (and what’s “true”) changes over time.
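Tying the recipe together, here is a high-level sketch of how the pieces compose at inference time. Every component passed in (retriever, generator, self-verifier, fact-checker) is a hypothetical stand-in for whatever you use in steps 1-3; the logs it emits feed the fine-tuning and monitoring in steps 4-5.

```python
import logging
from typing import Callable

logger = logging.getLogger("llm_pipeline")

# End-to-end sketch of the recipe above; all components are hypothetical stand-ins.
def answer_pipeline(
    question: str,
    retrieve: Callable[[str], list[str]],         # step 1: RAG
    generate: Callable[[str, list[str]], str],    # low-temperature generation
    self_verify: Callable[[str, str], str],       # step 2: CoVe / reflect-then-answer
    fact_check: Callable[[str, list[str]], bool], # step 3: lightweight checker
) -> str:
    sources = retrieve(question)
    draft = generate(question, sources)
    answer = self_verify(question, draft)

    # Step 3: log claims and sources, flag unsupported output.
    logger.info("question=%r sources=%d answer=%r", question, len(sources), answer)
    if not fact_check(answer, sources):
        logger.warning("unsupported answer flagged for review: %r", answer)
        return "I don't know."   # or route to a human reviewer

    # Steps 4-5 (fine-tuning on flagged cases, ongoing benchmark monitoring)
    # happen offline, fed by the logs collected here.
    return answer
```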

Key Takeaways

  • Newer, larger LLMs can hallucinate more, not less, without targeted interventions. Their greater generalization capabilities make them more prone to fabricate convincing-sounding answers.

  • Layered defenses are essential. Effective pipelines blend preventive grounding (RAG/KG), real-time self-critique, and robust verification - no single approach suffices.

  • Production LLMs require continuous vigilance and adaptation. Regular evaluation, human-in-the-loop feedback, and fine-tuning with truthfulness-oriented rewards are all critical.

The hallucination problem won’t vanish overnight, but with multifaceted strategies - spanning retrieval, prompt engineering, verification, and operational monitoring - teams can keep LLM outputs accurate, trustworthy, and production-ready, even as models grow ever more powerful.
