The Live Science article below claims that hallucinations are a feature, not a bug, of AI and LLM models. But first, let's back up and set some definitions:
From the site Tolokai.ai, "Difference between AI, ML, LLM, and Generative AI":
(The image above is from Devblogit.)
AI is a broad topic; what follows is only a small excerpt:
AI belongs to the field of computer science that deals with the development of computer systems that can perform tasks that typically require human intelligence, such as speech recognition, natural language processing (NLP), text generation and translation, video, sound, and image generation, decision making, planning, and more.
Generative AI, in one paragraph (contrasted with discriminative AI):
Discriminative and generative AI are two different approaches to building AI systems. Discriminative AI focuses on learning the boundaries that separate different classes or categories in the training data. These models do not aim to generate new samples, but rather to classify or label input data based on what class it belongs to. Discriminative models are trained to identify the patterns and features that are specific to each class and make predictions based on those patterns.
ML or Machine Learning:
Machine Learning is a specific subset or application of AI that focuses on providing systems the ability to learn and improve from experience without being explicitly programmed. ML is a critical component of many AI systems. ML algorithms are used to train AI models by providing them with datasets containing labeled examples or historical data. The model then learns the underlying patterns in the training data, enabling it to make accurate predictions or decisions on new, unseen data. By continuously feeding data to ML models, they can adapt and improve their performance over time.
LLM – Large Language Models (versus Generative AI) – Let’s get into this:
Generative AI is a broad concept encompassing various forms of content generation, while LLM is a specific application of generative AI. Large language models serve as foundation models, providing a basis for a wide range of natural language processing (NLP) tasks. Generative AI can encompass a range of tasks beyond language generation, including image and video generation, music composition, and more. Large language models, as one specific application of generative AI, are specifically designed for tasks revolving around natural language generation and comprehension.
Here is the same breakdown in picture form, from a different site (with a medical angle):
And here is a comparison of different LLMs, from a page hosted on windows.net:
Sometimes, just by comparing LLMs, one can learn something about how LLMs work. This particular comparison goes into the internals of the LLM engines: what the backbone is (BLOOM, GPT-3, Pythia, or LLaMA), whether an engine is a foundation or an instruction-tuned model, whether inference, training, and fine-tuning code is available (check-marked in the table), and whether the license is open or closed.
Now that we have a little background on AI, GenAI, ML, and LLMs, check out these 4 stories:
Newscientist.com “AI hallucinations are getting worse – and they’re here to stay”
An OpenAI technical report evaluating its latest LLMs showed that its o3 and o4-mini models, which were released in April, had significantly higher hallucination rates than the company’s previous o1 model that came out in late 2024. For example, when summarising publicly available facts about people, o3 hallucinated 33 per cent of the time while o4-mini did so 48 per cent of the time. In comparison, o1 had a hallucination rate of 16 per cent.
Techopedia.com “48% Error Rate: AI Hallucinations Rise in 2025 Reasoning Systems”
Key Takeaways
- OpenAI’s o3 system hallucinates 33% of the time, twice the rate of its predecessor, o1, despite improved mathematical abilities.
- OpenAI acknowledges that “more research is needed” to understand why hallucinations worsen as reasoning models scale up.
- Reasoning models from OpenAI, Google, and Chinese startup DeepSeek are all generating more errors, not fewer, as they become more powerful.
- AI hallucinations have caused business disruptions, with incidents such as Cursor’s AI support bot falsely announcing policy changes, leading to customer cancellations.
- Many experts believe hallucinations are intrinsic to the technology.
Lifehacker.com “AI Models Are Hallucinating More (and It’s Not Clear Why)”
Hallucinations have always been an issue for generative AI models: The same structure that enables them to be creative and produce text and images also makes them prone to making stuff up. And the hallucination problem isn’t getting better as AI models progress—in fact, it’s getting worse.
Livescience.com “AI hallucinates more frequently as it gets more advanced — is there any way to stop it from happening, and should we even try?”
Starting with the Live Science article, the IEEE member and AI ethics engineer comments:
This raises a concern over the accuracy and reliability of large language models (LLMs) such as AI chatbots, said Eleanor Watson, an Institute of Electrical and Electronics Engineers (IEEE) member and AI ethics engineer at Singularity University.
“When a system outputs fabricated information — such as invented facts, citations or events — with the same fluency and coherence it uses for accurate content, it risks misleading users in subtle and consequential ways,” Watson told Live Science.
In order to develop creative, and potentially novel, solutions to problems, AI needs to hallucinate — otherwise it's limited by rigid data its LLM ingests.
“It’s important to note that hallucination is a feature, not a bug, of AI,” Sohrob Kazerounian, an AI researcher at Vectra AI, told Live Science. “To paraphrase a colleague of mine, ‘Everything an LLM outputs is a hallucination. It’s just that some of those hallucinations are true.’ If an AI only generated verbatim outputs that it had seen during training, all of AI would reduce to a massive search problem.”
In effect, LLMs and the AI systems they power need to hallucinate in order to create, rather than simply serve up existing information. It is similar, conceptually, to the way that humans dream or imagine scenarios when conjuring new ideas.
[… other text from the article omitted …]
Kazerounian backed this viewpoint up. “Despite the general belief that the problem of AI hallucination can and will get better over time, it appears that the most recent generation of advanced reasoning models may have actually begun to hallucinate more than their simpler counterparts — and there are no agreed-upon explanations for why this is,” he said.
The situation is further complicated because it can be very difficult to ascertain how LLMs come up with their answers; a parallel could be drawn here with how we still don’t really know, comprehensively, how a human brain works.
So instead of continuing to catalog the reasons for the hallucinations, let's get into how to reduce them. I believe I have also made the case that hallucinations (errors in answers) exist, and that there HAVE to be errors so that the system can be creative. This is similar to humans: in a creative process, I know the answers are often wrong, but those wrong answers may still help you create a truly great product.
So let’s ask you.com AI this:
"How can a person asking questions to GenAI or an LLM reduce hallucinations? What kind of parameters need to be used?" And the answer is:
How to Reduce Hallucinations in GenAI or LLMs
Hallucinations in large language models (LLMs) occur when the model generates false or misleading information that appears factual. While hallucinations cannot be entirely eliminated, users can take specific steps to reduce their occurrence. Below are strategies and parameters that can help minimize hallucinations when interacting with LLMs.
1. Use Advanced Prompting Techniques
Advanced prompting methods can guide the model to produce more accurate and reliable responses (a short prompt sketch follows this list). These include:
- Explicit Instructions: Clearly instruct the model to avoid adding unverifiable or false information. For example, include a system prompt like: “Only provide information that is factually accurate and verifiable.”
- Few-Shot Prompting: Provide a few examples of the desired response format or content. This helps the model focus on the specific context and reduces the likelihood of hallucinations. However, the quality of examples is critical—poor examples can lead to more errors.
- Chain-of-Thought Prompting: Encourage the model to reason step-by-step before providing an answer. For instance, you can prompt it with: “Think step-by-step before answering.” This method improves logical reasoning but may still require careful evaluation of the reasoning steps.
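To make these three techniques concrete, here is a minimal sketch assuming the OpenAI Python client; the model name, the few-shot pair, and the temperature value are placeholder choices, and any chat-style API would work the same way.

```python
# Sketch: explicit instructions + few-shot examples + chain-of-thought in one request.
# Assumes the OpenAI Python client (openai>=1.0); any chat-style API works the same way.
# The model name, the few-shot pair, and the temperature value are placeholder choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "Only provide information that is factually accurate and verifiable. "
    "If you are not sure, say 'I don't know' instead of guessing."
)

# One few-shot example showing the desired answer format (a placeholder, not real data).
few_shot = [
    {"role": "user", "content": "Who wrote 'On the Origin of Species'?"},
    {"role": "assistant", "content": "Charles Darwin, first published in 1859."},
]

question = "What are the main causes of climate change according to the IPCC?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=0.2,      # lower temperature keeps answers closer to the most likely completion
    messages=[
        {"role": "system", "content": system_prompt},
        *few_shot,
        # Chain-of-thought nudge: ask the model to reason before it answers.
        {"role": "user", "content": "Think step-by-step before answering.\n\n" + question},
    ],
)
print(response.choices[0].message.content)
```

The temperature setting is one concrete answer to the “what kind of parameters” part of the question: lower values keep the model closer to its most probable completion, at the cost of some creativity.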
2. Incorporate Retrieval-Augmented Generation (RAG)
RAG combines the model’s pre-trained knowledge with real-time retrieval of relevant data from external sources. This ensures that the model’s responses are grounded in factual, up-to-date information. Here’s how it works:
- Data Retrieval: The system searches a database (e.g., internal documents or trusted sources) for relevant information based on the user’s query.
- Enhanced Prompting: The retrieved data is combined with the user’s query to create a more accurate and contextually grounded response.
RAG is particularly effective in reducing hallucinations in high-stakes domains like healthcare, finance, and legal services; a minimal retrieval sketch follows.
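Here is a minimal sketch of the retrieve-then-prompt loop in Python. The trusted documents, the keyword-overlap scoring, and the prompt wording are all simplified assumptions; a production RAG setup would use embeddings and a vector database instead.

```python
# Minimal RAG sketch: naive keyword retrieval + a grounded prompt.
# A production system would use embeddings and a vector database instead of word overlap.

TRUSTED_DOCS = {
    "refund_policy": "Refunds are issued within 30 days of purchase with a receipt.",
    "support_hours": "Support is available Monday to Friday, 9am to 5pm Eastern.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        TRUSTED_DOCS.values(),
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Combine the retrieved context with the user's question."""
    context = "\n".join("- " + doc for doc in retrieve(query))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        "Context:\n" + context + "\n\nQuestion: " + query
    )

print(build_prompt("What is the refund policy?"))
```

The grounding comes from the instruction to answer only from the retrieved context; the retrieval step just decides what that context is.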
3. Fine-Tune the Model with Domain-Specific Data
Fine-tuning involves training the model on high-quality, domain-specific datasets. This process helps the model better understand specialized topics and reduces the likelihood of hallucinations caused by knowledge gaps. Key considerations include:
- High-Quality Data: Use accurate, diverse, and unbiased datasets.
- Sufficient Training: Ensure the model is exposed to enough examples to learn the nuances of the domain.
Fine-tuning is especially useful for standardized tasks where the model needs to perform consistently; a small dataset-preparation sketch follows.
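As a sketch of what domain-specific fine-tuning data can look like, the snippet below writes question/answer pairs into the chat-style JSONL format that several fine-tuning services (including OpenAI’s) accept. The audit-flavored examples are invented placeholders, not real training data.

```python
# Sketch: writing domain-specific Q&A pairs into chat-style JSONL for fine-tuning.
# The audit-flavored examples are invented placeholders; real data needs expert review.
import json

SYSTEM = "You are an IT audit assistant. Answer only from established audit standards."

examples = [
    ("What does a SOC 2 Type II report cover?",
     "Controls for security, availability, processing integrity, confidentiality, "
     "and privacy, evaluated over a review period rather than at a single point in time."),
    ("How long should audit logs be retained?",
     "It depends on the applicable framework and contract; name the framework before answering."),
]

with open("finetune_data.jsonl", "w") as f:
    for question, answer in examples:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Each line in the JSONL file is one complete conversation the model should learn to imitate.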
4. Implement Guardrails
Guardrails are programmable safety controls that monitor and restrict the model’s outputs. They ensure that the AI operates within predefined boundaries. Examples include:
- Contextual Grounding: Guardrails verify that the model’s responses are factually accurate based on trusted sources.
- Rule-Based Systems: These systems can block or flag outputs that deviate from the desired behavior.
Guardrails are particularly important in enterprise applications where accuracy and reliability are critical; a simple rule-based check is sketched below.
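Below is a deliberately small rule-based guardrail in Python. The 50% overlap threshold and the URL rule are arbitrary illustrative choices; real guardrail frameworks do contextual grounding far more rigorously.

```python
# Sketch of a rule-based guardrail: flag responses that drift from trusted context.
# The numeric threshold and the URL rule are arbitrary choices for illustration.
import re

def guardrail(response: str, trusted_context: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a model response."""
    # Rule 1: block responses that cite URLs not present in the trusted context.
    for url in re.findall(r"https?://\S+", response):
        if url not in trusted_context:
            return False, "unverified URL: " + url

    # Rule 2: crude grounding check - most response words should appear in the context.
    resp_words = set(response.lower().split())
    ctx_words = set(trusted_context.lower().split())
    overlap = len(resp_words & ctx_words) / max(len(resp_words), 1)
    if overlap < 0.5:
        return False, f"low grounding overlap ({overlap:.0%})"

    return True, "ok"

context = "Refunds are issued within 30 days of purchase with a receipt."
print(guardrail("Refunds are issued within 30 days with a receipt.", context))
print(guardrail("Refunds take 90 days, see https://example.com/policy", context))
```

Anything the check rejects can be routed to a human reviewer instead of being shown to the user.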
5. Evaluate and Monitor Outputs
Regular evaluation of the model’s outputs is essential to identify and address hallucinations. Techniques include:
- Human Feedback: Use human reviewers to assess the accuracy and relevance of responses.
- Automated Validation: Implement tools like SelfCheckGPT, which compare multiple generated responses for consistency and flag potential hallucinations.
Evaluation metrics such as semantic similarity, relevance, and factual accuracy can help measure the model’s performance; a rough consistency-check sketch follows.
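The snippet below approximates the SelfCheckGPT idea very roughly: ask the same question several times and flag the answer if the samples disagree. The canned answers, the difflib similarity measure, and the 0.6 threshold are all stand-ins for what the real method does with an actual model.

```python
# Rough consistency check in the spirit of SelfCheckGPT: if repeated samples of the
# same question disagree with each other, treat the answer as a possible hallucination.
import random
from difflib import SequenceMatcher
from statistics import mean

def ask_model(question: str) -> str:
    """Stand-in for a real LLM call; returns canned answers so the demo runs."""
    return random.choice([
        "The Eiffel Tower is about 330 metres tall.",
        "The Eiffel Tower stands roughly 330 m high.",
        "The Eiffel Tower is 512 metres tall.",  # an inconsistent (made-up) sample
    ])

def consistency_score(question: str, n_samples: int = 5) -> float:
    """Average pairwise similarity of n sampled answers (1.0 means identical)."""
    answers = [ask_model(question) for _ in range(n_samples)]
    pairs = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(answers)
        for b in answers[i + 1:]
    ]
    return mean(pairs)

def needs_review(question: str, threshold: float = 0.6) -> bool:
    """Flag the answer for human review when the samples disagree too much."""
    return consistency_score(question) < threshold

print(needs_review("How tall is the Eiffel Tower?"))
```

Low agreement between samples does not prove a hallucination, but it is a cheap signal for deciding what to double-check.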
6. Use External Tools and APIs
Integrating external tools, such as APIs or databases, can enhance the model’s accuracy (a short grounding sketch follows this list). For example:
- Real-Time Data Access: Use APIs to fetch real-time information, such as weather updates or stock prices.
- Specialized Models: Combine the LLM with smaller, task-specific models for areas like sentiment analysis or translation.
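Here is a sketch of the real-time data pattern: fetch the value from an external API first, then hand it to the model as context so it does not have to guess. The endpoint URL and the "temp_c" field are hypothetical placeholders for whatever service you actually use.

```python
# Sketch: ground the model with real-time data fetched from an external API.
# The URL and the "temp_c" field are hypothetical placeholders for a real service.
import json
from urllib.request import urlopen

def fetch_temperature(city: str) -> float:
    """Hypothetical API call; replace with a real weather or market-data endpoint."""
    with urlopen("https://api.example.com/weather?city=" + city) as resp:
        return json.load(resp)["temp_c"]

def grounded_weather_prompt(city: str) -> str:
    """Put the fetched value into the prompt so the model never has to guess it."""
    temp = fetch_temperature(city)
    return (
        "The current temperature in " + city + " is " + str(temp) + " C "
        "(retrieved from the weather API). Using only that figure, "
        "suggest appropriate clothing for today."
    )
```

Fetching the number first and quoting it in the prompt is what keeps the model from inventing one.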
7. Design Better Prompts
The structure and clarity of your prompts significantly impact the model’s output (a small template sketch follows this list). To reduce hallucinations:
- Be Specific: Avoid vague or open-ended prompts. For example, instead of asking, “Tell me about climate change,” ask, “What are the main causes of climate change according to the IPCC?”
- Avoid Ambiguity: Ensure the prompt is clear and unambiguous to prevent the model from “guessing.”
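A tiny illustration of “Be Specific”: the vague request rewritten as a template that pins down the source, the scope, and the output format. The field values are only examples.

```python
# Sketch: turning a vague request into a specific, unambiguous prompt.
VAGUE = "Tell me about climate change."

SPECIFIC_TEMPLATE = (
    "According to {source}, what are the main causes of {topic}? "
    "List at most {max_items} causes, one per line, and note which report "
    "section each comes from. If you are unsure, say so."
)

specific = SPECIFIC_TEMPLATE.format(
    source="the IPCC Sixth Assessment Report",  # example values only
    topic="climate change",
    max_items=5,
)
print(specific)
```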
8. Understand the Model’s Limitations
Recognize that LLMs are not inherently fact-checkers. They generate responses based on patterns in their training data and may lack real-time knowledge or domain expertise. To mitigate this:
- Avoid Over-Reliance: Cross-check critical information with trusted sources.
- Set Realistic Expectations: Use LLMs as assistants rather than definitive sources of truth.
Conclusion
Reducing hallucinations in LLMs requires a combination of advanced prompting, retrieval-augmented generation, fine-tuning, and robust evaluation. By implementing these strategies and understanding the model’s limitations, users can significantly improve the reliability and accuracy of AI-generated responses.
I knew that AI was not perfect, and I have heard or read about some of the more controversial and newsworthy issues, which made me wary of “going all in” without knowing more. And if you look at the suggestions above, crafting more defined questions helps, but if you depend on the answer, you still need to do some research of your own.
Contact me to discuss how to incorporate AI into your systems – auditing, security policies and more.