How RAG Prevents Hallucinations
RAG reduces hallucinations by giving the model something concrete to work from. Instead of relying only on patterns learned during training, the model receives retrieved chunks and is typically instructed to answer only from that context. If the answer isn't in the documents, a well-built RAG system can refuse or say so instead of guessing.

Why grounding helps
LLMs are good at producing plausible text but have no built-in notion of truth. When you inject retrieved passages into the prompt, you constrain the model to that context. That doesn't eliminate all errors—retrieval can be wrong or incomplete—but it ties answers to verifiable sources and makes citations possible.
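One way to make citations possible is to number each retrieved passage when injecting it into the prompt, so the model can reference sources explicitly. A minimal sketch, where `passages` and `build_grounded_prompt` are hypothetical names for illustration:

```python
# Sketch: inject retrieved passages into the prompt with numbered source
# tags so the model's answer can cite [1], [2], ... back to real documents.
# `passages` is an assumed list of (source_id, text) pairs from retrieval.

def build_grounded_prompt(passages, question):
    # Number each passage so the answer can cite it by index.
    context = "\n\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the numbered context below, and cite the "
        "passage numbers you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

passages = [
    ("faq.md", "Refunds are processed within 5 business days."),
    ("policy.md", "Refunds require the original receipt."),
]
prompt = build_grounded_prompt(passages, "How long do refunds take?")
print(prompt)
```

Because every claim in the answer can be traced to a numbered passage, a reviewer can verify it against the underlying document.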
Designing for safety
# Example instruction to reduce hallucination
system_prompt = """
Answer ONLY using the following context. If the answer is not in the context,
say "I don't have that information in the provided documents."
Do not guess or use outside knowledge.
"""
user_prompt = f"Context:\n{retrieved_chunks}\n\nQuestion: {query}"

Retrieval quality matters
RAG only reduces hallucinations when the retrieved context actually contains the answer. If retrieval returns the wrong chunks or misses the right one, the model may still guess or refuse incorrectly. So improving retrieval—chunking, embeddings, and top-k—is as important as the prompt. Evaluate retrieval accuracy separately from answer quality.
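A separate retrieval evaluation can be as simple as measuring hit rate at k over a small labeled set of queries. A minimal sketch, assuming a labeled mapping from query to the chunk id that contains the answer and a hypothetical `retrieve` function returning ranked chunk ids:

```python
# Sketch: evaluate retrieval accuracy (hit rate at k) independently of
# answer quality. `labeled` maps each query to the id of the chunk that
# contains the answer; `retrieve` is an assumed retriever returning a
# ranked list of chunk ids.

def hit_rate_at_k(labeled, retrieve, k=5):
    hits = 0
    for query, gold_chunk_id in labeled.items():
        # A query counts as a hit if the gold chunk appears in the top k.
        if gold_chunk_id in retrieve(query)[:k]:
            hits += 1
    return hits / len(labeled)

# Toy example with a stubbed retriever: first query hits, second misses.
labeled = {"refund window?": "c1", "receipt needed?": "c2"}
ranked = {"refund window?": ["c1", "c9"], "receipt needed?": ["c7", "c3"]}
print(hit_rate_at_k(labeled, lambda q: ranked[q], k=2))  # 0.5
```

Tracking this number while varying chunk size, embedding model, or top-k isolates retrieval problems from generation problems.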