Executive Summary
Retrieval-Augmented Generation (RAG) works when it’s measurable, auditable, and brutally honest about uncertainty.
RAG fails when retrieval is treated as a magic spell instead of an information system. The model can only be as honest as your evidence pipeline.
Build retrieval like you build search: measurable quality, explicit citations, and graceful uncertainty when evidence is thin.
“Production is where good ideas meet boring reality. The winners instrument the boring part.” (AI & Dev Dispatch)
The Core Idea
Most “AI failures” are system failures: missing contracts, missing logs, missing ownership lines. Fix the system, and the model suddenly looks smarter.
- Contract: Define the stable input/output boundary first.
- Logs: Capture raw facts, not just summaries.
- Policy: Centralize allow/deny decisions and expose reason codes.
- UX: Make failure legible and recoverable.
// Retrieval contract: keep provenance.
{
  "doc_id": "handbook-2026-02",
  "chunk_id": "handbook-2026-02#sec-3.2",
  "source_url": "https://example.com/handbook#sec-3.2",
  "score": 0.82
}
That snippet is not a complete app. It’s a reminder: your system should prefer verifiable facts over narrative.
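One way to enforce that preference is to reject any retrieved chunk that arrives without provenance. A minimal sketch, assuming the field names from the JSON above:

```python
# Required provenance fields, matching the retrieval contract above.
REQUIRED_FIELDS = ("doc_id", "chunk_id", "source_url", "score")


def validate_chunk(chunk: dict) -> list[str]:
    """Return the missing provenance fields (empty list means valid)."""
    return [field for field in REQUIRED_FIELDS if field not in chunk]
```

Run this at the boundary where chunks enter the prompt builder: a chunk that fails validation never becomes "evidence," so the model can't cite something you can't trace.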
Failure Modes You’ll Actually See
- Weak chunking: Chunks that are too big dilute relevance; chunks that are too small lose meaning.
- No citations: Users can’t verify. You can’t debug.
- Stale indexes: If data changes but embeddings don’t, retrieval lies silently.
- Overconfident answers: RAG must degrade gracefully: ‘I don’t have evidence for that’ is a feature.
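Graceful degradation can be as simple as a score gate before generation. A sketch, assuming chunks shaped like the retrieval contract above; the 0.5 threshold is an assumption you should tune against your own eval set:

```python
# Assumed cutoff below which retrieved evidence is treated as noise.
EVIDENCE_THRESHOLD = 0.5


def answer_or_abstain(chunks: list[dict], threshold: float = EVIDENCE_THRESHOLD) -> str:
    """Abstain when the best retrieval score is below the threshold."""
    best_score = max((c["score"] for c in chunks), default=0.0)
    if best_score < threshold:
        # Abstaining is a feature: an honest non-answer beats a confident guess.
        return "I don't have evidence for that."
    return "ANSWER_WITH_CITATIONS"  # placeholder for the real generation step
```

The point is structural: the decision to abstain lives in your code, where it is testable, not in the model's mood.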
Implementation Notes
- Measure retrieval: top‑k hit rate, citation coverage, and ‘answerable vs unanswerable’ classification.
- Keep chunk provenance: document_id, section, timestamp, and a stable URL.
- Force the model to cite: output should include evidence IDs or quoted spans with limits.
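The first two metrics named above fit in a few lines each. A sketch over a small labeled eval set; the input shapes (lists of retrieved IDs, answers with a `citations` field) are assumptions:

```python
def top_k_hit_rate(results: list[list[str]], relevant: list[str]) -> float:
    """Fraction of queries whose labeled relevant doc appears in the top-k results."""
    hits = sum(1 for retrieved, gold in zip(results, relevant) if gold in retrieved)
    return hits / len(relevant)


def citation_coverage(answers: list[dict]) -> float:
    """Fraction of answers that include at least one evidence ID."""
    cited = sum(1 for answer in answers if answer.get("citations"))
    return cited / len(answers)
```

Even twenty labeled queries are enough to turn "retrieval feels off" into a number you can track across index rebuilds.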
For architecture and rollout planning, use the Contact Hub.
Ship‑Ready Checklist
Use this as a pre‑deploy gate. If you can’t check these boxes, don’t pretend you’re “done.”
- A single source of truth for versions (prompt/policy/schema) and a way to display them in-app.
- Request correlation ID visible in UI, logged server-side, and searchable.
- Explicit failure UX: what happened, why, and a safe next step.
- An audit trail you can replay: inputs, decisions, outputs, and cost facts.
- A small test harness (even 20 cases) that runs before deployment.
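The last checkbox can be a single function wired into your deploy script. A minimal sketch, assuming a case format with a query and a doc that must be cited (both hypothetical names), and a `retrieve` callable returning chunks shaped like the contract above:

```python
# Hypothetical eval cases: each names a query and the doc_id that
# must appear in its retrieval results.
CASES = [
    {"query": "vacation policy", "must_cite": "handbook-2026-02"},
    # extend to ~20 cases before calling this a gate
]


def run_gate(retrieve) -> bool:
    """Return True only if every case retrieves its required document."""
    return all(
        case["must_cite"] in {chunk["doc_id"] for chunk in retrieve(case["query"])}
        for case in CASES
    )
```

If `run_gate` returns False, the deploy stops. That is the whole gate: small, boring, and instrumented.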
Want this turned into a working product?
Use the Contact Hub to scope features, security, billing, and the deployment plan.