Executive Summary
What to log, how to correlate it, and how to debug failures without guessing or staring at token counts.
AI apps create new kinds of ‘unknown unknowns’: prompt drift, retrieval mismatch, policy denials, and provider hiccups.
Observability means correlating every answer with: model, prompt version, policy decision, retrieval evidence, and cost.
“Production is where good ideas meet boring reality. The winners instrument the boring part.” (AI & Dev Dispatch)
The Core Idea
Most “AI failures” are system failures: missing contracts, missing logs, missing ownership lines. Fix the system, and the model suddenly looks smarter.
- Contract: Define the stable input/output boundary first.
- Logs: Capture raw facts, not just summaries.
- Policy: Centralize allow/deny decisions and expose reason codes.
- UX: Make failure legible and recoverable.
// Stable contract surface (gateway request)
POST /.netlify/functions/gateway-chat
{
  "org_id": "...",
  "user_id": "...",
  "model": "gpt-4.1-mini",
  "messages": [...]
}
That snippet is not a complete app. It’s a reminder: your system should prefer verifiable facts over narrative.
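In practice, “prefer verifiable facts” means emitting one structured event per answer that carries every correlation field from the summary above: model, prompt version, policy decision, retrieval evidence, and cost. A minimal sketch, with illustrative field names rather than a fixed schema:

```javascript
// One structured log event per answer; field names are illustrative.
function buildAnswerEvent({ requestId, model, promptVersion, policy, retrieval, usage }) {
  return {
    request_id: requestId,          // correlates UI -> function -> gateway -> provider
    model,                          // e.g. "gpt-4.1-mini"
    prompt_version: promptVersion,  // pin the exact template that produced this answer
    policy_decision: policy,        // { decision, reason_code, ... }
    retrieval,                      // doc ids + scores, not full private content
    cost: {                         // raw token facts; derive dollars later
      prompt_tokens: usage.prompt_tokens,
      completion_tokens: usage.completion_tokens,
    },
    ts: new Date().toISOString(),
  };
}

const event = buildAnswerEvent({
  requestId: "req_123",
  model: "gpt-4.1-mini",
  promptVersion: "support-v7",
  policy: { decision: "allow", reason_code: null },
  retrieval: [{ doc_id: "kb-42", score: 0.81 }],
  usage: { prompt_tokens: 512, completion_tokens: 128 },
});
console.log(JSON.stringify(event));
```

Note what is absent: no raw message content, no keys. The event is small enough to index and search, which is the whole point.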
Failure Modes You’ll Actually See
- No correlation IDs: If you can’t connect a user action to a gateway event, debugging is guesswork.
- Logging secrets: Never log raw keys, tokens, or private content unless encrypted and necessary.
- Missing policy decisions: ‘Denied’ needs a reason code, or you’ll never fix false positives.
- No replay tooling: The best debugging tool is ‘re-run the exact request’ under controlled conditions.
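The reason-code point is worth making concrete. A minimal sketch of a deny result; the reason codes and `limit_snapshot` shape here are hypothetical, not a standard:

```javascript
// Hypothetical reason codes; the point is that "denied" is never bare.
const REASONS = {
  RATE_LIMIT: "Org exceeded per-minute request cap",
  CONTENT_POLICY: "Input matched a blocked-content rule",
  BUDGET_EXCEEDED: "Monthly spend cap reached",
};

function deny(reasonCode, limitSnapshot) {
  return {
    decision: "deny",
    reason_code: reasonCode,
    message: REASONS[reasonCode] ?? "Unknown reason",
    limit_snapshot: limitSnapshot, // the numbers at decision time, for later debugging
  };
}

const d = deny("RATE_LIMIT", { limit: 60, observed: 73 });
console.log(JSON.stringify(d));
```

With `limit_snapshot` persisted, a false positive becomes a query (“show me denies where observed was barely over limit”) instead of a support ticket mystery.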
Implementation Notes
- Use a request_id everywhere: UI → function → gateway → provider logs.
- Persist a policy_decision object: allow/deny, reason_code, limit_snapshot.
- Add a replay endpoint for admins/owners that re-runs a request using stored inputs (redacted).
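The replay note fits in a few lines. In this sketch, `loadStoredRequest` and `callGateway` are hypothetical stand-ins for your storage layer and gateway client:

```javascript
// Redact before replaying: never resend (or log) raw secrets.
function redact(stored) {
  const { api_key, ...safe } = stored; // drop secret fields; keep everything else verbatim
  return safe;
}

// Re-run a stored request under controlled conditions, tagged as a replay.
function replay(requestId, loadStoredRequest, callGateway) {
  const stored = loadStoredRequest(requestId);          // exact inputs as logged
  return callGateway({ ...redact(stored), replay_of: requestId });
}

// Illustrative fakes standing in for real storage and gateway.
const fakeStore = { req_123: { org_id: "o1", api_key: "SECRET", messages: [] } };
const sent = replay("req_123", (id) => fakeStore[id], (req) => req);
console.log(sent.replay_of, "api_key" in sent);
```

Tagging the outgoing request with `replay_of` keeps replays distinguishable from organic traffic in every downstream log.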
For architecture and rollout planning, use the Contact Hub.
Ship‑Ready Checklist
Use this as a pre‑deploy gate. If you can’t check these boxes, don’t pretend you’re “done.”
- A single source of truth for versions (prompt/policy/schema) and a way to display them in-app.
- Request correlation ID visible in UI, logged server-side, and searchable.
- Explicit failure UX: what happened, why, and a safe next step.
- An audit trail you can replay: inputs, decisions, outputs, and cost facts.
- A small test harness (even 20 cases) that runs before deployment.
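The last checkbox doesn’t require a framework. A sketch of a tiny pre-deploy harness; the cases and the keyword check are illustrative stand-ins for real assertions:

```javascript
// A deliberately tiny harness: a handful of golden cases checked before deploy.
const cases = [
  { input: "reset my password", mustMention: "password" },
  { input: "cancel subscription", mustMention: "cancel" },
];

// Crude but honest check: the answer must mention the expected keyword.
function checkAnswer(answer, c) {
  return answer.toLowerCase().includes(c.mustMention);
}

function runHarness(answerFn) {
  const failures = cases.filter((c) => !checkAnswer(answerFn(c.input), c));
  return { total: cases.length, failed: failures.length, pass: failures.length === 0 };
}

// Fake model for illustration: echoes the input back.
const result = runHarness((input) => `Here is how to ${input}.`);
console.log(result);
```

Wire `runHarness` into your deploy script and fail the build on `pass === false`; twenty cases catch an astonishing share of prompt regressions.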
Want this turned into a working product?
Use the Contact Hub to scope features, security, billing, and the deployment plan.