Executive Summary

What to log, how to correlate it, and how to debug failures without guessing or staring at token counts.

AI apps create new kinds of ‘unknown unknowns’: prompt drift, retrieval mismatch, policy denials, and provider hiccups.

Observability means correlating every answer with: model, prompt version, policy decision, retrieval evidence, and cost.
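One way to make that correlation concrete is a single per-answer record. The field names below are illustrative, not a standard; pick whatever matches your schema, but keep all five facts together:

```typescript
// Hypothetical shape for a per-answer observability record.
interface AnswerRecord {
  request_id: string;          // correlation ID, minted at the edge
  model: string;               // e.g. "gpt-4.1-mini"
  prompt_version: string;      // which prompt template produced this answer
  policy_decision: string;     // "allow" or "deny:<reason_code>"
  retrieval_doc_ids: string[]; // the evidence the answer was grounded on
  cost_usd: number;            // provider-reported cost for this call
}

// Assemble the record from facts you already hold at response time.
function buildAnswerRecord(
  requestId: string,
  model: string,
  promptVersion: string,
  policyDecision: string,
  docIds: string[],
  costUsd: number
): AnswerRecord {
  return {
    request_id: requestId,
    model,
    prompt_version: promptVersion,
    policy_decision: policyDecision,
    retrieval_doc_ids: docIds,
    cost_usd: costUsd,
  };
}
```

Log one of these per answer and most "why did it say that?" questions become queries instead of archaeology.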

“Production is where good ideas meet boring reality. The winners instrument the boring part.”
AI & Dev Dispatch

The Core Idea

Most “AI failures” are system failures: missing contracts, missing logs, missing ownership lines. Fix the system, and the model suddenly looks smarter.

Contract

Define the stable input/output boundary first.

Logs

Capture raw facts, not just summaries.

Policy

Centralize allow/deny decisions and expose reason codes.

UX

Make failure legible and recoverable.

// Stable contract surface (gateway request)
POST /.netlify/functions/gateway-chat
{
  "org_id": "...",
  "user_id": "...",
  "model": "gpt-4.1-mini",
  "messages": [...]
}

That snippet is not a complete app. It’s a reminder: your system should prefer verifiable facts over narrative.

Failure Modes You’ll Actually See

  • No correlation IDs

    If you can’t connect a user action to a gateway event, debugging is guesswork.

  • Logging secrets

    Never log raw keys, tokens, or private user content. If a sensitive payload must be retained, redact or encrypt it and document why.

  • Missing policy decisions

    ‘Denied’ needs a reason code or you’ll never fix false positives.

  • No replay tooling

    The best debugging tool is ‘re-run the exact request’ under controlled conditions.

Implementation Notes

Use a request_id everywhere: UI → function → gateway → provider logs.
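A sketch of that propagation, assuming Node and an `x-request-id` header (the header name is a common convention, not a requirement):

```typescript
import { randomUUID } from "crypto";

// Reuse an inbound correlation ID if one arrived; otherwise mint one.
function getOrCreateRequestId(
  headers: Record<string, string | undefined>
): string {
  return headers["x-request-id"] ?? randomUUID();
}

// Attach the same ID to every downstream hop so UI, function,
// gateway, and provider logs can all be joined on one value.
function forwardHeaders(requestId: string): Record<string, string> {
  return { "x-request-id": requestId };
}
```

The important property is that the ID is created once, at the outermost layer, and never regenerated mid-chain.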

Persist a ‘policy_decision’ object: allow/deny, reason_code, limit_snapshot.
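A minimal shape for that object, with a rate-limit check as the worked example (reason codes and field names here are assumptions to adapt):

```typescript
interface PolicyDecision {
  allow: boolean;
  reason_code: string; // e.g. "ok", "rate_limit", "blocked_topic"
  limit_snapshot: {
    // the limits as they stood at decision time, so the
    // decision can be explained later even if limits change
    requests_used: number;
    requests_max: number;
  };
}

// Example policy: deny once the request budget is exhausted.
function checkRateLimit(used: number, max: number): PolicyDecision {
  return {
    allow: used < max,
    reason_code: used < max ? "ok" : "rate_limit",
    limit_snapshot: { requests_used: used, requests_max: max },
  };
}
```

Persisting the snapshot alongside the decision is what lets you answer "was this deny correct at the time?" months later.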

Add a replay endpoint for admins/owners that re-runs a request using stored inputs (redacted).
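The core of a replay endpoint is small. This sketch uses an in-memory store and illustrative names; in practice the store is your audit table and the handler is your gateway function:

```typescript
// A stored, already-redacted request keyed by its correlation ID.
type StoredRequest = { request_id: string; inputs: unknown };

const auditStore = new Map<string, StoredRequest>();

// Re-run the exact stored inputs through the given handler.
// No live user input is involved, so conditions are controlled.
function replay(
  requestId: string,
  handler: (inputs: unknown) => unknown
): unknown {
  const stored = auditStore.get(requestId);
  if (!stored) throw new Error(`no audit record for ${requestId}`);
  return handler(stored.inputs);
}
```

Gate this behind admin/owner auth, and make sure the stored inputs were redacted at write time, not at replay time.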

For architecture and rollout planning, use the Contact Hub.

Ship‑Ready Checklist

Use this as a pre‑deploy gate. If you can’t check these boxes, don’t pretend you’re “done.”

- A single source of truth for versions (prompt/policy/schema) and a way to display them in-app.
- Request correlation ID visible in UI, logged server-side, and searchable.
- Explicit failure UX: what happened, why, and a safe next step.
- An audit trail you can replay: inputs, decisions, outputs, and cost facts.
- A small test harness (even 20 cases) that runs before deployment.
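That last item needs less machinery than it sounds. A harness can be a list of golden cases and a loop; `answer` below is a stand-in for your real pipeline, and the cases are invented for illustration:

```typescript
// A golden case: an input and a substring the answer must contain.
type Case = { input: string; mustInclude: string };

const cases: Case[] = [
  { input: "refund policy", mustInclude: "refund" },
  { input: "pricing", mustInclude: "price" },
];

// Run every case; return the inputs that failed.
function runHarness(answer: (q: string) => string): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const out = answer(c.input).toLowerCase();
    if (!out.includes(c.mustInclude)) failures.push(c.input);
  }
  return failures;
}
```

Wire it into CI so a non-empty failure list blocks the deploy; twenty cases maintained honestly beats a thousand you never read.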

Want this turned into a working product?

Use the Contact Hub to scope features, security, billing, and the deployment plan.

Open Contact Hub