How fingerprinting works

We do not store raw inputs or outputs. We capture behavioral and structural signals, summarize them into a fingerprint, and compare fingerprints to detect drift.

Signals captured

  • Tool / action distribution — which tools were called and how often (e.g. search 78%, lookup 22%).
  • Latency profile — percentiles (p50, p95, p99) and trends.
  • Decision rates — e.g. escalation rate, referral rate, or other outcome ratios.
  • Error pattern — error rate and type distribution, not message content.

Content is hashed at capture; only these aggregates and structural features leave your environment.
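The capture step above can be sketched as follows. This is a minimal illustration, not the production collector: the event shape (`tool`, `latency_ms`, `content` keys) and the `capture` function name are assumptions for the example.

```python
import hashlib
from collections import Counter

def capture(events):
    """Summarize a batch of run events into aggregates.

    Raw content is hashed at capture and discarded; only the
    digests and the aggregate features below leave the environment.
    """
    tools = Counter(e["tool"] for e in events)
    total = sum(tools.values())
    latencies = sorted(e["latency_ms"] for e in events)

    def pct(p):
        # Nearest-rank percentile over the sorted latencies.
        return latencies[min(len(latencies) - 1, int(p / 100 * len(latencies)))]

    # Content never leaves as plaintext: only a SHA-256 digest is retained.
    digests = [hashlib.sha256(e["content"].encode()).hexdigest() for e in events]

    return {
        "tool_dist": {t: n / total for t, n in tools.items()},
        "latency": {"p50": pct(50), "p95": pct(95), "p99": pct(99)},
        "content_digests": digests,
    }
```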

Fingerprint and divergence

A fingerprint is a compact summary of the signals above over a window of runs (e.g. 50+). We compute divergence between two fingerprints (e.g. v1.0 vs v2.0) using Jensen–Shannon (JS) divergence and related metrics. The result is a drift score from 0 (no change) to 1 (maximal change).
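For intuition, here is JS divergence applied to two tool distributions. With base-2 logarithms the result is naturally bounded in [0, 1], matching the drift score's range; the function name and the illustrative numbers are assumptions for this sketch.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen–Shannon divergence with base-2 logs.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # treat 0 * log(0) as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Tool distributions for two versions (illustrative numbers):
v1 = [0.78, 0.22]  # search, lookup
v2 = [0.60, 0.40]
score = js_divergence(v1, v2)
```

A small shift in the tool mix yields a small score; a version that stops calling a tool entirely pushes the score toward 1.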

What the score means

The score tells you how much observed behavior changed between two versions, not why; an optional root-cause analysis explains likely causes in plain English. Use thresholds (e.g. 0.2–0.3) to gate deploys or fire alerts. See the Drift score reference for details.
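A deploy gate on the score can be as simple as the sketch below. The threshold value and function names are assumptions; in practice you would tune the threshold per workload against the guidance in the Drift score reference.

```python
DRIFT_THRESHOLD = 0.25  # assumed policy; tune per workload

def gate_deploy(drift_score, threshold=DRIFT_THRESHOLD):
    """Return True if the new version may ship, False to block it."""
    return drift_score < threshold

def alert_level(drift_score):
    """Map a drift score to a coarse alert level (illustrative bands)."""
    if drift_score < 0.2:
        return "ok"
    if drift_score < 0.3:
        return "warn"
    return "block"
```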