How fingerprinting works
We do not store raw inputs or outputs. We capture behavioral and structural signals, summarize them into a fingerprint, and compare fingerprints to detect drift.
Signals captured
- Tool / action distribution — which tools were called and how often (e.g. search 78%, lookup 22%).
- Latency profile — percentiles (p50, p95, p99) and trends.
- Decision rates — e.g. escalation rate, referral rate, or other outcome ratios.
- Error patterns — error rate and error-type distribution, not message content.
Content is hashed at capture; only these aggregates and structural features leave your environment.
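The capture step can be sketched as follows. This is a minimal illustration, not the actual implementation: the run fields (`tool`, `latency_ms`, `error`, `content`) and the nearest-rank percentile calculation are assumptions made for the example.

```python
import hashlib
from collections import Counter

def capture_signals(runs):
    """Summarize a window of runs into aggregate signals.

    `runs` is a list of dicts with hypothetical keys:
    tool, latency_ms, error (bool), content (str).
    Raw content is hashed at capture; only aggregates and hashes remain.
    """
    tool_counts = Counter(r["tool"] for r in runs)
    total = sum(tool_counts.values())
    latencies = sorted(r["latency_ms"] for r in runs)

    def pct(p):
        # Nearest-rank percentile (approximate, for illustration only).
        return latencies[min(int(p / 100 * len(latencies)), len(latencies) - 1)]

    return {
        # Tool / action distribution, e.g. {"search": 0.78, "lookup": 0.22}.
        "tool_dist": {t: c / total for t, c in tool_counts.items()},
        # Latency profile as percentiles.
        "latency": {"p50": pct(50), "p95": pct(95), "p99": pct(99)},
        # Error rate; type distribution omitted for brevity.
        "error_rate": sum(1 for r in runs if r["error"]) / len(runs),
        # Only a hash of each run's content ever leaves the environment.
        "content_hashes": [
            hashlib.sha256(r["content"].encode()).hexdigest() for r in runs
        ],
    }
```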
Fingerprint and divergence
A fingerprint is a compact summary of these signals over a window of runs (e.g. 50 or more). We compute divergence between two fingerprints (e.g. v1.0 vs v2.0) using Jensen-Shannon (JS) divergence and related metrics. The result is a drift score from 0 (no change) to 1 (maximal change).
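A minimal sketch of the divergence step, applied to the tool distributions above. This shows base-2 JS divergence alone, which is already bounded to [0, 1]; the actual drift score combines it with related metrics.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions
    (dicts mapping category -> probability). With base-2 logs the
    result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL(a || m); terms with zero probability contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical tool distributions for two versions:
v1 = {"search": 0.78, "lookup": 0.22}
v2 = {"search": 0.60, "lookup": 0.40}
drift = js_divergence(v1, v2)  # small but nonzero drift
```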
What the score means
The score tells you how much observed behavior changed between two versions. It does not tell you why; for that, we optionally provide a plain-English root-cause analysis. Use thresholds (e.g. 0.2–0.3) to gate deploys or fire alerts. See the Drift score reference for details.
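A hypothetical deploy gate using the suggested threshold range; the function name and the warn/block split are illustrative, not part of the product.

```python
def gate_deploy(drift_score, warn=0.2, block=0.3):
    """Map a drift score in [0, 1] to a gating decision.

    Illustrative policy: scores below `warn` pass, scores between
    `warn` and `block` fire an alert, and scores at or above `block`
    stop the deploy. Tune thresholds to your own tolerance.
    """
    if drift_score >= block:
        return "block"
    if drift_score >= warn:
        return "warn"
    return "pass"
```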