fawp_index.benchmarks — Benchmark Suite

The suite provides five canonical ground-truth cases that check the detector fires when FAWP is present and stays silent when it is not.

Quick start

from fawp_index import run_benchmarks

suite = run_benchmarks()
print(suite.summary())       # pass/fail table
suite.verify_all()           # raises BenchmarkFailure if any case fails
suite.to_html("bench.html")
suite.to_json("bench.json")
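
Because `verify_all()` raises when a case fails, the order of the calls decides whether reports get written on a failing run. The following is a minimal sketch, not part of the library: `write_reports_then_verify` is a hypothetical helper that accepts any object exposing the `to_html`, `to_json` and `verify_all` methods shown above.

```python
def write_reports_then_verify(suite, html_path="bench.html", json_path="bench.json"):
    """Write both reports, then verify.

    Hypothetical helper: `suite` is any object with the to_html, to_json and
    verify_all methods used in the quick start. Verification runs last
    because it raises when a case fails, which would otherwise skip the
    report-writing calls.
    """
    suite.to_html(html_path)
    suite.to_json(json_path)
    suite.verify_all()  # may raise; both reports are already written by now
```

It would be called as `write_reports_then_verify(run_benchmarks())`, so a failing run still leaves `bench.html` and `bench.json` behind for inspection.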

CLI

fawp-index benchmarks
fawp-index benchmarks --verify          # exit 1 if any fail
fawp-index benchmarks --out bench.html
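
The `--verify` exit code makes the CLI usable as a gate in scripts. As a rough sketch, the check can be wrapped from Python with the standard `subprocess` module; `benchmarks_pass` is a hypothetical helper name, and the only CLI behaviour it relies on is the documented non-zero exit status on failure.

```python
import subprocess

def benchmarks_pass(cmd=("fawp-index", "benchmarks", "--verify")):
    """Return True if the command exits with status 0.

    The default command is the one from the CLI section, which is documented
    to exit with status 1 when any case fails. Any other command can be
    passed in, which is how this helper can be tried without the CLI installed.
    """
    completed = subprocess.run(list(cmd), check=False)
    return completed.returncode == 0
```

`check=False` is deliberate: the caller reads the return code directly instead of catching `subprocess.CalledProcessError`.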

The five cases

Case	FAWP expected?	Description
`clean_control`	✅ Yes	Textbook FAWP: steering collapses, prediction survives
`prediction_only`	❌ No	Predictable system, no steering channel
`control_only`	❌ No	Active controller, no predictive horizon
`noisy_false_positive`	❌ No	Noisy stable system designed to trap detectors
`delayed_collapse`	✅ Yes	Fast-collapsing unstable system, narrow ODW

By default the cases use analytic curves rather than simulation, so all five run in under 1 second.
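
The table mixes two positive controls with three negative ones, so a wrong answer is either a missed detection or a false positive. The sketch below transcribes the expected outcomes from the table and labels a single outcome accordingly; `EXPECTED_FAWP` and `classify` are illustrative names, not part of `fawp_index`.

```python
# Expected detection outcomes, transcribed from the table above.
EXPECTED_FAWP = {
    "clean_control": True,
    "prediction_only": False,
    "control_only": False,
    "noisy_false_positive": False,
    "delayed_collapse": True,
}

def classify(name, actual_fawp):
    """Label one outcome as 'pass', 'false positive' or 'missed detection'."""
    expected = EXPECTED_FAWP[name]
    if actual_fawp == expected:
        return "pass"
    # The outcomes differ: a detection here means none was expected.
    return "false positive" if actual_fawp else "missed detection"
```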

Run individual cases

from fawp_index import clean_control, delayed_collapse

case = clean_control()
case.verify()          # raises if the outcome differs from the expected one
case.plot()            # leverage gap plot

case2 = delayed_collapse()
print(case2.result.summary())

Simulate instead of analytic curves

from fawp_index.benchmarks import clean_control
case = clean_control(simulate=True)   # runs FAWPSimulator
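
One use of the `simulate` flag is to check that the analytic and simulated paths reach the same verdict. A minimal sketch follows, assuming the object returned by a case function exposes the `actual_fawp` field listed under BenchmarkResult fields; that link is an assumption, and `modes_agree` is a hypothetical helper.

```python
def modes_agree(case_factory):
    """Return True if analytic and simulated runs give the same outcome.

    `case_factory` is any callable that accepts an optional `simulate`
    keyword, such as the case functions shown above. Reading `actual_fawp`
    from the returned object is an assumption, not confirmed API.
    """
    analytic = case_factory()
    simulated = case_factory(simulate=True)
    return analytic.actual_fawp == simulated.actual_fawp
```

For example, `modes_agree(clean_control)` would run the case once in each mode and compare the two outcomes.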

BenchmarkResult fields

Field	Description
`name`	Case name
`passed`	Boolean
`expected_fawp`	Expected detection outcome
`actual_fawp`	Actual detection outcome
`result`	Full `ODWResult`
`notes`	Human-readable description
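
These fields are enough to build a short failure report. The sketch below is illustrative only: `describe_failures` is a hypothetical helper, and because this page does not show how a suite exposes its individual results, it accepts any iterable of objects carrying the fields above.

```python
def describe_failures(results):
    """Return one line per failed result, built from the documented fields."""
    lines = []
    for r in results:
        if not r.passed:
            lines.append(
                f"{r.name}: expected FAWP={r.expected_fawp}, "
                f"got FAWP={r.actual_fawp} ({r.notes})"
            )
    return lines
```

An empty list means every result passed.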