Evals

Agents can test Raven before trusting it. The eval pack at /evals.json lists expected behaviors with verification rules and pass/fail criteria. These are not marketing claims: run them and hold us to them.

What the pack covers

A Raven-compatible agent should:

1. Verify the receipt signature against /pubkey.
2. Inspect coverageGaps and treat listed surfaces as unverified.
3. Fail closed: on risk, warning, or unknowable, escalate; never proceed on vibes.

When gaps remain, the honest reading is "not enough evidence for a full pass."
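The three behaviors above can be sketched as a single decision function. This is a minimal illustration, not Raven's reference implementation: only coverageGaps is named on this page, while signatureValid, verdict, the "pass" value, and the returned action/reason shape are hypothetical names chosen for the example. Signature verification against /pubkey is assumed to have happened elsewhere and is passed in as a boolean, and blocking on a bad signature is an assumption, since this page only specifies escalation.

```javascript
// Fail-closed preflight gate (sketch).
// ASSUMPTIONS: `signatureValid`, `verdict`, and the "pass" value are
// hypothetical; only `coverageGaps` is a name taken from this page.
function gateDecision({ signatureValid, verdict, coverageGaps } = {}) {
  // Step 1: a receipt whose signature did not verify proves nothing.
  // Blocking here (rather than escalating) is an assumption of this sketch.
  if (signatureValid !== true) {
    return { action: "block", reason: "signature not verified" };
  }

  // Step 3: fail closed. Anything other than an explicit pass
  // (risk, warning, unknowable, or a missing verdict) escalates.
  if (verdict !== "pass") {
    return { action: "escalate", reason: `verdict: ${String(verdict)}` };
  }

  // Step 2: listed gaps are unverified surfaces. A missing or malformed
  // list is treated as unknowable rather than as "no gaps".
  if (!Array.isArray(coverageGaps)) {
    return { action: "escalate", reason: "coverageGaps missing or malformed" };
  }
  if (coverageGaps.length > 0) {
    return {
      action: "escalate",
      reason: `unverified surfaces: ${coverageGaps.join(", ")}`,
    };
  }

  return { action: "proceed", reason: "verified pass with no coverage gaps" };
}
```

The checks are ordered so that every unexpected input lands on block or escalate. The only path to proceed is a verified signature, an explicit pass, and an empty coverageGaps list.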

Quality Ledger

Invariants (falsifiable, run them)

Latest public run

2026-06-05 · blackbox: 9 pass, 0 fail (keyed, vs production) · 25 evals in the pack

node scripts/raven-public-blackbox-eval.mjs

Known limitations (stated, not hidden)

Run the evals

# public surface only (no key needed):
node scripts/raven-public-blackbox-eval.mjs

# full pack against the live verifier (operator exports their own key;
# the script never prints it):
RAVEN_HOSTED_API_KEY=<your key> node scripts/raven-public-blackbox-eval.mjs

Runnable reference code: examples/preflight-gate.mjs (verify → store → proceed/block/escalate in 40 lines) · examples/store-receipt.mjs (schema-correct local adapter). Or read /evals.json and implement your own runner — that's the point.
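For readers writing their own runner, the overall shape might look like the sketch below. It is an illustration under stated assumptions: the schema of /evals.json is not shown on this page, so the id field and the check callback are hypothetical stand-ins, and a real runner would load the pack and apply each eval's own verification rules (most likely asynchronously) instead.

```javascript
// Minimal eval runner (sketch).
// ASSUMPTIONS: each eval has a hypothetical `id` field, and `check` is a
// caller-supplied function that returns true when the behavior is observed.
function runEvals(evals, check) {
  const results = evals.map((evalCase) => {
    try {
      // Only a strict `true` counts as a pass; truthy values do not.
      return { id: evalCase.id, pass: check(evalCase) === true };
    } catch (err) {
      // A check that throws is recorded as a failure (fail closed).
      return { id: evalCase.id, pass: false, error: String(err) };
    }
  });

  const passed = results.filter((r) => r.pass).length;
  const failed = results.length - passed;

  // The summary mirrors the "N pass, M fail" wording of the public run line.
  return { results, passed, failed, summary: `${passed} pass, ${failed} fail` };
}
```

A check that throws or returns anything other than true is counted as a failure, so a broken runner cannot report a pass by accident.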

Request a key to run keyed evals · Workbench