Evals
Agents can test Raven before trusting it. The eval pack at /evals.json lists expected behaviors, each with verification rules and pass/fail criteria. These are not marketing claims: run them and hold us to them.
What the pack covers
- usdc_authorities_are_risk: USDC returns risk because its issuer authorities are active; blue chips are not special-cased.
- gaps_never_mean_safe: a pass-grade result over unevaluated surfaces is published as warning, not pass.
- token2022_extensions_surface: the Token-2022 extension tail is decoded, not ignored.
- invalid_request_structured_error / forbidden_fields_rejected: the request schema is strict; invalid requests get a structured error, and caller-supplied infrastructure (such as rpcUrl or issuerIdentity) is rejected.
- pubkey_published: key rvk_c2997e90215279c2 is served at /pubkey without authentication.
- receipt_fields_complete: every receipt carries replayHash, officialAttestationHash, keyId, observedSlot, and signature, so it can be verified independently.
- agent_*/skill_*/workbench_*: the workbench pack is ingestible; the skill refuses to say "safe" while gaps remain and gives no advice.
- receipt_*/storage_*/walrus_*: for stored receipts, the raw receipt is the truth, receipts are verified before use, storage adapters are documented only, and storage never changes a verdict.
- decision_*/status_*: verdicts map to actions, stale results are re-verified, dependencies are disclosed, and gaps never read as "safe".
- key_policy_available / receipt_key_binding: the key policy is public and contains no secrets; receipts are bound to a specific key, and no post-quantum claim is made.
- no_price_prediction_claims: no response claims price prediction or offers financial advice.
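The stored-receipt rules (raw is truth, verify before use, storage never changes a verdict) can be illustrated with a small in-memory adapter. This is a hypothetical sketch, not the contents of examples/store-receipt.mjs: the ReceiptStore class, keying receipts by replayHash, and the caller-supplied verify callback are all assumptions made for illustration.

```javascript
// HYPOTHETICAL adapter for illustration; not Raven's examples/store-receipt.mjs.
// ASSUMPTION: receipts arrive as JSON text and are keyed by their replayHash.
class ReceiptStore {
  constructor(verifyReceipt) {
    this.rawByHash = new Map();         // raw JSON text is the source of truth
    this.verifyReceipt = verifyReceipt; // caller-supplied check, e.g. signature vs /pubkey
  }

  put(rawJson) {
    const receipt = JSON.parse(rawJson); // parsed only to read the lookup key
    if (typeof receipt.replayHash !== "string" || receipt.replayHash === "") {
      throw new Error("receipt has no replayHash");
    }
    // Store the original text untouched, never a re-serialized or edited copy.
    this.rawByHash.set(receipt.replayHash, rawJson);
    return receipt.replayHash;
  }

  get(replayHash) {
    const rawJson = this.rawByHash.get(replayHash);
    if (rawJson === undefined) return null;
    const receipt = JSON.parse(rawJson);
    // Verify before use: being stored does not make a receipt trustworthy.
    if (!this.verifyReceipt(receipt)) {
      throw new Error("stored receipt failed verification");
    }
    return receipt; // verdict comes back exactly as stored; storage never alters it
  }
}
```

Because the adapter only ever returns what it re-parses from the untouched raw text, there is no code path in which storage could upgrade or downgrade a verdict.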
A Raven-compatible agent should
1. Verify the receipt signature against /pubkey.
2. Inspect coverageGaps and treat every listed surface as unverified.
3. Fail closed: on risk, warning, or unknowable, escalate; never proceed on vibes.
When gaps remain, the honest reading is "not enough evidence for a full pass."
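The three steps above can be sketched as a single gate function. This is a minimal sketch under loud assumptions, not Raven's examples/preflight-gate.mjs: it assumes the signature is base64-encoded Ed25519 over the receipt's other top-level fields serialized as key-sorted JSON, that verdict and coverageGaps travel inside the signed receipt, and that a clean result uses the verdict string "pass". Check /pubkey and the receipt documentation for the real signing scheme before relying on any of it.

```javascript
const crypto = require("node:crypto");

// ASSUMPTION: the signed message is the JSON of every top-level field except
// `signature`, with keys sorted. Raven's real canonicalization may differ.
function canonicalPayload(receipt) {
  const { signature: _unused, ...rest } = receipt;
  const sorted = {};
  for (const key of Object.keys(rest).sort()) sorted[key] = rest[key];
  return Buffer.from(JSON.stringify(sorted), "utf8");
}

// Returns "proceed", "block", or "escalate".
function preflightDecision(receipt, publicKey, expectedKeyId) {
  // Step 1: verify the signature with the key published at /pubkey.
  if (receipt.keyId !== expectedKeyId) return "block";
  let valid = false;
  try {
    // ASSUMPTION: signature is base64; Ed25519 takes a null digest algorithm.
    const sig = Buffer.from(String(receipt.signature ?? ""), "base64");
    valid = crypto.verify(null, canonicalPayload(receipt), publicKey, sig);
  } catch {
    valid = false; // a malformed key or signature counts as invalid
  }
  if (!valid) return "block";

  // Step 2: every listed coverage gap is an unverified surface.
  const gaps = receipt.coverageGaps;
  if (!Array.isArray(gaps) || gaps.length > 0) return "escalate";

  // Step 3: fail closed. Only an explicit pass with zero gaps proceeds;
  // risk, warning, unknowable, or any unexpected value escalates.
  return receipt.verdict === "pass" ? "proceed" : "escalate";
}
```

In practice the key would be loaded with crypto.createPublicKey(pemText) after fetching /pubkey, assuming it is served as PEM. Note that a missing or non-array coverageGaps escalates rather than proceeds: absence of gap data is not evidence of no gaps.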
Quality Ledger
Invariants (falsifiable, run them)
- USDC returns risk because its issuer authorities are active; blue chips get no special-casing.
- Invalid requests return a structured 400 error, never a verdict.
- Requests containing rpcUrl or issuerIdentity are rejected.
- Every receipt includes keyId, observedSlot, replayHash, officialAttestationHash, signature.
- Raven never says "safe" when coverage gaps remain.
- Raven does not predict price and does not provide financial advice.
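These invariants are concrete enough to check mechanically against captured responses. The sketch below is hypothetical: the { status, body } response shape and the body.verdict, body.error, body.coverageGaps, and body.receipt fields are assumptions about how a caller recorded the responses, not Raven's documented format. Only the five receipt field names come from this page, and the "safe" check is a crude word scan.

```javascript
// Receipt fields named on this page.
const RECEIPT_FIELDS = ["keyId", "observedSlot", "replayHash", "officialAttestationHash", "signature"];

// ASSUMPTION: each captured response looks like { status, body }.
const usdcIsRisk = (res) => res.body?.verdict === "risk"; // no blue-chip special-casing

// Invalid or forbidden-field requests: structured 400, never a verdict.
const isStructured400 = (res) =>
  res.status === 400 &&
  typeof res.body?.error === "object" && res.body.error !== null &&
  !("verdict" in res.body);

// Every receipt carries all five fields (ASSUMPTION: receipt lives at body.receipt).
const receiptComplete = (receipt) =>
  RECEIPT_FIELDS.every((f) => receipt?.[f] !== undefined && receipt?.[f] !== null);

// With coverage gaps, never a full pass and never the word "safe" (crude scan).
const gapsNeverSafe = (body) => {
  const gaps = body.coverageGaps ?? [];
  if (gaps.length === 0) return true;
  return body.verdict !== "pass" && !/\bsafe\b/i.test(JSON.stringify(body));
};

// Returns the names of violated invariants; an empty array means all held.
function checkInvariants({ usdc, invalid, forbidden }) {
  const failures = [];
  if (!usdcIsRisk(usdc)) failures.push("usdc_not_risk");
  if (!isStructured400(invalid)) failures.push("invalid_not_structured_400");
  if (!isStructured400(forbidden)) failures.push("forbidden_fields_accepted");
  if (!receiptComplete(usdc.body?.receipt)) failures.push("receipt_fields_incomplete");
  if (!gapsNeverSafe(usdc.body ?? {})) failures.push("safe_despite_gaps");
  return failures;
}
```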
Latest public run
2026-06-05 · blackbox: 9 pass, 0 fail (keyed, vs production) · 25 evals in the pack
node scripts/raven-public-blackbox-eval.mjs
Known limitations (stated, not hidden)
- Holder concentration evidence is a key-gated beta and is not enabled globally.
- Deployer history is not yet live (listed as a coverage gap).
- A liquidity gap may remain unless pool evidence is supplied (Raydium CPMM pools, via poolAddress).
- No price prediction, no trading advice — by design, permanently.
Run the evals
# public surface only (no key needed):
node scripts/raven-public-blackbox-eval.mjs
# full pack against the live verifier (operator exports their own key;
# the script never prints it):
RAVEN_HOSTED_API_KEY='<your key>' node scripts/raven-public-blackbox-eval.mjs
Runnable reference code: examples/preflight-gate.mjs (verify → store → proceed/block/escalate in 40 lines) · examples/store-receipt.mjs (schema-correct local adapter). Or read /evals.json and implement your own runner — that's the point.
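A do-it-yourself runner can stay small. The sketch below assumes, hypothetically, that /evals.json parses to an object with an evals array whose entries carry an id; the real schema may differ. The handlers Map (eval id to local check function) is also an illustration, not part of the pack. Following the fail-closed rule, an eval with no handler is reported as skipped and a handler that throws counts as a failure; neither is ever a pass.

```javascript
// ASSUMPTION: pack shape is { evals: [{ id: string, ... }] }. Check /evals.json.
// `handlers` is a HYPOTHETICAL Map from eval id to a check returning true on pass.
function runEvals(pack, handlers) {
  const report = { pass: [], fail: [], skipped: [] };
  for (const evalCase of pack.evals ?? []) {
    const handler = handlers.get(evalCase.id);
    if (typeof handler !== "function") {
      report.skipped.push(evalCase.id); // unimplemented is reported, never counted as a pass
      continue;
    }
    try {
      // Only a literal `true` counts as a pass; anything else is a failure.
      (handler(evalCase) === true ? report.pass : report.fail).push(evalCase.id);
    } catch {
      report.fail.push(evalCase.id); // a crashing check is a failure
    }
  }
  return report;
}
```

Handlers that call the live API would be asynchronous; the same loop works with `await handler(evalCase)` inside an async function.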
Request a key to run keyed evals
Workbench