Evals

Agents can test Raven before trusting it. The eval pack at /evals.json lists expected behaviors, each with verification rules and pass/fail criteria. These are not marketing claims: run them and hold us to them.

What the pack covers

usdc_authorities_are_risk — blue chips are not special-cased.
gaps_never_mean_safe — pass-grade over unevaluated surfaces publishes as warning.
token2022_extensions_surface — extension tail decoded, not ignored.
invalid_request_structured_error / forbidden_fields_rejected — strict schema, no caller infrastructure.
pubkey_published — key rvk_c2997e90215279c2 served without auth.
receipt_fields_complete — replayHash, officialAttestationHash, keyId, observedSlot, signature: independently verifiable.
agent_*/skill_*/workbench_* — workbench pack ingestible; skill refuses "safe" with gaps and bans advice.
receipt_*/storage_*/walrus_* — stored receipts: raw = truth, verify before use, adapters documented-only, storage never changes a verdict.
decision_*/status_* — verdict actions, staleness re-verification, dependency disclosure, gaps never "safe".
key_policy_available / receipt_key_binding — key policy public and secret-free; receipts key-scoped, no post-quantum claim.
no_price_prediction_claims — no response claims price prediction or financial advice.

A Raven-compatible agent should

1. Verify the receipt signature against /pubkey.
2. Inspect coverageGaps and treat listed surfaces as unverified.
3. Fail closed: on risk, warning, or unknowable, escalate; never proceed on vibes. When gaps remain, the honest reading is "not enough evidence for a full pass".
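The three steps can be sketched as one fail-closed decision function. This is a minimal illustration, not Raven's reference gate (see examples/preflight-gate.mjs for that). The `verdict` field name, the literal "pass" value, and the proceed/block/escalate result shape are assumptions; only `coverageGaps` is named in the text. Signature verification is left to the caller because the signature scheme is not stated here, so `signatureValid` stands for the result of your own check against /pubkey.

```javascript
// Fail-closed preflight decision (sketch).
// Assumed, not confirmed by the source: `receipt.verdict`, the "pass" string,
// and the { action, reason } return shape.
function decide(receipt, signatureValid) {
  // Step 1: a receipt whose signature did not verify is never acted on.
  if (!signatureValid) {
    return { action: "block", reason: "signature not verified against /pubkey" };
  }

  // Step 2: surfaces listed in coverageGaps are unverified. A missing or
  // malformed list is treated as unknown coverage, not as "no gaps".
  if (!Array.isArray(receipt.coverageGaps)) {
    return { action: "escalate", reason: "coverageGaps missing or malformed" };
  }

  // Step 3: fail closed on anything that is not a pass (risk, warning,
  // unknowable, or an unrecognized value).
  if (receipt.verdict !== "pass") {
    return { action: "escalate", reason: `verdict is ${receipt.verdict}` };
  }

  // A pass over unevaluated surfaces is not a full pass.
  if (receipt.coverageGaps.length > 0) {
    return {
      action: "escalate",
      reason: "not enough evidence for a full pass",
      unverified: receipt.coverageGaps,
    };
  }

  return { action: "proceed", reason: "pass with no coverage gaps" };
}
```

With these assumptions, decide({ verdict: "pass", coverageGaps: ["liquidity"] }, true) escalates rather than proceeds, and only a verified pass with an empty gap list returns "proceed".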

Quality Ledger

Invariants (falsifiable, run them)

USDC returns risk (issuer authorities active — no blue-chip special-casing).
Invalid requests return a structured 400 error, never a verdict.
rpcUrl and issuerIdentity are rejected.
Every receipt includes keyId, observedSlot, replayHash, officialAttestationHash, signature.
Raven never says "safe" when coverage gaps remain.
Raven does not predict price and does not provide financial advice.
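Several of these invariants can be checked locally on a response you have already fetched. The helpers below are a hedged sketch: the field names come from the list above, but the assumption that they are top-level keys, and the whole-word scan for "safe", are illustrative choices rather than the pack's actual verification rules.

```javascript
// Field names taken from the invariant list; top-level placement is assumed.
const REQUIRED_RECEIPT_FIELDS = [
  "keyId",
  "observedSlot",
  "replayHash",
  "officialAttestationHash",
  "signature",
];
const FORBIDDEN_REQUEST_FIELDS = ["rpcUrl", "issuerIdentity"];

// Returns the required fields that are absent or empty in a receipt.
function missingReceiptFields(receipt) {
  return REQUIRED_RECEIPT_FIELDS.filter(
    (f) => receipt[f] === undefined || receipt[f] === null || receipt[f] === ""
  );
}

// Returns the forbidden fields present in a request body (top-level only).
function forbiddenFieldsIn(requestBody) {
  return FORBIDDEN_REQUEST_FIELDS.filter((f) => Object.hasOwn(requestBody, f));
}

// Crude heuristic: flags a response that has coverage gaps and contains the
// whole word "safe" anywhere in its serialized form. "unsafe" does not match.
function saysSafeWithGaps(response) {
  const hasGaps =
    Array.isArray(response.coverageGaps) && response.coverageGaps.length > 0;
  return hasGaps && /\bsafe\b/i.test(JSON.stringify(response));
}
```

An empty array from missingReceiptFields means the receipt carries all five fields; it says nothing about whether the signature or hashes are valid.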

Latest public run

2026-06-05 · blackbox: 9 pass, 0 fail (keyed, vs production) · 25 evals in the pack

node scripts/raven-public-blackbox-eval.mjs

Known limitations (stated, not hidden)

Holder concentration evidence is key-gated beta, not globally enabled.
Deployer history is not yet live (listed as a coverage gap).
Liquidity gap may remain without pool evidence (Raydium CPMM via poolAddress).
No price prediction, no trading advice — by design, permanently.

Run the evals

# public surface only (no key needed):
node scripts/raven-public-blackbox-eval.mjs

# full pack against the live verifier (operator exports their own key;
# the script never prints it):
RAVEN_HOSTED_API_KEY="<your key>" node scripts/raven-public-blackbox-eval.mjs

Runnable reference code: examples/preflight-gate.mjs (verify → store → proceed/block/escalate in 40 lines) · examples/store-receipt.mjs (schema-correct local adapter). Or read /evals.json and implement your own runner — that's the point.
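For readers writing their own runner, the core loop might look like the sketch below. The pack shape { evals: [{ id }] } and the `checks` map are hypothetical: the actual schema of /evals.json is not described here, so adapt the field access after inspecting the file. Evals with no local check are reported as skipped rather than passed, in keeping with the fail-closed stance.

```javascript
// Minimal eval runner (sketch).
// Assumed pack shape (hypothetical): { evals: [{ id: "..." }, ...] }.
// `checks` maps an eval id to a sync or async function that returns true on pass.
async function runPack(pack, checks) {
  const results = [];
  for (const { id } of pack.evals) {
    const check = checks[id];
    if (typeof check !== "function") {
      // No local implementation: report it, never count it as a pass.
      results.push({ id, status: "skipped" });
      continue;
    }
    try {
      const ok = await check();
      results.push({ id, status: ok === true ? "pass" : "fail" });
    } catch (err) {
      // A check that throws is a failure, not a skip.
      results.push({ id, status: "fail", error: String(err) });
    }
  }
  return results;
}
```

A caller would load the pack (for example with fetch against the host that serves /evals.json), pass in its own check functions keyed by eval id, and treat any result other than "pass" as unverified.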

Request a key to run keyed evals · Workbench