Final Report Spec
This page describes the final /b/[agent] report as a product surface.
The report must feel like a professional due-diligence memo, not a dashboard full of unrelated metrics.
┌─────────────────────────────────────────────────────────────┐
│ Header: Upgrade Siren Bench │
│ Search / Compare / Export │
├─────────────────────────────────────────────────────────────┤
│ Subject Hero │
│ ENS name, display name, subject kind, confidence warning │
│ Headline score, seniority, relevance, confidence │
├─────────────────────────────────────────────────────────────┤
│ Source Coverage Strip │
│ Sourcify | GitHub | On-chain | ENS | Portfolio Claims │
├─────────────────────────────────────────────────────────────┤
│ Executive Summary │
│ 3 strengths, 3 risks, reviewer outcome │
├─────────────────────────────────────────────────────────────┤
│ Source Grid │
│ 5 cards, each opens a drawer │
├─────────────────────────────────────────────────────────────┤
│ Score Breakdown Table │
│ Attribute-by-attribute math │
├─────────────────────────────────────────────────────────────┤
│ Portfolio │
│ Contracts, repos, claims, endpoints, reports │
├─────────────────────────────────────────────────────────────┤
│ Improve This Score │
│ Concrete next actions │
├─────────────────────────────────────────────────────────────┤
│ Raw Evidence / JSON │
│ Copy/export report │
└─────────────────────────────────────────────────────────────┘
Required header controls:
- product mark,
- search input,
- compare button,
- export button,
- timestamp/freshness indicator.
Header copy:
Upgrade Siren Bench
Proof, not promises, for agentic ventures.
Required subject hero fields:
| Field | Example |
|---|---|
| ENS name | atlas.demo.upgradesiren.eth |
| Display name | Atlas |
| Subject kind | AI Agent |
| Primary context | Agentic venture due diligence |
| Headline score | 82 / 100 |
| Tier | A |
| Seniority | 86 |
| Relevance | 78 |
| Confidence | High |
| Last updated | 2 min ago |
| Mode | manifest, public-read, or mock |
Hero states:
| State | Visual treatment |
|---|---|
| High confidence | Strong normal score display |
| Medium confidence | Score visible with amber confidence note |
| Low confidence | Score visible but capped; warning visible |
| Public-read | Badge: public-read, explanation line |
| Mock | Persistent mock: true banner |
| Not enough evidence | Tier U, no fake headline score |
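The hero states above can be sketched as a small function that turns report fields into a visual treatment. This is a minimal sketch, not the product's implementation: the `HeroInput` and `HeroTreatment` types and the `heroTreatment` function are hypothetical, and it assumes that mode badges combine freely with the confidence treatment and that Tier U always suppresses the headline score.

```typescript
// Hypothetical types: the spec names the states but does not define a data model.
type Confidence = "high" | "medium" | "low";
type Mode = "manifest" | "public-read" | "mock";

interface HeroInput {
  confidence: Confidence;
  mode: Mode;
  tier: string; // assumption: "U" means "not enough evidence"
}

interface HeroTreatment {
  showScore: boolean;
  scoreCapped: boolean;
  notes: string[]; // text notes, so status is never conveyed by color alone
}

function heroTreatment(input: HeroInput): HeroTreatment {
  const notes: string[] = [];

  // Mode badges are assumed to be independent of the confidence treatment.
  if (input.mode === "mock") notes.push("mock: true");
  if (input.mode === "public-read") notes.push("public-read");

  // Not enough evidence: Tier U, no fake headline score.
  if (input.tier === "U") {
    notes.push("Not enough evidence");
    return { showScore: false, scoreCapped: false, notes };
  }

  if (input.confidence === "medium") notes.push("Medium confidence");
  if (input.confidence === "low") notes.push("Low confidence: score capped");

  return { showScore: true, scoreCapped: input.confidence === "low", notes };
}
```

Returning notes as text rather than a color token keeps the treatment compatible with the requirement that no status is conveyed by color only.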
Hero copy examples:
High confidence:
Atlas has strong verified evidence across code, identity, and on-chain activity.
Low confidence:
Mirage makes relevant claims, but most evidence is unverified or self-asserted.
The source coverage strip is one compact row:
Sourcify: verified
GitHub: discounted
On-chain: live
ENS: manifest
Portfolio: mixed
Each item shows:
- source status,
- freshness,
- contribution,
- trust level.
The executive summary gives non-technical reviewers an answer in 15 seconds.
Required blocks: strengths, risks, and reviewer outcome.
Strengths: three bullets max.
Example:
- 5 of 6 portfolio contracts are exact-match verified on Sourcify.
- GitHub repos have tests, CI, and recent releases.
- On-chain activity is continuous over 14 months.
Risks: three bullets max.
Example:
- GitHub ownership is not cross-signed, so GitHub contribution is discounted.
- One portfolio claim has no public evidence.
- One contract has missing storage-layout metadata.
Reviewer outcome, one of:
| Outcome | Meaning |
|---|---|
| fast-track | Strong enough for deeper review |
| emerging-review | Promising, but not yet senior |
| evidence-required | Claims need proof before review |
| manual-security-review | Contract risk requires manual inspection |
| reject-or-redirect | Low relevance or insufficient evidence |
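Because the reviewer outcome is a closed set, it can be modeled as a string union with a lookup for the meaning shown to reviewers. The sketch below takes the keys and meanings from the outcome table; the names `REVIEWER_OUTCOMES`, `ReviewerOutcome`, and `isReviewerOutcome` are hypothetical.

```typescript
// Keys and meanings come from the outcome table; the identifiers are illustrative.
const REVIEWER_OUTCOMES = {
  "fast-track": "Strong enough for deeper review",
  "emerging-review": "Promising, but not yet senior",
  "evidence-required": "Claims need proof before review",
  "manual-security-review": "Contract risk requires manual inspection",
  "reject-or-redirect": "Low relevance or insufficient evidence",
} as const;

type ReviewerOutcome = keyof typeof REVIEWER_OUTCOMES;

// Type guard for outcome values that arrive as plain strings, e.g. from report JSON.
function isReviewerOutcome(value: string): value is ReviewerOutcome {
  return Object.prototype.hasOwnProperty.call(REVIEWER_OUTCOMES, value);
}
```

A guard like this lets the report page reject an unknown outcome instead of rendering an unlabeled badge.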
Sourcify card, visible fields:
- verified contracts count,
- unverified contracts count,
- exact match ratio,
- storage hygiene,
- risky upgrade count,
- contribution to seniority,
- contribution to confidence.
Open drawer:
- contract table,
- verification status,
- implementation history,
- storage-layout timeline,
- ABI risk list,
- links to source.
GitHub card, visible fields:
- repository count,
- recent activity,
- CI status,
- tests detected,
- releases,
- trust multiplier.
Open drawer:
- repo table,
- last push,
- workflow status,
- test/license/security badges,
- issue hygiene,
- trust-discount explanation.
On-chain card, visible fields:
- first seen,
- total activity,
- recent activity,
- deployments,
- portfolio address match.
Open drawer:
- address list,
- chain list,
- deployment timeline,
- explorer links,
- recent activity sparkline or table.
ENS card, visible fields:
- manifest present,
- owner/operator,
- record completeness,
- subnames,
- endpoint records.
Open drawer:
- raw manifest,
- ENS text records,
- owner/operator consistency,
- subname list,
- standards records.
Portfolio claims card, visible fields:
- total claims,
- verified claims,
- discounted claims,
- missing evidence claims.
Open drawer:
- claim table,
- evidence URL,
- trust state,
- contribution to relevance/confidence.
The score breakdown table must be sortable by contribution.
Columns:
| Column | Required? | Notes |
|---|---|---|
| Attribute | Yes | Human-readable and machine key |
| Axis | Yes | Seniority, relevance, confidence |
| Source | Yes | Sourcify, GitHub, on-chain, ENS, portfolio |
| Raw value | Yes | Never hide raw input |
| Normalized | Yes | 0..1 |
| Weight | Yes | Points or percentage |
| Trust | Yes | Multiplier |
| Contribution | Yes | Final points |
| Evidence | Yes | Link/drawer |
| Freshness | Yes | Data age |
Example row:
| Attribute | Axis | Source | Raw | Normalized | Weight | Trust | Contribution |
|---|---|---|---|---|---|---|---|
| exactMatchRatio | seniority | Sourcify | 5/6 | 0.83 | 10 | 1.0 | 8.3 |
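The example row is consistent with contribution = normalized × weight × trust (0.83 × 10 × 1.0 = 8.3). The sketch below assumes that formula, which is inferred from the example row and not stated elsewhere in this page, and adds the sort by contribution that the table requires. The `ScoreRow` type, both function names, and the second row's values are hypothetical.

```typescript
interface ScoreRow {
  attribute: string;
  axis: "seniority" | "relevance" | "confidence";
  source: string;
  raw: string;        // raw input, never hidden
  normalized: number; // 0..1
  weight: number;     // points
  trust: number;      // multiplier
}

// Assumed formula, inferred from the example row; rounded to one decimal for display.
function contribution(row: ScoreRow): number {
  return Math.round(row.normalized * row.weight * row.trust * 10) / 10;
}

// Returns a new array sorted by contribution, highest first.
function sortByContribution(rows: ScoreRow[]): ScoreRow[] {
  return [...rows].sort((a, b) => contribution(b) - contribution(a));
}

// The example row from the table above.
const exactMatchRow: ScoreRow = {
  attribute: "exactMatchRatio", axis: "seniority", source: "Sourcify",
  raw: "5/6", normalized: 0.83, weight: 10, trust: 1.0,
};

// Hypothetical discounted row, for illustration only.
const discountedRow: ScoreRow = {
  attribute: "ciStatus", axis: "seniority", source: "GitHub",
  raw: "passing", normalized: 0.5, weight: 10, trust: 0.6,
};

console.log(contribution(exactMatchRow)); // 8.3
console.log(sortByContribution([discountedRow, exactMatchRow]).map((r) => r.attribute));
```

Keeping normalized, weight, and trust as separate fields lets a reviewer recompute any row by hand, which is what makes a 30-second audit of one row feasible.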
Portfolio item fields:
| Field | Example |
|---|---|
| Name | Atlas Treasury Monitor |
| Kind | Contract |
| Role | Production monitor |
| Chain | Sepolia |
| Address | 0x... |
| ENS | treasury.atlas.demo.upgradesiren.eth |
| Verification | Sourcify exact match |
| Risk | SAFE / REVIEW / SIREN |
| Evidence | links |
Portfolio item kinds:
- contract,
- repository,
- endpoint,
- dataset,
- report,
- claim,
- integration.
The Improve This Score recommendations must be concrete.
Examples:
| Problem | Recommendation |
|---|---|
| GitHub discounted | Cross-sign GitHub ownership from ENS owner |
| Unverified contract | Verify source on Sourcify |
| Missing tests | Add test directory and CI |
| Missing manifest | Publish agent-bench:bench_manifest |
| Self-asserted claim | Add public evidence URL |
| Low relevance | Add context-specific portfolio items |
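One way to keep recommendations concrete is a fixed mapping from detected problems to next actions. In the sketch below the recommendation text comes from the examples table, while the `Problem` keys and the `recommend` function are hypothetical.

```typescript
// Hypothetical machine keys for the problems listed in the examples table.
type Problem =
  | "github-discounted"
  | "unverified-contract"
  | "missing-tests"
  | "missing-manifest"
  | "self-asserted-claim"
  | "low-relevance";

const RECOMMENDATIONS: Record<Problem, string> = {
  "github-discounted": "Cross-sign GitHub ownership from ENS owner",
  "unverified-contract": "Verify source on Sourcify",
  "missing-tests": "Add test directory and CI",
  "missing-manifest": "Publish agent-bench:bench_manifest",
  "self-asserted-claim": "Add public evidence URL",
  "low-relevance": "Add context-specific portfolio items",
};

// Maps detected problems to next actions, dropping duplicates but keeping order.
function recommend(problems: Problem[]): string[] {
  const unique = problems.filter((p, i) => problems.indexOf(p) === i);
  return unique.map((p) => RECOMMENDATIONS[p]);
}
```

Typing the mapping as `Record<Problem, string>` makes the compiler flag any problem that has no recommendation attached.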
Raw evidence actions at the bottom of the report:
- view JSON,
- copy JSON,
- download report,
- copy share link,
- copy reviewer summary.
The raw JSON must include:
- source payload summaries,
- score breakdown,
- error states,
- freshness,
- mode,
- mock flag.
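The required contents of the raw JSON can be sketched as a type plus a completeness check to run before export. The property names below (`sourcePayloadSummaries`, `scoreBreakdown`, `errorStates`, `freshness`, `mode`, `mock`) are assumptions; this page lists what the JSON must include but not the exact keys.

```typescript
// Hypothetical shape: one property per required item in the list above.
interface RawReport {
  sourcePayloadSummaries: Record<string, unknown>;
  scoreBreakdown: unknown[];
  errorStates: unknown[];
  freshness: string;
  mode: "manifest" | "public-read" | "mock";
  mock: boolean;
}

const REQUIRED_REPORT_KEYS = [
  "sourcePayloadSummaries",
  "scoreBreakdown",
  "errorStates",
  "freshness",
  "mode",
  "mock",
] as const;

// Returns the required keys that are absent from a parsed report object.
function missingReportKeys(report: Record<string, unknown>): string[] {
  return REQUIRED_REPORT_KEYS.filter((key) => !(key in report));
}

const partialReport = { scoreBreakdown: [], mode: "mock", mock: true };
console.log(missingReportKeys(partialReport));
```

The check only tests for key presence, so an empty `errorStates` array still counts as included, which keeps "no errors" distinguishable from "errors not reported".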
Mobile layout:
- hero first,
- source coverage strip wraps,
- source cards stack,
- comparison cards stack,
- breakdown table becomes horizontally scrollable,
- drawers become full-screen sheets.
The headline score and confidence must remain visible without horizontal scrolling.
Accessibility requirements:
- no status conveyed by color only,
- buttons have labels,
- score changes are announced with text,
- table headers are real table headers,
- drawer focus is trapped,
- export/copy actions have confirmation text.
The report page is done when a reviewer can:
- understand the score in 15 seconds,
- audit one score row in 30 seconds,
- open every source,
- see which claims are discounted,
- export a summary,
- explain why Atlas beats Mirage.