Final Report Spec

Daniel Babjak edited this page May 9, 2026 · 1 revision

This page describes the final /b/[agent] report as a product surface.

The report must feel like a professional due-diligence memo, not a dashboard full of unrelated metrics.

Page Structure

┌─────────────────────────────────────────────────────────────┐
│ Header: Upgrade Siren Bench                                 │
│ Search / Compare / Export                                   │
├─────────────────────────────────────────────────────────────┤
│ Subject Hero                                                 │
│ ENS name, display name, subject kind, confidence warning     │
│ Headline score, seniority, relevance, confidence             │
├─────────────────────────────────────────────────────────────┤
│ Source Coverage Strip                                        │
│ Sourcify | GitHub | On-chain | ENS | Portfolio Claims        │
├─────────────────────────────────────────────────────────────┤
│ Executive Summary                                            │
│ 3 strengths, 3 risks, reviewer outcome                       │
├─────────────────────────────────────────────────────────────┤
│ Source Grid                                                  │
│ 5 cards, each opens a drawer                                 │
├─────────────────────────────────────────────────────────────┤
│ Score Breakdown Table                                        │
│ Attribute-by-attribute math                                  │
├─────────────────────────────────────────────────────────────┤
│ Portfolio                                                    │
│ Contracts, repos, claims, endpoints, reports                 │
├─────────────────────────────────────────────────────────────┤
│ Improve This Score                                           │
│ Concrete next actions                                        │
├─────────────────────────────────────────────────────────────┤
│ Raw Evidence / JSON                                          │
│ Copy/export report                                           │
└─────────────────────────────────────────────────────────────┘

Header

Required controls:

  • product mark,
  • search input,
  • compare button,
  • export button,
  • timestamp/freshness indicator.

Header copy:

Upgrade Siren Bench
Proof, not promises, for agentic ventures.

Subject Hero

Required fields:

| Field | Example |
| --- | --- |
| ENS name | atlas.demo.upgradesiren.eth |
| Display name | Atlas |
| Subject kind | AI Agent |
| Primary context | Agentic venture due diligence |
| Headline score | 82 / 100 |
| Tier | A |
| Seniority | 86 |
| Relevance | 78 |
| Confidence | High |
| Last updated | 2 min ago |
| Mode | manifest, public-read, or mock |

Hero states:

| State | Visual treatment |
| --- | --- |
| High confidence | Strong normal score display |
| Medium confidence | Score visible with amber confidence note |
| Low confidence | Score visible but capped; warning visible |
| Public-read | Badge: public-read, explanation line |
| Mock | Persistent mock: true banner |
| Not enough evidence | Tier U, no fake headline score |
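
The hero states above can be sketched as a small view function. This is a minimal sketch, not the page's actual implementation: the type names, the `heroView` function, the note strings, and the use of `null` for "not enough evidence" are all assumptions made for illustration. It also assumes mode badges and confidence notes can appear together, which the spec does not state.

```typescript
// Hypothetical types; the spec names the states but does not define a data model.
type Confidence = "high" | "medium" | "low";
type Mode = "manifest" | "public-read" | "mock";

interface HeroInput {
  confidence: Confidence;
  mode: Mode;
  // Assumption of this sketch: null stands for "not enough evidence".
  headlineScore: number | null;
}

interface HeroView {
  scoreText: string;
  notes: string[];
}

function heroView(input: HeroInput): HeroView {
  const notes: string[] = [];

  // Mode treatments are added first so they show regardless of confidence.
  if (input.mode === "mock") notes.push("mock: true");
  if (input.mode === "public-read") notes.push("public-read");

  // Not enough evidence: show Tier U and never a made-up number.
  if (input.headlineScore === null) {
    return { scoreText: "Tier U", notes };
  }

  if (input.confidence === "medium") notes.push("amber confidence note");
  if (input.confidence === "low") notes.push("score capped, warning shown");

  return { scoreText: `${input.headlineScore} / 100`, notes };
}
```

With the Atlas example (score 82, high confidence, manifest mode) this yields the plain score text and no notes; a subject without enough evidence gets "Tier U" instead of a number.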

Hero copy examples:

High confidence:

Atlas has strong verified evidence across code, identity, and on-chain activity.

Low confidence:

Mirage makes relevant claims, but most evidence is unverified or self-asserted.

Source Coverage Strip

One compact row:

Sourcify: verified
GitHub: discounted
On-chain: live
ENS: manifest
Portfolio: mixed

Each item shows:

  • source status,
  • freshness,
  • contribution,
  • trust level.

Executive Summary

This section gives non-technical reviewers an answer in 15 seconds.

Required blocks:

Strengths

Three bullets max.

Example:

  • 5 of 6 portfolio contracts are exact-match verified on Sourcify.
  • GitHub repos have tests, CI, and recent releases.
  • On-chain activity is continuous over 14 months.

Risks

Three bullets max.

Example:

  • GitHub ownership is not cross-signed, so GitHub contribution is discounted.
  • One portfolio claim has no public evidence.
  • One contract has missing storage-layout metadata.

Reviewer Outcome

One of:

| Outcome | Meaning |
| --- | --- |
| fast-track | Strong enough for deeper review |
| emerging-review | Promising, but not yet senior |
| evidence-required | Claims need proof before review |
| manual-security-review | Contract risk requires manual inspection |
| reject-or-redirect | Low relevance or insufficient evidence |
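
Because the outcome is "one of" a fixed set, it could be modeled as a closed union so that an unknown value never reaches the page. The sketch below takes the five keys and their meanings from the table above; the names `ReviewerOutcome`, `OUTCOME_MEANING`, and `isReviewerOutcome` are hypothetical.

```typescript
// Outcome keys come from the spec; the type and constant names are illustrative.
type ReviewerOutcome =
  | "fast-track"
  | "emerging-review"
  | "evidence-required"
  | "manual-security-review"
  | "reject-or-redirect";

const OUTCOME_MEANING: Record<ReviewerOutcome, string> = {
  "fast-track": "Strong enough for deeper review",
  "emerging-review": "Promising, but not yet senior",
  "evidence-required": "Claims need proof before review",
  "manual-security-review": "Contract risk requires manual inspection",
  "reject-or-redirect": "Low relevance or insufficient evidence",
};

// Type guard for values that arrive as plain strings, for example from JSON.
function isReviewerOutcome(value: string): value is ReviewerOutcome {
  return Object.prototype.hasOwnProperty.call(OUTCOME_MEANING, value);
}
```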

Source Grid

Sourcify Card

Visible fields:

  • verified contracts count,
  • unverified contracts count,
  • exact match ratio,
  • storage hygiene,
  • risky upgrade count,
  • contribution to seniority,
  • contribution to confidence.

Open drawer:

  • contract table,
  • verification status,
  • implementation history,
  • storage-layout timeline,
  • ABI risk list,
  • links to source.

GitHub Card

Visible fields:

  • repository count,
  • recent activity,
  • CI status,
  • tests detected,
  • releases,
  • trust multiplier.

Open drawer:

  • repo table,
  • last push,
  • workflow status,
  • test/license/security badges,
  • issue hygiene,
  • trust-discount explanation.

On-Chain Card

Visible fields:

  • first seen,
  • total activity,
  • recent activity,
  • deployments,
  • portfolio address match.

Open drawer:

  • address list,
  • chain list,
  • deployment timeline,
  • explorer links,
  • recent activity sparkline or table.

ENS Card

Visible fields:

  • manifest present,
  • owner/operator,
  • record completeness,
  • subnames,
  • endpoint records.

Open drawer:

  • raw manifest,
  • ENS text records,
  • owner/operator consistency,
  • subname list,
  • standards records.

Portfolio Claims Card

Visible fields:

  • total claims,
  • verified claims,
  • discounted claims,
  • missing evidence claims.

Open drawer:

  • claim table,
  • evidence URL,
  • trust state,
  • contribution to relevance/confidence.

Score Breakdown Table

The table must be sortable by contribution.

Columns:

| Column | Required? | Notes |
| --- | --- | --- |
| Attribute | Yes | Human-readable and machine key |
| Axis | Yes | Seniority, relevance, confidence |
| Source | Yes | Sourcify, GitHub, on-chain, ENS, portfolio |
| Raw value | Yes | Never hide raw input |
| Normalized | Yes | 0..1 |
| Weight | Yes | Points or percentage |
| Trust | Yes | Multiplier |
| Contribution | Yes | Final points |
| Evidence | Yes | Link/drawer |
| Freshness | Yes | Data age |

Example row:

| Attribute | Axis | Source | Raw | Normalized | Weight | Trust | Contribution |
| --- | --- | --- | --- | --- | --- | --- | --- |
| exactMatchRatio | seniority | Sourcify | 5/6 | 0.83 | 10 | 1.0 | 8.3 |
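
The example row is consistent with contribution = normalized × weight × trust (0.83 × 10 × 1.0 = 8.3). The spec does not state this formula explicitly, so the sketch below treats it as an assumption. The `ScoreRow` shape, the function names, rounding to one decimal, and the second row's values (including its 0.5 trust multiplier) are also illustrative only.

```typescript
// Hypothetical row shape covering only the columns needed for the math.
interface ScoreRow {
  attribute: string;
  axis: "seniority" | "relevance" | "confidence";
  source: string;
  normalized: number; // 0..1
  weight: number; // points
  trust: number; // multiplier
}

// Assumed formula: normalized * weight * trust, rounded to one decimal place.
function contribution(row: ScoreRow): number {
  const points = row.normalized * row.weight * row.trust;
  return Math.round(points * 10) / 10;
}

// Returns a new array, largest contribution first; the input is not mutated.
function sortByContribution(rows: ScoreRow[]): ScoreRow[] {
  return rows.slice().sort((a, b) => contribution(b) - contribution(a));
}

const exactMatch: ScoreRow = {
  attribute: "exactMatchRatio",
  axis: "seniority",
  source: "Sourcify",
  normalized: 5 / 6,
  weight: 10,
  trust: 1.0,
};

// Made-up row to show the effect of a discounted trust multiplier.
const discountedRepo: ScoreRow = {
  attribute: "ciStatus",
  axis: "seniority",
  source: "GitHub",
  normalized: 0.8,
  weight: 5,
  trust: 0.5,
};

const sorted = sortByContribution([discountedRepo, exactMatch]);
```

Here `contribution(exactMatch)` evaluates to 8.3, matching the example row, and `sorted` lists `exactMatchRatio` first.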

Portfolio Section

Portfolio item fields:

| Field | Example |
| --- | --- |
| Name | Atlas Treasury Monitor |
| Kind | Contract |
| Role | Production monitor |
| Chain | Sepolia |
| Address | 0x... |
| ENS | treasury.atlas.demo.upgradesiren.eth |
| Verification | Sourcify exact match |
| Risk | SAFE / REVIEW / SIREN |
| Evidence | links |

Portfolio item kinds:

  • contract,
  • repository,
  • endpoint,
  • dataset,
  • report,
  • claim,
  • integration.

Improve This Score

Each recommendation must be a concrete next action, not general advice.

Examples:

| Problem | Recommendation |
| --- | --- |
| GitHub discounted | Cross-sign GitHub ownership from ENS owner |
| Unverified contract | Verify source on Sourcify |
| Missing tests | Add test directory and CI |
| Missing manifest | Publish agent-bench:bench_manifest |
| Self-asserted claim | Add public evidence URL |
| Low relevance | Add context-specific portfolio items |
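
One way to keep recommendations concrete is a fixed lookup from detected problem to action text. In the sketch below the recommendation strings come from the table above; the problem keys, the `Problem` type, and the `nextActions` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical machine keys for the problems listed in the spec.
type Problem =
  | "github-discounted"
  | "unverified-contract"
  | "missing-tests"
  | "missing-manifest"
  | "self-asserted-claim"
  | "low-relevance";

const RECOMMENDATION: Record<Problem, string> = {
  "github-discounted": "Cross-sign GitHub ownership from ENS owner",
  "unverified-contract": "Verify source on Sourcify",
  "missing-tests": "Add test directory and CI",
  "missing-manifest": "Publish agent-bench:bench_manifest",
  "self-asserted-claim": "Add public evidence URL",
  "low-relevance": "Add context-specific portfolio items",
};

// Maps detected problems to actions, dropping duplicates and keeping order.
function nextActions(problems: Problem[]): string[] {
  return problems
    .filter((problem, index) => problems.indexOf(problem) === index)
    .map((problem) => RECOMMENDATION[problem]);
}
```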

Raw Evidence

At the bottom:

  • view JSON,
  • copy JSON,
  • download report,
  • copy share link,
  • copy reviewer summary.

The raw JSON must include:

  • source payload summaries,
  • score breakdown,
  • error states,
  • freshness,
  • mode,
  • mock flag.
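
The required contents could be enforced with a simple completeness check before export. The spec lists what the JSON must include but not its key names, so every key in this sketch (`sources`, `scoreBreakdown`, `errors`, `freshness`, `mode`, `mock`), the `RawReport` shape, and the example values are assumptions.

```typescript
// Hypothetical export shape; key names are not defined by the spec.
interface RawReport {
  sources: { [source: string]: { status: string } };
  scoreBreakdown: Array<{ attribute: string; contribution: number }>;
  errors: Array<{ source: string; message: string }>;
  freshness: { generatedAt: string };
  mode: "manifest" | "public-read" | "mock";
  mock: boolean;
}

const REQUIRED_KEYS: string[] = [
  "sources",
  "scoreBreakdown",
  "errors",
  "freshness",
  "mode",
  "mock",
];

// Returns the required top-level keys that are absent from the given object.
function missingRequiredKeys(report: object): string[] {
  return REQUIRED_KEYS.filter((key) => !(key in report));
}

// Illustrative values only.
const exampleReport: RawReport = {
  sources: { sourcify: { status: "verified" }, github: { status: "discounted" } },
  scoreBreakdown: [{ attribute: "exactMatchRatio", contribution: 8.3 }],
  errors: [],
  freshness: { generatedAt: "2026-05-09T12:00:00Z" },
  mode: "manifest",
  mock: false,
};

const exportedJson: string = JSON.stringify(exampleReport, null, 2);
```

Keeping `errors` and `mock` present even when empty or false means a reader of the exported file can tell "no errors" apart from "errors not reported".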

Mobile Behavior

Mobile layout:

  • hero first,
  • source coverage strip wraps,
  • source cards stack,
  • comparison cards stack,
  • breakdown table becomes horizontally scrollable,
  • drawers become full-screen sheets.

The headline score and confidence must remain visible without horizontal scrolling.

Accessibility

Required:

  • no status conveyed by color only,
  • buttons have labels,
  • score changes are announced with text,
  • table headers are real table headers,
  • drawer focus is trapped,
  • export/copy actions have confirmation text.
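
For "score changes are announced with text", the announcement string could be built by a small pure function and then written into a live region (for example an element with `aria-live="polite"`). The function name and the wording below are assumptions; the DOM wiring is left out of this sketch.

```typescript
// Builds the text a screen reader would hear when the headline score changes.
function scoreChangeAnnouncement(previous: number, next: number): string {
  if (next === previous) {
    return `Headline score unchanged at ${next} out of 100.`;
  }
  const direction = next > previous ? "increased" : "decreased";
  return `Headline score ${direction} from ${previous} to ${next} out of 100.`;
}
```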

Done Definition

The report page is done when a reviewer can:

  1. understand the score in 15 seconds,
  2. audit one score row in 30 seconds,
  3. open every source,
  4. see which claims are discounted,
  5. export a summary,
  6. explain why Atlas beats Mirage.
