Home

Upgrade Siren Wiki

🏆 Won 1st place in the Umia track and the ENS "Best ENS Integration for AI Agents" prize at ETHPrague 2026 (2026-05-10). See Win Postmortem for what won, what made the difference, and the stack.

This wiki is the internal final-product specification for the ETHPrague build.

It describes what the product must look like at the end of the build, what the team should show in the demo, and how every visible score, attribute, and UX state should work. It is not a deployment changelog. Current implementation status lives in the repository README and docs/13-backlog.md.

Product Definition

Upgrade Siren is a due-diligence engine for ENS-named agents and projects.

The first final-demo surface is Agent Bench:

/b/[agent]

Agent Bench takes an ENS-named agent, reads its public evidence, evaluates its portfolio, and produces:

Seniority score — how much proven history, engineering maturity, and operational depth the agent has.
Relevance score — how relevant the agent is to the selected evaluation context.
Confidence score — how much of the claim is backed by verifiable public evidence.
Full evidence report — every score component, every source, every weight, every trust discount, every raw value.
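To make the four outputs concrete, here is a minimal TypeScript sketch of how a report could carry them. All type and field names are hypothetical, and the 0 to 100 scale is an assumption. The one rule it encodes from this page is that every axis value must be reproducible from its listed components.

```typescript
// Hypothetical shape of an Agent Bench report. Field names are illustrative,
// not taken from the Upgrade Siren codebase.
type SourceKind = "sourcify" | "github" | "onchain" | "ens" | "portfolio-claim";

interface EvidenceSource {
  kind: SourceKind;
  url: string;       // link shown in the report
  fetchedAt: string; // ISO-8601 timestamp, used to show freshness
}

interface ScoreComponent {
  attribute: string;       // e.g. "verified contracts"
  rawValue: number;        // value as read from the source
  normalized: number;      // rawValue mapped into [0, 1]
  weight: number;          // share of the axis (assumed to sum to 1 per axis)
  trustMultiplier: number; // 1 = verified, < 1 = discounted
  contribution: number;    // points this component adds to the axis
  sources: EvidenceSource[];
}

interface AxisScore {
  value: number; // assumed 0..100 scale
  components: ScoreComponent[];
}

interface AgentReport {
  ensName: string;
  seniority: AxisScore;
  relevance: AxisScore;
  confidence: AxisScore;
}

// "No black-box scoring": the axis value must equal the sum of its components.
function axisIsExplained(axis: AxisScore, tolerance = 1e-6): boolean {
  const total = axis.components.reduce((sum, c) => sum + c.contribution, 0);
  return Math.abs(total - axis.value) <= tolerance;
}
```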

The supporting surface is Contract Risk:

/r/[name]

Contract Risk evaluates one upgradeable contract or proxy and returns SAFE, REVIEW, or SIREN. It is used directly in the demo and also becomes the Sourcify drilldown inside Agent Bench.
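Here is a minimal sketch of how the three verdicts could be combined. It assumes Contract Risk runs several independent checks and reports the most severe result. The check identifiers and the `Finding` shape are hypothetical; this page defines only the three verdict labels.

```typescript
// Sketch only: assumes each check yields a finding with its own verdict,
// and the contract's verdict is the worst one. Check ids are hypothetical.
type Verdict = "SAFE" | "REVIEW" | "SIREN";

interface Finding {
  check: string;    // hypothetical check id, e.g. "implementation-unverified"
  verdict: Verdict; // the verdict this finding alone would justify
}

const SEVERITY: Record<Verdict, number> = { SAFE: 0, REVIEW: 1, SIREN: 2 };

// Returns the most severe verdict among findings; no findings means SAFE.
function contractVerdict(findings: Finding[]): Verdict {
  let worst: Verdict = "SAFE";
  for (const f of findings) {
    if (SEVERITY[f.verdict] > SEVERITY[worst]) worst = f.verdict;
  }
  return worst;
}
```

For example, one SAFE finding plus one REVIEW finding yields REVIEW. A single SIREN finding overrides any number of SAFE ones.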

The later commercial surface is Project Bench for Umia-style venture screening:

/b/[project]

Project Bench reuses the same scoring engine, but evaluates projects, teams, and ventures instead of individual agents. It is built after the agent experience is final.
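One way to express "same engine, different subject" is a discriminated union that changes only which ENS names are gathered as evidence. This is a hypothetical sketch: `teamEnsNames`, `ScoringEngine`, and `bench` are illustrative names, not the project's API.

```typescript
// Hypothetical subject model: agents and projects share one engine.
type Subject =
  | { kind: "agent"; ensName: string }
  | { kind: "project"; ensName: string; teamEnsNames: string[] };

// ENS names whose public evidence is fed to the engine: an agent contributes
// only its own name, a project also contributes its team members.
function evidenceNames(subject: Subject): string[] {
  return subject.kind === "agent"
    ? [subject.ensName]
    : [subject.ensName, ...subject.teamEnsNames];
}

// Placeholder signature for the shared scoring engine.
type ScoringEngine = (
  ensNames: string[],
  context: string,
) => { seniority: number; relevance: number; confidence: number };

// Both /b/[agent] and /b/[project] would call the engine the same way.
function bench(subject: Subject, context: string, engine: ScoringEngine) {
  return engine(evidenceNames(subject), context);
}
```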

Final Demo Shape

The final demo must show three agents side by side.

Demo agent	Purpose	Expected story
Atlas	Strong verified agent	High seniority, high relevance, high confidence
Nova	Emerging agent	Medium seniority, high relevance, medium confidence
Mirage	Claim-heavy agent	Low confidence because claims are weakly verified
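The expected stories can double as a fixture check before the demo. This sketch encodes the table as data and flags any axis whose band does not match. The 70 and 40 band thresholds are placeholders chosen for illustration, not values from the scoring spec.

```typescript
// Placeholder bands; the real thresholds are not specified on this page.
type Band = "low" | "medium" | "high";

function band(score: number): Band {
  if (score >= 70) return "high";
  if (score >= 40) return "medium";
  return "low";
}

interface Expected { seniority?: Band; relevance?: Band; confidence?: Band }

// Expected stories from the demo table; a missing axis is not constrained.
const DEMO_STORIES: Record<string, Expected> = {
  Atlas: { seniority: "high", relevance: "high", confidence: "high" },
  Nova: { seniority: "medium", relevance: "high", confidence: "medium" },
  Mirage: { confidence: "low" },
};

// Lists every axis where the agent's scores contradict its expected story.
function storyMismatches(
  agent: string,
  scores: { seniority: number; relevance: number; confidence: number },
): string[] {
  const expected = DEMO_STORIES[agent] ?? {};
  const out: string[] = [];
  for (const axis of ["seniority", "relevance", "confidence"] as const) {
    const want = expected[axis];
    const got = band(scores[axis]);
    if (want !== undefined && got !== want) {
      out.push(`${agent}.${axis}: expected ${want}, got ${got}`);
    }
  }
  return out;
}
```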

The demo flow:

Open the comparison view with three agent cards.
Show that the headline scores differ for a reason, not by vibes.
Open one agent report at /b/[agent].
Show the score banner: seniority, relevance, confidence, tier.
Open the source grid: Sourcify, GitHub, on-chain, ENS, portfolio claims.
Drill into a score component and show raw evidence, normalization, weight, trust multiplier, and final contribution.
Open a contract inside the portfolio and jump into /r/[name].
Return to the report and show "How to improve this score".
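The drilldown step lists raw evidence, normalization, weight, trust multiplier, and final contribution, and these can be read as a single product. Here is a minimal sketch. It assumes a 0 to 100 axis and a log-shaped normalization that saturates at a per-attribute cap. Both are assumptions; normalization itself belongs to the Scoring Formula page.

```typescript
// One component's drilldown, as shown in the report drawer.
interface Drilldown {
  raw: number;
  normalized: number;
  weight: number;
  trustMultiplier: number;
  contribution: number;
}

// Assumed saturating normalization: 0 at raw <= 0, reaches 1 at `cap`,
// never exceeds 1. Diminishing returns for large raw values.
function normalize(raw: number, cap: number): number {
  if (raw <= 0) return 0;
  return Math.min(1, Math.log(1 + raw) / Math.log(1 + cap));
}

// contribution = axisMax * weight * trustMultiplier * normalized
function drilldown(
  raw: number,
  cap: number,
  weight: number,
  trustMultiplier: number,
  axisMax = 100,
): Drilldown {
  const normalized = normalize(raw, cap);
  const contribution = axisMax * weight * trustMultiplier * normalized;
  return { raw, normalized, weight, trustMultiplier, contribution };
}
```

With a cap of 12, a raw value of 12 at weight 0.25 contributes 25 points. The same evidence with a trust multiplier of 0.5 contributes 12.5, and the drawer can show each intermediate value.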

The intended wow moment:

Three agents make similar claims.
Only one has verifiable evidence.
The score explains the difference line by line.

Primary Pages

Page	Purpose
Project Overview	What the product is, what the demo shows, and what the final system contains
Product Architecture	Routes, data flow, scoring model, report schema, source engines, UX states
Final Report Spec	Exact `/b/[agent]` page layout, components, drawers, tables, and states
Scoring Formula	Score axes, weights, attributes, normalization, trust multipliers
Business Architecture	Umia use case, buyer logic, packaging, business model
Sponsor Strategy	How the final product maps to Umia, Sourcify, ENS, and Future Society
Demo Script	Exact end-state demo flow for the three-agent comparison and report page
Risk Register	Product, scoring, demo, and implementation risks

Final Product Principles

Proof over promises. Self-reported claims are allowed, but discounted unless verified.
No black-box scoring. Every point in the score must be explainable from visible evidence.
ENS is the identity spine. Agents and projects are named, found, and bound through ENS records.
Sourcify is the verified-code spine. Contract evidence comes from Sourcify, not screenshots or explorer vibes.
GitHub claims are not trusted by default. They receive a trust discount until cross-signed or otherwise verified.
On-chain history matters. Age, deployments, transaction history, and live contract usage affect seniority.
Relevance is contextual. A strong agent in one context can be weak in another.
Confidence is separate from quality. A high-quality claim with no evidence is still low confidence.
The report page is the product. The score is only the headline; the detailed report is what sells the system.
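Several of these principles interact: unverified claims are discounted, GitHub claims start untrusted, and confidence measures evidence backing rather than quality. Here is a minimal sketch of confidence as the trust-weighted share of claimed weight. The two verification states and the 0.25 discount are assumptions for illustration, not the specified multipliers.

```typescript
// Hypothetical verification states. "verified" covers Sourcify, on-chain,
// ENS, and cross-signed GitHub evidence. The multipliers are placeholders.
type Verification = "verified" | "self-reported";

const TRUST: Record<Verification, number> = {
  verified: 1.0,
  "self-reported": 0.25,
};

interface Claim {
  id: string;
  weight: number; // how much the claim matters to the report
  verification: Verification;
}

// Confidence ignores how impressive a claim is and asks only how much of the
// claimed weight is backed by evidence, on a 0..100 scale.
function confidence(claims: Claim[]): number {
  const total = claims.reduce((s, c) => s + c.weight, 0);
  if (total === 0) return 0;
  const backed = claims.reduce((s, c) => s + c.weight * TRUST[c.verification], 0);
  return (100 * backed) / total;
}
```

Two equally weighted claims, one verified and one self-reported, give 62.5. Under these assumed multipliers, a report made only of self-reported claims cannot rise above 25, however strong the claims sound.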

Output Contract

Every final report must answer these questions:

Who or what is being evaluated?
What did the subject claim?
Which claims are verified?
Which claims are unverified?
Which public sources were used?
How fresh are the sources?
What is the seniority score?
What is the relevance score?
What is the confidence score?
Which exact attributes contributed to each score?
Which evidence links support each attribute?
Which parts are discounted?
What would improve the score?

If the page cannot answer those questions, the page is not finished.
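The contract above can be enforced mechanically before a report ships. Here is a hypothetical sketch. The `ReportDraft` fields are illustrative, and each failed check returns the text of the question the draft cannot yet answer. The verified/unverified and discount questions are answered directly by the `verified` and `discounted` flags, so they need no separate check here.

```typescript
// Illustrative draft shape; not the real report schema.
interface ReportDraft {
  subject?: string;
  claims: { text: string; verified: boolean }[];
  sources: { url: string; fetchedAt?: string }[];
  scores: { seniority?: number; relevance?: number; confidence?: number };
  attributes: { axis: string; name: string; evidenceUrls: string[]; discounted: boolean }[];
  improvements: string[];
}

// Returns the output-contract questions this draft cannot answer yet.
// An empty result means the page is finished by this definition.
function unansweredQuestions(r: ReportDraft): string[] {
  const missing: string[] = [];
  if (!r.subject) missing.push("Who or what is being evaluated?");
  if (r.claims.length === 0) missing.push("What did the subject claim?");
  if (r.sources.length === 0) missing.push("Which public sources were used?");
  if (r.sources.some((s) => !s.fetchedAt)) missing.push("How fresh are the sources?");
  for (const axis of ["seniority", "relevance", "confidence"] as const) {
    if (r.scores[axis] === undefined) missing.push(`What is the ${axis} score?`);
    if (!r.attributes.some((a) => a.axis === axis)) {
      missing.push(`Which exact attributes contributed to the ${axis} score?`);
    }
  }
  if (r.attributes.some((a) => a.evidenceUrls.length === 0)) {
    missing.push("Which evidence links support each attribute?");
  }
  if (r.improvements.length === 0) missing.push("What would improve the score?");
  return missing;
}
```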
