Home
🏆 Won 1st place — Umia track + ENS Best ENS Integration for AI Agents at ETHPrague 2026 (2026-05-10). See Win Postmortem for what won, what made the difference, and the stack.
This wiki is the internal final-product specification for the ETHPrague build.
It describes what the product must look like at the end of the build, what the team should show in the demo, and how every visible score, attribute, and UX state should work. It is not a deployment changelog. Current implementation status lives in the repository README and docs/13-backlog.md.
Upgrade Siren is a due-diligence engine for ENS-named agents and projects.
The first final-demo surface is Agent Bench:
/b/[agent]
Agent Bench takes an ENS-named agent, reads its public evidence, evaluates its portfolio, and produces:
- Seniority score — how much proven history, engineering maturity, and operational depth the agent has.
- Relevance score — how relevant the agent is to the selected evaluation context.
- Confidence score — how much of the claim is backed by verifiable public evidence.
- Full evidence report — every score component, every source, every weight, every trust discount, every raw value.
The supporting surface is Contract Risk:
/r/[name]
Contract Risk evaluates one upgradeable contract or proxy and returns SAFE, REVIEW, or SIREN. It is used directly in the demo and also becomes the Sourcify drilldown inside Agent Bench.
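As a rough illustration of the three outcomes and the route, the result of a Contract Risk check could be modelled as below. This is only a sketch: the `ContractRiskResult` type, its fields, the `SEVERITY` ordering, the `worst_first` helper, and the example names are all hypothetical and are not defined anywhere in this spec. Only the verdict names and the /r/[name] route come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The three outcomes named in the spec."""
    SAFE = "SAFE"
    REVIEW = "REVIEW"
    SIREN = "SIREN"


@dataclass(frozen=True)
class ContractRiskResult:
    """Hypothetical result shape for one evaluated contract or proxy."""
    name: str
    verdict: Verdict

    @property
    def path(self) -> str:
        # Mirrors the /r/[name] route from the spec.
        return f"/r/{self.name}"


# Assumed severity order, used here only to sort a drilldown list.
SEVERITY = {Verdict.SAFE: 0, Verdict.REVIEW: 1, Verdict.SIREN: 2}


def worst_first(results):
    """Return results with the riskiest verdicts first."""
    return sorted(results, key=lambda r: SEVERITY[r.verdict], reverse=True)
```

Sorting by severity is one possible way to make the riskiest contract the first thing a reviewer sees in the Agent Bench drilldown; the actual ordering is a design decision for the report page.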
The later commercial surface is Project Bench for Umia-style venture screening:
/b/[project]
Project Bench reuses the same scoring engine, but evaluates projects, teams, and ventures instead of individual agents. It is built after the agent experience is final.
The final demo must show three agents side by side.
| Demo agent | Purpose | Expected story |
|---|---|---|
| Atlas | Strong verified agent | High seniority, high relevance, high confidence |
| Nova | Emerging agent | Medium seniority, high relevance, medium confidence |
| Mirage | Claim-heavy agent | Low confidence because claims are weakly verified |
The demo flow:
- Open the comparison view with three agent cards.
- Show that the headline scores differ for a reason, not by vibes.
- Open one agent report at /b/[agent].
- Show the score banner: seniority, relevance, confidence, tier.
- Open the source grid: Sourcify, GitHub, on-chain, ENS, portfolio claims.
- Drill into a score component and show raw evidence, normalization, weight, trust multiplier, and final contribution.
- Open a contract inside the portfolio and jump into /r/[name].
- Return to the report and show "How to improve this score".
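The drill-down step in the flow above names five things per score component: raw evidence, normalization, weight, trust multiplier, and final contribution. One way these could fit together is sketched below. It assumes a simple multiplicative model with cap-based normalization; the actual formula belongs to the Scoring Formula page, and the `ScoreComponent` type, its field names, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreComponent:
    """Hypothetical shape of one line in the evidence report."""
    attribute: str
    raw_value: float         # raw evidence, e.g. a count of verified contracts
    cap: float               # assumed normalization cap: raw_value / cap, clamped to [0, 1]
    weight: float            # maximum points this attribute can contribute
    trust_multiplier: float  # 1.0 for fully verified, lower for discounted claims

    @property
    def normalized(self) -> float:
        if self.cap <= 0:
            return 0.0
        return max(0.0, min(self.raw_value / self.cap, 1.0))

    @property
    def contribution(self) -> float:
        # Assumed model: normalized value x weight x trust multiplier.
        return self.normalized * self.weight * self.trust_multiplier


def explain(component: ScoreComponent) -> dict:
    """Return every intermediate value so the drawer can show its working."""
    return {
        "attribute": component.attribute,
        "raw_value": component.raw_value,
        "normalized": component.normalized,
        "weight": component.weight,
        "trust_multiplier": component.trust_multiplier,
        "contribution": component.contribution,
    }
```

Under these assumptions, a verified attribute with raw value 5, cap 10, and weight 20 contributes 10 points, while the same attribute with a trust multiplier of 0.5 contributes 5. Exposing each intermediate value is what lets the report explain a score line by line.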
The intended wow moment:
Three agents make similar claims.
Only one has verifiable evidence.
The score explains the difference line by line.
| Page | Purpose |
|---|---|
| Project Overview | What the product is, what the demo shows, and what the final system contains |
| Product Architecture | Routes, data flow, scoring model, report schema, source engines, UX states |
| Final Report Spec | Exact /b/[agent] page layout, components, drawers, tables, and states |
| Scoring Formula | Score axes, weights, attributes, normalization, trust multipliers |
| Business Architecture | Umia use case, buyer logic, packaging, business model |
| Sponsor Strategy | How the final product maps to Umia, Sourcify, ENS, and Future Society |
| Demo Script | Exact end-state demo flow for the three-agent comparison and report page |
| Risk Register | Product, scoring, demo, and implementation risks |
- Proof over promises. Self-reported claims are allowed, but discounted unless verified.
- No black-box scoring. Every point in the score must be explainable from visible evidence.
- ENS is the identity spine. Agents and projects are named, found, and bound through ENS records.
- Sourcify is the verified-code spine. Contract evidence comes from Sourcify, not screenshots or explorer vibes.
- GitHub claims are not trusted by default. They receive a trust discount until cross-signed or otherwise verified.
- On-chain history matters. Age, deployments, transaction history, and live contract usage affect seniority.
- Relevance is contextual. A strong agent in one context can be weak in another.
- Confidence is separate from quality. A high-quality claim with no evidence is still low confidence.
- The report page is the product. The score is only the headline; the detailed report is what sells the system.
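The principle that confidence is separate from quality can be illustrated with a small sketch. It assumes confidence is the weighted share of claims backed by verified evidence, scaled to 0-100; that definition, the `Claim` type, and the `confidence_score` function are illustrative assumptions, not the project's actual scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record; field names are assumptions."""
    source: str     # e.g. "sourcify", "github", "portfolio"
    weight: float   # how much this claim matters to the evaluation
    verified: bool  # True only if backed by verifiable public evidence


def confidence_score(claims) -> float:
    """Weighted share of verified claims, on a 0-100 scale (assumed definition)."""
    claims = list(claims)
    total = sum(c.weight for c in claims)
    if total <= 0:
        return 0.0
    verified = sum(c.weight for c in claims if c.verified)
    return 100.0 * verified / total
```

With this definition, two agents that make identical claims can receive very different confidence scores. An agent whose claims are all self-reported scores 0, however impressive those claims sound.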
Every final report must answer these questions:
- Who or what is being evaluated?
- What did the subject claim?
- Which claims are verified?
- Which claims are unverified?
- Which public sources were used?
- How fresh are the sources?
- What is the seniority score?
- What is the relevance score?
- What is the confidence score?
- Which exact attributes contributed to each score?
- Which evidence links support each attribute?
- Which parts are discounted?
- What would improve the score?
If the page cannot answer those questions, the page is not finished.
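The checklist above can be turned into an automated completeness check. The sketch below assumes a report is a plain dictionary and maps each question to a hypothetical key; none of these key names come from the real report schema, which is described in Product Architecture and Final Report Spec.

```python
# Hypothetical report keys mapped to the questions each one answers.
REQUIRED_ANSWERS = {
    "subject": "Who or what is being evaluated?",
    "claims": "What did the subject claim?",
    "verified_claims": "Which claims are verified?",
    "unverified_claims": "Which claims are unverified?",
    "sources": "Which public sources were used?",
    "source_freshness": "How fresh are the sources?",
    "seniority_score": "What is the seniority score?",
    "relevance_score": "What is the relevance score?",
    "confidence_score": "What is the confidence score?",
    "score_attributes": "Which exact attributes contributed to each score?",
    "evidence_links": "Which evidence links support each attribute?",
    "discounts": "Which parts are discounted?",
    "improvements": "What would improve the score?",
}


def unanswered_questions(report: dict) -> list:
    """List the questions the report cannot answer yet.

    A missing key or a None value counts as unanswered. An empty list
    still counts as an answer (for example, no unverified claims).
    """
    return [q for key, q in REQUIRED_ANSWERS.items() if report.get(key) is None]


def is_finished(report: dict) -> bool:
    """A page is finished only when every question has an answer."""
    return not unanswered_questions(report)
```

A check like this could run in tests against the three demo agents, so that a report with a missing section fails before it reaches the demo.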