Scoring Formula

This page defines the final scoring model for Agent Bench.

The formula must be deterministic, inspectable, and explainable.

Score Axes

Axis	Meaning
Seniority	Proven maturity, history, and technical depth
Relevance	Fit for the selected evaluation context
Confidence	Trustworthiness and completeness of the evidence

The headline score is derived from seniority and relevance. Confidence caps the displayed tier.

Headline Formula

headlineScore = seniority * 0.55 + relevance * 0.45
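
As a quick worked example, the formula can be sketched in Python. This assumes both axes are on a 0-100 scale, which matches the component weights totalling 100; the function and constant names are illustrative, not taken from the product code.

```python
# Axis weights from the headline formula above.
SENIORITY_WEIGHT = 0.55
RELEVANCE_WEIGHT = 0.45


def headline_score(seniority: float, relevance: float) -> float:
    """Weighted sum of the two axes (inputs assumed to be 0-100)."""
    return seniority * SENIORITY_WEIGHT + relevance * RELEVANCE_WEIGHT


# 80 * 0.55 + 60 * 0.45 = 44 + 27
print(round(headline_score(80, 60), 2))
```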

Tier caps:

Confidence	Max tier
High	S
Medium	A
Low	B
Public-read	A
Mock	no production tier
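
One way to read the cap is "display the worse of the score-derived tier and the confidence cap". The sketch below assumes tiers are ordered S, A, B (this page names no others) and that a score-derived tier is already available, since the score-to-tier thresholds are not defined here. `displayed_tier`, `TIER_ORDER`, and `MAX_TIER` are hypothetical names.

```python
from typing import Optional

# Best to worst. Only S, A and B appear on this page; any further
# tiers are an assumption and would be appended to this list.
TIER_ORDER = ["S", "A", "B"]

# Max tier per confidence label, copied from the tier-cap table.
# None stands for "no production tier".
MAX_TIER = {
    "high": "S",
    "medium": "A",
    "low": "B",
    "public-read": "A",
    "mock": None,
}


def displayed_tier(raw_tier: str, confidence: str) -> Optional[str]:
    """Return the tier to display after applying the confidence cap."""
    cap = MAX_TIER[confidence]
    if cap is None:
        return None
    # A larger index means a worse tier, so the worse of the two wins.
    worst = max(TIER_ORDER.index(raw_tier), TIER_ORDER.index(cap))
    return TIER_ORDER[worst]
```

With this reading, an S-level headline under medium confidence is displayed as A, while a B-level headline under high confidence stays B.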

Seniority Components

Component	Weight
Sourcify verified-code maturity	30
GitHub engineering maturity	25
On-chain operational history	20
ENS identity maturity	10
Portfolio evidence depth	15
Total	100

Sourcify Verified-Code Maturity

Weight: 30.

Attribute	Weight	Normalization
`verifiedContractsCount`	6	`min(count / 5, 1)`
`exactMatchRatio`	6	`exactMatches / totalContracts`
`metadataCompleteness`	4	avg(ABI, source, compiler, storageLayout presence)
`storageHygieneScore`	6	1 if no incompatible changes, 0.5 if unknown, 0 on collision
`proxyResolutionCoverage`	4	resolved proxies / proxy-like contracts
`riskySelectorPenalty`	4	`1 - min(riskySelectors / 3, 1)`
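
To make the table concrete, here is a minimal sketch of the Sourcify component before any trust multiplier is applied. The function signature is hypothetical, and treating a ratio with a zero denominator as 0 is an assumption this page does not settle.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Safe ratio in [0, 1]; a zero denominator yields 0 (assumption)."""
    if denominator <= 0:
        return 0.0
    return min(numerator / denominator, 1.0)


def sourcify_score(
    verified_contracts: int,
    exact_matches: int,
    total_contracts: int,
    metadata_completeness: float,  # already averaged to 0-1
    storage_hygiene: float,        # 1, 0.5 or 0 per the table
    resolved_proxies: int,
    proxy_like_contracts: int,
    risky_selectors: int,
) -> float:
    """Sum of weight * normalized value for each attribute (max 30)."""
    return (
        6 * min(verified_contracts / 5, 1)
        + 6 * ratio(exact_matches, total_contracts)
        + 4 * metadata_completeness
        + 6 * storage_hygiene
        + 4 * ratio(resolved_proxies, proxy_like_contracts)
        + 4 * (1 - min(risky_selectors / 3, 1))
    )


# 3.6 + 3.0 + 3.0 + 6.0 + 2.0 + 2.67, roughly 20.27 out of 30
print(round(sourcify_score(3, 2, 4, 0.75, 1.0, 1, 2, 1), 2))
```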

GitHub Engineering Maturity

Weight: 25.

Attribute	Weight	Normalization
`repoCount`	3	`min(repos / 5, 1)`
`repoAgeMonths`	4	`min(ageMonths / 18, 1)`
`recentCommits90d`	4	`min(commits / 60, 1)`
`ciPassRate`	4	successful runs / total recent runs
`testPresence`	3	1 if tests found, 0.5 if partial, 0 if none
`releaseCount`	2	`min(releases / 5, 1)`
`issueHygiene`	3	closed/resolved ratio with stale issue penalty
`securityHygiene`	2	avg(SECURITY, dependabot, lockfiles)

Apply the GitHub trust multiplier after component normalization.

On-Chain Operational History

Weight: 20.

Attribute	Weight	Normalization
`firstSeenAgeDays`	4	`min(days / 365, 1)`
`txCountTotal`	4	log-scaled to avoid spam dominance
`txCountRecent90d`	4	log-scaled recent activity
`contractsDeployedCount`	3	`min(count / 5, 1)`
`uniqueInteractors`	3	log-scaled
`activityContinuity`	2	active months / observed months
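
The table does not fix a particular log scale. One plausible choice, shown here purely as an assumption, is `log(1 + value) / log(1 + saturation)` capped at 1, where `saturation` is a hypothetical tuning constant marking the count that earns full credit.

```python
import math


def log_scaled(value: float, saturation: float) -> float:
    """Map a raw count to [0, 1] on a log scale (assumed normalization)."""
    if value <= 0:
        return 0.0
    return min(math.log1p(value) / math.log1p(saturation), 1.0)


# With a hypothetical saturation of 10,000 transactions, 100 transactions
# already earn about half credit, while counts above 10,000 gain nothing more.
print(round(log_scaled(100, 10_000), 2))
print(log_scaled(1_000_000, 10_000))
```

Compared with a linear `min(value / saturation, 1)`, this shape rewards early activity and flattens quickly, which is what keeps raw volume from dominating the component.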

Spam rule:

A high transaction count without source diversity does not produce high seniority.

ENS Identity Maturity

Weight: 10.

Attribute	Weight	Normalization
`ensNameAgeDays`	2	`min(days / 365, 1)`
`manifestPresent`	2	1 present, 0 absent
`recordsCompleteness`	2	present expected records / expected records
`ownerConsistency`	2	owner/operator/signers align
`endpointPresence`	1	web/context records present
`subnameSignal`	1	capped subname count

Portfolio Evidence Depth

Weight: 15.

Attribute	Weight	Normalization
`portfolioItemCount`	3	`min(items / 6, 1)`
`portfolioSourceDiversity`	3	source kinds present / 5
`verifiedClaimRatio`	4	verified claims / total claims
`contractBackedItemRatio`	2	contract-backed items / applicable items
`evidenceLinkCoverage`	3	claims with usable evidence / total claims

Relevance Components

Default context:

agentic venture due diligence

Component	Weight
Category fit	25
Recent activity	20
Portfolio alignment	20
Public-good / Umia fit	15
Claim credibility	10
Demo readiness	10
Total	100

Category Fit

Weight: 25.

Signals:

manifest context tags,
repository topics,
README keywords,
portfolio item roles,
claims taxonomy.

Example contexts:

public goods,
grants,
audit/safety,
governance,
data extraction,
developer tooling,
trading,
research.

Recent Activity

Weight: 20.

Signals:

GitHub commits last 90 days,
releases last 180 days,
on-chain activity last 90 days,
recent verified contract update,
manifest update freshness.

Portfolio Alignment

Weight: 20.

Signals:

portfolio items match claimed category,
contracts/repos support stated capability,
evidence exists for core claim,
no major contradiction between sources.

Public-Good / Umia Fit

Weight: 15.

Signals:

due-diligence utility,
agentic venture applicability,
civic/public-good usefulness,
non-extractive framing,
compatibility with launch platform review.

Claim Credibility

Weight: 10.

Signals:

verified claims,
signed claims,
discounted claims,
missing evidence.

Demo Readiness

Weight: 10.

Signals:

public endpoint works,
report is shareable,
portfolio has inspectable items,
clear story for reviewer.

Confidence Formula

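Confidence is computed separately from the headline score. It does not change the score itself; it only caps the displayed tier.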

confidence =
  sourceCoverage * 0.25 +
  verificationCoverage * 0.30 +
  identityBinding * 0.20 +
  freshness * 0.15 +
  errorHealth * 0.10

Component	Meaning
sourceCoverage	Required source cards returned usable data
verificationCoverage	Inputs are verified/signed rather than self-asserted
identityBinding	ENS, GitHub, contracts, and operator connect cleanly
freshness	Data is within TTL
errorHealth	Few failed collectors
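
A minimal sketch of the weighted sum, assuming each component has already been normalized to the 0-1 range. How the resulting number maps to the High/Medium/Low labels is not defined on this page, so the sketch stops at the numeric value.

```python
# Weights copied from the confidence formula above.
CONFIDENCE_WEIGHTS = {
    "sourceCoverage": 0.25,
    "verificationCoverage": 0.30,
    "identityBinding": 0.20,
    "freshness": 0.15,
    "errorHealth": 0.10,
}


def confidence(components: dict) -> float:
    """Weighted sum of the five components (each assumed to be 0-1)."""
    return sum(components[name] * w for name, w in CONFIDENCE_WEIGHTS.items())


example = {
    "sourceCoverage": 0.8,
    "verificationCoverage": 0.5,
    "identityBinding": 1.0,
    "freshness": 0.9,
    "errorHealth": 1.0,
}
# 0.20 + 0.15 + 0.20 + 0.135 + 0.10
print(round(confidence(example), 3))
```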

Trust Multipliers

Trust state	Multiplier
Sourcify verified exact match	1.0
On-chain direct read	1.0
ENS direct record	1.0
Cross-signed GitHub	1.0
ENS-signed claim	0.85
Public but unverified GitHub claim	0.6
Public URL without ownership proof	0.6
Self-asserted manifest claim	0.35
Missing evidence	0
Mock	0 for production score; demo-only

Score Row Calculation

For every row:

contribution = normalizedValue * weight * trustMultiplier

The UI must show all three factors (normalized value, weight, and trust multiplier) for every row.
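
The row calculation can be sketched as a small record that keeps the three factors alongside the derived contribution. The dictionary keys and the `ScoreRow` class are hypothetical; the multiplier values come from the Trust Multipliers table.

```python
from dataclasses import dataclass

# A subset of the Trust Multipliers table; key names are illustrative.
TRUST_MULTIPLIERS = {
    "sourcify_exact_match": 1.0,
    "ens_signed_claim": 0.85,
    "public_unverified_github": 0.6,
    "self_asserted_manifest": 0.35,
    "missing_evidence": 0.0,
}


@dataclass
class ScoreRow:
    attribute: str
    normalized_value: float  # 0-1 after normalization
    weight: float            # attribute weight from the component table
    trust_multiplier: float  # looked up from the trust state

    @property
    def contribution(self) -> float:
        return self.normalized_value * self.weight * self.trust_multiplier


# 45 commits in 90 days from a public but unverified GitHub account:
# min(45 / 60, 1) = 0.75, weight 4, multiplier 0.6, so roughly 1.8 points.
row = ScoreRow(
    attribute="recentCommits90d",
    normalized_value=min(45 / 60, 1),
    weight=4,
    trust_multiplier=TRUST_MULTIPLIERS["public_unverified_github"],
)
print(row.attribute, row.normalized_value, row.weight,
      row.trust_multiplier, round(row.contribution, 2))
```

Keeping the factors on the row, rather than storing only the product, is what lets a report display each one.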

Report Outcomes

Outcome	Suggested thresholds
fast-track	headline >= 75 and confidence high/medium
emerging-review	relevance >= 70 and seniority < 60
evidence-required	confidence low or verifiedClaimRatio < 0.4
manual-security-review	any portfolio contract returns SIREN
reject-or-redirect	relevance < 45 or evidence count too low

These are reviewer routing labels, not investment advice.
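
The thresholds above could be evaluated as in the sketch below. Several points are assumptions: the table does not say whether outcomes are mutually exclusive, so the function returns every label that matches; "evidence count too low" has no stated number, so `min_evidence_count` is a hypothetical parameter; and the confidence label is taken as an input because its derivation is not specified here.

```python
from typing import List


def report_outcomes(
    headline: float,
    seniority: float,
    relevance: float,
    confidence_level: str,        # "high", "medium" or "low"
    verified_claim_ratio: float,
    siren_flagged: bool,          # any portfolio contract returned SIREN
    evidence_count: int,
    min_evidence_count: int = 3,  # hypothetical threshold
) -> List[str]:
    """Return all routing labels whose suggested threshold is met."""
    outcomes = []
    if headline >= 75 and confidence_level in ("high", "medium"):
        outcomes.append("fast-track")
    if relevance >= 70 and seniority < 60:
        outcomes.append("emerging-review")
    if confidence_level == "low" or verified_claim_ratio < 0.4:
        outcomes.append("evidence-required")
    if siren_flagged:
        outcomes.append("manual-security-review")
    if relevance < 45 or evidence_count < min_evidence_count:
        outcomes.append("reject-or-redirect")
    return outcomes
```

A reviewer-facing tool would still need a precedence rule when several labels match, for example letting `manual-security-review` override `fast-track`; that rule is not defined on this page.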

Formula Governance

The formula is public.

Any weight change must update:

this page,
product constants,
report JSON schema version if output changes,
demo fixture expected scores,
explanation copy.

Hidden score changes are not allowed.
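
One lightweight guard against accidental drift is a check that the product constants still add up to the totals published on this page. The sketch below is an assumption about how such a check might look; the dictionary keys are hypothetical, and the numbers are copied from the seniority tables.

```python
from typing import List

# Seniority component weights (should total 100).
COMPONENT_WEIGHTS = {
    "sourcify": 30, "github": 25, "onchain": 20, "ens": 10, "portfolio": 15,
}

# Attribute weights per component, in table order.
ATTRIBUTE_WEIGHTS = {
    "sourcify": [6, 6, 4, 6, 4, 4],
    "github": [3, 4, 4, 4, 3, 2, 3, 2],
    "onchain": [4, 4, 4, 3, 3, 2],
    "ens": [2, 2, 2, 2, 1, 1],
    "portfolio": [3, 3, 4, 2, 3],
}


def check_weights() -> List[str]:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    total = sum(COMPONENT_WEIGHTS.values())
    if total != 100:
        problems.append(f"component weights sum to {total}, expected 100")
    for name, weights in ATTRIBUTE_WEIGHTS.items():
        if sum(weights) != COMPONENT_WEIGHTS[name]:
            problems.append(
                f"{name}: attributes sum to {sum(weights)}, "
                f"expected {COMPONENT_WEIGHTS[name]}"
            )
    return problems


print(check_weights())
```

Run as part of the test suite, a non-empty result would flag a weight change that was not mirrored everywhere it needs to be.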
