RyanAlberts
diff --git a/BACKLOG.md b/BACKLOG.md
Lines changed: 2 additions & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 2 additions & 0 deletions
diff --git a/docs/QUALITY_REPORT_W26.md b/docs/QUALITY_REPORT_W26.md
Lines changed: 77 additions & 0 deletions
diff --git a/examples/README.md b/examples/README.md
Lines changed: 5 additions & 2 deletions
diff --git a/examples/output/BROKEN_LINKS-w26-2026-05-01.md b/examples/output/BROKEN_LINKS-w26-2026-05-01.md
Lines changed: 23 additions & 0 deletions
@@ -20,7 +20,8 @@ Promoted to GitHub issues when an item survives more than one PR. ADRs for non-t
 - [B004] Tune `MIN_DESCRIPTION_CHARS` (currently 80). The W26 probe surfaced one borderline drop (`moda`, 57 chars). A small calibration study against borderline rows would let us pick a defensible threshold. — surfaced in: W26 quality probe — proposed: PR #2
 - [B005] Name the missing-from-upstream companies, not just count them. Compare yc-oss slugs to a slug list discovered from `/companies/<slug>` profile pages so the dropped register includes "Acme (in YC W26 but not in yc-oss/api)". — surfaced in: W26 quality probe — proposed: PR #2 or #3
 - [B006] Track schema-validation failure rate during enrichment as a tracked metric. The W26 smoke run had 1/5 (20%) parse failures (`velum-labs` — likely rationale exceeded the 400 char limit). Measure this across the full batch and tune prompt or schema if rate exceeds ~5%. — surfaced in: PR #2 smoke — proposed: PR #3
-- [B007] Tech-stack and OSS-posture nearly always come back as `unknown` because the model only sees the YC `long_description`, not the company website. Adding a depth=1 website crawl before the LLM call would let the model identify e.g. "this product is closed-source SaaS" or "uses OpenAI" — significantly improving Tier A signal density. Cost: ~5-10 KB extra context per company. — surfaced in: PR #2 smoke — proposed: PR #3
+- [B007] Tech-stack and OSS-posture nearly always come back as `unknown` because the model only sees the YC `long_description`, not the company website. Adding a depth=1 website crawl before the LLM call would let the model identify e.g. "this product is closed-source SaaS" or "uses OpenAI" — significantly improving Tier A signal density. Cost: ~5-10 KB extra context per company. — surfaced in: PR #2 smoke. Confirmed in PR #3 full run: 45 of 83 high-confidence rows have OSS posture `unknown`, 52 have tech_stack `unknown`. — proposed: PR after v0.1
+- [B008] Schema-validation failure rate on the full W26 enrichment was **23%** (29 of 124). The lenient parser added in PR #3 only relaxed `industry_secondary`. Most remaining failures likely come from the model emitting `ai_capability` or `tech_stack` values outside our closed enums. Either extend the lenient parser to those fields, capture a sample of raw failed responses to audit, or introduce `tool_use`-style schema enforcement on the API backend so the model is constrained at decode time. — surfaced in: PR #3 full run — proposed: PR #4 (CLI polish)
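The first option in B008 (extending the lenient parser to `ai_capability`) could look roughly like the sketch below. It is an illustration under assumptions, not the project's parser: the enum members are copied from the capability table in the W26 report and may not match the real closed set, and `UNKNOWN`, `AICapability`, and `coerce_capabilities` are hypothetical names. Out-of-vocabulary strings are returned separately so they can be logged, which also covers the "capture a sample of raw failed responses" option.

```python
from enum import Enum


class AICapability(str, Enum):
    # Values copied from the W26 report table; the real closed enum may differ.
    AGENTS = "agents"
    NLP_CLASSIC = "nlp-classic"
    RAG = "rag"
    DATA_PIPELINE = "data-pipeline"
    VISION = "vision"
    MULTIMODAL = "multimodal"
    EVALS_OBSERVABILITY = "evals-observability"
    NO_AI = "no-ai"
    UNKNOWN = "unknown"  # hypothetical fallback bucket, not confirmed by the source


def coerce_capabilities(raw_values):
    """Map raw model strings onto the closed enum instead of failing the row.

    Returns (accepted, rejected): accepted holds normalised, de-duplicated enum
    members in first-seen order; rejected holds the raw strings that did not
    match, so they can be audited later.
    """
    accepted, rejected = [], []
    for raw in raw_values:
        # Normalise case and separators before the enum lookup.
        key = str(raw).strip().lower().replace("_", "-").replace(" ", "-")
        try:
            value = AICapability(key)
        except ValueError:
            rejected.append(raw)
            continue
        if value not in accepted:
            accepted.append(value)
    if not accepted:
        # Nothing matched: keep the row but mark the field as unknown.
        accepted.append(AICapability.UNKNOWN)
    return accepted, rejected
```

Whether a row with rejected values should stay high-confidence is a separate policy decision that this sketch leaves open.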
 
 ## Done
 
 
@@ -14,5 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - First end-to-end probe on YC W26: 63.3% coverage of the official 196-company batch. Findings in `docs/QUALITY_REPORT_W26.md`.
 - Phase 1 PR #2: LLM-based enrichment with anti-hallucination Layer 1 — pydantic-enforced output schema, source-URL guard against fabricated citations, two-pass cross-check on uncertain rows, sentinel low-confidence row on any failure. Three backends: `AgentSDKBackend` (subscription-default), `AnthropicAPIBackend` (`--api-key`), `MockBackend` (tests). 10 hallucination-trap fixtures locked in as regression tests.
 - W26 enrichment smoke run (5 companies via subscription, 39s, ~free): 4 high / 1 low confidence. Identified `gru.space` as `no-ai` correctly. Schema-validation failure on `velum-labs` correctly fell through to the sentinel — no fabricated analysis served.
+- Phase 1 PR #3: enriched dashboard. AI capability x industry heatmap, tech-stack distribution, OSS-posture breakdown, and confidence breakdown — all with row-level drill-downs. Cited-URL link-verify hard gate before any artifact ships (override via `--allow-dead-links` writes a `BROKEN_LINKS.md` sidecar and shows a warning banner). Lenient parsing for `industry_secondary` so the model can emit reasonable categories without tanking the row.
+- W26 full-batch enrichment via subscription (124 companies, ~6 min, ~free): 83 high / 41 low confidence. Top finding: **65% of high-confidence W26 companies (54 of 83) build agents**. 8 companies correctly classified as `no-ai` (the trust signal). 3 cited URLs caught dead at publish time and surfaced via the publish gate.
 
 [Unreleased]: https://github.com/RyanAlberts/yc-ai-pulse/compare/main...HEAD
@@ -105,3 +105,80 @@ Output: `runs/2026-05-01-185520/{dashboard.html, coverage.json, companies.csv}`.
 
 - [B004] Tune `MIN_DESCRIPTION_CHARS`. 80 is a guess; a small calibration study against the 8 borderline companies would let us pick a defensible value.
 - [B005] Add a "what's missing" section to the dashboard that compares yc-oss slugs to a slug list discovered from the YC `/companies/<slug>` profile pages, so we can name the 64 missing W26 companies, not just count them.
+
+---
+
+## PR #3 — full-batch enrichment results (2026-05-01)
+
+After PR #3 (enriched dashboard), the full 124-company enrichment ran end-to-end via a Claude Max subscription and took ~6 minutes.
+
+### Confidence
+
+- **83 high (67%)** + **0 medium** + **41 low (33%)**.
+- Of the 41 low-confidence rows: **29** were schema-validation failures (the model emitted output that still failed validation after the lenient pass), and **12** were genuinely uncertain outputs that the model itself flagged as low.
+- **0 hallucinated source URLs** detected — the source-URL guard caught zero cases on this run; every cited URL traced back to either the company website or its YC profile page.
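The grounding rule in the last bullet (a cited URL must trace back to the company website or the YC profile page) can be sketched as a host-and-path check. This is a guess at the shape of the guard, not its actual implementation: `ungrounded_urls` and `_host` are hypothetical names, and the real guard may compare URLs more strictly or more loosely than this.

```python
from urllib.parse import urlsplit


def _host(url):
    """Lower-cased hostname without a leading 'www.', or '' if there is none."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def ungrounded_urls(cited_urls, website, slug):
    """Return cited URLs that match neither the company website nor its YC profile."""
    site_host = _host(website)
    profile_path = f"/companies/{slug}"
    bad = []
    for url in cited_urls:
        host = _host(url)
        path = urlsplit(url).path.rstrip("/")
        on_site = bool(site_host) and host == site_host
        on_profile = host == "ycombinator.com" and path == profile_path
        if not (on_site or on_profile):
            bad.append(url)
    return bad
```

An empty return value for every row corresponds to the "0 hallucinated source URLs" result reported above.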
+
+### Industry distribution (Tier A high+medium, n=83)
+
+| Industry | n |
+|---|---:|
+| B2B SaaS | 16 |
+| Fintech | 10 |
+| Developer Tools | 7 |
+| AI Infrastructure | 7 |
+| Legal | 5 |
+| Healthcare | 5 |
+| Biotech | 4 |
+| Security | 4 |
+
+The B2B-heavy mix lines up with the [VCCorner W26 demo-day breakdown](https://www.thevccorner.com/p/yc-w26-demo-day-2026-complete-breakdown). The visible Legal cluster (5) is a smaller but real cohort the article didn't separately call out.
+
+### AI capability distribution (n=83)
+
+| Capability | n |
+|---|---:|
+| **agents** | 54 |
+| nlp-classic | 30 |
+| rag | 26 |
+| data-pipeline | 19 |
+| vision | 14 |
+| multimodal | 10 |
+| evals-observability | 9 |
+| **no-ai** | 8 |
+
+**Top finding**: 65% (54 of 83) of high-confidence W26 companies build agents. This is the dominant story of the batch.
+
+**Honesty check**: 8 companies were correctly classified as `no-ai` despite being in the YC batch — the LLM is willing to say "the YC profile suggests AI but the description doesn't actually substantiate it." This is exactly the behavior the anti-hallucination contract is meant to produce.
+
+### OSS posture (n=83)
+
+| Posture | n |
+|---|---:|
+| unknown | 45 |
+| closed | 36 |
+| api-only | 1 |
+| source-available | 1 |
+| fully-open | 0 |
+
+**The "unknown" plurality is the main signal**, and it's structural. The model has access only to the YC `long_description`; OSS posture is rarely stated there. **B007** in the backlog (depth=1 website crawl) would shift these `unknown` rows to `closed` / `api-only` / `weights-only` based on actual evidence (license files, GitHub presence, pricing pages).
+
+Until then, do not over-interpret the `unknown` count: it's a measurement gap, not a finding.
+
+### Tech stack
+
+Dominated by `unknown` (52) and `custom-model` (13). Same structural reason — descriptions don't usually name the model provider. `custom-model` is signal-bearing: 13 companies advertise their own models / fine-tunes, which is a meaningful slice of W26.
+
+### Cited-URL link verification (the publish gate)
+
+Of all source URLs cited across 83 high-confidence rows, **3** failed verification at publish time (an HTTP 4xx/5xx response or a connection error):
+- `https://www.arzule.com/` — 429 (rate limit)
+- `https://maywoodai.com/` — 404
+- `https://www.caretta.so/` — SSL handshake failure
+
+Each is named in [`examples/output/BROKEN_LINKS-w26-2026-05-01.md`](../examples/output/BROKEN_LINKS-w26-2026-05-01.md) with the company that cited it. The dashboard for this example was rendered with `--allow-dead-links`, so it carries a warning banner at the top. In a production run (no `--allow-dead-links`), the pipeline would have refused to write the dashboard and exited non-zero. That refusal is the publish gate.
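The gate behaviour described above can be sketched as follows. Only the `--allow-dead-links` semantics and the sidecar layout come from this PR; `publish_gate`, `render_broken_links`, and the `dead` record shape (`url`, `reason`, `slug`) are hypothetical, and the warning banner is left out.

```python
import sys
from pathlib import Path


def render_broken_links(dead):
    """Render the sidecar in the same layout as the BROKEN_LINKS example file."""
    lines = [
        "# BROKEN_LINKS",
        "",
        f"{len(dead)} cited URL(s) returned 4xx/5xx at publish time.",
        "",
    ]
    for entry in dead:
        lines += [
            f"- {entry['url']}",
            "  - status: dead",
            f"  - reason: {entry['reason']}",
            f"  - cited by: `{entry['slug']}`",
            "",
        ]
    return "\n".join(lines)


def publish_gate(dead, out_dir, allow_dead_links=False):
    """Return True if the dashboard may be written; exit non-zero otherwise."""
    if not dead:
        return True
    if not allow_dead_links:
        # Hard gate: refuse to publish while any cited URL is dead.
        print(f"publish gate: {len(dead)} dead cited URL(s)", file=sys.stderr)
        sys.exit(1)
    # Override path: publish anyway, but leave an audit trail next to the artifacts.
    Path(out_dir, "BROKEN_LINKS.md").write_text(render_broken_links(dead), encoding="utf-8")
    return True
```

Failing closed by default means an operator has to opt in explicitly before shipping an artifact with dead citations.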
+
+### Implications
+
+1. **Schema-validation failure rate (23%) is too high for a v0.1 release.** Tracked as B006 (the metric) and B008 (this result and the fix options). The most likely cause is the model emitting enum values outside our closed sets for `ai_capability` or `tech_stack` (we patched `industry_secondary` for this in PR #3, but the other two stayed strict). Fix in a follow-up PR.
+2. **W26 is an agents batch.** This is now defensible — 54 of 83 high-confidence rows, with row-level drill-down showing exactly which companies and what their YC descriptions said.
+3. **The 67% high-confidence rate against 63.3% upstream coverage means the actual analyzable share of W26 is ~42% (83/196).** The headline metric on the dashboard now shows this honestly.
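The percentages quoted in this report can be re-derived from the raw counts. The sketch below uses only numbers stated in the report (196, 124, 83, 29, 54); the variable names are illustrative.

```python
BATCH_SIZE = 196       # official W26 batch size
ENRICHED = 124         # companies that went through enrichment
HIGH_CONF = 83         # high-confidence rows
SCHEMA_FAILURES = 29   # rows that failed schema validation
AGENTS = 54            # high-confidence rows tagged `agents`


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


coverage = pct(ENRICHED, BATCH_SIZE)                   # upstream coverage
high_rate = pct(HIGH_CONF, ENRICHED)                   # high-confidence rate
analyzable = pct(HIGH_CONF, BATCH_SIZE)                # analyzable share of the batch
schema_failure_rate = pct(SCHEMA_FAILURES, ENRICHED)   # schema-validation failures
agents_share = pct(AGENTS, HIGH_CONF)                  # share of rows building agents

print(coverage, high_rate, analyzable, schema_failure_rate, agents_share)
# prints: 63.3 66.9 42.3 23.4 65.1
```

The analyzable share is the product of the two rates (0.633 × 0.669 ≈ 0.42), which is why it is reported against the full 196-company batch and not against the 124 enriched rows.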
@@ -4,9 +4,12 @@ Sanitized sample artifacts. Every commit goes through `make publish-check` so PI
 
 | File | What |
 |---|---|
-| [`output/dashboard-w26-2026-05-01.html`](output/dashboard-w26-2026-05-01.html) | Phase 1 dashboard for YC W26. Headline: 63.3% coverage of the 196-company batch, with the dropped register naming every excluded company. |
+| [`output/dashboard-w26-enriched-2026-05-01.html`](output/dashboard-w26-enriched-2026-05-01.html) | **PR #3 full-batch dashboard.** Headline: 63.3% coverage of W26, with LLM-derived charts: AI capability x industry heatmap, tech-stack distribution, OSS-posture breakdown. Dead-link banner at top because 3 cited URLs returned 4xx/5xx at publish time. |
+| [`output/dashboard-w26-2026-05-01.html`](output/dashboard-w26-2026-05-01.html) | PR #1 baseline (coverage-only mode, no LLM). Useful comparison for what shifts when `--enrich` is added. |
 | [`output/coverage-w26-2026-05-01.json`](output/coverage-w26-2026-05-01.json) | Machine-readable coverage report — what feeds the dashboard. |
-| [`output/analyses-w26-smoke-2026-05-01.json`](output/analyses-w26-smoke-2026-05-01.json) | PR #2 smoke run: 5-company LLM enrichment via Sonnet 4.6 on subscription. Captures the schema-enforced output and demonstrates source-URL grounding (every cited URL is from `website` or YC profile). |
+| [`output/analyses-w26-full-2026-05-01.json`](output/analyses-w26-full-2026-05-01.json) | **PR #3 full-batch enrichment.** 124 companies × Sonnet 4.6, ~6 min on subscription. 83 high-confidence rows feed the charts; 41 low-confidence rows surface honestly in the methodology footer. |
+| [`output/analyses-w26-smoke-2026-05-01.json`](output/analyses-w26-smoke-2026-05-01.json) | PR #2 smoke run: 5 companies, the original proof of life. |
+| [`output/BROKEN_LINKS-w26-2026-05-01.md`](output/BROKEN_LINKS-w26-2026-05-01.md) | Sidecar from the full run. Names the 3 cited URLs that returned 4xx/5xx and the slugs that cited them. |
 
 The full quality writeup for W26 is in [`docs/QUALITY_REPORT_W26.md`](../docs/QUALITY_REPORT_W26.md).
 
 
@@ -0,0 +1,23 @@
+# BROKEN_LINKS
+
+4 cited URL(s) returned 4xx/5xx at publish time.
+
+- https://www.arzule.com/
+  - status: dead
+  - reason: 429
+  - cited by: `arzule`
+
+- https://maywoodai.com/
+  - status: dead
+  - reason: 404
+  - cited by: `maywood`
+
+- https://www.ycombinator.com/companies/tensol
+  - status: dead
+  - reason: 404
+  - cited by: `tensol`
+
+- https://www.caretta.so/
+  - status: dead
+  - reason: ConnectError('[SSL] record layer failure (_ssl.c:1016)')
+  - cited by: `caretta`