First end-to-end run of the Phase 1 quality probe. No LLM calls; this is the data-quality floor against which classification + report generation will run in subsequent PRs.
63.3% of YC W26 analyzed — 124 of 196 companies pass the data-quality bar.
| Source | Count | Notes |
|---|---|---|
| YC W26 official (Demo Day, 2026-03-24) | 196 | Per the VC Corner W26 breakdown. |
| yc-oss/api fixture (last refreshed 2026-02-08) | 132 | 64 companies missing — upstream is stale by ~3 months. |
| Tier A (full classification) | 120 | All required fields + website returned 2xx/3xx. |
| Tier B (partial — website unreachable) | 4 | Required fields present; website 4xx/5xx. Kept in charts with a flag. |
| Tier C (excluded) | 8 | Acknowledged in the dropped register below. |
| Analyzable (A + B) | 124 | Feeds every chart in the dashboard. |
Coverage of upstream: 93.9% (124 / 132). Coverage of YC official: 63.3% (124 / 196). ← headline metric
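The two coverage figures are simple ratios over different denominators. A minimal sketch of the arithmetic, using only the counts in the table above (the function name `coverage_pct` is illustrative, not taken from the codebase):

```python
def coverage_pct(analyzable: int, denominator: int) -> float:
    """Return analyzable / denominator as a percentage, rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * analyzable / denominator, 1)


tier_a, tier_b = 120, 4
analyzable = tier_a + tier_b                    # Tier A + Tier B
upstream_pct = coverage_pct(analyzable, 132)    # against the yc-oss/api fixture
official_pct = coverage_pct(analyzable, 196)    # against the YC official count
print(upstream_pct, official_pct)
```

Keeping the denominator as an explicit argument makes it harder to quote the upstream figure where the YC-official figure is meant.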
yc-oss/api's meta.json reports last_updated: 2026-02-08T01:49:11Z. W26 Demo Day was 2026-03-24, so the upstream was last refreshed ~6 weeks before the batch closed. The Demo Day–era cohort (~64 companies) is missing from the feed entirely.
This is not a bug in yc-ai-pulse — yc-oss/api is community-maintained. Mitigations:
- Already in place: the dashboard surfaces this gap upfront ("Upstream gap" alert banner).
- B003 (open in BACKLOG): add a CI cron that warns if the upstream is >48h stale. The W26 case would have tripped it ~3 months ago.
- Future: consider a direct YC profile-page enrichment (allowed under robots.txt for /companies/<slug>) for slug lists discovered from elsewhere. Not in v0.1 scope.
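The staleness check proposed in B003 could look roughly like the sketch below. It assumes only the last_updated timestamp format quoted above; the function names and the way the check would be wired into CI are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(hours=48)  # threshold proposed in B003


def upstream_age(last_updated: str, now: datetime) -> timedelta:
    """Age of the upstream feed at `now`, given an ISO-8601 UTC timestamp."""
    # Swap the trailing "Z" so fromisoformat also accepts it on Python < 3.11.
    refreshed = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    return now - refreshed


def is_stale(last_updated: str, now: datetime,
             limit: timedelta = MAX_STALENESS) -> bool:
    return upstream_age(last_updated, now) > limit


demo_day = datetime(2026, 3, 24, tzinfo=timezone.utc)
age = upstream_age("2026-02-08T01:49:11Z", demo_day)
print(age.days, is_stale("2026-02-08T01:49:11Z", demo_day))
```

At Demo Day the feed was a little over 43 days old, which is the "~6 weeks" quoted above and far past a 48-hour limit.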
Eight companies in the upstream feed were excluded from charts because they're missing fields the analysis layer requires. They are listed by name:
| Slug | Name | Reason |
|---|---|---|
| protent | Protent | long_description empty |
| byteport | Byteport | long_description empty |
| zerosettle | ZeroSettle | long_description empty |
| traverse | Traverse | long_description empty |
| grade | Grade | long_description empty |
| zymbly | Zymbly | long_description empty |
| moda | Moda | long_description 57 chars (below 80-char threshold) |
| condor-energy | Condor Energy | website field empty |
Auditable threshold: MIN_DESCRIPTION_CHARS = 80 (src/ycai/coverage.py). Lowering it to 50 would bring moda back; raising it to 120 would drop ~6 more borderline rows. The current threshold balances inclusion with the requirement that classification be evidence-backed.
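To make the effect of the threshold concrete, here is a hedged sketch of a tiering rule reconstructed from the notes in the tables above. Only MIN_DESCRIPTION_CHARS = 80 comes from the source; the function `assign_tier`, its signature, and the treatment of a missing status code are assumptions, and the real logic in src/ycai/coverage.py may check more fields.

```python
from typing import Optional

MIN_DESCRIPTION_CHARS = 80  # value cited from src/ycai/coverage.py


def assign_tier(long_description: str, website: str,
                status: Optional[int]) -> str:
    """Hypothetical tiering rule: C = excluded, A = full, B = partial."""
    desc = (long_description or "").strip()
    if len(desc) < MIN_DESCRIPTION_CHARS or not (website or "").strip():
        return "C"  # a required field is missing or too short
    if status is not None and 200 <= status < 400:
        return "A"  # website answered 2xx/3xx
    return "B"      # fields present, website unreachable (assumed to include no response)


print(assign_tier("x" * 57, "https://example.com", 200))  # 57-char description, like moda
```

Under this rule a 57-character description is excluded at the current threshold and would pass if the threshold were lowered to 50.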
Tier B keeps these companies in the analysis but flags them in the dashboard:
- maywood: Maywood
- caretta: Caretta
- arzule: Arzule
- servo7: Servo7
These had 4xx/5xx responses at probe time. Could be transient. The verifier reruns at report build time (PR #3 acceptance gate).
Industry distribution (from the YC-supplied industry field, no LLM yet):
| Industry | Count |
|---|---|
| B2B | 80 |
| Industrials | 18 |
| Healthcare | 9 |
| Fintech | 8 |
| Consumer | 6 |
| Real Estate and Construction | 3 |
The B2B-heavy distribution lines up with the thevccorner.com breakdown (64% B2B for W26). Internal consistency check passes.
- ok (2xx/3xx): 127 websites
- dead (4xx/5xx): 4 websites
- slow (>5s): 0
- redirect (>3 hops): 0
- error (network): 0
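The five probe buckets above can be expressed as a small classifier. This is a sketch under stated assumptions: the bucket names and limits (5 s, 3 hops) come from the list, while the function name, its arguments, and the precedence between overlapping conditions are guesses rather than the verifier's actual behavior.

```python
from collections import Counter
from typing import Optional

SLOW_SECONDS = 5.0
MAX_REDIRECT_HOPS = 3


def classify_probe(status: Optional[int], elapsed_s: float, hops: int) -> str:
    """Map one probe result to a bucket; status=None means no HTTP response."""
    if status is None:
        return "error"      # network-level failure
    if status >= 400:
        return "dead"       # 4xx/5xx
    if hops > MAX_REDIRECT_HOPS:
        return "redirect"   # too many hops (assumed to outrank "slow")
    if elapsed_s > SLOW_SECONDS:
        return "slow"
    return "ok"


probes = [(200, 0.4, 0), (301, 0.9, 1), (404, 0.2, 0), (None, 0.0, 0), (200, 6.1, 0)]
tally = Counter(classify_probe(*p) for p in probes)
print(dict(tally))
```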

    PYTHONPATH=src python3 -m ycai.cli run-coverage \
      --batch winter-2026 \
      --yc-official-count 196

Output: runs/2026-05-01-185520/{dashboard.html, coverage.json, companies.csv}.
- PR #2 (researcher + classifier): must consume coverage.json directly so its denominator agrees with the dashboard. The LLM never sees Tier C rows.
- PR #3 (deck/memo): the methodology slide must show the same 63.3% headline, same upstream-gap callout, same dropped-register table. CI should fail if the deck cites a different denominator.
- PR #5 (release): consider adding a "data freshness" indicator to the README badge so users know if the latest cached run is from a stale upstream.
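The "CI should fail if the deck cites a different denominator" requirement could be enforced with a check along these lines. This is a sketch only: the key names analyzable and yc_official_count are assumed, since the real coverage.json schema is not shown here, and the function name is hypothetical.

```python
import json
from typing import List


def denominator_mismatches(coverage_json: str, cited_analyzable: int,
                           cited_official: int) -> List[str]:
    """Compare counts cited in a deck against coverage.json; return any problems."""
    cov = json.loads(coverage_json)
    problems = []
    # Key names below are assumptions, not the real coverage.json schema.
    if cov["analyzable"] != cited_analyzable:
        problems.append(
            f"analyzable: deck cites {cited_analyzable}, coverage.json has {cov['analyzable']}")
    if cov["yc_official_count"] != cited_official:
        problems.append(
            f"official: deck cites {cited_official}, coverage.json has {cov['yc_official_count']}")
    return problems


sample = json.dumps({"analyzable": 124, "yc_official_count": 196})
ok = denominator_mismatches(sample, 124, 196)
bad = denominator_mismatches(sample, 118, 196)
print(ok, bad)
```

A CI step would exit non-zero whenever the returned list is non-empty.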
- [B004] Tune MIN_DESCRIPTION_CHARS. 80 is a guess; a small calibration study against the 8 borderline companies would let us pick a defensible value.
- [B005] Add a "what's missing" section to the dashboard that compares yc-oss slugs to a slug list discovered from the YC /companies/<slug> profile pages, so we can name the 64 missing W26 companies, not just count them.
After PR #3 (enriched dashboard), the full 124-company enrichment ran end-to-end via Claude Max subscription. Took ~6 minutes.
- 83 high (67%) + 0 medium + 41 low (33%).
- Of the 41 low-confidence rows: 29 were schema-validation failures (model emitted output that didn't validate after lenient pass), 12 were genuinely-uncertain outputs the model itself flagged as low.
- 0 hallucinated source URLs detected — the source-URL guard caught zero cases on this run; every cited URL traced back to either the company website or its YC profile page.
| Industry | n |
|---|---|
| B2B SaaS | 16 |
| Fintech | 10 |
| Developer Tools | 7 |
| AI Infrastructure | 7 |
| Legal | 5 |
| Healthcare | 5 |
| Biotech | 4 |
| Security | 4 |
The B2B-heavy mix lines up with the VC Corner W26 Demo Day breakdown. The visible Legal cluster (5) is a smaller but real cohort the article didn't separately call out.
| Capability | n |
|---|---|
| agents | 54 |
| nlp-classic | 30 |
| rag | 26 |
| data-pipeline | 19 |
| vision | 14 |
| multimodal | 10 |
| evals-observability | 9 |
| no-ai | 8 |
Top finding: 65% (54 of 83) of high-confidence W26 companies build agents. This is the dominant story of the batch.
Honesty check: 8 companies were correctly classified as no-ai despite being in the YC batch — the LLM is willing to say "the YC profile suggests AI but the description doesn't actually substantiate it." This is exactly the behavior the anti-hallucination contract is meant to produce.
| Posture | n |
|---|---|
| unknown | 45 |
| closed | 36 |
| api-only | 1 |
| source-available | 1 |
| fully-open | 0 |
The "unknown" plurality is the main signal, and it's structural. The model has access only to the YC long_description; OSS posture is rarely stated there. B007 in the backlog (depth=1 website crawl) would shift these unknown rows to closed / api-only / weights-only based on actual evidence (license files, GitHub presence, pricing pages).
Until then, do not over-interpret the unknown count: it's a measurement gap, not a finding.
Dominated by unknown (52) and custom-model (13). Same structural reason — descriptions don't usually name the model provider. custom-model is signal-bearing: 13 companies advertise their own models / fine-tunes, which is a meaningful slice of W26.
Of all source URLs cited across the 83 high-confidence rows, 3 failed at publish time (two with 4xx responses, one at the TLS layer):
- https://www.arzule.com/ returned 429 (rate limit)
- https://maywoodai.com/ returned 404
- https://www.caretta.so/ failed the SSL handshake
Each is named in examples/output/BROKEN_LINKS-w26-2026-05-01.md with the company that cited it. Dashboard rendered with --allow-dead-links for this example, with a warning banner at the top. In production runs (no --allow-dead-links), the pipeline would have refused to write the dashboard and exited non-zero — that's the publish gate.
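The publish gate described above reduces to one decision. A minimal sketch, assuming the gate only needs the list of broken URLs and the state of the --allow-dead-links flag (the function name and return convention are illustrative):

```python
from typing import List


def publish_exit_code(broken_links: List[str], allow_dead_links: bool = False) -> int:
    """Return 0 to publish, 1 to refuse writing the dashboard."""
    if broken_links and not allow_dead_links:
        return 1  # production default: any dead link blocks the publish
    return 0      # either no dead links, or the override was given


broken = [
    "https://www.arzule.com/",
    "https://maywoodai.com/",
    "https://www.caretta.so/",
]
print(publish_exit_code(broken), publish_exit_code(broken, allow_dead_links=True))
```

With the override on, the caller is still expected to render the warning banner; the sketch covers only the exit code.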
- Schema-validation failure rate (23%) is too high for a v0.1 release. Tracked as B006. Most likely cause is the model emitting enum values outside our closed sets for ai_capability or tech_stack (we patched industry_secondary for this in PR #3 but the other two stayed strict). Fix in a follow-up PR.
- W26 is an agents batch. This is now defensible: 54 of 83 high-confidence rows, with row-level drill-down showing exactly which companies and what their YC descriptions said.
- The 67% high-confidence rate applied to the 63.3% coverage of YC official means the share of W26 we can actually analyze is ~42% (83/196). The headline metric on the dashboard now shows this honestly.
After PR #4 (resilience + parser tightening), full-batch enrichment metrics improved meaningfully on a fresh run:
| metric | PR #3 | PR #4 | change |
|---|---|---|---|
| Total analyzed | 124 | 124 | – |
| High confidence | 83 (67%) | 118 (95%) | +35 |
| Schema-validation failures | 29 (23%) | 0 (0%) | -29 |
| Genuinely-uncertain model lows | 12 | 6 | -6 |
| Hallucinated source URLs | 0 | 0 | – |
Root cause: The 23% schema-validation failure rate in PR #3 was caused by the model emitting rationale fields longer than our 400-char Field(max_length=400) constraint. The model was being thorough; our schema was being unnecessarily strict on a non-load-bearing field. PR #4 changed the parser to truncate over-long rationale and tagline_rewrite rather than reject the row. Strict enforcement remains for load-bearing fields (industry_primary, oss_posture, confidence, sources).
The lenient extension to ai_capability and tech_stack (filter unknown values, fall back to unclear if all dropped) contributed only modest improvement on its own (~3-4%). The rationale truncation was the bigger win.
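The two leniency rules described above (truncate over-long free text, filter unknown enum values with a fallback) can be sketched without the real schema. The 400-character cap and the unclear fallback come from the text; the function name, the dict-based row shape, and the allowed-value set (assembled from the capability tables in this report) are assumptions, and tech_stack is omitted for brevity.

```python
RATIONALE_MAX = 400  # cap quoted from Field(max_length=400)

# Illustrative closed set, collected from the capability tables in this report.
ALLOWED_CAPABILITIES = {
    "agents", "nlp-classic", "rag", "data-pipeline", "vision", "multimodal",
    "evals-observability", "inference-infra", "training-infra", "no-ai", "unclear",
}


def normalize_row(row: dict) -> dict:
    """Apply lenient rules to non-load-bearing fields instead of rejecting the row."""
    out = dict(row)
    # Truncate rather than fail validation on long free-text fields.
    for field in ("rationale", "tagline_rewrite"):
        out[field] = (row.get(field) or "")[:RATIONALE_MAX]
    # Drop unknown enum values; fall back to "unclear" if nothing survives.
    kept = [c for c in row.get("ai_capability", []) if c in ALLOWED_CAPABILITIES]
    out["ai_capability"] = kept or ["unclear"]
    return out


fixed = normalize_row({"rationale": "r" * 450, "ai_capability": ["agents", "quantum"]})
print(len(fixed["rationale"]), fixed["ai_capability"])
```

Load-bearing fields (industry_primary, oss_posture, confidence, sources) are deliberately left untouched here so that strict validation can still reject them.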
Capability distribution:
| Capability | n | share of n=118 |
|---|---|---|
| agents | 68 | 58% |
| nlp-classic | 38 | 32% |
| rag | 34 | 29% |
| data-pipeline | 33 | 28% |
| multimodal | 17 | 14% |
| vision | 17 | 14% |
| inference-infra | 11 | 9% |
| evals-observability | 11 | 9% |
| training-infra | 10 | 8% |
The "W26 is the agentic batch" finding holds on the larger high-confidence set, though the share is lower: 58% of high-confidence companies build agents (68 of 118), down from 65% (54 of 83). The absolute count rose from 54 to 68 and the cohort is broader.
Industry mix (top of n=118):
| Industry | n |
|---|---|
| B2B SaaS | 28 |
| AI Infrastructure | 14 |
| Developer Tools | 12 |
| Fintech | 12 |
| Healthcare | 6 |
| Robotics | 6 |
| Consumer | 6 |
| Legal | 5 |
OSS posture is meaningfully different now: closed (48), unknown (65), api-only (3), source-available (1), fully-open (1). The unknown plurality remains because the model still lacks website-level evidence — B007 (depth=1 crawl) is the next lever.
- Upstream: 132 of 196 (67.3%) — unchanged
- Tier A+B: 124 of 132 — unchanged
- Tier A+B with high-confidence LLM analysis: 118 of 196 (60.2%) — the most honest "what we actually know about W26" number.
The headline coverage % on the dashboard is unchanged at 63.3% (because that's the data-quality denominator). But for the deck/memo, the 60.2% number is what should be cited as "the share of W26 we can substantively classify."
The v0.1 limitation: the LLM only saw the YC long_description, so OSS posture and tech stack came back as unknown for most companies. PR #11 adds a polite, robots-aware depth=1 website crawl (max 5 pages per company, 30 KB per page, 4-second timeout, ranked by signal-path priority: /pricing, /security, /about, /docs, etc.). Each crawled page is HTML-stripped and PII-sanitized before it ever reaches the LLM.
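The page-selection step of such a crawl can be sketched with the standard library's robots.txt parser. The page cap and the path order come from the paragraph above (the "etc." is left unfilled); the function name, the user-agent string, and the idea of ranking a pre-discovered list of candidate paths are assumptions, and no network fetch is shown.

```python
from typing import List
from urllib.robotparser import RobotFileParser

MAX_PAGES = 5
SIGNAL_PATHS = ["/pricing", "/security", "/about", "/docs"]  # priority order from the text


def plan_crawl(base_url: str, candidate_paths: List[str], robots_txt: str,
               user_agent: str = "yc-ai-pulse") -> List[str]:
    """Pick up to MAX_PAGES robots-allowed paths, highest-signal first."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    def rank(path: str) -> int:
        for i, prefix in enumerate(SIGNAL_PATHS):
            if path.startswith(prefix):
                return i
        return len(SIGNAL_PATHS)  # everything else sorts last

    allowed = [p for p in candidate_paths
               if robots.can_fetch(user_agent, base_url + p)]
    return sorted(allowed, key=rank)[:MAX_PAGES]


robots_txt = "User-agent: *\nDisallow: /security\n"
paths = ["/blog", "/docs", "/security", "/pricing", "/about", "/careers", "/team"]
plan = plan_crawl("https://example.com", paths, robots_txt)
print(plan)
```

Because the sort is stable, paths outside the priority list keep their discovery order, and a disallowed high-signal path such as /security is skipped rather than fetched.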
The 124-company cohort carries over, and the high-confidence rate stays above 90% (113 high, or 91%, vs. 118, or 95%, in PR #9; both well above the v0.1 target). What changed is the model's ability to ground its answers in actual evidence.
| Posture | PR #9 (no crawl) | PR #11 (with crawl) | Δ |
|---|---|---|---|
| unknown | 65 (55%) | 24 (21%) | −41 |
| closed | 48 (41%) | 75 (66%) | +27 |
| api-only | 3 | 8 | +5 |
| source-available | 1 | 5 | +4 |
| fully-open | 1 | 1 | – |
OSS-posture unknown rate dropped 55% → 21% — a 62% relative reduction. PR target was <30%. Hit.
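For reference, the relative-reduction figure is computed from the rounded shares. A small sketch of the arithmetic, using only the unknown counts and cohort sizes reported above (118 high-confidence rows before the crawl, 113 after):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


from_rounded = relative_reduction(0.55, 0.21)           # shares as quoted in the table
from_counts = relative_reduction(65 / 118, 24 / 113)    # same shares, unrounded
print(round(from_rounded * 100), round(from_counts * 100))
```

The rounded shares give 62%; the unrounded counts give roughly 61%, so the two agree to within a point.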
The 41 companies that moved out of unknown distributed roughly: most went to closed (model now has evidence — pricing pages, "Request a demo" CTAs, no GitHub link in footer), some to api-only (model spotted "Get an API key" in /docs), and a handful to source-available (the model saw a GitHub footer link with explicit license language).
The headline number is harder to summarize because most YC startups still don't advertise the model provider on marketing pages. But the absolute count of identified tech-stack signals is what matters:
| Stack | PR #9 | PR #11 |
|---|---|---|
| custom-model | 13 | 24 |
| anthropic | 1 | 6 |
| openai | 0 | 3 |
| huggingface | 0 | 2 |
| pytorch | 0 | 2 |
| google-gemini | 0 | 2 |
| qwen | 0 | 1 |
| langchain | 0 | 1 |
| identified (non-unknown) | 14 | 41 |
Tech-stack unknown rate barely moved (64% → 57%) because the homepage of an "AI for legal teams" startup rarely mentions which model it uses. Pushing this further would require a depth=2 crawl into docs/security pages, which we deferred for politeness reasons.
| Capability | PR #9 | PR #11 |
|---|---|---|
| agents | 68 | 69 |
| nlp-classic | 38 | 38 |
| rag | 34 | 32 |
| data-pipeline | 33 | 37 |
| vision | 17 | 26 |
| multimodal | 17 | 22 |
Vision and multimodal both saw real lifts — those are the capabilities the model can spot from product pages with screenshots, GIFs, and demo videos. Marketing surfaces help here.
Slightly higher prompt size from the crawled context pushed exactly 1 row's rationale over the cap (synthetic-sciences). Captured in raw_failures.jsonl. Not worth tightening.
- Coverage of YC official: 63.3% — unchanged (data-quality denominator)
- High-confidence enrichment: 113 / 124 (91%) — was 118/124 (95%) without crawl, slightly down because longer prompts are slightly harder to keep within the rationale cap
- Substantively-classified share of YC W26: 113 / 196 = 57.7% — was 60.2%
Slightly fewer companies make it into the headline cohort, but each of those cohort entries now carries materially more signal — oss_posture and (to a lesser extent) tech_stack are now real values for the majority of rows, instead of unknown masquerading as data.