You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+40-12Lines changed: 40 additions & 12 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,16 +7,44 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
8
8
## [Unreleased]
9
9
10
+
_(no changes since 0.1.0)_
11
+
12
+
## [0.1.0] — 2026-05-01
13
+
14
+
First publishable release. End-to-end pipeline that pulls the latest YC batch, classifies it with a Sonnet-class model under strict anti-hallucination guards, and renders a single-file HTML dashboard with row-level drill-downs.
15
+
10
16
### Added
11
-
- Phase 0 bootstrap: MIT license, repo scaffolding, pre-commit + secret-scan, CI workflow, BACKLOG discipline, first ADR.
- Coverage metric is the dashboard headline. The dropped register acknowledges every excluded company and the specific reason — no quiet drops.
14
-
- First end-to-end probe on YC W26: 63.3% coverage of the official 196-company batch. Findings in `docs/QUALITY_REPORT_W26.md`.
15
-
- Phase 1 PR #2: LLM-based enrichment with anti-hallucination Layer 1 — pydantic-enforced output schema, source-URL guard against fabricated citations, two-pass cross-check on uncertain rows, sentinel low-confidence row on any failure. Three backends: `AgentSDKBackend` (subscription-default), `AnthropicAPIBackend` (`--api-key`), `MockBackend` (tests). 10 hallucination-trap fixtures locked in as regression tests.
16
-
- W26 enrichment smoke run (5 companies via subscription, 39s, ~free): 4 high / 1 low confidence. Identified `gru.space` as `no-ai` correctly. Schema-validation failure on `velum-labs` correctly fell through to the sentinel — no fabricated analysis served.
17
-
- Phase 1 PR #3: enriched dashboard. AI capability x industry heatmap, tech-stack distribution, OSS-posture breakdown, and confidence breakdown — all with row-level drill-downs. Cited-URL link-verify hard gate before any artifact ships (override via `--allow-dead-links` writes a `BROKEN_LINKS.md` sidecar and shows a warning banner). Lenient parsing for `industry_secondary` so the model can emit reasonable categories without tanking the row.
18
-
- W26 full-batch enrichment via subscription (124 companies, ~6 min, ~free): 83 high / 41 low confidence. Top finding: **65% of high-confidence W26 companies (54 of 83) build agents**. 8 companies correctly classified as `no-ai` (the trust signal). 3 cited URLs caught dead at publish time and surfaced via the publish gate.
19
-
- Phase 1 PR #4: resilience + parser tightening. Lenient parsing extended to `ai_capability` (drop unknowns, fall back to `unclear`) and `tech_stack` (drop unknowns). `rationale` and `tagline_rewrite` truncate at the schema cap rather than fail the whole row. Raw failure capture (`raw_failures.jsonl`) for audit. Incremental writes to `analyses.jsonl` so partial state survives a crash. New `ycai resume <run-dir>` command resumes interrupted enrichment. New `ycai dashboard <run-dir>` re-renders the dashboard from existing artifacts at zero LLM cost. Live progress shows high/medium/low counts during enrichment.
20
-
- W26 full-batch re-run after PR #4: schema-validation failure rate dropped from 23% to **0%**. High-confidence rate went from 67% to **95%** (118 of 124). The "W26 is the agentic batch" finding strengthened to 58% of n=118. Quality writeup updated at `docs/QUALITY_REPORT_W26.md`.
- MIT license, repo scaffolding, pre-commit + secret-scan + gitleaks + custom Anthropic-key regex, CI workflow, BACKLOG discipline, first two ADRs (yc-oss/api as the only sanctioned source; localhost FastAPI deferred to Phase 3).
20
+
21
+
**Phase 1 — analysis pipeline**
22
+
-**PR #6 — coverage probe**: yc-oss/api scraper with hard-fail when upstream is unreachable (no scraping `ycombinator.com/companies?...` per [robots.txt](docs/decisions/0001-yc-data-source.md)). PII sanitizer (idempotent strip before disk and before any LLM call). Async link verifier. Coverage probe with three tiers (A: full / B: website unreachable / C: missing required field) and a dropped register that names every excluded company. Coverage % is the dashboard headline.
23
+
-**PR #7 — LLM enrichment with anti-hallucination Layer 1**: pydantic-enforced classification schema, three backends (AgentSDK / Anthropic API / Mock), source-URL grounding (the cited URL must come from the company's website or YC profile), two-pass cross-check on medium-confidence rows, sentinel low-confidence row on any failure. 10 hallucination-trap fixtures as regression tests.
24
+
-**PR #8 — enriched dashboard + cited-URL publish gate**: capability×industry heatmap, tech-stack distribution, OSS-posture breakdown, confidence breakdown. Each chart drills down to source rows. Cited URLs are HEAD/GET-verified before publish; `--allow-dead-links` writes a sidecar `BROKEN_LINKS.md` and surfaces a banner.
25
+
-**PR #9 — resilience + parser tightening**: schema-failure rate dropped 23% → 0%. Truncate-not-reject for verbose free-text fields (`rationale`, `tagline_rewrite`). Lenient parsing for `ai_capability` and `tech_stack`. Raw failure capture (`raw_failures.jsonl`). Incremental writes to `analyses.jsonl`. `ycai resume` recovers from interrupted runs. `ycai dashboard` re-renders from existing artifacts at zero LLM cost.
26
+
27
+
**Real W26 results captured under `examples/output/`:**
28
+
- 63.3% coverage of the 196-company batch (132 in upstream, 124 Tier A+B, 8 named drops, 4 Tier B with dead websites)
29
+
- 118 of 124 high-confidence (95%) on the LLM enrichment, 0 schema failures, 0 hallucinated source URLs
30
+
-**Top finding: 58% of high-confidence W26 companies build agents.** "W26 is the agentic batch" is now defensible with row-level evidence.
31
+
32
+
### Backlog status at release
33
+
34
+
| ID | Status | Note |
35
+
|---|---|---|
36
+
| B001 | resolved | yc-oss/api is sole source; ADR 0001 amended in PR #6|
37
+
| B002 | open | Cloudflare cache-headroom check on `yc-oss.github.io/api/*`|
38
+
| B003 | open | Node 20 actions deprecated by 2026-06-02 — bump CI before then |
39
+
| B004 | open | Calibrate `MIN_DESCRIPTION_CHARS` against borderline rows |
40
+
| B005 | open | Name the missing-from-upstream W26 companies, not just count |
See [BACKLOG.md](BACKLOG.md) for the working backlog and `docs/decisions/` for architecture decisions.
37
+
See [CHANGELOG.md](CHANGELOG.md) for what 0.1.0 includes, [BACKLOG.md](BACKLOG.md) for the working backlog, and `docs/decisions/` for architecture decisions.
By default `yc-ai-pulse` uses the [Claude Agent SDK](https://github.com/anthropics/anthropic-sdk-python) against your Claude Max subscription. To pay-per-run instead:
44
+
# Coverage probe only (no LLM cost) — fetches the latest batch and shows
45
+
# what's analyzable. Headline: % of YC batch covered, with the dropped
# Resume an interrupted run (quota wall, crash, network blip):
60
+
ycai resume runs/2026-05-01-XXXXXX
61
+
62
+
# Re-render the dashboard from existing artifacts at zero LLM cost
63
+
# (useful when the dashboard layout changes):
64
+
ycai dashboard runs/2026-05-01-XXXXXX
52
65
```
53
66
54
-
See [`docs/`](docs/) for the full guide.
67
+
A real run on YC W26 is checked in as a working example: see [`examples/output/dashboard-w26-pr4-2026-05-01.html`](examples/output/dashboard-w26-pr4-2026-05-01.html). The full quality writeup is in [`docs/QUALITY_REPORT_W26.md`](docs/QUALITY_REPORT_W26.md).
0 commit comments