feat(phase-1): resilience + parser tightening (95% high-confidence) by RyanAlberts · Pull Request #9 · RyanAlberts/yc-ai-pulse

RyanAlberts · 2026-05-01T21:11:52Z

What

Closes #4. Resolves B008.

PR #4 lifts the high-confidence enrichment rate from 67% to 95% on the same W26 batch. Adds ycai resume (recover from interrupted runs) and ycai dashboard (re-render from existing artifacts at zero LLM cost). Captures raw model failures for audit so future B008-shaped problems surface fast.

Why

PR #3's full-batch run had a 23% schema-validation failure rate. That's too high to ship. The ycai resume and standalone ycai dashboard commands also unblock the v0.1 release plan — without resume, a quota wall mid-run would mean burning the partial work.

Root cause of the 23% (now 0%)

The raw failure logger surfaced it immediately: every "schema-validation-failure" was the model's rationale field exceeding our 400-char Field(max_length=400) cap. The model was being thorough; the schema was being unnecessarily strict on a non-load-bearing field. PR #4 truncates rather than rejects.

The lenient extension to ai_capability and tech_stack (drop unknown values, fall back to unclear if emptied) contributes a smaller win on top.

Strict enforcement preserved for the load-bearing fields: industry_primary, oss_posture, confidence, sources. The contract that numbers come from validated data hasn't changed.

Before/after on real W26 data

metric	PR #3	PR #4	delta
Total analyzed	124	124	–
High confidence	83 (67%)	118 (95%)	+35
Schema failures	29 (23%)	0 (0%)	-29
Genuine model lows	12	6	-6
Hallucinated source URLs	0	0	–

Top finding strengthens: 68 of 118 high-confidence W26 companies build agents (58%). "W26 is the agentic batch" is now defensible on a larger cohort with zero schema noise.

New CLI surfaces

ycai resume <run-dir> — reads analyses.jsonl, identifies missing slugs from coverage.json, runs enrichment only on what's left. Crash-resilient because each completed analysis writes to JSONL immediately.
ycai dashboard <run-dir> — re-renders the HTML from existing artifacts. Re-runs the cited-URL publish gate. Zero LLM cost. Useful for iterating on dashboard layout, or generating partial reports after a quota wall.
Live progress during enrichment shows running high / medium / low counts so you can spot a regression mid-run.
raw_failures.jsonl captures every failed model response (schema/cross-check/hallucinated-URL) for offline audit. Capped at 4000 chars per record.

Test plan

103 tests passing (12 new). Lenient parsing for both new fields, rationale/tagline truncation, raw-failure capture across multiple calls, no-op when no log path provided.
Mypy --strict clean.
make publish-check clean.
W26 full-batch re-run captured under examples/output/dashboard-w26-pr4-2026-05-01.html and analyses-w26-pr4-2026-05-01.json.
ycai resume smoke-tested on the W26 run dir (correctly detected all 124 slugs already enriched).
ycai dashboard smoke-tested on the same run dir (re-rendered without LLM call).

Backlog status

B008 resolved by this PR (rate dropped from 23% to 0%).
B007 (depth=1 website crawl for tech_stack/oss_posture) remains open. The PR PR #4 — Typer CLI, Rich progress, hot-restart resume #4 numbers confirm it's still the next-biggest signal lever — unknown stack/OSS still dominates.

Acceptance

make validate-p0 green
make publish-check green
LLM-path invariants preserved (load-bearing fields stay strict)
Live smoke captured on real W26

🤖 Generated with Claude Code

Closes #4. Resolves B008. What ships - src/ycai/researcher.py: lenient parser extended. * ai_capability + tech_stack now drop unknown enum values rather than fail the row. ai_capability falls back to ['unclear'] if all values dropped. * rationale + tagline_rewrite truncate at the schema cap rather than fail the row. This was the actual cause of the 23% schema- validation failure rate in PR #3 — the model was being thorough, our 400-char cap was too strict on a non-load-bearing field. * Strict enforcement preserved on industry_primary, oss_posture, confidence, sources. - src/ycai/researcher.py: raw_failure_log keyword arg captures raw model responses for any failure (schema, hallucinated URL, cross-check failure) to a JSONL file. Truncates at 4000 chars per record to keep the file small. - src/ycai/cli.py: enrichment writes each completed analysis to analyses.jsonl immediately so a crash or quota wall doesn't lose progress. Live progress shows running high/medium/low counts. - src/ycai/cli.py: new 'ycai resume <run-dir>' command. Reads analyses.jsonl, identifies missing slugs vs. coverage.json, runs enrichment only on what's left. - src/ycai/cli.py: new 'ycai dashboard <run-dir>' command. Re-renders the dashboard from existing artifacts at zero LLM cost. Re-runs the cited-URL publish gate by default. - tests/test_researcher.py: 12 new tests covering lenient parsing for ai_capability/tech_stack, rationale/tagline truncation, raw failure capture, and silent no-op when no log path provided. 103 tests total passing. W26 full re-run results - Total: 124 companies (Tier A+B from PR #1) - High confidence: 118 (95%) — up from 83 (67%) in PR #3 - Schema failures: 0 — down from 29 (23%) - Genuine model lows: 6 — down from 12 - Hallucinated source URLs: 0 (unchanged; guard works) Top finding strengthens: 58% of n=118 high-confidence W26 companies build agents (68 of 118). 'W26 = the agentic batch' is now defensible on a meaningfully larger cohort with zero schema-failure noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request enhances the resilience and accuracy of the enrichment pipeline by implementing lenient parsing, incremental writes, and new CLI commands for resuming runs and re-rendering dashboards. Key improvements include filtering unknown enum values and truncating long text fields to prevent schema validation failures. Feedback suggests addressing a potential file handle leak in the CLI and adding error handling to the raw failure logging to prevent non-critical I/O issues from crashing the process.

gemini-code-assist · 2026-05-01T21:15:00Z

        analyses_path.write_text(json.dumps([a.model_dump(mode="json") for a in analyses], indent=2))
        console.print(f"[green]✓[/green] wrote analyses.json → {analyses_path}")
+        if raw_failure_path.exists():
+            failure_count = sum(1 for _ in raw_failure_path.open())


The current implementation leaves a file handle open until garbage collection because raw_failure_path.open() is called without a context manager. For consistency with other parts of this file (e.g., lines 84 and 144), consider using read_text().splitlines() which handles closing the file automatically.

Suggested change

failure_count = sum(1 for _ in raw_failure_path.open())

failure_count = len(raw_failure_path.read_text().splitlines())

gemini-code-assist · 2026-05-01T21:15:00Z

+    path.parent.mkdir(parents=True, exist_ok=True)
+    with path.open("a") as f:
+        f.write(json.dumps(record) + "\n")


To improve the resilience of the enrichment process, consider wrapping the file I/O in a try-except block. A failure in writing to the audit log (e.g., due to disk space or permission issues) should ideally not crash the entire analysis run, especially since this is a non-critical logging operation.

Suggested change

path.parent.mkdir(parents=True, exist_ok=True)

with path.open("a") as f:

f.write(json.dumps(record) + "\n")

try:

path.parent.mkdir(parents=True, exist_ok=True)

with path.open("a") as f:

f.write(json.dumps(record) + "\n")

except Exception as exc:

log.warning("Failed to write raw failure log for %s: %s", slug, exc)

First publishable release. What ships in v0.1.0 - Phase 0 + Phase 1 of the project plan: scraper, sanitizer, link verifier, coverage probe, LLM enrichment with anti-hallucination Layer 1, enriched dashboard, cited-URL publish gate, resume + re- render commands, raw failure capture. - 103 tests passing, mypy --strict clean, secret-scan clean. - Real W26 results checked in: 63.3% coverage, 95% high-confidence on the LLM enrichment, 0 schema failures, 0 hallucinated source URLs, top finding 'W26 = the agentic batch' on n=118. Mechanics - pyproject.toml: 0.0.1 -> 0.1.0, classifier bumped pre-alpha -> alpha. - src/ycai/__init__.py: __version__ matches. - tests/test_smoke.py: version assertion bumped. - CHANGELOG.md: 0.1.0 release notes synthesizing PR #6-#9. - README.md: status table updated, quickstart documents the actual v0.1 commands (run-coverage / resume / dashboard). - .github/workflows/release.yml: build wheel+sdist on tag push, publish to PyPI via Trusted Publishing (id-token), attach artifacts to GitHub release. Local smoke - python -m build produces yc_ai_pulse-0.1.0-py3-none-any.whl (38KB) and yc_ai_pulse-0.1.0.tar.gz (134KB). - pipx install --force <wheel> succeeds; ycai version returns 0.1.0; ycai run-coverage --batch winter-2026 succeeds end-to-end from a clean /tmp directory. PyPI Trusted Publishing setup (one-time, on PyPI side) - https://pypi.org/manage/project/yc-ai-pulse/settings/publishing/ - Repo: RyanAlberts/yc-ai-pulse - Workflow: release.yml - Environment: pypi - Until configured, the publish job will fail; the GitHub release job still attaches built wheels. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ack (#12) Closes B007. What ships - src/ycai/crawler.py: polite, robots-aware, depth=1 async crawler. Max 5 pages per company, 30 KB per page, 4-second timeout per fetch. Pages ranked by signal-path priority (/pricing, /security, /about, /docs, /open-source, ...). HTML stripped and PII-sanitized before any LLM call. Same-host only — never wanders off-site. - src/ycai/researcher.py: prompt now includes a 'crawled context' section (capped at 6000 chars total) when pages were fetched, and source-URL guard accepts any URL from the crawled set. - src/ycai/cli.py: crawl runs before enrichment by default. --no-crawl opts out. crawl_results.jsonl written to the run directory for audit. - tests/test_crawler.py: 13 new tests (116 total). Robots-disallow enforcement (path-level), content-type filtering (PDF/JSON skipped), max-pages cap, dedupe + fragment stripping, host-restriction (off-site links never fetched), PII redaction round-trip, max-bytes truncation contract. Real W26 lift (same 124-company cohort) - OSS posture 'unknown' rate: 55% -> 21% (target was <30%, hit it) - Tech-stack identified mentions: 14 -> 41 - Vision capability: 17 -> 26 (the model can now spot product GIFs) - Multimodal: 17 -> 22 - Confidence rate: 95% -> 91% (slight drop because longer prompts push slightly more rationales over the cap; absolute high-confidence count went from 118 to 113, still well above target) - Schema failures: 0 -> 1 (synthetic-sciences; captured for audit) Politeness contract (validated by tests, not just intent) - robots.txt fetched per host before any other request - Disallow rules enforced for both root and per-path - Per-host concurrency capped at 2 simultaneous fetches - User-Agent identifies us with a project URL Honest framing in QUALITY_REPORT_W26.md: 'substantively classified share of W26' moves from 60.2% (PR #9) to 57.7% (PR #11), but each of those 113 rows now carries materially more signal -- oss_posture is a real value for 79% of the cohort instead of 45%. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RyanAlberts merged commit bda9a23 into main May 1, 2026
3 checks passed

RyanAlberts deleted the phase-1-pr4-resilience branch May 1, 2026 21:12

gemini-code-assist Bot reviewed May 1, 2026

View reviewed changes

RyanAlberts mentioned this pull request May 1, 2026

release: v0.1.0 #10

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(phase-1): resilience + parser tightening (95% high-confidence)#9

feat(phase-1): resilience + parser tightening (95% high-confidence)#9
RyanAlberts merged 1 commit into
mainfrom
phase-1-pr4-resilience

RyanAlberts commented May 1, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

gemini-code-assist Bot May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	failure_count = sum(1 for _ in raw_failure_path.open())
	failure_count = len(raw_failure_path.read_text().splitlines())

Conversation

RyanAlberts commented May 1, 2026

What

Why

Root cause of the 23% (now 0%)

Before/after on real W26 data

New CLI surfaces

Test plan

Backlog status

Acceptance

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant