feat(phase-1): resilience + parser tightening (95% high-confidence)#9
Conversation
Closes #4. Resolves B008. What ships - src/ycai/researcher.py: lenient parser extended. * ai_capability + tech_stack now drop unknown enum values rather than fail the row. ai_capability falls back to ['unclear'] if all values dropped. * rationale + tagline_rewrite truncate at the schema cap rather than fail the row. This was the actual cause of the 23% schema- validation failure rate in PR #3 — the model was being thorough, our 400-char cap was too strict on a non-load-bearing field. * Strict enforcement preserved on industry_primary, oss_posture, confidence, sources. - src/ycai/researcher.py: raw_failure_log keyword arg captures raw model responses for any failure (schema, hallucinated URL, cross-check failure) to a JSONL file. Truncates at 4000 chars per record to keep the file small. - src/ycai/cli.py: enrichment writes each completed analysis to analyses.jsonl immediately so a crash or quota wall doesn't lose progress. Live progress shows running high/medium/low counts. - src/ycai/cli.py: new 'ycai resume <run-dir>' command. Reads analyses.jsonl, identifies missing slugs vs. coverage.json, runs enrichment only on what's left. - src/ycai/cli.py: new 'ycai dashboard <run-dir>' command. Re-renders the dashboard from existing artifacts at zero LLM cost. Re-runs the cited-URL publish gate by default. - tests/test_researcher.py: 12 new tests covering lenient parsing for ai_capability/tech_stack, rationale/tagline truncation, raw failure capture, and silent no-op when no log path provided. 103 tests total passing. W26 full re-run results - Total: 124 companies (Tier A+B from PR #1) - High confidence: 118 (95%) — up from 83 (67%) in PR #3 - Schema failures: 0 — down from 29 (23%) - Genuine model lows: 6 — down from 12 - Hallucinated source URLs: 0 (unchanged; guard works) Top finding strengthens: 58% of n=118 high-confidence W26 companies build agents (68 of 118). 'W26 = the agentic batch' is now defensible on a meaningfully larger cohort with zero schema-failure noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request enhances the resilience and accuracy of the enrichment pipeline by implementing lenient parsing, incremental writes, and new CLI commands for resuming runs and re-rendering dashboards. Key improvements include filtering unknown enum values and truncating long text fields to prevent schema validation failures. Feedback suggests addressing a potential file handle leak in the CLI and adding error handling to the raw failure logging to prevent non-critical I/O issues from crashing the process.
| analyses_path.write_text(json.dumps([a.model_dump(mode="json") for a in analyses], indent=2)) | ||
| console.print(f"[green]✓[/green] wrote analyses.json → {analyses_path}") | ||
| if raw_failure_path.exists(): | ||
| failure_count = sum(1 for _ in raw_failure_path.open()) |
There was a problem hiding this comment.
The current implementation leaves a file handle open until garbage collection because raw_failure_path.open() is called without a context manager. For consistency with other parts of this file (e.g., lines 84 and 144), consider using read_text().splitlines() which handles closing the file automatically.
| failure_count = sum(1 for _ in raw_failure_path.open()) | |
| failure_count = len(raw_failure_path.read_text().splitlines()) |
| path.parent.mkdir(parents=True, exist_ok=True) | ||
| with path.open("a") as f: | ||
| f.write(json.dumps(record) + "\n") |
There was a problem hiding this comment.
To improve the resilience of the enrichment process, consider wrapping the file I/O in a try-except block. A failure in writing to the audit log (e.g., due to disk space or permission issues) should ideally not crash the entire analysis run, especially since this is a non-critical logging operation.
| path.parent.mkdir(parents=True, exist_ok=True) | |
| with path.open("a") as f: | |
| f.write(json.dumps(record) + "\n") | |
| try: | |
| path.parent.mkdir(parents=True, exist_ok=True) | |
| with path.open("a") as f: | |
| f.write(json.dumps(record) + "\n") | |
| except Exception as exc: | |
| log.warning("Failed to write raw failure log for %s: %s", slug, exc) |
First publishable release. What ships in v0.1.0 - Phase 0 + Phase 1 of the project plan: scraper, sanitizer, link verifier, coverage probe, LLM enrichment with anti-hallucination Layer 1, enriched dashboard, cited-URL publish gate, resume + re- render commands, raw failure capture. - 103 tests passing, mypy --strict clean, secret-scan clean. - Real W26 results checked in: 63.3% coverage, 95% high-confidence on the LLM enrichment, 0 schema failures, 0 hallucinated source URLs, top finding 'W26 = the agentic batch' on n=118. Mechanics - pyproject.toml: 0.0.1 -> 0.1.0, classifier bumped pre-alpha -> alpha. - src/ycai/__init__.py: __version__ matches. - tests/test_smoke.py: version assertion bumped. - CHANGELOG.md: 0.1.0 release notes synthesizing PR #6-#9. - README.md: status table updated, quickstart documents the actual v0.1 commands (run-coverage / resume / dashboard). - .github/workflows/release.yml: build wheel+sdist on tag push, publish to PyPI via Trusted Publishing (id-token), attach artifacts to GitHub release. Local smoke - python -m build produces yc_ai_pulse-0.1.0-py3-none-any.whl (38KB) and yc_ai_pulse-0.1.0.tar.gz (134KB). - pipx install --force <wheel> succeeds; ycai version returns 0.1.0; ycai run-coverage --batch winter-2026 succeeds end-to-end from a clean /tmp directory. PyPI Trusted Publishing setup (one-time, on PyPI side) - https://pypi.org/manage/project/yc-ai-pulse/settings/publishing/ - Repo: RyanAlberts/yc-ai-pulse - Workflow: release.yml - Environment: pypi - Until configured, the publish job will fail; the GitHub release job still attaches built wheels. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ack (#12) Closes B007. What ships - src/ycai/crawler.py: polite, robots-aware, depth=1 async crawler. Max 5 pages per company, 30 KB per page, 4-second timeout per fetch. Pages ranked by signal-path priority (/pricing, /security, /about, /docs, /open-source, ...). HTML stripped and PII-sanitized before any LLM call. Same-host only — never wanders off-site. - src/ycai/researcher.py: prompt now includes a 'crawled context' section (capped at 6000 chars total) when pages were fetched, and source-URL guard accepts any URL from the crawled set. - src/ycai/cli.py: crawl runs before enrichment by default. --no-crawl opts out. crawl_results.jsonl written to the run directory for audit. - tests/test_crawler.py: 13 new tests (116 total). Robots-disallow enforcement (path-level), content-type filtering (PDF/JSON skipped), max-pages cap, dedupe + fragment stripping, host-restriction (off-site links never fetched), PII redaction round-trip, max-bytes truncation contract. Real W26 lift (same 124-company cohort) - OSS posture 'unknown' rate: 55% -> 21% (target was <30%, hit it) - Tech-stack identified mentions: 14 -> 41 - Vision capability: 17 -> 26 (the model can now spot product GIFs) - Multimodal: 17 -> 22 - Confidence rate: 95% -> 91% (slight drop because longer prompts push slightly more rationales over the cap; absolute high-confidence count went from 118 to 113, still well above target) - Schema failures: 0 -> 1 (synthetic-sciences; captured for audit) Politeness contract (validated by tests, not just intent) - robots.txt fetched per host before any other request - Disallow rules enforced for both root and per-path - Per-host concurrency capped at 2 simultaneous fetches - User-Agent identifies us with a project URL Honest framing in QUALITY_REPORT_W26.md: 'substantively classified share of W26' moves from 60.2% (PR #9) to 57.7% (PR #11), but each of those 113 rows now carries materially more signal -- oss_posture is a real value for 79% of the cohort instead of 45%. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What
Closes #4. Resolves B008.
PR #4 lifts the high-confidence enrichment rate from 67% to 95% on the same W26 batch. Adds
ycai resume(recover from interrupted runs) andycai dashboard(re-render from existing artifacts at zero LLM cost). Captures raw model failures for audit so future B008-shaped problems surface fast.Why
PR #3's full-batch run had a 23% schema-validation failure rate. That's too high to ship. The
ycai resumeand standaloneycai dashboardcommands also unblock the v0.1 release plan — without resume, a quota wall mid-run would mean burning the partial work.Root cause of the 23% (now 0%)
The raw failure logger surfaced it immediately: every "schema-validation-failure" was the model's
rationalefield exceeding our 400-charField(max_length=400)cap. The model was being thorough; the schema was being unnecessarily strict on a non-load-bearing field. PR #4 truncates rather than rejects.The lenient extension to
ai_capabilityandtech_stack(drop unknown values, fall back tounclearif emptied) contributes a smaller win on top.Strict enforcement preserved for the load-bearing fields:
industry_primary,oss_posture,confidence,sources. The contract that numbers come from validated data hasn't changed.Before/after on real W26 data
Top finding strengthens: 68 of 118 high-confidence W26 companies build agents (58%). "W26 is the agentic batch" is now defensible on a larger cohort with zero schema noise.
New CLI surfaces
ycai resume <run-dir>— readsanalyses.jsonl, identifies missing slugs fromcoverage.json, runs enrichment only on what's left. Crash-resilient because each completed analysis writes to JSONL immediately.ycai dashboard <run-dir>— re-renders the HTML from existing artifacts. Re-runs the cited-URL publish gate. Zero LLM cost. Useful for iterating on dashboard layout, or generating partial reports after a quota wall.high / medium / lowcounts so you can spot a regression mid-run.raw_failures.jsonlcaptures every failed model response (schema/cross-check/hallucinated-URL) for offline audit. Capped at 4000 chars per record.Test plan
--strictclean.make publish-checkclean.examples/output/dashboard-w26-pr4-2026-05-01.htmlandanalyses-w26-pr4-2026-05-01.json.ycai resumesmoke-tested on the W26 run dir (correctly detected all 124 slugs already enriched).ycai dashboardsmoke-tested on the same run dir (re-rendered without LLM call).Backlog status
unknownstack/OSS still dominates.Acceptance
make validate-p0greenmake publish-checkgreen🤖 Generated with Claude Code