Skip to content

feat(phase-1): resilience + parser tightening (95% high-confidence)#9

Merged
RyanAlberts merged 1 commit into
mainfrom
phase-1-pr4-resilience
May 1, 2026
Merged

feat(phase-1): resilience + parser tightening (95% high-confidence)#9
RyanAlberts merged 1 commit into
mainfrom
phase-1-pr4-resilience

Conversation

@RyanAlberts
Copy link
Copy Markdown
Owner

What

Closes #4. Resolves B008.

PR #4 lifts the high-confidence enrichment rate from 67% to 95% on the same W26 batch. Adds ycai resume (recover from interrupted runs) and ycai dashboard (re-render from existing artifacts at zero LLM cost). Captures raw model failures for audit so future B008-shaped problems surface fast.

Why

PR #3's full-batch run had a 23% schema-validation failure rate. That's too high to ship. The ycai resume and standalone ycai dashboard commands also unblock the v0.1 release plan — without resume, a quota wall mid-run would mean burning the partial work.

Root cause of the 23% (now 0%)

The raw failure logger surfaced it immediately: every "schema-validation-failure" was the model's rationale field exceeding our 400-char Field(max_length=400) cap. The model was being thorough; the schema was being unnecessarily strict on a non-load-bearing field. PR #4 truncates rather than rejects.

The lenient extension to ai_capability and tech_stack (drop unknown values, fall back to unclear if emptied) contributes a smaller win on top.

Strict enforcement preserved for the load-bearing fields: industry_primary, oss_posture, confidence, sources. The contract that numbers come from validated data hasn't changed.

Before/after on real W26 data

metric PR #3 PR #4 delta
Total analyzed 124 124
High confidence 83 (67%) 118 (95%) +35
Schema failures 29 (23%) 0 (0%) -29
Genuine model lows 12 6 -6
Hallucinated source URLs 0 0

Top finding strengthens: 68 of 118 high-confidence W26 companies build agents (58%). "W26 is the agentic batch" is now defensible on a larger cohort with zero schema noise.

New CLI surfaces

  • ycai resume <run-dir> — reads analyses.jsonl, identifies missing slugs from coverage.json, runs enrichment only on what's left. Crash-resilient because each completed analysis writes to JSONL immediately.
  • ycai dashboard <run-dir> — re-renders the HTML from existing artifacts. Re-runs the cited-URL publish gate. Zero LLM cost. Useful for iterating on dashboard layout, or generating partial reports after a quota wall.
  • Live progress during enrichment shows running high / medium / low counts so you can spot a regression mid-run.
  • raw_failures.jsonl captures every failed model response (schema/cross-check/hallucinated-URL) for offline audit. Capped at 4000 chars per record.

Test plan

  • 103 tests passing (12 new). Lenient parsing for both new fields, rationale/tagline truncation, raw-failure capture across multiple calls, no-op when no log path provided.
  • Mypy --strict clean.
  • make publish-check clean.
  • W26 full-batch re-run captured under examples/output/dashboard-w26-pr4-2026-05-01.html and analyses-w26-pr4-2026-05-01.json.
  • ycai resume smoke-tested on the W26 run dir (correctly detected all 124 slugs already enriched).
  • ycai dashboard smoke-tested on the same run dir (re-rendered without LLM call).

Backlog status

Acceptance

  • make validate-p0 green
  • make publish-check green
  • LLM-path invariants preserved (load-bearing fields stay strict)
  • Live smoke captured on real W26

🤖 Generated with Claude Code

Closes #4. Resolves B008.

What ships
- src/ycai/researcher.py: lenient parser extended.
    * ai_capability + tech_stack now drop unknown enum values rather
      than fail the row. ai_capability falls back to ['unclear'] if all
      values dropped.
    * rationale + tagline_rewrite truncate at the schema cap rather
      than fail the row. This was the actual cause of the 23% schema-
      validation failure rate in PR #3 — the model was being thorough,
      our 400-char cap was too strict on a non-load-bearing field.
    * Strict enforcement preserved on industry_primary, oss_posture,
      confidence, sources.
- src/ycai/researcher.py: raw_failure_log keyword arg captures raw
  model responses for any failure (schema, hallucinated URL,
  cross-check failure) to a JSONL file. Truncates at 4000 chars per
  record to keep the file small.
- src/ycai/cli.py: enrichment writes each completed analysis to
  analyses.jsonl immediately so a crash or quota wall doesn't lose
  progress. Live progress shows running high/medium/low counts.
- src/ycai/cli.py: new 'ycai resume <run-dir>' command. Reads
  analyses.jsonl, identifies missing slugs vs. coverage.json, runs
  enrichment only on what's left.
- src/ycai/cli.py: new 'ycai dashboard <run-dir>' command. Re-renders
  the dashboard from existing artifacts at zero LLM cost. Re-runs the
  cited-URL publish gate by default.
- tests/test_researcher.py: 12 new tests covering lenient parsing for
  ai_capability/tech_stack, rationale/tagline truncation, raw failure
  capture, and silent no-op when no log path provided. 103 tests
  total passing.

W26 full re-run results
- Total: 124 companies (Tier A+B from PR #1)
- High confidence: 118 (95%) — up from 83 (67%) in PR #3
- Schema failures: 0 — down from 29 (23%)
- Genuine model lows: 6 — down from 12
- Hallucinated source URLs: 0 (unchanged; guard works)

Top finding strengthens: 58% of n=118 high-confidence W26 companies
build agents (68 of 118). 'W26 = the agentic batch' is now defensible
on a meaningfully larger cohort with zero schema-failure noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@RyanAlberts RyanAlberts merged commit bda9a23 into main May 1, 2026
3 checks passed
@RyanAlberts RyanAlberts deleted the phase-1-pr4-resilience branch May 1, 2026 21:12
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the resilience and accuracy of the enrichment pipeline by implementing lenient parsing, incremental writes, and new CLI commands for resuming runs and re-rendering dashboards. Key improvements include filtering unknown enum values and truncating long text fields to prevent schema validation failures. Feedback suggests addressing a potential file handle leak in the CLI and adding error handling to the raw failure logging to prevent non-critical I/O issues from crashing the process.

Comment thread src/ycai/cli.py
analyses_path.write_text(json.dumps([a.model_dump(mode="json") for a in analyses], indent=2))
console.print(f"[green]✓[/green] wrote analyses.json → {analyses_path}")
if raw_failure_path.exists():
failure_count = sum(1 for _ in raw_failure_path.open())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation leaves a file handle open until garbage collection because raw_failure_path.open() is called without a context manager. For consistency with other parts of this file (e.g., lines 84 and 144), consider using read_text().splitlines() which handles closing the file automatically.

Suggested change
failure_count = sum(1 for _ in raw_failure_path.open())
failure_count = len(raw_failure_path.read_text().splitlines())

Comment thread src/ycai/researcher.py
Comment on lines +426 to +428
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("a") as f:
f.write(json.dumps(record) + "\n")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve the resilience of the enrichment process, consider wrapping the file I/O in a try-except block. A failure in writing to the audit log (e.g., due to disk space or permission issues) should ideally not crash the entire analysis run, especially since this is a non-critical logging operation.

Suggested change
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("a") as f:
f.write(json.dumps(record) + "\n")
try:
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("a") as f:
f.write(json.dumps(record) + "\n")
except Exception as exc:
log.warning("Failed to write raw failure log for %s: %s", slug, exc)

@RyanAlberts RyanAlberts mentioned this pull request May 1, 2026
8 tasks
RyanAlberts added a commit that referenced this pull request May 1, 2026
First publishable release.

What ships in v0.1.0
- Phase 0 + Phase 1 of the project plan: scraper, sanitizer, link
  verifier, coverage probe, LLM enrichment with anti-hallucination
  Layer 1, enriched dashboard, cited-URL publish gate, resume + re-
  render commands, raw failure capture.
- 103 tests passing, mypy --strict clean, secret-scan clean.
- Real W26 results checked in: 63.3% coverage, 95% high-confidence
  on the LLM enrichment, 0 schema failures, 0 hallucinated source
  URLs, top finding 'W26 = the agentic batch' on n=118.

Mechanics
- pyproject.toml: 0.0.1 -> 0.1.0, classifier bumped pre-alpha -> alpha.
- src/ycai/__init__.py: __version__ matches.
- tests/test_smoke.py: version assertion bumped.
- CHANGELOG.md: 0.1.0 release notes synthesizing PR #6-#9.
- README.md: status table updated, quickstart documents the actual
  v0.1 commands (run-coverage / resume / dashboard).
- .github/workflows/release.yml: build wheel+sdist on tag push,
  publish to PyPI via Trusted Publishing (id-token), attach
  artifacts to GitHub release.

Local smoke
- python -m build produces yc_ai_pulse-0.1.0-py3-none-any.whl (38KB)
  and yc_ai_pulse-0.1.0.tar.gz (134KB).
- pipx install --force <wheel> succeeds; ycai version returns 0.1.0;
  ycai run-coverage --batch winter-2026 succeeds end-to-end from
  a clean /tmp directory.

PyPI Trusted Publishing setup (one-time, on PyPI side)
- https://pypi.org/manage/project/yc-ai-pulse/settings/publishing/
- Repo: RyanAlberts/yc-ai-pulse
- Workflow: release.yml
- Environment: pypi
- Until configured, the publish job will fail; the GitHub release
  job still attaches built wheels.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RyanAlberts added a commit that referenced this pull request May 1, 2026
…ack (#12)

Closes B007.

What ships
- src/ycai/crawler.py: polite, robots-aware, depth=1 async crawler.
  Max 5 pages per company, 30 KB per page, 4-second timeout per fetch.
  Pages ranked by signal-path priority (/pricing, /security, /about,
  /docs, /open-source, ...). HTML stripped and PII-sanitized before
  any LLM call. Same-host only — never wanders off-site.
- src/ycai/researcher.py: prompt now includes a 'crawled context'
  section (capped at 6000 chars total) when pages were fetched, and
  source-URL guard accepts any URL from the crawled set.
- src/ycai/cli.py: crawl runs before enrichment by default. --no-crawl
  opts out. crawl_results.jsonl written to the run directory for audit.
- tests/test_crawler.py: 13 new tests (116 total). Robots-disallow
  enforcement (path-level), content-type filtering (PDF/JSON skipped),
  max-pages cap, dedupe + fragment stripping, host-restriction
  (off-site links never fetched), PII redaction round-trip, max-bytes
  truncation contract.

Real W26 lift (same 124-company cohort)
- OSS posture 'unknown' rate: 55% -> 21% (target was <30%, hit it)
- Tech-stack identified mentions: 14 -> 41
- Vision capability: 17 -> 26 (the model can now spot product GIFs)
- Multimodal: 17 -> 22
- Confidence rate: 95% -> 91% (slight drop because longer prompts
  push slightly more rationales over the cap; absolute high-confidence
  count went from 118 to 113, still well above target)
- Schema failures: 0 -> 1 (synthetic-sciences; captured for audit)

Politeness contract (validated by tests, not just intent)
- robots.txt fetched per host before any other request
- Disallow rules enforced for both root and per-path
- Per-host concurrency capped at 2 simultaneous fetches
- User-Agent identifies us with a project URL

Honest framing in QUALITY_REPORT_W26.md: 'substantively classified
share of W26' moves from 60.2% (PR #9) to 57.7% (PR #11), but each of
those 113 rows now carries materially more signal -- oss_posture is a
real value for 79% of the cohort instead of 45%.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PR #4 — Typer CLI, Rich progress, hot-restart resume

1 participant