
feat(phase-1): LLM enrichment with anti-hallucination Layer 1 #7

Merged: RyanAlberts merged 2 commits into main from phase-1-pr2-researcher on May 1, 2026

@RyanAlberts (Owner) commented:

What

Phase 1 PR #2: LLM-driven classification of YC companies on top of PR #1's coverage probe. Three backends (Agent SDK default for subscription, Anthropic API for pay-per-token, Mock for tests) all return validated CompanyAnalysis objects with full Layer 1 anti-hallucination enforcement.

Closes #2.

Why

PR #1 surfaces which companies we can analyze. PR #2 actually classifies them — industry, AI capability, tech stack, OSS posture. The classification has to be cheap, schema-strict, and resistant to model fabrication; otherwise the eventual VC report is no better than vibes.

How

| Module | Role |
| --- | --- |
| schemas.CompanyAnalysis | Pydantic source of truth. Closed-set enums for every classified field. sources field has min_length=1, so Pydantic rejects an empty list. |
| classifier.map_industry | Deterministic prefill from yc-oss tags → Industry enum, no LLM. |
| researcher.Backend | Abstract protocol. Three implementations. |
| researcher.analyze | The pipeline. Schema enforcement, source-URL grounding, two-pass cross-check, sentinel on failure. |
| cli.run_coverage --enrich | Opt-in flag. --enrich-limit for smoke runs. --api-key overrides subscription mode. Rich progress UI. |

Anti-hallucination Layer 1 — what's guaranteed

  1. Schema-enforced output. Pydantic CompanyAnalysis rejects any model response missing required fields, with empty sources, or with invalid enum values.
  2. Source-URL grounding. _validate_sources rejects citations that don't start with company.website or company.url. Fabricated URLs downgrade the row to confidence=low.
  3. Two-pass cross-check. confidence=medium rows trigger a second independent classification pass. Disagreement on industry_primary or oss_posture downgrades to low. Agreement upgrades to high.
  4. Sentinel low-confidence row on any failure. Parse error, validation error, source rejection, network timeout — all produce a clearly-labeled sentinel that survives in the CSV but is excluded from charts. No silent drops.
  5. PII stripped before prompt. strip_pii runs over name / one_liner / long_description before they reach the model.
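
Guarantees 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in researcher.py: `Analysis`, `sentinel`, `cross_check`, and the placeholder field values are hypothetical names, and treating a failed second pass as a sentinel is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Analysis:
    """Hypothetical stand-in for CompanyAnalysis with only the fields used here."""
    slug: str
    industry_primary: str
    oss_posture: str
    confidence: str  # "high", "medium", or "low"


def sentinel(slug: str) -> Analysis:
    """Clearly labeled low-confidence row returned on any failure (values assumed)."""
    return Analysis(slug, industry_primary="unclear", oss_posture="unknown", confidence="low")


def cross_check(slug: str, classify: Callable[[], Optional[Analysis]]) -> Analysis:
    """Run one pass; rerun only when the first pass reports medium confidence."""
    first = classify()
    if first is None:  # parse error, validation error, timeout, ...
        return sentinel(slug)
    if first.confidence != "medium":
        return first
    second = classify()  # second, independent pass
    if second is None:  # assumption: a failed second pass also yields the sentinel
        return sentinel(slug)
    agree = (
        first.industry_primary == second.industry_primary
        and first.oss_posture == second.oss_posture
    )
    # Agreement upgrades to high; disagreement downgrades to low.
    return replace(first, confidence="high" if agree else "low")
```

Passing the classifier as a zero-argument callable keeps the sketch independent of any particular backend, so a mock can drive both passes in tests.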

Live smoke run on real W26 data

Ran ycai run-coverage --enrich --enrich-limit 5 via subscription. 5 companies enriched in 39 seconds, ~free.

| Slug | Confidence | AI capability | Notable |
| --- | --- | --- | --- |
| bidflow | high | rag, nlp-classic, agents | Real Estate / Construction |
| travo | high | rag, data-pipeline, agents | Real Estate / Construction |
| gru.space | high | no-ai | Industrials; model correctly identified this is not an AI company |
| autumn-ai | high | rag, data-pipeline, nlp-classic, agents | B2B SaaS |
| velum-labs | low (sentinel) | unclear | Schema validation failed; pipeline correctly fell through to the sentinel rather than serving an unverified analysis |

Every cited source URL came from the inputs (website or YC profile). Zero fabricated citations.

Sample output checked in: examples/output/analyses-w26-smoke-2026-05-01.json.

Test plan

  • 78 tests (40 new). All deterministic — no real LLM calls in CI.
  • Trap-resistance suite: every fixture in hallucination_traps.json lands at confidence=low when the backend returns nothing OR when it returns a fabricated source URL.
  • PII redaction-in-prompt verified by capturing the prompt sent to the backend.
  • API key no-leakage test asserts --api-key value never appears in repr, public attrs, or any string representation of the backend.
  • make publish-check green.
  • Mypy --strict green.
  • Live smoke run on subscription documented above.
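
The API key no-leakage check in the test plan could look something like the sketch below. `KeyedBackend` and `leaks_key` are hypothetical names used for illustration only; the real AnthropicAPIBackend and its tests may be structured differently.

```python
class KeyedBackend:
    """Hypothetical backend that holds an API key without exposing it."""

    __slots__ = ("_api_key", "model")

    def __init__(self, api_key: str, model: str) -> None:
        self._api_key = api_key  # private by convention; never echoed
        self.model = model

    def __repr__(self) -> str:
        # str() falls back to __repr__, so both representations are redacted.
        return f"{type(self).__name__}(model={self.model!r}, api_key=<redacted>)"


def leaks_key(backend: object, key: str) -> bool:
    """Return True if the key shows up in repr, str, or any public attribute."""
    public_values = [
        getattr(backend, name) for name in dir(backend) if not name.startswith("_")
    ]
    texts = [repr(backend), str(backend), *(repr(value) for value in public_values)]
    return any(key in text for text in texts)
```

A test would then construct the backend with a recognizable dummy key and assert that `leaks_key` returns False.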

Backlog spawned

  • B006: track schema-validation failure rate during enrichment. Smoke run had 1/5 (20%) parse failures — likely rationale field exceeded the 400-char limit. Measure across the full batch and tune.
  • B007: tech-stack and OSS posture come back as unknown for most companies because the model only sees the YC long_description. A depth=1 website crawl before classification would dramatically improve signal density.

Acceptance

  • make validate-p0 green
  • make publish-check green
  • Pre-commit hooks pass
  • LLM-path invariants preserved (the five listed above)
  • Live smoke run captured

🤖 Generated with Claude Code

Closes #2.

What ships
- src/ycai/schemas.py: CompanyAnalysis pydantic model. Industry,
  AICapability, TechStack, OSSPosture closed-set enums. CrossCheckResult
  for the two-pass logic.
- src/ycai/classifier.py: deterministic prefilling. Maps yc-oss tag soup
  to the Industry enum without an LLM, reducing the surface area where
  the model can hallucinate.
- src/ycai/researcher.py: the analyze() pipeline plus three Backend
  implementations:
    * AgentSDKBackend (default) — claude-agent-sdk against Claude Max
      subscription, no API cost.
    * AnthropicAPIBackend — pay-per-token via --api-key or
      ANTHROPIC_API_KEY. Key never logged or written to disk.
    * MockBackend — deterministic test backend.
- src/ycai/cli.py: --enrich flag opts companies through the enrichment
  pipeline. --enrich-limit caps for smoke runs. Rich progress UI.
- tests/fixtures/hallucination_traps.json: 10 synthetic companies
  designed to bait the classifier (misleading names, suggestive but
  vague descriptions, source-URL bait, acronym confusion, etc.).
- tests/test_classifier.py + tests/test_researcher.py: 78 tests total
  (40 new). Schema enforcement, source-URL guard, two-pass cross-check,
  trap-resistance, PII redaction-in-prompt verification, API-key
  no-leakage verification.

Anti-hallucination Layer 1 invariants
1. Pydantic schema rejects empty sources (min_length=1 on the field).
2. Source URLs must originate from the company website or YC profile
   URL — fabricated citations downgrade the row to low confidence.
3. confidence=medium triggers a second independent pass; disagreement
   on industry_primary or oss_posture downgrades to low.
4. Any failure returns a sentinel low-confidence row that survives in
   the CSV but is excluded from charts. No silent drops.
5. PII is stripped from the prompt before the backend sees it
   (defense-in-depth even though yc-oss/api fields are public).

Live smoke run on 5 W26 companies via subscription
- 4 high / 1 low confidence in 39 seconds, ~free on subscription
- gru.space correctly identified as no-ai (the model is willing to
  say a YC company is not an AI company)
- velum-labs correctly fell through to low-confidence sentinel when
  schema validation failed (likely rationale exceeded the 400-char
  limit; tracked as B006)
- All cited sources came from inputs (website + YC profile only)
- Captured at examples/output/analyses-w26-smoke-2026-05-01.json

New backlog
- B006: track schema-validation failure rate, tune prompt if >5%
- B007: add depth=1 website crawl before LLM call to recover tech
  stack and OSS posture (currently come back as 'unknown' for most
  companies because the YC long_description doesn't mention them)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist (bot) left a comment

Code Review

This pull request implements Phase 1 PR #2, introducing LLM-based enrichment for company data with a focus on anti-hallucination. It adds a researcher module supporting multiple backends (Claude Agent SDK, Anthropic API, and Mock), a deterministic industry classifier, and updated CLI functionality with an --enrich flag. Feedback focuses on hardening the anti-hallucination guards, specifically addressing a prefix-based URL validation vulnerability and potential crashes from malformed LLM responses. Other suggestions include improving logging for backend failures, replacing magic numbers with constants, and aligning Pydantic schema constraints with the LLM prompt instructions.

Comment thread src/ycai/researcher.py
Comment on lines +265 to +268
def _looks_like_input_url(url: str, company: RawCompany) -> bool:
    """Defense against hallucinated source URLs — a source must come from the inputs."""
    allowed = [company.website, company.url]
    return any(url.startswith(allowed_url.rstrip("/")) for allowed_url in allowed if allowed_url)

security-high high

The current implementation of _looks_like_input_url is vulnerable to prefix-based hallucinations. For example, if the allowed website is https://company.com, a source URL like https://company.com.attacker.com would be accepted because it starts with the allowed prefix. This bypasses the anti-hallucination grounding layer.

def _looks_like_input_url(url: str, company: RawCompany) -> bool:
    """Defense against hallucinated source URLs — a source must come from the inputs."""
    allowed = [company.website, company.url]
    for base in allowed:
        if not base:
            continue
        base = base.rstrip("/")
        # Ensure the URL is exactly the base or a sub-path/query/fragment
        if url == base or any(url.startswith(base + c) for c in ("/", "?", "#")):
            return True
    return False
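
An alternative to string-prefix checks is to compare parsed URL components. The sketch below is one possible approach using the standard library's `urllib.parse.urlsplit`, not the fix that was merged; `is_grounded` and its list-of-bases signature are hypothetical and stand in for `_looks_like_input_url` and `RawCompany`.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def is_grounded(url: str, allowed_bases: Iterable[Optional[str]]) -> bool:
    """Accept a URL only if it shares scheme and host with an allowed base
    and its path is the base path or a sub-path of it."""
    candidate = urlsplit(url)
    for base in allowed_bases:
        if not base:
            continue
        root = urlsplit(base)
        same_origin = (candidate.scheme, candidate.netloc.lower()) == (
            root.scheme,
            root.netloc.lower(),
        )
        if not same_origin:
            continue  # rejects hosts such as company.com.attacker.com
        base_path = root.path.rstrip("/")
        if candidate.path == base_path or candidate.path.startswith(base_path + "/"):
            return True
    return False
```

Comparing the whole `netloc` also rejects userinfo tricks such as `https://company.com@attacker.com`, because the host part is never matched as a plain string prefix.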

Comment thread src/ycai/researcher.py
Comment on lines +215 to +218
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    payload.setdefault("slug", slug)  # the model sometimes drops the slug

medium

The json.loads call can return types other than a dictionary (e.g., a list or a string if the model outputs valid JSON of those types). Calling .setdefault() on a non-dict object will raise an AttributeError, causing the pipeline to crash instead of falling back to the sentinel row.

    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    payload.setdefault("slug", slug)  # the model sometimes drops the slug
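
To see the failure mode in isolation, here is a small self-contained sketch. `parse_payload` is a hypothetical helper that parses the raw text directly; the real function first extracts a regex match, which is omitted here.

```python
import json
from typing import Any, Dict, Optional


def parse_payload(raw: str, slug: str) -> Optional[Dict[str, Any]]:
    """Return the decoded JSON object, or None so the caller can emit the sentinel."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        # Valid JSON that is a list, string, or number has no .setdefault().
        return None
    payload.setdefault("slug", slug)  # fill in the slug if the model dropped it
    return payload
```

With the guard in place, `parse_payload('["rag", "agents"]', "acme")` returns None instead of raising AttributeError, so the pipeline can fall back to the sentinel row.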

Comment thread src/ycai/researcher.py
            messages=[{"role": "user", "content": prompt}],
        )
    except Exception as exc:
        log.warning("AnthropicAPIBackend failure: %s", exc.__class__.__name__)

medium

Logging only the exception class name (exc.__class__.__name__) makes debugging difficult when a backend fails. It is better to include the full exception message to understand why the request failed (e.g., authentication error vs. rate limit).

Suggested change
- log.warning("AnthropicAPIBackend failure: %s", exc.__class__.__name__)
+ log.warning("AnthropicAPIBackend failure: %s", exc)

Comment thread src/ycai/cli.py
    model: str,
) -> list[CompanyAnalysis]:
    """Drive the enrichment pipeline with a Rich progress bar."""
    semaphore = asyncio.Semaphore(8)  # respect subscription rate limits

medium

The semaphore limit of 8 is a magic number. It should be defined as a constant (e.g., MAX_CONCURRENT_ENRICHMENTS) at the module level to improve maintainability and make it easier to tune based on different backend rate limits.
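
A minimal sketch of that suggestion, assuming a generic async worker. `enrich_all` and its parameters are hypothetical; only the constant name comes from the comment above.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Module-level constant instead of a bare 8 inside the function body.
MAX_CONCURRENT_ENRICHMENTS = 8


async def enrich_all(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = MAX_CONCURRENT_ENRICHMENTS,
) -> List[R]:
    """Run worker over items with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Exposing the limit as a parameter with the constant as its default would let a pay-per-token backend use a different ceiling than the subscription backend without touching the function body.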

Comment thread src/ycai/schemas.py
Comment on lines +206 to +207
    oss_posture: OSSPosture
    oss_evidence_url: HttpUrl | None = None

medium

The prompt in researcher.py specifies that oss_evidence_url is mandatory if the posture is not 'closed', 'api-only', or 'unknown'. However, this constraint is not enforced in the Pydantic model. Adding a model validator would ensure that the Layer 1 anti-hallucination layer correctly rejects responses that violate this rule.
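
The rule itself can be sketched without any dependencies, as below. `check_oss_evidence` and `NO_EVIDENCE_REQUIRED` are hypothetical names, postures are shown as plain strings rather than the OSSPosture enum, and "open-core" in the usage note is an assumed enum value. In Pydantic v2 a check like this would typically be called from a method decorated with `@model_validator(mode="after")`.

```python
from typing import Optional

# Postures that, per the prompt described above, need no evidence URL.
NO_EVIDENCE_REQUIRED = frozenset({"closed", "api-only", "unknown"})


def check_oss_evidence(oss_posture: str, oss_evidence_url: Optional[str]) -> None:
    """Raise ValueError when a posture that claims openness lacks an evidence URL."""
    if oss_posture not in NO_EVIDENCE_REQUIRED and not oss_evidence_url:
        raise ValueError(
            f"oss_evidence_url is required when oss_posture is {oss_posture!r}"
        )
```

For example, `check_oss_evidence("closed", None)` passes silently, while `check_oss_evidence("open-core", None)` raises ValueError, which the pipeline would treat like any other validation failure.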

CI mypy job didn't have either SDK installed because they were never
listed in pyproject.toml deps — they were just available locally.
Adding them as runtime deps so CI installs them via
pip install -e .[dev].

Both are small wheels and the package needs at least one of them at
runtime (anthropic for the API path, claude-agent-sdk for the
subscription path), so requiring them is correct, not bloat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@RyanAlberts merged commit 47e40be into main on May 1, 2026 (3 checks passed)
@RyanAlberts deleted the phase-1-pr2-researcher branch on May 1, 2026 at 19:25
@RyanAlberts mentioned this pull request on May 1, 2026 (8 tasks)

Development

Successfully merging this pull request may close these issues.

PR #2 — classifier + researcher with anti-hallucination Layer 1