
Commit 8df8c68

RyanAlberts and claude authored
feat(phase-2): narrative .docx memo with shared analytics + Layer 2 audit (#15)
What ships

- src/ycai/reports/docx.py: 9-section narrative memo per USER.md document-format discipline: title, headline, coverage methodology, the agentic batch (capability heatmap), industry distribution, tech stack + OSS posture, six company spotlights, unanswered questions, reproducibility footer. Uses the same analytics.py math as the deck and runs the same Layer 2 audit before writing.
- src/ycai/reports/anti_hallucination.py: date-pattern stripping extended to YC-batch labels ('Winter 2026') and to bare 19xx/20xx years that are not adjacent to a digit, '.', or '-'. The drift checker no longer reports years as numerical drift.
- src/ycai/cli.py: 'ycai report <run-dir>' now produces both the deck and the memo by default; --deck-only / --memo-only restrict output to one of them.
- pyproject.toml: adds the python-docx>=1.1 dependency and extends the mypy override to ycai.reports.docx (same untyped-import situation as ppt).

Tests: 4 new docx tests (149 total). They check that the .docx is a valid zip containing word/document.xml, that the body contains 'coverage' and 'agents', that it embeds >=3 chart images, that the build aborts on a forbidden phrase smuggled into a company rationale, and that it still builds with empty quote candidates.

Real W26 memo captured at examples/output/report-w26-pr15-2026-05-01.docx: 4 chart PNGs (capability heatmap, industry bar, OSS pie, tech stack bar), ~47 paragraphs, Layer 2 audit clean on the real run.

Phase 2 of the project plan is now shipped: depth=1 crawler (PR #11), ECharts dashboard (PR #12), .pptx deck (PR #14), .docx memo (PR #15). Phase 3 (Chrome extension) lives at the v1.0 milestone.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
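The two report flags are independent guards in `report_cmd` (see the cli.py diff further down): `--memo-only` suppresses the deck and `--deck-only` suppresses the memo. A minimal sketch that mirrors that branching as a pure function, to make the four flag combinations easy to read; `selected_outputs` is a hypothetical name, not part of ycai.

```python
def selected_outputs(deck_only: bool, memo_only: bool) -> list[str]:
    """Artifacts `ycai report` would write for one flag combination.

    Hypothetical helper mirroring the two guards in report_cmd:
    --memo-only skips the deck, --deck-only skips the memo.
    """
    outputs: list[str] = []
    if not memo_only:
        outputs.append("deck.pptx")
    if not deck_only:
        outputs.append("report.docx")
    return outputs


# Print the four combinations as a small truth table.
for deck_only in (False, True):
    for memo_only in (False, True):
        print(f"--deck-only={deck_only!s:<5} --memo-only={memo_only!s:<5} -> {selected_outputs(deck_only, memo_only)}")
```

As the guards are written, passing both flags selects nothing, so the command exits without writing a file. If that combination should be rejected instead, raising `typer.BadParameter` before any work starts is one option.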
1 parent 21463eb commit 8df8c68

9 files changed

Lines changed: 384 additions & 18 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Phase 2 — reports
 
 - **PR #14 — VC-style `.pptx` deck with anti-hallucination Layer 2.** New `src/ycai/analytics.py` is the single source of chart math, consumed by both the dashboard (ECharts JSON) and the deck (matplotlib PNG). New `src/ycai/reports/ppt.py` builds a 16-slide deck (cream/orange palette, sans/serif typography). Each chart is a matplotlib PNG anchored to the same Counter the dashboard used. `ycai report <run-dir>` produces `deck.pptx` from existing artifacts at zero LLM cost. New `src/ycai/reports/anti_hallucination.py`: forbidden-phrase scan + numerical-drift check + date-pattern stripping. Two prose streams audited separately — aggregate commentary gets full drift check, per-company taglines/rationales get forbidden-phrase only (Layer 1 already gated their source URLs). 24 new tests (145 total).
+- **PR #15 — narrative `.docx` memo.** New `src/ycai/reports/docx.py` builds a 9-section narrative memo per USER.md document-format discipline: title, headline, coverage methodology, the agentic batch (capability heatmap), industry distribution, tech stack + OSS posture, six company spotlights, unanswered questions, reproducibility. Same `analytics.py` math as the deck, same Layer 2 audit pre-write. Date-pattern stripping extended to YC-batch labels ("Winter 2026") and bare 4-digit years. `ycai report <run-dir>` now produces both `deck.pptx` and `report.docx`; `--deck-only` / `--memo-only` to constrain. 4 new tests (149 total).
 
 ## [0.1.0] — 2026-05-01
 
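The PR #14 entry describes Layer 2 as a forbidden-phrase scan plus a numerical-drift check, applied with different strength to two prose streams. The sketch below is one way to express that split and is not ycai's implementation: the phrase list, `audit`, and the drift rule (any number in the commentary that is not among the numbers the analytics produced) are assumptions, and date stripping is left out for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical phrase list; the real one lives in anti_hallucination.py.
FORBIDDEN_PHRASES = ("revolutionary", "game-changing", "best-in-class")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


@dataclass
class AuditReport:
    forbidden: list[tuple[str, str]] = field(default_factory=list)  # (phrase, excerpt)
    drifts: list[str] = field(default_factory=list)

    @property
    def clean(self) -> bool:
        return not self.forbidden and not self.drifts


def scan_forbidden(text: str) -> list[tuple[str, str]]:
    """Case-insensitive substring match with a short excerpt around each hit."""
    hits = []
    lowered = text.lower()
    for phrase in FORBIDDEN_PHRASES:
        i = lowered.find(phrase)
        if i != -1:
            hits.append((phrase, text[max(0, i - 20): i + len(phrase) + 20]))
    return hits


def audit(commentary: list[str], company_prose: list[str], known_numbers: set[str]) -> AuditReport:
    report = AuditReport()
    # Aggregate commentary: forbidden phrases AND numerical drift.
    for text in commentary:
        report.forbidden += scan_forbidden(text)
        report.drifts += [n for n in NUMBER.findall(text) if n not in known_numbers]
    # Per-company taglines/rationales: forbidden phrases only, because
    # Layer 1 already gated the source URLs behind them.
    for text in company_prose:
        report.forbidden += scan_forbidden(text)
    return report
```

With `known_numbers={"12", "50"}`, commentary reading "15 of 50 companies" would report drift on `15`, while a company rationale mentioning "8M" is never drift-checked.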
README.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ A Chrome extension wraps two flows: *(a)* analyze the whole batch, *(b)* deep-di
 |---|---|---|
 | 0 | Repo bootstrap, secrets hygiene, CI | ✅ shipped |
 | 1 | CLI + dashboard with anti-hallucination Layer 1 |**v0.1.0** |
-| 2 | Depth=1 crawler + ECharts dashboard + `.pptx` / `.docx` reports | 🟡 in progress |
+| 2 | Depth=1 crawler + ECharts dashboard + `.pptx` deck + `.docx` memo | ✅ shipped |
 | 3 | Chrome extension | ⬜ planned |
 
 See [CHANGELOG.md](CHANGELOG.md) for what 0.1.0 includes, [BACKLOG.md](BACKLOG.md) for the working backlog, and `docs/decisions/` for architecture decisions.

examples/README.md

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@ Sanitized sample artifacts. Every commit goes through `make publish-check` so PI
 
 | File | What |
 |---|---|
-| [`output/deck-w26-pr14-2026-05-01.pptx`](output/deck-w26-pr14-2026-05-01.pptx) | **PR #14 VC-style deck — current best.** 16 slides, a16z-feel palette, matplotlib chart PNGs anchored to the same data the dashboard uses. Anti-hallucination Layer 2 ran before write (no forbidden phrases, no numerical drift). |
+| [`output/deck-w26-pr14-2026-05-01.pptx`](output/deck-w26-pr14-2026-05-01.pptx) | **PR #14 VC-style deck.** 16 slides, a16z-feel palette, matplotlib chart PNGs anchored to the same data the dashboard uses. Anti-hallucination Layer 2 ran before write. |
+| [`output/report-w26-pr15-2026-05-01.docx`](output/report-w26-pr15-2026-05-01.docx) | **PR #15 narrative memo.** 9 sections, ~47 paragraphs, 4 embedded chart PNGs. Headline finding, coverage methodology, capability heatmap with analysis, industry distribution, tech-stack/OSS-posture caveat, six company spotlights, unanswered questions. Layer 2 audit clean. |
 | [`output/dashboard-w26-pr12-2026-05-01.html`](output/dashboard-w26-pr12-2026-05-01.html) | **PR #12 dashboard — current best HTML.** Same W26 data, ECharts canvases (real heatmap, pies, bars). |
 | [`output/dashboard-w26-pr11-2026-05-01.html`](output/dashboard-w26-pr11-2026-05-01.html) | PR #11 dashboard with the depth=1 crawl but static CSS bars. Useful for comparing visual fidelity vs. PR #12. |
 | [`output/analyses-w26-pr11-2026-05-01.json`](output/analyses-w26-pr11-2026-05-01.json) | Source data for both PR #11 and PR #12 dashboards. 113/124 high-confidence. |
Binary file not shown (260 KB).
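The commit message says the new docx tests check that the memo is a valid zip containing `word/document.xml`, mentions 'coverage' and 'agents' in its body, and embeds at least three chart images. A standard-library sketch of equivalent checks on the bytes of a generated `report.docx`; `memo_problems` is a hypothetical helper, not the project's test code. It assumes pictures live under `word/media/`, which is where python-docx stores image parts.

```python
import io
import re
import zipfile

# Text runs in WordprocessingML are <w:t> or <w:t xml:space="preserve"> elements.
# The optional-attribute group keeps <w:tab/> from matching.
RUN_TEXT = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")


def memo_problems(data: bytes, required: tuple[str, ...] = ("coverage", "agents"), min_images: int = 3) -> list[str]:
    """Return problems found in a .docx payload; an empty list means every check passed."""
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a zip archive"]
    with zf:
        names = zf.namelist()
        if "word/document.xml" not in names:
            return ["missing word/document.xml"]
        xml = zf.read("word/document.xml").decode("utf-8")
    # Joining runs is approximate: a word split across two runs would be missed.
    body = " ".join(RUN_TEXT.findall(xml)).lower()
    problems = [f"body does not mention {word!r}" for word in required if word.lower() not in body]
    images = [n for n in names if n.startswith("word/media/")]
    if len(images) < min_images:
        problems.append(f"expected >= {min_images} images, found {len(images)}")
    return problems
```

If exact paragraph text matters more than speed, python-docx's `Document(...).paragraphs` yields each paragraph's joined `text` and avoids the split-run caveat.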

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -24,6 +24,7 @@ dependencies = [
     "anthropic>=0.40",
     "claude-agent-sdk>=0.1",
     "python-pptx>=1.0",
+    "python-docx>=1.1",
     "matplotlib>=3.7",
 ]
 
@@ -86,14 +87,14 @@ files = ["src/ycai"]
 # Third-party libraries without published type stubs. We pin to the public
 # constructor surface and otherwise treat their return types as Any.
 [[tool.mypy.overrides]]
-module = ["pptx.*", "matplotlib.*"]
+module = ["pptx.*", "matplotlib.*", "docx.*"]
 ignore_missing_imports = true
 
 # The deck builder is a thin wrapper over python-pptx whose API surface is
 # untyped (Presentation is a factory function, not a class). Keep strict
 # mode for everything else; relax just this module.
 [[tool.mypy.overrides]]
-module = "ycai.reports.ppt"
+module = ["ycai.reports.ppt", "ycai.reports.docx"]
 disallow_untyped_calls = false
 disallow_untyped_defs = false
 warn_return_any = false

src/ycai/cli.py

Lines changed: 28 additions & 14 deletions
@@ -118,7 +118,8 @@ def dashboard_cmd(
 @app.command("report")
 def report_cmd(
     run_dir: Path = typer.Argument(..., help="Run directory with coverage.json + analyses.json(l)."),
-    deck_only: bool = typer.Option(False, "--deck-only", help="Skip the .docx memo (Phase 2 PR #15)."),
+    deck_only: bool = typer.Option(False, "--deck-only", help="Skip the .docx memo."),
+    memo_only: bool = typer.Option(False, "--memo-only", help="Skip the .pptx deck."),
 ) -> None:
     """Generate the .pptx deck (and .docx memo when shipped) from existing artifacts.
 
@@ -149,23 +150,36 @@ def report_cmd(
         console.print(f"[red]✗ no analyses found in {run_dir}. Run with --enrich first.[/red]")
         raise typer.Exit(2)
 
+    from ycai.reports.docx import build_memo
     from ycai.reports.ppt import Layer2Failure, build_deck
 
-    deck_path = run_dir / "deck.pptx"
-    console.print("[cyan]→[/cyan] building deck.pptx (Layer 2 audit before write)…")
-    try:
-        build_deck(coverage, companies, analyses, output_path=deck_path)
-    except Layer2Failure as exc:
-        console.print(f"[red]✗ Layer 2 audit failed:[/red] {exc}")
-        for hit in exc.forbidden[:5]:
-            console.print(f" [red]forbidden phrase '{hit.phrase}':[/red] {hit.excerpt}")
-        for drift in exc.drifts[:5]:
-            console.print(f" [red]numerical drift '{drift.number}':[/red] {drift.excerpt}")
-        raise typer.Exit(5) from exc
-    console.print(f"[green]✓[/green] wrote {deck_path}")
+    if not memo_only:
+        deck_path = run_dir / "deck.pptx"
+        console.print("[cyan]→[/cyan] building deck.pptx (Layer 2 audit before write)…")
+        try:
+            build_deck(coverage, companies, analyses, output_path=deck_path)
+        except Layer2Failure as exc:
+            console.print(f"[red]✗ deck Layer 2 audit failed:[/red] {exc}")
+            for hit in exc.forbidden[:5]:
+                console.print(f" [red]forbidden phrase '{hit.phrase}':[/red] {hit.excerpt}")
+            for drift in exc.drifts[:5]:
+                console.print(f" [red]numerical drift '{drift.number}':[/red] {drift.excerpt}")
+            raise typer.Exit(5) from exc
+        console.print(f"[green]✓[/green] wrote {deck_path}")
 
     if not deck_only:
-        console.print("[yellow]⚠ .docx memo lands in PR #15.[/yellow]")
+        memo_path = run_dir / "report.docx"
+        console.print("[cyan]→[/cyan] building report.docx (Layer 2 audit before write)…")
+        try:
+            build_memo(coverage, companies, analyses, output_path=memo_path)
+        except Layer2Failure as exc:
+            console.print(f"[red]✗ memo Layer 2 audit failed:[/red] {exc}")
+            for hit in exc.forbidden[:5]:
+                console.print(f" [red]forbidden phrase '{hit.phrase}':[/red] {hit.excerpt}")
+            for drift in exc.drifts[:5]:
+                console.print(f" [red]numerical drift '{drift.number}':[/red] {drift.excerpt}")
+            raise typer.Exit(5) from exc
+        console.print(f"[green]✓[/green] wrote {memo_path}")
 
 
 @app.command("resume")
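The deck and memo branches of `report_cmd` repeat the same failure-reporting loop. One way to share it is a small formatter; the sketch assumes only the attributes the diff reads (`exc.forbidden` items with `.phrase`/`.excerpt`, `exc.drifts` items with `.number`/`.excerpt`). The stand-in dataclasses and `layer2_failure_lines` are hypothetical, and Rich markup is dropped so the output is plain text.

```python
from dataclasses import dataclass, field


@dataclass
class ForbiddenHit:  # stand-in for the items in exc.forbidden
    phrase: str
    excerpt: str


@dataclass
class NumericDrift:  # stand-in for the items in exc.drifts
    number: str
    excerpt: str


@dataclass
class AuditFailure:  # stand-in for Layer2Failure; models only the attributes read
    message: str
    forbidden: list[ForbiddenHit] = field(default_factory=list)
    drifts: list[NumericDrift] = field(default_factory=list)

    def __str__(self) -> str:
        return self.message


def layer2_failure_lines(label: str, exc: AuditFailure, limit: int = 5) -> list[str]:
    """Format the report both except-blocks print, capped at `limit` items per kind."""
    lines = [f"✗ {label} Layer 2 audit failed: {exc}"]
    lines += [f"  forbidden phrase '{hit.phrase}': {hit.excerpt}" for hit in exc.forbidden[:limit]]
    lines += [f"  numerical drift '{drift.number}': {drift.excerpt}" for drift in exc.drifts[:limit]]
    return lines
```

Each `except Layer2Failure as exc:` block could then print these lines with `console.print` and still `raise typer.Exit(5) from exc`, so both artifacts keep the same exit code.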

src/ycai/reports/anti_hallucination.py

Lines changed: 4 additions & 0 deletions
@@ -50,6 +50,10 @@
     re.compile(r"\b\d{4}-\d{2}\b"), # YYYY-MM
     re.compile(r"\b(?:in|since|as of) (?:19|20)\d{2}\b", re.IGNORECASE),
     re.compile(r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},?\s+(?:19|20)\d{2}\b"),
+    # YC-batch labels like "Winter 2026" or "Summer 2025"
+    re.compile(r"\b(?:Winter|Spring|Summer|Fall)\s+(?:19|20)\d{2}\b"),
+    # Bare 4-digit year that looks like a year, with at least one non-digit on each side
+    re.compile(r"(?<![\d.-])(?:19|20)\d{2}(?![\d.-])"),
 )
 
 
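To see what the two new patterns change, the sketch below applies the patterns visible in this hunk in order, then collects whatever numbers remain, roughly what a drift checker would compare against the analytics output. The tuple name, `strip_dates`, `numbers_left`, and the number regex are assumptions; the real tuple starts above the lines shown and may contain further patterns.

```python
import re

# Patterns as they appear in this hunk, in the same order. DATE_PATTERNS is a
# placeholder name: the real tuple starts above the lines shown and may hold more.
DATE_PATTERNS = (
    re.compile(r"\b\d{4}-\d{2}\b"),  # YYYY-MM
    re.compile(r"\b(?:in|since|as of) (?:19|20)\d{2}\b", re.IGNORECASE),
    re.compile(r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},?\s+(?:19|20)\d{2}\b"),
    re.compile(r"\b(?:Winter|Spring|Summer|Fall)\s+(?:19|20)\d{2}\b"),  # new: batch labels
    re.compile(r"(?<![\d.-])(?:19|20)\d{2}(?![\d.-])"),  # new: bare years
)

# Hypothetical number extractor standing in for the drift checker's own.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def strip_dates(text: str) -> str:
    """Replace every date-like match with a space so it cannot count as a number."""
    for pattern in DATE_PATTERNS:
        text = pattern.sub(" ", text)
    return text


def numbers_left(text: str) -> list[str]:
    """Numbers that would still reach a drift comparison after date stripping."""
    return NUMBER.findall(strip_dates(text))


print(numbers_left("The Winter 2026 batch has 124 companies; 113 were high-confidence in 2026."))  # ['124', '113']
print(numbers_left("Founded 2019, raised $2.5M"))  # ['2.5']
print(numbers_left("2019-2021"))  # ['2019', '2021']
```

The last case shows a consequence of the `[\d.-]` lookarounds: a hyphenated year range is left intact by both the YYYY-MM pattern and the bare-year pattern, so both years would still reach the drift check.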