feat(phase-2): narrative .docx memo + dual-output ycai report by RyanAlberts · Pull Request #15 · RyanAlberts/yc-ai-pulse

RyanAlberts · 2026-05-02T02:27:06Z

What

Phase 2 part 2: 9-section narrative .docx memo via python-docx. Same analytics.py math as the deck, same Layer 2 audit. ycai report now produces both deck.pptx and report.docx by default. With this PR, all of Phase 2 is shipped: depth=1 crawler + ECharts dashboard + PPT + DOCX, all anchored to a single source of chart math.

Sample artifact: examples/output/report-w26-pr15-2026-05-01.docx.

Memo structure

Per USER.md document-format discipline (narrative memos = 2-5 pages with appendices):

1. Title + dateline
2. Headline finding: one-paragraph summary, drift-checked
3. Coverage methodology: Tier A/B/C breakdown, Layer 1+2 disclosure
4. The agentic batch: capability × industry heatmap + analysis paragraph
5. Industry distribution
6. Tech stack and OSS posture, with the unknown caveat made explicit
7. Six company spotlights: verbatim taglines, classification facts, rationale
8. What we still cannot answer: three open questions
9. Reproduce this memo: install + run instructions

Layer 2 audit refinements

The W26 memo's first build surfaced two real edge cases the auditor was right to flag:

"Winter 2026" — the year was being treated as numerical drift. Fixed by extending date-pattern stripping to YC-batch labels and bare 4-digit years flanked by non-digits.
"top three industries account for 53 of 113 companies" — the sum of 53 isn't in any base counter, but it's a legitimate derived total. Fixed by adding derived_sums (top-3, top-5) to extra_allowed so the auditor verifies them rather than rejects them as drift.

Both adjustments tighten correctness, not loosen it: the auditor still catches actually-invented numbers and flags forbidden phrases. The trap-resistance suite from PR #2 still passes unchanged.
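A minimal sketch of the date-stripping idea described above, assuming regex-based number extraction. The pattern names and the `numbers_to_audit` helper are hypothetical and are not taken from `anti_hallucination.py`; restricting bare years to 19xx/20xx is likewise an assumption made for illustration.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only; the real ones live in
# src/ycai/reports/anti_hallucination.py and may differ.
BATCH_LABEL = re.compile(r"\b(?:Winter|Spring|Summer|Fall)\s+\d{4}\b")
BARE_YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")  # 4 digits flanked by non-digits
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def strip_date_patterns(text: str) -> str:
    """Blank out batch labels and bare years so they are not audited as data."""
    text = BATCH_LABEL.sub(" ", text)
    return BARE_YEAR.sub(" ", text)


def numbers_to_audit(text: str) -> list[float]:
    """Return the numeric tokens that remain after date stripping."""
    return [float(tok) for tok in NUMBER.findall(strip_date_patterns(text))]
```

With this sketch, `numbers_to_audit("Winter 2026 batch: 53 of 113 companies")` keeps 53 and 113 for the audit and drops the year. One trade-off of any bare-year rule is that a genuine count that happens to look like a year would also be skipped.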

Test plan

149 tests passing (4 new for docx)
mypy --strict clean
make publish-check clean
W26 memo captured: 4 chart PNGs embedded, ~47 paragraphs, Layer 2 audit clean

Phase 2 status

| PR | What |
| --- | --- |
| #11 | Depth=1 website crawl (B007): OSS unknown 55% → 21% |
| #12 | ECharts replaces static CSS bars |
| #14 | VC-style `.pptx` deck + Layer 2 audit |
| #15 | Narrative `.docx` memo + dual-output `ycai report` |

Phase 3 (Chrome extension surface) remains on the v1.0 milestone. The infrastructure for it — the ycai daemon command — is already in v0.1.0; the extension UI hasn't been built yet.

What's next

If you want a v0.2.0 release tag, this is the moment. The Phase 2 surface is feature-complete for both deck and memo, the W26 example artifacts are checked in, and the babysitter routine still pauses cleanly on v0.1.0.

🤖 Generated with Claude Code

…udit

What ships
- src/ycai/reports/docx.py: 9-section narrative memo per USER.md document-format discipline. Title, headline, coverage methodology, the agentic batch (capability heatmap), industry distribution, tech stack + OSS posture, six company spotlights, unanswered questions, reproducibility footer. Same analytics.py math as the deck. Same Layer 2 audit pre-write.
- src/ycai/reports/anti_hallucination.py: date-pattern stripping extended to YC-batch labels ('Winter 2026') and bare 4-digit years flanked by non-digits. The drift checker no longer surfaces years as numerical drift.
- src/ycai/cli.py: 'ycai report <run-dir>' now produces both deck + memo by default. --deck-only / --memo-only constrain.
- pyproject.toml: python-docx>=1.1 dep, mypy override extended to ycai.reports.docx (same untyped-import situation as ppt).

Tests: 4 new docx tests (149 total). Validates the .docx is a valid zip with word/document.xml, contains 'coverage' + 'agents' in the body, embeds >=3 chart images, aborts on a forbidden phrase smuggled into a company rationale, builds even with empty quote candidates.

Real W26 memo captured at examples/output/report-w26-pr15-2026-05-01.docx. 4 chart PNGs (capability heatmap, industry bar, OSS pie, tech stack bar). ~47 paragraphs. Layer 2 audit clean on the real run.

Phase 2 of the project plan is now shipped: depth=1 crawler (PR #11), ECharts dashboard (PR #12), .pptx deck (PR #14), .docx memo (PR #15). Phase 3 (Chrome extension) lives at the v1.0 milestone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
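The structural checks listed under Tests (valid zip, `word/document.xml` present, at least three embedded images) could look roughly like the sketch below. It uses only the standard library; `inspect_docx` and `fake_docx` are hypothetical helpers, not the project's test code, and counting entries under `word/media/` as embedded images is an assumption about how the memo is packaged.

```python
from __future__ import annotations

import io
import zipfile


def inspect_docx(data: bytes) -> tuple[bool, int]:
    """Return (has word/document.xml, number of entries under word/media/)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return (False, 0)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    has_body = "word/document.xml" in names
    images = sum(1 for name in names if name.startswith("word/media/"))
    return (has_body, images)


def fake_docx(image_count: int) -> bytes:
    """Build a stand-in archive with the same layout (not a real Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
        for i in range(image_count):
            zf.writestr(f"word/media/image{i + 1}.png", b"\x89PNG")
    return buf.getvalue()
```

A test in this style would assert that `inspect_docx` reports a body part and an image count of 3 or more for the generated memo, and that arbitrary bytes are rejected.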

gemini-code-assist

Code Review

This pull request implements narrative .docx memo generation, adding a new report builder in src/ycai/reports/docx.py and updating the CLI to support the format. The implementation includes integration with the anti-hallucination audit system and expanded regex for date stripping. Feedback identifies a risk of hallucination from hardcoded fallbacks in the prose and a bug in the audit allowlist that would cause failures due to missing section markers.

gemini-code-assist · 2026-05-02T02:28:29Z

+        f"capabilities are {capability.most_common(2)[1][0] if len(capability) > 1 else 'rag'} and "
+        f"{capability.most_common(3)[2][0] if len(capability) > 2 else 'data-pipeline'}. The heatmap above "


The fallbacks 'rag' and 'data-pipeline' are hardcoded and will be used if the analysis cohort has fewer than 3 distinct capabilities. This introduces a potential hallucination in the report itself, which contradicts the project's anti-hallucination goals. It is better to dynamically construct this sentence based on the available data.
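One way to build the sentence only from available data, offered as a sketch and not as the code that shipped. The helper names and the "The next most common" wording are hypothetical; only `collections.Counter.most_common` is a real API here.

```python
from __future__ import annotations

from collections import Counter


def join_names(names: list[str]) -> str:
    """Join names as 'a', 'a and b', or 'a, b, and c'."""
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def runner_up_sentence(capability: Counter[str], limit: int = 2) -> str:
    """Describe the capabilities ranked just below the top one, using only real data."""
    runners_up = [name for name, _ in capability.most_common(limit + 1)[1:]]
    if not runners_up:
        return ""  # nothing to report, so emit nothing instead of a hardcoded label
    verb = "capability is" if len(runners_up) == 1 else "capabilities are"
    return f"The next most common {verb} {join_names(runners_up)}."
```

Returning an empty string for a single-capability cohort keeps placeholder text such as 'N/A' out of the prose as well, so the caller can drop the sentence entirely.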

Suggested change

-        f"capabilities are {capability.most_common(2)[1][0] if len(capability) > 1 else 'rag'} and "
-        f"{capability.most_common(3)[2][0] if len(capability) > 2 else 'data-pipeline'}. The heatmap above "
+        f"capabilities are {', and '.join(name for name, _ in capability.most_common(3)[1:]) if len(capability) > 1 else 'N/A'}. The heatmap above "

gemini-code-assist · 2026-05-02T02:28:29Z

+        sum(c for _, c in capability.most_common(3)),
+        sum(c for _, c in oss_posture.most_common(3)),
+    )
+    infra_facts: tuple[float, ...] = (4.6, 5, 30, 2, 1, *derived_sums)


The infra_facts allowlist contains magic numbers (4.6, 5, 30) that are not present in the .docx prose (unlike the .pptx methodology slide). Conversely, it is missing the number 3, which appears in the 'Unanswered questions' section as a list marker (3.). Since the drift auditor checks numbers >= 2, the absence of 3 in the allowlist will cause the Layer 2 audit to fail if no other data point happens to be exactly 3.

Suggested change

-    infra_facts: tuple[float, ...] = (4.6, 5, 30, 2, 1, *derived_sums)
+    infra_facts: tuple[float, ...] = (1, 2, 3, *derived_sums)
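To illustrate why a missing list marker matters, here is a small sketch of an allowlist-based drift check. It assumes the auditor extracts every number at or above a floor of 2 and rejects anything not in the allowlist, as the review comment describes; `find_drift`, this standalone `derived_sums` function, and the industry counts are all hypothetical stand-ins, not the project's code or W26 data.

```python
from __future__ import annotations

import re
from collections import Counter

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def derived_sums(counter: Counter[str], sizes: tuple[int, ...] = (3, 5)) -> set[float]:
    """Top-k totals that prose may legitimately cite, such as a top-three sum."""
    return {float(sum(c for _, c in counter.most_common(k))) for k in sizes}


def find_drift(prose: str, allowed: set[float], floor: float = 2) -> list[float]:
    """Return numbers >= floor that appear in prose but not in the allowlist."""
    found = (float(tok) for tok in NUMBER.findall(prose))
    return [n for n in found if n >= floor and n not in allowed]


# Made-up counts for illustration only.
industry = Counter({"fintech": 22, "healthcare": 17, "devtools": 14, "legal": 9})
base = {float(c) for c in industry.values()}
prose = "The top three industries account for 53 companies.\n3. Which teams ship OSS?"

strict = find_drift(prose, base)
relaxed = find_drift(prose, base | derived_sums(industry) | {1.0, 2.0, 3.0})
```

In this toy run `strict` flags both the derived total and the list marker, while `relaxed` is empty once the derived sums and the small structural numbers are allowed.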

Phase 2 release. Cumulative changes since 0.1.0:
- PR #11: depth=1 polite website crawl, OSS posture unknown rate 55% -> 21%, identified tech-stack mentions 14 -> 41
- PR #12: Apache ECharts replaces static CSS bars in dashboard
- PR #14: VC-style .pptx deck + anti-hallucination Layer 2
- PR #15: narrative .docx memo + dual-output ycai report

149 tests passing. Mypy strict clean. python -m build produces a clean 0.2.0 wheel + sdist. CHANGELOG and README updated to reflect shipped Phase 2 status.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RyanAlberts merged commit 8df8c68 into main May 2, 2026
3 checks passed

RyanAlberts deleted the phase-2-pr15-memo branch May 2, 2026 02:28

gemini-code-assist Bot reviewed May 2, 2026

RyanAlberts mentioned this pull request May 2, 2026

release: v0.2.0 #16

Merged
