Commit ca022de

srx7703 and claude committed
docs(readme): real brief render + per-tool timing + sample factuality JSON
- "See it run" replaces the synthetic blockquote with a real GitHub-rendered screenshot of docs/assets/demo_brief_nvda.md (right column unchanged — matplotlib chart). Visual proof that the produced brief looks like an analyst report, not a code dump.
- Adds a per-tool timing table pulled from the pinned outputs/NVDA_20260426_raw.json _metadata.calls — total tool layer ~2 s, full pipeline ~60 s on the reference run.
- New collapsible "Sample factuality run output" block under the baseline table, showing both the perfect NVDA JSON and the MSFT one-flag JSON (the rounding-precision case — useful for explaining what the judge actually fires on without forcing readers to open per-run files).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 6d2e86b commit ca022de

2 files changed

Lines changed: 61 additions & 16 deletions

README.md

Lines changed: 61 additions & 16 deletions
@@ -25,33 +25,39 @@
 
 ## See it run
 
-```bash
-mhfa earnings-recap NVDA --output ./outputs
-# → outputs/NVDA_<YYYYMMDD>_brief.md (markdown)
-# → outputs/NVDA_<YYYYMMDD>_chart.png (price chart)
-# → outputs/NVDA_<YYYYMMDD>_raw.json (raw tool output, for eval)
+```console
+$ mhfa earnings-recap NVDA --output ./outputs
+
+brief : outputs/NVDA_20260426_brief.md
+chart : outputs/NVDA_20260426_chart.png
+raw : outputs/NVDA_20260426_raw.json
+tools : 5 calls, 0 errors
 ```
 
-<!-- Replace with real screenshots after running locally — see docs/HOW_TO_SCREENSHOT.md -->
+Total wall time on the reference NVDA run: **~2 s tool layer · ~30 s
+synthesis (Gemini 2.5 Pro) · ~30 s factuality eval (Flash)**. Per-tool
+timing from `_metadata.calls` in `outputs/NVDA_20260426_raw.json`:
+
+| Tool | Args | Duration | OK |
+|---|---|---|---|
+| `sec.fetch_latest_10q` | `NVDA` | 0 ms (local cache hit) ||
+| `sec.fetch_recent_8k` | `NVDA, max=5` | 0 ms (local cache hit) ||
+| `market.get_quote_history` | `NVDA, 3mo` | 702 ms ||
+| `market.get_company_info` | `NVDA` | 329 ms ||
+| `search.web_search` | `"NVDA latest earnings analyst reaction"` | 923 ms ||
+
 <table>
 <tr>
 <td width="50%" valign="top">
 
-**Sample brief output**
-([full markdown](docs/assets/demo_brief_nvda.md))
+**Brief render** ([GitHub preview](docs/assets/demo_brief_nvda.md) · [raw markdown](docs/assets/demo_brief_nvda.md?plain=1))
 
-> NVIDIA reported another quarter of explosive growth, with Q3 FY26 revenue
-> reaching **$57.01 billion** [10-Q], up **62.5% YoY**. Net income surged to
-> **$31.91 billion**, and the company executed **$12.57 billion in share
-> repurchases**
->
-> **Key risk**: inventories increased **+96%** since January 2025 to **$19.78
-> billion**, creating risk of write-downs if demand shifts.
+![NVDA brief render](docs/assets/demo_brief_render_nvda.png)
 
 </td>
 <td width="50%" valign="top">
 
-**Price chart (3-month, auto-generated)**
+**Price chart** (3-month, auto-generated)
 
 ![NVDA 3-month chart](docs/assets/demo_chart_nvda.png)
 

@@ -79,6 +85,45 @@ and verifies each against `raw_data`. See [HOW_IT_WORKS.md →
 "Empirical results"](docs/HOW_IT_WORKS.md#empirical-results-v01) for
 a deeper read of what the eval catches and what it doesn't.
 
+<details>
+<summary><strong>Sample factuality run output</strong> — NVDA (perfect) and MSFT (one flag)</summary>
+
+```json
+// eval/runs/NVDA_20260426_factuality.json
+{
+  "score": 1.0,
+  "total_claims": 17,
+  "verified_claims": 17,
+  "flagged": []
+}
+```
+
+```json
+// eval/runs/MSFT_20260426_factuality.json
+{
+  "score": 0.944,
+  "total_claims": 18,
+  "verified_claims": 17,
+  "flagged": [
+    {
+      "claim": "EPS (diluted) +59.8%",
+      "verdict": "contradicted",
+      "reason": "The source states that Diluted EPS is $5.16 (vs $3.23 YoY), which implies a YoY increase of 59.75%, not 59.8%."
+    }
+  ]
+}
+```
+
+The MSFT flag is a rounding-precision strict-mode hit: the synthesizer
+rounded $5.16 / $3.23 to **+59.8%**; the judge computed **59.75%** and
+called it `contradicted`. Strict-mode signal if you care about exact
+reproducibility, arguably noise otherwise — kept as-is for v0.1 because
+explicit miscalibration is more useful than silent agreement. The full
+flagged-claim taxonomy across all 5 runs is in HOW_IT_WORKS § Empirical
+results.
+
+</details>
+
 ---
 
 ## Architecture
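
The MSFT flag in the sample factuality output comes down to arithmetic: $5.16 / $3.23 implies a year-over-year increase of about 59.752%, which rounds to 59.8% at one decimal place and to 59.75% at two. The sketch below reproduces that calculation and shows one possible precision-aware comparison. It is an illustration only, not the project's judge: the function names `yoy_pct` and `matches_at_stated_precision` are hypothetical, and the choice of half-up rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def yoy_pct(current: str, prior: str) -> Decimal:
    """Year-over-year change in percent, computed in exact decimal arithmetic."""
    return (Decimal(current) / Decimal(prior) - 1) * 100


def matches_at_stated_precision(claimed: str, exact: Decimal) -> bool:
    """Hypothetical check: round the exact value to the number of decimal
    places used in the claim (half-up, an assumption), then compare."""
    claimed_dec = Decimal(claimed)
    # "59.8" has exponent -1, so the quantum is 0.1; "59.75" gives 0.01.
    quantum = Decimal(1).scaleb(claimed_dec.as_tuple().exponent)
    return exact.quantize(quantum, rounding=ROUND_HALF_UP) == claimed_dec


# Figures from the flagged MSFT claim: diluted EPS $5.16 vs $3.23 a year earlier.
exact = yoy_pct("5.16", "3.23")

print(matches_at_stated_precision("59.8", exact))   # True: correct at one decimal
print(matches_at_stated_precision("59.75", exact))  # True: correct at two decimals
print(matches_at_stated_precision("59.7", exact))   # False: a genuine mismatch

# The reported score is verified_claims / total_claims, rounded to three places.
print(round(17 / 18, 3))  # 0.944
```

Under a comparison like this, both the synthesizer's "+59.8%" and the judge's "59.75%" are consistent with the source, so the flag reflects a difference in stated precision rather than a wrong number.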