|
25 | 25 |
|
26 | 26 | ## See it run |
27 | 27 |
|
28 | | -```bash |
29 | | -mhfa earnings-recap NVDA --output ./outputs |
30 | | -# → outputs/NVDA_<YYYYMMDD>_brief.md (markdown) |
31 | | -# → outputs/NVDA_<YYYYMMDD>_chart.png (price chart) |
32 | | -# → outputs/NVDA_<YYYYMMDD>_raw.json (raw tool output, for eval) |
| 28 | +```console |
| 29 | +$ mhfa earnings-recap NVDA --output ./outputs |
| 30 | + |
| 31 | + brief : outputs/NVDA_20260426_brief.md |
| 32 | + chart : outputs/NVDA_20260426_chart.png |
| 33 | + raw : outputs/NVDA_20260426_raw.json |
| 34 | + tools : 5 calls, 0 errors |
33 | 35 | ``` |
34 | 36 |
|
35 | | -<!-- Replace with real screenshots after running locally — see docs/HOW_TO_SCREENSHOT.md --> |
| 37 | +Total wall time on the reference NVDA run: **~2 s tool layer · ~30 s |
| 38 | +synthesis (Gemini 2.5 Pro) · ~30 s factuality eval (Flash)**. Per-tool |
| 39 | +timing from `_metadata.calls` in `outputs/NVDA_20260426_raw.json`: |
| 40 | + |
| 41 | +| Tool | Args | Duration | OK | |
| 42 | +|---|---|---|---| |
| 43 | +| `sec.fetch_latest_10q` | `NVDA` | 0 ms (local cache hit) | ✓ | |
| 44 | +| `sec.fetch_recent_8k` | `NVDA, max=5` | 0 ms (local cache hit) | ✓ | |
| 45 | +| `market.get_quote_history` | `NVDA, 3mo` | 702 ms | ✓ | |
| 46 | +| `market.get_company_info` | `NVDA` | 329 ms | ✓ | |
| 47 | +| `search.web_search` | `"NVDA latest earnings analyst reaction"` | 923 ms | ✓ | |
| 48 | + |
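As a rough cross-check of the "~2 s tool layer" figure, the per-tool durations in the table can be summed directly. This is a minimal sketch: the field names `tool`, `duration_ms`, and `ok` and the helper `summarize_calls` are assumptions for illustration, since the diff does not show the actual `_metadata.calls` schema in the raw JSON.

```python
import json

# Hypothetical shape of `_metadata.calls`. The key names below are assumed,
# not taken from the project; the durations are copied from the table above.
raw = json.loads("""
{
  "_metadata": {
    "calls": [
      {"tool": "sec.fetch_latest_10q",     "duration_ms": 0,   "ok": true},
      {"tool": "sec.fetch_recent_8k",      "duration_ms": 0,   "ok": true},
      {"tool": "market.get_quote_history", "duration_ms": 702, "ok": true},
      {"tool": "market.get_company_info",  "duration_ms": 329, "ok": true},
      {"tool": "search.web_search",        "duration_ms": 923, "ok": true}
    ]
  }
}
""")


def summarize_calls(raw_doc):
    """Return (number of calls, number of failed calls, total duration in ms)."""
    calls = raw_doc["_metadata"]["calls"]
    total_ms = sum(call["duration_ms"] for call in calls)
    errors = sum(1 for call in calls if not call["ok"])
    return len(calls), errors, total_ms


count, errors, total_ms = summarize_calls(raw)
print(f"{count} calls, {errors} errors, {total_ms} ms total")
# 5 calls, 0 errors, 1954 ms total
```

The 1954 ms total is consistent with the quoted ~2 s, with the two SEC fetches contributing nothing because they hit the local cache.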
36 | 49 | <table> |
37 | 50 | <tr> |
38 | 51 | <td width="50%" valign="top"> |
39 | 52 |
|
40 | | -**Sample brief output** |
41 | | -([full markdown](docs/assets/demo_brief_nvda.md)) |
| 53 | +**Brief render** ([GitHub preview](docs/assets/demo_brief_nvda.md) · [raw markdown](docs/assets/demo_brief_nvda.md?plain=1)) |
42 | 54 |
|
43 | | -> NVIDIA reported another quarter of explosive growth, with Q3 FY26 revenue |
44 | | -> reaching **$57.01 billion** [10-Q], up **62.5% YoY**. Net income surged to |
45 | | -> **$31.91 billion**, and the company executed **$12.57 billion in share |
46 | | -> repurchases**… |
47 | | -> |
48 | | -> **Key risk**: inventories increased **+96%** since January 2025 to **$19.78 |
49 | | -> billion**, creating risk of write-downs if demand shifts. |
| 55 | + |
50 | 56 |
|
51 | 57 | </td> |
52 | 58 | <td width="50%" valign="top"> |
53 | 59 |
|
54 | | -**Price chart (3-month, auto-generated)** |
| 60 | +**Price chart** (3-month, auto-generated) |
55 | 61 |
|
56 | 62 |  |
57 | 63 |
|
@@ -79,6 +85,45 @@ and verifies each against `raw_data`. See [HOW_IT_WORKS.md → |
79 | 85 | "Empirical results"](docs/HOW_IT_WORKS.md#empirical-results-v01) for |
80 | 86 | a deeper read of what the eval catches and what it doesn't. |
81 | 87 |
|
| 88 | +<details> |
| 89 | +<summary><strong>Sample factuality run output</strong> — NVDA (perfect) and MSFT (one flag)</summary> |
| 90 | + |
| 91 | +```json |
| 92 | +// eval/runs/NVDA_20260426_factuality.json |
| 93 | +{ |
| 94 | + "score": 1.0, |
| 95 | + "total_claims": 17, |
| 96 | + "verified_claims": 17, |
| 97 | + "flagged": [] |
| 98 | +} |
| 99 | +``` |
| 100 | + |
| 101 | +```json |
| 102 | +// eval/runs/MSFT_20260426_factuality.json |
| 103 | +{ |
| 104 | + "score": 0.944, |
| 105 | + "total_claims": 18, |
| 106 | + "verified_claims": 17, |
| 107 | + "flagged": [ |
| 108 | + { |
| 109 | + "claim": "EPS (diluted) +59.8%", |
| 110 | + "verdict": "contradicted", |
| 111 | + "reason": "The source states that Diluted EPS is $5.16 (vs $3.23 YoY), which implies a YoY increase of 59.75%, not 59.8%." |
| 112 | + } |
| 113 | + ] |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +The MSFT flag is a rounding-precision hit in strict mode: the synthesizer |
| 118 | +rounded the diluted-EPS change ($3.23 → $5.16) to **+59.8%**; the judge |
| 119 | +computed **59.75%** and called it `contradicted`. That is useful signal if |
| 120 | +you care about exact reproducibility and arguably noise otherwise. It is |
| 121 | +kept as-is for v0.1 because explicit miscalibration is more useful than |
| 122 | +silent agreement. The full flagged-claim taxonomy across all 5 runs is in |
| 123 | +HOW_IT_WORKS § Empirical results. |
| 124 | + |
| 125 | +</details> |
| 126 | + |
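The arithmetic behind the two sample runs can be reproduced in a few lines. This is an illustrative sketch, not the project's eval code: `factuality_score` and `yoy_pct` are hypothetical helper names, and the assumption that the score is verified claims divided by total claims, rounded to three places, is inferred from the sample JSON (17/18 ≈ 0.944).

```python
from decimal import Decimal, ROUND_HALF_UP


def factuality_score(verified_claims, total_claims):
    """Assumed scoring rule: fraction of verified claims, rounded to 3 places."""
    return round(verified_claims / total_claims, 3)


def yoy_pct(current, prior, step):
    """Year-over-year change in percent, rounded half-up to the given step."""
    change = (Decimal(current) / Decimal(prior) - 1) * 100
    return change.quantize(Decimal(step), rounding=ROUND_HALF_UP)


print(factuality_score(17, 17))         # 1.0   (NVDA)
print(factuality_score(17, 18))         # 0.944 (MSFT)
print(yoy_pct("5.16", "3.23", "0.1"))   # 59.8  (synthesizer's one-decimal figure)
print(yoy_pct("5.16", "3.23", "0.01"))  # 59.75 (judge's two-decimal figure)
```

Both percentages are correct roundings of the same underlying value (about 59.752%), which is why the flag reads as a precision mismatch and not a factual error.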
82 | 127 | --- |
83 | 128 |
|
84 | 129 | ## Architecture |
|