Tool-using research agent that turns a single ticker into a citation-grounded, chart-illustrated quarterly brief in under two minutes.
Sister project of multi-horizon-financial-llm — the Gemma LoRA adapter trained there will plug in as a swappable synthesis backend in Phase 2 (A/B vs Claude Opus, paired-t methodology consistent with the sister repo's eval).
Status — v0.1 (MVP):
`earnings_recap` works end-to-end on tickers indexed in the sister repo (69 S&P 500). Factuality baseline pinned at 1.00 (17/17 claims verified) on the NVDA reference run — see `eval/runs/`. Anthropic is wired into the `synthesizer_b` slot for the Phase 2 A/B harness; v0.1 ships with Gemini 2.5 because Vertex's new-project quota algorithm rejected six Anthropic quota requests in a row (decision D-009).
```shell
$ mhfa earnings-recap NVDA --output ./outputs
brief : outputs/NVDA_20260426_brief.md
chart : outputs/NVDA_20260426_chart.png
raw   : outputs/NVDA_20260426_raw.json
tools : 5 calls, 0 errors
```

Total wall time on the reference NVDA run: ~2 s tool layer · ~30 s synthesis (Gemini 2.5 Pro) · ~30 s factuality eval (Flash). Per-tool timing from `_metadata.calls` in `outputs/NVDA_20260426_raw.json`:
| Tool | Args | Duration | OK |
|---|---|---|---|
| `sec.fetch_latest_10q` | NVDA | 0 ms (local cache hit) | ✓ |
| `sec.fetch_recent_8k` | NVDA, max=5 | 0 ms (local cache hit) | ✓ |
| `market.get_quote_history` | NVDA, 3mo | 702 ms | ✓ |
| `market.get_company_info` | NVDA | 329 ms | ✓ |
| `search.web_search` | "NVDA latest earnings analyst reaction" | 923 ms | ✓ |
*Brief render (GitHub preview · raw markdown)*
*Price chart (3-month, auto-generated)*
The brief is structured (exec summary → financial highlights table → price action + chart → recent catalysts → key risks → sources). Every numeric claim is required to cite a source already in the raw tool output (the synthesizer's hard rule), and the factuality eval verifies that rule held.
| Ticker | Score | Verified | Source |
|---|---|---|---|
| NVDA | 1.00 | 17 / 17 | eval/runs/NVDA_20260426_factuality.json |
| AAPL | 0.958 | 23 / 24 | eval/runs/AAPL_20260426_factuality.json |
| MSFT | 0.944 | 17 / 18 | eval/runs/MSFT_20260426_factuality.json |
| META | 1.00 | 11 / 11 | eval/runs/META_20260426_factuality.json |
| JPM | 0.955 | 21 / 22 | eval/runs/JPM_20260426_factuality.json |
LLM-as-judge (Gemini 2.5 Flash) extracts every factual claim from the brief
and verifies each against raw_data. See HOW_IT_WORKS.md →
"Empirical results" for
a deeper read of what the eval catches and what it doesn't.
Sample factuality run output — NVDA (perfect) and MSFT (one flag)
```json
// eval/runs/NVDA_20260426_factuality.json
{
  "score": 1.0,
  "total_claims": 17,
  "verified_claims": 17,
  "flagged": []
}
```

```json
// eval/runs/MSFT_20260426_factuality.json
{
  "score": 0.944,
  "total_claims": 18,
  "verified_claims": 17,
  "flagged": [
    {
      "claim": "EPS (diluted) +59.8%",
      "verdict": "contradicted",
      "reason": "The source states that Diluted EPS is $5.16 (vs $3.23 YoY), which implies a YoY increase of 59.75%, not 59.8%."
    }
  ]
}
```

The MSFT flag is a rounding-precision strict-mode hit: the synthesizer
rounded $5.16 / $3.23 to +59.8%; the judge computed 59.75% and
called it contradicted. Strict-mode signal if you care about exact
reproducibility, arguably noise otherwise — kept as-is for v0.1 because
explicit miscalibration is more useful than silent agreement. The full
flagged-claim taxonomy across all 5 runs is in HOW_IT_WORKS § Empirical
results.
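The flag reproduces with plain arithmetic (values taken from the MSFT run above):

```python
# Diluted EPS cited by the MSFT brief's source: $5.16 now vs $3.23 a year ago.
current, prior = 5.16, 3.23
yoy_pct = (current / prior - 1) * 100  # 59.7523...%

# The synthesizer rounded to one decimal place; the judge kept two.
assert round(yoy_pct, 1) == 59.8    # synthesizer's "+59.8%" is a valid rounding
assert round(yoy_pct, 2) == 59.75   # judge's 59.75% — strict comparison flags it
```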
```
User: "earnings-recap NVDA"
        │
        ▼
 ┌──────────────┐
 │   Planner    │  Gemini 2.5 Flash → ordered ToolCall list
 └──────┬───────┘
        ▼
 ┌──────────────┐
 │   Executor   │  sequential, fail-soft per tool
 └──────┬───────┘
        ▼
 ┌──────────┬──────────┼──────────┬──────────┐
 ▼          ▼          ▼          ▼          ▼
┌───────┐ ┌───────┐ ┌────────┐ ┌────────┐ ┌────────┐
│SEC 10Q│ │SEC 8-K│ │yfinance│ │company │ │ Tavily │
└───────┘ └───────┘ └────────┘ │  info  │ │ search │
                               └────────┘ └────────┘
        │
        ▼  raw evidence (JSON)
 ┌──────────────┐
 │ Synthesizer  │  Gemini 2.5 Pro → markdown brief
 └──────┬───────┘  (synthesizer_b: Opus 4.7, Phase 2)
        ▼
 ┌──────────────┐
 │ Factuality   │  Gemini 2.5 Flash → verified/total
 │    judge     │  (thinking_budget: 0, JSON mode)
 └──────────────┘
```
Layered model routing (configs/models.yaml) — Gemini 2.5 Flash for the
planner / mid-loop summaries / judge (cost-and-latency tier), Gemini 2.5 Pro
for the user-facing synthesizer (quality tier). One file controls every model
choice; cost/quality ablations are config changes, not code changes. Phase 2
turns this into a three-way A/B: Gemini 2.5 Pro vs Claude Opus 4.7 vs
Multi-Horizon Gemma adapter through the synthesizer_b / synthesizer_c
slots.
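The role-keyed routing can be sketched in a few lines — a minimal illustration, not the repo's actual loader; the real config lives in `configs/models.yaml` and its exact schema is an assumption here:

```python
# Role → model routing driven by one config dict (stand-in for models.yaml).
# Role names mirror the README; field names are illustrative.
ROLES = {
    "planner":       {"provider": "gemini", "model": "gemini-2.5-flash"},
    "judge":         {"provider": "gemini", "model": "gemini-2.5-flash"},
    "synthesizer":   {"provider": "gemini", "model": "gemini-2.5-pro"},
    # Phase 2 A/B slots swap in here without touching call sites:
    "synthesizer_b": {"provider": "anthropic", "model": "claude-opus"},
}

def resolve(role: str) -> tuple[str, str]:
    """Return (provider, model) for a role; call sites never hard-code models."""
    cfg = ROLES[role]
    return cfg["provider"], cfg["model"]

assert resolve("judge") == ("gemini", "gemini-2.5-flash")
```

Because every call site goes through `resolve`, a cost/quality ablation really is a one-line config change.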
Tool layer is plain Python in v0.1. MCP wrapping is a Phase 2 task — the function signatures are deliberately MCP-shaped (single dict in, single dict out) so the wrapping is mechanical.
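A minimal sketch of that "MCP-shaped" contract — one dict in, one dict out. The function and field names here are illustrative, not the repo's actual signatures:

```python
# Every tool takes a single JSON-serializable dict and returns one, so
# wrapping it as an MCP tool later is mechanical: the dicts are already
# request/response payloads.
def get_quote_history(args: dict) -> dict:
    ticker = args["ticker"]
    period = args.get("period", "3mo")
    # ... the real tool would fetch from yfinance here; stubbed out ...
    return {"ok": True, "ticker": ticker, "period": period, "closes": []}

result = get_quote_history({"ticker": "NVDA"})
assert result["ok"] and result["period"] == "3mo"
```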
```shell
git clone https://github.com/srx7703/multi-horizon-financial-agent
cd multi-horizon-financial-agent
pip install -e ".[dev]"   # or: uv sync
cp .env.example .env
$EDITOR .env              # fill TAVILY_API_KEY, GCP_PROJECT_ID, MHFA_LOCAL_SEC_DIR
mhfa earnings-recap NVDA
```

| Var | Why |
|---|---|
| `GCP_PROJECT_ID` + `VERTEX_REGION` | Required for Gemini on Vertex. `us-central1` is the default and has Gemini 2.5 GA. `us-east5` is the Anthropic region (used only when a role's `provider: anthropic` — see D-009) |
| `ANTHROPIC_BACKEND` | Only matters when a role's `provider: anthropic` (v0.1: just `synthesizer_b`, a Phase 2 slot). `vertex` (default) routes through GCP; `direct` uses `ANTHROPIC_API_KEY` |
| `TAVILY_API_KEY` | Free tier: 1k queries/mo. Skip with `MHFA_SEARCH_PROVIDER=mock` for tests |
| `SEC_USER_AGENT` | SEC blocks requests without a contact-info UA — use a real name + email |
| `MHFA_LOCAL_SEC_DIR` | Path to a dir holding `summaries/`, `summaries_10q/`, `summaries_8k/` (the sister repo's data dump) |
- Plan (`agent/planner.py`) — for `earnings_recap` the plan is fixed: latest 10-Q + recent 8-Ks + 3-month price + company info + web hits.
- Execute (`agent/executor.py`) — calls each tool, captures per-tool timing, never crashes on a single tool failure.
- Chart — yfinance close prices → matplotlib PNG.
- Synthesize (`agent/synthesizer.py`) — Gemini 2.5 Pro gets raw JSON + a hard prompt that requires every numeric claim to trace to a source and forbids estimation (the system rule writes "not disclosed" instead of guessing). Output is markdown with inline citations.
- (Optional) Eval (`eval/factuality.py`) — Gemini 2.5 Flash-as-judge runs two passes: extract every factual claim from the brief, then verify each against `raw_data`. Both passes use Gemini's JSON mode + `thinking_budget: 0` so the judge stays cheap and parseable. Score = verified / total.
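The scoring step reduces to a few lines once the judge returns parseable JSON. A sketch under stated assumptions: the verdict labels (`"supported"` / `"contradicted"`) and field names are illustrative — only score = verified / total comes from the README:

```python
import json

# Hypothetical judge output in JSON mode (two claims, one flagged).
judge_output = json.loads("""
{"claims": [
  {"claim": "Revenue +22% YoY",      "verdict": "supported"},
  {"claim": "EPS (diluted) +59.8%",  "verdict": "contradicted"}
]}
""")

verified = sum(1 for c in judge_output["claims"] if c["verdict"] == "supported")
total = len(judge_output["claims"])
score = round(verified / total, 3)  # 1 verified of 2 → 0.5
assert score == 0.5
```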
| Metric | Method | v0.1 baseline |
|---|---|---|
| Factuality | Two-pass Gemini Flash (extract claims → verify each against raw_data) | 1.00 (17/17) on eval/runs/NVDA_20260426_factuality.json |
| Comprehensiveness | Rubric (0–3) over 5 axes | Phase 2 |
| BERTScore F1 vs golden | RoBERTa-large, paired t-test | Phase 2 — same metric as sister repo for narrative continuity |
Hand-curated golden briefs live in eval/golden/. They are written from raw
sources by hand, never LLM-generated (decision D-006 — avoids
evaluator/generator collapse).
```
src/mhfa/
├── tools/       SEC, market data, web search — pluggable
├── agent/       planner + executor + synthesizer
├── models/      client.py — provider-agnostic complete_text adapter
│                (dispatches Gemini ↔ Anthropic per role)
├── workflows/   earnings_recap (more in Phase 2)
├── eval/        factuality (v0.1) + ab_harness (Phase 2)
└── cli.py       entry point: `mhfa earnings-recap <TICKER>`

configs/models.yaml   role → {provider, model, max_tokens, temperature, …}
eval/golden/          hand-curated reference briefs (D-006)
eval/runs/            pinned eval results (factuality, BERTScore in Phase 2)
tests/                hermetic — fake completion adapter + fake SEC dir
```
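The hermetic-test pattern named above can be sketched as a fake completion adapter that stands in for the real Gemini/Anthropic clients so tests never touch the network — class and method names here are illustrative, not the repo's:

```python
class FakeCompletionAdapter:
    """Returns canned text per role and records calls for assertions."""

    def __init__(self, canned: dict[str, str]):
        self.canned = canned          # role → canned response text
        self.calls: list[str] = []    # roles invoked, in order

    def complete_text(self, role: str, prompt: str) -> str:
        self.calls.append(role)
        return self.canned[role]

fake = FakeCompletionAdapter({"synthesizer": "# NVDA Earnings Brief\n..."})
assert fake.complete_text("synthesizer", "ignored").startswith("# NVDA")
assert fake.calls == ["synthesizer"]
```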
- `docs/HOW_IT_WORKS.md` — long-form walkthrough of the design tensions and how each one resolved (planner shape, Vertex vs direct, layered routing, hand-curated golden briefs, what I'd do differently). Read this if you want to understand why the moving parts are shaped the way they are.
- `DECISIONS.md` — D-001 through D-009, terse one-paragraph rationale per architectural choice. D-009 covers the Vertex Anthropic quota Catch-22 and the pivot to Gemini 2.5.
- `SPRINT_PLAN.md` — the actual hour-by-hour MVP plan.
- `eval/golden/` — hand-curated reference briefs used as the Phase 2 A/B baseline.
See ROADMAP.md for full phasing. Headline:
- v0.1 (this release) — `earnings_recap` end-to-end, factuality eval, 3+ golden briefs, CI green.
- v0.5 — Multi-Horizon Gemma adapter integration, true A/B with paired-t, `ma_drilldown` + `sector_compare` workflows.
- v1.0 — Streamlit UI, Docker self-host, watchlist cron, cost-aware routing, observability dashboard, public release.
MIT — see LICENSE.
multi-horizon-financial-llm
is the fine-tuning + RAG side: 69 S&P 500 tickers × 381 SEC filings, two
PEFT LoRA adapters (Gemma 2 27B and Gemma 4 31B) trained on TPU v6e-8.
HF Hub: Srx7703/gemma-{2-27b,4-31b}-financial-adapter. The two repos
cross-reference; the agent here will A/B those adapters against Opus in
Phase 2.

