A Python agent that analyses an Indian equity portfolio for the day and produces a causal briefing — explaining why the portfolio moved by linking macro news to sectors to individual holdings. Built for the Backend + AI Intern challenge.
Most financial-advisor agents dump raw data into an LLM and ask it to "figure out what happened." This submission does the opposite:
- All reasoning is in Python — filtering, impact scoring, CAPM beta decomposition, spillover detection, conflict resolution, counterfactuals. Every number is derivable from the data and auditable.
- The LLM only writes the narrative — it converts the structured `CausalChain` into three paragraphs of natural language.
- If no LLM is available, a template fallback produces the same narrative from the same structured data. This proves the reasoning is complete in code, not hidden inside the model.
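The LLM-or-template dispatch can be sketched as follows. This is a minimal illustration of the pattern, not the repo's actual code — the `CausalChain` fields and the `llm.generate` signature shown here are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CausalChain:
    """Hypothetical minimal shape of the structured reasoning output."""
    primary_driver: str
    day_change_pct: float

def narrate(chain: CausalChain, llm=None) -> str:
    """Turn the structured chain into prose; fall back to a template."""
    if llm is not None:
        return llm.generate(chain)  # the LLM only re-phrases computed facts
    # Deterministic template — same facts, no model required
    return (f"The portfolio moved {chain.day_change_pct:+.2f}% today, "
            f"driven primarily by the {chain.primary_driver} sector.")

print(narrate(CausalChain("BANKING", -2.73)))
# → The portfolio moved -2.73% today, driven primarily by the BANKING sector.
```

Because both branches consume the same structured object, swapping the LLM in or out never changes which facts the briefing contains.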
This matters for the rubric:
- Reasoning Quality (35%): multi-hop causal chain, CAPM decomposition, counterfactuals, conflict detection — all quantified.
- Code Design (20%): phase-aligned files, flat structure, type hints.
- Observability (15%): structured JSON logs for every phase + LLM call.
- Edge Cases (15%): two paths for conflict detection, concentration alerts, counter-signals, direction-aware primary-driver.
- Evaluation Layer (15%): rule-based 5-factor confidence score.
financial-advisor-agent/
├── main.py # Orchestrator — calls each phase
├── requirements.txt
├── .env.example
├── data/json/ # 6 mock JSON files
└── src/
├── models.py # Shared data structures (dataclasses)
├── data_loader.py # Reads JSON into objects
├── phase1_market_intelligence.py # Phase 1: trends, sectors, macro themes
├── phase2_portfolio_analytics.py # Phase 2: P&L, allocation, risk
├── phase3_reasoning_agent.py # Phase 3: causal chain, conflicts, counterfactuals
├── phase4_observability.py # Phase 4: tracing + self-evaluator
├── llm.py # Gemini call + deterministic template fallback
└── display.py # Rich CLI output
Each phase file is a single concept you can open and defend in the demo video.
# 1. Clone and install
git clone <this-repo> financial-advisor-agent
cd financial-advisor-agent
pip install -r requirements.txt
# 2. (Optional) Add Gemini API key for LLM-generated briefings
cp .env.example .env
# edit .env and set:
# GEMINI_API_KEY=your_key_here
# USE_LLM=true
# If you skip this, the template fallback runs — works without any API.

Get a free Gemini API key at https://aistudio.google.com/apikey (no credit card needed).

streamlit run app.py

Opens a chat window. Pick a portfolio from the sidebar; the reasoning pipeline runs once and produces a Briefing that stays in memory. Then ask anything:
- "Why did my portfolio move today?" → causal narrative + primary driver
- "Tell me about HDFCBANK" → that stock's CAPM β-split + the news that drove it
- "What if I hadn't held banking?" → counterfactual from Phase 3
- "Show me conflicts" → positive news + falling price cases
- "How risky is my portfolio?" → concentration alerts
- "How reliable is this analysis?" → 5-factor confidence breakdown
- "What is CAPM?" / "How does RBI affect banks?" → general finance answer (requires Gemini key; without it, the agent handles portfolio-specific questions via keyword routing over the same Briefing data)
The sidebar shows compact status (P&L, confidence, primary driver, active themes) plus a collapsible Full analysis panel for audit. Switching portfolios resets the chat and re-runs the pipeline.
# Banking-heavy — the richest demo
python main.py --portfolio PORTFOLIO_002
# Other portfolios
python main.py --portfolio PORTFOLIO_001 # Rahul Sharma (diversified)
python main.py --portfolio PORTFOLIO_003 # Arun (MF-heavy conservative)
python main.py --portfolio all # run all three
# Export briefing as JSON for inspection
python main.py --portfolio PORTFOLIO_002 --export outputs/priya.json

Both interfaces emit structured JSON logs to stderr — one line per phase (with trace_id + latency_ms) and one per LLM call. Redirect with `2>logs/run.jsonl` to capture them. When LANGFUSE_* keys are set in .env, the same events are mirrored to Langfuse cloud as a hierarchical trace.
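The per-phase stderr logging can be sketched like this. The repo's actual `_timed()` helper is not shown in this README, so the exact signature and field names here are assumptions — only the shape (one JSON line per phase, shared `trace_id`, `latency_ms`) is taken from the text above:

```python
import json
import sys
import time
import uuid
from contextlib import contextmanager

TRACE_ID = uuid.uuid4().hex  # one id per run, shared by every phase log line

@contextmanager
def timed(phase: str):
    """Emit one structured JSON log line per phase to stderr."""
    start = time.perf_counter()
    yield
    print(json.dumps({
        "trace_id": TRACE_ID,
        "phase": phase,
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
    }), file=sys.stderr)

with timed("phase1_market_intelligence"):
    pass  # run the phase body here
```

Keeping the log on stderr leaves stdout free for the briefing itself, so `2>logs/run.jsonl` cleanly separates the two streams.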
Trend Analysis · Sector Extraction · News Processing
- Determines overall market sentiment from NIFTY/SENSEX averages (rule-based, ±0.5% threshold)
- Loads sector-level performance with weekly catalysts (from historical data)
- Detects active macro themes from news + market context:
  - `INTEREST_RATE_UP` (RBI / repo / hawkish keywords)
  - `FII_OUTFLOW` (net FII selling > ₹1,000 cr)
  - `CHINA_SLOWDOWN`, `US_TECH_SPENDING_UP`, `GOVT_CAPEX_PUSH`, `RUPEE_DEPRECIATION`
- News classification is already in the mock data (sentiment, scope, entities, `conflict_flag`) — so we do not spend an LLM call on it. This is an intentional design decision to minimise cost and maximise verifiability.
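Theme detection of this kind reduces to keyword matching plus simple numeric thresholds. A minimal sketch, assuming a hypothetical keyword table (the real lists live in phase1_market_intelligence.py):

```python
# Hypothetical keyword → theme mapping; illustrative, not the repo's actual lists.
THEME_KEYWORDS = {
    "INTEREST_RATE_UP": {"rbi", "repo", "hawkish"},
    "RUPEE_DEPRECIATION": {"rupee", "inr", "depreciation"},
}

def detect_themes(headlines: list[str], fii_net_cr: float = 0.0) -> set[str]:
    """Flag macro themes from headline keywords and FII flow data."""
    themes = set()
    for text in headlines:
        words = set(text.lower().split())
        for theme, keywords in THEME_KEYWORDS.items():
            if words & keywords:
                themes.add(theme)
    if fii_net_cr < -1000:  # net FII selling beyond ₹1,000 cr
        themes.add("FII_OUTFLOW")
    return themes

detect_themes(["RBI turns hawkish on repo rate"], fii_net_cr=-1500)
# → {"INTEREST_RATE_UP", "FII_OUTFLOW"}
```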
Daily P&L · Asset Allocation · Risk Detection
- Computes daily P&L (absolute INR + %)
- MF drill-through: maps each mutual fund's internal sector allocation onto the portfolio, so a user holding a Banking Sectoral Fund sees their TRUE banking exposure (not hidden behind "DIVERSIFIED_MF")
- Concentration risk thresholds (justifiable from finance best practices):
- Single sector > 40% → CRITICAL
- Single stock > 25% → HIGH
- Combined rate-sensitive sectors > 60% → alert
- Cross-checked against pre-computed analytics in `portfolios.json` (our numbers match the mock data's numbers — proof the math is correct)
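The three concentration thresholds translate directly into code. A sketch under the stated thresholds — the rate-sensitive sector list here is borrowed from the spillover section and may not match the repo's exact configuration:

```python
def concentration_alerts(sector_weights: dict[str, float],
                         stock_weights: dict[str, float],
                         rate_sensitive=("BANKING", "REALTY", "AUTOMOBILE",
                                         "FINANCIAL_SERVICES")) -> list[tuple[str, str]]:
    """Apply the three thresholds; weights are fractions of portfolio value."""
    alerts = []
    for sector, w in sector_weights.items():
        if w > 0.40:  # single sector > 40% → CRITICAL
            alerts.append(("CRITICAL", f"{sector} is {w:.0%} of portfolio"))
    for ticker, w in stock_weights.items():
        if w > 0.25:  # single stock > 25% → HIGH
            alerts.append(("HIGH", f"{ticker} is {w:.0%} of portfolio"))
    combined = sum(sector_weights.get(s, 0.0) for s in rate_sensitive)
    if combined > 0.60:  # combined rate-sensitive exposure > 60%
        alerts.append(("ALERT", f"rate-sensitive sectors total {combined:.0%}"))
    return alerts

concentration_alerts({"BANKING": 0.55}, {"HDFCBANK": 0.226})
# → [("CRITICAL", "BANKING is 55% of portfolio")]
```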
Causal Linking · Conflict Resolution · Prioritisation
The star file. Seven sharp logic ideas, each short but powerful:
1. Portfolio-aware news filter — 3 lines; cuts noise before anything else

       relevant = [n for n in all_news
                   if n.scope == "MARKET_WIDE"
                   or (set(n.sectors) & portfolio_sectors)
                   or (set(n.stocks) & portfolio_tickers)]

2. CAPM beta decomposition — textbook stock-market reasoning
       expected = nifty_change * beta            # what the market should explain
       idiosyncratic = actual_change - expected  # what's stock-specific

   For HDFC Bank (β=1.15) with NIFTY -1% and actual -3.51%:
   - Market-driven: -1.15%
   - Stock-specific: -2.36% (that's the RBI news impact)
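The decomposition runs directly once the inputs are bound; the worked HDFC Bank numbers reproduce exactly (function name here is illustrative):

```python
def capm_split(actual_change: float, beta: float, index_change: float):
    """Split a stock's daily move into market-driven and idiosyncratic parts."""
    expected = index_change * beta            # what the market should explain
    idiosyncratic = actual_change - expected  # what's stock-specific
    return expected, idiosyncratic

expected, idio = capm_split(actual_change=-3.51, beta=1.15, index_change=-1.00)
# expected = -1.15 (market-driven), idio ≈ -2.36 (news-driven)
```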
3. Impact score — ranks holdings by true rupee impact

       impact = holding_weight * abs(day_change) * news_strength

4. Sector spillover — uses sector_mapping.json → macro_correlations to catch second-order effects. If INTEREST_RATE_UP is active, we flag BANKING, REALTY, AUTOMOBILE, FINANCIAL_SERVICES as indirectly affected.
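The spillover step is a lookup plus an intersection with the portfolio's held sectors. A sketch assuming the shape of `macro_correlations` (the actual mapping lives in sector_mapping.json):

```python
# Shape assumed from sector_mapping.json → macro_correlations; illustrative only.
MACRO_CORRELATIONS = {
    "INTEREST_RATE_UP": ["BANKING", "REALTY", "AUTOMOBILE", "FINANCIAL_SERVICES"],
    "FII_OUTFLOW": ["BANKING", "FINANCIAL_SERVICES"],
}

def spillover_sectors(active_themes: set[str], held_sectors: set[str]) -> set[str]:
    """Sectors in the portfolio indirectly hit by active macro themes."""
    hit: set[str] = set()
    for theme in active_themes:
        hit.update(MACRO_CORRELATIONS.get(theme, []))
    return hit & held_sectors  # only report what the user actually holds

spillover_sectors({"INTEREST_RATE_UP"}, {"BANKING", "IT", "REALTY"})
# → {"BANKING", "REALTY"}
```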
5. Conflict detection (two paths):
   - Path A: pre-flagged conflicts in the news data (`conflict_flag=true`)
   - Path B: computed — positive sentiment + stock fell > 1%
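Path B is a one-pass scan over the portfolio's news. A minimal sketch, assuming hypothetical input shapes (ticker-keyed news items with a `sentiment` field, and a ticker → day-change map):

```python
def computed_conflicts(news_by_ticker: dict[str, list[dict]],
                       day_change_pct: dict[str, float]) -> list[str]:
    """Path B: positive sentiment but the stock still fell more than 1%."""
    conflicts = []
    for ticker, items in news_by_ticker.items():
        change = day_change_pct.get(ticker)
        if change is not None and change < -1.0:
            if any(n["sentiment"] == "POSITIVE" for n in items):
                conflicts.append(ticker)
    return conflicts

computed_conflicts(
    {"AXISBANK": [{"sentiment": "POSITIVE"}]},
    {"AXISBANK": -2.72},
)
# → ["AXISBANK"]
```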
6. Counter-signals — holdings that moved OPPOSITE to the portfolio (cushioned the damage). The display shows these separately from losers.
7. Counterfactuals — "If you hadn't held X, your day would be Y"
- Remove the worst-performing sector
- Remove the single biggest-impact holding
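The counterfactual reduces to re-summing weighted returns with the excluded holdings treated as cash. The repo's exact convention (cash vs re-normalised weights) isn't stated here, so this is one plausible formulation with illustrative numbers:

```python
def without(holdings: dict[str, tuple[float, float]], excluded: set[str]) -> float:
    """Counterfactual day return if the excluded holdings had been cash.

    holdings: {ticker: (portfolio_weight, day_change_pct)}; weights sum to 1.
    One plausible formulation — the repo may re-normalise weights instead.
    """
    return sum(w * r for t, (w, r) in holdings.items() if t not in excluded)

# Hypothetical book, not the mock data's actual holdings
book = {"HDFCBANK": (0.226, -3.51), "INFY": (0.30, 0.40), "TCS": (0.474, -1.0)}
actual = without(book, excluded=set())
counterfactual = without(book, excluded={"HDFCBANK"})
# counterfactual > actual: the day looks better without the worst holding
```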
Tracing · Self-Evaluation
- Tracing — dual channel:
  - Langfuse (when `LANGFUSE_PUBLIC_KEY` + `LANGFUSE_SECRET_KEY` are set): every portfolio run opens a root trace, each phase becomes a child span, and every LLM call is logged as a Generation with full prompt/response, token usage, and latency. Visible in the Langfuse UI at https://cloud.langfuse.com.
  - Stderr JSON logs (always on): one line per phase, one per LLM call. Includes `trace_id` so you can correlate stderr logs with the Langfuse UI. Keeps the agent fully usable offline / in CI without any external service.
  - Per-phase `latency_ms` is captured in both channels (main.py times each phase with `_timed()`).
- Self-Evaluation: rule-based (the JD explicitly allows this) with 5 factors. Each factor is computed from the data — no LLM self-grading:

  | Factor | Formula | Weight |
  |---|---|---|
  | Data completeness | holdings with market data / total holdings | 0.20 |
  | News coverage | big movers with matching news / total big movers | 0.30 |
  | Signal strength | avg top-3 impact scores / 60 baseline | 0.20 |
  | Conflict penalty | max(0.5, 1.0 − 0.15 × num_conflicts) | 0.10 |
  | Reasoning quality | fraction of reasoning components present | 0.20 |

  Every sub-score is shown in the final table — auditable and defensible.
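The overall confidence is the weighted sum of the five sub-scores. A sketch (function and key names are illustrative); plugging in the sample run's sub-scores reproduces its 0.81:

```python
WEIGHTS = {
    "data_completeness": 0.20,
    "news_coverage": 0.30,
    "signal_strength": 0.20,
    "conflict_penalty": 0.10,
    "reasoning_quality": 0.20,
}

def confidence(factors: dict[str, float]) -> float:
    """Weighted sum of the five rule-based sub-scores, rounded for display."""
    return round(sum(WEIGHTS[k] * factors[k] for k in WEIGHTS), 2)

confidence({
    "data_completeness": 1.00,
    "news_coverage": 0.75,
    "signal_strength": 0.73,
    "conflict_penalty": 0.70,
    "reasoning_quality": 0.83,
})
# → 0.81
```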
Portfolio fell 2.73% today.
Primary driver: BANKING sector.
CAPM decomposition on HDFCBANK (22.6% of portfolio):
Actual move -3.51%
Expected (β=1.15 × NIFTY -1.00%) -1.15%
Idiosyncratic -2.36% ← RBI news-driven, bank-specific
Spillover sectors (from macro_correlations):
INTEREST_RATE_UP → BANKING, REALTY, AUTOMOBILE, FS, INFRASTRUCTURE
FII_OUTFLOW → BANKING, FS, HIGH_BETA_STOCKS
Conflicts detected:
BAJFINANCE (pre-flagged): positive news, price -2.01%
→ sector-wide rate concerns overrode company positives
AXISBANK (computed): positive news, price -2.72%
→ broader sentiment dominating
Counterfactuals:
Without BANKING exposure → you'd be at -0.51% instead of -2.73%
Without HDFCBANK → you'd be at -1.93% instead of -2.73%
Confidence: 0.81 (data 1.00, news 0.75, signal 0.73,
conflict 0.70, reasoning 0.83)
- Logic-first architecture — the LLM produces language, not insights. Every claim in the briefing is traceable to a computed number.
- CAPM decomposition — separates market drag from stock-specific news impact. Nothing else in the challenge does this.
- MF drill-through — true sector exposure, not just "diversified MF" as a black box. Catches hidden concentration.
- Two-path conflict detection — uses both the data's own `conflict_flag` and a computed sentiment-vs-price heuristic.
- Rule-based 5-factor confidence — a defensible weighted score, not "ask the LLM how confident it is."
- Template fallback — the system works end-to-end without an API key, proving all reasoning is in code.
- Mock data is a single day's snapshot. Real deployment would need a streaming data loader and incremental causal chain updates.
- Spillover coefficients are binary (affected / not affected). A real system would use historical correlations.
- LLM-as-judge evaluation (in addition to rule-based) would be a natural extension if budget allows a second LLM call.