Skip to content

finos-labs/finsight-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

badge-labs

FinSight Agent: Metacognitive Multi-Agent Financial Analysis System

An advanced multi-agent system that performs comprehensive financial analysis of earnings calls using metacognitive reasoning and specialized expert agents.


📑 Table of Contents


Overview

FinSight Agent is a production-ready multi-agent system that analyzes earnings call transcripts using metacognitive self-awareness and LLM-as-Judge patterns for runtime quality control:

  • Metacognitive Self-Model: System maintains awareness of its own capabilities, limitations, and confidence thresholds
  • LLM-as-Judge: Each agent self-scores confidence in findings; automated guardrail enforcement at runtime
  • 4 Specialized Agents: Coordinator, Sentiment Analysis, Event Detection, and Volatility Prediction
  • External Tool Validation: Tavily news search, SEC EDGAR filings, and yfinance market data
  • Confidence-Based Guardrails: Automated quality control with 60-70% minimum confidence thresholds
  • Individual Reports: Separate markdown outputs for each agent plus comprehensive final report with transparency
  • Transparent Uncertainty: All outputs include confidence scores, limitations, and guardrail check results

Key Features

Metacognitive Self-Model - System maintains self-awareness of capabilities, limitations, and boundaries
LLM-as-Judge Integration - Each agent self-scores confidence; runtime guardrail enforcement
Multi-Agent Orchestration - LangGraph-based workflow with metacognitive coordinator
Tool-Enhanced Validation - Each agent uses external APIs to verify findings
Confidence Threshold Guardrails - Automated quality control with 65-70% minimum thresholds
Transparent Uncertainty - All outputs include confidence levels and limitations
Structured Outputs - Pydantic models ensure type safety and validation
Individual Reports - 4 separate markdown reports per analysis
Configurable - Customizable questions, thresholds, and model parameters
Production-Ready - Proper error handling, logging, and CLI interface


Quick Start

1. Install

cd /Users/vincent/Code/finsight-agent
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

2. Configure

cp .env.example .env
# Edit .env with your API keys:
# - SCW_DEFAULT_PROJECT_ID
# - SCW_SECRET_KEY
# - TAVILY_API_KEY

3. Test Setup

python test_setup.py

4. Run Analysis

python -m src.main \
  --transcript data/input/Alphabet_2025_Q1_Earnings_Call_complete_transcript.txt \
  --ticker GOOGL

5. View Reports

ls -l data/output/

Installation

Prerequisites

  • Python 3.11+
  • Internet connection (for API calls)
  • ~500 MB disk space

Setup Steps

  1. Navigate to project directory:

    cd /Users/vincent/Code/finsight-agent
  2. Create virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # macOS/Linux
    # OR
    .venv\Scripts\activate     # Windows
  3. Install dependencies:

    pip install -r requirements.txt

    Or using UV (recommended):

    uv pip install -e .
  4. Verify installation:

    python test_setup.py

Configuration

Required Environment Variables

Create a .env file in the project root:

cp .env.example .env

Add your API keys:

# Scaleway GenAI API
SCW_DEFAULT_PROJECT_ID=your_scaleway_project_id_here
SCW_SECRET_KEY=your_scaleway_secret_key_here

# Tavily Search API
TAVILY_API_KEY=your_tavily_api_key_here

Getting API Keys

Scaleway GenAI:

  1. Go to https://console.scaleway.com/
  2. Navigate to AI/ML → Generative APIs
  3. Create project and get credentials

Tavily Search:

  1. Go to https://tavily.com/
  2. Sign up and get API key

Model Configuration

The system uses Scaleway GenAI with these default parameters:

Model: qwen3-235b-a22b-instruct-2507
Temperature: 0.0 (deterministic)
Max Tokens: 2048

To modify, edit src/config.py:

@dataclass
class ScalewayConfig:
    model: str = "qwen3-235b-a22b-instruct-2507"
    temperature: float = 0.0
    max_tokens: int = 2048

Usage

Web Interface (Streamlit)

Launch the interactive web application:

streamlit run app.py

The Streamlit interface provides a professional dashboard that displays:

  • Sentiment analysis with confidence metrics and news validation
  • Event detection with SEC filing verification
  • Volatility predictions with market data
  • Comprehensive final report with metacognitive analysis and guardrails
  • Tabbed interface separating analysis outputs from audit/validation data

Note: Currently uses a stubbed API that reads pre-generated reports from data/output/ for demonstration. To integrate with live backend, replace API stubs in api.py with calls to src.orchestrator.FinSightOrchestrator.


Command-Line Interface

Basic usage:

python -m src.main --transcript <PATH> --ticker <SYMBOL>

With custom query:

python -m src.main \
  --transcript data/input/transcript.txt \
  --ticker GOOGL \
  --query "Focus on AI initiatives and revenue impact"

With custom output directory:

python -m src.main \
  --transcript data/input/transcript.txt \
  --ticker AAPL \
  --output ./my_reports

CLI Options

Required:
  -t, --transcript PATH    Path to earnings call transcript (.txt format)
  -s, --ticker SYMBOL      Company ticker symbol (e.g., GOOGL, AAPL)

Optional:
  -q, --query TEXT         Custom analysis query
  -o, --output DIR         Output directory for reports (default: data/output)

Programmatic Usage

from src.orchestrator import FinSightOrchestrator

# Initialize orchestrator
orchestrator = FinSightOrchestrator()

# Run analysis
result = orchestrator.run_analysis(
    transcript_path="data/input/transcript.txt",
    ticker="GOOGL",
    user_query="Comprehensive financial analysis"  # Optional
)

# Access individual results
sentiment = result['sentiment_result']
print(f"Sentiment: {sentiment.overall_sentiment}")
print(f"Score: {sentiment.sentiment_score}")
print(f"Confidence: {sentiment.confidence}")

events = result['event_detection_result']
print(f"Events detected: {events.total_events_found}")

volatility = result['volatility_result']
print(f"Predicted volatility: {volatility.predicted_volatility}")

# Access final report
final_report = result['final_report']

Demo Mode

Run an interactive demonstration:

python demo.py

This will:

  • Check prerequisites
  • Run a complete analysis on sample data
  • Display results summary
  • Show generated reports

Architecture

Understanding the Metacognitive Execution Flow

FinSight Metacognitive Flow Diagram

What you are looking at is the cognitive architecture of the FinSight Agent. In traditional RAG or agent systems, the flow is often linear: Input → Retrieve → Generate. But in high-stakes financial domains, that isn't enough. We need safety, accuracy, and self-correction.

This diagram illustrates our Metacognitive Execution Flow. It's not just about doing the task; it's about monitoring the task while it's happening. Let's walk through the journey of a user query as it navigates our Agentic Execution Loop, our Dynamic Guardrails, and our LLM-as-Judge evaluation system.

Phase 1: Orchestration & Context

It starts here on the left. The User Query enters the system and hits our Orchestration Agent.

The Orchestrator: Think of this as the project manager. It doesn't do the heavy lifting; it plans. It decomposes the query into actionable sub-tasks.

The Self Model (Gray Box): Crucially, the orchestrator consults the 'Self Model.' This is the agent's memory and context awareness. It asks: "What do I know about this user? What is the current market context? Am I hallucinating stale data?" This prevents the agent from starting off with bad assumptions.

Phase 2: The Agentic Execution Loop

Once the plan is set, we enter the Blue Box: the Agentic Execution Loop. This is where the work gets done by our specialist sub-agents. We don't rely on one generalist model; we delegate to specialists:

  • Sentiment Agent: Scrapes news and social signals to gauge market feeling.
  • Event Agent: Looks for hard quantitative data—earnings calls, fed rate changes, or specific filings.
  • Volatility Agent: Analyzes risk metrics and generates VIX-related insights.

These agents work in parallel or sequence depending on the Orchestrator's plan, synthesizing raw data into a financial narrative.

Phase 3: The Metacognitive Monitor

Now, this is the differentiator. Most agents would just take that data and write a response. FinSight does not.

It passes the data through the Metacognitive Monitor (The Red Box). This is a real-time supervisor sitting inside the execution loop. It performs a Dynamic Guardrail Check:

  • Compliance & Safety: Is this financial advice? (If so, flag it).
  • Data Integrity: Do the numbers from the Volatility Agent match the trend from the Sentiment Agent?
  • Hallucination Check: Does the insight exist in the retrieved documents?

The Decision Diamond:

  • 'No' (Pass): If no violation is found, we follow the Green line out to draft the response.
  • 'Yes' (Violation): If a violation is found, we trigger a Corrective Signal (The Red arrow). We don't crash; we self-correct immediately before generating a draft.

Phase 4: Draft & Judge

Assuming we pass the guardrails, we generate a Draft Response. But we still don't show it to the user. We send it to the LLM-as-Judge. This is a separate, highly capable model prompt-engineered strictly for evaluation. It scores the response based on custom scoring metrics:

  • Accuracy
  • Tone (Professional/Financial)
  • Completeness

The Evaluation Diamond: This is the final gatekeeper.

  • Score < Threshold (Red Path): If the response is mediocre or unsafe, the Judge rejects it. It sends feedback all the way back to the Orchestration Agent or the Correction Strategy module. The system literally 'retries' the thought process with new instructions.
  • Score Acceptable (Green Path): Only when the score meets our high-quality threshold do we move to the final step.

Conclusion: The Feedback Loop

This Red Feedback Loop is what makes the agent 'Metacognitive.' It thinks about its thinking, learns from its own attempts within the session. It adjusts its plan, maybe asks the Sentiment Agent to look deeper, or asks the Volatility Agent to check a different timeframe.

Ultimately, the user receives the Final Response Delivery (Green Box). To them, it looks like a single instant answer. But under the hood, FinSight has planned, executed, monitored, fact-checked, judged, and potentially re-done the work to ensure the financial insight is actionable and safe.

Also, as part of the Execution Flow process, we collected the entire trajectory and output of actions / evaluations during execution, which can serve for audit / traceability purposes and improve the system design in future iterations.

Data Flow Summary

  1. User Input → Transcript path + ticker symbol
  2. Orchestrator → Metacognitive planning with self-model consultation
  3. Load Transcript → Reads and validates file
  4. Agentic Execution Loop → Specialist agents process in parallel:
    • Sentiment Agent → Tavily validation → Self-evaluation
    • Event Agent → SEC EDGAR validation → Self-evaluation
    • Volatility Agent → yfinance validation → Self-evaluation
  5. Metacognitive Monitor → Dynamic guardrail checks during execution
  6. Draft Generation → Synthesize findings with transparency
  7. LLM-as-Judge → Quality scoring and threshold enforcement
  8. Feedback Loop → Corrective actions if quality < threshold
  9. Final Output → 4 markdown reports saved to data/output/ with audit trail

Metacognitive Self-Model & LLM-as-Judge

Overview

FinSight implements runtime metacognition and LLM-as-Judge patterns as core guardrail mechanisms. The system maintains self-awareness of its capabilities, limitations, and confidence levels throughout execution.

Self-Model Architecture

The system maintains a comprehensive self-model (FinSightSelfModel) that includes:

FinSightSelfModel:
  ├── Mission: System's core purpose and goals
  ├── Agent Capabilities: What each agent can/cannot do
  │   ├── Capabilities list
  │   ├── Limitations list
  │   └── Confidence threshold (guardrail)
  ├── Operating Boundaries: Hard limits and ethical constraints
  ├── Active Guardrails: Runtime checks and validations
  └── Guardrail Violations: Tracked incidents

LLM-as-Judge: Confidence Scoring

Each agent implements self-evaluation by scoring its own findings:

1. Coordinator Self-Assessment

MetacognitiveDecision:
  ├── user_intent: Understood goal
  ├── analysis_plan: Step-by-step approach
  ├── agents_to_invoke: Selected specialists
  ├── confidence: Self-scored (0-1)  ← LLM judges itself
  └── reasoning: Explanation of plan

2. Agent Self-Scoring During Execution

Each specialized agent scores its confidence:

Agent Self-Scores Threshold Guardrail Action
Sentiment Sentiment confidence (0-1) ≥65% Flag if below threshold
Event Detection Event confidence (0-1) ≥70% Flag if below threshold
Volatility Prediction confidence (0-1) ≥60% Flag if below threshold

3. Runtime Guardrail Enforcement

┌─────────────────────────────────────────────────┐
│  Agent completes analysis                       │
│  ↓                                              │
│  Agent scores own confidence (LLM-as-Judge)     │
│  ↓                                              │
│  Compare: confidence >= threshold?              │
│  ├─ YES → ✓ Pass guardrail check                │
│  └─ NO  → ⚠ Record guardrail violation          │
│     ↓                                           │
│     Add to guardrails_applied list              │
│     ↓                                           │
│     Include warning in final report             │
└─────────────────────────────────────────────────┘

Metacognitive Flow Diagram

┌──────────────────────────────────────────────────────────┐
│                     SYSTEM START                         │
│              Load FinSightSelfModel                      │
│   (Mission, Capabilities, Boundaries, Thresholds)        │
└───────────────────────┬──────────────────────────────────┘
                        ↓
┌──────────────────────────────────────────────────────────┐
│               COORDINATOR (Metacognitive)                │
│  1. Analyze user query                                   │
│  2. Plan analysis approach                               │
│  3. Self-score: confidence in understanding              │
│  4. Check: confidence meets expectations?                │
│     └─ Record decision reasoning                         │
└───────────────────────┬──────────────────────────────────┘
                        ↓
┌──────────────────────────────────────────────────────────┐
│                  FOR EACH AGENT:                         │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 1. Execute Analysis (with external tool validation)│  │
│  │    ├─ Sentiment: Analyze + Tavily news             │  │
│  │    ├─ Events: Detect + SEC EDGAR filings           │  │
│  │    └─ Volatility: Predict + yfinance data          │  │
│  └────────────────────┬───────────────────────────────┘  │
│                       ↓                                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 2. LLM-as-Judge: Self-Score Confidence             │  │
│  │    - Agent evaluates own findings                  │  │
│  │    - Produces confidence score (0-1)               │  │
│  │    - Example: "I am 72% confident in this result"  │  │
│  └────────────────────┬───────────────────────────────┘  │
│                       ↓                                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │ 3. Guardrail Check (Runtime)                       │  │
│  │    Compare: agent_confidence >= threshold?         │  │
│  │    ├─ Sentiment: >= 65%?                           │  │
│  │    ├─ Events: >= 70%?                              │  │
│  │    └─ Volatility: >= 60%?                          │  │
│  └────────────────────┬───────────────────────────────┘  │
│                       ↓                                  │
│               ┌──────────────────┐                       │
│               │ Below threshold? │                       │
│               └────┬─────────┬───┘                       │
│                YES ↓         ↓ NO                        │
│     ┌──────────────────┐   ✓ Continue                    │
│     │ Record Violation │                                 │
│     │ - Timestamp      │                                 │
│     │ - Agent name     │                                 │
│     │ - Guardrail type │                                 │
│     │ - Description    │                                 │
│     │ - Action taken   │                                 │
│     └──────────────────┘                                 │
└───────────────────────┬──────────────────────────────────┘
                        ↓
┌──────────────────────────────────────────────────────────┐
│              SYNTHESIZE FINAL REPORT                     │
│  1. Combine all agent findings                           │
│  2. Include metacognitive analysis                       │
│  3. Display confidence summary table:                    │
│     ┌────────────┬────────────┬───────────┬─────────┐    │
│     │ Agent      │ Confidence │ Threshold │ Status  │    │
│     ├────────────┼────────────┼───────────┼─────────┤    │
│     │ Sentiment  │ 72%        │ 65%       │ ✓ Pass  │    │
│     │ Events     │ 68%        │ 70%       │ ⚠ Low   │    │
│     │ Volatility │ 75%        │ 60%       │ ✓ Pass  │    │
│     └────────────┴────────────┴───────────┴─────────┘    │
│  4. Report guardrail violations (if any)                 │
│  5. Include system boundaries and disclaimers            │
└──────────────────────────────────────────────────────────┘

Key Benefits

Transparent Uncertainty - Every output includes confidence scores
Runtime Validation - Agents self-assess before committing results
Automated Quality Control - Threshold guardrails catch low-confidence outputs
Audit Trail - All self-assessments and guardrail checks are logged
Metacognitive Reasoning - Coordinator explains its decision-making process

Code Implementation

The self-model and confidence checks are defined in src/models.py:

class FinSightSelfModel(BaseModel):
    """Metacognitive self-model for runtime guardrails."""
    system_name: str = "FinSight Agent"
    mission: str
    agent_capabilities: List[AgentCapability]  # Each has confidence_threshold
    operating_boundaries: List[str]
    active_guardrails: List[str]
    guardrail_violations: List[GuardrailViolation]

class AgentCapability(BaseModel):
    """Agent self-awareness: what it can/cannot do."""
    agent_name: str
    capabilities: List[str]
    limitations: List[str]
    confidence_threshold: float  # Guardrail threshold

Example confidence check in final report synthesis (from src/orchestrator.py):

# Check agent confidence against self-model threshold
threshold = self_model.agent_capabilities[0].confidence_threshold
if sentiment.confidence >= threshold:
    status = "✓ Pass"
else:
    status = "⚠ Low"
    # Could trigger guardrail violation recording

Output Reports

The system generates 4 markdown reports per analysis:

1. Sentiment Analysis Report

Filename: sentiment_[TICKER]_[TIMESTAMP].md

Contents:

  • Overall sentiment classification (very_positive/positive/neutral/negative/very_negative)
  • Numerical sentiment score (-1.0 to 1.0)
  • Market sentiment summary
  • Key sentiment drivers
  • News headlines analyzed (from Tavily)
  • Tool validations performed
  • Confidence metrics

2. Event Detection Report

Filename: event_detection_[TICKER]_[TIMESTAMP].md

Contents:

  • List of significant events detected
  • Event descriptions and types
  • Verification status (mentioned in call, verified)
  • Impact assessments (high/medium/low)
  • SEC filing references (8-K, 10-Q, 10-K)
  • Tool validations performed
  • Confidence metrics

3. Volatility Prediction Report

Filename: volatility_[TICKER]_[TIMESTAMP].md

Contents:

  • Predicted volatility level (very_high/high/moderate/low/very_low)
  • Volatility score (0.0 to 1.0)
  • Historical volatility comparison
  • Key volatility drivers
  • Transcript insights (answers to analysis questions)
  • Sentiment impact analysis
  • Event impact analysis
  • Tool validations performed
  • Confidence metrics

4. Final Comprehensive Report

Filename: final_report_[TICKER]_[TIMESTAMP].md

Contents:

  • Executive summary
  • Coordinator's metacognitive analysis
  • Complete sentiment analysis
  • Complete event detection
  • Complete volatility prediction
  • Guardrails applied
  • Confidence summary table
  • System disclaimers

Project Structure

finsight-agent/
├── src/                          # Main source code
│   ├── __init__.py              # Package initialization
│   ├── config.py                # Configuration management
│   ├── models.py                # Pydantic data models
│   ├── tools.py                 # External tool integrations
│   ├── orchestrator.py          # LangGraph workflow orchestrator
│   ├── main.py                  # CLI entry point
│   └── agents/                  # Agent implementations
│       ├── __init__.py
│       ├── base.py              # Base agent class
│       ├── coordinator.py       # Coordinator agent
│       ├── sentiment.py         # Sentiment analysis agent
│       ├── events.py            # Event detection agent
│       └── volatility.py        # Volatility prediction agent
│
├── data/                         # Data directories
│   ├── input/                   # Input transcripts
│   ├── output/                  # Generated reports (gitignored)
│   └── sec_filings/             # SEC filings cache (gitignored)
│
├── app.py                        # 🚀 Streamlit web interface
├── api.py                        # Stubbed API for Streamlit POC
├── test_api.py                   # API stub testing script
├── run_streamlit.sh              # Streamlit launcher script
├── .streamlit/                   # Streamlit configuration
│   └── config.toml              # UI theme and settings
│
├── demo.py                       # Interactive demo script
├── examples.py                   # Programmatic usage examples
├── test_setup.py                 # Setup verification script
├── prepare_transcript.py         # Transcript preparation utility
├── run_test.sh                   # Quick test script
│
├── pyproject.toml               # Project metadata and dependencies
├── requirements.txt             # Pip requirements (includes streamlit)
├── .env.example                 # Environment template
├── .gitignore                   # Git ignore rules
│
├── README.md                    # This file (main documentation)
├── CHANGELOG.md                 # Version history and changes
├── CODE_OF_CONDUCT.md           # Community guidelines
└── CONTRIBUTING.md              # Contribution guidelines

Agent Details

1. Coordinator Agent (Metacognitive)

Role: Metacognitive planning, orchestration, and self-assessment

Capabilities:

  • Interprets user queries with self-awareness
  • Creates detailed analysis plans
  • Determines which agents to invoke
  • Self-scores confidence in understanding (LLM-as-Judge)
  • Provides explicit reasoning for decisions

Metacognitive Process:

1. Analyze user query
2. Determine user intent
3. Create step-by-step analysis plan
4. Select appropriate specialized agents
5. 🔍 SELF-ASSESS: Score confidence in understanding (0-1)
6. Provide transparent reasoning
7. Return structured MetacognitiveDecision

Output: MetacognitiveDecision with plan, confidence, and reasoning

Self-Assessment Example:

MetacognitiveDecision(
    user_intent="Comprehensive financial analysis",
    analysis_plan=["Step 1...", "Step 2...", ...],
    agents_to_invoke=["sentiment_analysis", "event_detection", "volatility_prediction"],
    confidence=0.85,  # ← Coordinator judges itself at 85% confident
    reasoning="User query is clear and comprehensive. All three agents needed..."
)

2. Sentiment Analysis Agent

Role: Analyze sentiment from transcript and validate with market news

Tool: Tavily Search API

Capabilities:

  • Extract sentiment from transcript
  • Search recent financial news
  • Compare transcript vs. market sentiment
  • Identify sentiment drivers
  • Self-score confidence using LLM-as-Judge

Process:

1. Extract sentiment from transcript
2. Search Tavily for recent news (5 articles)
3. Analyze market sentiment
4. Compare transcript vs. news sentiment
5. Generate sentiment classification
6. Calculate sentiment score (-1.0 to 1.0)
7. 🔍 SELF-ASSESS: Score confidence in sentiment analysis (0-1)
8.GUARDRAIL CHECK: confidence >= 65%?
9. Save sentiment report with confidence score

Output: SentimentAnalysisResult with classification, score, validations, and self-scored confidence

Confidence Threshold (Guardrail): ≥65%

Self-Assessment Example:

SentimentAnalysisResult(
    overall_sentiment="positive",
    sentiment_score=0.72,
    confidence=0.68,  # ← Agent judges itself at 68% confident
    market_sentiment="Generally positive market reaction...",
    key_sentiment_drivers=["Revenue growth", "AI initiatives"],
    tool_validations=["Validated with 5 Tavily news articles"]
)
# Guardrail: 68% >= 65% ✓ Pass

3. Event Detection Agent

Role: Identify significant corporate events and verify with official filings

Tool: SEC EDGAR Downloader

Capabilities:

  • Detect events from transcript
  • Download SEC filings (8-K, 10-Q, 10-K)
  • Verify events against official sources
  • Assess event materiality
  • Self-score confidence using LLM-as-Judge

Process:

1. Identify significant events in transcript
2. Download recent SEC filings (8-K, 10-Q)
3. Cross-reference events with filings
4. Verify event details
5. Assess impact (high/medium/low)
6. 🔍 SELF-ASSESS: Score confidence in event detection (0-1)
7.GUARDRAIL CHECK: confidence >= 70%?
8. Save event detection report with confidence score

Output: SignificantEventDetectionResult with events, validations, and self-scored confidence

Confidence Threshold (Guardrail): ≥70% (Highest threshold due to verification requirements)

Self-Assessment Example:

SignificantEventDetectionResult(
    events=[...],
    total_events_found=3,
    verified_count=2,
    confidence=0.75,  # ← Agent judges itself at 75% confident
    tool_validations=["Verified against SEC 8-K filings", "Cross-referenced 10-Q"]
)
# Guardrail: 75% >= 70% ✓ Pass

4. Volatility Prediction Agent

Role: Predict stock volatility using multi-modal analysis

Tool: yfinance API

Capabilities:

  • Answer structured questions from transcript
  • Retrieve historical volatility data
  • Analyze price movements
  • Integrate sentiment and event data from other agents
  • Self-score confidence using LLM-as-Judge

Process:

1. Answer analysis questions from transcript
2. Get historical volatility (1-month) from yfinance
3. Get price movement data
4. Retrieve stock information
5. Integrate sentiment results from Sentiment Agent
6. Integrate event results from Event Detection Agent
7. Predict future volatility (multi-modal synthesis)
8. Calculate volatility score (0.0 to 1.0)
9. 🔍 SELF-ASSESS: Score confidence in prediction (0-1)
10.GUARDRAIL CHECK: confidence >= 60%?
11. Save volatility report with confidence score

Output: VolatilityPredictionResult with prediction, validations, and self-scored confidence

Confidence Threshold (Guardrail): ≥60% (Lower threshold due to inherent prediction uncertainty)

Self-Assessment Example:

VolatilityPredictionResult(
    predicted_volatility="moderate",
    volatility_score=0.48,
    confidence=0.65,  # ← Agent judges itself at 65% confident
    historical_volatility=0.22,
    sentiment_impact="Positive sentiment reduces expected volatility",
    event_impact="2 high-impact events increase volatility",
    tool_validations=["Historical data from yfinance (30 days)"]
)
# Guardrail: 65% >= 60% ✓ Pass

Tool Integration

Tool Usage Matrix

Agent External Tool Purpose API Calls
Sentiment Tavily Search API Validate sentiment with recent news 1-2 calls
Event Detection SEC EDGAR Downloader Verify events with official filings 1-3 downloads
Volatility yfinance API Validate predictions with market data 2-3 calls

Tool Details

1. Tavily Search API (Sentiment Agent)

What it does:

  • Searches recent financial news
  • Filters for earnings and sentiment-related articles
  • Returns up to 5 relevant articles

Usage in code:

from src.tools import news_search_tool

# Get sentiment-relevant news
results = news_search_tool.get_sentiment_news(ticker="GOOGL")
headlines = [r['title'] for r in results]

2. SEC EDGAR Downloader (Event Agent)

What it does:

  • Downloads official SEC filings
  • Focuses on 8-K (material events), 10-Q (quarterly), 10-K (annual)
  • Caches filings locally

Usage in code:

from src.tools import sec_filing_tool

# Download recent filings
results = sec_filing_tool.download_recent_filings(
    ticker="GOOGL",
    filing_types=["8-K", "10-Q"],
    limit=3
)

3. yfinance API (Volatility Agent)

What it does:

  • Retrieves historical stock data
  • Calculates volatility metrics
  • Gets price movement statistics

Usage in code:

from src.tools import market_data_tool

# Get historical volatility
volatility = market_data_tool.get_historical_volatility(
    ticker="GOOGL",
    period="1mo"
)

# Get price movement
movement = market_data_tool.get_price_movement(
    ticker="GOOGL",
    period="1mo"
)

Guardrails & Safety

Overview

FinSight implements runtime metacognitive guardrails powered by the self-model and LLM-as-Judge patterns. Unlike static rule-based systems, these guardrails involve the AI reasoning about its own capabilities and confidence during execution.

Active Guardrails (Runtime)

1. Confidence Threshold Enforcement (LLM-as-Judge)

  • Mechanism: Each agent self-scores confidence in its findings (0-1)
  • Check: Runtime comparison against agent-specific thresholds
  • Action: Below-threshold results trigger warnings and are flagged in reports
  • Implementation: Confidence scores embedded in structured outputs (Pydantic models)

Example Flow:

# Agent generates result with self-assessment
result = SentimentAnalysisResult(
    overall_sentiment="positive",
    sentiment_score=0.72,
    confidence=0.68,  # LLM judges itself at 68% confident
    ...
)

# Runtime guardrail check
if result.confidence < threshold (0.65):
    record_guardrail_violation(...)

2. Source Verification Requirement

  • Mechanism: All findings must cite external validation sources
  • Check: tool_validations field must be populated
  • Action: Reports show which tools validated each finding
  • Transparency: Clear audit trail in every report

3. Investment Advice Prohibition (Self-Model Boundary)

  • Mechanism: System self-model defines operating boundaries
  • Check: Coordinator is aware of this boundary during planning
  • Action: Explicitly avoids stock recommendations in all outputs
  • Purpose: Legal/ethical compliance, educational use only

4. Transparent Limitation Disclosure (Metacognitive Awareness)

  • Mechanism: Each agent declares its limitations in the self-model
  • Check: Limitations included in AgentCapability definitions
  • Action: All reports include disclaimers and confidence levels
  • Purpose: User awareness of system boundaries

Confidence Thresholds (Guardrail Parameters)

Agent Threshold Purpose Defined In
Sentiment Analysis ≥65% Minimum for sentiment claims AgentCapability.confidence_threshold
Event Detection ≥70% Minimum for event verification AgentCapability.confidence_threshold
Volatility Prediction ≥60% Minimum for volatility predictions AgentCapability.confidence_threshold

These thresholds are part of the self-model and can be adjusted based on use case risk tolerance.

Operating Boundaries (Self-Model Constraints)

These boundaries are encoded in the system's self-model and inform agent behavior:

✗ NO personalized investment advice
✗ NO stock buy/sell recommendations
✗ NO guarantees about future performance
✓ Educational and analytical purposes ONLY
✓ Must disclose all confidence levels
✓ Must cite all sources

Guardrail Tracking & Runtime Monitoring

All guardrail checks are actively monitored during execution and reported in the final report.

Guardrail Violation Structure

When confidence falls below threshold:

GuardrailViolation(
    timestamp: "2025-01-15 14:32:10",
    agent: "Event Detection Agent",
    guardrail_type: "Confidence Threshold",
    description: "Confidence 68% below threshold 70%",
    action_taken: "Flagged in report, included with warning"
)

Example Report Output (No Violations)

## 5. Guardrails and System Boundaries

**Guardrail Checks Performed:** 0

*All confidence thresholds met. No guardrail violations detected.*

**Active Guardrails:**
- Confidence threshold enforcement
- Source verification requirement
- Investment advice prohibition
- Transparent limitation disclosure

**Operating Boundaries:**
- NO personalized investment advice or stock recommendations
- NO guarantees about future stock performance
- All outputs are for educational and analytical purposes only
- Must disclose confidence levels and limitations

Example Report Output (With Violations)

## 5. Guardrails and System Boundaries

**Guardrail Checks Performed:** 1

- **Confidence Threshold** (Event Detection Agent)
  - Confidence 68% fell below required threshold of 70%
  - Action: Flagged in report with low-confidence warning

**Active Guardrails:**
- Confidence threshold enforcement ← *TRIGGERED*
- Source verification requirement
- Investment advice prohibition
- Transparent limitation disclosure

Confidence Summary Table (LLM-as-Judge Results)

Every final report includes a table showing how each agent scored itself:

## 6. System Confidence Summary

| Agent | Confidence | Threshold | Status |
|-------|-----------|-----------|--------|
| Sentiment Analysis | 72% | 65% | ✓ Pass |
| Event Detection | 68% | 70% | ⚠ Low  |
| Volatility Prediction | 75% | 60% | ✓ Pass |

This transparency allows users to assess the reliability of each component independently.


Configuration & Customization

Custom Analysis Questions

Edit src/models.py to customize questions:

DEFAULT_ANALYSIS_QUESTIONS = [
    AnalysisQuestion(
        category="Financial Performance",
        focus_item="Revenue Growth",
        question="What was the year-over-year revenue growth rate?",
        priority="high"
    ),
    AnalysisQuestion(
        category="Strategic Initiatives",
        focus_item="Key Projects",
        question="What are the key projects or initiatives?",
        priority="high"
    ),
    # Add your custom questions here
    AnalysisQuestion(
        category="Your Category",
        focus_item="Your Focus",
        question="Your specific question?",
        priority="high"
    ),
]

Adjust Confidence Thresholds

Edit src/models.py:

AgentCapability(
    agent_name="Sentiment Analysis Agent",
    capabilities=[...],
    limitations=[...],
    confidence_threshold=0.65  # Adjust as needed
)

Change Model Parameters

Edit src/config.py:

@dataclass
class ScalewayConfig:
    model: str = "qwen3-235b-a22b-instruct-2507"
    temperature: float = 0.0      # Adjust for creativity
    max_tokens: int = 2048         # Adjust for response length

Custom Output Directory

Specify via CLI:

python -m src.main \
  --transcript transcript.txt \
  --ticker GOOGL \
  --output ./custom_reports

Or programmatically:

from src.config import config

config.paths.output_dir = "./custom_reports"

Troubleshooting

Import Errors

Problem: ModuleNotFoundError or import errors

Solution:

pip install -r requirements.txt

Verify all packages are installed:

python -c "import pydantic, langchain_core, langgraph, yfinance; print('OK')"

Configuration Errors

Problem: "Missing required environment variables"

Solution:

  1. Ensure .env file exists:

    cp .env.example .env
  2. Verify all required keys are set:

    cat .env
  3. Check for extra spaces or quotes in values

  4. Test configuration:

    python -c "from src.config import config; config.validate(); print('OK')"

Transcript Not Found

Problem: "Error: Transcript file not found"

Solution:

  1. Verify file path:

    ls -l data/input/
  2. Use absolute path:

    python -m src.main --transcript /full/path/to/transcript.txt --ticker GOOGL
  3. Ensure file is .txt format (not PDF or other)

API Errors

Problem: API connection or authentication errors

Solution:

For Scaleway:

  • Verify project ID and secret key are correct
  • Check project has GenAI API enabled
  • Ensure API quota not exceeded

For Tavily:

  • Verify API key is valid
  • Check rate limits (free tier: 1000 searches/month)
  • Ensure internet connection

For yfinance:

  • Usually no auth needed
  • Check internet connection
  • Try different ticker symbol

Test each API:

# Test Tavily
python -c "from langchain_community.tools.tavily_search import TavilySearchResults; tool = TavilySearchResults(api_key='YOUR_KEY'); print(tool.invoke('test'))"

# Test yfinance
python -c "import yfinance as yf; print(yf.Ticker('GOOGL').info['longName'])"

Out of Memory

Problem: System runs out of memory

Solution:

  • Reduce max_tokens in src/config.py
  • Process shorter transcript excerpts
  • Close other applications
  • Use a machine with more RAM

Slow Performance

Problem: Analysis takes too long

Solution:

  • Check internet connection
  • Reduce number of analysis questions
  • Use shorter transcripts for testing
  • Consider local caching of API results

Development

Adding New Agents

  1. Create agent file in src/agents/:

    # src/agents/my_agent.py
    from .base import BaseAgent
    from ..models import MyAgentResult
    
    class MyAgent(BaseAgent):
        def __init__(self):
            super().__init__("My Agent")
        
        def process(self, state):
            # Your agent logic here
            return {"my_result": result}
  2. Add to src/agents/__init__.py:

    from .my_agent import MyAgent
    
    __all__ = [..., "MyAgent"]
  3. Define output model in src/models.py:

    class MyAgentResult(BaseModel):
        field1: str
        field2: int
        confidence: float
  4. Update orchestrator in src/orchestrator.py:

    self.my_agent = MyAgent()
    
    workflow.add_node("my_agent", self._my_agent_node)
    workflow.add_edge("previous_node", "my_agent")

Adding New Tools

  1. Create tool class in src/tools.py:

    class MyTool:
        def __init__(self, api_key: str):
            self.api_key = api_key
        
        def fetch_data(self, query: str):
            # Tool logic here
            pass
  2. Initialize globally:

    my_tool = MyTool(config.api.my_api_key)
  3. Use in agents:

    from ..tools import my_tool
    
    data = my_tool.fetch_data("query")

Running Tests

Currently, the system includes:

# Setup verification
python test_setup.py

# Interactive demo
python demo.py

# Quick test script
./run_test.sh

Future: Unit tests and integration tests (TODO)

Code Style

  • Follow PEP 8
  • Use type hints
  • Document with docstrings
  • Use Pydantic for data validation

Examples

Example 1: Basic Analysis

from src.orchestrator import FinSightOrchestrator

orchestrator = FinSightOrchestrator()
result = orchestrator.run_analysis(
    transcript_path="data/input/Alphabet_2025_Q1_Earnings_Call_complete_transcript.txt",
    ticker="GOOGL"
)

Example 2: Custom Query

from src.orchestrator import FinSightOrchestrator

orchestrator = FinSightOrchestrator()
result = orchestrator.run_analysis(
    transcript_path="data/input/transcript.txt",
    ticker="GOOGL",
    user_query="Focus on AI initiatives and their impact on revenue growth"
)

Example 3: Accessing Results

from src.orchestrator import FinSightOrchestrator

orchestrator = FinSightOrchestrator()
result = orchestrator.run_analysis(
    transcript_path="data/input/transcript.txt",
    ticker="GOOGL"
)

# Access sentiment
if result.get('sentiment_result'):
    sent = result['sentiment_result']
    print(f"Sentiment: {sent.overall_sentiment}")
    print(f"Score: {sent.sentiment_score:.2f}")
    print(f"Confidence: {sent.confidence:.2%}")
    print(f"News analyzed: {len(sent.news_headlines)}")

# Access events
if result.get('event_detection_result'):
    events = result['event_detection_result']
    print(f"\nEvents detected: {events.total_events_found}")
    print(f"Verified: {events.verified_count}")
    for event in events.events:
        print(f"- {event.event_type}: {event.description[:50]}...")

# Access volatility
if result.get('volatility_result'):
    vol = result['volatility_result']
    print(f"\nPredicted volatility: {vol.predicted_volatility}")
    print(f"Score: {vol.volatility_score:.2f}")
    print(f"Historical: {vol.historical_volatility:.2%}")

Example 4: Command-Line Variations

# Basic
python -m src.main -t data/input/transcript.txt -s GOOGL

# Custom query
python -m src.main \
  -t data/input/transcript.txt \
  -s AAPL \
  -q "Analyze product announcements and their market impact"

# Custom output directory
python -m src.main \
  -t data/input/transcript.txt \
  -s MSFT \
  -o ./quarterly_reports/Q1_2025

# Help
python -m src.main --help

Dependencies

Core Libraries

  • pydantic (≥2.12.3) - Data validation and modeling
  • langchain-core (≥1.0.2) - LLM orchestration framework
  • langchain-openai (≥1.0.1) - OpenAI-compatible API client
  • langchain-community (≥0.4.1) - Community tools (Tavily)
  • langgraph (≥0.2.0) - Multi-agent workflow graphs
  • yfinance (≥0.2.66) - Market data retrieval
  • sec-edgar-downloader (≥5.0.3) - SEC filing access
  • python-dotenv (≥1.0.0) - Environment configuration
  • typing-extensions (≥4.9.0) - Type hints

Full Requirements

See requirements.txt for complete list:

cat requirements.txt

Installing Dependencies

pip install -r requirements.txt

Or with UV:

uv pip install -e .

License

Apache License 2.0 - See LICENSE file for details


Contributing

See CONTRIBUTING.md for guidelines on how to contribute to this project.


Disclaimers

⚠️ IMPORTANT: This system is for educational and informational purposes only.

This system does NOT constitute:

  • Investment advice
  • Financial recommendations
  • Guarantees about future stock performance
  • Professional financial guidance

Key Points:

  • All investments involve risk, including possible loss of principal
  • Past performance does not guarantee future results
  • The system's predictions and analyses should not be the sole basis for investment decisions
  • Consult with qualified financial professionals before making any investment decisions
  • The creators and contributors are not liable for any financial losses

Data Sources:

  • External APIs (Tavily, SEC EDGAR, yfinance) may have limitations or delays
  • Information accuracy depends on source data quality
  • Always verify critical information with official sources

Support

For issues, questions, or feature requests:

  1. Check this README for solutions
  2. Run python test_setup.py to verify configuration
  3. Try python demo.py for a working example
  4. Review examples.py for usage patterns
  5. Open an issue on GitHub

Acknowledgments

Built using:

  • Scaleway GenAI for LLM inference
  • LangChain and LangGraph for orchestration
  • Tavily for news search
  • SEC EDGAR for official filings
  • yfinance for market data

FinSight Agent v1.0 - Metacognitive Multi-Agent Financial Analysis System

Generated with 🤖 precision and 📊 financial insights

About

Repository for the FinSight Agent and its Streamlit Application Front End

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors