An advanced multi-agent system that performs comprehensive financial analysis of earnings calls using metacognitive reasoning and specialized expert agents.
- Overview
- Quick Start
- Installation
- Configuration
- Usage
- Architecture
- Metacognitive Self-Model & LLM-as-Judge
- Output Reports
- Project Structure
- Agent Details
- Tool Integration
- Guardrails & Safety
- Configuration & Customization
- Troubleshooting
- Development
- Examples
- Dependencies
- License
FinSight Agent is a production-ready multi-agent system that analyzes earnings call transcripts using metacognitive self-awareness and LLM-as-Judge patterns for runtime quality control:
- Metacognitive Self-Model: System maintains awareness of its own capabilities, limitations, and confidence thresholds
- LLM-as-Judge: Each agent self-scores confidence in findings; automated guardrail enforcement at runtime
- 4 Specialized Agents: Coordinator, Sentiment Analysis, Event Detection, and Volatility Prediction
- External Tool Validation: Tavily news search, SEC EDGAR filings, and yfinance market data
- Confidence-Based Guardrails: Automated quality control with 60-70% minimum confidence thresholds
- Individual Reports: Separate markdown outputs for each agent plus comprehensive final report with transparency
- Transparent Uncertainty: All outputs include confidence scores, limitations, and guardrail check results
✅ Metacognitive Self-Model - System maintains self-awareness of capabilities, limitations, and boundaries
✅ LLM-as-Judge Integration - Each agent self-scores confidence; runtime guardrail enforcement
✅ Multi-Agent Orchestration - LangGraph-based workflow with metacognitive coordinator
✅ Tool-Enhanced Validation - Each agent uses external APIs to verify findings
✅ Confidence Threshold Guardrails - Automated quality control with 60-70% minimum thresholds
✅ Transparent Uncertainty - All outputs include confidence levels and limitations
✅ Structured Outputs - Pydantic models ensure type safety and validation
✅ Individual Reports - 4 separate markdown reports per analysis
✅ Configurable - Customizable questions, thresholds, and model parameters
✅ Production-Ready - Proper error handling, logging, and CLI interface
```bash
cd /Users/vincent/Code/finsight-agent
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Configure API keys:

```bash
cp .env.example .env
# Edit .env with your API keys:
# - SCW_DEFAULT_PROJECT_ID
# - SCW_SECRET_KEY
# - TAVILY_API_KEY
```

Verify the setup:

```bash
python test_setup.py
```

Run an analysis:

```bash
python -m src.main \
  --transcript data/input/Alphabet_2025_Q1_Earnings_Call_complete_transcript.txt \
  --ticker GOOGL
```

View the generated reports:

```bash
ls -l data/output/
```

Prerequisites:

- Python 3.11+
- Internet connection (for API calls)
- ~500 MB disk space
1. Navigate to the project directory:

   ```bash
   cd /Users/vincent/Code/finsight-agent
   ```

2. Create a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate   # macOS/Linux
   # OR
   .venv\Scripts\activate      # Windows
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or using UV (recommended):

   ```bash
   uv pip install -e .
   ```

4. Verify installation:

   ```bash
   python test_setup.py
   ```
Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Add your API keys:

```bash
# Scaleway GenAI API
SCW_DEFAULT_PROJECT_ID=your_scaleway_project_id_here
SCW_SECRET_KEY=your_scaleway_secret_key_here

# Tavily Search API
TAVILY_API_KEY=your_tavily_api_key_here
```

Scaleway GenAI:

- Go to https://console.scaleway.com/
- Navigate to AI/ML → Generative APIs
- Create a project and get credentials

Tavily Search:

- Go to https://tavily.com/
- Sign up and get an API key
The system uses Scaleway GenAI with these default parameters:

- Model: `qwen3-235b-a22b-instruct-2507`
- Temperature: 0.0 (deterministic)
- Max Tokens: 2048

To modify, edit `src/config.py`:

```python
@dataclass
class ScalewayConfig:
    model: str = "qwen3-235b-a22b-instruct-2507"
    temperature: float = 0.0
    max_tokens: int = 2048
```

Launch the interactive web application:

```bash
streamlit run app.py
```

The Streamlit interface provides a professional dashboard that displays:
- Sentiment analysis with confidence metrics and news validation
- Event detection with SEC filing verification
- Volatility predictions with market data
- Comprehensive final report with metacognitive analysis and guardrails
- Tabbed interface separating analysis outputs from audit/validation data
Note: The app currently uses a stubbed API that reads pre-generated reports from `data/output/` for demonstration. To integrate with the live backend, replace the API stubs in `api.py` with calls to `src.orchestrator.FinSightOrchestrator`.
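One way to wire this up is sketched below. This is a hedged example: `get_reports` and the transcript path layout are assumptions (only `FinSightOrchestrator.run_analysis` and the report naming come from this README), and the function falls back to the stub behaviour when the live backend is not importable:

```python
from pathlib import Path

def get_reports(ticker: str, output_dir: str = "data/output"):
    """Return final-report markdown, preferring a live analysis run."""
    try:
        from src.orchestrator import FinSightOrchestrator  # live backend
        result = FinSightOrchestrator().run_analysis(
            transcript_path=f"data/input/{ticker}_transcript.txt",  # assumed layout
            ticker=ticker,
        )
        return result["final_report"]
    except ImportError:
        # Stub behaviour: serve the newest pre-generated report, if any.
        # Lexicographic sort works because filenames embed a timestamp.
        reports = sorted(Path(output_dir).glob(f"final_report_{ticker}_*.md"))
        return reports[-1].read_text() if reports else None
```

Keeping the fallback means the Streamlit UI stays demonstrable even without API keys configured.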
Basic usage:

```bash
python -m src.main --transcript <PATH> --ticker <SYMBOL>
```

With a custom query:

```bash
python -m src.main \
  --transcript data/input/transcript.txt \
  --ticker GOOGL \
  --query "Focus on AI initiatives and revenue impact"
```

With a custom output directory:

```bash
python -m src.main \
  --transcript data/input/transcript.txt \
  --ticker AAPL \
  --output ./my_reports
```

Required:

```
-t, --transcript PATH   Path to earnings call transcript (.txt format)
-s, --ticker SYMBOL     Company ticker symbol (e.g., GOOGL, AAPL)
```

Optional:

```
-q, --query TEXT        Custom analysis query
-o, --output DIR        Output directory for reports (default: data/output)
```
```python
from src.orchestrator import FinSightOrchestrator

# Initialize orchestrator
orchestrator = FinSightOrchestrator()

# Run analysis
result = orchestrator.run_analysis(
    transcript_path="data/input/transcript.txt",
    ticker="GOOGL",
    user_query="Comprehensive financial analysis"  # Optional
)

# Access individual results
sentiment = result['sentiment_result']
print(f"Sentiment: {sentiment.overall_sentiment}")
print(f"Score: {sentiment.sentiment_score}")
print(f"Confidence: {sentiment.confidence}")

events = result['event_detection_result']
print(f"Events detected: {events.total_events_found}")

volatility = result['volatility_result']
print(f"Predicted volatility: {volatility.predicted_volatility}")

# Access final report
final_report = result['final_report']
```

Run an interactive demonstration:

```bash
python demo.py
```

This will:

- Check prerequisites
- Run a complete analysis on sample data
- Display results summary
- Show generated reports
What you are looking at is the cognitive architecture of the FinSight Agent. In traditional RAG or agent systems, the flow is often linear: Input → Retrieve → Generate. But in high-stakes financial domains, that isn't enough. We need safety, accuracy, and self-correction.
This diagram illustrates our Metacognitive Execution Flow. It's not just about doing the task; it's about monitoring the task while it's happening. Let's walk through the journey of a user query as it navigates our Agentic Execution Loop, our Dynamic Guardrails, and our LLM-as-Judge evaluation system.
It starts here on the left. The User Query enters the system and hits our Orchestration Agent.
The Orchestrator: Think of this as the project manager. It doesn't do the heavy lifting; it plans. It decomposes the query into actionable sub-tasks.
The Self Model (Gray Box): Crucially, the orchestrator consults the 'Self Model.' This is the agent's memory and context awareness. It asks: "What do I know about this user? What is the current market context? Am I hallucinating stale data?" This prevents the agent from starting off with bad assumptions.
Once the plan is set, we enter the Blue Box: the Agentic Execution Loop. This is where the work gets done by our specialist sub-agents. We don't rely on one generalist model; we delegate to specialists:
- Sentiment Agent: Scrapes news and social signals to gauge market feeling.
- Event Agent: Looks for hard quantitative data—earnings calls, Fed rate changes, or specific filings.
- Volatility Agent: Analyzes risk metrics and generates VIX-related insights.
These agents work in parallel or sequence depending on the Orchestrator's plan, synthesizing raw data into a financial narrative.
Now, this is the differentiator. Most agents would just take that data and write a response. FinSight does not.
It passes the data through the Metacognitive Monitor (The Red Box). This is a real-time supervisor sitting inside the execution loop. It performs a Dynamic Guardrail Check:
- Compliance & Safety: Is this financial advice? (If so, flag it).
- Data Integrity: Do the numbers from the Volatility Agent match the trend from the Sentiment Agent?
- Hallucination Check: Does the insight exist in the retrieved documents?
The Decision Diamond:
- 'No' (Pass): If no violation is found, we follow the Green line out to draft the response.
- 'Yes' (Violation): If a violation is found, we trigger a Corrective Signal (The Red arrow). We don't crash; we self-correct immediately before generating a draft.
Assuming we pass the guardrails, we generate a Draft Response. But we still don't show it to the user. We send it to the LLM-as-Judge. This is a separate, highly capable model prompt-engineered strictly for evaluation. It scores the response based on custom scoring metrics:
- Accuracy
- Tone (Professional/Financial)
- Completeness
The Evaluation Diamond: This is the final gatekeeper.
- Score < Threshold (Red Path): If the response is mediocre or unsafe, the Judge rejects it. It sends feedback all the way back to the Orchestration Agent or the Correction Strategy module. The system literally 'retries' the thought process with new instructions.
- Score Acceptable (Green Path): Only when the score meets our high-quality threshold do we move to the final step.
This Red Feedback Loop is what makes the agent 'Metacognitive.' It thinks about its thinking, learns from its own attempts within the session. It adjusts its plan, maybe asks the Sentiment Agent to look deeper, or asks the Volatility Agent to check a different timeframe.
Ultimately, the user receives the Final Response Delivery (Green Box). To them, it looks like a single instant answer. But under the hood, FinSight has planned, executed, monitored, fact-checked, judged, and potentially re-done the work to ensure the financial insight is actionable and safe.
As part of the execution flow, the system also records the full trajectory of actions and evaluations, which supports audit and traceability and can inform system design in future iterations.
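The judge-and-retry behaviour described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual control flow (which lives in the LangGraph workflow in `src/orchestrator.py`); the function and callback names are invented:

```python
def run_with_judge(generate, judge, threshold=0.7, max_retries=2):
    """Regenerate a draft until the judge's score clears the threshold."""
    feedback = None
    for _ in range(max_retries + 1):
        draft = generate(feedback)      # agents produce a draft response
        score, feedback = judge(draft)  # LLM-as-Judge scores it, returns feedback
        if score >= threshold:          # evaluation diamond: green path
            return draft, score
    return draft, score                 # best effort after retries exhausted

# Toy run: the "judge" rejects drafts that omit a confidence disclosure.
drafts = iter(["Buy now!", "Volatility: moderate (confidence 65%)"])
result, score = run_with_judge(
    generate=lambda fb: next(drafts),
    judge=lambda d: (0.9, None) if "confidence" in d else (0.2, "add confidence"),
)
```

The key design point is that the judge's feedback flows back into the next generation attempt, so a rejection is a corrective signal rather than a failure.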
- User Input → Transcript path + ticker symbol
- Orchestrator → Metacognitive planning with self-model consultation
- Load Transcript → Reads and validates file
- Agentic Execution Loop → Specialist agents process in parallel:
- Sentiment Agent → Tavily validation → Self-evaluation
- Event Agent → SEC EDGAR validation → Self-evaluation
- Volatility Agent → yfinance validation → Self-evaluation
- Metacognitive Monitor → Dynamic guardrail checks during execution
- Draft Generation → Synthesize findings with transparency
- LLM-as-Judge → Quality scoring and threshold enforcement
- Feedback Loop → Corrective actions if quality < threshold
- Final Output → 4 markdown reports saved to `data/output/` with audit trail
FinSight implements runtime metacognition and LLM-as-Judge patterns as core guardrail mechanisms. The system maintains self-awareness of its capabilities, limitations, and confidence levels throughout execution.
The system maintains a comprehensive self-model (`FinSightSelfModel`) that includes:

```
FinSightSelfModel:
├── Mission: System's core purpose and goals
├── Agent Capabilities: What each agent can/cannot do
│   ├── Capabilities list
│   ├── Limitations list
│   └── Confidence threshold (guardrail)
├── Operating Boundaries: Hard limits and ethical constraints
├── Active Guardrails: Runtime checks and validations
└── Guardrail Violations: Tracked incidents
```

Each agent implements self-evaluation by scoring its own findings:
```
MetacognitiveDecision:
├── user_intent: Understood goal
├── analysis_plan: Step-by-step approach
├── agents_to_invoke: Selected specialists
├── confidence: Self-scored (0-1) ← LLM judges itself
└── reasoning: Explanation of plan
```

Each specialized agent scores its confidence:
| Agent | Self-Scores | Threshold | Guardrail Action |
|---|---|---|---|
| Sentiment | Sentiment confidence (0-1) | ≥65% | Flag if below threshold |
| Event Detection | Event confidence (0-1) | ≥70% | Flag if below threshold |
| Volatility | Prediction confidence (0-1) | ≥60% | Flag if below threshold |
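In code, the per-agent check in the table above amounts to a simple comparison. A minimal sketch (threshold values come from this README; the dictionary and function names are illustrative):

```python
# Guardrail thresholds per agent (values from the table above).
THRESHOLDS = {
    "sentiment": 0.65,
    "event_detection": 0.70,
    "volatility": 0.60,
}

def check_guardrail(agent: str, confidence: float) -> str:
    """Return the status string used in the confidence summary reports."""
    return "✓ Pass" if confidence >= THRESHOLDS[agent] else "⚠ Low"
```

For example, `check_guardrail("event_detection", 0.68)` is flagged because 68% falls below the 70% threshold.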
┌─────────────────────────────────────────────────┐
│ Agent completes analysis │
│ ↓ │
│ Agent scores own confidence (LLM-as-Judge) │
│ ↓ │
│ Compare: confidence >= threshold? │
│ ├─ YES → ✓ Pass guardrail check │
│ └─ NO → ⚠ Record guardrail violation │
│ ↓ │
│ Add to guardrails_applied list │
│ ↓ │
│ Include warning in final report │
└─────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ SYSTEM START │
│ Load FinSightSelfModel │
│ (Mission, Capabilities, Boundaries, Thresholds) │
└───────────────────────┬──────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────┐
│ COORDINATOR (Metacognitive) │
│ 1. Analyze user query │
│ 2. Plan analysis approach │
│ 3. Self-score: confidence in understanding │
│ 4. Check: confidence meets expectations? │
│ └─ Record decision reasoning │
└───────────────────────┬──────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────┐
│ FOR EACH AGENT: │
│ ┌────────────────────────────────────────────────────┐ │
│ │ 1. Execute Analysis (with external tool validation)│ │
│ │ ├─ Sentiment: Analyze + Tavily news │ │
│ │ ├─ Events: Detect + SEC EDGAR filings │ │
│ │ └─ Volatility: Predict + yfinance data │ │
│ └────────────────────┬───────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ 2. LLM-as-Judge: Self-Score Confidence │ │
│ │ - Agent evaluates own findings │ │
│ │ - Produces confidence score (0-1) │ │
│ │ - Example: "I am 72% confident in this result" │ │
│ └────────────────────┬───────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ 3. Guardrail Check (Runtime) │ │
│ │ Compare: agent_confidence >= threshold? │ │
│ │ ├─ Sentiment: >= 65%? │ │
│ │ ├─ Events: >= 70%? │ │
│ │ └─ Volatility: >= 60%? │ │
│ └────────────────────┬───────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Below threshold? │ │
│ └────┬─────────┬───┘ │
│ YES ↓ ↓ NO │
│ ┌──────────────────┐ ✓ Continue │
│ │ Record Violation │ │
│ │ - Timestamp │ │
│ │ - Agent name │ │
│ │ - Guardrail type │ │
│ │ - Description │ │
│ │ - Action taken │ │
│ └──────────────────┘ │
└───────────────────────┬──────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────┐
│ SYNTHESIZE FINAL REPORT │
│ 1. Combine all agent findings │
│ 2. Include metacognitive analysis │
│ 3. Display confidence summary table: │
│ ┌────────────┬────────────┬───────────┬─────────┐ │
│ │ Agent │ Confidence │ Threshold │ Status │ │
│ ├────────────┼────────────┼───────────┼─────────┤ │
│ │ Sentiment │ 72% │ 65% │ ✓ Pass │ │
│ │ Events │ 68% │ 70% │ ⚠ Low │ │
│ │ Volatility │ 75% │ 60% │ ✓ Pass │ │
│ └────────────┴────────────┴───────────┴─────────┘ │
│ 4. Report guardrail violations (if any) │
│ 5. Include system boundaries and disclaimers │
└──────────────────────────────────────────────────────────┘
✅ Transparent Uncertainty - Every output includes confidence scores
✅ Runtime Validation - Agents self-assess before committing results
✅ Automated Quality Control - Threshold guardrails catch low-confidence outputs
✅ Audit Trail - All self-assessments and guardrail checks are logged
✅ Metacognitive Reasoning - Coordinator explains its decision-making process
The self-model and confidence checks are defined in `src/models.py`:

```python
class FinSightSelfModel(BaseModel):
    """Metacognitive self-model for runtime guardrails."""
    system_name: str = "FinSight Agent"
    mission: str
    agent_capabilities: List[AgentCapability]  # Each has confidence_threshold
    operating_boundaries: List[str]
    active_guardrails: List[str]
    guardrail_violations: List[GuardrailViolation]

class AgentCapability(BaseModel):
    """Agent self-awareness: what it can/cannot do."""
    agent_name: str
    capabilities: List[str]
    limitations: List[str]
    confidence_threshold: float  # Guardrail threshold
```

Example confidence check in final report synthesis (from `src/orchestrator.py`):
```python
# Check agent confidence against self-model threshold
threshold = self_model.agent_capabilities[0].confidence_threshold
if sentiment.confidence >= threshold:
    status = "✓ Pass"
else:
    status = "⚠ Low"
    # Could trigger guardrail violation recording
```

The system generates 4 markdown reports per analysis:
Filename: sentiment_[TICKER]_[TIMESTAMP].md
Contents:
- Overall sentiment classification (very_positive/positive/neutral/negative/very_negative)
- Numerical sentiment score (-1.0 to 1.0)
- Market sentiment summary
- Key sentiment drivers
- News headlines analyzed (from Tavily)
- Tool validations performed
- Confidence metrics
Filename: event_detection_[TICKER]_[TIMESTAMP].md
Contents:
- List of significant events detected
- Event descriptions and types
- Verification status (mentioned in call, verified)
- Impact assessments (high/medium/low)
- SEC filing references (8-K, 10-Q, 10-K)
- Tool validations performed
- Confidence metrics
Filename: volatility_[TICKER]_[TIMESTAMP].md
Contents:
- Predicted volatility level (very_high/high/moderate/low/very_low)
- Volatility score (0.0 to 1.0)
- Historical volatility comparison
- Key volatility drivers
- Transcript insights (answers to analysis questions)
- Sentiment impact analysis
- Event impact analysis
- Tool validations performed
- Confidence metrics
Filename: final_report_[TICKER]_[TIMESTAMP].md
Contents:
- Executive summary
- Coordinator's metacognitive analysis
- Complete sentiment analysis
- Complete event detection
- Complete volatility prediction
- Guardrails applied
- Confidence summary table
- System disclaimers
finsight-agent/
├── src/ # Main source code
│ ├── __init__.py # Package initialization
│ ├── config.py # Configuration management
│ ├── models.py # Pydantic data models
│ ├── tools.py # External tool integrations
│ ├── orchestrator.py # LangGraph workflow orchestrator
│ ├── main.py # CLI entry point
│ └── agents/ # Agent implementations
│ ├── __init__.py
│ ├── base.py # Base agent class
│ ├── coordinator.py # Coordinator agent
│ ├── sentiment.py # Sentiment analysis agent
│ ├── events.py # Event detection agent
│ └── volatility.py # Volatility prediction agent
│
├── data/ # Data directories
│ ├── input/ # Input transcripts
│ ├── output/ # Generated reports (gitignored)
│ └── sec_filings/ # SEC filings cache (gitignored)
│
├── app.py # 🚀 Streamlit web interface
├── api.py # Stubbed API for Streamlit POC
├── test_api.py # API stub testing script
├── run_streamlit.sh # Streamlit launcher script
├── .streamlit/ # Streamlit configuration
│ └── config.toml # UI theme and settings
│
├── demo.py # Interactive demo script
├── examples.py # Programmatic usage examples
├── test_setup.py # Setup verification script
├── prepare_transcript.py # Transcript preparation utility
├── run_test.sh # Quick test script
│
├── pyproject.toml # Project metadata and dependencies
├── requirements.txt # Pip requirements (includes streamlit)
├── .env.example # Environment template
├── .gitignore # Git ignore rules
│
├── README.md # This file (main documentation)
├── CHANGELOG.md # Version history and changes
├── CODE_OF_CONDUCT.md # Community guidelines
└── CONTRIBUTING.md # Contribution guidelines
Role: Metacognitive planning, orchestration, and self-assessment
Capabilities:
- Interprets user queries with self-awareness
- Creates detailed analysis plans
- Determines which agents to invoke
- Self-scores confidence in understanding (LLM-as-Judge)
- Provides explicit reasoning for decisions
Metacognitive Process:
1. Analyze user query
2. Determine user intent
3. Create step-by-step analysis plan
4. Select appropriate specialized agents
5. 🔍 SELF-ASSESS: Score confidence in understanding (0-1)
6. Provide transparent reasoning
7. Return structured `MetacognitiveDecision`

Output: `MetacognitiveDecision` with plan, confidence, and reasoning
Self-Assessment Example:
```python
MetacognitiveDecision(
    user_intent="Comprehensive financial analysis",
    analysis_plan=["Step 1...", "Step 2...", ...],
    agents_to_invoke=["sentiment_analysis", "event_detection", "volatility_prediction"],
    confidence=0.85,  # ← Coordinator judges itself at 85% confident
    reasoning="User query is clear and comprehensive. All three agents needed..."
)
```

Role: Analyze sentiment from transcript and validate with market news
Tool: Tavily Search API
Capabilities:
- Extract sentiment from transcript
- Search recent financial news
- Compare transcript vs. market sentiment
- Identify sentiment drivers
- Self-score confidence using LLM-as-Judge
Process:
1. Extract sentiment from transcript
2. Search Tavily for recent news (5 articles)
3. Analyze market sentiment
4. Compare transcript vs. news sentiment
5. Generate sentiment classification
6. Calculate sentiment score (-1.0 to 1.0)
7. 🔍 SELF-ASSESS: Score confidence in sentiment analysis (0-1)
8. ✓ GUARDRAIL CHECK: confidence >= 65%?
9. Save sentiment report with confidence score

Output: `SentimentAnalysisResult` with classification, score, validations, and self-scored confidence
Confidence Threshold (Guardrail): ≥65%
Self-Assessment Example:
```python
SentimentAnalysisResult(
    overall_sentiment="positive",
    sentiment_score=0.72,
    confidence=0.68,  # ← Agent judges itself at 68% confident
    market_sentiment="Generally positive market reaction...",
    key_sentiment_drivers=["Revenue growth", "AI initiatives"],
    tool_validations=["Validated with 5 Tavily news articles"]
)
# Guardrail: 68% >= 65% ✓ Pass
```

Role: Identify significant corporate events and verify with official filings
Tool: SEC EDGAR Downloader
Capabilities:
- Detect events from transcript
- Download SEC filings (8-K, 10-Q, 10-K)
- Verify events against official sources
- Assess event materiality
- Self-score confidence using LLM-as-Judge
Process:
1. Identify significant events in transcript
2. Download recent SEC filings (8-K, 10-Q)
3. Cross-reference events with filings
4. Verify event details
5. Assess impact (high/medium/low)
6. 🔍 SELF-ASSESS: Score confidence in event detection (0-1)
7. ✓ GUARDRAIL CHECK: confidence >= 70%?
8. Save event detection report with confidence score

Output: `SignificantEventDetectionResult` with events, validations, and self-scored confidence
Confidence Threshold (Guardrail): ≥70% (Highest threshold due to verification requirements)
Self-Assessment Example:
```python
SignificantEventDetectionResult(
    events=[...],
    total_events_found=3,
    verified_count=2,
    confidence=0.75,  # ← Agent judges itself at 75% confident
    tool_validations=["Verified against SEC 8-K filings", "Cross-referenced 10-Q"]
)
# Guardrail: 75% >= 70% ✓ Pass
```

Role: Predict stock volatility using multi-modal analysis
Tool: yfinance API
Capabilities:
- Answer structured questions from transcript
- Retrieve historical volatility data
- Analyze price movements
- Integrate sentiment and event data from other agents
- Self-score confidence using LLM-as-Judge
Process:
1. Answer analysis questions from transcript
2. Get historical volatility (1-month) from yfinance
3. Get price movement data
4. Retrieve stock information
5. Integrate sentiment results from Sentiment Agent
6. Integrate event results from Event Detection Agent
7. Predict future volatility (multi-modal synthesis)
8. Calculate volatility score (0.0 to 1.0)
9. 🔍 SELF-ASSESS: Score confidence in prediction (0-1)
10. ✓ GUARDRAIL CHECK: confidence >= 60%?
11. Save volatility report with confidence score

Output: `VolatilityPredictionResult` with prediction, validations, and self-scored confidence
Confidence Threshold (Guardrail): ≥60% (Lower threshold due to inherent prediction uncertainty)
Self-Assessment Example:
```python
VolatilityPredictionResult(
    predicted_volatility="moderate",
    volatility_score=0.48,
    confidence=0.65,  # ← Agent judges itself at 65% confident
    historical_volatility=0.22,
    sentiment_impact="Positive sentiment reduces expected volatility",
    event_impact="2 high-impact events increase volatility",
    tool_validations=["Historical data from yfinance (30 days)"]
)
# Guardrail: 65% >= 60% ✓ Pass
```

| Agent | External Tool | Purpose | API Calls |
|---|---|---|---|
| Sentiment | Tavily Search API | Validate sentiment with recent news | 1-2 calls |
| Event Detection | SEC EDGAR Downloader | Verify events with official filings | 1-3 downloads |
| Volatility | yfinance API | Validate predictions with market data | 2-3 calls |
What it does:
- Searches recent financial news
- Filters for earnings and sentiment-related articles
- Returns up to 5 relevant articles
Usage in code:
```python
from src.tools import news_search_tool

# Get sentiment-relevant news
results = news_search_tool.get_sentiment_news(ticker="GOOGL")
headlines = [r['title'] for r in results]
```

What it does:
- Downloads official SEC filings
- Focuses on 8-K (material events), 10-Q (quarterly), 10-K (annual)
- Caches filings locally
Usage in code:
```python
from src.tools import sec_filing_tool

# Download recent filings
results = sec_filing_tool.download_recent_filings(
    ticker="GOOGL",
    filing_types=["8-K", "10-Q"],
    limit=3
)
```

What it does:
- Retrieves historical stock data
- Calculates volatility metrics
- Gets price movement statistics
Usage in code:
```python
from src.tools import market_data_tool

# Get historical volatility
volatility = market_data_tool.get_historical_volatility(
    ticker="GOOGL",
    period="1mo"
)

# Get price movement
movement = market_data_tool.get_price_movement(
    ticker="GOOGL",
    period="1mo"
)
```

FinSight implements runtime metacognitive guardrails powered by the self-model and LLM-as-Judge patterns. Unlike static rule-based systems, these guardrails involve the AI reasoning about its own capabilities and confidence during execution.
- Mechanism: Each agent self-scores confidence in its findings (0-1)
- Check: Runtime comparison against agent-specific thresholds
- Action: Below-threshold results trigger warnings and are flagged in reports
- Implementation: Confidence scores embedded in structured outputs (Pydantic models)
Example Flow:
```python
# Agent generates result with self-assessment
result = SentimentAnalysisResult(
    overall_sentiment="positive",
    sentiment_score=0.72,
    confidence=0.68,  # LLM judges itself at 68% confident
    ...
)

# Runtime guardrail check
threshold = 0.65
if result.confidence < threshold:
    record_guardrail_violation(...)
```

- Mechanism: All findings must cite external validation sources
- Check: `tool_validations` field must be populated
- Action: Reports show which tools validated each finding
- Transparency: Clear audit trail in every report
- Mechanism: System self-model defines operating boundaries
- Check: Coordinator is aware of this boundary during planning
- Action: Explicitly avoids stock recommendations in all outputs
- Purpose: Legal/ethical compliance, educational use only
- Mechanism: Each agent declares its limitations in the self-model
- Check: Limitations included in `AgentCapability` definitions
- Action: All reports include disclaimers and confidence levels
- Purpose: User awareness of system boundaries
| Agent | Threshold | Purpose | Defined In |
|---|---|---|---|
| Sentiment Analysis | ≥65% | Minimum for sentiment claims | AgentCapability.confidence_threshold |
| Event Detection | ≥70% | Minimum for event verification | AgentCapability.confidence_threshold |
| Volatility Prediction | ≥60% | Minimum for volatility predictions | AgentCapability.confidence_threshold |
These thresholds are part of the self-model and can be adjusted based on use case risk tolerance.
These boundaries are encoded in the system's self-model and inform agent behavior:
✗ NO personalized investment advice
✗ NO stock buy/sell recommendations
✗ NO guarantees about future performance
✓ Educational and analytical purposes ONLY
✓ Must disclose all confidence levels
✓ Must cite all sources
All guardrail checks are actively monitored during execution and reported in the final report.
When confidence falls below threshold:
```python
GuardrailViolation(
    timestamp="2025-01-15 14:32:10",
    agent="Event Detection Agent",
    guardrail_type="Confidence Threshold",
    description="Confidence 68% below threshold 70%",
    action_taken="Flagged in report, included with warning"
)
```

Example report section when no violations occur:

```markdown
## 5. Guardrails and System Boundaries

**Guardrail Checks Performed:** 0

*All confidence thresholds met. No guardrail violations detected.*

**Active Guardrails:**
- Confidence threshold enforcement
- Source verification requirement
- Investment advice prohibition
- Transparent limitation disclosure

**Operating Boundaries:**
- NO personalized investment advice or stock recommendations
- NO guarantees about future stock performance
- All outputs are for educational and analytical purposes only
- Must disclose confidence levels and limitations
```

Example report section when a guardrail is triggered:

```markdown
## 5. Guardrails and System Boundaries

**Guardrail Checks Performed:** 1

- **Confidence Threshold** (Event Detection Agent)
  - Confidence 68% fell below required threshold of 70%
  - Action: Flagged in report with low-confidence warning

**Active Guardrails:**
- Confidence threshold enforcement ← *TRIGGERED*
- Source verification requirement
- Investment advice prohibition
- Transparent limitation disclosure
```

Every final report includes a table showing how each agent scored itself:
```markdown
## 6. System Confidence Summary

| Agent | Confidence | Threshold | Status |
|-------|-----------|-----------|--------|
| Sentiment Analysis | 72% | 65% | ✓ Pass |
| Event Detection | 68% | 70% | ⚠ Low |
| Volatility Prediction | 75% | 60% | ✓ Pass |
```

This transparency allows users to assess the reliability of each component independently.
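A sketch of how such a table can be rendered from `(agent, confidence, threshold)` tuples. This is illustrative, not the project's actual formatter; the helper name is invented:

```python
def confidence_table(rows):
    """Render the confidence summary as a markdown table."""
    lines = [
        "| Agent | Confidence | Threshold | Status |",
        "|-------|-----------|-----------|--------|",
    ]
    for agent, conf, threshold in rows:
        status = "✓ Pass" if conf >= threshold else "⚠ Low"
        lines.append(f"| {agent} | {conf:.0%} | {threshold:.0%} | {status} |")
    return "\n".join(lines)
```

Deriving the status from the same comparison used at runtime keeps the report consistent with the guardrail checks themselves.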
Edit `src/models.py` to customize questions:

```python
DEFAULT_ANALYSIS_QUESTIONS = [
    AnalysisQuestion(
        category="Financial Performance",
        focus_item="Revenue Growth",
        question="What was the year-over-year revenue growth rate?",
        priority="high"
    ),
    AnalysisQuestion(
        category="Strategic Initiatives",
        focus_item="Key Projects",
        question="What are the key projects or initiatives?",
        priority="high"
    ),
    # Add your custom questions here
    AnalysisQuestion(
        category="Your Category",
        focus_item="Your Focus",
        question="Your specific question?",
        priority="high"
    ),
]
```

To adjust confidence thresholds, edit `src/models.py`:
```python
AgentCapability(
    agent_name="Sentiment Analysis Agent",
    capabilities=[...],
    limitations=[...],
    confidence_threshold=0.65  # Adjust as needed
)
```

To change model parameters, edit `src/config.py`:

```python
@dataclass
class ScalewayConfig:
    model: str = "qwen3-235b-a22b-instruct-2507"
    temperature: float = 0.0  # Adjust for creativity
    max_tokens: int = 2048    # Adjust for response length
```

Specify the output directory via CLI:
```bash
python -m src.main \
  --transcript transcript.txt \
  --ticker GOOGL \
  --output ./custom_reports
```

Or programmatically:

```python
from src.config import config

config.paths.output_dir = "./custom_reports"
```

Problem: `ModuleNotFoundError` or import errors
Solution:

```bash
pip install -r requirements.txt
```

Verify all packages are installed:

```bash
python -c "import pydantic, langchain_core, langgraph, yfinance; print('OK')"
```

Problem: "Missing required environment variables"
Solution:

1. Ensure the `.env` file exists:

   ```bash
   cp .env.example .env
   ```

2. Verify all required keys are set:

   ```bash
   cat .env
   ```

3. Check for extra spaces or quotes in values

4. Test configuration:

   ```bash
   python -c "from src.config import config; config.validate(); print('OK')"
   ```
Problem: "Error: Transcript file not found"

Solution:

1. Verify the file path:

   ```bash
   ls -l data/input/
   ```

2. Use an absolute path:

   ```bash
   python -m src.main --transcript /full/path/to/transcript.txt --ticker GOOGL
   ```

3. Ensure the file is .txt format (not PDF or other)
Problem: API connection or authentication errors
Solution:
For Scaleway:
- Verify project ID and secret key are correct
- Check project has GenAI API enabled
- Ensure API quota not exceeded
For Tavily:
- Verify API key is valid
- Check rate limits (free tier: 1000 searches/month)
- Ensure internet connection
For yfinance:
- Usually no auth needed
- Check internet connection
- Try different ticker symbol
Test each API:

```bash
# Test Tavily
python -c "from langchain_community.tools.tavily_search import TavilySearchResults; tool = TavilySearchResults(api_key='YOUR_KEY'); print(tool.invoke('test'))"

# Test yfinance
python -c "import yfinance as yf; print(yf.Ticker('GOOGL').info['longName'])"
```

Problem: System runs out of memory
Solution:

- Reduce `max_tokens` in `src/config.py`
- Process shorter transcript excerpts
- Close other applications
- Use a machine with more RAM
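Processing shorter transcript excerpts can be done with a small splitting helper; `split_transcript` below is an illustrative sketch, not part of the project's API:

```python
def split_transcript(text: str, max_chars: int = 8000) -> list[str]:
    """Split a transcript into paragraph-aligned excerpts of at most
    roughly max_chars characters each (hypothetical helper)."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        # Start a new chunk once adding this paragraph would overflow
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each excerpt can then be analyzed separately and the per-chunk results merged afterwards.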
Problem: Analysis takes too long
Solution:
- Check internet connection
- Reduce number of analysis questions
- Use shorter transcripts for testing
- Consider local caching of API results
To add a new agent:

- Create an agent file in `src/agents/`:

  ```python
  # src/agents/my_agent.py
  from .base import BaseAgent
  from ..models import MyAgentResult

  class MyAgent(BaseAgent):
      def __init__(self):
          super().__init__("My Agent")

      def process(self, state):
          # Your agent logic here
          return {"my_result": result}
  ```

- Add it to `src/agents/__init__.py`:

  ```python
  from .my_agent import MyAgent

  __all__ = [..., "MyAgent"]
  ```

- Define the output model in `src/models.py`:

  ```python
  class MyAgentResult(BaseModel):
      field1: str
      field2: int
      confidence: float
  ```

- Update the orchestrator in `src/orchestrator.py`:

  ```python
  self.my_agent = MyAgent()
  workflow.add_node("my_agent", self._my_agent_node)
  workflow.add_edge("previous_node", "my_agent")
  ```
To add a new tool:

- Create a tool class in `src/tools.py`:

  ```python
  class MyTool:
      def __init__(self, api_key: str):
          self.api_key = api_key

      def fetch_data(self, query: str):
          # Tool logic here
          pass
  ```

- Initialize it globally:

  ```python
  my_tool = MyTool(config.api.my_api_key)
  ```

- Use it in agents:

  ```python
  from ..tools import my_tool

  data = my_tool.fetch_data("query")
  ```
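When a tool wraps a rate-limited external API (such as the news search above), a simple retry with exponential backoff is worth considering. This sketch uses only illustrative names and is not part of the project's codebase:

```python
import time

def fetch_with_retry(fetch, attempts: int = 3, base_delay: float = 1.0):
    """Call `fetch()`, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Re-raise on the final attempt; otherwise back off and retry
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

A tool's `fetch_data` could then delegate to `fetch_with_retry(lambda: self._raw_request(query))` to smooth over transient failures.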
Currently, the system includes:

```bash
# Setup verification
python test_setup.py

# Interactive demo
python demo.py

# Quick test script
./run_test.sh
```

Future: Unit tests and integration tests (TODO)
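Until those land, a minimal pytest-style unit test might target a pure helper. `passes_guardrail` below is a hypothetical function mirroring the 65% confidence threshold, not the project's actual API:

```python
# Hypothetical guardrail helper; name and signature are illustrative.
def passes_guardrail(confidence: float, threshold: float = 0.65) -> bool:
    """Return True when an agent's self-scored confidence meets the threshold."""
    return confidence >= threshold

# A pytest-style unit test for it:
def test_passes_guardrail():
    assert passes_guardrail(0.72)
    assert not passes_guardrail(0.40)
    assert passes_guardrail(0.65)  # the threshold itself passes
```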
- Follow PEP 8
- Use type hints
- Document with docstrings
- Use Pydantic for data validation
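As a minimal illustration of these conventions, a typed Pydantic result model with a validated confidence field might look like this (names are illustrative, not taken from `src/models.py`):

```python
from pydantic import BaseModel, Field

class ExampleResult(BaseModel):
    """Illustrative agent result with a bounded confidence score."""
    summary: str = Field(..., description="One-line summary of the finding")
    confidence: float = Field(..., ge=0.0, le=1.0)  # must lie in [0, 1]
```

Instantiating it with `confidence=1.5` raises a validation error, which is how out-of-range scores are caught early.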
Basic analysis:

```python
from src.orchestrator import FinSightOrchestrator

orchestrator = FinSightOrchestrator()
result = orchestrator.run_analysis(
    transcript_path="data/input/Alphabet_2025_Q1_Earnings_Call_complete_transcript.txt",
    ticker="GOOGL"
)
```

With a custom analysis query:

```python
from src.orchestrator import FinSightOrchestrator

orchestrator = FinSightOrchestrator()
result = orchestrator.run_analysis(
    transcript_path="data/input/transcript.txt",
    ticker="GOOGL",
    user_query="Focus on AI initiatives and their impact on revenue growth"
)
```

Accessing results programmatically:

```python
from src.orchestrator import FinSightOrchestrator

orchestrator = FinSightOrchestrator()
result = orchestrator.run_analysis(
    transcript_path="data/input/transcript.txt",
    ticker="GOOGL"
)

# Access sentiment
if result.get('sentiment_result'):
    sent = result['sentiment_result']
    print(f"Sentiment: {sent.overall_sentiment}")
    print(f"Score: {sent.sentiment_score:.2f}")
    print(f"Confidence: {sent.confidence:.2%}")
    print(f"News analyzed: {len(sent.news_headlines)}")

# Access events
if result.get('event_detection_result'):
    events = result['event_detection_result']
    print(f"\nEvents detected: {events.total_events_found}")
    print(f"Verified: {events.verified_count}")
    for event in events.events:
        print(f"- {event.event_type}: {event.description[:50]}...")

# Access volatility
if result.get('volatility_result'):
    vol = result['volatility_result']
    print(f"\nPredicted volatility: {vol.predicted_volatility}")
    print(f"Score: {vol.volatility_score:.2f}")
    print(f"Historical: {vol.historical_volatility:.2%}")
```

CLI usage:

```bash
# Basic
python -m src.main -t data/input/transcript.txt -s GOOGL

# Custom query
python -m src.main \
    -t data/input/transcript.txt \
    -s AAPL \
    -q "Analyze product announcements and their market impact"

# Custom output directory
python -m src.main \
    -t data/input/transcript.txt \
    -s MSFT \
    -o ./quarterly_reports/Q1_2025

# Help
python -m src.main --help
```

Core dependencies:

- pydantic (≥2.12.3) - Data validation and modeling
- langchain-core (≥1.0.2) - LLM orchestration framework
- langchain-openai (≥1.0.1) - OpenAI-compatible API client
- langchain-community (≥0.4.1) - Community tools (Tavily)
- langgraph (≥0.2.0) - Multi-agent workflow graphs
- yfinance (≥0.2.66) - Market data retrieval
- sec-edgar-downloader (≥5.0.3) - SEC filing access
- python-dotenv (≥1.0.0) - Environment configuration
- typing-extensions (≥4.9.0) - Type hints
See requirements.txt for the complete list:

```bash
cat requirements.txt
```

Install with pip:

```bash
pip install -r requirements.txt
```

Or with UV:

```bash
uv pip install -e .
```

Apache License 2.0 - See LICENSE file for details
See CONTRIBUTING.md for guidelines on how to contribute to this project.
This system does NOT constitute:
- Investment advice
- Financial recommendations
- Guarantees about future stock performance
- Professional financial guidance
Key Points:
- All investments involve risk, including possible loss of principal
- Past performance does not guarantee future results
- The system's predictions and analyses should not be the sole basis for investment decisions
- Consult with qualified financial professionals before making any investment decisions
- The creators and contributors are not liable for any financial losses
Data Sources:
- External APIs (Tavily, SEC EDGAR, yfinance) may have limitations or delays
- Information accuracy depends on source data quality
- Always verify critical information with official sources
For issues, questions, or feature requests:

- Check this README for solutions
- Run `python test_setup.py` to verify configuration
- Try `python demo.py` for a working example
- Review `examples.py` for usage patterns
- Open an issue on GitHub
Built using:
- Scaleway GenAI for LLM inference
- LangChain and LangGraph for orchestration
- Tavily for news search
- SEC EDGAR for official filings
- yfinance for market data
FinSight Agent v1.0 - Metacognitive Multi-Agent Financial Analysis System
Generated with 🤖 precision and 📊 financial insights
