FinSight AI is a local-first financial decision engine with a Streamlit interface.
It is not a simple chatbot. Instead, it routes each user query through:
- Hybrid document retrieval (RAG)
- Multimodal reasoning (text + images)
- Structured LLM reasoning (Gemini)
- Automatic web grounding (for live data)
- A verification layer (VeriFi) to ensure reliability
The system is designed for analysis and understanding, not for executing trades or providing licensed financial advice.
Instead of answering directly using an LLM, FinSight AI follows a structured pipeline:
User Query
→ Router (classify intent)
→ Retrieval / Multimodal / Web / Educational path
→ Gemini reasoning
→ VeriFi verification
→ Final Answer
This ensures that answers are:
- grounded in data
- logically structured
- verifiable
Every user query is classified into a specific type before processing:
- compute_only
- educational_answer
- retrieve_then_answer
- retrieve_then_compute_then_answer
- multimodal_reasoning
- web_grounded_answer
- abstain
Classifying first lets each query follow a path suited to it instead of receiving a generic LLM response, which is intended to improve accuracy.
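The route labels can be held in an enum so that downstream code never works with free-form strings. The sketch below is illustrative only: `Intent` and `parse_intent` are hypothetical names, how the real router produces its label is not described here, and falling back to `abstain` for an unrecognised label is an assumption.

```python
from enum import Enum


class Intent(str, Enum):
    """The seven route labels used by the router."""
    COMPUTE_ONLY = "compute_only"
    EDUCATIONAL_ANSWER = "educational_answer"
    RETRIEVE_THEN_ANSWER = "retrieve_then_answer"
    RETRIEVE_THEN_COMPUTE_THEN_ANSWER = "retrieve_then_compute_then_answer"
    MULTIMODAL_REASONING = "multimodal_reasoning"
    WEB_GROUNDED_ANSWER = "web_grounded_answer"
    ABSTAIN = "abstain"


def parse_intent(label: str) -> Intent:
    """Normalise a raw classifier label into an Intent.

    Assumption: labels outside the known set are treated as ABSTAIN
    rather than being passed on to a generic answer path.
    """
    try:
        return Intent(label.strip().lower())
    except ValueError:
        return Intent.ABSTAIN
```

Treating unknown labels as `abstain` keeps a misbehaving classifier from silently reaching an answer path.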
FinSight AI uses two retrieval systems together:
Dense (vector) retrieval:
- Uses embeddings
- Powered by Qdrant
- Captures semantic meaning
Sparse (BM25) retrieval:
- Captures exact keywords
- Handles finance-specific terminology
Fusion (RRF):
- Combines both results into a final ranked set
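The step that combines both result lists uses RRF (reciprocal rank fusion), as named in the feature list. A minimal sketch of the technique follows; `rrf_fuse` is a hypothetical function name, and `k = 60` is the conventional constant for RRF, not a value confirmed for this project.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Fuse several ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every id it contains,
    where rank starts at 1. Ids are returned by descending total score,
    with ties broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


dense_hits = ["a", "b", "c"]   # e.g. ids returned by the vector search
sparse_hits = ["b", "d", "a"]  # e.g. ids returned by the keyword search
print(rrf_fuse([dense_hits, sparse_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, the two retrievers do not need comparable score scales.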
Before sending data to Gemini:
- duplicate chunks are removed
- relevant chunks are prioritized
- table data is boosted for numerical queries
- irrelevant content is filtered
This improves answer precision.
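The four selection steps above could look roughly like the sketch below. Everything named here is hypothetical (`ScoredChunk`, `select_context`, the boost factor, the score cutoff, and the regular expression used to guess whether a query is numerical); the project's actual heuristics are not documented in this section.

```python
import re
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    chunk_id: str
    content: str
    type: str     # "text", "table", or "image"
    score: float  # relevance score from retrieval, higher is better


def select_context(chunks: list[ScoredChunk], query: str, table_boost: float = 1.5,
                   min_score: float = 0.2, limit: int = 5) -> list[ScoredChunk]:
    """Deduplicate, boost tables for numerical queries, filter, and rank."""
    # Assumed cue for a "numerical" query: digits or a few quantity words.
    numeric_query = bool(re.search(r"\d|how much|total|percent|%", query.lower()))
    seen: set[str] = set()
    kept: list[tuple[float, ScoredChunk]] = []
    for chunk in chunks:
        # Duplicates are detected on whitespace- and case-normalised text;
        # the first occurrence wins.
        key = " ".join(chunk.content.lower().split())
        if key in seen:
            continue
        seen.add(key)
        score = chunk.score
        if numeric_query and chunk.type == "table":
            score *= table_boost
        if score >= min_score:  # drop low-relevance content
            kept.append((score, chunk))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:limit]]
```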
The system supports:
- PDFs
- Tables
- Images
- Charts
- Screenshots
For image-based queries:
Image + Query + Context → Gemini → Explanation
Example:
- “Explain this portfolio screenshot”
- “What trend is visible in this chart?”
For live or current queries, the system automatically uses Gemini search grounding.
Web grounding is triggered when the query mentions:
- current / latest / today / recent
- stock prices
- exchange rates
- market conditions
- interest rates
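A trigger check along these lines is one way to implement the list above. This is a sketch under assumptions: `needs_web_grounding` is a hypothetical name, the recency words come from the list, and the topic phrases are a simple stand-in for however the system actually detects live-data topics.

```python
import re

# Recency words taken from the trigger list.
RECENCY_WORDS = ("current", "latest", "today", "recent")
# Assumed phrase list standing in for the live-data topics.
LIVE_TOPICS = ("stock price", "exchange rate", "market conditions", "interest rate")

# Word boundaries prevent "concurrent" from matching "current".
_RECENCY_RE = re.compile(r"\b(?:" + "|".join(RECENCY_WORDS) + r")\b", re.IGNORECASE)


def needs_web_grounding(query: str) -> bool:
    """Return True when the query looks like it needs live data."""
    lowered = query.lower()
    if _RECENCY_RE.search(query):
        return True
    return any(topic in lowered for topic in LIVE_TOPICS)
```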
Example:
"What is the current USD to INR rate?"
Gemini does not return free-form text.
Instead, it returns structured JSON:
{
"answer": "...",
"used_citation_ids": ["chunk-id"],
"claims": [],
"assumptions": [],
"confidence": 0.0,
"needs_more_data": false
}
This allows downstream validation.
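Downstream code can load that JSON into a typed object before any validation runs. The sketch below assumes the field names shown above; `StructuredAnswer` and `parse_answer` are hypothetical names, the element types of `claims` and `assumptions` are left open because they are not specified, and the 0 to 1 range for `confidence` is an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StructuredAnswer:
    """Typed mirror of the JSON fields in the model response."""
    answer: str
    used_citation_ids: list = field(default_factory=list)
    claims: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    confidence: float = 0.0
    needs_more_data: bool = False


def parse_answer(raw: str) -> StructuredAnswer:
    """Parse a raw model response; raises ValueError on malformed input."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("expected a JSON object with a string 'answer'")
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        raise ValueError("confidence must be a number") from None
    if not 0.0 <= confidence <= 1.0:  # assumed valid range
        raise ValueError("confidence must be between 0 and 1")
    return StructuredAnswer(
        answer=data["answer"],
        used_citation_ids=list(data.get("used_citation_ids", [])),
        claims=list(data.get("claims", [])),
        assumptions=list(data.get("assumptions", [])),
        confidence=confidence,
        needs_more_data=bool(data.get("needs_more_data", False)),
    )
```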
VeriFi ensures answer quality by checking:
- claims are supported by retrieved context
- citations are valid
- missing data is handled properly
- confidence level is sufficient
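The four checks can be approximated with a few boolean tests over the structured answer. This is a simplified sketch, not VeriFi itself: `verify` and `min_confidence` are hypothetical, the 0.6 threshold is an assumed value, and "claims are supported" is reduced to a proxy (any claim must come with at least one citation), whereas a real check would compare claim text against the retrieved chunks.

```python
FALLBACK = "Insufficient data to answer reliably."


def verify(answer: dict, retrieved_ids: set, min_confidence: float = 0.6) -> str:
    """Return the answer text if every check passes, else the fallback message."""
    cited = answer.get("used_citation_ids", [])
    claims = answer.get("claims", [])
    checks = [
        # Proxy for claim support: claims must be backed by some citation.
        not claims or bool(cited),
        # Citations are valid: every cited id was actually retrieved.
        all(cid in retrieved_ids for cid in cited),
        # Missing data: the model must not have flagged a gap.
        not answer.get("needs_more_data", False),
        # Confidence is sufficient (threshold is an assumed value).
        answer.get("confidence", 0.0) >= min_confidence,
    ]
    return answer.get("answer", FALLBACK) if all(checks) else FALLBACK
```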
If verification fails, the system responds with:
"Insufficient data to answer reliably."
User
↓
Streamlit UI
↓
Router
↓
----------------------------------------------
| Retrieval | Multimodal | Web | Educational |
----------------------------------------------
↓
Gemini Structured Answer
↓
VeriFi
↓
Final Answer
PDF / Image
→ Extract (text, tables, images)
→ Chunking
→ Embedding
→ Store in Qdrant
→ Build BM25 index
Each chunk includes:
- content
- type (text / table / image)
- page number
- section metadata
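A chunk record with those four fields, plus a naive chunker, might be sketched as follows. The names `Chunk` and `chunk_page`, the `chunk_id` format, and the `max_chars` limit are all hypothetical, and the chunker only handles plain text (tables and images would need their own extraction path).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    content: str
    type: str   # "text", "table", or "image"
    page: int
    section: dict = field(default_factory=dict)  # section metadata


def chunk_page(text: str, page: int, doc_id: str = "doc", max_chars: int = 800) -> list[Chunk]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    bodies: list[str] = []
    buffer = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if buffer and len(buffer) + len(para) + 1 > max_chars:
            bodies.append(buffer)  # current chunk is full, start a new one
            buffer = para
        else:
            buffer = f"{buffer}\n{para}" if buffer else para
    if buffer:
        bodies.append(buffer)
    return [
        Chunk(chunk_id=f"{doc_id}-p{page}-{i}", content=body, type="text", page=page)
        for i, body in enumerate(bodies)
    ]
```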
app.py                   Main Streamlit app
config.py                Configuration
schemas.py               Data schemas
ingestion/
  extractor.py           Document parsing
  chunker.py             Chunk creation
retrieval/
  hybrid.py              Hybrid RAG implementation
router/
  intent_router.py       Query classification
verifier/
  verifi.py              Validation layer
data/
  uploads/originals/     Uploaded documents
  index/qdrant/          Vector DB
  index/chunks.json      Retrieval catalog
Tell me about SEC filings
How do I calculate retirement corpus?
What risks are discussed in the Apple filing?
Summarize the brokerage statement
Explain this portfolio screenshot
What does this chart show?
What is the latest Fed interest rate?
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
Create .env:
GEMINI_API_KEY=your_key_here
GEMINI_TEXT_MODEL=gemini-3-flash-preview
GEMINI_EMBEDDING_MODEL=gemini-embedding-001
GEMINI_WEB_GROUNDING_MODEL=gemini-2.5-flash
streamlit run app.py
App runs at:
http://localhost:8510
python index_uploads.py --folder data/uploads/originals --embedding-provider local_hash
python eval_retrieval.py --top-k 5
- Router-first architecture
- Hybrid RAG (Qdrant + BM25)
- RRF fusion strategy
- Context-aware chunk selection
- Structured LLM output
- Multimodal reasoning
- Automatic web grounding
- Verification layer (VeriFi)
- Quota-resilient design