A fully local, open-source hybrid RAG system for querying SEBI (Securities and Exchange Board of India) circulars and master circulars.
```
User Query
      │
      ▼
┌──────────────┐
│    Query     │  Intent classification · query expansion · HyDE (opt-in)
│ Understanding│
└──────┬───────┘
       ▼
┌──────────────┐
│    Hybrid    │  BM25 sparse + MiniLM-L6-v2 dense → weighted sum / RRF fusion
│  Retrieval   │
└──────┬───────┘
       ▼
┌──────────────┐
│ Cross-Encoder│  ms-marco-MiniLM-L-6-v2 reranking (top-20 → top-5)
│   Reranker   │
└──────┬───────┘
       ▼
┌──────────────┐
│   Context    │  Extractive sentence selection to reduce noise (opt-in)
│ Compression  │
└──────┬───────┘
       ▼
┌──────────────┐
│ LLM (local)  │  gemma3:12b via Ollama · streaming + timeout · grounded prompts
│  Generation  │
└──────┬───────┘
       ▼
┌──────────────┐
│  Guardrails  │  Citation verification · n-gram grounding check
│  & Tracing   │  JSONL pipeline traces for observability
└──────┬───────┘
       ▼
Answer + Citations
```
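The two-phase retrieval shown above (retrieve top-20, rerank to top-5) can be sketched as follows. The real `index/reranker.py` presumably scores pairs with a cross-encoder such as ms-marco-MiniLM-L-6-v2; here `score_fn` is injected so the sketch stays dependency-free:

```python
def rerank(query: str, chunks: list[str], score_fn, top_k: int = 5) -> list[str]:
    """Second-phase precision step: score each (query, chunk) pair and keep
    the top_k highest-scoring chunks. In the real pipeline score_fn would be
    a cross-encoder's predict function; any callable(query, chunk) works."""
    ranked = sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:top_k]
```

A bi-encoder scores documents independently of the query; a cross-encoder reads query and chunk together, which is slower but markedly more precise, hence the two-phase design.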
- Hybrid retrieval: BM25 sparse + dense embeddings with configurable fusion (weighted sum or RRF)
- Cross-encoder reranking: Two-phase retrieval for high precision
- Query understanding: Intent classification, LLM-based expansion, HyDE support
- Post-generation guardrails: Citation verification and grounding checks
- Agentic multi-hop (opt-in): Plan → Route → Act → Verify → Synthesize for complex queries
- Context compression (opt-in): Extractive sentence selection to reduce noise
- Structured observability: JSONL pipeline traces with full latency breakdown
- Streaming output: Token-by-token CLI output for interactive use
- Offline-first: All models load from local cache when network is unavailable
- RAGAS-style evaluation: recall@k, MRR, citation accuracy, faithfulness proxy
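As an illustration of the configurable fusion, the RRF option fits in a few lines (k=60 is the conventional constant from the RRF literature; the actual `retriever.py` may parameterize it differently):

```python
def rrf_fuse(bm25_ranked: list[str], dense_ranked: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum of 1 / (k + rank_i(d)) over both
    ranked lists; documents found by both retrievers bubble to the top."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ranked, dense_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not scores, which sidesteps the scale mismatch between BM25 scores and cosine similarities that weighted-sum fusion has to normalize away.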
- Python 3.11+
- A local LLM runtime (Ollama recommended), or use the `mock` backend for testing
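One way to decide between the two at startup is to probe the runtime before falling back to the mock backend. This sketch assumes an OpenAI-compatible `/models` endpoint, which Ollama exposes at its `/v1` base URL:

```python
import urllib.request

def llm_available(base_url: str = "http://localhost:11434/v1", timeout: float = 2.0) -> bool:
    """Probe the OpenAI-compatible /models endpoint; True if anything answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, HTTP error, etc.
        return False
```

A caller could then choose `"ollama"` when `llm_available()` is true and `"mock"` otherwise; whether the project's CLI does exactly this is an assumption.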
```bash
git clone https://github.com/iAn-P1nt0/sebi_circular_rag.git
cd sebi_circular_rag

# Install dependencies
pip install -e ".[dev]"

# Or using uv (faster)
uv pip install -e ".[dev]"
```

Edit `config/settings.yaml` or use environment variables:
```bash
# Use Ollama with gemma3
export SEBI_RAG_RAG_LLM_MODEL=gemma3:12b
export SEBI_RAG_RAG_LLM_BASE_URL=http://localhost:11434/v1

# Use mock LLM (no real LLM needed)
export SEBI_RAG_RAG_LLM_BACKEND=mock
```
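The `SEBI_RAG_`-prefixed variables above map onto nested YAML keys (e.g. `SEBI_RAG_RAG_LLM_MODEL` → `rag.llm.model`). A naive split-on-underscore version of that mapping looks like this; the real loader in `config/__init__.py` may differ, and keys containing underscores (such as `base_url`) need a smarter scheme than this sketch:

```python
import os

def apply_env_overrides(settings: dict, prefix: str = "SEBI_RAG_") -> dict:
    """Override nested settings from environment variables.

    Illustrative only: splits the suffix on "_" and walks/creates nested
    dicts, so SEBI_RAG_RAG_LLM_MODEL sets settings["rag"]["llm"]["model"].
    """
    for key, value in os.environ.items():
        if not key.startswith(prefix):
            continue
        path = key[len(prefix):].lower().split("_")
        node = settings
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return settings
```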
```bash
python -m cli.ingest
python -m cli.ingest --max-pages 10 --verbose  # Limit crawl for testing
python -m cli.ingest --skip-crawl              # Use existing raw data
```
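During ingest, parsed circulars are split into clause-aware chunks of 200-400 tokens. Stripped of the clause-boundary and abbreviation handling that `chunker/sebi_chunker.py` performs, the core packing loop reduces to roughly this (tokens approximated by whitespace words):

```python
def chunk_sentences(sentences: list[str], max_tokens: int = 400) -> list[str]:
    """Greedily pack sentences into chunks of at most ~max_tokens words.

    Simplified sketch: the real chunker also respects clause boundaries and
    avoids false sentence breaks at abbreviations like "Sec." or "Regn.".
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))   # flush the full chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))       # flush the remainder
    return chunks
```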
```bash
# Full RAG answer
python -m cli.query "What are the disclosure requirements for RPTs under LODR?"

# With streaming output
python -m cli.query "What are margin requirements?" --stream

# With filters and chunk details
python -m cli.query "REIT distribution requirements" --domain Intermediaries --show-chunks

# Retrieval only (no LLM call)
python -m cli.query "Insider trading regulations" --retrieval-only
```
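Each run also leaves a JSONL trace under `data/traces/`. A minimal writer in the spirit of `observability/tracer.py`, with illustrative field names (`stage`, `ts`, plus arbitrary per-stage fields):

```python
import json
import time
from pathlib import Path

class JsonlTracer:
    """Append one JSON object per pipeline stage to a .jsonl trace file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def stage(self, name: str, **fields) -> None:
        record = {"stage": name, "ts": time.time(), **fields}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
```

One line per stage keeps traces greppable and makes the per-stage latency breakdown a one-liner with `jq`.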
```bash
python -m cli.eval_cli
python -m cli.eval_cli --questions path/to/my_questions.jsonl --verbose
```

```
sebi_circular_rag/
├── config/
│   ├── __init__.py            # Config loader (YAML + env overrides)
│   └── settings.yaml          # Central configuration
├── crawler/
│   ├── sebi_crawler.py        # Async rate-limited SEBI crawler
│   └── robots_check.py        # robots.txt compliance
├── parser/
│   ├── sebi_parser.py         # HTML/PDF → structured JSON
│   ├── html_extractor.py      # HTML content extraction
│   └── pdf_extractor.py       # PDF text extraction (PyMuPDF)
├── chunker/
│   └── sebi_chunker.py        # Clause-aware chunking (200-400 tokens)
├── index/
│   ├── build_index.py         # BM25 + FAISS/Annoy index builder
│   ├── retriever.py           # Hybrid retrieval with score fusion
│   └── reranker.py            # Cross-encoder reranking
├── query/
│   └── understanding.py       # Intent classification + query expansion + HyDE
├── rag/
│   ├── qa_pipeline.py         # Enhanced end-to-end RAG pipeline
│   ├── llm_client.py          # LLM client (OpenAI-compat, streaming, mock)
│   ├── prompts.py             # Prompt templates
│   ├── guardrails.py          # Citation verification + grounding checks
│   └── compressor.py          # Extractive context compression
├── agent/
│   └── orchestrator.py        # Multi-hop agentic decomposition
├── observability/
│   └── tracer.py              # JSONL pipeline tracing
├── utils/
│   └── model_loader.py        # Offline-resilient model loading
├── eval/
│   ├── questions.jsonl        # 40 evaluation questions
│   └── run_eval.py            # Evaluation (recall@k, MRR, faithfulness, citation accuracy)
├── cli/
│   ├── ingest.py              # Full ingest pipeline CLI
│   ├── query.py               # Query CLI (with streaming)
│   └── eval_cli.py            # Evaluation CLI
├── tests/
│   ├── test_parser.py         # Parser tests
│   ├── test_chunker.py        # Chunker tests
│   ├── test_retriever.py      # Retriever tests
│   ├── test_qa_pipeline.py    # Pipeline integration tests
│   ├── test_guardrails.py     # Guardrails tests
│   ├── test_query_understanding.py  # Query understanding tests
│   ├── test_compressor.py     # Context compression tests
│   └── test_eval_metrics.py   # Evaluation metrics tests
├── data/                      # Generated data (gitignored)
│   ├── raw/                   # Crawled HTML + PDFs
│   ├── processed/             # Parsed JSON per circular
│   ├── chunks/                # chunks.jsonl
│   ├── indexes/               # BM25 + FAISS serialized indexes
│   └── traces/                # Pipeline execution traces
├── pyproject.toml
├── README.md
├── ADR-001-SEBI-RAG-ARCHITECTURE.md
└── ADR-002-ARCHITECTURE-ENHANCEMENT-REVIEW.md
```

```bash
pytest tests/ -v
```

See ADR-001 for the original architecture and ADR-002 for the enhancement review.
- BM25 + dense hybrid: Legal text benefits from exact term matching combined with semantic understanding
- Cross-encoder reranking: Two-phase retrieval significantly improves precision over single-phase
- Small chunks (200-400 tokens): LegalBench-RAG findings show smaller chunks improve regulatory text retrieval
- Clause-aware splitting: Respects legal document structure with abbreviation-safe sentence handling
- Post-generation guardrails: Catch hallucinated citations and ungrounded claims before returning to user
- Pluggable LLM: Abstract client interface supporting Ollama, LM Studio, vLLM, or any OpenAI-compatible API
- Offline-first: Models load from local cache when network is unavailable (try online → fallback to local)
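The n-gram grounding check from the guardrails stage can be approximated as below. The 4-gram size and 0.6 threshold are illustrative, and the real `rag/guardrails.py` may normalize text differently:

```python
def is_grounded(answer: str, context: str, n: int = 4, threshold: float = 0.6) -> bool:
    """Compute the fraction of the answer's word n-grams that also occur in
    the retrieved context; below the threshold the answer is flagged as
    ungrounded (likely hallucinated) rather than returned to the user."""
    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    answer_grams = ngrams(answer)
    if not answer_grams:          # answer too short to judge
        return True
    overlap = len(answer_grams & ngrams(context)) / len(answer_grams)
    return overlap >= threshold
```

Set-based n-gram overlap is a cheap faithfulness proxy: it cannot detect paraphrased hallucinations, but it reliably catches answers that drift entirely away from the retrieved text.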