vijayrajeshr/DocuSense-AI

DocuSense AI | Semantic Document Intelligence

Update: I have also built a RAG chatbot based on a local LLM, available in a separate folder. It runs on Llama 3.2, which I ran locally on my own machine. Feel free to check it out.

Experience the future of document interaction.
DocuSense AI is a semantic search engine that splits PDF documents into text chunks and embeds them in a vector space, so you can query their content with conversational questions.


⚡ Core Capabilities

  • 🧠 Neural Semantic Analysis: Uses Sentence-Transformers to map document text into high-dimensional vector representations.
  • ⚡ FAISS Acceleration: Powered by Meta's FAISS for lightning-fast similarity retrieval across massive text corpora.
  • 🔍 Dual-Stage Re-ranking: Implements a Cross-Encoder (ms-marco-MiniLM-L-6-v2) to re-score the retrieved hits, so the top result is the passage the re-ranker judges most relevant to the query.
  • ✨ Premium Glassmorphism UI: A state-of-the-art Streamlit interface featuring splash screens, data visualization, and micro-animations for an elite user experience.
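
The retrieve-then-re-rank flow described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the toy 3-dimensional vectors, and the word-overlap scorer are all assumptions. In DocuSense AI the vectors would come from Sentence-Transformers, stage 1 would be served by a FAISS index, and the pair scorer would be the cross-encoder.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def retrieve(query_vec: Sequence[float],
             chunk_vecs: Sequence[Sequence[float]],
             k: int) -> List[Tuple[int, float]]:
    """Stage 1: exact inner-product search over normalized vectors."""
    q = normalize(query_vec)
    scored = [(i, sum(a * b for a, b in zip(q, normalize(v))))
              for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(query: str,
           chunks: Sequence[str],
           candidates: Sequence[Tuple[int, float]],
           score_pair: Callable[[str, str], float]) -> List[Tuple[int, float]]:
    """Stage 2: re-score each candidate with a (query, chunk) pair scorer."""
    rescored = [(i, score_pair(query, chunks[i])) for i, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def overlap_score(query: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder: count shared lowercase tokens."""
    tokens = lambda s: set(re.findall(r"[a-z0-9]+", s.lower()))
    return float(len(tokens(query) & tokens(chunk)))


chunks = [
    "FAISS builds an index over vectors.",
    "The invoice total is 420 dollars.",
    "Payment is due within 30 days of the invoice date.",
]
# Hand-made toy embeddings (assumption); real ones would come from the model.
chunk_vecs = [[1.0, 0.0, 0.1], [0.1, 1.0, 0.2], [0.0, 0.8, 0.6]]
query = "When is the invoice payment due?"
query_vec = [0.0, 1.0, 0.2]

candidates = retrieve(query_vec, chunk_vecs, k=2)
reranked = rerank(query, chunks, candidates, overlap_score)
best_chunk = chunks[reranked[0][0]]
```

In this toy data the first stage ranks the "invoice total" chunk highest, and the second stage promotes the "payment is due" chunk, which is the kind of correction a re-ranker is meant to provide.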

🛠️ Technology Stack

| Component | Technology |
| --- | --- |
| Framework | Streamlit (Python) |
| Vector Engine | Meta FAISS (Intel-Optimized) |
| Embeddings | Sentence-Transformers (all-MiniLM-L6-v2) |
| Re-ranker | Cross-Encoder (MS-Marco) |
| Processing | Regex Sanitization & Chunking |
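
As a rough illustration of the "Regex Sanitization & Chunking" step, the sketch below cleans raw PDF text and packs whole sentences into size-limited chunks. The function names, the specific cleanup rules, and the `max_chars` parameter are assumptions made for this example; the repository's own rules may differ.

```python
import re
from typing import List


def sanitize(text: str) -> str:
    """Rejoin words hyphenated across line breaks, drop control
    characters, and collapse all whitespace runs to single spaces."""
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_sentences(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of up to max_chars.
    A single sentence longer than max_chars is kept whole."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


raw = "Seman-\ntic search  finds\nmeaning. It ranks chunks! Does it scale?"
clean = sanitize(raw)
chunks = chunk_sentences(clean, max_chars=50)
```

Splitting on sentence boundaries rather than at fixed character offsets keeps each chunk readable on its own, which tends to matter when a chunk is shown to the user as an answer.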

🚀 Quick Start

1. Prerequisites

Ensure you have Python 3.9+ installed and a virtual environment active.

2. Installation

# Clone the repository
git clone https://github.com/vijayrajeshr/DocuSense-AI.git
cd DocuSense-AI

# Install dependencies
pip install -r requirements.txt

3. Launching the Engine

streamlit run app.py

📜 Professional Workflow

  1. Neural Sync: Upon launch, the engine completes a 3-second system synchronization.
  2. Deconstruction: Upload any PDF. The engine performs lexical analysis and geometry extraction.
  3. Vectorization: Sentences are normalized and transformed into dense vectors.
  4. Querying: Input conversational questions. The engine performs a multi-stage search and provides "Verified Insights" with a confidence score.
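
The README does not say how the confidence score in step 4 is computed. One common approach, shown here purely as an assumption, is to pass the re-ranker's unbounded relevance score through the logistic function to get a value between 0 and 1. The names `confidence_from_score` and `format_insight` are hypothetical.

```python
import math


def confidence_from_score(raw_score: float) -> float:
    """Map an unbounded relevance score to [0, 1] with the logistic function.
    The two branches avoid overflow in math.exp for large magnitudes."""
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    z = math.exp(raw_score)
    return z / (1.0 + z)


def format_insight(answer: str, raw_score: float) -> str:
    """Render an answer with its confidence as a whole-number percentage."""
    pct = round(confidence_from_score(raw_score) * 100)
    return f"{answer} (confidence: {pct}%)"


neutral = confidence_from_score(0.0)
strong = confidence_from_score(4.0)
weak = confidence_from_score(-4.0)
message = format_insight("Payment is due within 30 days.", 0.0)
```

A score of zero maps to 0.5, and large positive or negative scores saturate toward 1 and 0, so the displayed value is easy to read but should not be taken as a calibrated probability.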


📂 Architecture Overview

graph TD
    A[PDF Upload] --> B[Text Decomposition]
    B --> C[Neural Chunking]
    C --> D[Vector Embedding]
    D --> E[FAISS Indexing]
    E --> F[Semantic Search]
    F --> G[Cross-Encoder Re-ranking]
    G --> H[Verified Insight]

🛡️ Security & Privacy

DocuSense AI processes all data locally within your current session. No document text or embedding vectors are transmitted to external servers, so your documents stay on your machine.


Built for precision. Designed for the elite.
DocuSense AI — Where documents find their voice.

About

Semantic Document Intelligence using open-source tools and models (such as FAISS and Llama models)
