Transform documents into intelligent knowledge with AI-powered semantic search and graph visualization
Documind is a cutting-edge document intelligence platform that transforms your documents into an interactive, searchable knowledge base. Upload documents, extract insights, and interact using natural language queries powered by advanced AI technologies.
- AI-Powered Q&A: Ask questions in natural language and get intelligent answers with source citations
- Interactive Knowledge Graph: Visualize relationships between entities with advanced filtering and layout options
- Semantic Search: Find relevant information using vector-based similarity search
- Multi-Format Support: Process PDFs, Word documents, and text files seamlessly
- Secure & Private: Complete user data isolation with enterprise-grade security
- Real-time Processing: Background document processing with live status updates
- Smart Filtering: Customizable graph views with entity type filters and confidence thresholds
- Resilient Architecture: Graceful error handling with fallback options for all services
Documind employs a sophisticated multi-database architecture designed for scalability and performance:
┌─────────────────────────────────────────────────────────────────┐
│                         Frontend Layer                          │
├─────────────────────────────────────────────────────────────────┤
│  • Next.js 15 with App Router (React 19)                        │
│  • TypeScript for type safety                                   │
│  • Tailwind CSS v4 for styling                                  │
│  • Radix UI components                                          │
│  • Clerk for authentication                                     │
│  • Cytoscape.js for graph visualization                         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ API Routes
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Backend Layer                          │
├─────────────────────────────────────────────────────────────────┤
│  • Next.js API Routes                                           │
│  • Middleware for authentication                                │
│  • AI Processing Pipeline (LangChain + OpenAI)                  │
│  • File processing (PDF, Word, Text)                            │
└─────────────────────────────────────────────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  File Storage   │     │   AI Services   │     │    Databases    │
│                 │     │                 │     │                 │
│ • AWS S3        │     │ • OpenAI GPT    │     │ • MongoDB       │
│ • Presigned     │     │ • Embeddings    │     │ • Qdrant        │
│   URLs          │     │ • LangChain     │     │ • Neo4j         │
│ • Secure        │     │ • Text          │     │ • Multi-DB      │
│   Storage       │     │   Processing    │     │   Architecture  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     MongoDB     │     │     Qdrant      │     │      Neo4j      │
│                 │     │                 │     │                 │
│ • Documents     │     │ • Vector        │     │ • Knowledge     │
│   metadata      │     │   embeddings    │     │   Graph         │
│ • User data     │     │ • Semantic      │     │ • Entities      │
│ • Processing    │     │   search        │     │ • Relations     │
│   status        │     │ • Similarity    │     │ • Topics        │
│ • File refs     │     │   matching      │     │ • Clusters      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
- Next.js 15: React framework with App Router
- TypeScript: Type-safe development
- Tailwind CSS v4: Modern utility-first styling
- Radix UI: Accessible component primitives
- React Hot Toast: User notifications
- Cytoscape.js: Interactive graph visualization
- Next.js API Routes: RESTful endpoints
- Clerk: Authentication and user management
- OpenAI: Embeddings and language model
- LangChain: AI orchestration framework
- Mammoth.js: Word document processing
- PDF-Parse: PDF text extraction
- Text Chunking: Intelligent content segmentation
- Entity Extraction: NER with relationship mapping
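As a rough illustration of the text chunking item above, the sketch below splits extracted text into segments of at most a fixed number of tokens. It is a minimal sketch under stated assumptions: tokens are approximated by whitespace-separated words, and the `chunkText` name and its signature are hypothetical rather than taken from the Documind codebase. The default of 500 mirrors the `MAX_CHUNK_SIZE` setting shown in the environment configuration.

```typescript
// Hypothetical helper: split text into chunks of at most `maxTokens` tokens.
// Assumption: a "token" is approximated by a whitespace-separated word;
// a real pipeline would likely count tokens with the model's tokenizer.
function chunkText(text: string, maxTokens: number = 500): string[] {
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new RangeError("maxTokens must be a positive integer");
  }
  // Drop empty strings produced by leading/trailing whitespace.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```

Each returned chunk would then be embedded and stored separately, so a search hit can point back to a specific passage rather than a whole document.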
1. User Authentication (Clerk)
   ↓
2. File Upload to S3
   ↓
3. Background Processing:
   • Text extraction
   • AI analysis (OpenAI)
   • Vector generation (Qdrant)
   • Entity extraction (Neo4j)
   • Metadata storage (MongoDB)
   ↓
4. Real-time Status Updates
   ↓
5. Interactive Features:
   • Semantic search
   • AI chat
   • Graph visualization
   • Document management
File Upload → Text Extraction → AI Processing → Multi-DB Storage
     │               │                │                │
     │               │                │                ├── Vector embeddings (Qdrant)
     │               │                │                ├── Entity extraction (Neo4j)
     │               │                │                └── Metadata storage (MongoDB)
     │               │                │
     │               │                ├── LangChain + OpenAI processing
     │               │                ├── Topic modeling
     │               │                └── Entity recognition
     │               │
     │               ├── PDF/Word/Text extraction
     │               ├── Mammoth.js for Word docs
     │               └── pdf-parse for PDFs
     │
     ├── AWS S3 secure storage
     └── Presigned URLs
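The text extraction stage in the pipeline above depends on the uploaded file type. One way to route a file to the right extractor is sketched below; the `pickExtractor` function and the `ExtractorKind` labels are hypothetical names for illustration, and the sketch assumes routing by file extension (the real application may inspect MIME types instead).

```typescript
// Hypothetical labels for the extraction strategies named in the pipeline.
type ExtractorKind = "pdf-parse" | "mammoth" | "plain-text";

// Assumption: the file type is decided by extension only.
function pickExtractor(filename: string): ExtractorKind {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  switch (ext) {
    case "pdf":
      return "pdf-parse"; // PDF text extraction
    case "docx":
      return "mammoth"; // Word document processing
    case "txt":
      return "plain-text"; // read the file contents directly
    default:
      throw new Error(`Unsupported file type: ${filename}`);
  }
}
```

Rejecting unknown extensions up front keeps unsupported files out of the background processing queue.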
Frontend Structure:
├── Pages:
│   ├── / (Landing page)
│   ├── /dashboard (Main interface)
│   ├── /chat (AI Q&A interface)
│   ├── /graph (Knowledge graph visualization)
│   └── /sign-in & /sign-up (Authentication)
│
├── Components:
│   ├── ui/ (Radix UI components)
│   ├── chat/ (Chat interface)
│   ├── documents/ (File management)
│   ├── graph/ (Cytoscape visualization)
│   └── layout/ (Navigation, headers)
│
└── API Routes:
    ├── /upload (File upload & processing)
    ├── /documents (CRUD operations)
    ├── /search (Semantic search)
    ├── /chat (AI Q&A)
    └── /graph (Graph data & operations)
Before running Documind, ensure you have:
- Node.js (v18 or higher)
- npm or yarn package manager
- MongoDB instance (local or cloud)
- Qdrant vector database
- Neo4j graph database
- AWS S3 bucket
- OpenAI API key
- Clerk account for authentication
git clone https://github.com/yourusername/documind.git
cd documind
npm install
Copy the example environment file and configure your services:
cp .env.example .env.local
Update .env.local with your service credentials:
# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_your_key
CLERK_SECRET_KEY=sk_test_your_key
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/dashboard
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/dashboard
# MongoDB Configuration
MONGODB_URI=mongodb://localhost:27017/documind
MONGODB_DB_NAME=documind
# Qdrant Vector Database
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your_qdrant_api_key
# Neo4j Graph Database
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password
# AWS S3 Storage
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_aws_access_key_id
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
AWS_S3_BUCKET_NAME=your-bucket-name
# OpenAI API
OPENAI_API_KEY=sk-your_openai_key
OPENAI_MODEL=gpt-4o-mini
# Application Configuration
NEXT_PUBLIC_APP_URL=http://localhost:3000
MAX_FILE_SIZE_MB=10
MAX_CHUNK_SIZE=500
EMBEDDING_DIMENSIONS=1536
Ensure all databases are running and accessible:
# Local MongoDB
mongod --dbpath /path/to/data/db
# Or use MongoDB Atlas (cloud)
# Using Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Or use our provided Docker Compose setup
docker-compose -f docker-compose.qdrant.yml up -d
# Or use Qdrant Cloud
# Using Docker
docker run --publish=7474:7474 --publish=7687:7687 --env NEO4J_AUTH=neo4j/your_password neo4j
# Or use Neo4j Aura (cloud)
# Development mode
npm run dev
# Production build
npm run build
npm start
The application will be available at http://localhost:3000
documind/
├── app/                          # Next.js App Router
│   ├── api/                      # API routes
│   │   ├── chat/                 # Q&A endpoint
│   │   ├── documents/            # Document management
│   │   ├── graph/                # Graph operations
│   │   ├── search/               # Search functionality
│   │   └── upload/               # File upload
│   ├── chat/                     # Chat interface page
│   ├── dashboard/                # Main dashboard
│   ├── graph/                    # Knowledge graph view
│   ├── sign-in/                  # Authentication pages
│   ├── sign-up/
│   ├── layout.tsx                # Root layout
│   └── page.tsx                  # Landing page
├── components/                   # React components
│   ├── chat/                     # Chat interface components
│   ├── documents/                # Document management
│   ├── graph/                    # Graph visualization
│   ├── layout/                   # Layout components
│   └── ui/                       # Reusable UI components
├── lib/                          # Utilities and configurations
│   ├── ai/                       # AI processing modules
│   │   ├── chat.ts               # Chat functionality
│   │   ├── embeddings.ts         # Vector embeddings
│   │   ├── entities.ts           # Entity extraction
│   │   ├── pipeline.ts           # Processing pipeline
│   │   └── processing.ts         # Text processing
│   ├── api/                      # API utilities
│   ├── db/                       # Database connections
│   │   ├── mongodb.ts            # MongoDB client
│   │   ├── neo4j.ts              # Neo4j client
│   │   └── qdrant.ts             # Qdrant client
│   └── storage/                  # File storage
├── types/                        # TypeScript definitions
├── docker-compose.qdrant.yml     # Local Qdrant Docker setup
├── middleware.ts                 # Clerk middleware
└── next.config.ts                # Next.js configuration
- Authentication: Verify user via Clerk
- Storage: Save file to AWS S3
- Metadata: Create document record in MongoDB
- Queue: Initiate background processing
- Text Extraction: Extract content from PDF/DOCX/TXT
- Chunking: Split text into optimal segments (500 tokens)
- Embeddings: Generate vector representations using OpenAI
- Storage: Store vectors in Qdrant with user scoping
- Entity Extraction: Identify people, organizations, locations, dates using AI
- Relationship Mapping: Create co-occurrence and semantic similarity connections
- Quality Filtering: Filter relationships by confidence thresholds (>0.3 for co-occurrence, >0.5 for similarity)
- Cross-Document Resolution: Link same entities across different documents
- Graph Storage: Build optimized knowledge graph in Neo4j with proper indexing
- User Isolation: Ensure complete data privacy with user-scoped queries
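The quality filtering step above keeps a relationship only when its confidence exceeds the threshold for its kind (0.3 for co-occurrence, 0.5 for similarity). A minimal sketch of that rule follows; the `Relation` shape and the `filterRelations` name are assumptions made for illustration, not the project's actual types.

```typescript
type RelationKind = "co-occurrence" | "similarity";

// Hypothetical shape of an extracted relationship between two entities.
interface Relation {
  source: string;
  target: string;
  kind: RelationKind;
  confidence: number; // expected to lie in [0, 1]
}

// Thresholds as stated in the list above.
const MIN_CONFIDENCE: Record<RelationKind, number> = {
  "co-occurrence": 0.3,
  similarity: 0.5,
};

// Keep only relations whose confidence is strictly above their threshold.
function filterRelations(relations: Relation[]): Relation[] {
  return relations.filter((r) => r.confidence > MIN_CONFIDENCE[r.kind]);
}
```

Filtering before the graph is written means low-confidence edges never reach Neo4j, which keeps later traversals and visualizations smaller.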
- Real-time: Live processing status updates
- Error Handling: Comprehensive error reporting
- Completion: Automatic notification system
- Query Processing: Convert user query to vector embedding
- Vector Search: Find similar content in Qdrant (user-scoped)
- Context Retrieval: Gather related entities from Neo4j
- LLM Integration: Combine context with user query
- Response Generation: Provide answers with source citations
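The last two Q&A steps above combine retrieved context with the user's question so the model can answer with citations. The sketch below shows one possible way to assemble such a prompt. The `RetrievedChunk` shape, the `buildPrompt` name, and the prompt wording are all assumptions for illustration; the actual prompt used by Documind is not documented here.

```typescript
// Hypothetical shape of a chunk returned by the vector search step.
interface RetrievedChunk {
  documentName: string;
  text: string;
}

// Build a prompt that numbers each source so the answer can cite it as [n].
function buildPrompt(
  question: string,
  chunks: RetrievedChunk[],
  entities: string[],
): string {
  const sources = chunks
    .map((c, i) => `[${i + 1}] (${c.documentName}) ${c.text}`)
    .join("\n");
  const entityLine = entities.length > 0 ? entities.join(", ") : "none";
  return [
    "Answer the question using only the numbered sources below.",
    "Cite sources as [n] after each claim.",
    "",
    "Sources:",
    sources,
    "",
    `Related entities: ${entityLine}`,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the sources in the prompt is what lets the citation markers in the generated answer be mapped back to specific documents in the interface.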
- Interactive Visualization: Cytoscape.js powered graphs with optimized layouts
- Smart Edge Rendering: Clean visualization with hover-to-reveal labels for reduced clutter
- Advanced Filtering: Filter by entity types, confidence thresholds, and relationship strengths
- Customizable Display: Toggle edge labels, adjust node limits, and control visual density
- Entity Relationships: Explore connections between people, organizations, locations, and concepts
- Document Mapping: Visualize how documents relate through shared entities and topics
- Graph Statistics: Real-time metrics showing nodes, edges, and entity distributions
- Clerk Integration: Secure sign-up/sign-in flows
- Session Management: Automatic token handling
- Route Protection: Middleware-based access control
- User Scoping: Complete isolation of user data
- Query Filtering: Automatic user-based filtering
- Access Control: Document ownership verification
- Encrypted Storage: Secure file storage in AWS S3
- API Security: Protected routes with authentication
- Error Handling: Safe error messages without data leakage
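User scoping and ownership verification, as listed above, can be sketched as two small helpers. The filter object follows the `must` / `key` / `match` shape that Qdrant accepts for payload filtering; the `userId` payload field, the `DocumentRecord` shape, and both function names are assumptions for illustration.

```typescript
// Minimal types for a Qdrant-style payload filter.
interface MatchCondition {
  key: string;
  match: { value: string };
}
interface ScopedFilter {
  must: MatchCondition[];
}

// Assumption: every stored vector carries a `userId` payload field.
function userScopedFilter(userId: string): ScopedFilter {
  if (userId.trim() === "") {
    throw new Error("userId is required for scoped queries");
  }
  return { must: [{ key: "userId", match: { value: userId } }] };
}

// Hypothetical document metadata record (e.g. as stored in MongoDB).
interface DocumentRecord {
  id: string;
  ownerId: string;
}

// Throws unless the requesting user owns the document.
function assertOwnership(doc: DocumentRecord, userId: string): void {
  if (doc.ownerId !== userId) {
    // Generic message: do not reveal that the document exists for another user.
    throw new Error("Document not found");
  }
}
```

Passing the result of `userScopedFilter` with every search request, instead of filtering results afterwards, means another user's vectors are never returned from the database in the first place.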
# Development server
npm run dev
# Production build
npm run build
# Start production server
npm start
# Linting
npm run lint
# Code formatting
npm run format
- TypeScript: Full type safety across the application
- Biome: Modern linting and formatting
- Error Boundaries: Graceful error handling
- Loading States: Comprehensive loading indicators
- Database Services: Ensure all databases are accessible
- Environment Variables: Configure production credentials
- File Storage: Set up AWS S3 bucket
- Authentication: Configure Clerk for production
- Vercel: Optimal for Next.js applications
- Netlify: Alternative deployment option
- Railway: Full-stack deployment with databases
- AWS/GCP/Azure: Enterprise-grade hosting
- Environment variables configured
- Database connections tested
- File upload limits set
- Authentication flows verified
- Error monitoring enabled
- Performance optimization applied
If you encounter ENOTFOUND errors with Qdrant:
- Check Instance Status: Verify your Qdrant Cloud instance is running
- Use Local Fallback: Switch to local Docker setup:
  # Start local Qdrant
  docker-compose -f docker-compose.qdrant.yml up -d
  # Update .env.local
  QDRANT_URL=http://localhost:6333
  # Remove QDRANT_API_KEY for local instance
- Network Issues: Check firewall/VPN settings
- Graceful Degradation: The app continues working with limited functionality if Qdrant is unavailable
- No Connections Visible: Check that documents have been processed and entities extracted
- Cluttered Graph: Use the "Show Connection Labels" toggle to reduce visual noise
- Performance Issues: Reduce max nodes limit in the filters panel
- Layout Problems: Use graph controls (fit to view, center, reset zoom) to optimize display
- MongoDB: Ensure connection string is correct and database is accessible
- Neo4j: Verify bolt:// URL and credentials are valid
- AWS S3: Check access credentials and bucket permissions
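The graceful degradation mentioned above (the app keeps working with limited functionality when a service such as Qdrant is unreachable) could be implemented with a small wrapper like the one below. This is a sketch only: `withFallback` and the `Degradable` result type are hypothetical names, and how the real application reports degraded results is not specified here.

```typescript
// Result wrapper that tells the caller whether the fallback was used.
interface Degradable<T> {
  value: T;
  degraded: boolean;
  reason?: string; // error message from the failed primary call, if any
}

// Try the primary service; on any failure, return the fallback value instead.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => T,
): Promise<Degradable<T>> {
  try {
    return { value: await primary(), degraded: false };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { value: fallback(), degraded: true, reason };
  }
}
```

A search route could, for example, wrap its vector query with an empty-array fallback and use the `degraded` flag to show a notice that semantic search is temporarily unavailable.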
# Start the services defined in docker-compose.qdrant.yml (local Qdrant)
docker-compose -f docker-compose.qdrant.yml up -d
# Install dependencies
npm install
# Run development server
npm run dev
The application includes built-in connection testing and will provide clear error messages for misconfigured services.
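A simple way to catch misconfigured services early is to check for missing environment variables before opening any connections. The sketch below is not the application's built-in connection test. It is an illustrative helper with a hypothetical name (`findMissingEnvVars`), and the choice of which variables count as required is an assumption based on the example `.env.local` (`QDRANT_API_KEY` is left out because a local Qdrant instance does not need it).

```typescript
// Assumption: these variables from the example .env.local are mandatory.
const REQUIRED_ENV_VARS = [
  "CLERK_SECRET_KEY",
  "MONGODB_URI",
  "QDRANT_URL",
  "NEO4J_URI",
  "NEO4J_USERNAME",
  "NEO4J_PASSWORD",
  "AWS_REGION",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_S3_BUCKET_NAME",
  "OPENAI_API_KEY",
] as const;

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

At startup this could be called as `findMissingEnvVars(process.env)`, failing fast with the list of missing names instead of surfacing a less specific connection error later.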
- Efficient Indexing: Optimized Qdrant collections
- Batch Processing: Bulk operations for embeddings
- Caching Strategy: Smart result caching
- MongoDB Indexes: Optimized query performance
- Neo4j Optimization: Efficient graph traversal
- Connection Pooling: Managed database connections
- Next.js Optimization: Built-in performance features
- Component Optimization: Memoization and lazy loading
- Bundle Optimization: Efficient code splitting
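To illustrate the batch processing item above, the sketch below groups items (for example, chunk texts awaiting embedding) into fixed-size batches so that each request carries several inputs instead of one. The `toBatches` helper and any particular batch size are hypothetical; the batch size Documind actually uses is not stated.

```typescript
// Hypothetical helper: split a list into consecutive batches of `batchSize`.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

With this helper, a document split into 250 chunks and a batch size of 100 would need three embedding requests rather than 250.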
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- TypeScript: Maintain type safety
- Testing: Add tests for new features
- Documentation: Update docs for changes
- Code Style: Follow established patterns
This project is licensed under the MIT License - see the LICENSE file for details.
- API Reference: /docs/api
- Component Library: /docs/components
- Deployment Guide: /docs/deployment
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Join our community
For enterprise deployments and custom integrations, contact us at [email protected]
Built with ❤️ by Kanugu Rajesh
Website • Documentation • Blog