Users often work with large PDF documents such as research papers, legal files, reports, and manuals.
Finding specific information inside these documents is slow because traditional search tools rely only on keyword matching and do not understand context.
There is a need for an intelligent system that can:
- Understand document content
- Answer user questions in natural language
- Provide accurate, context-based responses
This project is an AI-Based Document Retrieval and Question Answering System that allows users to upload PDF documents and ask questions related to the document.
The system uses AI and Natural Language Processing (NLP) to understand the document and return precise answers based only on the document content.
AI-Based Document Retrieval uses machine learning models to understand the meaning of text instead of searching for exact keywords.
It converts document text into vector embeddings, enabling semantic search and intelligent question answering.
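To make the idea of semantic search concrete, here is a minimal sketch of ranking text by cosine similarity between embedding vectors. The three-dimensional vectors are invented for illustration only; in the actual project the vectors would come from the MiniLM embedding model, which produces much longer vectors. The function name `cosine_similarity` and the sample sentences are assumptions, not code from this repository.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for this example (not model output).
chunk_vectors = {
    "The tenant must pay rent monthly.": [0.9, 0.1, 0.0],
    "The model was trained for ten epochs.": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a question such as "When is the payment due?"
query_vector = [0.8, 0.2, 0.1]

# Pick the chunk whose vector points in the most similar direction.
best_chunk = max(
    chunk_vectors,
    key=lambda chunk: cosine_similarity(query_vector, chunk_vectors[chunk]),
)
print(best_chunk)
```

The question shares no keywords with the rent sentence, yet it ranks first because the two vectors are close. That is the property a real embedding model is expected to provide.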
- User uploads a PDF document
- Text is extracted from the PDF
- Text is split into smaller chunks
- Each chunk is converted into a vector embedding
- Embeddings are stored in a vector database
- User asks a question
- Relevant document sections are retrieved
- AI model generates an answer using document context only
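The steps above can be sketched end to end with the standard library alone. This is a rough illustration under stated assumptions, not the project's implementation: `toy_embed` is a bag-of-words stand-in for the MiniLM embeddings, a plain Python list stands in for ChromaDB, PDF extraction is replaced by a hard-coded string, and the final LLM call is left out so that only the prompt is built. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=40):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, store, k=2):
    """Return the k chunks whose vectors are most similar to the question."""
    question_vector = toy_embed(question)
    ranked = sorted(
        store, key=lambda item: cosine(question_vector, item[1]), reverse=True
    )
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Build a prompt that tells the model to answer from the context only."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Stand-in for text extracted from an uploaded PDF.
document_text = (
    "The warranty covers manufacturing defects for two years. "
    "Returns are accepted within thirty days of purchase. "
    "Shipping is free for orders above fifty dollars."
)

chunks = split_into_chunks(document_text, chunk_size=60, overlap=10)
store = [(chunk, toy_embed(chunk)) for chunk in chunks]  # in-memory "vector database"

question = "How long does the warranty last?"
top_chunks = retrieve(question, store, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The overlap between neighbouring chunks is a common design choice: it lowers the chance that a sentence cut at a chunk boundary becomes unretrievable. The chunk size and overlap values shown are arbitrary and would need tuning for real documents.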
- Frontend: Streamlit
- LLM: Meta LLaMA 3.2 (1B Instruct)
- Embeddings: Sentence Transformers (MiniLM)
- Vector Database: ChromaDB
- Framework: LangChain
- PDF Processing: PyPDF2
- Upload PDF documents
- Chat-based question answering
- Context-aware responses
- Reduces AI hallucination by restricting answers to the retrieved document context
- Simple and interactive UI
- Student study and exam preparation
- Legal and policy document analysis
- Research paper understanding
- Corporate document review
- OCR support for scanned PDFs
- Multi-document support
- Answer citation with page numbers
- Cloud deployment
Team Name: Celestial Coders
Project Type: AI / NLP / LLM-Based Application
- Clone the repository
- Install required dependencies
- Add your Hugging Face API token in .env
- Run the Streamlit application
streamlit run app.py
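For readers unfamiliar with .env files, the sketch below shows one way the KEY=VALUE format can be parsed. It is an illustration only: the README does not say which variable name the application expects, so `HUGGINGFACEHUB_API_TOKEN` is an assumption, the token value is a placeholder, and `parse_env_text` is a hypothetical helper rather than code from this repository. A real application would more likely rely on an existing package for this.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Example .env content; the variable name and token are placeholders.
example_env = (
    "# Hugging Face credentials\n"
    'HUGGINGFACEHUB_API_TOKEN="hf_example_placeholder"\n'
)

settings = parse_env_text(example_env)
print(sorted(settings))
```

Keeping the token in .env rather than in the source code means it can stay out of version control, provided .env is excluded from commits.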