Chat with your PDFs like they're alive: upload lecture notes, textbooks, resumes, or any PDF, and ask questions in natural language. The app retrieves context from your documents and combines it with Gemini's reasoning to answer clearly.
- Upload and process multiple PDFs.
- Smart text chunking for long documents.
- Vector embeddings with FAISS for semantic search.
- Google Gemini integration for natural Q&A and summarization.
- Document summaries generated automatically.
- Conversation memory to keep context from past questions (see the sketch after this list).
- Export your entire chat history + document summaries as JSON.
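Conversation memory in a Streamlit app typically lives in `st.session_state`. Below is a minimal sketch of that idea; the `chat_history` key and helper names are illustrative, not this app's actual API.

```python
# Conversation-memory sketch: keep past Q&A turns in Streamlit session state
# and flatten the most recent ones into text that can be prepended to a prompt.
# The "chat_history" key and helper names are illustrative, not the app's API.
import streamlit as st

if "chat_history" not in st.session_state:
    st.session_state["chat_history"] = []  # list of {"role", "content"} dicts

def remember(role: str, content: str) -> None:
    st.session_state["chat_history"].append({"role": role, "content": content})

def history_as_context(max_turns: int = 5) -> str:
    # Keep only the last few user/assistant turns to bound prompt size.
    recent = st.session_state["chat_history"][-2 * max_turns:]
    return "\n".join(f"{m['role']}: {m['content']}" for m in recent)
```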
- Frontend/UI: Streamlit
- Document parsing: PyPDF2
- Chunking: `RecursiveCharacterTextSplitter` (LangChain)
- Embeddings: SpaCy (`en_core_web_sm`)
- Vector DB: FAISS
- LLM: Google Gemini (`gemini-1.5-flash`)
- Config & Env: python-dotenv
- Persistence: JSON export
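A minimal sketch of how these pieces fit together on the ingestion side. The `build_index` helper and the `chunk_overlap=200` value are assumptions for illustration; the README only fixes `chunk_size=1000`.

```python
# Ingestion sketch: extract (PyPDF2) -> chunk (LangChain) -> embed (SpaCy) -> index (FAISS).
# `build_index` and chunk_overlap=200 are illustrative assumptions, not the app's exact code.
# Note: the splitter import path may vary by LangChain version
# (`langchain_text_splitters` in newer releases).
import faiss
import numpy as np
import spacy
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter

nlp = spacy.load("en_core_web_sm")

def build_index(pdf_path: str):
    # 1. Extract raw text from every page.
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 2. Split into overlapping ~1000-character chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(text)

    # 3. Embed each chunk with SpaCy's en_core_web_sm vectors.
    vectors = np.array([nlp(chunk).vector for chunk in chunks], dtype="float32")

    # 4. Store the vectors in a FAISS index for semantic search.
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index, chunks
```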
┌───────────────┐
│    PDF(s)     │
└───────┬───────┘
        │ Extract (PyPDF2)
        ▼
┌─────────────────────┐
│    Text Splitter    │
│  (chunk_size=1000)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Embeddings (SpaCy)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Vector DB (FAISS)  │
└──────────┬──────────┘
           │ Retrieval (top-k)
           ▼
┌───────────────────────────┐
│        Gemini LLM         │
│  - Uses context chunks    │
│  - Falls back to general  │
│    knowledge if no match  │
└──────────┬────────────────┘
           │
           ▼
┌─────────────────────┐
│ Chat UI (Streamlit) │
│ + conversation mem  │
└─────────────────────┘
- Upload PDF → Extract text → Chunk it → Vectorize → Store in FAISS
- Ask question → Vectorize → Search in FAISS → Get context → Gemini generates answer (sketched below)
- If no match is found → fall back to general AI knowledge
- Chat + Export feature
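A hedged sketch of that question path, reusing the `index`, `chunks`, and `nlp` objects from the ingestion sketch above; the prompt wording, `top_k` value, and `answer` helper are illustrative, not the app's exact code.

```python
# Query sketch: embed the question, retrieve top-k chunks from FAISS, ask Gemini.
# Assumes `index`, `chunks`, and `nlp` from the ingestion sketch, and that
# genai.configure(api_key=...) has already been called (see setup below).
import numpy as np
import google.generativeai as genai

def answer(question: str, index, chunks, top_k: int = 4) -> str:
    # Embed the question the same way the chunks were embedded.
    q_vec = np.array([nlp(question).vector], dtype="float32")
    _, ids = index.search(q_vec, top_k)

    # Join the retrieved chunks into a context block for the prompt.
    context = "\n\n".join(chunks[i] for i in ids[0] if i != -1)
    prompt = (
        "Answer the question using the context below. "
        "If the context is not relevant, answer from general knowledge.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text
```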
- Clone the repo: `git clone https://github.com/your-repo/pdf-qna-bot.git`, then `cd pdf-qna-bot`
- Create a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (Mac/Linux) or `venv\Scripts\activate` (Windows)
- Install dependencies: `pip install -r requirements.txt`
- Download the SpaCy model: `python -m spacy download en_core_web_sm`
- Set the environment variable: create a `.env` file in the project root containing `GEMINI_API_KEY=your_api_key_here` (loaded at startup, as sketched below)
- Run the app: `streamlit run app.py`
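How the key is typically picked up at startup, as a minimal sketch; only `GEMINI_API_KEY` and `gemini-1.5-flash` come from this README, the rest is illustrative.

```python
# Startup sketch: load the .env file and configure the Gemini client.
import os
from dotenv import load_dotenv
import google.generativeai as genai

load_dotenv()  # reads GEMINI_API_KEY from the .env file in the project root
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")
```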
- Upload one or more PDFs from the sidebar.
- Click "Process Documents" → the text is split, embedded, and stored in FAISS.
- Ask any question in the chat box.
  - If the answer is in a PDF → retrieved and answered using Gemini.
  - If not found → Gemini provides a fallback general-knowledge answer.
- See summaries per document in the sidebar.
- Export chat + summaries as JSON (see the sketch below).
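A minimal sketch of what that export could look like in Streamlit, assuming chat history and summaries live in `st.session_state`; the key names and file name are illustrative.

```python
# Export sketch: serialize chat history + per-document summaries to JSON.
# The session_state keys and file name are illustrative, not the app's exact ones.
import json
import streamlit as st

export = {
    "chat_history": st.session_state.get("chat_history", []),
    "summaries": st.session_state.get("summaries", {}),
}
st.download_button(
    label="Export chat + summaries",
    data=json.dumps(export, indent=2),
    file_name="chat_export.json",
    mime="application/json",
)
```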
Made with ❤️‍🔥 in the AI domain.