A powerful PDF question-answering chatbot built with LangChain, Groq, and Gradio. Upload any PDF document and ask questions about its content using advanced RAG (Retrieval-Augmented Generation) technology.
- 📄 Upload and process PDF documents
- 🤖 Ask questions about PDF content using natural language
- 🚀 Fast responses powered by Groq's Mixtral model
- 🎯 Accurate answers using RAG with vector similarity search
- 🖥️ Clean and intuitive web interface with Gradio
- 🔍 MMR (Maximal Marginal Relevance) search for diverse results
- LangChain: Framework for building LLM applications
- Groq: Ultra-fast LLM inference
- HuggingFace Embeddings: Sentence transformers for text embeddings
- ChromaDB: Vector database for efficient similarity search
- Gradio: Web UI framework
- PyPDF: PDF processing
- Python 3.8 or higher
- Groq API key (get it from console.groq.com)
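You can check the Python requirement programmatically before installing anything. A minimal sketch (`check_python` is an illustrative helper, not part of the project):

```python
import sys

def check_python(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version

if __name__ == "__main__":
    if not check_python():
        raise SystemExit(f"Python 3.8+ required, found {sys.version.split()[0]}")
    print("Python version OK:", sys.version.split()[0])
```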
- Clone the repository
```bash
git clone <your-repo-url>
cd pdf-chatbot
```
- Create a virtual environment
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- Install dependencies
```bash
pip install gradio langchain-groq langchain-community langchain-core langchain-text-splitters chromadb pypdf sentence-transformers python-dotenv torch
```
- Set up environment variables
Create a `.env` file in the project root:
```bash
touch .env
```
Add your Groq API key to the `.env` file:
```
GROQ_API_KEY=gsk_your_actual_api_key_here
```
- Add `.env` to `.gitignore` (to keep your API key secure)
```bash
echo ".env" >> .gitignore
```
- Run the application
```bash
python app.py
```
- Open your browser
Navigate to http://localhost:7860
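Before launching, it can help to confirm the API key is actually picked up. A minimal standard-library sketch (`load_groq_key` is a hypothetical helper, not part of `app.py`, which uses `python-dotenv` instead):

```python
import os

def load_groq_key(env_path=".env"):
    """Return GROQ_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key
    # Fall back to parsing the .env file created during setup.
    if os.path.exists(env_path):
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                if line.startswith("GROQ_API_KEY="):
                    return line.split("=", 1)[1]
    raise RuntimeError("GROQ_API_KEY not found; create .env or export it")
```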
- Upload and Ask
- Click on the file upload area and select a PDF
- Type your question in the text box
- Click Submit to get your answer
```
pdf-chatbot/
├── .venv/          # Virtual environment
├── app.py          # Main application file
├── .env            # Environment variables (API keys)
├── .gitignore      # Git ignore file
└── README.md       # This file
```
- Visit console.groq.com
- Sign up or log in
- Navigate to the API Keys section
- Create a new API key
- Copy the key and add it to your `.env` file
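As an alternative to the `.env` file, you can export the key for the current shell session (the value below is the same placeholder as above, not a real key):

```shell
# Export the Groq API key for this shell session only (placeholder value)
export GROQ_API_KEY="gsk_your_actual_api_key_here"
# Confirm it is visible to child processes such as `python app.py`
echo "$GROQ_API_KEY"
```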
- Document Processing: The PDF is loaded and split into chunks
- Embeddings: Text chunks are converted to vector embeddings
- Vector Storage: Embeddings are stored in ChromaDB
- Query Processing: User questions are converted to embeddings
- Retrieval: Similar chunks are retrieved using MMR search
- Generation: Groq's Mixtral model generates answers based on retrieved context
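The pipeline above can be sketched end to end. This is a toy stand-in, not the actual `app.py`: bag-of-words counts replace sentence-transformer embeddings, an in-memory list replaces ChromaDB, and `mmr_retrieve` is a simplified version of the MMR selection that LangChain retrievers perform:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for sentence-transformers)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_retrieve(query, chunks, k=2, lambda_mult=0.5):
    """Pick k chunks, balancing relevance to the query against
    redundancy with chunks already selected (Maximal Marginal Relevance)."""
    q = embed(query)
    vecs = [embed(c) for c in chunks]
    selected, candidates = [], list(range(len(chunks)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(q, vecs[i])
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [chunks[i] for i in selected]

# Steps 1-3: "ingest" a document as pre-split chunks with stored embeddings
chunks = [
    "Groq provides fast LLM inference.",
    "Groq offers very fast inference for LLMs.",
    "ChromaDB stores vector embeddings.",
]
# Steps 4-5: embed the question and retrieve chunks; a low lambda_mult
# favors diversity, so the second pick avoids the near-duplicate chunk
context = mmr_retrieve("How fast is Groq inference?", chunks, k=2, lambda_mult=0.3)
print(context)
# Step 6: in the real app, context + question are sent to the Groq LLM
```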
You can modify these parameters in app.py:
- Model: Change `mixtral-8x7b-32768` to other Groq models
- Temperature: Adjust creativity (0.0-1.0)
- Max Tokens: Control response length
- Chunk Size: Modify document splitting (default: 1000)
- Chunk Overlap: Adjust context continuity (default: 200)
- K value: Number of documents to retrieve (default: 5)
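To see how chunk size and overlap interact, here is a simplified character-level splitter (the app presumably uses a LangChain text splitter; this sketch ignores separator-aware splitting):

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks where each chunk repeats the last
    `chunk_overlap` characters of the previous one, preserving context."""
    chunks, start, step = [], 0, chunk_size - chunk_overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

doc = "x" * 2500  # stand-in for extracted PDF text
parts = split_text(doc)
print(len(parts))  # 3 chunks: characters 0-1000, 800-1800, 1600-2500
```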
Issue: ModuleNotFoundError
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
Issue: API Key Error
- Ensure your `.env` file exists in the project root
- Verify the API key is correct
- Check that `python-dotenv` is installed
Issue: PDF Not Loading
- Ensure the PDF is not password-protected
- Check that the file is a valid PDF format
- Try with a smaller PDF first
- "What is the main topic of this document?"
- "Can you summarize the key points?"
- "What does the document say about [specific topic]?"
- "Who are the authors mentioned?"
- "What are the conclusions?"
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
- LangChain for the RAG framework
- Groq for lightning-fast LLM inference
- Gradio for the web interface
- HuggingFace for embeddings models
If you encounter any issues or have questions, please open an issue on GitHub.
Made with ❤️ using LangChain and Groq