soorajaryan007/langchain-pdf-chatbot

📚 PDF RAG Chatbot

A PDF question-answering chatbot built with LangChain, Groq, and Gradio. Upload a PDF document and ask questions about its content; answers are generated with RAG (Retrieval-Augmented Generation), so each response is grounded in passages retrieved from the uploaded file.

Live app: HuggingFace

✨ Features

  • 📄 Upload and process PDF documents
  • 🤖 Ask questions about PDF content using natural language
  • 🚀 Fast responses powered by Groq's Mixtral model
  • 🎯 Accurate answers using RAG with vector similarity search
  • 🖥️ Clean and intuitive web interface with Gradio
  • 🔍 MMR (Maximal Marginal Relevance) search for diverse results

🛠️ Technologies Used

  • LangChain: Framework for building LLM applications
  • Groq: Ultra-fast LLM inference
  • HuggingFace Embeddings: Sentence transformers for text embeddings
  • ChromaDB: Vector database for efficient similarity search
  • Gradio: Web UI framework
  • PyPDF: PDF processing

📋 Prerequisites

  • Python 3 with pip
  • A Groq API key (see "Getting a Groq API Key")

🚀 Installation

  1. Clone the repository

git clone <your-repo-url>
cd pdf-chatbot

  2. Create a virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

  3. Install dependencies

pip install gradio langchain-groq langchain-community langchain-core langchain-text-splitters chromadb pypdf sentence-transformers python-dotenv torch

  4. Set up environment variables

Create a .env file in the project root:

touch .env

Add your Groq API key to the .env file:

GROQ_API_KEY=gsk_your_actual_api_key_here

  5. Add .env to .gitignore (to keep your API key secure)

echo ".env" >> .gitignore

🎮 Usage

  1. Run the application

python app.py

  2. Open your browser

Navigate to http://localhost:7860

  3. Upload and Ask
    • Click on the file upload area and select a PDF
    • Type your question in the text box
    • Click Submit to get your answer

📁 Project Structure

pdf-chatbot/
├── .venv/              # Virtual environment
├── app.py              # Main application file
├── .env                # Environment variables (API keys)
├── .gitignore          # Git ignore file
└── README.md           # This file

🔑 Getting a Groq API Key

  1. Visit console.groq.com
  2. Sign up or log in
  3. Navigate to the API Keys section
  4. Create a new API key
  5. Copy the key and add it to your .env file

💡 How It Works

  1. Document Processing: The PDF is loaded and split into chunks
  2. Embeddings: Text chunks are converted to vector embeddings
  3. Vector Storage: Embeddings are stored in ChromaDB
  4. Query Processing: User questions are converted to embeddings
  5. Retrieval: Similar chunks are retrieved using MMR search
  6. Generation: Groq's Mixtral model generates answers based on retrieved context
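
The retrieval step can be illustrated with a small, dependency-free sketch of MMR. This is not the code in app.py (the app relies on LangChain and ChromaDB for retrieval); the function names `cosine` and `mmr_select` and the `lambda_mult` parameter are assumptions made for this example. It shows the general idea: each pick balances similarity to the question against similarity to chunks already chosen.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 5,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return the indices of up to k chunks chosen by Maximal Marginal Relevance.

    lambda_mult = 1.0 ranks purely by relevance to the query;
    lower values penalize chunks that resemble ones already selected.
    """
    relevance = [cosine(query_vec, vec) for vec in chunk_vecs]
    candidates = list(range(len(chunk_vecs)))
    selected: List[int] = []

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the highest similarity to any already-selected chunk.
            redundancy = max(
                (cosine(chunk_vecs[i], chunk_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected
```

With two near-duplicate chunks and one different chunk, a lower `lambda_mult` tends to skip the near-duplicate in favor of the different chunk, which is the "diverse results" behavior listed under Features.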

⚙️ Configuration

You can modify these parameters in app.py:

  • Model: Change mixtral-8x7b-32768 to another Groq model (model availability changes over time, so check Groq's current model list before switching)
  • Temperature: Adjust creativity (0.0-1.0)
  • Max Tokens: Control response length
  • Chunk Size: Modify document splitting (default: 1000)
  • Chunk Overlap: Adjust context continuity (default: 200)
  • K value: Number of documents to retrieve (default: 5)
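
To show how Chunk Size and Chunk Overlap interact, here is a minimal fixed-width splitter in plain Python. It is only a sketch: the function name `split_with_overlap` is hypothetical, and the splitter used in app.py (from langchain-text-splitters) may cut on separators such as paragraphs rather than at fixed character offsets. The defaults mirror the values listed above.

```python
from typing import List


def split_with_overlap(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A larger overlap keeps more shared context between neighbouring chunks but produces more chunks to embed and store; a larger K then sends more of those chunks to the model with each question.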

🐛 Troubleshooting

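Issue: ModuleNotFoundError

Upgrade pip and reinstall the dependencies. If your checkout has no requirements.txt, rerun the pip install command from the Installation section instead.

pip install --upgrade pip
pip install -r requirements.txt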

Issue: API Key Error

  • Ensure your .env file exists in the project root
  • Verify the API key is correct
  • Check that python-dotenv is installed
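
The checks above can be scripted. The helper below is a hypothetical diagnostic, not part of app.py; it inspects a mapping of environment variables and reports the most common problems with GROQ_API_KEY, including the placeholder value from the Installation section.

```python
from typing import List, Mapping


def check_groq_key(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems with GROQ_API_KEY (empty if none found)."""
    problems: List[str] = []
    key = env.get("GROQ_API_KEY", "")

    if not key:
        problems.append("GROQ_API_KEY is not set")
    elif key != key.strip():
        problems.append("GROQ_API_KEY has leading or trailing whitespace")
    elif "your_actual_api_key_here" in key:
        # The README placeholder was copied without substituting a real key.
        problems.append("GROQ_API_KEY still contains the placeholder value")

    return problems
```

One way to use it is to call python-dotenv's `load_dotenv()` first and then pass `os.environ` to `check_groq_key`, printing any problems it returns. An empty list only means the key looks plausible; it does not confirm that Groq accepts it.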

Issue: PDF Not Loading

  • Ensure the PDF is not password-protected
  • Check that the file is a valid PDF format
  • Try with a smaller PDF first
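
A quick way to test the "valid PDF format" point is to look for the `%PDF-` signature near the start of the file. The function below is an illustrative sketch (the name `looks_like_pdf` is an assumption, and app.py may not perform this check); it uses only the standard library.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the %PDF- signature appears in the first 1024 bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
    return b"%PDF-" in head
```

A file that fails this check was probably renamed or corrupted. Passing it does not rule out password protection; for that, pypdf's `PdfReader` exposes an `is_encrypted` attribute you can inspect.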

📝 Example Questions

  • "What is the main topic of this document?"
  • "Can you summarize the key points?"
  • "What does the document say about [specific topic]?"
  • "Who are the authors mentioned?"
  • "What are the conclusions?"

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

📞 Support

If you encounter any issues or have questions, please open an issue on GitHub.


Made with ❤️ using LangChain and Groq
