This project is a Conversational Retrieval-Augmented Generation (RAG) chatbot built with Streamlit. It allows you to upload PDF files and chat with their content using a powerful LLM (Groq's Gemma2-9b-it) and HuggingFace embeddings. The app maintains chat history for context-aware conversations.
- PDF Upload: Upload one or more PDF files to use as the knowledge base.
- Conversational RAG: Ask questions about your PDFs and get concise, context-aware answers.
- Chat History: Maintains session-based chat history for more natural, contextual conversations.
- Groq LLM Integration: Uses Groq's Gemma2-9b-it model for high-quality responses.
- HuggingFace Embeddings: Uses `all-MiniLM-L6-v2` for semantic search over your documents.
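The semantic-search feature boils down to embedding document chunks and the query as vectors, then ranking chunks by cosine similarity. A minimal, self-contained sketch of that retrieval step (the toy 3-d vectors below stand in for `all-MiniLM-L6-v2`'s real 384-d embeddings; the function names are illustrative, not the app's API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunk_vecs, k=1):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Three toy chunk embeddings and a query close to the first and third
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.1, 0.0]
print(retrieve(query, chunks, k=2))  # → [2, 0]
```

In the app itself, Chroma performs this ranking over the stored embeddings; the sketch only illustrates the idea.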
- Python 3.9+
- Streamlit
- Groq API Key
- HuggingFace API Key
- The following Python packages (see below)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/your-repo-name.git
  cd your-repo-name
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:

  Create a `.env` file in the project root with your HuggingFace token:

  ```
  HF_TOKEN=your_huggingface_token_here
  ```
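At startup, the app can read this token into the process environment with python-dotenv. A minimal sketch (`load_dotenv` is python-dotenv's real entry point; the fallback logic and variable names here are illustrative):

```python
import os

try:
    # python-dotenv reads key=value pairs from .env into os.environ
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # python-dotenv not installed; rely on variables exported in the shell

# Empty string if the token is missing, so callers can check and warn
hf_token = os.getenv("HF_TOKEN", "")
print("HF_TOKEN set:", bool(hf_token))
```

If the token is missing, HuggingFace embedding downloads will fail, so it is worth checking `hf_token` before building the vector store.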
- Run the app:

  ```bash
  streamlit run app.py
  ```

- Enter your Groq API key in the input box.
- (Optional) Enter a session ID to keep your chat history separate.
- Upload one or more PDF files using the file uploader.
- Ask questions about the content of your PDFs in the chat box.
- The assistant will answer using only the information from your uploaded PDFs, maintaining context from your previous questions.
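The per-session history behind this behavior is essentially an in-memory mapping from session ID to a list of messages. A minimal sketch of that idea (plain dictionaries here; the real app wires history into the LangChain chain, and these function names are illustrative):

```python
from collections import defaultdict

# One message list per session ID; everything lives in memory only
store = defaultdict(list)

def add_message(session_id, role, content):
    """Append a chat turn to the given session's history."""
    store[session_id].append({"role": role, "content": content})

def get_history(session_id):
    """Return all prior turns for a session (empty list for a new one)."""
    return store[session_id]

add_message("demo", "user", "What does chapter 2 cover?")
add_message("demo", "assistant", "Chapter 2 covers ...")
add_message("other", "user", "Hello")  # a different session ID stays separate

print(len(get_history("demo")))   # → 2
print(len(get_history("other")))  # → 1
```

Because the store is in memory, separate session IDs never see each other's turns, and everything is discarded when the app stops.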
```
.
├── app.py
├── requirements.txt
├── .env
└── README.md
```
Example `requirements.txt`:

```
streamlit
langchain
langchain-community
langchain-chroma
langchain-groq
langchain-huggingface
langchain-text-splitters
python-dotenv
```
- Your data and chat history are stored in memory and are not persisted after the app stops.
- Keep your API keys secure and never commit your `.env` file to a public repository.
MIT License
Built with ❤️ using Streamlit and LangChain.
