An intelligent containerized document question-answering system powered by LangChain
Transform your documents into an interactive knowledge base! This application lets you upload PDF or text documents and ask questions about their content, with answers generated by Claude Sonnet on AWS Bedrock. Everything runs in a Docker container.
graph TD
A[Document Upload] --> B[Document Processor]
B --> C[Text Splitting]
C --> D[Bedrock Embeddings]
D --> E[ChromaDB Vector Store]
E --> F[Retrieval QA System]
F --> G[Claude Sonnet Response]
G --> H[Streamlit UI]
- Build Docker Image:
  docker build -t pdf-chat .
- Run Docker Container:
  docker compose up
- Open your browser and navigate to:
  http://localhost:8501
The system uses:
- LLM: us.anthropic.claude-sonnet-4-20250514-v1:0
- Embeddings: amazon.titan-embed-text-v1
- Vector Store: ChromaDB with persistent storage
- Click "Upload a document"
- Select your PDF or TXT file
- Wait for processing to complete
- Type your question in the text input
- Get comprehensive answers (250+ words)
- View source context and references
pdf-chat/
├── main.py              # Streamlit application entry point
├── qa_system.py         # Q&A system implementation
├── document_loader.py   # Document processing utilities
├── requirements.txt     # Python dependencies
├── refdocs/             # Uploaded documents storage
└── chroma_db/           # Vector database storage
- Loading: PyPDFLoader for PDFs, TextLoader for text files
- Chunking: RecursiveCharacterTextSplitter
- Embedding: AWS Bedrock Titan embeddings
- Storage: ChromaDB vector database with persistence
- Retrieval: Similarity search with top-k=3 results
- Generation: Claude Sonnet with custom prompt template
- Output: Detailed 250+ word responses with context
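The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `chunk_text`, `embed`, `cosine`, and `top_k` are hypothetical names, the fixed-width splitter is a simplification of RecursiveCharacterTextSplitter, and the bag-of-words `embed` is only a stand-in for the Bedrock Titan embeddings so the example runs without AWS credentials.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores
    separators and only shows how size and overlap interact.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: replaces the Titan embedding call)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    document = "Dogs bark loudly at night. Cats purr and sleep. Stocks fell today."
    pieces = chunk_text(document, chunk_size=30, chunk_overlap=10)
    for piece in top_k("dogs bark", pieces, k=3):
        print(repr(piece))
```

In the real pipeline the retrieved chunks would be passed, together with the question, into the prompt sent to Claude Sonnet; the sketch stops at retrieval because the prompt template itself is not shown here.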
# In document_loader.py
self.text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,    # Increase for larger chunks
    chunk_overlap=300   # Overlap between consecutive chunks
)

# In qa_system.py
self.llm = ChatBedrock(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    model_kwargs={
        "max_tokens": 2000,   # Increase for longer responses
        "temperature": 0.3,   # Adjust creativity (0-1)
        "top_p": 0.9
    }
)

This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain - For the amazing framework
- AWS Bedrock - For powerful AI models
- Streamlit - For the UI framework
- ChromaDB - For efficient vector storage
- Docker - For containerization