📚 LangChain Document Q&A System

An intelligent containerized document question-answering system powered by LangChain

Overview

Turn your documents into an interactive knowledge base! Upload a PDF or text document and ask questions about its content; answers are generated by Claude Sonnet on AWS Bedrock. Everything runs in a Docker container.

Architecture

```mermaid
graph TD
    A[Document Upload] --> B[Document Processor]
    B --> C[Text Splitting]
    C --> D[Bedrock Embeddings]
    D --> E[ChromaDB Vector Store]
    E --> F[Retrieval QA System]
    F --> G[Claude Sonnet Response]
    G --> H[Streamlit UI]
```

📦 Installation

Build Docker Image
```
docker build -t pdf-chat .
```
Run Docker Container
```
docker compose up
```
Open your browser and navigate to http://localhost:8501.

Configuration

Model Configuration

The system uses:

LLM: us.anthropic.claude-sonnet-4-20250514-v1:0
Embeddings: amazon.titan-embed-text-v1
Vector Store: ChromaDB with persistent storage

Usage

1. Upload Document 📄

Click "Upload a document"
Select your PDF or TXT file
Wait for processing to complete

2. Ask Questions 💭

Type your question in the text input
Get comprehensive answers (250+ words)
View source context and references

Project Structure

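```
pdf-chat/
├── main.py              # Streamlit application entry point
├── qa_system.py         # Q&A system implementation
├── document_loader.py   # Document processing utilities
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container image definition
├── docker-compose.yml   # Docker Compose service definition
├── .envexample          # Example environment file
├── LICENSE
├── README.md
├── refdocs/             # Uploaded documents storage
└── chroma_db/           # Vector database storage
```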

Technical Details

Document Processing Pipeline

Loading: PyPDFLoader for PDFs, TextLoader for text files
Chunking: RecursiveCharacterTextSplitter
Embedding: AWS Bedrock Titan embeddings
Storage: ChromaDB vector database with persistence
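The loader used in the first step depends only on the uploaded file's type. A minimal sketch of that dispatch, assuming the app decides by file extension alone; `choose_loader` and `LOADERS_BY_EXTENSION` are hypothetical names, and the sketch returns class names as strings so it runs without LangChain installed:

```python
from pathlib import Path

# Hypothetical extension-to-loader table; names match the LangChain loaders listed above.
LOADERS_BY_EXTENSION = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
}


def choose_loader(filename: str) -> str:
    """Return the loader class name for an uploaded file, rejecting unsupported types."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "Report.PDF" -> ".pdf"
    if suffix not in LOADERS_BY_EXTENSION:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return LOADERS_BY_EXTENSION[suffix]


print(choose_loader("Report.PDF"))  # PyPDFLoader
print(choose_loader("notes.txt"))   # TextLoader
```

In a real module, the returned name would correspond to constructing `PyPDFLoader(path)` or `TextLoader(path)` from `langchain_community.document_loaders` and calling `.load()` on it.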

Q&A System

Retrieval: Similarity search returning the top 3 chunks (k=3)
Generation: Claude Sonnet with custom prompt template
Output: Detailed 250+ word responses with context
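Retrieval ranks the stored chunk embeddings by similarity to the question's embedding and passes the best three to the model. The sketch below shows that idea in plain Python, assuming cosine similarity (ChromaDB's actual distance metric depends on how the collection is configured) and using tiny hand-written vectors in place of Titan embeddings. `top_k`, `build_prompt`, and the prompt wording are hypothetical, not the project's actual template:

```python
import math
from typing import List, Sequence, Tuple

# Each stored item pairs a chunk of text with its embedding vector.
Store = List[Tuple[str, Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], store: Store, k: int = 3) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical prompt wording for illustration only.
PROMPT_TEMPLATE = (
    "Use the context below to answer the question in at least 250 words.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, chunks: List[str]) -> str:
    """Join the retrieved chunks and fill in the template."""
    return PROMPT_TEMPLATE.format(context="\n\n".join(chunks), question=question)


store: Store = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [0.7, 0.7]),
    ("chunk D", [-1.0, 0.0]),
]
retrieved = top_k([1.0, 0.1], store, k=3)
print(retrieved)  # ['chunk A', 'chunk C', 'chunk B']
print(build_prompt("What is chunk A about?", retrieved))
```

Only the three highest-scoring chunks reach the prompt, so `chunk D`, which points the opposite way from the query, is never sent to the model.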

Customization

Modify Chunk Size

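```python
# In document_loader.py
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Inside the document processor class:
self.text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,    # Maximum chunk length in characters; increase for larger chunks
    chunk_overlap=300,  # Characters repeated between consecutive chunks
)
```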
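To see how the two settings interact, it helps to look at a plain fixed-width character window. This is a simplification, not `RecursiveCharacterTextSplitter`, which first tries to break on paragraph, line, and word boundaries, so its real chunk counts will differ somewhat; the window version still gives a rough feel for the arithmetic, since each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. `sliding_window_chunks` and `expected_chunk_count` are hypothetical helpers:

```python
import math
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 1500, chunk_overlap: int = 300) -> List[str]:
    """Split text into fixed-width character windows that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive chunks
    chunks: List[str] = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


def expected_chunk_count(n_chars: int, chunk_size: int = 1500, chunk_overlap: int = 300) -> int:
    """Closed form for len(sliding_window_chunks(...)) on a text of n_chars characters."""
    if n_chars <= chunk_size:
        return 1
    return math.ceil((n_chars - chunk_overlap) / (chunk_size - chunk_overlap))


doc = "x" * 10_000
print(len(sliding_window_chunks(doc)))             # 9
print(expected_chunk_count(10_000))                # 9
print(len(sliding_window_chunks(doc, 3000, 300)))  # 4
```

With the defaults above, a 10,000-character text yields 9 windows; raising `chunk_size` to 3000 with the same overlap yields 4, so each retrieved result carries more context but there are fewer of them to choose from.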

Adjust Model Parameters

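```python
# In qa_system.py
from langchain_aws import ChatBedrock

# Inside the Q&A system class:
self.llm = ChatBedrock(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    model_kwargs={
        "max_tokens": 2000,   # Maximum tokens in the response; increase for longer answers
        "temperature": 0.3,   # Sampling randomness (0-1); lower is more deterministic
        "top_p": 0.9,         # Nucleus sampling: draw only from the top 90% of probability mass
    },
)
```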
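`temperature` and `top_p` both shape how each next token is sampled. Bedrock applies them server-side; the sketch below is the textbook formulation (a temperature-scaled softmax followed by nucleus filtering) over a made-up three-token distribution, not Anthropic's implementation. All function names are hypothetical:

```python
import math
import random
from typing import Dict, Optional


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert raw scores to probabilities with a temperature-scaled softmax."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Temperature 0 degenerates to greedy decoding: all mass on the top token.
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_token(logits: Dict[str, float], temperature: float = 0.3, top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> str:
    """Draw one token after temperature scaling and nucleus filtering."""
    rng = rng or random.Random()
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


logits = {"the": 2.0, "a": 1.0, "an": 0.0}
print(top_p_filter(apply_temperature(logits, 0.3), 0.9))  # only "the" survives
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))  # "the" and "a" survive
print(sample_token(logits, temperature=0.3, top_p=0.9))   # always "the"
```

With these logits, `temperature=0.3` puts about 96% of the probability on `"the"`, so `top_p=0.9` keeps only that token; at `temperature=1.0` the filter keeps both `"the"` and `"a"`, which is why higher temperatures produce more varied answers.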

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

LangChain - For the amazing framework
AWS Bedrock - For powerful AI models
Streamlit - For the UI framework
ChromaDB - For efficient vector storage
Docker - For containerization
