A Retrieval-Augmented Generation (RAG) application that processes PDF documents and answers questions using OpenAI's API.
- Docker installed
- OpenAI API key
For easier management, use the provided Makefile:
- Set your OpenAI API key: `export OPENAI_API_KEY="your-openai-api-key-here"`
- Build and run: `make quick-start`
- Or build separately: `make build`, then `make run`
- Run interactively: `make run-interactive`
If you want to run locally without Docker:
- `export OPENAI_API_KEY="your-openai-api-key-here"`
- `go run main.go`

After making code changes:
- `make build`
- `make run`

The application uses the following environment variables:
- `OPENAI_API_KEY`: Your OpenAI API key (required)
- PDF text extraction using an embedded PDF file
- Text chunking for optimal processing
- Vector embeddings using OpenAI's text-embedding-ada-002
- Semantic search with cosine similarity
- Question answering using GPT-4
- Missing API Key Error: `Missing OPENAI_API_KEY env var`. Solution: ensure your OpenAI API key is set as an environment variable.
- PDF Text Corruption: if you see "heavily corrupted text" messages, the PDF extraction may have issues. Check the debug output for text quality.
- Memory Issues: for large PDFs, you may need to increase Docker memory limits or optimize chunk sizes.
The application includes debug output that shows:
- Extracted text samples
- Number of chunks created
- Retrieved chunks for each query
Run `make help` to see all available commands:
- `build`: Build the Docker image
- `run`: Run the container with Docker
- `run-interactive`: Run the container interactively
- `run-detached`: Run the container in detached mode
- `dev`: Run the application locally (requires Go)
- `test`: Run tests
- `clean`: Remove the Docker image
- `quick-start`: Build and run the application quickly
PDF → Text Extraction → Chunking → Embeddings → Vector Store → Similarity Search → LLM → Answer
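The chunking step in the pipeline above can be sketched as a fixed-size splitter (the function name and chunk size are illustrative; real implementations often split on sentence boundaries and overlap adjacent chunks):

```go
package main

import "fmt"

// chunkText splits text into pieces of at most size runes each.
// Splitting on runes rather than bytes keeps multi-byte UTF-8
// characters intact.
func chunkText(text string, size int) []string {
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	chunks := chunkText("abcdefghij", 4)
	fmt.Println(len(chunks), chunks) // 3 [abcd efgh ij]
}
```

Each chunk is then embedded independently, so the chunk size trades off retrieval granularity against the number of embedding calls.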
MIT License