Intelligent document navigator using Generative AI on AWS to provide fast, accurate answers from a private knowledge base with API key authentication.
Experience real-time RAG technology with our live demonstration powered by AWS Bedrock and optimized vector search. Demo requires API key - contact @andres-fmc for access.
Organizations accumulate vast amounts of technical documentation, policy documents, and knowledge bases that become increasingly difficult to navigate as they grow. Finding specific information across hundreds of PDFs is time-consuming and often results in incomplete answers or missed critical details. Traditional search tools lack context understanding and fail to synthesize information from multiple sources.
A serverless RAG (Retrieval-Augmented Generation) system with API key protection that transforms static document collections into an intelligent knowledge navigator:
- Upload PDF documents to build your knowledge base
- Authenticate with secure API key
- Ask questions in natural language
- Receive contextual answers with source citations in seconds
The system understands context, synthesizes information from multiple sources, and provides accurate answers backed by specific document references while maintaining secure access control.
graph TB
A[📄 PDF Documents] -->|Index Building| B[Document Processor]
B --> C[Text Chunking]
C --> D[Amazon Titan Embeddings]
D --> E[Compressed Vector Index]
E -->|Store| F[S3 Bucket]
G[User Query] --> H[🔐 API Key Validation]
H --> I[API Gateway]
I --> J[Lambda Function]
J -->|Load Index| F
J -->|Generate Query Embedding| K[Bedrock - Titan]
J -->|Vector Search| L[Cosine Similarity]
L -->|Relevant Chunks| M[Context Assembly]
M --> N[Bedrock - Claude 3 Sonnet]
N -->|Contextual Answer| O[Web Interface]
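The query path in the diagram (query embedding, cosine similarity, relevant chunks) can be illustrated with a minimal pure-Python sketch. The function names `cosine_similarity` and `top_k_chunks`, the index layout (a list of dicts with `text`, `source`, and `embedding` keys), and the default `k=5` are assumptions made for illustration, not the actual contents of lambda_function/app.py. Real Titan embeddings would have 1536 dimensions rather than the 3 used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, index, k=5):
    """Return the k index entries most similar to the query embedding."""
    scored = [(cosine_similarity(query_embedding, e["embedding"]), e) for e in index]
    # Sort on the score only, highest first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]


# Toy index with 3-dimensional embeddings (hypothetical data)
index = [
    {"text": "RAG combines retrieval with generation.", "source": "a.pdf", "embedding": [1.0, 0.0, 0.0]},
    {"text": "Lambda scales automatically.", "source": "b.pdf", "embedding": [0.0, 1.0, 0.0]},
    {"text": "Retrieval grounds answers in sources.", "source": "a.pdf", "embedding": [0.9, 0.1, 0.0]},
]
best = top_k_chunks([1.0, 0.0, 0.0], index, k=2)
print([chunk["text"] for chunk in best])
```

A brute-force scan like this is linear in the number of chunks, which is usually acceptable for an index of a few thousand entries.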
| Service | Purpose | Configuration |
|---|---|---|
| S3 | Vector index storage | Compressed JSON (32MB vs 99MB FAISS) |
| Lambda | Query processing | Python 3.12, 1GB RAM, 60s timeout |
| API Gateway | REST endpoint | CORS enabled, API key validation |
| Bedrock Titan | Text embeddings | amazon.titan-embed-text-v1 (1536 dims) |
| Bedrock Claude | Answer generation | Claude 3 Sonnet for context synthesis |
| Authentication | API key protection | Environment variable validation |
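The answer-generation step in the table relies on assembling the retrieved chunks into a prompt that keeps track of their source documents. The following is a rough sketch of that idea, not the project's actual prompt. The function name `build_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical.

```python
def build_prompt(question, chunks, max_chars=6000):
    """Join retrieved chunks into a cited context block followed by the question.

    Returns the prompt text and the de-duplicated list of source documents.
    """
    parts = []
    sources = []
    used = 0
    for chunk in chunks:
        block = f"[Source: {chunk['source']}]\n{chunk['text']}"
        if used + len(block) > max_chars:
            break  # stop before exceeding the context budget
        parts.append(block)
        used += len(block)
        if chunk["source"] not in sources:
            sources.append(chunk["source"])  # first-seen order, no duplicates
    context = "\n\n".join(parts)
    prompt = (
        "Answer the question using only the context below and cite the source documents.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return prompt, sources


# Hypothetical retrieved chunks
chunks = [
    {"text": "RAG grounds answers in retrieved documents.", "source": "rag-paper.pdf"},
    {"text": "Bedrock exposes foundation models through an API.", "source": "bedrock-guide.pdf"},
    {"text": "Retrieval reduces hallucinations.", "source": "rag-paper.pdf"},
]
prompt, sources = build_prompt("What is RAG?", chunks)
print(sources)
```

The returned `sources` list corresponds to the `sources` field shown in the API response example.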
- Lightning Fast: <5 second response times with optimized vector search
- Secure Access: API key authentication with client-side rate limiting
- Cost Effective: Index is about 68% smaller than the equivalent FAISS index (32MB vs 99MB)
- Accurate Citations: Every answer includes source document references
- Scalable: Processes 8,211 document chunks efficiently
- Protected: Private knowledge base with controlled access
- EU Compliant: Deployed in eu-central-1 region
- AWS Account with Bedrock access
- Python 3.12+
- AWS CLI configured
- PDF documents for your knowledge base
git clone https://github.com/andresafmc/rag-documentation-navigator.git
cd rag-documentation-navigator
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Create .env file:
S3_BUCKET_NAME=your-index-bucket-name
# Create data directory and add your PDFs
mkdir data
# Copy your PDF files to ./data/
cp /path/to/your/documents/*.pdf ./data/
python build_index.py
This process will:
- Extract text from all PDFs in ./data/
- Split content into optimized chunks (1000 chars, 100 overlap)
- Generate embeddings using Amazon Titan
- Create compressed vector index
- Upload to S3 (from ~99MB FAISS to ~32MB compressed JSON)
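The chunking and compression steps above can be sketched with the standard library alone. This is a simplified illustration, not the code in build_index.py. It uses a plain fixed-size sliding window rather than the recursive splitter configured in the project, and it assumes gzip as the compression scheme, since the README only says "compressed JSON". The names `chunk_text`, `pack_index`, and `unpack_index` are hypothetical.

```python
import gzip
import json


def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into windows of chunk_size characters.

    Each window starts `overlap` characters before the previous one ends,
    so neighbouring chunks share that many characters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def pack_index(entries):
    """Serialize index entries to gzip-compressed JSON bytes (assumed format)."""
    return gzip.compress(json.dumps(entries).encode("utf-8"))


def unpack_index(blob):
    """Inverse of pack_index."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])

entries = [{"text": p, "source": "example.pdf", "embedding": [0.1, 0.2]} for p in pieces]
blob = pack_index(entries)
print(len(blob), "compressed bytes")
```

In the real pipeline each entry's `embedding` would come from Amazon Titan, and the packed bytes would be uploaded to the index bucket.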
Create S3 buckets:
- Index storage: your-index-bucket-name
- Frontend hosting: your-frontend-bucket-name
IAM policy for the Lambda execution role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::your-index-bucket-name/*"
},
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": "*"
}
]
}
Create Lambda Function:
- Go to AWS Lambda Console
- Create function RAG-Doc-Navigator-Clean
- Set runtime to Python 3.12
- Configure environment variables:
  - S3_BUCKET_NAME: your-index-bucket-name
  - VALID_API_KEY: your-chosen-api-key
- Upload function code from lambda_function/app.py
Create API Gateway:
- Create REST API named rag-documentation-api
- Create resource /query
- Add POST method with Lambda proxy integration
- Enable CORS
- Deploy to prod stage
# Update frontend with your API endpoint
aws s3 cp frontend/index.html s3://your-frontend-bucket-name/ --region eu-central-1
All API requests require a valid API key in the header:
x-api-key: your-api-key-here
POST https://your-api-id.execute-api.eu-central-1.amazonaws.com/prod/query
{
"question": "What are the main benefits of RAG architecture?"
}
{
"answer": "RAG architecture provides several key benefits: 1) Combines retrieval with generation for factual accuracy, 2) Reduces hallucinations by grounding responses in source documents, 3) Enables dynamic knowledge updates without retraining...",
"sources": ["rag-paper-neurips.pdf", "aws-bedrock-guide.pdf"],
"chunks_used": 5,
"model_used": "Claude 3 Sonnet (Clean)",
"status": "success"
}
curl -X POST \
-H "Content-Type: application/json" \
-H "x-api-key: your-api-key" \
-d '{"question": "How does vector similarity search work in RAG systems?"}' \
https://your-api-url/prod/query
Unauthorized (401):
{
"error": "Unauthorized",
"message": "Valid API key required. Contact @andres-fmc for access."
}
Rate Limited: The frontend implements client-side rate limiting (5 requests per session) to prevent abuse.
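The 401 behaviour described above could be implemented along the following lines. This is a minimal sketch assuming an API Gateway Lambda proxy event, not the actual lambda_function/app.py. The retrieval and generation steps are omitted, and the success body is a placeholder rather than the full response format.

```python
import hmac
import json
import os

JSON_HEADERS = {"Content-Type": "application/json"}


def lambda_handler(event, context):
    """Reject requests whose x-api-key header does not match VALID_API_KEY."""
    expected = os.environ.get("VALID_API_KEY", "")
    # Proxy events may deliver header names in any casing, so normalize them
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    provided = headers.get("x-api-key") or ""

    # Constant-time comparison; an unset VALID_API_KEY rejects everything
    if not expected or not hmac.compare_digest(provided.encode(), expected.encode()):
        return {
            "statusCode": 401,
            "headers": JSON_HEADERS,
            "body": json.dumps({
                "error": "Unauthorized",
                "message": "Valid API key required. Contact @andres-fmc for access.",
            }),
        }

    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")
    # Retrieval and answer generation would happen here (omitted in this sketch)
    return {
        "statusCode": 200,
        "headers": JSON_HEADERS,
        "body": json.dumps({"status": "success", "question": question}),
    }


# Local smoke test with a made-up key
os.environ["VALID_API_KEY"] = "example-key"
event = {"headers": {"X-Api-Key": "example-key"}, "body": json.dumps({"question": "What is RAG?"})}
print(lambda_handler(event, None)["statusCode"])
```

Because the check runs inside the function, a request without a valid key still triggers a Lambda invocation. The check only prevents the more expensive Bedrock calls.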
rag-documentation-navigator/
├── frontend/
│ └── index.html # Protected web interface
├── lambda_function/
│ └── app.py # Main Lambda function with auth
├── data/ # PDF documents (not in repo)
├── local_index/ # Generated index files
├── build_index.py # Index creation script
├── requirements.txt # Python dependencies
├── .env # Environment variables (not in repo)
├── LICENSE # MIT License
└── README.md # This file
Set API Key in Lambda:
- Go to Lambda function configuration
- Environment variables section
- Add VALID_API_KEY with your chosen key
Frontend Rate Limiting:
// In frontend/index.html
const MAX_REQUESTS_PER_SESSION = 5;
# In build_index.py
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, # Optimal for Claude 3 context
    chunk_overlap=100, # Maintains continuity
    separators=["\n\n", "\n", ".", " ", ""]
)
S3_BUCKET_NAME=your-index-bucket-name
VALID_API_KEY=your-secure-api-key
| Metric | Value | Monitoring |
|---|---|---|
| Average response time | <5 seconds | Real-time tracking |
| Vector search accuracy | >95% relevance | Manual validation |
| Index compression ratio | 68% smaller than FAISS | Storage optimization |
| Concurrent queries | 1000 simultaneous | AWS Lambda scaling |
| Cost per query | $0.002-0.004 | Live cost tracking |
| Cold start time | <3 seconds | CloudWatch metrics |
- API Key Validation: All requests authenticated
- Rate Limiting: 5 queries per session (frontend)
- CORS Protection: Configured for specific origins
- Error Handling: No sensitive information in error messages
- Base cost: $0.002-0.004/query
- Breakdown:
- Embeddings: $0.0003/query (15%)
- Claude 3 inference: $0.002-0.003/query (80%)
- Infrastructure: <$0.0001/query (5%)
Examples:
- 100 queries/day: ~$0.30/day
- 1,000 queries/month: ~$3.00/month
- Enterprise usage (10k queries): ~$30/month
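The example figures above follow from multiplying the query volume by the per-query range of $0.002-0.004. The short sketch below reproduces that arithmetic. The function name `estimate_cost` is hypothetical, and the midpoint is only a convenient summary of the range, not a measured average.

```python
def estimate_cost(queries, low_per_query=0.002, high_per_query=0.004):
    """Return (low, high, midpoint) cost in USD for a number of queries."""
    low = queries * low_per_query
    high = queries * high_per_query
    return low, high, (low + high) / 2


for label, count in [("100 queries", 100), ("1,000 queries", 1_000), ("10,000 queries", 10_000)]:
    low, high, mid = estimate_cost(count)
    print(f"{label}: ${low:.2f}-${high:.2f} (midpoint ${mid:.2f})")
```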
# Ensure PDFs are in ./data/ directory
python build_index.py
# Test without API key (should fail)
curl -X POST https://your-api-url/prod/query \
-H 'Content-Type: application/json' \
-d '{"question": "test"}'
# Test with valid API key (should succeed)
curl -X POST https://your-api-url/prod/query \
-H 'Content-Type: application/json' \
-H 'x-api-key: your-api-key' \
-d '{"question": "What is RAG?"}'
- 401 Unauthorized: Check API key is correctly set in Lambda environment variables
- Invalid API key: Verify the key matches exactly (case-sensitive)
- Missing header: Ensure x-api-key header is included in requests
- Error: "No documents found": Verify PDFs are in ./data/ directory
- Embedding failures: Check AWS credentials and Bedrock access
- Memory errors: Reduce chunk_size or process fewer documents
- Import errors: Verify all dependencies in requirements.txt are installed
- Permission denied: Check IAM role has S3 and Bedrock permissions
- Timeout errors: Current configuration uses 60s timeout and 1GB RAM; raise these limits in the Lambda configuration if queries still time out
- Slow responses: Check index is properly cached in Lambda
- Poor relevance: Adjust chunk size or overlap parameters
- CORS errors: Verify API Gateway CORS configuration
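For the "slow responses" item, a common Lambda pattern is to keep the parsed index in a module-level variable so that warm invocations skip the S3 download. The sketch below assumes a gzip-compressed JSON index and takes the download step as a callable, so it can run without AWS access. The names `get_index` and `fake_loader` are hypothetical. In a real function the loader would typically wrap a boto3 `get_object` call on the index bucket.

```python
import gzip
import json

# Module-level state survives between invocations in a warm execution environment
_INDEX_CACHE = None


def get_index(load_bytes):
    """Return the parsed index, calling load_bytes() only on the first use."""
    global _INDEX_CACHE
    if _INDEX_CACHE is None:
        _INDEX_CACHE = json.loads(gzip.decompress(load_bytes()))
    return _INDEX_CACHE


# Stand-in for the S3 download, counting how often it runs
calls = {"n": 0}


def fake_loader():
    calls["n"] += 1
    payload = [{"text": "hello", "source": "a.pdf", "embedding": [0.1, 0.2]}]
    return gzip.compress(json.dumps(payload).encode("utf-8"))


first = get_index(fake_loader)
second = get_index(fake_loader)
print("loader calls:", calls["n"])
```

If responses are slow on every request rather than only after a cold start, it is worth checking whether the index is being reloaded inside the handler on each invocation.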
- Use strong, unique API keys
- Rotate keys regularly
- Monitor usage through CloudWatch logs
- Implement different keys for different use cases
- Configure API Gateway with proper CORS settings
- Use IAM roles with minimal required permissions
- Monitor failed authentication attempts
- Implement IP whitelisting if needed
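The first few practices (strong keys, rotation, separate keys per use case) can be sketched with Python's `secrets` and `hmac` modules. This is an illustration under assumptions, not part of the project. The helper names and the idea of a label-to-key mapping are hypothetical, and the current Lambda configuration holds a single VALID_API_KEY.

```python
import hmac
import secrets


def generate_api_key(label):
    """Create a random URL-safe key tagged with a human-readable label."""
    return f"{label}-{secrets.token_urlsafe(32)}"


def identify_key(provided, active_keys):
    """Return the label of the matching key, or None if nothing matches.

    active_keys maps a label (for example "demo") to its key string.
    """
    match = None
    for label, key in active_keys.items():
        if hmac.compare_digest(provided.encode("utf-8"), key.encode("utf-8")):
            match = label  # keep scanning so timing does not reveal the position
    return match


active_keys = {
    "demo": generate_api_key("demo"),
    "ci": generate_api_key("ci"),
}
print(identify_key(active_keys["ci"], active_keys))

# Rotation: replace one key without touching the others
old_demo_key = active_keys["demo"]
active_keys["demo"] = generate_api_key("demo")
print(identify_key(old_demo_key, active_keys))
```

Returning the label rather than a boolean makes it possible to log which key was used, which supports the usage monitoring mentioned above.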
# Add PDFs to data directory
cp new-documents/*.pdf ./data/
# Rebuild index
python build_index.py
# Deploy updated function if needed
- PDF: Primary support with metadata extraction
- TXT: Plain text documents
- Future: DOCX, HTML support planned
- 8,211 document chunks indexed
- 32MB compressed index (vs 99MB traditional FAISS)
- Documents included: AWS Well-Architected Framework, RAG research papers, Amazon Bedrock documentation
This solution addresses critical enterprise challenges:
- Knowledge Discovery: Reduce document search time from hours to seconds
- Security: API key authentication prevents unauthorized access
- Compliance: Ensure answers are grounded in official documentation
- Cost Control: Rate limiting and authentication prevent abuse
- Scalability: Serverless architecture scales with demand
Why API Key Authentication?
- Simple to implement and manage
- Prevents unauthorized usage and cost escalation
- Allows usage tracking and analytics
- Portfolio demonstration with controlled access
Why Custom Vector Search over Managed Solutions?
- 95% cost reduction compared to managed vector databases
- Full control over indexing and retrieval algorithms
- Optimized for AWS Lambda cold starts
- Simple deployment without additional infrastructure
Why Compressed JSON over FAISS?
- 68% smaller storage footprint
- Faster Lambda cold starts (3s vs 8s)
- Easier debugging and inspection
- Native JSON parsing performance
Why Claude 3 Sonnet over GPT?
- Superior context synthesis from multiple sources
- Better instruction following for structured responses
- Reduced hallucination rates with source grounding
- Cost-effective for enterprise usage patterns
Contributions welcome! Please:
- Fork the project
- Create feature branch (git checkout -b feature/NewCapability)
- Add comprehensive tests
- Update documentation
- Submit pull request
- Follow AWS Well-Architected principles
- Maintain <5s response time targets
- Document all configuration changes
- Include cost impact analysis
- Test authentication flows
For live demonstration access:
- Contact @andres-fmc on LinkedIn
- Temporary API keys available for evaluation
- Full documentation and setup guidance provided
⭐ If you find this project useful, please consider giving it a star on GitHub!