Problem: The current AIWB can't handle codebases larger than ~50K lines of code.

Solution: Add RAG (Retrieval-Augmented Generation) with vector embeddings.

```bash
pip3 install sentence-transformers chromadb
```

Copy the implementation from `CONTEXT_LIMITATION_SOLUTION_ANALYSIS.md`, section 4.
```bash
cd /path/to/large/project
aiwb rag-index
# Takes 5-10 minutes for 100K LOC, one-time operation

aiwb rq "How does authentication work?"
# Returns answer in < 1 second
# Cost: $0.0004 per query
```

```
User Query
    │
    ▼
Vector Search (local, free, <100ms)
    │
    ▼
Find Top 5 Relevant Code Chunks (~2K tokens)
    │
    ▼
Send to Gemini 2.5 Flash ($0.0004)
    │
    ▼
Answer
```
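The flow in the diagram is short in code as well. Below is a dependency-free sketch of the retrieve-then-prompt step; the hash-based `toy_embed` is a stand-in for the real sentence-transformers model, and the chunk strings are made up for illustration:

```python
import hashlib
import math
import re

def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for a real embedding model: hash each word into a
    fixed-size bag-of-words vector, then L2-normalize. AIWB would use
    sentence-transformers (all-MiniLM-L6-v2); this keeps the sketch
    dependency-free."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9_]+", text.lower()):
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def top_k_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Vector-search step of the diagram: rank indexed chunks by cosine
    similarity to the query and keep the top k."""
    q = toy_embed(query)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]

# Index once, then each query ships only the top chunks to the LLM.
chunks = [
    "def login(user, password): check credentials  # authentication entry point",
    "def render_sidebar(): draw the nav bar",
    "class SessionStore: keeps authenticated sessions",
    "def parse_config(path): read settings",
]
context = top_k_chunks("How does authentication work? login password", chunks, k=2)
prompt = "Answer using only this code:\n\n" + "\n".join(context)
# `prompt` (a few KB) is what goes to Gemini 2.5 Flash, never the whole codebase.
```

This is why cost stays flat: the prompt size is bounded by `k`, not by the codebase.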
| Codebase Size | Setup Time | Query Time | Cost/Query |
|---|---|---|---|
| 10K LOC | 1 min | 0.5 sec | $0.0004 |
| 100K LOC | 10 min | 1 sec | $0.0004 |
| 1M LOC | 2 hours | 1 sec | $0.0004 |
Key insight: after the one-time indexing pass, query cost and time are constant regardless of codebase size!
Without RAG (send the whole codebase):
- 100K LOC ≈ 125K tokens
- Single query to Claude 3.5 Sonnet
- Cost: $0.375 (input) + $0.03 (output) ≈ $0.41
- Result: expensive on every query, and fails outright once the codebase outgrows the model's context window
With RAG:
- 100K LOC indexed once (free, local embeddings)
- Each query retrieves ~5 chunks (~2.5K tokens)
- Cost: $0.00025 (input) + $0.00015 (output) = $0.0004
- Result: ✓ Works perfectly
Savings: ~1,000× cheaper per query
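The ~1,000× figure follows directly from the per-query numbers above; a quick sanity check using the document's own prices:

```python
# Without RAG: ship the whole 125K-token codebase to Claude 3.5 Sonnet.
full_context_cost = 0.375 + 0.03        # input + output, per the figures above

# With RAG: ship only ~2.5K retrieved tokens to Gemini 2.5 Flash.
rag_query_cost = 0.00025 + 0.00015      # input + output, per the figures above

savings = full_context_cost / rag_query_cost   # ~1,000x cheaper per query
```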
✓ Use RAG for:
- Large codebases (> 10K LOC)
- Frequent queries on same codebase
- Interactive exploration ("How does X work?")
- Code search + understanding
✗ Don't use RAG for:
- Small scripts (< 1K LOC) - just send the whole thing
- One-off analysis - not worth indexing
- Real-time code generation - use current modes
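For the large-codebase case, the one-time index starts by splitting each source file into overlapping line windows before embedding. A minimal sketch (the 40-line window and 10-line overlap are illustrative assumptions, not AIWB's actual parameters):

```python
def chunk_lines(source: str, window: int = 40, overlap: int = 10) -> list[str]:
    """Split source code into overlapping line windows. The overlap keeps a
    function that straddles a chunk boundary retrievable from at least one
    chunk."""
    lines = source.splitlines()
    step = window - overlap
    chunks = []
    for start in range(0, max(len(lines), 1), step):
        chunk = "\n".join(lines[start:start + window])
        if chunk.strip():
            chunks.append(chunk)
        if start + window >= len(lines):
            break
    return chunks
```

A 100-line file yields three chunks here (lines 0-39, 30-69, 60-99), which matches the "~10K chunks per 100K LOC" ballpark only at much smaller windows; real chunkers often split on function boundaries instead.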
- Add `lib/rag.sh` with the functions above
- Update the `aiwb` main dispatcher with the new commands
- Add `setup_rag()` to the initial setup wizard
- Document in README.md
- Add to the `/help` command output
- Create unit tests

Estimated implementation time: 1 week
Estimated value: enables AIWB to handle enterprise-scale projects
```bash
# Combine RAG with existing modes

# Mode 1: RAG-enhanced /make
aiwb make --with-rag
# Automatically includes relevant context from the entire codebase

# Mode 2: RAG-enhanced /debug
aiwb debug --with-rag error.log
# Finds similar code and known issues

# Mode 3: RAG + Parallel
aiwb parallel-analyze ./src --use-rag
# Processes in parallel, stores results in the vector DB
```

Q: Do I need to re-index after every code change? A: Only for major changes. Minor edits don't affect most queries.
Q: Can I use this without internet? A: Embeddings are local. Only the final LLM call needs internet.
Q: How much disk space does it use? A: ~300 bytes per code chunk. 100K LOC ≈ 10K chunks ≈ 3MB.
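That estimate is easy to verify (the ~10 lines-per-chunk figure is implied by "100K LOC ≈ 10K chunks"). Note that it covers chunk text/metadata only; the stored vectors add more, since a 384-dim float32 embedding is 1,536 bytes per chunk:

```python
loc = 100_000
lines_per_chunk = 10      # implied by "100K LOC ≈ 10K chunks"
bytes_per_chunk = 300     # the FAQ's per-chunk figure

n_chunks = loc // lines_per_chunk
total_mb = n_chunks * bytes_per_chunk / 1_000_000      # 3.0 MB

# Embedding vectors on top: 384 dims * 4 bytes each, per chunk.
embedding_mb = n_chunks * 384 * 4 / 1_000_000
```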
Q: What if my query needs the FULL codebase? A: Fall back to hierarchical summarization or Map-Reduce (see main doc).
Q: Can I use different embedding models? A: Yes! Try:
- `all-MiniLM-L6-v2`: 384-dim, 80 MB, fast (default)
- `all-mpnet-base-v2`: 768-dim, 420 MB, more accurate
- `text-embedding-ada-002`: OpenAI API (costs money)
```bash
# Clone a large open-source project
git clone https://github.com/torvalds/linux.git
cd linux

# Initialize AIWB workspace
aiwb init

# Enable RAG (one-time)
aiwb rag-setup

# Index the kernel (tens of millions of LOC; expect the run to take many hours)
aiwb rag-index

# Now you can query instantly:
aiwb rq "How does the process scheduler work?"
aiwb rq "Find all uses of spinlocks"
aiwb rq "Explain memory management in kernel/mm/"
aiwb rq "What are common security vulnerabilities?"

# All queries: < 1 second, < $0.001 each
# Total cost for 1000 queries: < $1
```

- Read full analysis: `CONTEXT_LIMITATION_SOLUTION_ANALYSIS.md`
- Review architecture diagrams: `CONTEXT_FLOW_DIAGRAM.md`
- Implement RAG in your AIWB fork
- Test on your largest codebase
- Share results!
Bottom line: RAG makes AIWB scale from toy projects to enterprise systems. It's not optional for v2.1+.