Hi! I have been exploring the Sugar-AI RAG pipeline and noticed that the current implementation retrieves only a single document as context for the language model.
When that top hit is not the most relevant chunk, answer accuracy can suffer.
I experimented with a small modification that improves retrieval quality while keeping the architecture lightweight.
Proposed improvement:
Query
→ Vector Search (Top 5)
→ Neural Reranker
→ Top 2 Documents
→ LLM
Implementation details:
- Retrieve the top-5 documents from FAISS
- Rerank them with the bge-reranker-base cross-encoder
- Select the top-2 reranked documents as context
- Upgrade the embedding model to bge-small-en-v1.5
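
To make the proposed flow concrete, here is a minimal sketch of the retrieve → rerank → select steps. It uses pure-NumPy stand-ins so it runs without model downloads: the random vectors stand in for bge-small-en-v1.5 embeddings, the brute-force cosine search for a FAISS index lookup, and the term-overlap score for the bge-reranker-base cross-encoder (the document texts and helper names below are illustrative, not from the Sugar-AI codebase).

```python
import numpy as np

def vector_search(query_vec, doc_vecs, top_k=5):
    """Return indices of the top_k documents by cosine similarity.
    In the real pipeline this would be a FAISS index search."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1][:top_k]

def rerank(query, candidates, top_n=2):
    """Keep the top_n candidates. Term overlap stands in for the
    score a cross-encoder reranker would produce per (query, doc) pair."""
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(c.lower().split())) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_n]]

# Toy corpus; in Sugar-AI these would be the indexed documentation chunks.
docs = [
    "Sugar activities are installed from the activity library.",
    "The journal stores a history of everything a learner does.",
    "Install a Sugar activity by downloading its .xo bundle.",
    "Turtle Blocks teaches programming with visual snap-together blocks.",
    "Sugar runs on most Linux distributions and on Raspberry Pi.",
]
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(len(docs), 8))           # stand-in embeddings
query_vec = doc_vecs[2] + 0.05 * rng.normal(size=8)  # query close to doc 2

candidates = [docs[i] for i in vector_search(query_vec, doc_vecs, top_k=5)]
context = rerank("how do I install a sugar activity", candidates, top_n=2)
print(context)  # the two chunks that would be passed to the LLM
```

The real version would swap `rerank` for something like scoring `(query, doc)` pairs with `sentence_transformers.CrossEncoder("BAAI/bge-reranker-base")` and sorting by that score; the surrounding retrieve-then-select shape stays the same.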
Benefits:
- Improved retrieval accuracy (the reranker can promote a relevant chunk that vector search ranked below the top spot)
- Reduced hallucination from off-topic context
- No additional API costs (runs locally)
- Minimal changes to existing architecture
I would love feedback from maintainers before opening a pull request.