While exploring the codebase for my 2026 application, I noticed that the `BM25Indexer` class in `rag/embedding/bm25_indexer.py` has no unit tests, even though the sparse retrieval pipeline depends on it.
Problem:
`rag/embedding/bm25_indexer.py` contains the `BM25Indexer` class, a critical component of the sparse retrieval pipeline. It currently has no unit tests, so regressions in the behaviors below would go unnoticed.
Untested Behaviors:
- `build()` stores retrievers correctly for valid configs
- `build()` skips configs when SparseRetriever is unavailable
- `_index_config()` returns None when SparseRetriever is None
- `_index_config()` handles indexing errors gracefully
- `get()` returns cached retriever if already in memory
- `get()` loads retriever from disk if not cached
- `get()` returns None when index not found or load fails
- `get()` returns None when SparseRetriever is unavailable
Proposed Solution:
Add `tests/unit/rag/embedding/test_bm25_indexer.py`, covering the behaviors above and following the existing test conventions.
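To give a rough idea of the shape these tests could take, here is a minimal sketch. It does not import the real class: `StandInBM25Indexer`, its constructor arguments (`retriever_cls`, `loader`), and its `retrievers` cache attribute are all hypothetical, written only to mirror the behaviors listed above. The real `BM25Indexer` signatures may differ, and the actual tests would patch `SparseRetriever` in the real module instead.

```python
from unittest.mock import MagicMock


class StandInBM25Indexer:
    """Hypothetical stand-in, NOT the real BM25Indexer; signatures are assumed."""

    def __init__(self, retriever_cls, loader=None):
        self.retriever_cls = retriever_cls  # None models "SparseRetriever unavailable"
        self.loader = loader                # assumed callable that loads an index from disk
        self.retrievers = {}                # assumed in-memory cache keyed by config name

    def _index_config(self, name, docs):
        if self.retriever_cls is None:
            return None
        try:
            retriever = self.retriever_cls()
            retriever.index(docs)
            return retriever
        except Exception:
            return None  # indexing errors are swallowed, not raised

    def build(self, configs):
        for name, docs in configs.items():
            retriever = self._index_config(name, docs)
            if retriever is not None:
                self.retrievers[name] = retriever

    def get(self, name):
        if self.retriever_cls is None:
            return None
        if name in self.retrievers:
            return self.retrievers[name]
        try:
            retriever = self.loader(name) if self.loader else None
        except Exception:
            return None  # index not found or load failed
        if retriever is not None:
            self.retrievers[name] = retriever
        return retriever


def test_build_stores_retriever_for_valid_config():
    retriever_cls = MagicMock()
    indexer = StandInBM25Indexer(retriever_cls)
    indexer.build({"docs": ["a b", "c d"]})
    assert indexer.retrievers["docs"] is retriever_cls.return_value
    retriever_cls.return_value.index.assert_called_once_with(["a b", "c d"])


def test_build_skips_when_sparse_retriever_unavailable():
    indexer = StandInBM25Indexer(None)
    indexer.build({"docs": ["a b"]})
    assert indexer.retrievers == {}


def test_index_config_handles_indexing_error():
    retriever_cls = MagicMock()
    retriever_cls.return_value.index.side_effect = RuntimeError("boom")
    indexer = StandInBM25Indexer(retriever_cls)
    assert indexer._index_config("docs", ["a"]) is None


def test_get_returns_cached_retriever_without_loading():
    loader = MagicMock()
    indexer = StandInBM25Indexer(MagicMock(), loader)
    cached = object()
    indexer.retrievers["docs"] = cached
    assert indexer.get("docs") is cached
    loader.assert_not_called()


def test_get_loads_from_disk_then_caches():
    loaded = object()
    loader = MagicMock(return_value=loaded)
    indexer = StandInBM25Indexer(MagicMock(), loader)
    assert indexer.get("docs") is loaded
    assert indexer.get("docs") is loaded  # second call is served from the cache
    loader.assert_called_once_with("docs")


def test_get_returns_none_when_load_fails():
    loader = MagicMock(side_effect=FileNotFoundError)
    indexer = StandInBM25Indexer(MagicMock(), loader)
    assert indexer.get("missing") is None
```

The test functions use plain `assert` statements, so pytest would collect them as-is. Using `MagicMock` in place of the retriever keeps the tests independent of whether the optional sparse retrieval dependency is installed.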
I would like to work on this issue.