Summary
Add first-class sparse embedding support (e.g. SPLADE) and refactor the current embedder design so it generalizes across embedding model families, with pooling as an explicit option instead of hardcoded behavior.
Context
The current high-level embedding path is tightly coupled to all-MiniLM assumptions:
- fixed dense output width and output semantics
- hardcoded post-processing pipeline (mean pooling + L2 normalization)
- model-specific behavior mixed into one implementation
To support sparse models (SPLADE) and additional dense variants cleanly, we need a more composable embedder abstraction.
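For reference, the hardcoded post-processing listed above (mean pooling followed by L2 normalization) can be sketched roughly as follows. This is an illustrative sketch, not the existing implementation: the function names `meanPool` and `l2Normalize` and the `[seqLen][hidden]` slice layout are assumptions made for the example.

```go
package main

import "math"

// meanPool averages token vectors over the positions where mask is non-zero.
// tokens is [seqLen][hidden]; mask is [seqLen] (1 = real token, 0 = padding).
// Hypothetical helper for illustration only.
func meanPool(tokens [][]float32, mask []int64) []float32 {
	if len(tokens) == 0 {
		return nil
	}
	out := make([]float32, len(tokens[0]))
	var count float32
	for i, tok := range tokens {
		if i >= len(mask) || mask[i] == 0 {
			continue // skip padding positions
		}
		for j, v := range tok {
			out[j] += v
		}
		count++
	}
	if count == 0 {
		return out // all padding: return the zero vector
	}
	for j := range out {
		out[j] /= count
	}
	return out
}

// l2Normalize scales v in place to unit Euclidean length.
// The zero vector is returned unchanged to avoid dividing by zero.
func l2Normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	if sum == 0 {
		return v
	}
	norm := float32(math.Sqrt(sum))
	for i := range v {
		v[i] /= norm
	}
	return v
}
```

Keeping the two steps as separate functions is what makes it possible to turn each into an independent option later.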
Problem
- Sparse embedding models require different post-processing and a different output representation than dense sentence-transformers models.
- Mean pooling should not be mandatory; it should be selectable.
- Input/output names, tensor types, and output interpretation vary across models.
Proposed Direction
1) Generalized embedder API
Introduce a model-agnostic embedding pipeline with configurable components:
- tokenization config
- input/output mapping config
- post-processing strategy config
Potential options API:
WithMeanPooling()
WithNoPooling()
WithCLSExtraction()
WithL2Normalization() / WithoutL2Normalization()
WithInputOutputNames(...)
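One possible shape for these options is the functional-options pattern. The sketch below is only an assumption about how it could look: `EmbedderConfig`, `PoolingStrategy`, `NewEmbedderConfig`, the parameter list of `WithInputOutputNames`, and the default tensor names are all hypothetical and would need to match the actual models and existing API.

```go
package main

// PoolingStrategy selects how token-level outputs become one vector.
type PoolingStrategy int

const (
	PoolingMean PoolingStrategy = iota // average over non-padding tokens
	PoolingNone                        // return token-level outputs unchanged
	PoolingCLS                         // take the first ([CLS]) token vector
)

// EmbedderConfig is a hypothetical config struct; field names are illustrative.
type EmbedderConfig struct {
	Pooling     PoolingStrategy
	L2Normalize bool
	InputNames  []string
	OutputNames []string
}

// Option mutates an EmbedderConfig.
type Option func(*EmbedderConfig)

func WithMeanPooling() Option   { return func(c *EmbedderConfig) { c.Pooling = PoolingMean } }
func WithNoPooling() Option     { return func(c *EmbedderConfig) { c.Pooling = PoolingNone } }
func WithCLSExtraction() Option { return func(c *EmbedderConfig) { c.Pooling = PoolingCLS } }

func WithL2Normalization() Option {
	return func(c *EmbedderConfig) { c.L2Normalize = true }
}

func WithoutL2Normalization() Option {
	return func(c *EmbedderConfig) { c.L2Normalize = false }
}

// WithInputOutputNames overrides the tensor names used for the ONNX session.
// The signature is an assumption; the proposal leaves it open.
func WithInputOutputNames(inputs, outputs []string) Option {
	return func(c *EmbedderConfig) {
		c.InputNames = inputs
		c.OutputNames = outputs
	}
}

// NewEmbedderConfig starts from MiniLM-style defaults (mean pooling plus
// L2 normalization) and applies the options in order, so later options win.
// The default tensor names are assumed, not taken from the existing code.
func NewEmbedderConfig(opts ...Option) EmbedderConfig {
	cfg := EmbedderConfig{
		Pooling:     PoolingMean,
		L2Normalize: true,
		InputNames:  []string{"input_ids", "attention_mask", "token_type_ids"},
		OutputNames: []string{"last_hidden_state"},
	}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}
```

With defaults that reproduce today's behavior, a caller that passes no options gets the current MiniLM pipeline, while something like `NewEmbedderConfig(WithCLSExtraction(), WithoutL2Normalization())` opts out explicitly.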
2) Sparse embedding support (SPLADE)
Add a sparse embedder implementation that:
- runs ONNX inference for SPLADE-compatible models
- applies sparse-specific post-processing (e.g. activation + pruning/threshold/top-k)
- returns sparse output form (indices + values, or map[int]float32)
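The sparse-specific post-processing could look roughly like the sketch below. It assumes the commonly used SPLADE formulation, where each vocabulary weight is the maximum over non-padding tokens of log(1 + ReLU(logit)), followed by thresholding and an optional top-k cut. The function name `spladePostProcess`, its parameters, and the `[seqLen][vocab]` layout are hypothetical; some SPLADE variants aggregate with a sum instead of a max.

```go
package main

import (
	"math"
	"sort"
)

// SparseEmbedding mirrors the indices + values form proposed in this issue.
type SparseEmbedding struct {
	Indices []int32
	Values  []float32
}

// spladePostProcess turns MLM logits ([seqLen][vocab], assumed rectangular)
// into a sparse vector. For each vocabulary entry j it computes
// max over non-padding tokens i of log(1 + ReLU(logits[i][j])),
// keeps weights strictly above threshold, and then at most topK of them
// (topK <= 0 keeps everything). Indices are returned in ascending order.
func spladePostProcess(logits [][]float32, mask []int64, threshold float32, topK int) SparseEmbedding {
	if len(logits) == 0 {
		return SparseEmbedding{}
	}
	weights := make([]float32, len(logits[0]))
	for i, row := range logits {
		if i >= len(mask) || mask[i] == 0 {
			continue // ignore padding tokens
		}
		for j, x := range row {
			if x <= 0 {
				continue // ReLU clamps to 0, and log(1+0) = 0
			}
			if w := float32(math.Log1p(float64(x))); w > weights[j] {
				weights[j] = w
			}
		}
	}

	// Thresholding: collect the vocabulary ids that survive.
	idx := make([]int32, 0)
	for j, w := range weights {
		if w > threshold {
			idx = append(idx, int32(j))
		}
	}

	// Optional top-k pruning by weight, then restore ascending id order.
	if topK > 0 && len(idx) > topK {
		sort.Slice(idx, func(a, b int) bool { return weights[idx[a]] > weights[idx[b]] })
		idx = idx[:topK]
		sort.Slice(idx, func(a, b int) bool { return idx[a] < idx[b] })
	}

	out := SparseEmbedding{Indices: idx, Values: make([]float32, len(idx))}
	for k, j := range idx {
		out.Values[k] = weights[j]
	}
	return out
}
```

Whether thresholding, top-k, or both are applied would presumably be a per-model option rather than fixed behavior.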
3) Keep existing dense path stable
- preserve current MiniLM behavior via defaults/compatibility wrapper
- avoid breaking existing API callers
Acceptance Criteria
- WithMeanPooling() exists and mean pooling is no longer hardcoded.
Implementation Notes
- Consider introducing explicit output types:
  DenseEmbedding []float32
  SparseEmbedding { Indices []int32; Values []float32 } (or equivalent)
- Keep the runtime/session/tensor lifetime guarantees unchanged.
- Reuse existing session cache patterns where applicable.
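As a rough illustration of the explicit output types, the sketch below defines both and converts between the two sparse forms mentioned earlier (indices + values, and map[int]float32). The helper names `ToMap` and `SparseFromMap` are hypothetical and only meant to show that either representation can be derived from the other.

```go
package main

import "sort"

// DenseEmbedding is a fixed-width dense vector.
type DenseEmbedding []float32

// SparseEmbedding stores only non-zero entries as parallel slices.
type SparseEmbedding struct {
	Indices []int32
	Values  []float32
}

// ToMap converts the parallel-slice form into a vocabulary-id -> weight map.
// Entries without a matching value are skipped rather than panicking.
func (s SparseEmbedding) ToMap() map[int]float32 {
	m := make(map[int]float32, len(s.Indices))
	for k, idx := range s.Indices {
		if k < len(s.Values) {
			m[int(idx)] = s.Values[k]
		}
	}
	return m
}

// SparseFromMap builds the parallel-slice form from a map. Indices are
// sorted ascending so the result is deterministic despite Go's random
// map iteration order.
func SparseFromMap(m map[int]float32) SparseEmbedding {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	s := SparseEmbedding{
		Indices: make([]int32, len(keys)),
		Values:  make([]float32, len(keys)),
	}
	for i, k := range keys {
		s.Indices[i] = int32(k)
		s.Values[i] = m[k]
	}
	return s
}
```

The parallel-slice form is compact and maps directly onto what many vector stores accept, while the map form is convenient for lookups, so exposing one and offering a conversion to the other may be enough.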
Related
- Goal: run inference for embedding and sparse embedding models across platforms.
- Follow-up should include cross-platform real-model CI for embedding paths (Linux/macOS/Windows).