[ENH] Generalize embedder API and add sparse embeddings (SPLADE) #49

@tazarov

Description

Summary

Add first-class sparse embedding support (e.g. SPLADE) and refactor the current embedder design so it generalizes across embedding model families, with pooling as an explicit option instead of hardcoded behavior.

Context

The current high-level embedding path is tightly coupled to all-MiniLM assumptions:

  • fixed dense output width and output semantics
  • hardcoded post-processing pipeline (mean pooling + L2 normalization)
  • model-specific behavior mixed into one implementation

To support sparse models (SPLADE) and additional dense variants cleanly, we need a more composable embedder abstraction.
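
For reference, the hardcoded dense post-processing described above (mean pooling followed by L2 normalization) can be sketched in isolation. This is a minimal illustration, not the library's current code: `meanPool` and `l2Normalize` are hypothetical names, and the sketch assumes the model output has already been copied into a `[seqLen][dim]` slice alongside its attention mask.

```go
package main

import "math"

// meanPool averages token vectors over the positions whose attention-mask
// entry is non-zero. hidden is [seqLen][dim]; mask is [seqLen].
// Hypothetical helper for illustration only.
func meanPool(hidden [][]float32, mask []int64) []float32 {
	if len(hidden) == 0 {
		return nil
	}
	out := make([]float32, len(hidden[0]))
	var count float32
	for i, tok := range hidden {
		if i >= len(mask) || mask[i] == 0 {
			continue // skip padding tokens
		}
		for j, v := range tok {
			out[j] += v
		}
		count++
	}
	if count == 0 {
		return out // every token masked: return the zero vector
	}
	for j := range out {
		out[j] /= count
	}
	return out
}

// l2Normalize scales v in place to unit Euclidean length and returns it.
// A zero vector is returned unchanged to avoid dividing by zero.
func l2Normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	if sum == 0 {
		return v
	}
	norm := float32(math.Sqrt(sum))
	for i := range v {
		v[i] /= norm
	}
	return v
}
```

Splitting the two steps into separate functions is what would let pooling and normalization become independently selectable.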

Problem

  1. Sparse embedding models require different post-processing/output representation than dense sentence-transformers.
  2. Mean pooling should not be mandatory; it should be selectable.
  3. Input/output names, tensor types, and output interpretation vary across models.

Proposed Direction

1) Generalized embedder API

Introduce a model-agnostic embedding pipeline with configurable components:

  • tokenization config
  • input/output mapping config
  • post-processing strategy config

Potential options API:

  • WithMeanPooling()
  • WithNoPooling()
  • WithCLSExtraction()
  • WithL2Normalization() / WithoutL2Normalization()
  • WithInputOutputNames(...)
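
One way the pooling and normalization options listed above could be wired up is the functional-options pattern. The sketch below is an assumption about shape, not a committed design: `Pooling`, `embedderConfig`, `Option`, and `newConfig` are hypothetical names, and `WithInputOutputNames(...)` is left out because its parameters are not specified here.

```go
package main

// Pooling selects how per-token vectors are reduced to one embedding.
// Hypothetical type for illustration.
type Pooling int

const (
	PoolingMean Pooling = iota // average over unmasked tokens
	PoolingNone                // keep per-token output as-is
	PoolingCLS                 // take the first ([CLS]) token vector
)

// embedderConfig is a hypothetical container for post-processing choices.
type embedderConfig struct {
	pooling   Pooling
	normalize bool
}

// Option mutates the configuration; options are applied in order.
type Option func(*embedderConfig)

func WithMeanPooling() Option {
	return func(c *embedderConfig) { c.pooling = PoolingMean }
}

func WithNoPooling() Option {
	return func(c *embedderConfig) { c.pooling = PoolingNone }
}

func WithCLSExtraction() Option {
	return func(c *embedderConfig) { c.pooling = PoolingCLS }
}

func WithL2Normalization() Option {
	return func(c *embedderConfig) { c.normalize = true }
}

func WithoutL2Normalization() Option {
	return func(c *embedderConfig) { c.normalize = false }
}

// newConfig starts from defaults that mirror the current MiniLM path
// (mean pooling + L2 normalization) and then applies caller overrides.
func newConfig(opts ...Option) embedderConfig {
	cfg := embedderConfig{pooling: PoolingMean, normalize: true}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}
```

With this shape, `newConfig()` with no arguments reproduces today's behavior, while `newConfig(WithCLSExtraction(), WithoutL2Normalization())` selects a different dense variant without touching the defaults.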

2) Sparse embedding support (SPLADE)

Add a sparse embedder implementation that:

  • runs ONNX inference for SPLADE-compatible models
  • applies sparse-specific post-processing (e.g. activation + pruning/threshold/top-k)
  • returns a sparse output representation (indices + values, or map[int]float32)
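
The sparse post-processing step could look roughly like the sketch below. It assumes the max-pooled `log(1 + ReLU(x))` activation used by SPLADE variants; the activation and pruning policy are not fixed by this issue, so treat both as assumptions. `spladeActivate` and `prune` are hypothetical names, and the logits are assumed to be already copied out of the ONNX output tensor into a `[seqLen][vocab]` slice.

```go
package main

import (
	"math"
	"sort"
)

// spladeActivate reduces per-token vocabulary logits to one weight per
// vocabulary entry: max over unmasked tokens of log(1 + ReLU(logit)).
// Hypothetical helper; logits is [seqLen][vocab], mask is [seqLen].
func spladeActivate(logits [][]float32, mask []int64) []float32 {
	if len(logits) == 0 {
		return nil
	}
	weights := make([]float32, len(logits[0]))
	for i, row := range logits {
		if i >= len(mask) || mask[i] == 0 {
			continue // ignore padding tokens
		}
		for j, x := range row {
			if x <= 0 {
				continue // ReLU clamps to 0, and log(1+0) = 0
			}
			if w := float32(math.Log1p(float64(x))); w > weights[j] {
				weights[j] = w
			}
		}
	}
	return weights
}

// prune keeps entries whose weight exceeds threshold, then at most topK of
// the largest (topK <= 0 means no limit). It returns parallel index/value
// slices sorted by ascending vocabulary index.
func prune(weights []float32, threshold float32, topK int) ([]int32, []float32) {
	idx := make([]int32, 0)
	for j, w := range weights {
		if w > threshold {
			idx = append(idx, int32(j))
		}
	}
	if topK > 0 && len(idx) > topK {
		// Order by descending weight and keep the first topK entries.
		sort.Slice(idx, func(a, b int) bool { return weights[idx[a]] > weights[idx[b]] })
		idx = idx[:topK]
	}
	sort.Slice(idx, func(a, b int) bool { return idx[a] < idx[b] })
	vals := make([]float32, len(idx))
	for i, j := range idx {
		vals[i] = weights[j]
	}
	return idx, vals
}
```

Keeping activation and pruning separate would allow threshold and top-k to be exposed as independent options later.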

3) Keep existing dense path stable

  • preserve current MiniLM behavior via defaults/compatibility wrapper
  • avoid breaking existing API callers

Acceptance Criteria

  • The new generalized embedder configuration surface supports selectable pooling/post-processing.
  • WithMeanPooling() exists and mean pooling is no longer hardcoded.
  • Sparse embedder implementation exists for SPLADE-like models.
  • Sparse output format is documented and tested.
  • Existing MiniLM integration tests keep passing (backward compatibility).
  • Add integration test(s) for sparse model inference path.
  • README/docs include dense + sparse usage examples.

Implementation Notes

  • Consider introducing explicit output types:
    • DenseEmbedding []float32
    • SparseEmbedding { Indices []int32; Values []float32 } (or equivalent)
  • Keep the runtime/session/tensor lifetime guarantees unchanged.
  • Reuse existing session cache patterns where applicable.
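
As a starting point for discussion, the explicit output types above might be declared as follows. The field layout follows the note; the `Validate` and `ToMap` methods are hypothetical additions, and the requirement that indices be strictly increasing is an assumption made here so that duplicate entries are ruled out.

```go
package main

import "errors"

// DenseEmbedding is a fixed-width vector of floats.
type DenseEmbedding []float32

// SparseEmbedding stores only non-zero dimensions as parallel slices:
// Values[i] is the weight of vocabulary/dimension index Indices[i].
type SparseEmbedding struct {
	Indices []int32
	Values  []float32
}

// Validate checks that the slices are parallel and that indices are
// strictly increasing (an assumption of this sketch, not a decided rule).
func (s SparseEmbedding) Validate() error {
	if len(s.Indices) != len(s.Values) {
		return errors.New("sparse embedding: indices and values differ in length")
	}
	for i := 1; i < len(s.Indices); i++ {
		if s.Indices[i] <= s.Indices[i-1] {
			return errors.New("sparse embedding: indices must be strictly increasing")
		}
	}
	return nil
}

// ToMap converts to the alternative map[int]float32 representation.
func (s SparseEmbedding) ToMap() map[int]float32 {
	m := make(map[int]float32, len(s.Indices))
	for i, idx := range s.Indices {
		if i < len(s.Values) {
			m[int(idx)] = s.Values[i]
		}
	}
	return m
}
```

The parallel-slice form maps directly onto what most vector stores accept for sparse vectors, while `ToMap` covers callers who prefer lookups by index.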

Related

  • Goal: run inference for embedding and sparse embedding models across platforms.
  • Follow-up should include cross-platform real-model CI for embedding paths (Linux/macOS/Windows).
