[ENH] Generalize embedder API and add sparse embeddings (SPLADE) #49

## Summary
Add first-class sparse embedding support (e.g. SPLADE) and refactor the current embedder design so it generalizes across embedding model families, with pooling as an explicit option instead of hardcoded behavior.

## Context
The current high-level embedding path is tightly coupled to all-MiniLM assumptions:
- fixed dense output width and output semantics
- hardcoded post-processing pipeline (mean pooling + L2 normalization)
- model-specific behavior mixed into one implementation

To support sparse models (SPLADE) and additional dense variants cleanly, we need a more composable embedder abstraction.

## Problem
1. Sparse embedding models require different post-processing/output representation than dense sentence-transformers.
2. Mean pooling should not be mandatory; it should be selectable.
3. Input/output names, tensor types, and output interpretation vary across models.

## Proposed Direction
### 1) Generalized embedder API
Introduce a model-agnostic embedding pipeline with configurable components:
- tokenization config
- input/output mapping config
- post-processing strategy config
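One way to express these three config groups as plain data is sketched below in Go. All type and field names are hypothetical. The MiniLM defaults are assumptions based on common sentence-transformers ONNX exports and should be checked against the actual model files: tensor names `input_ids`, `attention_mask`, `token_type_ids` and `last_hidden_state`, and a max length of 256.

```go
package main

import "errors"

// PoolingMode selects how per-token outputs are reduced (hypothetical).
type PoolingMode int

const (
	PoolingNone PoolingMode = iota
	PoolingMean
	PoolingCLS
)

// TokenizerConfig holds tokenization settings.
type TokenizerConfig struct {
	MaxLength  int
	Truncation bool
}

// IOConfig maps the ONNX graph's tensor names to the roles the pipeline needs.
type IOConfig struct {
	InputIDs      string
	AttentionMask string
	TokenTypeIDs  string // empty if the model has no token_type_ids input
	Output        string
}

// PostProcessConfig describes what happens after inference.
type PostProcessConfig struct {
	Pooling     PoolingMode
	L2Normalize bool
}

// EmbedderConfig groups tokenization, I/O mapping and post-processing.
type EmbedderConfig struct {
	Tokenizer   TokenizerConfig
	IO          IOConfig
	PostProcess PostProcessConfig
}

// Validate rejects configurations the pipeline cannot run.
func (c EmbedderConfig) Validate() error {
	if c.IO.InputIDs == "" || c.IO.Output == "" {
		return errors.New("input_ids and output tensor names are required")
	}
	if c.Tokenizer.MaxLength <= 0 {
		return errors.New("max length must be positive")
	}
	return nil
}

// DefaultMiniLMConfig expresses today's hardcoded behaviour as data
// (tensor names and max length are assumptions, see above).
func DefaultMiniLMConfig() EmbedderConfig {
	return EmbedderConfig{
		Tokenizer: TokenizerConfig{MaxLength: 256, Truncation: true},
		IO: IOConfig{
			InputIDs: "input_ids", AttentionMask: "attention_mask",
			TokenTypeIDs: "token_type_ids", Output: "last_hidden_state",
		},
		PostProcess: PostProcessConfig{Pooling: PoolingMean, L2Normalize: true},
	}
}
```

Keeping the configuration as data, rather than behaviour baked into one implementation, makes it possible to validate it up front and to describe new models without new code paths.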

Potential options API:
- `WithMeanPooling()`
- `WithNoPooling()`
- `WithCLSExtraction()`
- `WithL2Normalization()` / `WithoutL2Normalization()`
- `WithInputOutputNames(...)`
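These option names map naturally onto Go's functional-options pattern. The sketch below assumes an unexported `options` struct and a hypothetical `buildOptions` helper. The `WithInputOutputNames(inputs, outputs []string)` signature is a guess, since the issue leaves it open:

```go
package main

// Pooling selects the token-reduction strategy (hypothetical type).
type Pooling int

const (
	NoPooling Pooling = iota
	MeanPooling
	CLSExtraction
)

// options is the internal state that Option functions mutate.
type options struct {
	pooling     Pooling
	l2Normalize bool
	inputNames  []string
	outputNames []string
}

// Option follows Go's functional-options pattern.
type Option func(*options)

func WithMeanPooling() Option        { return func(o *options) { o.pooling = MeanPooling } }
func WithNoPooling() Option          { return func(o *options) { o.pooling = NoPooling } }
func WithCLSExtraction() Option      { return func(o *options) { o.pooling = CLSExtraction } }
func WithL2Normalization() Option    { return func(o *options) { o.l2Normalize = true } }
func WithoutL2Normalization() Option { return func(o *options) { o.l2Normalize = false } }

// WithInputOutputNames overrides the ONNX tensor names (signature assumed).
// The slices are copied so later mutation by the caller has no effect.
func WithInputOutputNames(inputs, outputs []string) Option {
	in := append([]string(nil), inputs...)
	out := append([]string(nil), outputs...)
	return func(o *options) {
		o.inputNames = in
		o.outputNames = out
	}
}

// buildOptions applies opts in order, so a later option overrides an earlier one.
// The zero value means no pooling and no normalization.
func buildOptions(opts ...Option) options {
	var o options
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```

Options are applied in order, so when two options conflict (e.g. `WithMeanPooling()` followed by `WithCLSExtraction()`), the last one wins. Rejecting such conflicts with an error is an alternative worth considering.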

### 2) Sparse embedding support (SPLADE)
Add a sparse embedder implementation that:
- runs ONNX inference for SPLADE-compatible models
- applies sparse-specific post-processing (e.g. activation + pruning/threshold/top-k)
- returns a sparse output (indices + values, or `map[int]float32`)

### 3) Keep existing dense path stable
- preserve current MiniLM behavior via defaults or a compatibility wrapper
- avoid breaking existing API callers
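A compatibility wrapper can keep existing callers on today's behaviour by prepending the MiniLM defaults before any caller-supplied options. The sketch below is self-contained. `NewEmbedder`, `NewMiniLMEmbedder` and the minimal option types are hypothetical stand-ins, and the ONNX session setup is omitted:

```go
package main

// Pooling is a minimal stand-in for the pooling selection (hypothetical).
type Pooling int

const (
	NoPooling Pooling = iota
	MeanPooling
)

type config struct {
	pooling     Pooling
	l2Normalize bool
}

// Option mutates the embedder config (functional-options pattern).
type Option func(*config)

func WithMeanPooling() Option        { return func(c *config) { c.pooling = MeanPooling } }
func WithNoPooling() Option          { return func(c *config) { c.pooling = NoPooling } }
func WithL2Normalization() Option    { return func(c *config) { c.l2Normalize = true } }
func WithoutL2Normalization() Option { return func(c *config) { c.l2Normalize = false } }

// Embedder is a placeholder for the generalized embedder (session omitted).
type Embedder struct{ cfg config }

// NewEmbedder builds a generalized embedder. With no options it applies
// no pooling and no normalization, i.e. no model-specific assumptions.
func NewEmbedder(opts ...Option) *Embedder {
	var c config
	for _, o := range opts {
		o(&c)
	}
	return &Embedder{cfg: c}
}

// miniLMDefaults captures the currently hardcoded MiniLM post-processing.
func miniLMDefaults() []Option {
	return []Option{WithMeanPooling(), WithL2Normalization()}
}

// NewMiniLMEmbedder keeps existing behaviour: defaults first, then caller
// options, so callers can still override individual defaults.
func NewMiniLMEmbedder(opts ...Option) *Embedder {
	return NewEmbedder(append(miniLMDefaults(), opts...)...)
}
```

Because the defaults come first, a caller can still override one of them (e.g. `WithoutL2Normalization()`). Whether the generic constructor should default to no pooling is itself a design choice for this issue.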

## Acceptance Criteria
- [ ] New generalized embedder configuration surface supports selectable pooling/post-processing.
- [ ] `WithMeanPooling()` exists and mean pooling is no longer hardcoded.
- [ ] Sparse embedder implementation exists for SPLADE-like models.
- [ ] Sparse output format is documented and tested.
- [ ] Existing MiniLM integration tests keep passing (backward compatibility).
- [ ] Add integration test(s) for sparse model inference path.
- [ ] README/docs include dense + sparse usage examples.

## Implementation Notes
- Consider introducing explicit output types:
  - `type DenseEmbedding []float32`
  - `type SparseEmbedding struct { Indices []int32; Values []float32 }` (or equivalent)
- Keep the runtime/session/tensor lifetime guarantees unchanged.
- Reuse existing session cache patterns where applicable.
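Below is a sketch of the two output types. It includes the invariants the sparse form would need to document and test: equal-length slices and strictly increasing indices. The helpers `Validate`, `SparseFromMap`, `ToMap` and `Dot` are hypothetical. The map form uses `map[int]float32`, as mentioned earlier in the issue:

```go
package main

import (
	"errors"
	"sort"
)

// DenseEmbedding is a fixed-width vector (e.g. pooled sentence embedding).
type DenseEmbedding []float32

// SparseEmbedding stores non-zero vocabulary weights.
// Invariants: len(Indices) == len(Values); Indices strictly increasing.
type SparseEmbedding struct {
	Indices []int32
	Values  []float32
}

// Validate checks the invariants that Dot and other helpers rely on.
func (s SparseEmbedding) Validate() error {
	if len(s.Indices) != len(s.Values) {
		return errors.New("indices and values length mismatch")
	}
	for i := 1; i < len(s.Indices); i++ {
		if s.Indices[i] <= s.Indices[i-1] {
			return errors.New("indices must be strictly increasing")
		}
	}
	return nil
}

// SparseFromMap converts the map form into the canonical sorted form.
func SparseFromMap(m map[int]float32) SparseEmbedding {
	idx := make([]int32, 0, len(m))
	for k := range m {
		idx = append(idx, int32(k))
	}
	sort.Slice(idx, func(a, b int) bool { return idx[a] < idx[b] })
	vals := make([]float32, len(idx))
	for i, k := range idx {
		vals[i] = m[int(k)]
	}
	return SparseEmbedding{Indices: idx, Values: vals}
}

// ToMap converts the canonical form back to a map.
func (s SparseEmbedding) ToMap() map[int]float32 {
	m := make(map[int]float32, len(s.Indices))
	for i, k := range s.Indices {
		m[int(k)] = s.Values[i]
	}
	return m
}

// Dot computes the sparse dot product with a linear merge over sorted indices.
func (s SparseEmbedding) Dot(o SparseEmbedding) float32 {
	var sum float32
	i, j := 0, 0
	for i < len(s.Indices) && j < len(o.Indices) {
		switch {
		case s.Indices[i] == o.Indices[j]:
			sum += s.Values[i] * o.Values[j]
			i++
			j++
		case s.Indices[i] < o.Indices[j]:
			i++
		default:
			j++
		}
	}
	return sum
}
```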

## Related
- Goal: run inference for embedding and sparse embedding models across platforms.
- Follow-up should include cross-platform real-model CI for embedding paths (Linux/macOS/Windows).
