@joonsoome
Owner

Summary

  • Adds MLX cross-encoder-lite reranker (pooled token embeddings + linear head) with score normalization (sigmoid/minmax).
  • Separates API paths to resolve format collision:
    • Cohere rerank: /v1/rerank (and /v2/rerank)
    • OpenAI-compatible rerank: /v1/openai/rerank (alias /v1/rerank_openai)
  • Enforces embedding dimension strategy (DIMENSION_STRATEGY=hidden_size) to serve 2560-D for Qwen3-Embedding-4B.
  • macOS LaunchAgent automation: setup-macos-service.sh generates a clean plist, omits empty envs, validates, and reloads.
  • Health and root endpoints expose embedding dims/hidden_size and reranker status/specs.
  • README adds Beginner Quick Start, service setup, endpoint map, and OpenAI rerank example.
  • All test suites pass: Text Processing, API compatibility (Native/OpenAI/TEI/Cohere), Quality, Performance.
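To make the "pooled token embeddings + linear head" idea concrete, here is a minimal pure-Python sketch. It is not the MLX implementation from this PR: the function names (`mean_pool`, `linear_head`, `minmax`) and the toy vectors are hypothetical, and it assumes each query/document pair has already been encoded into per-token embeddings with a padding mask.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              mask: Sequence[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # avoid dividing by zero on an all-padding input
    return [t / count for t in total]


def linear_head(pooled: Sequence[float],
                weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Project the pooled vector to a single raw relevance score."""
    return sum(p * w for p, w in zip(pooled, weights)) + bias


def minmax(scores: Sequence[float]) -> List[float]:
    """Rescale a batch of raw scores to the 0..1 range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # degenerate batch: all scores equal
    return [(s - lo) / (hi - lo) for s in scores]


# Toy usage: three candidate documents, 2-D embeddings, last token is padding.
weights = [0.5, 0.25]
docs = [
    [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]],
    [[0.0, 0.0], [2.0, 0.0], [100.0, 100.0]],
    [[4.0, 4.0], [4.0, 4.0], [100.0, 100.0]],
]
mask = [1, 1, 0]
raw = [linear_head(mean_pool(d, mask), weights) for d in docs]
normalized = minmax(raw)
```

Min-max normalization is relative to the batch, so the same document can receive a different score depending on the other candidates; sigmoid does not have that property.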

Changes

  • Reranker
    • MLX backend: pooling mean/cls, linear head padded/truncated to pooled dim, optional sigmoid/minmax.
    • Factory/env routing for backend selection and model aliases.
  • API
    • Cohere /v1/rerank kept; OpenAI moved to /v1/openai/rerank; updated router order and the endpoints advertised by the health endpoint.
    • Bounds scores to the 0–1 range with sigmoid where applicable, keeping responses schema-compatible.
  • Config
    • Empty-string envs normalized to None to prevent alias misrouting.
    • DIMENSION_STRATEGY=hidden_size; OUTPUT_EMBEDDING_DIMENSION is honored when the strategy is pad_or_truncate.
  • Tooling
    • setup-macos-service.sh rebuilt (no shell in XML; omits empty envs; plutil -lint; safe launchctl reload; health probe).
    • server-tests.sh validated all modes; new quick scripts and docs added.
  • Docs
    • README Beginner Quick Start; corrected OpenAI rerank path; endpoints reference; test command.
    • Added docs/DEPLOYMENT_PROFILES.md, docs/QUALITY_BENCHMARKS.md.
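Two of the Config items above can be illustrated with a short sketch. The helper names (`env_or_none`, `pad_or_truncate`) are hypothetical and not taken from the codebase; treating whitespace-only values as empty and padding with zeros are both assumptions.

```python
import os
from typing import List, Mapping, Optional, Sequence


def env_or_none(name: str,
                env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the variable's value, or None if it is unset or blank.

    Normalizing "" to None keeps an empty override from being mistaken
    for a real model alias further down the routing logic.
    """
    value = env.get(name)
    if value is None:
        return None
    value = value.strip()
    return value or None


def pad_or_truncate(vector: Sequence[float], target_dim: int) -> List[float]:
    """Force a vector to target_dim by cutting it or zero-padding it."""
    if len(vector) >= target_dim:
        return list(vector[:target_dim])
    return list(vector) + [0.0] * (target_dim - len(vector))


# An empty RERANKER_MODEL_ID behaves exactly like an unset one.
model_id = env_or_none("RERANKER_MODEL_ID", {"RERANKER_MODEL_ID": ""})
```

The same pad-or-truncate shape logic applies whether the input is an embedding being resized to OUTPUT_EMBEDDING_DIMENSION or a linear head being fitted to the pooled dimension.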

Endpoints reference

  • Native: /api/v1/embed, /api/v1/rerank
  • OpenAI: /v1/embeddings, /v1/openai/rerank (alias /v1/rerank_openai)
  • TEI: /embed, /rerank, /info
  • Cohere: /v1/rerank, /v2/rerank

Test results (on Apple Silicon)

  • API compatibility: PASS (8/8)
  • Quality validation: PASS
  • Performance benchmark: Mean embedding latency ~7.24 ms; peak throughput ~1287.5 texts/sec; rerank ~1.5–2.5 ms
  • Stress: 100% success, ~472 req/sec

Breaking/Notable changes

  • OpenAI rerank path moved to /v1/openai/rerank (Cohere remains at /v1/rerank). An alias /v1/rerank_openai is provided.
  • Scores for rerank may be sigmoid-normalized by default for OpenAI clients; configurable via OPENAI_RERANK_AUTO_SIGMOID.
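A rough sketch of how an env-gated sigmoid could behave is shown below. Only the variable name OPENAI_RERANK_AUTO_SIGMOID comes from this PR; the accepted truthy strings, the default of enabled, and the function names are assumptions made for illustration.

```python
import math
import os
from typing import List, Mapping, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def auto_sigmoid_enabled(env: Mapping[str, str] = os.environ,
                         default: bool = True) -> bool:
    """Read the flag; an unset or blank value falls back to the default."""
    raw = env.get("OPENAI_RERANK_AUTO_SIGMOID", "").strip().lower()
    if not raw:
        return default
    return raw in {"1", "true", "yes", "on"}  # assumed truthy spellings


def maybe_normalize(scores: Sequence[float],
                    env: Mapping[str, str] = os.environ) -> List[float]:
    """Apply sigmoid to raw scores only when the flag is enabled."""
    if auto_sigmoid_enabled(env):
        return [sigmoid(s) for s in scores]
    return list(scores)
```

Clients that compare scores against a fixed threshold should check which mode the server is running in, since raw and sigmoid-normalized scores are on different scales.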

Deployment notes

  • Prefer DIMENSION_STRATEGY=hidden_size to serve 2560-D for Qwen3-Embedding-4B-4bit-DWQ.
  • Use setup-macos-service.sh to provision LaunchAgent; empty env values are omitted to avoid overrides.
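The "omit empty envs" behavior can be illustrated in Python with the standard-library `plistlib`. This is not the shell script from the PR, only a sketch of the same idea; the label `com.example.embed-server` and the program path are hypothetical placeholders.

```python
import plistlib
from typing import Dict, Mapping, Optional, Sequence


def build_launch_agent(label: str,
                       program_args: Sequence[str],
                       env: Mapping[str, Optional[str]]) -> bytes:
    """Serialize a LaunchAgent plist, dropping unset or blank env values."""
    clean_env: Dict[str, str] = {
        key: value for key, value in env.items()
        if value is not None and value.strip()
    }
    plist = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "RunAtLoad": True,
    }
    # Leave the key out entirely when nothing survives the filter.
    if clean_env:
        plist["EnvironmentVariables"] = clean_env
    return plistlib.dumps(plist)


data = build_launch_agent(
    "com.example.embed-server",          # hypothetical label
    ["/usr/local/bin/embed-server"],     # hypothetical program path
    {"DIMENSION_STRATEGY": "hidden_size", "RERANKER_MODEL_ID": ""},
)
parsed = plistlib.loads(data)
```

Generating the plist from a data structure rather than string templating avoids malformed XML; the resulting file can still be checked with `plutil -lint` before reloading the agent.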

Checklist

  • All tests pass locally (Text Processing, API compatibility, Quality, Performance)
  • Health endpoints show embedding/reranker specs
  • README updated with beginner Quick Start
  • macOS service setup validated (plutil + launchctl)
  • Version tag v1.5.0 created and pushed
  • CI green
  • PyPI publish (optional; 1.5.0 artifacts built and twine-check passed)

…re env-driven reranker path (Torch CrossEncoder) and docs clarified; all tests passing
…led embeddings + linear head) and enable via RERANKER_BACKEND=mlx; tests green
…KEND, RERANKER_MODEL_ID/NAME, RERANK_MAX_SEQ_LEN, RERANK_BATCH_SIZE)
…X reranker (experimental v1) and backend selection behavior
…, OpenAI /v1/openai/rerank), dim strategy exposure, and macOS LaunchAgent setup

- Add MLX cross-encoder-lite reranker with pooling + score normalization (sigmoid/minmax)
- Move OpenAI rerank to /v1/openai/rerank; keep Cohere at /v1/rerank; update router order + health endpoints
- Enforce embedding dimension strategy (hidden_size) and expose dims/hidden_size in / and /health
- Rewrite tools/setup-macos-service.sh (omit empty envs, valid plist, safe reload)
- Update server-tests; add docs; README Beginner Quick Start; version 1.5.0
- All suites pass: Text Processing, Quality, Performance, API compatibility
@joonsoome joonsoome merged commit 87eae49 into main Nov 5, 2025
1 check passed