11# 🔥 Embeddings + Reranking on your Mac (MLX‑first)
22
3- <p >
3+ <p align = " center " >
44 <a href =" docs/ENHANCED_OPENAI_API.md " >
5+ <a href =" https://github.com/joonsoo-me/embed-rerank/blob/main/LICENSE " ><img src =" https://img.shields.io/github/license/joonsoo-me/embed-rerank?logo=opensource&logoColor=white " /></a >
56 <img src =" https://img.shields.io/badge/OpenAI%20rerank-supported-2ea44f " alt =" OpenAI rerank supported (/v1/openai/rerank) " />
67 </a >
78 <a href =" docs/DEPLOYMENT_PROFILES.md " >
89 <img src="https://img.shields.io/badge/auto--sigmoid-default%20on-blue" alt="auto-sigmoid default on" />
9- </a >
10+ </a ><a href =" https://ml-explore.github.io/mlx/ " ><img src =" https://img.shields.io/badge/MLX-Optimized-green?logo=apple&logoColor=white " /></a >
11+ <a href =" https://fastapi.tiangolo.com/ " ><img src =" https://img.shields.io/badge/FastAPI-009688?logo=fastapi&logoColor=white " /></a >
1012 <a href =" https://pypi.org/project/embed-rerank/ " >
1113 <img src="https://img.shields.io/pypi/v/embed-rerank?logo=pypi&logoColor=white" alt="PyPI Version" />
1214 </a >
@@ -16,6 +18,46 @@ Blazing‑fast local embeddings and true cross‑encoder reranking on Apple Sili
1618
1719This page is a beginner‑friendly quick start. Detailed guides live in docs/.
1820
21+ ## 🌐 Four APIs, One Service
22+
23+ | API | Endpoint | Use Case |
24+ | -----| ----------| ----------|
25+ | ** Native** | ` /api/v1/embed ` , ` /api/v1/rerank ` | New projects |
26+ | ** OpenAI** | ` /v1/embeddings ` , ` /v1/openai/rerank ` (alias: ` /v1/rerank_openai ` ) | Existing OpenAI code |
27+ | ** TEI** | ` /embed ` , ` /rerank ` , ` /info ` | Hugging Face TEI replacement |
28+ | ** Cohere** | ` /v1/rerank ` , ` /v2/rerank ` | Cohere API replacement |
29+ | | ` /docs ` ` /health ` | More info. |
30+
31+ ## 📈 Performance Visualization
32+
33+ ### Latency Comparison (Projected)
34+
35+ ```
36+ Single Text Embedding Latency (milliseconds)
37+
38+ Apple MLX ████ 0.2ms
39+ PyTorch MPS ████████████████████████████████████████████████ 45ms
40+ PyTorch CPU ████████████████████████████████████████████████████████████████████████████████████████████████████████ 120ms
41+ CUDA (Est.) ████████████ 12ms
42+ Vulkan (Est.) ████████████████████████ 25ms
43+
44+ 0ms 25ms 50ms 75ms 100ms 125ms
45+ ```
46+
47+ ### Throughput Comparison (texts/second)
48+
49+ ```
50+ Maximum Throughput (texts per second)
51+
52+ Apple MLX ████████████████████████████████████████████████████████████████████████████████████████████████████████ 35,000
53+ CUDA (Est.) ████████████████████████████████ 8,000
54+ PyTorch MPS ██████ 1,500
55+ Vulkan (Est.) ████████████ 3,000
56+ PyTorch CPU ██ 500
57+
58+ 0 10k 20k 30k 40k
59+ ```
60+
1961## 🚀 Start here (60 seconds)
2062
21631 ) Install and run (embeddings only)
@@ -96,12 +138,6 @@ Notes
96138- Scores may be auto‑sigmoid‑normalized for OpenAI clients by default (disable via ` OPENAI_RERANK_AUTO_SIGMOID=false ` ).
97139- The root endpoint ` / ` shows both ` embedding_dimension ` (served) and ` hidden_size ` (model config) for clarity.
98140
99- Quick endpoints reference
100- - Native: ` /api/v1/embed ` , ` /api/v1/rerank `
101- - OpenAI: ` /v1/embeddings ` , ` /v1/openai/rerank ` (alias: ` /v1/rerank_openai ` )
102- - TEI: ` /embed ` , ` /rerank ` , ` /info `
103- - Cohere: ` /v1/rerank ` , ` /v2/rerank `
104-
105141Run the full validation suite
106142``` bash
107143./tools/server-tests.sh --full
@@ -142,6 +178,13 @@ rr = client._request(
142178print (rr.get(" results" , rr))
143179```
144180
181+ ## Tested Frameworks
182+ | | Framework | Tests |
183+ | ---| ---| ---|
184+ | ✅ | [ ** Open WebUI** ] ( https://github.com/open-webui/open-webui ) | ` Embed ` |
185+ | ✅ | [ ** LightRAG** ] ( https://github.com/HKUDS/LightRAG ) | ` Embed ` ` Rerank ` |
186+ ###### We are waiting for your reports!
187+
145188## 📄 License
146189
147190MIT License – build amazing things locally.
0 commit comments