Lightning-fast local embeddings & reranking for Apple Silicon (MLX-first). OpenAI, TEI, and Cohere compatible.
## 🔧 Troubleshooting

### Common Issues

**"Embedding service not initialized" Error**: Fixed in v1.2.0. If you encounter this error:

1. Update to the latest version: `pip install --upgrade embed-rerank`
2. For source installations, ensure proper service initialization in `main.py`
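
What "proper service initialization" looks like depends on your copy of `main.py`; as a purely illustrative sketch (FastAPI and the `EmbeddingService` name are assumptions, not the project's actual code), the idea is to build the service once, before any request is served:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


class EmbeddingService:
    """Stand-in for the real embedding backend; illustrative only."""

    def __init__(self) -> None:
        self.ready = True


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize the embedding service before the app starts handling requests,
    # so no endpoint ever sees an uninitialized service.
    app.state.embedding_service = EmbeddingService()
    yield


app = FastAPI(lifespan=lifespan)
```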
For comprehensive troubleshooting, see [docs/TROUBLESHOOTING.md](docs/TROUBLESHOOTING.md).

Recent MLX versions removed `mx.array` in favor of `mx.asarray` (and `mx.numpy.array`). This repository includes a compatibility helper that automatically forwards to the appropriate API, so Apple Silicon embeddings continue to work across MLX versions.

**What changed:**

- Internal `mx.array(...)` calls now use a helper that tries, in order: `mx.array` → `mx.asarray` → `mx.numpy.array`.
- Placeholder embedding fallback now respects the model configuration using multiple dimension keys.

**Why this matters:**

- Prevents the runtime error `module 'mlx.core' has no attribute 'array'` on newer MLX.
- Ensures the embedding dimension matches the loaded model, avoiding vector size mismatches (e.g., when updating existing ChromaDB collections).
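
A minimal sketch of such a forwarding helper (illustrative only; `_mx_array` is not necessarily the repository's actual helper name or logic):

```python
import mlx.core as mx


def _mx_array(data):
    """Create an MLX array with whichever constructor this MLX build exposes.

    Mirrors the order described above: mx.array, then mx.asarray,
    then the NumPy-compatible mx.numpy.array.
    """
    for name in ("array", "asarray"):
        factory = getattr(mx, name, None)
        if factory is not None:
            return factory(data)
    return mx.numpy.array(data)


embedding = _mx_array([0.1, 0.2, 0.3])  # works regardless of which API the installed MLX provides
```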
You can request base64-encoded embeddings by setting `encoding_format="base64"`. This is useful when transporting vectors through systems that expect strings only.
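
For example, assuming the server runs locally on port 9000 and follows the common OpenAI convention of base64-encoding little-endian float32 values (both are assumptions; adjust to your deployment), you could request and decode a vector like this:

```python
import base64
import struct

import requests

# Assumed host/port; add a "model" field if your deployment requires one.
response = requests.post(
    "http://localhost:9000/v1/embeddings",
    json={"input": ["hello world"], "encoding_format": "base64"},
)
response.raise_for_status()
encoded = response.json()["data"][0]["embedding"]

# Decode assuming the OpenAI-style layout: base64 over little-endian float32.
raw = base64.b64decode(encoded)
vector = struct.unpack(f"<{len(raw) // 4}f", raw)
print(len(vector), vector[:3])
```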
### Native API
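
A minimal sketch of exercising the service directly from Python; the `/embed` path and request body below are assumptions for illustration (verify the actual native routes for your version), and the port matches the examples used elsewhere in this README:

```python
import requests

# Hypothetical native route and payload; confirm the real paths before relying on this.
response = requests.post(
    "http://localhost:9000/embed",
    json={"texts": ["What is MLX?", "Apple Silicon embeddings"]},
)
response.raise_for_status()
print(response.json())
```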
### 🔧 Advanced Testing (Source Code)

For development and comprehensive testing with the source code:

```bash
embed-rerank --port 9000 &
```

>**Windows Support**: Coming soon! Currently optimized for macOS/Linux.

We validated an end-to-end workflow using LightRAG with this service:
- Embeddings via the OpenAI-compatible endpoint (`/v1/embeddings`)
- Reranking via the Cohere-compatible endpoint (`/v1/rerank` or `/v2/rerank`)
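
As an illustration, a Cohere-style rerank call against the local service might look like the sketch below (the port and the exact response fields are assumptions based on Cohere's API shape; adjust to your deployment):

```python
import requests

# Cohere-style rerank payload; add a "model" field if your deployment requires one.
response = requests.post(
    "http://localhost:9000/v1/rerank",
    json={
        "query": "What runs fastest on Apple Silicon?",
        "documents": [
            "MLX runs natively on Apple Silicon GPUs.",
            "Bananas are rich in potassium.",
        ],
        "top_n": 2,
    },
)
response.raise_for_status()
for result in response.json().get("results", []):
    print(result["index"], result["relevance_score"])
```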
Results: the integration tests succeeded using OpenAI embeddings and Cohere reranking.

Qwen Embedding similarity scaling note: when using the Qwen Embedding model, we observed cosine similarity values that appear very small (e.g., `0.02`, `0.03`). This is expected due to vector scaling differences and does not indicate poor retrieval by itself. As a starting point, we recommend disabling the retrieval threshold in LightRAG to avoid filtering out good matches prematurely:

```
# === Retrieval threshold ===
COSINE_THRESHOLD=0.0
```

Adjust upward later based on your dataset and evaluation results.

---
## 📄 License
MIT License - build amazing things with this code!