
Commit 524b18a

Merge pull request #8 from joonsoome/5-feature-automatic-forwarding-helper-for-mxarray-with-fallback-to-mxasarray
OpenAI compatibility: add base64 embedding encoding + optional dimens…
2 parents 47d34f2 + 24debf1 commit 524b18a

File tree

4 files changed: +92 −5 lines


CHANGELOG.md

Lines changed: 15 additions & 0 deletions

```diff
@@ -6,6 +6,21 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [1.2.0] - 2025-09-10
+
+## [1.2.3] - 2025-10-30
+
+### Added
+- OpenAI compatibility: base64 encoding support via `encoding_format="base64"` for `/v1/embeddings`.
+- OpenAI compatibility: optional `dimensions` handling (truncate/pad to requested size).
+
+### Documentation
+- README: Added LightRAG integration note (OpenAI embeddings + Cohere reranking tested successfully).
+- README: Added Qwen Embedding similarity scaling note and recommended starting threshold `COSINE_THRESHOLD=0.0` for LightRAG.
+- README: Example for requesting base64-encoded embeddings and decoding back to float32.
+
+### Notes
+- These updates maintain full compatibility with existing OpenAI SDK usage; default remains `encoding_format="float"`.
+
 
 ### Added
 - 🆕 **Cohere API v1/v2 Compatibility**: Full support for Cohere reranking API
```
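The base64 addition can be sketched in isolation. Below is a minimal, standalone illustration of the round-trip behavior described in the changelog entry (the helper names are ours for illustration, not the service's internals):

```python
import base64

import numpy as np


def encode_embedding_base64(vector):
    """Pack a float vector as base64-encoded float32 bytes (encoding_format="base64")."""
    return base64.b64encode(np.asarray(vector, dtype=np.float32).tobytes()).decode("ascii")


def decode_embedding_base64(b64_string):
    """Decode a base64 string back into a list of float32 values."""
    return np.frombuffer(base64.b64decode(b64_string), dtype=np.float32).tolist()


# Values exactly representable in float32 round-trip losslessly
vec = [0.5, -0.25, 1.0]
assert decode_embedding_base64(encode_embedding_base64(vec)) == vec
```

Non-representable values (e.g. `0.1`) round-trip to their nearest float32, which is why the default `encoding_format="float"` and the base64 path agree to float32 precision.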

README.md

Lines changed: 44 additions & 0 deletions

````diff
@@ -274,6 +274,31 @@ response = client.embeddings.create(
     model="text-embedding-ada-002"
 )
 # 🚀 10x faster than OpenAI, same code!
+
+"""
+Base64 encoding support
+-----------------------
+
+For OpenAI-compatible calls, you can request base64-encoded embeddings by setting `encoding_format` to `"base64"`. This is useful when transporting vectors through systems that expect strings only.
+
+Example (Python OpenAI SDK):
+
+```python
+response = client.embeddings.create(
+    input=["Hello world"],
+    model="text-embedding-ada-002",
+    encoding_format="base64",  # returns base64-encoded float32 bytes
+)
+
+# embedding string is base64; decode if you need floats again
+import base64, numpy as np
+arr = np.frombuffer(base64.b64decode(response.data[0].embedding), dtype=np.float32)
+```
+
+Notes:
+- `encoding_format` defaults to `"float"` (list[float]).
+- `dimensions` is accepted and will truncate/pad to the requested size when supported.
+"""
 ```
 
 ### TEI Compatible
@@ -311,6 +336,25 @@ response = requests.post("http://localhost:9000/v1/rerank", json={
 })
 ```
 
+---
+
+## 🧩 LightRAG Integration
+
+We validated an end-to-end workflow using LightRAG with this service:
+- Embeddings via the OpenAI-compatible endpoint (`/v1/embeddings`)
+- Reranking via the Cohere-compatible endpoint (`/v1/rerank` or `/v2/rerank`)
+
+Results: the integration tests succeeded using OpenAI embeddings and Cohere reranking.
+
+Qwen Embedding similarity scaling note: when using the Qwen Embedding model, we observed cosine similarity values that appear very small (e.g., `0.02`, `0.03`). This is expected due to vector scaling differences and does not indicate poor retrieval by itself. As a starting point, we recommend disabling the retrieval threshold in LightRAG to avoid filtering out good matches prematurely:
+
+```
+# === Retrieval threshold ===
+COSINE_THRESHOLD=0.0
+```
+
+Adjust upward later based on your dataset and evaluation results.
+
 ### Native API
 
 ```bash
````
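To make the `COSINE_THRESHOLD` note concrete, here is a standalone sketch (our own illustration, not LightRAG's actual code) of how a cosine-similarity cutoff filters retrieval candidates; with the threshold at `0.0`, only negatively correlated matches are dropped, so small-but-positive Qwen scores like `0.02` still pass:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def filter_by_threshold(query, candidates, threshold):
    """Keep candidates whose similarity to the query meets the threshold."""
    return [c for c in candidates if cosine_similarity(query, c) >= threshold]


query = [0.1, 0.2, 0.1]
docs = [[0.1, 0.19, 0.11], [-0.2, 0.1, -0.1]]
# At threshold 0.0 the positively correlated doc is kept and the
# negatively correlated one is dropped; a higher threshold would
# also drop weakly similar docs.
kept = filter_by_threshold(query, docs, threshold=0.0)
```

This is why a conservative starting point of `0.0` is suggested: raising the threshold before measuring your model's typical score range risks filtering good matches.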

app/__init__.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -7,7 +7,7 @@
 - Apple Silicon MLX optimization with PyTorch fallback
 - Multi-API compatibility: Native, OpenAI, TEI, and Cohere formats
 
-🚀 NEW in v1.2.2: Fully resolved API compatibility test warnings!
+🚀 NEW in v1.2.3: OpenAI base64 encoding support + docs update
 - Fixed Cohere API tests with proper environment variable handling
 - Resolved pytest environment variable propagation issues
 - Eliminated false warnings while maintaining 100% API compatibility
@@ -17,5 +17,5 @@
 Author: joonsoo-me
 """
 
-__version__ = "1.2.2"
+__version__ = "1.2.3"
 __author__ = "joonsoo-me"
```

app/routers/openai_router.py

Lines changed: 31 additions & 3 deletions

```diff
@@ -17,6 +17,8 @@
 
 import time
 from typing import Any, Dict, List, Optional, Union
+import base64
+import numpy as np
 
 import structlog
 from fastapi import APIRouter, Depends, HTTPException, Request
@@ -149,7 +151,8 @@ class OpenAIEmbeddingData(BaseModel):
     """
 
     object: str = Field(default="embedding", description="Object type identifier")
-    embedding: List[float] = Field(..., description="The embedding vector")
+    # Allow either float list (default) or base64 string when encoding_format="base64"
+    embedding: Union[List[float], str] = Field(..., description="The embedding vector (float list or base64 string)")
     index: int = Field(..., description="Index of the input text")
 
 
@@ -296,8 +299,33 @@ async def create_embeddings(
     # 📊 Calculate comprehensive timing metrics
     total_time = time.time() - start_time
 
-    # 🔄 Transform MLX response to enhanced OpenAI format
-    embedding_data = [OpenAIEmbeddingData(embedding=vector, index=i) for i, vector in enumerate(mlx_result.vectors)]
+    # 🔄 Optionally adjust dimensions if requested
+    vectors: List[List[float]] = mlx_result.vectors
+    target_dims = request.dimensions
+    if target_dims is not None and target_dims > 0:
+        adjusted: List[List[float]] = []
+        for v in vectors:
+            if len(v) == target_dims:
+                adjusted.append(v)
+            elif len(v) > target_dims:
+                # Truncate to requested dimensions
+                adjusted.append(v[:target_dims])
+            else:
+                # Pad with zeros up to requested dimensions
+                padded = v + [0.0] * (target_dims - len(v))
+                adjusted.append(padded)
+        vectors = adjusted
+
+    # 🔄 Transform MLX response to enhanced OpenAI format (support base64 when requested)
+    embedding_data: List[OpenAIEmbeddingData] = []
+    if (request.encoding_format or "float").lower() == "base64":
+        for i, v in enumerate(vectors):
+            arr = np.asarray(v, dtype=np.float32)
+            b64 = base64.b64encode(arr.tobytes()).decode("ascii")
+            embedding_data.append(OpenAIEmbeddingData(embedding=b64, index=i))
+    else:
+        for i, v in enumerate(vectors):
+            embedding_data.append(OpenAIEmbeddingData(embedding=v, index=i))
 
     # 📈 Calculate token usage (approximate word-based counting)
     total_tokens = sum(len(text.split()) for text in texts)
```
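The truncate/pad branch in this diff can be exercised on its own. The following is a standalone re-implementation for illustration (the function name is ours, not part of the router):

```python
def adjust_dimensions(vector, target_dims):
    """Truncate or zero-pad a float vector to target_dims, mirroring the router's branches."""
    if target_dims is None or target_dims <= 0 or len(vector) == target_dims:
        # No adjustment requested or already the right size
        return vector
    if len(vector) > target_dims:
        # Truncate to requested dimensions
        return vector[:target_dims]
    # Pad with zeros up to requested dimensions
    return vector + [0.0] * (target_dims - len(vector))


assert adjust_dimensions([1.0, 2.0, 3.0], 2) == [1.0, 2.0]
assert adjust_dimensions([1.0, 2.0], 4) == [1.0, 2.0, 0.0, 0.0]
assert adjust_dimensions([1.0, 2.0], None) == [1.0, 2.0]
```

Note that zero-padding preserves dot products but not unit norms, so clients that renormalize after padding will see slightly different cosine similarities.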
