
Commit a87324a

Make HuggingFace embeddings optional
Move local HuggingFace embedding dependencies behind an optional extra and lazy-load them only when needed.
1 parent 37d6fce commit a87324a

7 files changed: 97 additions & 63 deletions

README.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -51,6 +51,17 @@ uv add "vector-graph-rag[loaders]"
 
 </details>
 
+<details>
+<summary><b>With local HuggingFace embedding models</b></summary>
+
+```bash
+pip install "vector-graph-rag[hf]"
+# or
+uv add "vector-graph-rag[hf]"
+```
+
+</details>
+
 ## 🚀 Quick Start
 
 ```python
````

docs/faq.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -70,7 +70,7 @@ Frequently asked questions about Vector Graph RAG — covering when to use it, c
     ```
 
 ??? note "Can I use my own embeddings?"
-    Vector Graph RAG uses OpenAI embedding models by default (`text-embedding-3-large`), but you can configure the embedding model via the `embedding_model` parameter. Any model accessible through the OpenAI-compatible API will work. If you are using a local or custom embedding endpoint, set the appropriate base URL and model name. The embedding dimensionality is detected automatically. Note that all entities, relations, and passages in a single graph must use the same embedding model — mixing models within one collection prefix is not supported.
+    Vector Graph RAG uses OpenAI embedding models by default (`text-embedding-3-large`), but you can configure the embedding model via the `embedding_model` parameter. Any model accessible through the OpenAI-compatible API will work. If you are using a local HuggingFace embedding model, install the optional dependencies with `pip install "vector-graph-rag[hf]"`. The embedding dimensionality is detected automatically. Note that all entities, relations, and passages in a single graph must use the same embedding model — mixing models within one collection prefix is not supported.
 
 ??? note "How do I deploy to production?"
     For production deployments, use a remote Milvus instance instead of Milvus Lite for better performance, scalability, and persistence. Run the FastAPI backend behind a reverse proxy (e.g., Nginx) with appropriate rate limiting and authentication. The frontend can be built as static files and served from any CDN or static file server.
````

docs/getting-started.md

Lines changed: 6 additions & 0 deletions
````diff
@@ -20,6 +20,12 @@
     pip install "vector-graph-rag[loaders]"
     ```
 
+=== "With local embeddings"
+
+    ```bash
+    pip install "vector-graph-rag[hf]"
+    ```
+
 !!! note "Prerequisites"
     - Python 3.9+
     - An OpenAI API key (set `OPENAI_API_KEY` environment variable)
````

evaluation/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -74,7 +74,7 @@ These triplets were extracted using GPT-3.5-Turbo (1106) with HippoRAG's OpenIE
 
 1. Install dependencies:
    ```bash
-   uv sync
+   uv sync --extra hf
    ```
 
 2. Set environment variables:
````

pyproject.toml

Lines changed: 5 additions & 3 deletions
````diff
@@ -32,8 +32,6 @@ dependencies = [
     "tqdm>=4.65.0",
     "tiktoken>=0.5.0",
     "tenacity>=8.2.0",
-    "transformers>=4.30.0",
-    "torch>=2.0.0",
     "langchain-core>=0.3.0",
     "python-multipart>=0.0.22",
     "setuptools<71", # Required for pkg_resources used by milvus-lite
@@ -49,8 +47,12 @@ api = [
     "fastapi>=0.109.0",
     "uvicorn[standard]>=0.27.0",
 ]
+hf = [
+    "transformers>=4.30.0",
+    "torch>=2.0.0",
+]
 all = [
-    "vector-graph-rag[dev,api]",
+    "vector-graph-rag[dev,api,hf]",
 ]
 loaders = [
     "markitdown[docx,pdf]>=0.1.4",
````
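Since the `hf` extra only matters at import time, callers may want a cheap way to probe for it before constructing a HuggingFace embedder. A minimal sketch — the `deps_available` helper is hypothetical, not part of the package:

```python
import importlib.util

def deps_available(*names: str) -> bool:
    """Report whether a set of optional dependencies is importable.

    find_spec only locates each module on disk, so heavy packages
    such as torch are not actually loaded by this check.
    """
    return all(importlib.util.find_spec(n) is not None for n in names)

# The two packages moved into the new [hf] extra by this commit:
HF_EXTRA_DEPS = ("transformers", "torch")
```

With this, application code can fall back to the OpenAI embedding path when `deps_available(*HF_EXTRA_DEPS)` is false, instead of failing at construction time.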

src/vector_graph_rag/storage/embeddings.py

Lines changed: 12 additions & 4 deletions
````diff
@@ -8,7 +8,6 @@
 from typing import List, Literal, Optional, Union
 
 import numpy as np
-import torch
 from tqdm import tqdm
 
 from vector_graph_rag.config import Settings, get_settings
@@ -50,7 +49,7 @@ def _get_model_family(model_name: str) -> Optional[str]:
     return None
 
 
-def _mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+def _mean_pooling(token_embeddings, attention_mask):
     """Mean pooling with attention mask."""
     token_embeddings = token_embeddings.masked_fill(~attention_mask[..., None].bool(), 0.0)
     sentence_embeddings = token_embeddings.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
````
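The `_mean_pooling` helper above averages token vectors while ignoring padded positions. An equivalent NumPy sketch, for illustration only (the library's version operates on torch tensors):

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over positions where attention_mask == 1.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(bool)
    summed = np.where(mask, token_embeddings, 0.0).sum(axis=1)  # zero out padding, sum over seq
    return summed / attention_mask.sum(axis=1)[..., None]       # divide by real token count
```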
````diff
@@ -70,10 +69,18 @@ def __init__(
         instruction: Optional[str] = None,
         instruction_template: Optional[str] = None,
     ):
-        from transformers import AutoModel, AutoTokenizer
+        try:
+            import torch
+            from transformers import AutoModel, AutoTokenizer
+        except ImportError as exc:
+            raise ImportError(
+                "HuggingFace embedding models require the optional 'hf' dependencies. "
+                "Install with: uv sync --extra hf, or pip install 'vector-graph-rag[hf]'."
+            ) from exc
 
         self.model_name = model_name
         self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
+        self._torch = torch
         self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(self.device)
         self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
         self.model.eval()
@@ -121,6 +128,7 @@ def encode(
 
         # Apply instruction if configured
         processed_texts = self._apply_instruction(texts, text_type)
+        torch = self._torch
 
         with torch.no_grad():
             inputs = self.tokenizer(
@@ -132,7 +140,7 @@ def encode(
         if normalize:
             embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
 
-        return embeddings.cpu().numpy()
+        return embeddings.float().cpu().numpy()
 
 
 class OpenAIEmbedding:
````
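The pattern in the diff above — deferring heavy imports into `__init__` and converting `ImportError` into an actionable message — can be written generically. A sketch assuming a hypothetical `load_optional` helper (not part of the package):

```python
import importlib

def load_optional(module_name: str, extra: str, package: str = "vector-graph-rag"):
    """Import an optional dependency lazily, mirroring the guard in __init__ above.

    Raises ImportError with an install hint when the module is missing,
    chaining the original error so the real cause stays visible.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' requires the optional '{extra}' dependencies. "
            f"Install with: pip install '{package}[{extra}]'."
        ) from exc
```

Note the companion trick in the diff: stashing the imported module on the instance (`self._torch = torch`) lets later methods such as `encode` use it without a module-level import.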
