RagFlow crashing when using a custom embedding server #12953

MoktarEls · 2026-02-02T12:56:55Z


Hi,

I’m trying to integrate RagFlow with a custom embedding agent. I set up a separate FastAPI server that serves embeddings using SentenceTransformer:

from contextlib import asynccontextmanager

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

# Model card: https://huggingface.co/dangvantuan/french-document-embedding

# Populated once at startup by the lifespan handler below.
model: SentenceTransformer | None = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """FastAPI application startup event."""
    print("Starting the FastAPI embedding application...")
    global model
    model = SentenceTransformer("dangvantuan/french-document-embedding", trust_remote_code=True)
    yield


app = FastAPI(lifespan=lifespan)


class EmbeddingRequest(BaseModel):
    inputs: list[str]


@app.post("/embed")
@app.post("/")
def embed(req: EmbeddingRequest):
    assert model is not None, "model did not load properly"

    # Compute normalized embeddings
    emb_doc = model.encode(
        req.inputs,
        normalize_embeddings=True,
        show_progress_bar=False,
    )

    print("Shape embeddings:", emb_doc.shape)
    for i, e in enumerate(emb_doc):
        print(f"Embedding {i} length:", len(e))
    return emb_doc.tolist()
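
For anyone reproducing this, a server like the one above can be exercised on its own, outside RagFlow, before wiring it in. The following is only a minimal sketch using the standard library. The helper names (build_embed_request, describe_vectors, smoke_test) are hypothetical, and the URL in the usage note is an assumption about where the server listens.

```python
import json
import urllib.request


def build_embed_request(url: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request whose JSON body matches EmbeddingRequest ({"inputs": [...]})."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_vectors(vectors: list[list[float]]) -> tuple[int, set[int]]:
    """Return the number of vectors and the set of distinct vector lengths."""
    return len(vectors), {len(v) for v in vectors}


def smoke_test(url: str, texts: list[str]) -> None:
    """Send one batch and print how many vectors came back and their lengths."""
    with urllib.request.urlopen(build_embed_request(url, texts), timeout=30) as resp:
        vectors = json.loads(resp.read())
    count, dims = describe_vectors(vectors)
    print(f"sent {len(texts)} inputs, got {count} vectors, dimensions: {sorted(dims)}")
```

Calling smoke_test("http://localhost:8000/embed", ["bonjour", "au revoir"]) (host and port assumed) should report two vectors of a single dimension if the server behaves as intended.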

I can connect this server to RagFlow using the embedding parameters, but when I try to process even a single document, RagFlow crashes without any error code — it just stops running at 80%.

I’m wondering if this could be caused by the dimension of the vectors I generate, which might not be compatible with the backend. If that’s the case, how can I adjust the embedding dimension? Or could it be another type of error?

I also occasionally see the following heartbeat log, but everything else seems to run fine:

2026-02-02 18:47:14,497 INFO 28 task_executor_b6bfa94dbc9a_0 reported heartbeat: {"ip_address": "172.18.0.6", "pid": 28, "name": "task_executor_b6bfa94dbc9a_0", "now": "2026-02-02T18:47:14.420+08:00", "boot_at": "2026-02-02T18:40:47.718+08:00", "pending": 1, "lag": 0, "done": 0, "failed": 0, "current": {"d69f3e0e002311f1896ae2c4b3474a5c": {"id": "d69f3e0e002311f1896ae2c4b3474a5c", "doc_id": "d6404d32002311f18110e2c4b3474a5c", "from_page": 0, "to_page": 2, "retry_count": 0, "kb_id": "c1d0d589002311f1ae02e2c4b3474a5c", "parser_id": "laws", "parser_config": {"table_context_size": 0, "image_context_size": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": false}, "llm_id": "llama-3.1-70b-versatile@Groq"}, "name": "Doc_test_RAG.pdf", "type": "pdf", "location": "Doc_test_RAG.pdf", "size": 57722, "tenant_id": "e185a1fd002111f1a7e06a9d1e10a776", "language": "English", "embd_id": "dangvantuan/french-document-embedding___HuggingFace@HuggingFace", "pagerank": 0, "kb_parser_config": {"table_context_size": 0, "image_context_size": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": false}, "llm_id": "llama-3.1-70b-versatile@Groq"}, "img2txt_id": "", "asr_id": "", "llm_id": "llama-3.1-70b-versatile@Groq", "update_time": 1770028935013, "task_type": ""}}}

Any guidance on what could be causing this crash or how to make RagFlow work with this custom embedding server would be greatly appreciated.

@MoktarEls · 2026-02-02T13:03:31Z

dosubot[bot]
bot Feb 2, 2026

Hi @MoktarEls! I'm Dosu and I’m helping the ragflow team.

Your crash at 80% is very likely caused by an embedding dimension mismatch or inconsistent output from your custom embedding server. RagFlow requires that all embeddings for a dataset have the same vector dimension, and this dimension must match what your backend (like OpenSearch) expects. If your embedding server returns vectors of a different size than the backend mapping, RagFlow will crash or hang, often without a clear error message. This is a known issue when using custom embedding servers or switching embedding models.

Here’s what you should check:

Embedding dimension: The model dangvantuan/french-document-embedding outputs 768-dimensional vectors. If your backend (e.g., OpenSearch) expects a different dimension (like 384, 512, 1536, etc.), you must update the backend mapping to match 768, or use a model that matches the backend’s expected dimension. You can verify the dimension by printing emb_doc.shape in your FastAPI server and by checking your backend index mapping.
Backend mapping: If you use OpenSearch, add a dynamic template for _768_vec in conf/os_mapping.json with dimension: 768, then delete and recreate the affected index so the new mapping takes effect.
Consistent output: Make sure your embedding server always returns vectors of the same dimension for every input. Variable-length outputs (due to input anomalies or model issues) will cause RagFlow to fail.
API contract: RagFlow expects the embedding API to accept a string or list of strings and return a list of floats (or list of lists for batch input). If your API differs, you may need to implement a custom handler in rag/llm/embedding_model.py and register it in rag/llm/__init__.py.
Timeouts and resource limits: If your embedding server is slow, RagFlow may hit a hardcoded timeout (often 5 seconds per batch). You can lower EMBEDDING_BATCH_SIZE and MAX_CONCURRENT_TASKS in your .env file, or increase the timeout in the code if needed.
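
The "Consistent output" and "API contract" points in the checklist above can be sketched as two small helpers on the embedding-server side. This is an illustration under assumptions, not RagFlow code: normalize_inputs and uniform_dimension are hypothetical names, and the string-or-list contract is taken from the description above rather than verified against RagFlow's source.

```python
from typing import Union


def normalize_inputs(inputs: Union[str, list[str]]) -> list[str]:
    """Wrap a single string in a list so both request shapes are handled alike."""
    return [inputs] if isinstance(inputs, str) else list(inputs)


def uniform_dimension(vectors: list[list[float]]) -> int:
    """Return the shared vector length; raise if lengths differ or the list is empty."""
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"expected exactly one dimension, found {sorted(dims)}")
    return dims.pop()
```

A server could call normalize_inputs on the request body before encoding and uniform_dimension on the result before returning it, so a malformed batch fails loudly on the server instead of inside the indexing pipeline.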

If you change the embedding model or its dimension, you must delete all existing chunks in the dataset and recreate the index before reprocessing documents.

If you want to check your backend mapping for a field like q_768_vec, use:

curl -u elastic:infini_rag_flow http://localhost:1200/YOUR_INDEX_NAME/_mapping?pretty

and look for the correct knn_vector type and dimension.
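
If the mapping output is long, the lookup can be scripted. The sketch below walks a mapping document and collects every knn_vector entry with its dimension. The sample response is illustrative only (a real mapping contains many more fields), and find_knn_dimensions is a hypothetical helper name.

```python
import json


def find_knn_dimensions(node, path=""):
    """Recursively collect {field_path: dimension} for every knn_vector entry."""
    found = {}
    if isinstance(node, dict):
        if node.get("type") == "knn_vector" and "dimension" in node:
            found[path] = node["dimension"]
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found.update(find_knn_dimensions(value, child))
    elif isinstance(node, list):
        for item in node:
            found.update(find_knn_dimensions(item, path))
    return found


# Illustrative response only, shaped like the output of a _mapping request.
sample = json.loads("""
{
  "YOUR_INDEX_NAME": {
    "mappings": {
      "properties": {
        "q_768_vec": {"type": "knn_vector", "dimension": 768}
      }
    }
  }
}
""")

print(find_knn_dimensions(sample))
```

To use it on real data, save the curl output to a file and pass the parsed JSON to find_knn_dimensions instead of the sample.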

If you need to support a different dimension, update your backend mapping, delete and recreate the index, and ensure your embedding server always returns vectors of that dimension. This should resolve the crash.

To reply, just mention @dosu.


3 replies

MoktarEls Feb 2, 2026
Author

@dosu, hmm, that confirms what I was thinking. I mainly use RagFlow through the UI. Is there a way to change the embedding vector size from there, or would it be better to use the Python SDK directly?

If using the Python SDK is preferable, is there a way to run only the RagFlow backend, since the frontend isn’t really necessary for my use?

dosubot[bot] bot Feb 2, 2026

You can't set the embedding vector size directly in the RagFlow UI. The UI only lets you pick an embedding model, and the vector size is determined by the model you select. Once documents are parsed (chunks exist), the embedding model selection is locked and can't be changed unless you delete all chunks first. If you need to switch to a model with a different vector size, you also have to manually drop and recreate the backend index (e.g., in OpenSearch or Infinity) to match the new dimension; RagFlow doesn't automate this step for you.

The Python SDK gives you more control for bulk operations (like deleting chunks, resetting statuses, or creating datasets with a specific embedding model), but it also doesn't let you set the vector size directly; again, it's determined by the embedding model you specify. Backend index changes (for dimension mismatches) must still be handled manually.

Running RagFlow "backend only" isn't an officially documented mode, but you can interact with the backend API or Python SDK and simply ignore the UI. The default deployment runs both frontend and backend together, but for headless/API-driven workflows, just use the SDK or direct API calls. There is no need to use the UI at all.

To reply, just mention @dosu.


MoktarEls Feb 2, 2026
Author

@dosu, I’m not sure I fully follow your explanation, so let me clarify the exact steps I followed (maybe I should have specified this earlier):

I created a brand-new dataset (completely empty, no existing chunks).

I added a single document.

I launched the analysis.

The custom embedding model is already registered in RagFlow, and the embedding server is up, running, and reachable — there are no connection issues on that side.

In os_mapping.json, there is already a mapping that supports 768-dimension vectors:

{
  "knn_vector": {
    "match": "*_768_vec",
    "mapping": {
      "type": "knn_vector",
      "index": true,
      "space_type": "cosinesimil",
      "dimension": 768
    }
  }
}

So the vector dimension is already supported, the dataset is brand new, and the embedding endpoint is correctly returning vectors.
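
As a quick sanity check of the template pattern, the wildcard in "match" can be approximated locally. This sketch assumes the vector field is named in the q_<dimension>_vec style mentioned earlier in the thread, and it uses Python's fnmatch as a stand-in for the backend's own wildcard matching, which is close enough for a pattern that only contains "*". vector_field_name is a hypothetical helper.

```python
from fnmatch import fnmatchcase

TEMPLATE_PATTERN = "*_768_vec"  # "match" value from the os_mapping.json excerpt


def vector_field_name(dimension: int) -> str:
    """Build a field name in the q_<dimension>_vec style (assumed naming)."""
    return f"q_{dimension}_vec"


for dim in (768, 1024):
    name = vector_field_name(dim)
    # fnmatchcase is case-sensitive and treats "*" as "any run of characters"
    print(name, fnmatchcase(name, TEMPLATE_PATTERN))
```

Under that naming assumption, a 768-dimensional field matches the template and a 1024-dimensional one does not, which is consistent with the mapping not being the problem here.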

Given all that, I’m not sure why the analysis gets stuck or crashes. At this point, it doesn’t seem to be caused by an existing index, a dimension mismatch, or an embedding connectivity issue.

MoktarEls · 2026-02-02T15:22:55Z


I managed to get a more specific error. It still occurs at 80% of the document:

 Page(1~3): [ERROR]Generate embedding error:
2026-02-02 16:10:21,204 ERROR    28 Generate embedding error:
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 1061, in do_handle_task
    token_count, vector_size = await embedding(chunks, embedding_model, task_parser_config, progress_callback)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/svr/task_executor.py", line 591, in embedding
    assert len(vects) == len(docs)
           ^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
2026-02-02 16:10:21,243 INFO     28 set_progress(f27fe9c4004811f1b89f5a9543e7167c), progress: -1, progress_msg: 16:10:21 [ERROR][Exception]:
2026-02-02 16:10:21,244 ERROR    28 handle_task got exception for task {"id": "f27fe9c4004811f1b89f5a9543e7167c", "doc_id": "ebdc6109004811f19e255a9543e7167c", "from_page": 0, "to_page": 2, "retry_count": 0, "kb_id": "e6aa60b7004811f197865a9543e7167c", "parser_id": "naive", "parser_config": {"table_context_size": 0, "image_context_size": 0, "layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "llm_id": "llama-3.1-70b-versatile@Groq", "enable_metadata": false, "metadata": {}}, "name": "Doc_test_RAG.pdf", "type": "pdf", "location": "Doc_test_RAG.pdf", "size": 57722, "tenant_id": "addb90a4004811f198a15a9543e7167c", "language": "English", "embd_id": "dangvantuan/french-document-embedding___HuggingFace@HuggingFace", "pagerank": 0, "kb_parser_config": {"table_context_size": 0, "image_context_size": 0, "layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. 
Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "llm_id": "llama-3.1-70b-versatile@Groq"}, "img2txt_id": "", "asr_id": "", "llm_id": "llama-3.1-70b-versatile@Groq", "update_time": 1770044873151, "task_type": ""}
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 1150, in handle_task
    await do_handle_task(task)
  File "/ragflow/common/connection_utils.py", line 74, in async_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/svr/task_executor.py", line 1061, in do_handle_task
    token_count, vector_size = await embedding(chunks, embedding_model, task_parser_config, progress_callback)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragflow/rag/svr/task_executor.py", line 591, in embedding
    assert len(vects) == len(docs)
           ^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

After that, RagFlow is completely unusable: there is no heartbeat, my dataset disappears, and I have to restart it.
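
The traceback shows that the failing check is a count check, len(vects) == len(docs), rather than a check on vector dimension. The sketch below illustrates that kind of invariant with a more descriptive error. It is not RagFlow's implementation: embed_in_batches, encode, and batch_size are hypothetical names, and only the asserted condition comes from the log above.

```python
from typing import Callable

Vector = list[float]


def embed_in_batches(
    docs: list[str],
    encode: Callable[[list[str]], list[Vector]],
    batch_size: int = 16,
) -> list[Vector]:
    """Encode docs batch by batch, requiring one vector per input text."""
    vects: list[Vector] = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        out = encode(batch)
        if len(out) != len(batch):
            # Same invariant as the assertion in the traceback, with context attached.
            raise ValueError(
                f"batch at offset {start}: sent {len(batch)} texts, got {len(out)} vectors"
            )
        vects.extend(out)
    return vects
```

With an encode function that returns one vector per text, this returns len(docs) vectors. With one that returns a different number of items for a batch, it raises immediately and reports how many texts were sent and how many vectors came back, which is the information missing from the bare AssertionError.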

0 replies
