Hi @MoktarEls! I'm Dosu and I'm helping the ragflow team. Your crash at 80% is very likely caused by an embedding dimension mismatch or inconsistent output from your custom embedding server. RagFlow requires that all embeddings for a dataset have the same vector dimension, and this dimension must match what your backend (such as OpenSearch) expects. If your embedding server returns vectors of a different size than the backend mapping, RagFlow will crash or hang, often without a clear error message. This is a known issue when using custom embedding servers or switching embedding models. Here's what you should check:
If you change the embedding model or its dimension, you must delete all existing chunks in the dataset and recreate the index before reprocessing documents.
To check the vector field in your backend mapping, run
curl -u elastic:infini_rag_flow http://localhost:1200/YOUR_INDEX_NAME/_mapping?pretty
and look for the dimension it declares.
If you need to support a different dimension, update your backend mapping, delete and recreate the index, and ensure your embedding server always returns vectors of that dimension. This should resolve the crash.
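As a rough illustration of the mapping check, the sketch below pulls the declared vector size out of a `_mapping` response that has already been parsed into a Python dict. It assumes the usual layout where Elasticsearch declares vectors as `dense_vector` with `dims` and OpenSearch as `knn_vector` with `dimension`; the field name `example_vec` and the sample response are hypothetical, and only top-level properties are inspected.

```python
from typing import Any, Dict

# Mapping types that carry a vector size, and the key that holds it.
# Elasticsearch: "dense_vector" / "dims"; OpenSearch: "knn_vector" / "dimension".
VECTOR_TYPES = {"dense_vector": "dims", "knn_vector": "dimension"}


def vector_field_dims(mapping: Dict[str, Any]) -> Dict[str, int]:
    """Return {field_name: dimension} for each top-level vector field."""
    found: Dict[str, int] = {}
    for index_body in mapping.values():
        properties = index_body.get("mappings", {}).get("properties", {})
        for name, spec in properties.items():
            size_key = VECTOR_TYPES.get(spec.get("type"))
            if size_key and size_key in spec:
                found[name] = int(spec[size_key])
    return found


# Hypothetical response, trimmed to the relevant part.
sample_mapping = {
    "YOUR_INDEX_NAME": {
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "example_vec": {"type": "dense_vector", "dims": 1024},
            }
        }
    }
}

dims = vector_field_dims(sample_mapping)
print(dims)
```

The reported size can then be compared with the length of a vector returned by the embedding server for a single test sentence.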
I managed to get a more specific error. It still occurs at 80% of the document:
Page(1~3): [ERROR]Generate embedding error:
2026-02-02 16:10:21,204 ERROR 28 Generate embedding error:
Traceback (most recent call last):
File "/ragflow/rag/svr/task_executor.py", line 1061, in do_handle_task
token_count, vector_size = await embedding(chunks, embedding_model, task_parser_config, progress_callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ragflow/rag/svr/task_executor.py", line 591, in embedding
assert len(vects) == len(docs)
^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
2026-02-02 16:10:21,243 INFO 28 set_progress(f27fe9c4004811f1b89f5a9543e7167c), progress: -1, progress_msg: 16:10:21 [ERROR][Exception]:
2026-02-02 16:10:21,244 ERROR 28 handle_task got exception for task {"id": "f27fe9c4004811f1b89f5a9543e7167c", "doc_id": "ebdc6109004811f19e255a9543e7167c", "from_page": 0, "to_page": 2, "retry_count": 0, "kb_id": "e6aa60b7004811f197865a9543e7167c", "parser_id": "naive", "parser_config": {"table_context_size": 0, "image_context_size": 0, "layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "llm_id": "llama-3.1-70b-versatile@Groq", "enable_metadata": false, "metadata": {}}, "name": "Doc_test_RAG.pdf", "type": "pdf", "location": "Doc_test_RAG.pdf", "size": 57722, "tenant_id": "addb90a4004811f198a15a9543e7167c", "language": "English", "embd_id": "dangvantuan/french-document-embedding___HuggingFace@HuggingFace", "pagerank": 0, "kb_parser_config": {"table_context_size": 0, "image_context_size": 0, "layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\n", "auto_keywords": 0, "auto_questions": 0, "html4excel": false, "topn_tags": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. 
Paragraphs as following:\n {cluster_content}\nThe above is the content you need to summarize.", "max_token": 256, "threshold": 0.1, "max_cluster": 64, "random_seed": 0}, "graphrag": {"use_graphrag": true, "entity_types": ["organization", "person", "geo", "event", "category"], "method": "light"}, "llm_id": "llama-3.1-70b-versatile@Groq"}, "img2txt_id": "", "asr_id": "", "llm_id": "llama-3.1-70b-versatile@Groq", "update_time": 1770044873151, "task_type": ""}
Traceback (most recent call last):
File "/ragflow/rag/svr/task_executor.py", line 1150, in handle_task
await do_handle_task(task)
File "/ragflow/common/connection_utils.py", line 74, in async_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ragflow/rag/svr/task_executor.py", line 1061, in do_handle_task
token_count, vector_size = await embedding(chunks, embedding_model, task_parser_config, progress_callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ragflow/rag/svr/task_executor.py", line 591, in embedding
assert len(vects) == len(docs)
^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
After that, RagFlow is completely unusable: no heartbeat, my dataset disappears, and I have to restart it.
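The failing line in the traceback, `assert len(vects) == len(docs)`, says that the number of vectors that came back differs from the number of chunks that were sent. A minimal sketch of the same check, which could be run against a custom embedding server's output before wiring it into RagFlow, might look like this. The function name `check_embeddings` and the sample data are hypothetical; this is not RagFlow's own code.

```python
from typing import Sequence


def check_embeddings(docs: Sequence[str], vects: Sequence[Sequence[float]]) -> int:
    """Validate one embedding batch and return the shared vector dimension."""
    if not docs:
        raise ValueError("no documents to check")
    # Same condition as the assertion in the traceback: one vector per input.
    if len(vects) != len(docs):
        raise ValueError(f"expected {len(docs)} vectors, got {len(vects)}")
    # Every vector in the batch must have the same length.
    dims = {len(v) for v in vects}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return dims.pop()


docs = ["first chunk", "second chunk"]
good = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(check_embeddings(docs, good))

try:
    # One vector for two inputs reproduces the count mismatch.
    check_embeddings(docs, good[:1])
except ValueError as err:
    print(err)
```

A server that returns one pooled vector for a whole batch, or a flat list of floats instead of a list of vectors, would fail the count check in this way.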
Hi,
I’m trying to integrate RagFlow with a custom embedding agent. I set up a separate FastAPI server to serve embeddings using SentenceTransformer.
I can connect this server to RagFlow using the embedding parameters, but when I try to process even a single document, RagFlow crashes without any error code: it just stops running at 80%.
I’m wondering if this could be caused by the dimension of the vectors I generate, which might not be compatible with the backend. If that’s the case, how can I adjust the embedding dimension? Or could it be another type of error?
I also occasionally see the following heartbeat log, but everything else seems to run fine:
2026-02-02 18:47:14,497 INFO 28 task_executor_b6bfa94dbc9a_0 reported heartbeat: {"ip_address": "172.18.0.6", "pid": 28, "name": "task_executor_b6bfa94dbc9a_0", "now": "2026-02-02T18:47:14.420+08:00", "boot_at": "2026-02-02T18:40:47.718+08:00", "pending": 1, "lag": 0, "done": 0, "failed": 0, "current": {"d69f3e0e002311f1896ae2c4b3474a5c": {"id": "d69f3e0e002311f1896ae2c4b3474a5c", "doc_id": "d6404d32002311f18110e2c4b3474a5c", "from_page": 0, "to_page": 2, "retry_count": 0, "kb_id": "c1d0d589002311f1ae02e2c4b3474a5c", "parser_id": "laws", "parser_config": {"table_context_size": 0, "image_context_size": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": false}, "llm_id": "llama-3.1-70b-versatile@Groq"}, "name": "Doc_test_RAG.pdf", "type": "pdf", "location": "Doc_test_RAG.pdf", "size": 57722, "tenant_id": "e185a1fd002111f1a7e06a9d1e10a776", "language": "English", "embd_id": "dangvantuan/french-document-embedding___HuggingFace@HuggingFace", "pagerank": 0, "kb_parser_config": {"table_context_size": 0, "image_context_size": 0, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": false}, "llm_id": "llama-3.1-70b-versatile@Groq"}, "img2txt_id": "", "asr_id": "", "llm_id": "llama-3.1-70b-versatile@Groq", "update_time": 1770028935013, "task_type": ""}}}
Any guidance on what could be causing this crash or how to make RagFlow work with this custom embedding server would be greatly appreciated.