[Bug]: text-embedding-v4 call failure caused by sensitive words hangs the pipeline #2710

@hoky

Description

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

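I am trying to use LightRAG to build a RAG system over legal data and have run into several problems.
1. During vector processing, sensitive words cause the pipeline to hang.
2. Processing is very slow: after several days only 140,000 records have been processed. Part of the cause is Alibaba Cloud rate limiting, which I have raised through a support ticket.
3. The Merged stage is very slow. How can it be sped up?

The database I am currently using is PostgreSQL with the PGVector and AGE extensions.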
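
For problem 1, one possible direction is to treat a content-moderation rejection as a per-chunk skip instead of a failure of the whole document. The following is only a minimal sketch of that idea, not LightRAG's actual code: `process_chunks`, `extract`, `ContentRejectedError` and `fake_extract` are hypothetical names, and the check assumes the provider error text contains the `data_inspection_failed` code seen in the logs.

```python
import asyncio


class ContentRejectedError(Exception):
    """Hypothetical stand-in for a provider 400 'data_inspection_failed' error."""


def is_content_rejection(exc: Exception) -> bool:
    # Assumption: the error text carries the code shown in the logs.
    return "data_inspection_failed" in str(exc)


async def process_chunks(chunks, extract, max_concurrency=4):
    """Run `extract` on each chunk; record rejected chunks instead of aborting."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results, skipped = {}, []

    async def run_one(chunk_id, text):
        async with semaphore:
            try:
                results[chunk_id] = await extract(text)
            except Exception as exc:
                if not is_content_rejection(exc):
                    raise  # unrelated failures still abort the batch
                skipped.append(chunk_id)

    await asyncio.gather(*(run_one(cid, text) for cid, text in chunks.items()))
    return results, skipped


async def fake_extract(text):
    # Hypothetical extractor that imitates the provider rejecting some input.
    if "blocked" in text:
        raise ContentRejectedError("Error code: 400 - 'code': 'data_inspection_failed'")
    return text.upper()


if __name__ == "__main__":
    demo = {"chunk-a": "ok text", "chunk-b": "blocked text"}
    print(asyncio.run(process_chunks(demo, fake_extract)))
```

The skipped chunk IDs are returned alongside the results so that they can be reviewed or reprocessed with a different model later, rather than being silently dropped.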

Steps to reproduce

No response

Expected Behavior

No response

LightRAG Config Used

Paste your config here

Logs and screenshots

INFO: 172.18.0.1:47886 - "GET /docs HTTP/1.1" 200
INFO: 172.18.0.1:43422 - "GET /openapi.json HTTP/1.1" 200
ERROR: OpenAI API Call Failed,
Model: deepseek-v3.2,
Params: {'max_completion_tokens': 9000, 'timeout': 180}, Got: Error code: 400 - {'error': {'message': 'Input data may contain inappropriate content. For details, see: https://help.aliyun.com/zh/model-studio/error-code#inappropriate-content', 'type': 'data_inspection_failed', 'param': None, 'code': 'data_inspection_failed'}, 'id': 'chatcmpl-154ff82f-c06f-9063-98d0-0e2a59e301b4', 'request_id': '154ff82f-c06f-9063-98d0-0e2a59e301b4'}
ERROR: LLM func: Error in decorated function for task 140528032922688_919984.602066943: Error code: 400 - {'error': {'message': 'Input data may contain inappropriate content. For details, see: https://help.aliyun.com/zh/model-studio/error-code#inappropriate-content', 'type': 'data_inspection_failed', 'param': None, 'code': 'data_inspection_failed'}, 'id': 'chatcmpl-154ff82f-c06f-9063-98d0-0e2a59e301b4', 'request_id': '154ff82f-c06f-9063-98d0-0e2a59e301b4'}
ERROR: Failed to extract entities and relationships: C[4/6]: chunk-4a1579de11a5b4b24f3d3f69a2350ede: BadRequestError: Error code: 400 - {'error': {'message': 'Input data may contain inappropriate content. For details, see: https://help.aliyun.com/zh/model-studio/error-code#inappropriate-content', 'type': 'data_inspection_failed', 'param': None, 'code': 'data_inspection_failed'}, 'id': 'chatcmpl-154ff82f-c06f-9063-98d0-0e2a59e301b4', 'request_id': '154ff82f-c06f-9063-98d0-0e2a59e301b4'} (Original exception could not be reconstructed: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body')
ERROR: Traceback (most recent call last):
File "/app/lightrag/operate.py", line 2962, in _process_with_semaphore
return await _process_single_content(chunk)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/operate.py", line 2847, in _process_single_content
final_result, timestamp = await use_llm_func_with_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/utils.py", line 2029, in use_llm_func_with_cache
res: str = await use_llm_func(
^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/utils.py", line 1016, in wait_func
return await future
^^^^^^^^^^^^
File "/app/lightrag/utils.py", line 720, in worker
result = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
return await fut
^^^^^^^^^
File "/app/lightrag/api/lightrag_server.py", line 512, in optimized_openai_alike_model_complete
return await openai_complete_if_cache(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
return await copy(fn, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
do = await self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
result = await action(retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
return call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 400, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/app/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
result = await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/llm/openai.py", line 334, in openai_complete_if_cache
response = await openai_async_client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 2603, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1794, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1594, in request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': 'Input data may contain inappropriate content. For details, see: https://help.aliyun.com/zh/model-studio/error-code#inappropriate-content', 'type': 'data_inspection_failed', 'param': None, 'code': 'data_inspection_failed'}, 'id': 'chatcmpl-154ff82f-c06f-9063-98d0-0e2a59e301b4', 'request_id': '154ff82f-c06f-9063-98d0-0e2a59e301b4'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/app/lightrag/operate.py", line 2966, in _process_with_semaphore
raise prefixed_exception from e
RuntimeError: chunk-4a1579de11a5b4b24f3d3f69a2350ede: BadRequestError: Error code: 400 - {'error': {'message': 'Input data may contain inappropriate content. For details, see: https://help.aliyun.com/zh/model-studio/error-code#inappropriate-content', 'type': 'data_inspection_failed', 'param': None, 'code': 'data_inspection_failed'}, 'id': 'chatcmpl-154ff82f-c06f-9063-98d0-0e2a59e301b4', 'request_id': '154ff82f-c06f-9063-98d0-0e2a59e301b4'} (Original exception could not be reconstructed: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/app/lightrag/lightrag.py", line 1903, in process_document
chunk_results = await entity_relation_task
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/lightrag.py", line 2187, in _process_extract_entities
raise e
File "/app/lightrag/lightrag.py", line 2172, in _process_extract_entities
chunk_results = await extract_entities(
^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/operate.py", line 3008, in extract_entities
raise prefixed_exception from first_exception
RuntimeError: C[4/6]: chunk-4a1579de11a5b4b24f3d3f69a2350ede: BadRequestError: Error code: 400 - {'error': {'message': 'Input data may contain inappropriate content. For details, see: https://help.aliyun.com/zh/model-studio/error-code#inappropriate-content', 'type': 'data_inspection_failed', 'param': None, 'code': 'data_inspection_failed'}, 'id': 'chatcmpl-154ff82f-c06f-9063-98d0-0e2a59e301b4', 'request_id': '154ff82f-c06f-9063-98d0-0e2a59e301b4'} (Original exception could not be reconstructed: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body')

ERROR: Failed to extract document 37/583398: (2025)京0114民初20623号
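
With more than 580,000 documents in the queue, it may help to collect the affected chunk and document identifiers from the server log so they can be reviewed separately. The sketch below assumes the log line shapes shown above stay stable; `summarize_failures` and the regular expressions are illustrative helpers, not part of LightRAG.

```python
import re

# Patterns modelled on the ERROR lines in this report (an assumption, not a stable format).
CHUNK_RE = re.compile(
    r"Failed to extract entities and relationships: C\[\d+/\d+\]: (chunk-[0-9a-f]+): (\w+)"
)
DOC_RE = re.compile(r"Failed to extract document \d+/\d+: (.+)")
CODE_RE = re.compile(r"'code': '([^']+)'")


def summarize_failures(log_lines):
    """Return (failed_chunks, failed_documents) parsed from log lines."""
    chunks, documents = [], []
    for line in log_lines:
        chunk_match = CHUNK_RE.search(line)
        if chunk_match:
            code_match = CODE_RE.search(line)
            chunks.append(
                {
                    "chunk_id": chunk_match.group(1),
                    "exception": chunk_match.group(2),
                    "code": code_match.group(1) if code_match else None,
                }
            )
            continue
        doc_match = DOC_RE.search(line)
        if doc_match:
            documents.append(doc_match.group(1).strip())
    return chunks, documents


if __name__ == "__main__":
    sample = [
        "ERROR: Failed to extract entities and relationships: C[4/6]: "
        "chunk-4a1579de11a5b4b24f3d3f69a2350ede: BadRequestError: Error code: 400 - "
        "{'error': {'type': 'data_inspection_failed', 'param': None, "
        "'code': 'data_inspection_failed'}}",
        "ERROR: Failed to extract document 37/583398: (2025)京0114民初20623号",
    ]
    print(summarize_failures(sample))
```

Grouping the results by `code` would separate moderation rejections from other errors such as rate limiting, which need different handling.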

Additional Information

  • LightRAG Version: v1.4.9.11
  • Operating System: Ubuntu 22 Docker Compose
  • Python Version:
  • Related Issues:
