xinference + langchain llm.with_structured_output: under concurrent requests, a missing-comma JSON error is occasionally raised (roughly 3 out of 100 calls) #4512

@berserker3912

Description


System Info

langchain 1.0, xinference v1.17.0, CUDA 12.9

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xinference v1.17.0

The command used to start Xinference

Started with: xinference-local --host 0.0.0.0 --port 9997

Reproduction

from typing import List, Optional
from pydantic import BaseModel, Field

# StateType, llm, system_prompt and user_input are defined elsewhere in the reporter's code
class DicInfo(BaseModel):
    """xxxx"""
    category: StateType = Field(description="xxxx")
    confidence: Optional[float] = Field(
        default=0.0,
        ge=0.0,
        le=1.0,
        description="xxxx"
    )
    detail_categories: Optional[List[str]] = Field(
        default_factory=list,
        description="xxxx"
    )

structured_llm = llm.with_structured_output(DicInfo)
messages = [SystemMessage(content=system_prompt), HumanMessage(content=f"{user_input}")]
final_answer = structured_llm.invoke(messages)
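As a stopgap while the root cause is investigated, this kind of intermittent parse failure can be retried at the call site. A minimal sketch, not part of the original report: `retry_invoke` is a hypothetical helper, and since xinference re-raises the server-side `json.JSONDecodeError` as a plain `Exception` (see the traceback below), matching on the message text is an assumption. The stub `flaky` stands in for `structured_llm.invoke`.

```python
import re


def retry_invoke(invoke, arg, max_attempts=3):
    """Retry a call when the error message looks like malformed JSON
    from the model (e.g. "Expecting ',' delimiter"). Hypothetical helper."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return invoke(arg)
        except Exception as err:
            # The server-side JSONDecodeError arrives as a plain Exception,
            # so the message text is the only thing we can match on.
            if re.search(r"Expecting .* delimiter", str(err)):
                last_err = err
                continue
            raise
    raise last_err


# Stub that fails once with the reported error, then succeeds:
calls = {"n": 0}


def flaky(_):
    calls["n"] += 1
    if calls["n"] == 1:
        raise Exception("Expecting ',' delimiter: line 3 column 1 (char 120)")
    return {"category": "ok"}


result = retry_invoke(flaky, None)
print(result)  # {'category': 'ok'}
```

Retrying only masks the underlying concurrency issue, but it keeps the call success rate near 100% when the failure probability per call is a few percent.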

Traceback (most recent call last):
  File "/xxxx/xxxx.py", line 130, in classification_execution
    final_answer = structured_llm.invoke(messages)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/site-packages/langchain_core/runnables/base.py", line 3149, in invoke
    input_ = context.run(step.invoke, input_, config, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 402, in invoke
    self.generate_prompt(
  File "/xxxx/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 1121, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/site-packages/langchain_core/language_models/chat_models.py", line 931, in generate
    self._generate_with_cache(
  File "/xxxx/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 1225, in _generate_with_cache
    result = self._generate(
             ^^^^^^^^^^^^^^^
  File "/xxxx/lib/python3.12/site-packages/langchain_xinference/chat_models.py", line 226, in _generate
    final_chunk = self._chat_with_aggregation(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxx/lib/python3.12/site-packages/langchain_xinference/chat_models.py", line 310, in _chat_with_aggregation
    for stream_resp in response:
        ^^^^^^^^
  File "/xxxx/lib/python3.12/site-packages/xinference/client/common.py", line 62, in streaming_response_iterator
    raise Exception(str(error))
Exception: [address=127.0.0.1:39175, pid=167075] Expecting ',' delimiter: line 3 column 1 (char 120)
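The message text comes from Python's json module: the server fails to parse model output that is missing a comma between two fields. A minimal reproduction of just the parse error, using an illustrative payload (not the actual model output):

```python
import json

# Two fields with the separating comma missing, the kind of output
# the model occasionally emits under concurrency (illustrative only).
bad = '{\n"category": "billing"\n"confidence": 0.9\n}'

msg = ""
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    msg = str(err)

print(msg)  # Expecting ',' delimiter: line 3 column 1 ...
```

This suggests the bug is in how the generated text is produced or assembled server-side under load, not in the client-side parsing itself.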

With xinference + langchain llm.with_structured_output under concurrent requests, there is a small chance the response is missing a comma and fails to parse with the error above.

Expected behavior

Structured output should parse successfully and return the complete data on every call.
