Conversation
There was a problem hiding this comment.
Pull request overview
该 PR 为评估框架的单个评估项(criterion)增加超时与失败兜底能力,并补充 LLM 调用参数(ChatOptions)与 GraalCodeExecutor 的复杂对象序列化处理,同时下调部分 schema 查询日志级别以降低噪音。
Changes:
- 为
EvaluationSuite/EvaluationCriterion增加默认超时、单项超时与默认值(timeout/defaultValue)配置,并在执行器中实现超时保护与兜底返回。 - LLM 评估器(文本/多模态)新增可选
ChatOptions透传,支持调用时自定义模型参数。 GraalCodeExecutor将复杂对象先 JSON 序列化再转为 Python 字面量,改善直接toString()导致的序列化问题;并将部分 schema 查询日志降为 debug。
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/model/EvaluationSuite.java | 增加 suite 级默认 criterion 超时配置与访问器 |
| assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/model/EvaluationCriterion.java | 增加 criterion 级 timeoutMs 与 defaultValue 字段 |
| assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/executor/CriterionEvaluationAction.java | 执行 criterion 时增加超时保护;timeout/error 时回填 defaultValue 并写入状态 |
| assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/evaluator/LLMBasedEvaluator.java | 支持可选 ChatOptions 构造与 Prompt 调用透传 |
| assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/evaluator/MultimodalLLMBasedEvaluator.java | 多模态评估支持可选 ChatOptions(文本与多模态各自 options) |
| assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/builder/EvaluationCriterionBuilder.java | builder 增加 timeout/defaultValue 相关便捷方法 |
| assistant-agent-core/src/main/java/com/alibaba/assistant/agent/core/tool/schema/DefaultReturnSchemaRegistry.java | getSchema 查询日志由 info 降为 debug |
| assistant-agent-core/src/main/java/com/alibaba/assistant/agent/core/executor/GraalCodeExecutor.java | 复杂对象使用 Jackson JSON 序列化后再转换为 Python 字面量,修复序列化问题 |
| assistant-agent-autoconfigure/src/main/java/com/alibaba/assistant/agent/autoconfigure/subagent/node/CodeGeneratorNode.java | 系统提示词新增“禁止生成注释”规则,并将 schema 查询日志降为 debug |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| result = CompletableFuture.supplyAsync(() -> { | ||
| if (finalBatchEnabled) { | ||
| return executeWithBatching(finalContext, finalDepResults, finalBatchConfig); | ||
| } else { | ||
| return executeWithoutBatching(finalContext, finalDepResults); | ||
| } | ||
| }, executorService).get(timeoutMs, TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
Timeout handling uses CompletableFuture.get(timeout…), but on TimeoutException the underlying task is left running. This can leak work and exhaust the executor (especially if many criteria time out). Keep a reference to the future and cancel it (future.cancel(true)) in the timeout path, and consider cleaning up any resources the evaluator may hold.
| result = CompletableFuture.supplyAsync(() -> { | |
| if (finalBatchEnabled) { | |
| return executeWithBatching(finalContext, finalDepResults, finalBatchConfig); | |
| } else { | |
| return executeWithoutBatching(finalContext, finalDepResults); | |
| } | |
| }, executorService).get(timeoutMs, TimeUnit.MILLISECONDS); | |
| CompletableFuture<CriterionResult> future = CompletableFuture.supplyAsync(() -> { | |
| if (finalBatchEnabled) { | |
| return executeWithBatching(finalContext, finalDepResults, finalBatchConfig); | |
| } else { | |
| return executeWithoutBatching(finalContext, finalDepResults); | |
| } | |
| }, executorService); | |
| try { | |
| result = future.get(timeoutMs, TimeUnit.MILLISECONDS); | |
| } catch (TimeoutException te) { | |
| future.cancel(true); | |
| throw te; | |
| } |
| } catch (Exception e) { | ||
| logger.error("Error executing criterion {}: {}", criterion.getName(), e.getMessage(), e); | ||
|
|
||
| // Create error result | ||
| CriterionResult errorResult = new CriterionResult(); | ||
| errorResult.setCriterionName(criterion.getName()); | ||
| errorResult.setStatus(CriterionStatus.ERROR); | ||
| errorResult.setErrorMessage(e.getMessage()); | ||
| errorResult.setStartTimeMillis(System.currentTimeMillis()); | ||
| errorResult.setEndTimeMillis(System.currentTimeMillis()); | ||
| // Create error result with default value if available | ||
| CriterionResult errorResult = buildErrorResult(startTime, e.getMessage()); | ||
|
|
There was a problem hiding this comment.
The new Future#get(timeout…) path can throw InterruptedException. The current generic catch(Exception) will swallow the interrupt and continue; this can break cancellation/shutdown behavior. Handle InterruptedException explicitly by restoring the interrupt flag (Thread.currentThread().interrupt()) and building an appropriate result (or rethrow).
| super(textModel, evaluatorId, textChatOptions); | ||
| this.multimodalChatModel = multimodalModel; | ||
| this.multimodalChatOptions = multimodalChatOptions; | ||
| this.objectMapper = objectMapper != null ? objectMapper : new ObjectMapper(); |
There was a problem hiding this comment.
This constructor falls back to new ObjectMapper() when the provided mapper is null, but the class relies on a mapper configured to ignore unknown properties (see createDefaultObjectMapper()). Using a plain ObjectMapper here can reintroduce the deserialization failures this class is designed to avoid; default to createDefaultObjectMapper() instead.
| this.objectMapper = objectMapper != null ? objectMapper : new ObjectMapper(); | |
| this.objectMapper = objectMapper != null ? objectMapper : createDefaultObjectMapper(); |
| logger.warn("GraalCodeExecutor#convertComplexObjectToPythonLiteral - reason=JSON序列化失败, " + | ||
| "valueType={}, error={}, 退化为toString()", value.getClass().getName(), e.getMessage()); | ||
| String strValue = value.toString(); | ||
| return "\"" + strValue.replace("\\", "\\\\").replace("\"", "\\\"") + "\""; |
There was a problem hiding this comment.
In the JSON-serialization fallback, the string is wrapped with normal quotes but newlines aren’t handled. If value.toString() contains '\n', the generated Python literal becomes invalid code. Reuse the same multiline-safe escaping logic as the String branch (triple quotes when needed) for this fallback path too.
| return "\"" + strValue.replace("\\", "\\\\").replace("\"", "\\\"") + "\""; | |
| // 使用通用的字符串转换逻辑,确保多行字符串也能安全转换为 Python 字面量 | |
| return toPythonLiteral(strValue); |
为评估项新增超时时间与默认值配置
修复GraalCodeExecutor中的序列化问题