Skip to content

feat: 为评估项新增超时时间与默认值配置#35

Merged
AQing-527 merged 2 commits intomainfrom
feature/evaluation_with_timeout
Mar 4, 2026
Merged

feat: 为评估项新增超时时间与默认值配置#35
AQing-527 merged 2 commits intomainfrom
feature/evaluation_with_timeout

Conversation

@AQing-527
Copy link
Collaborator

为评估项新增超时时间与默认值配置
修复GraalCodeExecutor中的序列化问题

Copilot AI review requested due to automatic review settings March 4, 2026 05:17
Copy link
Collaborator

@canfuu canfuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

该 PR 为评估框架的单个评估项(criterion)增加超时与失败兜底能力,并补充 LLM 调用参数(ChatOptions)与 GraalCodeExecutor 的复杂对象序列化处理,同时下调部分 schema 查询日志级别以降低噪音。

Changes:

  • EvaluationSuite / EvaluationCriterion 增加默认超时、单项超时与默认值(timeout/defaultValue)配置,并在执行器中实现超时保护与兜底返回。
  • LLM 评估器(文本/多模态)新增可选 ChatOptions 透传,支持调用时自定义模型参数。
  • GraalCodeExecutor 将复杂对象先 JSON 序列化再转为 Python 字面量,改善直接 toString() 导致的序列化问题;并将部分 schema 查询日志降为 debug。

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/model/EvaluationSuite.java 增加 suite 级默认 criterion 超时配置与访问器
assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/model/EvaluationCriterion.java 增加 criterion 级 timeoutMs 与 defaultValue 字段
assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/executor/CriterionEvaluationAction.java 执行 criterion 时增加超时保护;timeout/error 时回填 defaultValue 并写入状态
assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/evaluator/LLMBasedEvaluator.java 支持可选 ChatOptions 构造与 Prompt 调用透传
assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/evaluator/MultimodalLLMBasedEvaluator.java 多模态评估支持可选 ChatOptions(文本与多模态各自 options)
assistant-agent-evaluation/src/main/java/com/alibaba/assistant/agent/evaluation/builder/EvaluationCriterionBuilder.java builder 增加 timeout/defaultValue 相关便捷方法
assistant-agent-core/src/main/java/com/alibaba/assistant/agent/core/tool/schema/DefaultReturnSchemaRegistry.java getSchema 查询日志由 info 降为 debug
assistant-agent-core/src/main/java/com/alibaba/assistant/agent/core/executor/GraalCodeExecutor.java 复杂对象使用 Jackson JSON 序列化后再转换为 Python 字面量,修复序列化问题
assistant-agent-autoconfigure/src/main/java/com/alibaba/assistant/agent/autoconfigure/subagent/node/CodeGeneratorNode.java 系统提示词新增“禁止生成注释”规则,并将 schema 查询日志降为 debug

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +132 to +138
result = CompletableFuture.supplyAsync(() -> {
if (finalBatchEnabled) {
return executeWithBatching(finalContext, finalDepResults, finalBatchConfig);
} else {
return executeWithoutBatching(finalContext, finalDepResults);
}
}, executorService).get(timeoutMs, TimeUnit.MILLISECONDS);
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout handling uses CompletableFuture.get(timeout…), but on TimeoutException the underlying task is left running. This can leak work and exhaust the executor (especially if many criteria time out). Keep a reference to the future and cancel it (future.cancel(true)) in the timeout path, and consider cleaning up any resources the evaluator may hold.

Suggested change
result = CompletableFuture.supplyAsync(() -> {
if (finalBatchEnabled) {
return executeWithBatching(finalContext, finalDepResults, finalBatchConfig);
} else {
return executeWithoutBatching(finalContext, finalDepResults);
}
}, executorService).get(timeoutMs, TimeUnit.MILLISECONDS);
CompletableFuture<CriterionResult> future = CompletableFuture.supplyAsync(() -> {
if (finalBatchEnabled) {
return executeWithBatching(finalContext, finalDepResults, finalBatchConfig);
} else {
return executeWithoutBatching(finalContext, finalDepResults);
}
}, executorService);
try {
result = future.get(timeoutMs, TimeUnit.MILLISECONDS);
} catch (TimeoutException te) {
future.cancel(true);
throw te;
}

Copilot uses AI. Check for mistakes.
Comment on lines 177 to 182
} catch (Exception e) {
logger.error("Error executing criterion {}: {}", criterion.getName(), e.getMessage(), e);

// Create error result
CriterionResult errorResult = new CriterionResult();
errorResult.setCriterionName(criterion.getName());
errorResult.setStatus(CriterionStatus.ERROR);
errorResult.setErrorMessage(e.getMessage());
errorResult.setStartTimeMillis(System.currentTimeMillis());
errorResult.setEndTimeMillis(System.currentTimeMillis());
// Create error result with default value if available
CriterionResult errorResult = buildErrorResult(startTime, e.getMessage());

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Future#get(timeout…) path can throw InterruptedException. The current generic catch(Exception) will swallow the interrupt and continue; this can break cancellation/shutdown behavior. Handle InterruptedException explicitly by restoring the interrupt flag (Thread.currentThread().interrupt()) and building an appropriate result (or rethrow).

Copilot uses AI. Check for mistakes.
super(textModel, evaluatorId, textChatOptions);
this.multimodalChatModel = multimodalModel;
this.multimodalChatOptions = multimodalChatOptions;
this.objectMapper = objectMapper != null ? objectMapper : new ObjectMapper();
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor falls back to new ObjectMapper() when the provided mapper is null, but the class relies on a mapper configured to ignore unknown properties (see createDefaultObjectMapper()). Using a plain ObjectMapper here can reintroduce the deserialization failures this class is designed to avoid; default to createDefaultObjectMapper() instead.

Suggested change
this.objectMapper = objectMapper != null ? objectMapper : new ObjectMapper();
this.objectMapper = objectMapper != null ? objectMapper : createDefaultObjectMapper();

Copilot uses AI. Check for mistakes.
logger.warn("GraalCodeExecutor#convertComplexObjectToPythonLiteral - reason=JSON序列化失败, " +
"valueType={}, error={}, 退化为toString()", value.getClass().getName(), e.getMessage());
String strValue = value.toString();
return "\"" + strValue.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the JSON-serialization fallback, the string is wrapped with normal quotes but newlines aren’t handled. If value.toString() contains '\n', the generated Python literal becomes invalid code. Reuse the same multiline-safe escaping logic as the String branch (triple quotes when needed) for this fallback path too.

Suggested change
return "\"" + strValue.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
// 使用通用的字符串转换逻辑,确保多行字符串也能安全转换为 Python 字面量
return toPythonLiteral(strValue);

Copilot uses AI. Check for mistakes.
@AQing-527 AQing-527 merged commit e1b7dec into main Mar 4, 2026
1 check passed
@AQing-527 AQing-527 deleted the feature/evaluation_with_timeout branch March 4, 2026 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants