refactor(CodeGeneratorNode): Code Extraction Improvements #38
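Extracts the previously inline code-extraction logic into a standalone method, `extractCodeFromContent()`, and adds detailed comments describing each extraction strategy. Also enhances error logging in `GraalCodeExecutor` so that the location of an execution error can be pinpointed more precisely. Adds unit tests to verify the new code-extraction behavior.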
This commit improves the readability and maintainability of the test file by updating comments and assertions to English while preserving their technical meaning. The changes include renaming nested classes, updating test method names and descriptions, modifying assertion messages, and standardizing variable values across different test scenarios. These updates make the tests more accessible to international developers without altering any functional logic. The primary goal is to enhance clarity and consistency in documentation and error messaging within the testing framework. By aligning these elements with common practices in software development, we aim to reduce potential misunderstandings and facilitate easier collaboration among team members working on the project.
Pull request overview
This PR improves the agent’s Python code generation/execution robustness by (1) extracting executable Python code more reliably from LLM responses and (2) enhancing GraalPython execution failure diagnostics with surrounding code context logging, backed by new unit tests.
Changes:
- Refactors `CodeGeneratorNode` to centralize robust code extraction via `extractCodeFromContent`.
- Adds error-context logging in `GraalCodeExecutor` to print code lines around the reported error line.
- Introduces a comprehensive JUnit 5 test suite for extraction scenarios and adds the test dependency.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `assistant-agent-core/src/main/java/com/alibaba/assistant/agent/core/executor/GraalCodeExecutor.java` | Adds error-line extraction + surrounding-code logging on execution failures. |
| `assistant-agent-autoconfigure/src/main/java/com/alibaba/assistant/agent/autoconfigure/subagent/node/CodeGeneratorNode.java` | Adds `extractCodeFromContent` to robustly strip markdown/natural language and select code blocks. |
| `assistant-agent-autoconfigure/src/test/java/com/alibaba/assistant/agent/autoconfigure/subagent/node/CodeGeneratorNodeExtractCodeTest.java` | Adds extraction coverage across common/edge LLM output formats. |
| `assistant-agent-autoconfigure/pom.xml` | Adds JUnit Jupiter test dependency for the new test suite. |
Comments suppressed due to low confidence (1)
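assistant-agent-core/src/main/java/com/alibaba/assistant/agent/core/executor/GraalCodeExecutor.java:862
- When a line number cannot be extracted from the exception, this logs the entire code to be executed (with line numbers) at error level. If the generated code is long, this floods the logs (and may write sensitive context into production logs). Consider printing only the first/last N lines and capping the total character count (or downgrading to debug), and state explicitly in the log that the output was truncated.

            } else {
                return value.toString();
            }
        }
        private String getStackTrace(Exception e) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();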
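
The truncation suggested in the suppressed comment above could look roughly like the following. This is a minimal sketch, not code from the PR: the class `CodeLogTruncator`, the method `truncateForLog`, and the parameters `edgeLines` and `maxChars` are all hypothetical names chosen for illustration.

```java
/** Hypothetical helper for illustration; not part of GraalCodeExecutor. */
final class CodeLogTruncator {

    private CodeLogTruncator() {
    }

    /**
     * Builds a line-numbered excerpt containing at most the first and last
     * {@code edgeLines} lines of {@code code}. If the excerpt is longer than
     * {@code maxChars}, it is cut at that length and a marker is appended.
     */
    static String truncateForLog(String code, int edgeLines, int maxChars) {
        String[] lines = code.split("\n", -1);
        StringBuilder sb = new StringBuilder();
        if (lines.length <= 2 * edgeLines) {
            // Short enough: print everything.
            appendNumbered(sb, lines, 0, lines.length);
        } else {
            appendNumbered(sb, lines, 0, edgeLines);
            int omitted = lines.length - 2 * edgeLines;
            // Say explicitly that lines were dropped.
            sb.append("... [").append(omitted).append(" lines truncated] ...\n");
            appendNumbered(sb, lines, lines.length - edgeLines, lines.length);
        }
        if (sb.length() > maxChars) {
            sb.setLength(maxChars);
            sb.append("\n... [output truncated at ").append(maxChars).append(" chars]");
        }
        return sb.toString();
    }

    /** Appends lines[from, to) with 1-based line numbers. */
    private static void appendNumbered(StringBuilder sb, String[] lines, int from, int to) {
        for (int i = from; i < to; i++) {
            sb.append(String.format("%4d | ", i + 1)).append(lines[i]).append('\n');
        }
    }
}
```

A caller on the "no line number available" path would log `truncateForLog(code, n, limit)` instead of the full source, possibly at debug level as the comment proposes.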
    if (matcher.find()) {
        // If there are multiple code blocks, take the last one (the final version usually comes last)
        String lastBlock = matcher.group(1);
        while (matcher.find()) {
            lastBlock = matcher.group(1);
        }
        String code = lastBlock.trim();
        if (!code.isEmpty()) {
            return code;
        }
    }
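When `extractCodeFromContent` detects multiple fenced code blocks, it always takes the last match. If the last block is empty (for example, the model emits an empty trailing python block), the earlier non-empty blocks are discarded and control falls through to the fallback logic, which may end up returning the original text including markdown and natural language. Consider tracking the last non-empty code block while iterating over the matcher, and returning that block in preference.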
Suggested change:

    - if (matcher.find()) {
    -     // If there are multiple code blocks, take the last one (the final version usually comes last)
    -     String lastBlock = matcher.group(1);
    -     while (matcher.find()) {
    -         lastBlock = matcher.group(1);
    -     }
    -     String code = lastBlock.trim();
    -     if (!code.isEmpty()) {
    -         return code;
    -     }
    - }
    + // Iterate over all matches and record the last non-empty code block
    + String lastNonEmptyCodeBlock = null;
    + while (matcher.find()) {
    +     String block = matcher.group(1);
    +     if (block != null) {
    +         String candidate = block.trim();
    +         if (!candidate.isEmpty()) {
    +             lastNonEmptyCodeBlock = candidate;
    +         }
    +     }
    + }
    + if (lastNonEmptyCodeBlock != null) {
    +     return lastNonEmptyCodeBlock;
    + }
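
To make the difference concrete, here is a small self-contained sketch of the "last non-empty block" strategy. The regex is an assumption, since the pattern actually used in `CodeGeneratorNode` is not shown in this review, and `LastNonEmptyBlockExample` and `lastNonEmptyBlock` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; not the PR's implementation. */
final class LastNonEmptyBlockExample {

    // Three backticks, built with repeat() so the example contains no literal fence.
    private static final String FENCE = "`".repeat(3);

    // Assumed fence pattern: an optional python/py tag, then the block body up to the next fence.
    private static final Pattern FENCED_BLOCK = Pattern.compile(
            FENCE + "(?:python|py)?[ \\t]*\\r?\\n(.*?)" + FENCE, Pattern.DOTALL);

    private LastNonEmptyBlockExample() {
    }

    /** Returns the last fenced block whose trimmed body is non-empty, or null if none exists. */
    static String lastNonEmptyBlock(String content) {
        String last = null;
        Matcher matcher = FENCED_BLOCK.matcher(content);
        while (matcher.find()) {
            String candidate = matcher.group(1).trim();
            if (!candidate.isEmpty()) {
                last = candidate; // keep overwriting so the final non-empty block wins
            }
        }
        return last;
    }
}
```

With this shape, an empty trailing block no longer masks an earlier block that contains code; a `null` result signals the caller to continue with its fallback strategies.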
     * <ul>
     *   <li>GraalPython: {@code SyntaxError: ... (Unnamed, line 21)}</li>
     *   <li>GraalPython: {@code ... at line 21}</li>
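`extractErrorLineNumber` calls `Pattern.compile(...)` again on every invocation. Although this only happens on the exception path, consider extracting the patterns into `private static final Pattern` constants to avoid repeated compilation and make the intent clearer.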
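
A precompiled pattern along the lines of this comment might look like the sketch below. It assumes the two message shapes quoted in the Javadoc are the only ones that matter; the class `ErrorLineParser`, the method `parseLineNumber`, and the `OptionalInt` return type are hypothetical and need not match the real signature of `extractErrorLineNumber`.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; the real method's signature may differ. */
final class ErrorLineParser {

    // Compiled once. Matches both "(Unnamed, line 21)" and "... at line 21".
    // The digit count is capped so Integer.parseInt cannot overflow.
    private static final Pattern LINE_NUMBER = Pattern.compile("\\bline\\s+(\\d{1,9})");

    private ErrorLineParser() {
    }

    /** Returns the first line number mentioned in the message, if any. */
    static OptionalInt parseLineNumber(String message) {
        if (message == null) {
            return OptionalInt.empty();
        }
        Matcher matcher = LINE_NUMBER.matcher(message);
        return matcher.find()
                ? OptionalInt.of(Integer.parseInt(matcher.group(1)))
                : OptionalInt.empty();
    }
}
```

`Pattern` instances are immutable and safe to share across threads, so a static constant is the usual choice here; only the `Matcher` is created per call.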
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <version>5.11.0</version>
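This module's parent POM already imports `spring-boot-dependencies` (the BOM) through the root `pom.xml` to manage test dependency versions centrally, and other modules in the repository declare `junit-jupiter` without an explicit version. Hard-coding 5.11.0 here can cause version drift or conflicts (especially during multi-module dependency convergence). Consider removing `<version>` and letting the BOM manage it.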
Suggested change:

    - <version>5.11.0</version>
This pull request introduces robust code extraction and error logging enhancements for the agent's code generation and execution modules. The main improvements focus on reliably parsing Python code from LLM outputs (handling markdown, natural language, and edge cases), and providing better diagnostics for code execution errors by logging the context around error lines. Comprehensive unit tests have been added to ensure the extraction logic works across real-world scenarios.
Code Extraction Improvements
- Refactored `CodeGeneratorNode` to include a new `extractCodeFromContent` method that robustly parses Python code from LLM outputs, supporting markdown code blocks, natural language prefixes, multiple code blocks (returns the last), and fallback strategies.
- Uses `Pattern` and `Matcher` to support regex-based extraction logic.

Testing
- Added `CodeGeneratorNodeExtractCodeTest.java`, covering standard markdown, natural language, multiple code blocks, no markdown, edge cases, and real-world LLM output formats.
- Added the JUnit Jupiter test dependency in `pom.xml`.

Error Logging and Diagnostics
- Enhanced `GraalCodeExecutor` to log the code context around error lines when execution fails, extracting line numbers from exception messages and printing surrounding code for easier debugging.
- Updated the `execute` and `executeDirect` methods to call the new logging function, and improved error log messages for clarity.
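
The error-context logging described above can be pictured with a short sketch. It is an illustration under assumptions, not the code in `GraalCodeExecutor`: the class `ErrorContextFormatter`, the method `formatContext`, the `radius` parameter, and the `>>` marker are all hypothetical.

```java
import java.util.Locale;

/** Hypothetical illustration of printing code around a reported error line. */
final class ErrorContextFormatter {

    private ErrorContextFormatter() {
    }

    /**
     * Formats the lines within {@code radius} of the 1-based {@code errorLine},
     * marking the error line itself. Returns an empty string when the line
     * number falls outside the code.
     */
    static String formatContext(String code, int errorLine, int radius) {
        String[] lines = code.split("\n", -1);
        int from = Math.max(1, errorLine - radius);
        int to = Math.min(lines.length, errorLine + radius);
        StringBuilder sb = new StringBuilder();
        for (int n = from; n <= to; n++) {
            sb.append(n == errorLine ? ">> " : "   ")
              .append(String.format(Locale.ROOT, "%4d | ", n))
              .append(lines[n - 1])
              .append('\n');
        }
        return sb.toString();
    }
}
```

On an execution failure, the caller would extract the line number from the exception message and log `formatContext(code, line, radius)` alongside the error, so the offending statement is visible without dumping the whole script.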