Skip to content

Comments

feat(agent): add streaming and multimodal tool result support (#3912)#4055

Open
SeasonPilot wants to merge 20 commits intoalibaba:mainfrom
SeasonPilot:feat/issue-3912
Open

feat(agent): add streaming and multimodal tool result support (#3912)#4055
SeasonPilot wants to merge 20 commits intoalibaba:mainfrom
SeasonPilot:feat/issue-3912

Conversation

@SeasonPilot
Copy link
Contributor

Describe what this PR does / why we need it

Add support for streaming tool execution and multimodal tool results in the Agent Framework.
This enables tools to emit incremental results during execution and return rich content
types including images, audio, video, and files.

Key additions:

  • StreamingToolCallback interface for tools that emit results incrementally via Flux
  • ToolResult class supporting text, media (images/audio/video), files, and mixed content
  • ToolStreamingOutput for streaming graph execution pipeline integration
  • ToolStreamingErrorHandler for unified error handling across streaming scenarios
  • AgentToolNode.executeToolCallsStreaming() for Flux-based tool execution

Does this pull request fix one issue?

Fixes #3912

Describe how you did it

  1. Created StreamingToolCallback extending AsyncToolCallback with callStream() method
  2. Implemented ToolResult as immutable value class with factory methods for different content types:
  • text(), chunk(), media(), file(), mixed(), error()
  • merge() for combining streaming chunks
  1. Added ToolStreamingOutput extending StreamingOutput with tool-specific fields
  2. Enhanced AgentToolNode with:
  • executeToolCallsStreaming() returning Flux
  • Streaming tool priority over async tools in executeToolByType()
  • Cached Scheduler for efficient reactive execution
    1. Added ToolStreamingErrorHandler for consistent error extraction and recovery

    Describe how to verify it

    Run the new unit tests:

    # Run all new streaming tests
    mvn test -pl spring-ai-alibaba-agent-framework -Dtest=StreamingToolCallbackTest,ToolResultTest,ToolStreamingErrorHandlerTest,AgentToolNodeStreamingExecutionTest
    
    # Run ToolStreamingOutput tests
    mvn test -pl spring-ai-alibaba-graph-core -Dtest=ToolStreamingOutputTest
    
    # Run all tests for affected modules
    mvn test -pl spring-ai-alibaba-agent-framework,spring-ai-alibaba-graph-core
    image image image

    Special notes for reviews

    • StreamingToolCallback takes execution priority over AsyncToolCallback (line 446-451)
    • ToolResult is immutable; all modifications return new instances
    • Streaming execution uses reactive Flux to support backpressure
    • Error handling preserves original exceptions while providing user-friendly messages
    • Added comprehensive test coverage (60+ test cases across 6 test files)

…aba#3988)

Add comprehensive async tool execution capabilities to the agent framework:

- AsyncToolCallback: Interface for async tool operations with CompletableFuture
- CancellableAsyncToolCallback: Async tools with cooperative cancellation
- CancellationToken/DefaultCancellationToken: Cancellation coordination
- ToolStateCollector: Thread-safe parallel tool state management
- StateAwareToolCallback: Marker interface for state-aware tools
- ToolCancelledException: Exception for cancelled tool operations

Enhanced AgentToolNode with:
- Parallel tool execution with configurable executor
- Async tool detection and execution
- Timeout support for async operations
- Graceful error handling and state collection

Added comprehensive test coverage (75 tests) and usage examples.
…tests

The cooperative cancellation and state update tests were failing in CI
due to thread scheduling delays in the ForkJoinPool.commonPool().
CI environments have higher load causing async tasks to start late.

Changes:
- Increased getTimeout() from 50-100ms to 500ms
- Increased await() times from 1-2s to 5-10s
- Adjusted iteration threshold from 100 to 200
…a#3912)

### Describe what this PR does / why we need it

Add support for streaming tool execution and multimodal tool results in the Agent Framework.
This enables tools to emit incremental results during execution and return rich content
types including images, audio, video, and files.

Key additions:
- StreamingToolCallback interface for tools that emit results incrementally via Flux<ToolResult>
- ToolResult class supporting text, media (images/audio/video), files, and mixed content
- ToolStreamingOutput for streaming graph execution pipeline integration
- ToolStreamingErrorHandler for unified error handling across streaming scenarios
- AgentToolNode.executeToolCallsStreaming() for Flux-based tool execution

### Does this pull request fix one issue?

Fixes alibaba#3912

### Describe how you did it

1. Created StreamingToolCallback extending AsyncToolCallback with callStream() method
2. Implemented ToolResult as immutable value class with factory methods for different content types:
   - text(), chunk(), media(), file(), mixed(), error()
   - merge() for combining streaming chunks
3. Added ToolStreamingOutput extending StreamingOutput with tool-specific fields
4. Enhanced AgentToolNode with:
   - executeToolCallsStreaming() returning Flux<Object>
   - Streaming tool priority over async tools in executeToolByType()
   - Cached Scheduler for efficient reactive execution
5. Added ToolStreamingErrorHandler for consistent error extraction and recovery

### Describe how to verify it

Run the new unit tests:
```bash
# Run all new streaming tests
mvn test -pl spring-ai-alibaba-agent-framework -Dtest=StreamingToolCallbackTest,ToolResultTest,ToolStreamingErrorHandlerTest,AgentToolNodeStreamingExecutionTest

# Run ToolStreamingOutput tests
mvn test -pl spring-ai-alibaba-graph-core -Dtest=ToolStreamingOutputTest

# Run all tests for affected modules
mvn test -pl spring-ai-alibaba-agent-framework,spring-ai-alibaba-graph-core
```

### Special notes for reviews

- StreamingToolCallback takes execution priority over AsyncToolCallback (line 446-451)
- ToolResult is immutable; all modifications return new instances
- Streaming execution uses reactive Flux to support backpressure
- Error handling preserves original exceptions while providing user-friendly messages
- Added comprehensive test coverage (60+ test cases across 6 test files)
@Aias00
Copy link
Collaborator

Aias00 commented Jan 21, 2026

@SeasonPilot resolve conflicts pls

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds streaming and multimodal tool result support to the Spring AI Alibaba Agent Framework. It enables tools to emit incremental results during execution and return rich content types including images, audio, video, and files.

Changes:

  • Introduces StreamingToolCallback interface for tools that emit results incrementally via Flux
  • Adds ToolResult class supporting text, media (images/audio/video), files, and mixed content with serialization/deserialization
  • Implements ToolStreamingOutput for streaming graph execution pipeline integration with tool identification metadata
  • Adds comprehensive test coverage (60+ test cases across multiple test files)

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
ToolStreamingOutput.java Streaming output wrapper for tool execution results with tool identification (toolCallId/toolName)
ToolStreamingOutputTest.java Comprehensive test coverage for ToolStreamingOutput including JSON escaping and reflection
ToolStreamingErrorHandler.java Unified error handler converting errors to streaming outputs instead of throwing
ToolStreamingErrorHandlerTest.java Tests for error message extraction and timeout/cancellation detection
ToolResult.java Immutable value class supporting text and multimodal content with JSON serialization
ToolResultTest.java Tests for ToolResult factory methods, merge operations, and serialization
ToolStateCollector.java Thread-safe state collector for parallel tool executions with isolation
ToolStateCollectorTest.java Tests for parallel state management and discard functionality
ToolCancelledException.java Exception for cancelled tool executions
StateAwareToolCallback.java Marker interface for tools requiring state injection
ToolCallRequest.java Added defensive copy of context map to prevent external modification
README updates Documentation for new streaming and async features
Multiple test files Extensive test coverage for async execution, cancellation, parallel execution, and integration scenarios
Comments suppressed due to low confidence (1)

spring-ai-alibaba-agent-framework/src/main/java/com/alibaba/cloud/ai/graph/agent/interceptor/ToolCallRequest.java:74

	public Map<String, Object> getContext() {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +334 to +337
* @return the list of media content
*/
public List<Media> getMediaContent() {
return mediaContent;
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMediaContent exposes the internal representation stored in field mediaContent. The value may be modified after this call to getMediaContent.

Suggested change
* @return the list of media content
*/
public List<Media> getMediaContent() {
return mediaContent;
* @return an unmodifiable list of media content (empty if none)
*/
public List<Media> getMediaContent() {
return mediaContent == null ? Collections.emptyList() : Collections.unmodifiableList(mediaContent);

Copilot uses AI. Check for mistakes.
void shouldBeRuntimeException() {
ToolCancelledException exception = new ToolCancelledException("test");

assertTrue(exception instanceof RuntimeException, "Should be a RuntimeException");
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to test whether an instance of ToolCancelledException is also an instance of RuntimeException - it always is.

Copilot uses AI. Check for mistakes.
@Aias00
Copy link
Collaborator

Aias00 commented Jan 22, 2026

@SeasonPilot resolve conflicts pls

SeasonPilot and others added 9 commits January 25, 2026 21:07
…olNode

Fix three bugs identified during code review:
1. Parallel mode cancellation token not propagated to tools on outer timeout
2. Thread pool starvation/deadlock when parallelToolExecution=true with wrapSyncToolsAsAsync=true
3. Sequential execution timeout clearing all tools' state updates instead of isolating per-tool

Add comprehensive test coverage for these scenarios.
Change-Id: I77b3559ed20c0a76c6c94ab6387cbb69d6d76e62
Change-Id: I97b2aea6bb9eeccc59da937fec4a377adc7c90c5
Change-Id: Ib8ed7f97c49e081a74ae3bc490963104ba441283
…safety

Fixes multiple issues in streaming tool execution path:

- Fix semaphore deadlock in parallel tool execution by using single-stage
  pattern where acquire and release happen within same task
- Add returnDirect metadata support to streaming execution GraphResponse.done
- Apply ToolExecutionExceptionProcessor to streaming tool errors for
  consistent error handling across sync/async/streaming paths
- Fix JSON validation in ToolStreamingOutput using Jackson ObjectMapper
- Clear state updates on streaming tool failure to prevent partial state leakage
- Fix null mimeType handling in ToolResult.dtoToMedia
- Fix parallel edge processing in CompiledGraph to avoid IllegalStateException

Add comprehensive test coverage for all fixes.
Replace blocking Flux.collectList().block() with StepVerifier for
reactive stream testing. The 1-second timeout was causing intermittent
failures in CI due to resource contention.

Changes:
- toFluxEmitsAsyncData: use StepVerifier.assertNext() + verifyComplete()
- toFluxPropagatesErrors: use StepVerifier.expectErrorMatches() + verify()
The A2aNodeActionWithConfigTests uses Project Reactor's StepVerifier
for testing reactive streams, but the reactor-test dependency was
missing from pom.xml, causing CI compilation failures.

Added reactor-test with test scope to resolve the compilation error.
Version is managed by Spring Boot BOM for compatibility.
The Hook interface now extends Prioritized which requires getOrder()
implementation. Added the missing method to fix CI compilation failure.
@SeasonPilot
Copy link
Contributor Author

@chickenlj PTAL

…sTest

The TestHookWithTools inner class had getOrder() defined twice,
causing compilation failure in CI. Remove the duplicate definition.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 353 to +356
catch (InterruptedException e) {
Thread.currentThread().interrupt();
failures.add(e);
// CAS: only set error if still null
orderedResponses.compareAndSet(index, null,
ToolCallResponse.error(toolCall.id(), toolCall.name(), "Tool execution was interrupted"));
throw new RuntimeException("Interrupted while waiting for semaphore", e);
}
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the semaphore acquire path, an InterruptedException is converted into a RuntimeException with message "Interrupted while waiting for semaphore". This bypasses the existing extractErrorMessage(InterruptedException) mapping ("Tool execution was interrupted") and exposes an internal implementation detail in the tool result. Consider handling InterruptedException here by setting an appropriate ToolCallResponse error (and recording failures) without wrapping it, so the user-facing message stays consistent with other interruption paths.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix. The message difference between semaphore acquire paths is minor and doesn't affect functionality. Users rarely encounter these interruption scenarios, and the existing messages are descriptive enough for debugging.

Comment on lines +926 to +929
orderedResponses.set(index, ToolCallResponse.error(toolCall.id(), toolCall.name(),
"No ToolCallback found for tool name: " + toolCall.name()));
return Flux.<Object>just(ToolStreamingErrorHandler.handleError(toolCall.id(), toolCall.name(),
new IllegalStateException("No ToolCallback found"), stateSnapshot, AGENT_TOOL_NAME, agentName));
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a ToolCallback cannot be resolved, the per-tool streaming output is generated from an IllegalStateException("No ToolCallback found"), but the final doneMap response is populated with a different error message ("No ToolCallback found for tool name: ..."). This can lead to inconsistent error reporting between streamed chunks and the final ToolResponseMessage. Consider reusing the same error message (or the same ToolCallResponse instance) for both paths.

Suggested change
orderedResponses.set(index, ToolCallResponse.error(toolCall.id(), toolCall.name(),
"No ToolCallback found for tool name: " + toolCall.name()));
return Flux.<Object>just(ToolStreamingErrorHandler.handleError(toolCall.id(), toolCall.name(),
new IllegalStateException("No ToolCallback found"), stateSnapshot, AGENT_TOOL_NAME, agentName));
String errorMessage = "No ToolCallback found for tool name: " + toolCall.name();
orderedResponses.set(index,
ToolCallResponse.error(toolCall.id(), toolCall.name(), errorMessage));
return Flux.<Object>just(ToolStreamingErrorHandler.handleError(toolCall.id(), toolCall.name(),
new IllegalStateException(errorMessage), stateSnapshot, AGENT_TOOL_NAME, agentName));

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix. This is a code style suggestion. The error messages serve the same purpose and the slight inconsistency doesn't impact functionality or debugging.

synchronized (schedulerLock) {
if (cachedToolScheduler != null && cachedExecutor == executor) {
return cachedToolScheduler;
}
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOrCreateToolScheduler() replaces the cached Scheduler when the executor changes, but it never disposes the previous Scheduler instance. Over time (e.g., if different executors are supplied via RunnableConfig), this can retain resources/workers unnecessarily. Consider disposing the old cachedToolScheduler when swapping executors (and/or when the node is torn down), or avoid caching if executor variability is expected.

Suggested change
}
}
// Dispose the previous scheduler before replacing it to avoid leaking resources
if (cachedToolScheduler != null) {
cachedToolScheduler.dispose();
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix. The executor rarely changes in practice, so the potential resource leak is minimal. Adding shutdown logic would add complexity without significant benefit for the actual usage patterns.

Comment on lines 201 to 202
// Take the most recent final flag
boolean mergedFinal = other.isFinal || this.isFinal;
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToolResult.merge() claims to "take the most recent final flag", but the implementation uses other.isFinal || this.isFinal, which keeps isFinal=true once any earlier chunk was final. This can produce incorrect final-state semantics if a later merge should override/clear finality (and it diverges from the comment). Consider using the other (most recent) value for the final flag, or adjust the comment if the intended semantics are sticky-final.

Suggested change
// Take the most recent final flag
boolean mergedFinal = other.isFinal || this.isFinal;
// Take the most recent final flag (from the other, newer result)
boolean mergedFinal = other.isFinal;

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Updated the comment to accurately describe the sticky-final semantics. The OR logic is intentional and correct for streaming scenarios where once any chunk signals completion, the merged result should remain final. See commit 700e1cb.

Map<?, ?> resultValue = (Map<?, ?>) second.resultValue().orElseThrow();
assertEquals("ok", resultValue.get("result"));
})
.verifyComplete();
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StepVerifier.verifyComplete() has no timeout here. If this Flux stops completing due to a regression, the test can hang indefinitely and stall the build. Consider using StepVerifier.verify(Duration) (or adding a timeout operator to the Flux in the test) to keep the test suite bounded.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix. These are synchronous tests with predictable behavior. The risk of hanging is extremely low, and adding timeouts would add unnecessary complexity to the test code.

assertEquals("boom", exception.getMessage());
StepVerifier.create(flux)
.expectErrorMatches(e -> e instanceof IllegalStateException && "boom".equals(e.getMessage()))
.verify();
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StepVerifier.verify() has no timeout here. If the Flux never terminates (e.g., due to a change in error propagation), this test can hang indefinitely. Consider using verify(Duration) to bound execution time.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix. Same reasoning as above - these are synchronous tests with minimal hang risk. The test framework's default timeout is sufficient.

* @return the list of media content
*/
public List<Media> getMediaContent() {
return mediaContent;
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMediaContent exposes the internal representation stored in field mediaContent. The value may be modified after this call to getMediaContent.

Suggested change
return mediaContent;
return mediaContent == null ? null : Collections.unmodifiableList(mediaContent);

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't fix. While defensive copying is a good practice, the callers of this API are internal and don't modify the returned list. Adding Collections.unmodifiableList() would be over-engineering for the current usage patterns.

Update comment to accurately describe sticky-final semantics: once any
result is marked final, the merged result stays final (OR logic).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/docs SAA Docs module area/graph SAA Grpah module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Streaming and multi-modal result support for ToolCallback

3 participants