AgentScope Integration
AgentScope Java is an agent-oriented programming framework for building LLM-powered applications.
- AgentScope – AgentScopeAgent wraps AgentScope ReActAgent as a BaseAgent for use in graph workflows.
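The real AgentScopeAgent builder API is not shown here, but the underlying idea is an adapter: a ReAct-style agent is wrapped so it can run as a node in a graph workflow, reading its input from shared state and merging its answer back in. The sketch below is framework-agnostic; `GraphNode`, `ReActAgent`, and the `input`/`output` state keys are illustrative stand-ins, not the SAA Graph API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a graph node: consumes and returns shared state.
interface GraphNode {
    Map<String, Object> apply(Map<String, Object> state);
}

// Minimal ReAct-style agent: takes a query, returns an answer.
interface ReActAgent {
    String call(String query);
}

// Adapter in the spirit of AgentScopeAgent: exposes a ReActAgent as a
// graph node by reading/writing well-known state keys.
class AgentScopeNode implements GraphNode {
    private final ReActAgent agent;
    AgentScopeNode(ReActAgent agent) { this.agent = agent; }

    @Override
    public Map<String, Object> apply(Map<String, Object> state) {
        String input = (String) state.get("input");
        String output = agent.call(input);
        // Merge the agent's answer back into the graph state.
        Map<String, Object> next = new HashMap<>(state);
        next.put("output", output);
        return next;
    }
}

public class AgentScopeNodeDemo {
    public static Map<String, Object> run(String input) {
        ReActAgent echo = q -> "answered: " + q;   // stub in place of a real LLM agent
        GraphNode node = new AgentScopeNode(echo);
        return node.apply(Map.of("input", input));
    }

    public static void main(String[] args) {
        System.out.println(run("order status?"));
    }
}
```

The adapter only touches state keys, so downstream graph nodes stay decoupled from the wrapped agent's implementation.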
Add the following dependency to orchestrate AgentScope using SAA Graph:

```xml
<dependency>
    <groupId>com.alibaba.cloud.ai</groupId>
    <artifactId>spring-ai-alibaba-starter-agentscope</artifactId>
    <version>1.1.2.2</version>
</dependency>
```

Multiagent Patterns
- Subagent – Main orchestrator delegates tasks to specialized sub-agents (codebase explorer, web researcher, etc.) via Task/TaskOutput tools; supports both Markdown and API-defined sub-agents.
- Supervisor – Central supervisor agent wraps calendar and email agents as tools (AgentTool), invokes them on demand, and synthesizes results.
- Skills – Single agent uses `read_skill` to load skill content on demand; system prompt shows only skill descriptions for progressive disclosure and smaller context.
- Routing
  - Routing (simple) – LlmRoutingAgent classifies the user query, invokes specialist agents (GitHub, Notion, Slack) in parallel, then synthesizes a single answer.
  - Routing (graph) – LlmRoutingAgent as a StateGraph node with preprocess/postprocess and an internal merge node for routing and result synthesis.
- Handoffs
  - Handoffs (single-agent) – One ReactAgent advances steps via state (e.g. `current_step`); a ModelInterceptor injects a step-specific system prompt and tools per turn.
  - Handoffs (multi-agent) – Sales and support agents as graph nodes; handoff tools update `active_agent` and conditional edges route between agents.
- Workflow – Custom workflow examples: RAG (rewrite → retrieve → prepare → agent) and SQL agent (list_tables → get_schema → run_query) as graph-based flows.
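The multi-agent handoff pattern above can be sketched without the framework: agents are nodes keyed by name, a "handoff tool" simply rewrites `active_agent` in shared state, and a conditional loop routes to whichever agent is active until one finishes. The agent names and state layout below are illustrative, not the SAA Graph API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class HandoffDemo {
    // Each "agent" transforms shared state; a handoff rewrites active_agent.
    static final Map<String, UnaryOperator<Map<String, Object>>> AGENTS = Map.of(
        "sales", state -> {
            Map<String, Object> next = new HashMap<>(state);
            String query = (String) state.get("input");
            if (query.contains("refund")) {
                next.put("active_agent", "support");        // handoff to the support agent
            } else {
                next.put("output", "sales: here is a quote");
                next.put("active_agent", "done");
            }
            return next;
        },
        "support", state -> {
            Map<String, Object> next = new HashMap<>(state);
            next.put("output", "support: refund initiated");
            next.put("active_agent", "done");
            return next;
        }
    );

    // Conditional-edge loop: keep routing to whichever agent is active.
    public static Map<String, Object> run(String input) {
        Map<String, Object> state =
            new HashMap<>(Map.of("input", input, "active_agent", "sales"));
        while (!"done".equals(state.get("active_agent"))) {
            state = AGENTS.get((String) state.get("active_agent")).apply(state);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(run("I want a refund"));
    }
}
```

In the real graph version, the `while` loop is replaced by conditional edges that inspect `active_agent` after each node runs.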
Multimodal & Voice Agent
- Voice Agent – Sandwich architecture (STT → ReactAgent → TTS): WebSocket-based real-time voice with DashScope ASR and CosyVoice TTS, plus text input; agent uses sandwich-order tools and streams events (stt_chunk, agent_chunk, tts_chunk).
- Multimodal – Vision (image in/out) and TTS: DashScope vision models for image understanding and ReactAgent with media input; image generation via tools (Wanx), TTS via DashScopeAudioSpeechModel; ToolMultimodalResult for structured tool responses (url/base64).
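The sandwich architecture behind the voice agent reduces to three stages wired in sequence, each emitting a typed event as it streams. The sketch below stubs out the STT, agent, and TTS stages (the real ones are DashScope ASR, ReactAgent, and CosyVoice TTS) and keeps only the event flow; the method names and `Event` type are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class VoicePipelineDemo {
    // One streamed event: type is stt_chunk / agent_chunk / tts_chunk.
    record Event(String type, String payload) {}

    // Sandwich stages, stubbed here:
    static String transcribe(byte[] audio) { return "one sandwich please"; }      // stands in for DashScope ASR
    static String think(String text)       { return "order confirmed: " + text; } // stands in for ReactAgent
    static byte[] synthesize(String text)  { return text.getBytes(); }            // stands in for CosyVoice TTS

    public static List<Event> handleTurn(byte[] audioIn) {
        List<Event> events = new ArrayList<>();
        String transcript = transcribe(audioIn);
        events.add(new Event("stt_chunk", transcript));                  // stream transcript to client
        String reply = think(transcript);
        events.add(new Event("agent_chunk", reply));                     // stream agent tokens
        byte[] audioOut = synthesize(reply);
        events.add(new Event("tts_chunk", audioOut.length + " bytes"));  // stream audio frames
        return events;
    }

    public static void main(String[] args) {
        handleTurn(new byte[0]).forEach(e -> System.out.println(e.type() + ": " + e.payload()));
    }
}
```

In the real agent these events are pushed over the WebSocket as each stage produces chunks, rather than collected after the turn completes.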