v1.1.2.2

@chickenlj chickenlj released this 10 Mar 09:32
· 2 commits to main since this release
7405a7d

AgentScope Integration

AgentScope Java is an agent-oriented programming framework for building LLM-powered applications.

<!-- Add the following dependency to orchestrate AgentScope using Spring AI Alibaba (SAA) Graph -->
<dependency>
  <groupId>com.alibaba.cloud.ai</groupId>
  <artifactId>spring-ai-alibaba-starter-agentscope</artifactId>
  <version>1.1.2.2</version>
</dependency>

Multi-agent Patterns

  • Subagent – Main orchestrator delegates tasks to specialized sub-agents (codebase explorer, web researcher, etc.) via Task/TaskOutput tools; supports both Markdown and API-defined sub-agents.
  • Supervisor – Central supervisor agent wraps calendar and email agents as tools (AgentTool), invokes them on demand, and synthesizes results.
  • Skills – Single agent uses read_skill to load skill content on demand; the system prompt lists only skill descriptions, so full skill content is disclosed progressively and the context stays small.
  • Routing
    • Routing (simple) – LlmRoutingAgent classifies the user query, invokes specialist agents (GitHub, Notion, Slack) in parallel, then synthesizes a single answer.
    • Routing (graph) – LlmRoutingAgent as a StateGraph node with preprocess/postprocess and an internal merge node for routing and result synthesis.
  • Handoffs
    • Handoffs (single-agent) – One ReactAgent advances steps via state (e.g. current_step); a ModelInterceptor injects step-specific system prompt and tools per turn.
    • Handoffs (multi-agent) – Sales and support agents as graph nodes; handoff tools update active_agent and conditional edges route between agents.
  • Workflow – Custom workflow examples: RAG (rewrite → retrieve → prepare → agent) and SQL agent (list_tables → get_schema → run_query) as graph-based flows.
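
As a rough illustration of the simple routing pattern listed above (classify the query, call the matching specialists in parallel, merge their answers), here is a minimal plain-Java sketch. It does not use the Spring AI Alibaba or AgentScope APIs: `RoutingSketch`, `SPECIALISTS`, `classify`, and `route` are hypothetical names, the keyword match stands in for the LLM classification step, and simple concatenation stands in for LLM synthesis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class RoutingSketch {

    // Hypothetical specialists: each maps a query to an answer.
    // A real specialist agent would call a model and its own tools.
    static final Map<String, Function<String, String>> SPECIALISTS = new LinkedHashMap<>();
    static {
        SPECIALISTS.put("github", q -> "[github] results for: " + q);
        SPECIALISTS.put("notion", q -> "[notion] results for: " + q);
        SPECIALISTS.put("slack", q -> "[slack] results for: " + q);
    }

    // Stand-in for the LLM classification step (assumption: keyword match).
    static List<String> classify(String query) {
        String lower = query.toLowerCase(Locale.ROOT);
        List<String> targets = new ArrayList<>();
        for (String name : SPECIALISTS.keySet()) {
            if (lower.contains(name)) {
                targets.add(name);
            }
        }
        return targets;
    }

    // Invoke every selected specialist in parallel, then merge the answers.
    static String route(String query) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String name : classify(query)) {
            Function<String, String> agent = SPECIALISTS.get(name);
            futures.add(CompletableFuture.supplyAsync(() -> agent.apply(query)));
        }
        // Join in submission order so the merged answer is deterministic.
        StringBuilder merged = new StringBuilder();
        for (CompletableFuture<String> future : futures) {
            if (merged.length() > 0) {
                merged.append("\n");
            }
            merged.append(future.join());
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        System.out.println(route("Check GitHub and Slack for the login bug"));
    }
}
```

In the graph variant, the same three steps would presumably become separate nodes (preprocess, routing, merge, postprocess) rather than one method.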

Multimodal & Voice Agent

  • Voice Agent – Sandwich architecture (STT → ReactAgent → TTS): WebSocket-based real-time voice with DashScope ASR and CosyVoice TTS, plus text input; agent uses sandwich-order tools and streams events (stt_chunk, agent_chunk, tts_chunk).
  • Multimodal – Vision (image in/out) and TTS: DashScope vision models for image understanding and ReactAgent with media input; image generation via tools (Wanx), TTS via DashScopeAudioSpeechModel; ToolMultimodalResult for structured tool responses (url/base64).
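
The sandwich architecture (STT → agent → TTS) can be sketched as three functions chained in order, with an event emitted after each stage. This is a plain-Java sketch under stated assumptions, not the release's implementation: `SandwichSketch`, `Event`, and `handleTurn` are hypothetical names, the three stages are passed in as plain functions instead of DashScope ASR, ReactAgent, and CosyVoice, and each stage emits a single event where a real WebSocket session would stream many chunks. Only the event type names (stt_chunk, agent_chunk, tts_chunk) come from the notes above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class SandwichSketch {

    // Hypothetical event shape, as it might be pushed to a client.
    static final class Event {
        final String type;
        final String payload;

        Event(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    // One voice turn: speech-to-text, then the agent, then text-to-speech.
    static void handleTurn(byte[] audio,
                           Function<byte[], String> stt,
                           Function<String, String> agent,
                           Function<String, byte[]> tts,
                           Consumer<Event> sink) {
        String transcript = stt.apply(audio);
        sink.accept(new Event("stt_chunk", transcript));

        String reply = agent.apply(transcript);
        sink.accept(new Event("agent_chunk", reply));

        byte[] speech = tts.apply(reply);
        sink.accept(new Event("tts_chunk", speech.length + " bytes"));
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        handleTurn(
                "hello".getBytes(StandardCharsets.UTF_8),
                bytes -> new String(bytes, StandardCharsets.UTF_8), // fake STT
                text -> "You said: " + text,                        // fake agent
                text -> text.getBytes(StandardCharsets.UTF_8),      // fake TTS
                events::add);
        for (Event e : events) {
            System.out.println(e.type + ": " + e.payload);
        }
    }
}
```

Because text input skips the first stage, a text turn could reuse the same method with an STT function that simply decodes the typed bytes.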