[Discussion] Supervisor + Sub-Agent Orchestration: Concrete Multi-Agent Use Cases for Event-Driven Agents #660

weiqingy · 2026-05-12T05:37:34Z

weiqingy
May 12, 2026
Collaborator

This thread proposes a concrete use case and primitive set for multi-agent orchestration — directly responding to @xintongsong's question in #516:

"How important is it to support multi-agent system for event-driven agents? What are the concrete use cases? And how do we build a multi-agent system on Flink's streaming engine?"

It builds on the gap analysis @thefalc opened in #84 (which identified "agent self-reflection / output evaluation" and "orchestrator-worker / hierarchical patterns" as core multi-agent gaps but didn't propose a specific primitive set) and the foundations being laid by #429 (async execution) and #598 (durable execution reconcile).

Motivation: capability orchestration, not LLM chaining

Internal production discussions at my company landed on a framing worth surfacing:

"Agentic" should not mean "calling LLMs in a loop." It should mean autonomously orchestrating heterogeneous capabilities — LLMs, internal services, REST APIs, ML models, other agents — based on reasoning. Any text-in / text-out service should be a peer to an LLM in the orchestration graph.

Today flink-agents leans LLM-centric in naming, examples, and resource model (CHAT_MODEL is first-class; everything else is "a tool"). This framing limits perceived applicability — users see "Flink + LLM calls" rather than "Flink for autonomous capability orchestration." That's a positioning weakness; capability-agnostic framing is what enterprise teams want.

Concrete use case: supervisor + sub-agent with iterative refinement

A pattern repeatedly requested by enterprise teams — and well-established in the Python ecosystem (LangGraph create_supervisor, CrewAI hierarchical mode, AutoGen GroupChat).

Example: real-time customer support ticket triage on a Kafka stream

Kafka ticket stream
        │
        ▼
┌────────────────┐
│   Supervisor   │  reads ticket, decides which sub-agent to delegate to
│     Agent      │
└────────┬───────┘
         │
   ┌─────┴──────┬──────────────┬───────────────┐
   ▼            ▼              ▼               ▼
Classifier   Knowledge      Ticket-API     External Agent
 Sub-Agent   Search          (REST,         (LangChain
 (LLM)       Sub-Agent       no LLM)        wrapped as
              (LLM+RAG)                     REST endpoint)
   │            │              │               │
   └────────────┴──────────────┴───────────────┘
                       │
                       ▼
                Supervisor evaluates:
                "Is this answer complete & accurate?"
                       │
                ┌──────┴───────┐
                │ NO           │ YES
                ▼              ▼
        Refine prompt    Emit response
        with feedback    to output topic
        → loop back
        (bounded by max rounds /
         quality threshold / cost budget)

Key properties:

Heterogeneous sub-agents — LLM, RAG, REST service, external agent — look the same to the supervisor.
Iterative refinement — supervisor judges sub-agent output and re-prompts with feedback context until satisfied.
Streaming-native — runs continuously over a Kafka stream, not per-request.
Durable — multi-round loop survives task manager crashes via Flink keyed state + checkpoints.
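A minimal sketch of the delegate, judge, refine loop from the diagram, in plain Python. Everything here (`run_supervisor`, `Verdict`, the toy sub-agents, the routing rule) is a hypothetical illustration and not part of the flink-agents API; it only shows that an LLM-backed sub-agent and a plain REST-style service can share one text-in / text-out call shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumption: every capability is exposed as text-in / text-out.
SubAgent = Callable[[str], str]


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


def run_supervisor(
    ticket: str,
    sub_agents: Dict[str, SubAgent],
    route: Callable[[str], str],
    judge: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Delegate, judge, and re-prompt until accepted or max_rounds is reached."""
    prompt = ticket
    answer = ""
    for round_no in range(1, max_rounds + 1):
        name = route(prompt)               # supervisor picks a sub-agent
        answer = sub_agents[name](prompt)  # same call shape for every kind
        verdict = judge(ticket, answer)
        if verdict.accepted:
            return answer, round_no
        # Fold the critique into the next prompt.
        prompt = f"{ticket}\n\nPrevious answer: {answer}\nFeedback: {verdict.feedback}"
    return answer, max_rounds


def classifier(prompt: str) -> str:
    """Stand-in for an LLM-backed sub-agent."""
    return "category: billing"


def ticket_api(prompt: str) -> str:
    """Stand-in for a REST service with no LLM behind it."""
    return "category: billing; account status: active"


def route(prompt: str) -> str:
    # Toy rule: the first round goes to the classifier, refinements to the API.
    return "ticket_api" if "Feedback:" in prompt else "classifier"


def judge(ticket: str, answer: str) -> Verdict:
    return Verdict("account status" in answer, "include account status")


agents = {"classifier": classifier, "ticket_api": ticket_api}
answer, rounds = run_supervisor("I was charged twice", agents, route, judge)
```

In a streaming job the loop state (`prompt`, `round_no`) would have to live in keyed state rather than local variables so that it survives a restart; the sketch leaves that out.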

Why this pattern fits flink-agents (vs LangGraph / CrewAI)

LangGraph already does supervisor + refinement very well — per request, in one Python process, with DB-backed checkpointing. flink-agents' opportunity is the same pattern over continuous streaming inputs with distributed exactly-once durability. A multi-round refinement loop processing millions of events per day, surviving node failures, is something neither LangGraph nor CrewAI can offer. That answers @xintongsong's question on "how is it different on a streaming engine."

Primitives to discuss

I checked all open PRs/issues — nothing addresses these. ResourceType has CHAT_MODEL, TOOL, MCP_SERVER, SKILLS but no abstraction for "another agent" or "remote text service as peer to LLM." ReActAgent is single-agent only. No correlation/reply primitives for cross-job calls.

The list below is a starting proposal — the goal of this thread is to decide which of these are must-have to unblock the use case, which can wait, and which are out of scope.

Tier 1

1. Unified callable resource type: one abstraction subsuming Tool + REST service + sub-agent (sync HTTP, async Kafka, MCP). The supervisor shouldn't care about wire protocol. Directly addresses the capability-agnostic framing.
2. Async cross-job RPC pattern: correlation IDs, reply topics, timeouts, retries, in-flight state in Flink keyed state. This is the streaming-durability differentiator. Flink already gives us durable state + exactly-once; we should expose it as a clean "delegate to another flink-agent and await reply" primitive instead of forcing every team to reinvent the plumbing.
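To make the request/reply plumbing concrete, here is a rough sketch of the bookkeeping such a primitive would need. It is plain Python with no Kafka or Flink dependency: the `pending` dict stands in for keyed state, `outbox` for the request topic, and `on_timer` for a timer callback. All class and method names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PendingCall:
    correlation_id: str
    payload: str
    deadline_ms: int
    attempts: int = 1


@dataclass
class RequestReplyTracker:
    """In-flight call table; in a real job this would live in keyed state."""
    timeout_ms: int
    max_attempts: int
    pending: Dict[str, PendingCall] = field(default_factory=dict)
    outbox: List[dict] = field(default_factory=list)  # stand-in for the request topic
    dead_letters: List[PendingCall] = field(default_factory=list)

    def send(self, payload: str, now_ms: int) -> str:
        cid = uuid.uuid4().hex
        self.pending[cid] = PendingCall(cid, payload, now_ms + self.timeout_ms)
        self.outbox.append({"correlation_id": cid, "payload": payload})
        return cid

    def on_reply(self, correlation_id: str, result: str) -> Optional[str]:
        # Unknown or already-completed IDs are dropped, so duplicate replies are harmless.
        call = self.pending.pop(correlation_id, None)
        return result if call is not None else None

    def on_timer(self, now_ms: int) -> None:
        for cid, call in list(self.pending.items()):
            if now_ms < call.deadline_ms:
                continue
            if call.attempts >= self.max_attempts:
                self.dead_letters.append(self.pending.pop(cid))  # give up
            else:
                call.attempts += 1  # retry with a fresh deadline
                call.deadline_ms = now_ms + self.timeout_ms
                self.outbox.append({"correlation_id": cid, "payload": call.payload})


tracker = RequestReplyTracker(timeout_ms=100, max_attempts=2)
cid = tracker.send("classify ticket 42", now_ms=0)
tracker.on_timer(now_ms=100)                # deadline hit: request is re-sent
reply = tracker.on_reply(cid, "billing")    # reply matched by correlation ID
duplicate = tracker.on_reply(cid, "billing")  # second copy is ignored
```

Dropping replies whose correlation ID is no longer pending is one simple way to tolerate redelivery; whether that is sufficient depends on the delivery guarantees of the reply topic.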

Tier 2 (likely starts as recipes, promoted to primitives after validation)

3. Judge / critic step: document the pattern first with example code; formalize only after multiple users converge on the shape. Avoids over-abstraction.
4. Richer loop termination: quality threshold, budget (tokens, wall-clock, rounds), not just AGENT_MAX_ITERATIONS.
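One possible shape for richer termination, sketched as a small budget object that the refinement loop consults after every round. `LoopBudget`, its field names, and the reason strings are assumptions made for illustration; none of them exist in flink-agents today.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopBudget:
    max_rounds: int
    max_tokens: int
    max_seconds: float
    quality_threshold: float
    rounds: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, tokens_used: int) -> None:
        """Account for one completed refinement round."""
        self.rounds += 1
        self.tokens += tokens_used

    def stop_reason(self, quality: float) -> Optional[str]:
        """Return why the loop should stop, or None to keep refining."""
        if quality >= self.quality_threshold:
            return "quality_met"
        if self.rounds >= self.max_rounds:
            return "round_budget"
        if self.tokens >= self.max_tokens:
            return "token_budget"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time_budget"
        return None


budget = LoopBudget(max_rounds=3, max_tokens=1000, max_seconds=60.0, quality_threshold=0.8)
budget.record(tokens_used=400)
first = budget.stop_reason(quality=0.5)   # still within every budget
budget.record(tokens_used=700)
second = budget.stop_reason(quality=0.5)  # 1100 tokens exceeds the 1000 budget
```

Returning a reason string rather than a boolean lets the job emit why a ticket stopped refining, which helps when tuning the thresholds.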

Tier 0 (free wins, can do now)

5. Reframe docs/examples: lead with "agents orchestrate capabilities," not "agents call LLMs." Add a multi-agent reference example using existing primitives (Tool + ReActAgent + Kafka request/reply) to demonstrate the pattern works today, even if rough.

Open questions for the community

1. Is this use case compelling enough to move multi-agent from "Not likely" to "If possible" in the 0.3 roadmap (#516) (or 0.4)? The supervisor pattern has broad industry validation; the streaming-durable variant is uniquely ours.
2. Should the supervisor pattern be built first as a recipe/example using existing primitives, validated, and only then formalized into framework primitives? I'd argue yes: it avoids LangChain's "too many abstractions too fast" mistake.
3. Should we add new resource types (SubAgent, RemoteCapability), or extend existing TOOL / MCP_SERVER? Extending is less disruptive; new types are cleaner.
4. Cross-job RPC over Kafka: correlation ID + reply topic is the obvious shape, but timer-based timeouts, retries, and dead-lettering deserve a dedicated design proposal.

Linking @xintongsong @thefalc @yanand0909 since you've engaged on adjacent topics in #516 / #84.

weiqingy · 2026-05-12T05:39:12Z

weiqingy
May 12, 2026
Collaborator Author

@alnzng feel free to add any additional thoughts on this topic and the use case.

0 replies

wenjin272 · 2026-05-12T06:33:37Z

wenjin272
May 12, 2026
Collaborator

Hi, @weiqingy. Actually, we’ve recently received some requests for sub-agents and are currently considering how to address this within flink-agents. This work is being handled by other colleagues at the moment, but I’d like to share what I know about it.

Recently, the Flink community initiated a discussion on FLIP-577, aiming to lay out a direction for evolving Flink into a data engine that natively supports AI workloads. It mentions the hope of introducing a new RpcOperator in Flink, which is independently deployed outside the stream graph and can communicate with operators within the stream graph via RPC. @xintongsong shared in community discussions his view that RpcOperator can help flink-agents implement sub-agent capabilities.

I believe this discussion is highly beneficial for clarifying the requirements and design of multi-agent systems. Thank you for your insights, and I will also invite my colleagues to join this discussion.

0 replies

xintongsong · 2026-05-12T11:46:15Z

xintongsong
May 12, 2026
Collaborator

Hi @weiqingy,

Thanks for putting this thread together — it does a good job laying out concrete use cases and the open questions to discuss. On the big picture: I think sub-agent support is worth pursuing.

That "Not likely" line of mine in #516 is actually a February take. Since then we've had a lot of conversations with team members and some users we're talking to, and my view on sub-agents has evolved. As I mentioned in the email @wenjin272 linked, if Flink Agents can support sub-agents, I see three benefits:

Better compatibility with the broader agent skill ecosystem
Better context isolation
Independently scalable shared resource pools

So putting this on the 0.3 / 0.4 roadmap makes sense.

Two quick questions before going further:

How are the Tier 0 / 1 / 2 priorities organized? Tier 2 and Tier 0 both have short notes ("likely starts as recipes", "free wins"), but Tier 1 doesn't — curious about your original intent.
By "SubAgent", do you mean another standalone Flink Agents job? I read yes from "cross-job RPC over Kafka" and "delegate to another flink-agent and await reply", but want to confirm since it affects the where-it-runs discussion below.

Now to share how we've been thinking about sub-agents.

Before getting into the supervisor + sub-agent pattern itself, I think it's worth pinning down where sub-agents actually run. There are roughly three possibilities (the first one splits into two sub-cases):

1. Supervisor and sub-agents in the same Flink Agents job

1.a In the same operator: Today's Flink Agents can already support this: you write the supervisor and each sub-agent as separate actions, each with its own prompt and toolset, and use events to invoke a sub-agent and return results. The support today isn't friendly though; users have to wire a lot of things themselves. And this approach doesn't get you the "independently scalable shared resource pool" benefit.
1.b In different operators: Sub-agent resource pools can be scaled independently, and a sub-agent can be shared across multiple supervisors in the same job. To support this, we need a way to do request & response loops between two Flink operators. The RpcOperator from FLIP-577 looks like a nice fit here.
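For readers unfamiliar with option 1.a, the following toy event loop shows the wiring involved when supervisor and sub-agent are separate actions connected only by events. It is plain Python and deliberately does not use the flink-agents API; `Event`, `run`, and the event type names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Event:
    type: str
    payload: str


Action = Callable[[Event], List[Event]]


def run(actions: Dict[str, Action], first: Event) -> List[Event]:
    """Deliver each event to the action subscribed to its type; collect the rest."""
    queue: Deque[Event] = deque([first])
    outputs: List[Event] = []
    while queue:
        event = queue.popleft()
        action = actions.get(event.type)
        if action is None:
            outputs.append(event)  # no subscriber: treat as job output
        else:
            queue.extend(action(event))
    return outputs


def supervisor_on_input(event: Event) -> List[Event]:
    # First half of the supervisor: decide to delegate.
    return [Event("classify_request", event.payload)]


def classifier_sub_agent(event: Event) -> List[Event]:
    # The sub-agent, with its own prompt and toolset in a real job.
    return [Event("classify_response", f"billing: {event.payload}")]


def supervisor_on_response(event: Event) -> List[Event]:
    # Second half of the supervisor: consume the sub-agent result.
    return [Event("output", event.payload.upper())]


actions = {
    "input": supervisor_on_input,
    "classify_request": classifier_sub_agent,
    "classify_response": supervisor_on_response,
}
results = run(actions, Event("input", "charged twice"))
```

Note how the supervisor's logic is split across two actions and three event types for a single delegation; this is the hand-wiring that a built-in implementation would hide.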

2. Supervisor and sub-agents in different Flink Agents jobs

I'm not fully sold on the use case for this shape yet. Multiple operators and agents inside a single Flink job share a lifecycle and deployment story, so if a sub-agent is required for the supervisor to run, putting them in the same job seems more natural.

My guess is you're proposing Kafka between Flink Agents jobs to handle the case where a sub-agent isn't available when the supervisor calls it? But what's the advantage over just colocating them in the same job?

That said, I can think of a legitimate scenario for multi-agent collaboration across jobs: each agent owns a dedicated responsibility along with the data / state it needs, and processes tasks coming from various requesters. Think of a company with separate procurement, sales, warehousing, logistics, and after-sales departments, where orders flow between departments without always going through a supervisor. This looks more like a system of independent services, each with its own mailbox / request queue: upstream drops tasks into Kafka, the current service picks them up, processes them, and forwards results to the downstream mailbox. Kafka fits naturally here, but this is a different architecture from supervisor-subagent.

The other way around — connecting supervisor and sub-agents via Kafka — means each supervisor / sub-agent pair needs two queues (input and output). That feels complex and not very natural.

So on your Tier 1 item 2 (async cross-job RPC pattern) and open question 4 (cross-job RPC over Kafka design), I'd suggest first clarifying the use case for cross-job, then discussing the concrete design.

3. Supervisor in a Flink Agents job, sub-agents served by an external RPC framework

This is really just an async remote call inside a custom action — the server side could be an agent or any other RPC / HTTP service, doesn't matter.

On our team's end, the main focus is 1.b (same job, different operators). This depends on RpcOperator, so it's unlikely to land in 0.3.

In parallel, for cases where the sub-agent doesn't have heavy workload, I think there's a nice opportunity for the community to make 1.a (same job, same operator) more user-friendly: a built-in supervisor + sub-agent implementation along the lines of ReActAgent — possibly even by extending ReActAgent directly — to cut down on what users have to wire by hand. This looks doable within 0.3. Once RpcOperator is ready, the same built-in can be extended to offer a choice between "sub-agent in-operator" and "sub-agent in an independent resource pool".

This built-in implementation incidentally also covers two of your Tier 2 items:

Judge / critic step (item 3): can be a built-in step inside this implementation
Richer loop termination (item 4): quality threshold, token / wall-clock / round budgets can all be exposed as config

So compared to "start as recipe / example" in Tier 0 / Tier 2, going one step further and providing a built-in implementation feels better to me — friendlier for users, and the community can iterate on a single shared implementation.

To wrap up, a few points on specific items in the proposal:

On "flink-agents leans LLM-centric" (Motivation / Tier 0 item 5): I'd push back a bit here. Looking at the single-agent orchestration design today, Flink Agents is really workflow orchestration with action and event as the basic units — calling an LLM, calling a tool, searching a vector store are all just different action types, peers to each other. Once multi-agent support lands, it'll extend to orchestration between agents. So I don't see the current design itself as LLM-centric. That said, if docs or examples are giving that impression, that's a separate matter. Happy to look at specific descriptions you find misleading and figure out how to improve them.

On sub-agent-specific primitives (open question 3): Agreed they're needed, but doing it well takes careful design, and 0.3 looks tight. Punting to the next release cycle feels safer. In the meantime, introducing the sub-agent concept on top of the ReActAgent built-in is a safer move — once Flink Agents API formally introduces sub-agent primitives, we just update the built-in, and users won't notice.

On a unified callable resource type (Tier 1 item 1): I'd hold off on this for now, no rush to abstract. Tool, REST service, and sub-agent are already familiar standalone concepts to both users and models. A unified abstraction looks cleaner conceptually, but doesn't really add capability, and the help with lowering the learning curve seems limited too. If we later hear users actually complaining about switching between the three, we can revisit then.

0 replies

armorer-labs · 2026-05-12T16:18:04Z

armorer-labs
May 12, 2026

For event-driven multi-agent systems, the primitive I would care about most is the durable run record between agents.

A supervisor/sub-agent design becomes much easier to operate if every handoff carries:

root run id
parent step id
sub-agent identity/version
input artifact references
expected output contract
timeout/retry policy
final artifact or failure reason
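As a sketch, the fields above map naturally onto an immutable record that is copied, not mutated, at each state transition. The class name, field names, and the example values are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Handoff:
    root_run_id: str
    parent_step_id: str
    sub_agent: str
    sub_agent_version: str
    input_artifacts: List[str]
    output_contract: str
    timeout_s: float
    max_retries: int
    final_artifact: Optional[str] = None
    failure_reason: Optional[str] = None

    def complete(self, artifact: str) -> "Handoff":
        """Return a new record marking success; the original stays unchanged."""
        return replace(self, final_artifact=artifact)

    def fail(self, reason: str) -> "Handoff":
        """Return a new record marking failure; the original stays unchanged."""
        return replace(self, failure_reason=reason)

    def to_json(self) -> str:
        # Stable key order keeps serialized records easy to diff and replay.
        return json.dumps(asdict(self), sort_keys=True)


handoff = Handoff(
    root_run_id="run-1",
    parent_step_id="step-3",
    sub_agent="classifier",
    sub_agent_version="1.2.0",
    input_artifacts=["ticket-42"],
    output_contract="category label as plain text",
    timeout_s=30.0,
    max_retries=2,
)
done = handoff.complete("artifact-99")
```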

That seems very aligned with Flink’s strengths: state, replay, event history, and recovery. The interesting part is avoiding “LLM call chains” as the abstraction and instead modeling agents as event-producing workers with inspectable state transitions.

0 replies

pltbkd · 2026-05-13T08:49:26Z

pltbkd
May 13, 2026

Hi, @weiqingy. Thanks a lot for raising this discussion. I'm currently investigating the multi-agent framework, including its integration with the RPC Operator proposed in FLIP-577.

The supervisor+subagent pattern is a very typical application scenario. To facilitate a more productive discussion, I suggest we separate two concerns: (1) the definition and execution of the subagent itself, and (2) the orchestration and execution of the overall workflow.

Workflow Orchestration and Execution

First, I strongly agree with your idea of a "callable resource". I believe the introduction of callable resources will bring significant and beneficial changes to the orchestration of subagents, and indeed to the overall orchestration approach and usability of flink-agents.

Historically, flink-agents has adopted an event-driven execution model, including for coordination among actions within an agent. User job orchestration has been built around this paradigm. This feels natural in a pure pipeline: each participant completes its task and hands it off to the next, without worrying about who picks it up. However, in a subagent architecture, the main agent needs to do further processing based on the subagent's execution results, and the current design becomes less user-friendly.

Actually, LLM actions face a similar issue. flink-agents requires users to split LLM input preparation and output handling into separate steps, manually subscribing to and processing events. To implement a logical operation A that calls a model, users must implement two Actions: Action-A (to produce the chat request) and Action-A' (to handle the chat response). This is not only hard to work with, but also at odds with how users expect the system to behave. For example, in the diagram below: (1) represents the user's logical intent, (2) is how the user expects the execution to flow, (3) is how flink-agents actually executes it, and (4) describes how the user may feel about the execution model, as if all user Actions serve the LLM rather than orchestrating their own business logic. I guess this is why you see flink-agents as LLM-centric.

Building on this, I've rethought flink-agents' current APIs and execution model, and arrived at conclusions very similar to the "callable resource" concept. We should provide users with a new request-response style interaction paradigm for orchestration, rather than being limited to event-triggered flows. LLM calls and subagents naturally fit the former, while the latter still holds value for decoupled orchestration and flexible subscription. The two paradigms can complement each other, and the event-driven approach can still be used for orchestrating complex subagent workflows.

Users would interact via a new call + await interface. At the implementation level, we can wrap subagents, LLMs, and other callable resources as Actions, automatically orchestrating their request/response flows: call sends the request (with framework-provided request wrapper, response dispatcher, and completion signaling), while await waits for the completion signal and retrieves the result, similar to execute_async. This approach can reuse most of our existing capabilities, including supporting parallel subagent invocations by executing actions in parallel, which is already validated. (Though minor enhancements are still needed; I'll raise a separate discussion on that.)
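A rough sketch of what a call + await style could feel like from the user's side, using Python's asyncio as a stand-in for the framework's request wrapping and completion signaling. `Caller`, `main_action`, and the two toy resources are hypothetical; the real interface would be backed by Actions and events as described above.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Assumption: a callable resource takes a request string and yields a result.
CallableResource = Callable[[str], Awaitable[str]]


class Caller:
    def __init__(self, resources: Dict[str, CallableResource]) -> None:
        self._resources = resources

    def call(self, name: str, request: str) -> "asyncio.Task[str]":
        # Sends the request and returns a handle immediately, without waiting.
        return asyncio.create_task(self._resources[name](request))


async def classifier(request: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a model call
    return "billing"


async def knowledge_search(request: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a retrieval call
    return "doc-7"


async def main_action(caller: Caller, ticket: str) -> str:
    # Two sub-agent calls are in flight at once; then both results are awaited.
    label_handle = caller.call("classifier", ticket)
    docs_handle = caller.call("knowledge_search", ticket)
    label, docs = await asyncio.gather(label_handle, docs_handle)
    return f"{label} | {docs}"


caller = Caller({"classifier": classifier, "knowledge_search": knowledge_search})
result = asyncio.run(main_action(caller, "charged twice"))
```

The point of the sketch is the shape of `main_action`: request and response handling sit in one function, in contrast to the Action-A / Action-A' split.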

Definition and Execution of the Subagent Itself

Building on the foundation above, how users define and use subagents becomes clear: a subagent can be as simple as a single Action, or a complex workflow orchestrated via event subscription; at runtime, it can be wrapped as a callable resource and directly called from the main agent, while the framework internally continues to use event-based subscription and scheduling.

However, a subagent entails more than just executing an Action or Action chain. It may also require: isolated context, an independent toolset, specialized prompts, dedicated compute resources, and more. I haven't deeply analyzed the requirements specific to subagents yet. Please feel free to share your ideas.

Regarding subagent execution: based on the approach outlined in section 1, we can already run subagents within the same TaskManager. However, due to the GIL, this model cannot support multiple subagents running concurrently. This may suffice for simple, LLM-centric logic, but for more complex scenarios, we likely need to run subagents in isolated processes or dedicated external resources—to prevent subagents from affecting the main agent's stability or competing for its compute resources.

Currently, the RPC Operator planned in FLIP-577 appears to be a promising option. As Flink's new infrastructure for AI workloads, it enables unified lifecycle and resource management at the job level, while supporting flexible, independent scaling, fault tolerance, and targeted communication optimizations. We can keep an eye on it.

0 replies

jingchang0623-crypto · 2026-05-15T12:02:30Z

jingchang0623-crypto
May 15, 2026

Great technical deep dive. Running a 5-agent team for 90+ days has taught us a few things about supervisor + sub-agent patterns that might be relevant here.

On the "same job vs cross-job" question:

Our production setup uses both:

Same-job (OpenClaw sessions_spawn): We spawn sub-agents for parallel tasks within the same context. Key insight: Pass minimal context to sub-agents, not the full conversation history. Our rule: sub-agent gets task description + relevant files, never the coordinator's full memory. This prevents context pollution.
Cross-job (GitHub Discussions as mailbox): For async collaboration, we use GitHub Discussions as a message bus. Each agent writes to a discussion thread, other agents read and respond. This is slower (minutes to hours latency) but survives crashes, works across time zones, and leaves an audit trail.

On context isolation:

The biggest win from sub-agent architecture is not scalability — it is context hygiene. A coordinator running for hours accumulates garbage context. Spawning fresh sub-agents for specific tasks gives you clean working memory. Our content creation pipeline:

Coordinator: "Write article about MCP"
  ├─ Sub-agent A: Research (fresh context)
  ├─ Sub-agent B: Write (fresh context)  
  └─ Sub-agent C: Review (fresh context)

Each sub-agent sees only its inputs, not the whole history. This reduces hallucination and improves consistency.
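The "task description + relevant files, never the full memory" rule can be sketched as a projection from coordinator state to a much smaller sub-agent context. The types and the `spawn_context` helper are hypothetical and not tied to any particular framework.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoordinatorState:
    history: List[str]      # full conversation, can grow without bound
    files: Dict[str, str]   # everything the coordinator has touched


@dataclass(frozen=True)
class SubAgentContext:
    task: str
    files: Dict[str, str]   # only what this task needs


def spawn_context(state: CoordinatorState, task: str, needed: List[str]) -> SubAgentContext:
    """Build a fresh context; the coordinator's history is never copied."""
    return SubAgentContext(task=task, files={name: state.files[name] for name in needed})


state = CoordinatorState(
    history=["hours of earlier turns", "more turns"],
    files={"notes.md": "MCP notes", "draft.md": "old draft", "todo.md": "unrelated"},
)
research_ctx = spawn_context(state, "Research MCP", ["notes.md"])
```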

On the "judge/critic" pattern:

We implemented this via a competitive pattern: 3 writer agents generate drafts, a judge agent picks the best. The judge uses different evaluation criteria than the writers. This gives better results than a single agent self-critiquing.
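In code, this competitive pattern reduces to a best-of-N selection with a scorer that is independent of the writers. The sketch below uses toy writers and a keyword-count scorer as stand-ins for real agents; `best_of` and all other names are hypothetical.

```python
from typing import Callable, List, Tuple

Writer = Callable[[str], str]
Scorer = Callable[[str, str], float]


def best_of(task: str, writers: List[Writer], score: Scorer) -> Tuple[int, str]:
    """Collect one draft per writer and return the index and text of the top-scored one."""
    drafts = [write(task) for write in writers]
    scores = [score(task, draft) for draft in drafts]
    winner = max(range(len(drafts)), key=scores.__getitem__)
    return winner, drafts[winner]


def terse_writer(task: str) -> str:
    return "MCP"


def thorough_writer(task: str) -> str:
    return "MCP is a protocol for tools"


def off_topic_writer(task: str) -> str:
    return "Something else"


def keyword_judge(task: str, draft: str) -> float:
    # The judge applies its own criteria, separate from how the writers work.
    keywords = ("mcp", "protocol", "tools")
    return float(sum(word in draft.lower() for word in keywords))


index, text = best_of(
    "Write article about MCP",
    [terse_writer, thorough_writer, off_topic_writer],
    keyword_judge,
)
```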

Cost control note:

Supervisor + sub-agent is expensive if you run it continuously. We use cron to spawn the supervisor, which then spawns sub-agents, which terminate after completing tasks. No idle agents burning tokens.

Our detailed patterns documented here: https://miaoquai.com/tools/openclaw-multi-agent-orchestration

Thanks for the FLIP-577 reference — the RpcOperator direction looks promising for Flink-native sub-agent support.

0 replies
