Build a multi-agent system on Google Kubernetes Engine (GKE) that analyzes sentiment around a topic on social media, then produces a structured report showing the major positions, the debate axes, the crossovers, the antagonisms, and which side has the stronger case on each argument.
The system uses:
- ADK for agent implementation
- A2A for agent-to-agent communication
- OpenAI API as the LLM backend for all agents
- Tavily for web/search grounding
- GKE Standard as the runtime
There are three logical roles:
- Judge Agent
- Orchestrates the run
- Decides how many worker agents to spawn
- Assigns stances
- Interrogates workers
- Aggregates all evidence and arguments
- Invokes the summarizer
- Worker Agent
- A single pod that internally performs both:
- Researcher behavior
- Advocate behavior
- The researcher side gathers evidence and identifies the strongest arguments for the assigned stance
- The advocate side turns that research into a persuasive defense of the stance
- A single pod that internally performs both:
- Summarizer Agent
- Takes the judge’s aggregate output and produces the final polished report
- The judge runs in its own pod and deployment.
- Each worker runs in its own pod.
- The summarizer runs in its own pod.
Even though researcher and advocate are conceptually distinct, they are implemented together in the same worker pod.
Use Redis only.
There are no external IPs for individual containers. All agent communication must happen internally inside the cluster using Kubernetes Services and cluster DNS.
Use Tavily for search grounding in two places:
- The judge uses Tavily during planning to estimate how broad or fragmented the topic is and to decide how many workers to spawn.
- Each worker uses Tavily during research to gather supporting evidence, counterarguments, and examples for its assigned stance.
- GKE Standard
- Artifact Registry for container images
- Cloud Build for building images
- Cloud Deploy for promotion between environments
- Workload Identity Federation for GKE for GCP authentication from pods
- Redis for coordination and run artifacts
- ADK for agent logic and orchestration
- A2A for inter-agent communication
- OpenAI API for model inference
- Tavily API for search/retrieval
- Deployments for judge and summarizer
- Jobs for workers
- Services of type ClusterIP for all agent endpoints
- NetworkPolicies to restrict lateral access
- Secrets for API keys
- ConfigMaps for prompts and static settings
Redis is used as the single shared coordination layer.
- Run metadata
run_id- topic
- status
- timestamps
- spawned stance list
- worker roster
- Agent outputs
- raw worker results
- judge intermediate summaries
- final summarizer output
- Task state
- pending / running / complete / failed
- worker heartbeats if desired
- retries
- cancellation flags
- Temporary coordination structures
- locks
- queues
- per-run maps
- dedupe keys
The judge writes its intermediate and aggregate state to Redis, keyed by run:
run:{run_id}:judge:planrun:{run_id}:judge:interrogationrun:{run_id}:judge:aggregate
Each worker writes a single structured JSON result to Redis:
run:{run_id}:worker:{worker_id}:resultrun:{run_id}:worker:{worker_id}:status
The summarizer writes the final report to Redis:
run:{run_id}:final_reportrun:{run_id}:final_report_md
Use separate namespaces:
judgeworkerssummarizersystem
Type: Deployment
The judge is a long-running service that accepts a topic and controls the full workflow.
Type: Kubernetes Job
Each spawned worker is a one-off job that handles exactly one stance for exactly one run.
Type: Kubernetes Job
The summarizer runs after worker aggregation is complete.
Because there are no external IPs, every agent exposes its A2A endpoint through a ClusterIP Service.
Use internal service DNS names such as:
judge.judge.svc.cluster.localworker-<run-id>-<stance>.workers.svc.cluster.localsummarizer.summarizer.svc.cluster.local
Kubernetes services of type ClusterIP provide a stable internal IP for internal clients, and GKE service discovery maps service names to those IPs through cluster DNS. That is the right exposure mechanism for A2A inside the cluster.
Each agent should expose:
/.well-known/agent-card.jsonPOST /a2a/messagesor the framework’s equivalent message endpointGET /a2a/tasks/{task_id}POST /a2a/tasks/{task_id}/cancel
The exact path implementation can follow the ADK/A2A library defaults, but the service must advertise a resolvable internal URL in its Agent Card.
For each worker Job:
- Create a dedicated ClusterIP Service.
- Label the worker pod with the run/stance identifiers.
- Point the Service selector at that pod.
- Let the judge call the worker over internal DNS.
This avoids any need for external ingress, load balancers, or public IPs.
Allow:
- judge → workers
- judge → summarizer
- all agents → Redis
- all agents → Tavily API egress
- all agents → OpenAI API egress
Deny:
- worker → worker lateral traffic
- unsolicited inbound traffic from outside the namespace boundaries
A2A is used as the agent communication layer.
The judge acts as the A2A client and each worker acts as an A2A server.
The worker publishes an Agent Card describing:
- name
- description
- capabilities
- preferred endpoint URL
- supported transport
The judge uses that card to send tasks and fetch task state.
The summarizer is also an A2A server. The judge sends the aggregate run object to the summarizer and retrieves the polished final report.
A2A only needs a reachable HTTP endpoint, not a public endpoint. In this design, the endpoint is reachable through internal cluster DNS and ClusterIP Services, so the judge can talk to workers and the summarizer entirely inside GKE.
The user submits a topic.
Example:
- “Taylor Swift social sentiment right now”
- “Sentiment around the new policy announcement”
The judge creates a run_id.
The judge performs a short Tavily search pass to estimate:
- how much conversation exists
- whether the conversation is polarized
- what the major axes of disagreement are
- how many workers to spawn
The judge should use a small, bounded search budget so planning stays fast.
The judge outputs a stance plan such as:
- positive
- negative
- mixed
- skeptical
- partisan subgroups
- niche/fringe positions
The exact number of workers is dynamic and can vary by topic.
For each stance, the judge creates a worker Job and a matching internal ClusterIP Service.
Each worker receives:
RUN_IDSTANCE_IDSTANCE_LABELTOPIC- Redis connection info
- OpenAI credentials
- Tavily credentials
The worker first behaves like a researcher:
- uses Tavily search
- gathers evidence
- identifies dominant arguments
- finds rebuttals against the stance
- identifies supporting communities or audience clusters
The same worker then behaves like an advocate:
- strengthens the stance
- packages evidence into a persuasive argument set
- anticipates judge questions
- identifies what would most damage its side
The worker writes a structured JSON result to Redis and marks itself complete.
The judge asks each worker targeted questions such as:
- How popular is your stance?
- What is your strongest point?
- This other agent said [something], what do you have to say to that?
- What do you think of [this piece of evidence]?
It should make decisions on what to ask each agent and in what order based on how it feels the debate is evolving. Note that the research agents do not see each others’ outputs directly.
Judge determines after each round of interrogation (on a specific topic) if there is a clear winner → if not, keeps prompting agents for refutation/rebuttal args → max number of rounds of debate for a specific argument/point is a specific parameter to prevent endless loop of debate.
When it is satisfied, the judge combines all worker results into a single run-level intermediate object.
The judge sends the aggregate object to the summarizer.
The summarizer writes the polished final report to Redis and returns the rendered report.
Each worker must return structured JSON, not freeform prose.
{ "run_id": "...", "stance_id": "...", "stance_label": "positive | negative | mixed | fringe | other", "summary": "...", "top_claims": [ { "claim": "...", "supporting_evidence": ["..."], "rebuttals": ["..."], "confidence": 0.0, "popularity_estimate": "low | medium | high" } ], "crossover_positions": ["..."], "antagonistic_positions": ["..."], "fringe_positions": ["..."], "consensus_points": ["..."], "axes_of_debate": ["..."], "key_sources": ["..."], "confidence": 0.0 }
{ "run_id": "...", "topic": "...", "stances": ["..."], "controversy_level": "low | medium | high", "agreement_matrix": [], "rebuttal_graph": [], "shared_ground": [], "fringe_positions": [], "conversation_locus_shift": "...", "judge_notes": "..." }
The summarizer should produce:
- structured JSON for machine use
- markdown for human reading
The markdown report should include:
- TOPIC
- DEGREE OF CONTROVERSY
- POSITIVE POSITIONS
- NEGATIVE POSITIONS
- ANALYSIS
- POSITIONS THAT HAVE CROSSOVER
- ANTAGONISTIC POSITIONS
- RECOGNIZED SOCIAL COHESION
- HAS THE LOCUS OF CONVERSATION CHANGED OVER TIME?
- FRINGE POSITIONS
- CONSENSUS
- AXES OF DEBATE
The judge must:
- stay neutral
- identify the debate structure
- decide how many workers to spawn
- ask probing questions
- prevent the system from collapsing into one-sided summaries
- prefer structured outputs
Each worker must internally split into two phases:
- skeptical evidence gathering
- Tavily search
- extraction of facts, claims, and communities
- identification of opposing arguments
- defend assigned stance
- steelman the stance
- organize the argument into a persuasive case
- prioritize clarity and internal consistency
The worker should not claim certainty where it does not have it.
The summarizer must:
- compress the judge’s verbose aggregate output
- preserve the actual argumentative structure
- avoid inventing new claims
- produce a clean final report
Recommended key schema:
run:{run_id}:metarun:{run_id}:judge:planrun:{run_id}:judge:aggregaterun:{run_id}:worker:{worker_id}:statusrun:{run_id}:worker:{worker_id}:resultrun:{run_id}:summarizer:statusrun:{run_id}:final_reportrun:{run_id}:final_report_md
Use Redis hashes or JSON values consistently. Pick one canonical representation and stick to it.
Store in Kubernetes Secrets:
- OpenAI API key
- Tavily API key
- Redis auth, if enabled
Use Workload Identity Federation for GKE so workloads can authenticate to Google Cloud without service account keys.
- Judge can create and read worker state.
- Workers can write only their own result keys.
- Summarizer can read the aggregate and write the final report.
- Developer pushes code
- Cloud Build builds the container images
- Images are pushed to Artifact Registry
- Cloud Deploy promotes them to GKE
- Judge deployment is updated first
- Then worker/summarizer images are rolled out by the judge using Kubernetes API
- Redis remains separate and persistent
- judge service
- worker service template
- summarizer service
- Dockerfiles
- Kubernetes manifests
- Services
- Jobs
- NetworkPolicies
- Secrets templates
- Redis connection configuration
- OpenAI wrapper
- Tavily wrapper
- A2A server/client wrapper
- Redis state manager
- Judge planner
- Worker research/advocate pipeline
- Summarizer pipeline
Build in this order:
- Judge service with a single worker
- Worker Job with Redis output
- Internal A2A via ClusterIP Service
- Multiple workers from dynamic stance planning
- Summarizer Job
- Final report formatting
- Judge is separate from workers
- Researcher and advocate are one pod, one worker agent
- Summarizer is always present
- Redis is the only shared persistence layer
- No external IPs per container
- Tavily is used by both judge and workers
- OpenAI is the LLM backend (will be changed later on so keep it generalizable)
- A2A runs over internal Kubernetes Services only