SOLARIS-JHU/Multi-Agent-DPC

Multi-Agent Differentiable Predictive Control for Zero-Shot PDE Scalability

"You speak for the whole planet, do you? For the common consciousness of every dewdrop, of every pebble, of even the liquid central core of the planet?"

"I do, and so can any portion of the planet in which the intensity of the common consciousness is great enough."

Isaac Asimov, Foundation and Earth

This project introduces and experiments with a decentralized control framework for systems described by PDEs. Using Tesseract-JAX to implement the PDE solver as a differentiable layer, we apply the Differentiable Predictive Control framework so that autonomous agents can interact with the physical field for trajectory tracking.

This project was ideated and evaluated by Pietro Zanotta¹, Dibakar Roy¹ and Honghui Zheng¹ as part of the Tesseract Hackathon 2025.

Contacts:

¹ Shared first authorship

[Figure: Heat 2d Decentralized]


Key Features

  • Differentiable Operator Learning for Control: We recast policy synthesis for PDE systems as an operator learning problem using the DeepONet framework. By treating the PDE solver as a differentiable layer through the Tesseract differentiable programming library, we compute exact sensitivity gradients, which are then used for policy optimization within the Differentiable Predictive Control framework.
  • Zero-Shot Scalability: Policies trained on a fixed swarm size $N$ generalize to unseen cardinalities $M$ (e.g., training on 20 agents and deploying on 60) without further tuning, allowing resilience to actuator failure.
  • Communication-Free Coordination: We test the scenario where agents operate with local-only sensing and no inter-agent communication. In this setting we observe an emergent self-normalization property, arising from stigmergic interaction, that prevents overactuation.
  • Theoretical Gradient Consistency: We provide a theorem ensuring that discrete policy gradients converge to the mean-field limit as the swarm size $N \rightarrow \infty$.
  • Parameter Efficiency: In our toy examples, the decentralized approach utilizes 48% fewer parameters in the 1d cases and 76% fewer in the 2d case than centralized benchmarks while maintaining competitive performance.
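
The zero-shot scalability feature can be pictured with a minimal sketch in which every agent evaluates the same DeepONet-style policy. The layer sizes, the helper names (`init_params`, `agent_policy`, `swarm_policy`) and the use of `jax.vmap` are assumptions made for illustration; the architecture in `models/policy.py` may differ.

```python
import jax
import jax.numpy as jnp

def init_params(key, branch_in=40, trunk_in=1, width=16, latent=8):
    """Random weights for a tiny branch/trunk pair (sizes are illustrative assumptions)."""
    k1, k2, k3, k4 = jax.random.split(key, 4)
    scale = 0.1
    return {
        "branch": (scale * jax.random.normal(k1, (branch_in, width)),
                   scale * jax.random.normal(k2, (width, latent))),
        "trunk": (scale * jax.random.normal(k3, (trunk_in, width)),
                  scale * jax.random.normal(k4, (width, latent))),
    }

def mlp(weights, x):
    """Two-layer perceptron with a tanh hidden layer."""
    w1, w2 = weights
    return jnp.tanh(x @ w1) @ w2

def agent_policy(params, local_obs, position):
    """One agent: the branch encodes local field samples, the trunk encodes its position."""
    b = mlp(params["branch"], local_obs)
    t = mlp(params["trunk"], position)
    return jnp.dot(b, t)  # scalar control u_i

# The same parameters are shared by every agent, so the swarm size is not baked in.
swarm_policy = jax.vmap(agent_policy, in_axes=(None, 0, 0))

params = init_params(jax.random.PRNGKey(0))
for n_agents in (20, 60):
    obs = jnp.ones((n_agents, 40))                       # placeholder local observations
    pos = jnp.linspace(0.0, 1.0, n_agents).reshape(n_agents, 1)
    u = swarm_policy(params, obs, pos)
    print(n_agents, u.shape)
```

Because the parameters do not depend on the number of agents, the same `params` produce one control per agent for both 20 and 60 agents. Whether the resulting behavior is any good at the new size is an empirical question that this sketch does not address.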

For a more rigorous discussion about all the above points we suggest reading through our technical document.



About this Project

This research explores the intersection of Differentiable Programming, Operator Learning, and Swarm Intelligence. We demonstrate that treating a PDE solver as a neural network layer allows for the training of highly efficient, decentralized control policies. In this section we provide a brief introduction to the problem formulation. For a more rigorous discussion we refer to our technical document.

Problem Statement

The control objective is to find an optimal control sequence $U(t) = \lbrace u_i(t) \rbrace_{i=1}^N$ and velocity sequence $V(t) = \lbrace v_i(t) \rbrace_{i=1}^N$ that minimize a cost functional $\mathcal{J}$ composed of a tracking cost $\mathcal{L}_{track}(z, z_{ref})$, a term $\mathcal{L}_{force}(u)$ discouraging large energy consumption, and a term $\mathcal{L}_{coll}(\xi)$ that prevents collisions between the actuators:

$$\min_{U,V} \mathcal{J} = \mathbb{E}_{z_0 \sim \mathcal{D}} \left[ \int_{0}^{T} \left( \mathcal{L}_{track}(z, z_{ref}) + \lambda_u \mathcal{L}_{force}(u) + \lambda_c \mathcal{L}_{coll}(\xi) \right) dt \right]$$

where $\xi = \lbrace \xi_i \rbrace_{i=1}^N$ collects the actuator positions, $\xi_i$ being the position of the $i$-th actuator.
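
As a rough illustration of how the cost functional above could be discretized, the sketch below uses mean-squared tracking and effort terms and a hinge-squared pairwise collision penalty. These specific loss forms, the weights and the minimum distance `d_min` are assumptions, not the definitions used in the repository.

```python
import jax.numpy as jnp

def tracking_loss(z, z_ref):
    """Mean-squared deviation of the field from the reference."""
    return jnp.mean((z - z_ref) ** 2)

def force_loss(u):
    """Mean-squared control effort."""
    return jnp.mean(u ** 2)

def collision_loss(xi, d_min=0.05):
    """Penalize pairs of 1D actuators closer than d_min (hinge-squared, assumed form)."""
    dist = jnp.abs(xi[:, None] - xi[None, :])
    off_diag = 1.0 - jnp.eye(xi.shape[0])
    # Each unordered pair appears twice in the matrix, hence the division by 2.
    return jnp.sum(off_diag * jnp.maximum(d_min - dist, 0.0) ** 2) / 2.0

def running_cost(z, z_ref, u, xi, lam_u=1e-2, lam_c=1.0):
    """Integrand of J: tracking + lambda_u * force + lambda_c * collision."""
    return tracking_loss(z, z_ref) + lam_u * force_loss(u) + lam_c * collision_loss(xi)

def total_cost(z_traj, z_ref, u_traj, xi_traj, dt):
    """Rectangle-rule approximation of the time integral over the stored steps."""
    costs = [running_cost(z, z_ref, u, xi)
             for z, u, xi in zip(z_traj, u_traj, xi_traj)]
    return dt * jnp.sum(jnp.stack(costs))
```

The expectation over initial conditions $z_0 \sim \mathcal{D}$ would then be approximated by averaging `total_cost` over a batch of sampled initial fields.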

System Dynamics (PDE): The state field $z(x,t)$ evolves according to a non-homogeneous, possibly nonlinear partial differential equation:

$$\frac{\partial z(x,t)}{\partial t} = \mathcal{A}(z; \mu) + \mathcal{B}(x,t)$$

where the total forcing $\mathcal{B}(x,t)$ is the superposition of individual actuator contributions filtered through a spatial Gaussian kernel $b(x, \xi_i)$:

$$\mathcal{B}(x,t) = \sum_{i=1}^{N} b(x, \xi_i(t)) u_i(t)$$
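
The superposition above can be evaluated on a 1D grid in a few lines. This is a minimal sketch: the kernel is taken to be an unnormalized Gaussian and the width `sigma` is a hypothetical value, since neither is specified here.

```python
import jax.numpy as jnp

def gaussian_kernel(x, xi, sigma=0.05):
    """b(x, xi): unnormalized Gaussian bump centered at the actuator position xi."""
    return jnp.exp(-0.5 * ((x - xi) / sigma) ** 2)

def total_forcing(x, xi, u, sigma=0.05):
    """B(x) = sum_i b(x, xi_i) * u_i for grid x (n_x,), positions xi (N,), controls u (N,)."""
    kernels = gaussian_kernel(x[None, :], xi[:, None], sigma)  # shape (N, n_x)
    return u @ kernels                                         # shape (n_x,)

x = jnp.linspace(0.0, 1.0, 101)
xi = jnp.array([0.25, 0.75])
u = jnp.array([1.0, -2.0])
B = total_forcing(x, xi, u)
print(B[25], B[75])  # field values at the two actuator positions
```

With well-separated actuators as in this example, the forcing at each actuator location is approximately that actuator's own control value (about 1.0 and -2.0 here), because the other kernel has decayed to nearly zero.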

Actuator Kinematics: Each mobile actuator $i \in \lbrace 1, \dots, N \rbrace$ follows first-order integrator dynamics:

$$\frac{d\xi_i(t)}{dt} = v_i(t), \quad \xi_i(0) = \xi_{i,0}$$

Constraints:

  • Control Saturation: $|u_i(t)| \le u_{\max}$
  • Kinematic Limits: $|v_i(t)| \le v_{\max}$
  • Boundary Containment: $\xi_i(t) \in \Omega$
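
One common way to respect the kinematics and the three constraints above is to squash raw policy outputs with `tanh` and clip positions to the domain after a forward Euler step. The sketch below assumes a 1D domain $\Omega = [0, 1]$ and hypothetical limits; the repository may enforce the constraints differently (for example through penalties).

```python
import jax.numpy as jnp

def saturate(raw, limit):
    """Smoothly map an unconstrained value into [-limit, limit]."""
    return limit * jnp.tanh(raw)

def step_actuators(xi, raw_u, raw_v, dt, u_max=1.0, v_max=0.5, domain=(0.0, 1.0)):
    """Advance actuator positions by one forward Euler step under the constraints."""
    u = saturate(raw_u, u_max)   # control saturation |u_i| <= u_max
    v = saturate(raw_v, v_max)   # kinematic limit |v_i| <= v_max
    # Forward Euler on d(xi)/dt = v, followed by boundary containment.
    xi_next = jnp.clip(xi + dt * v, domain[0], domain[1])
    return xi_next, u, v
```

Note that `jnp.clip` has zero gradient wherever it is active, so a smooth barrier term may be preferable if gradients with respect to positions near the boundary matter during training.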

Differentiable Predictive Control

[Figure: Multi-Agentic DPC]

To synthesize a policy approximating the optimal control sequence $U(t) = \lbrace u_i(t) \rbrace_{i=1}^N$ and velocity sequence $V(t) = \lbrace v_i(t) \rbrace_{i=1}^N$ we rely on Differentiable Predictive Control. In our framework, the control policy is parameterized by a neural operator $\mathcal{G}_{\theta}$ that maps current observations to optimal actions. During training, we perform the following steps:

  • Forward Pass: The current state $z_k$ and control actions $u_k$ are passed through a differentiable operator $\Psi$ (the PDE solver) to predict the future state $z_{k+1}$. The solver is built with Tesseract, which is what makes the simulation differentiable.
  • Sensitivity Analysis: By applying the chain rule through the solver, we compute exact sensitivity gradients of the future state with respect to the policy parameters $\theta$.
  • Policy Optimization: These gradients are used to update the neural network, minimizing the total loss $\mathcal{J}$ over a trajectory of length $K$.
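
The three steps above can be sketched end to end on a toy problem. Everything here is an illustrative assumption: the solver $\Psi$ is replaced by a plain-JAX explicit heat step rather than a Tesseract, the actuators are fixed in place, the policy is a single linear layer, and the optimizer is plain gradient descent.

```python
import jax
import jax.numpy as jnp

N_X, DT, DIFF = 32, 1e-3, 0.01
X = jnp.linspace(0.0, 1.0, N_X)
DX = X[1] - X[0]
XI = jnp.array([0.3, 0.7])                                          # fixed actuators
KERNELS = jnp.exp(-0.5 * ((X[None, :] - XI[:, None]) / 0.1) ** 2)   # shape (2, N_X)
Z_REF = jnp.sin(jnp.pi * X)

def solver_step(z, u):
    """Stand-in for Psi: explicit heat step, zero Dirichlet boundaries, forcing u @ KERNELS."""
    lap = (jnp.roll(z, 1) - 2.0 * z + jnp.roll(z, -1)) / DX**2
    z_next = z + DT * (DIFF * lap + u @ KERNELS)
    return z_next.at[0].set(0.0).at[-1].set(0.0)

def policy(theta, z):
    """Toy linear policy mapping the tracking error to bounded controls."""
    return 5.0 * jnp.tanh(theta @ (Z_REF - z))

def rollout_loss(theta, z0, horizon=50):
    """Forward pass: unroll policy and solver for `horizon` steps, averaging the cost."""
    def body(z, _):
        u = policy(theta, z)
        z_next = solver_step(z, u)
        cost = jnp.mean((z_next - Z_REF) ** 2) + 1e-3 * jnp.mean(u ** 2)
        return z_next, cost
    _, costs = jax.lax.scan(body, z0, None, length=horizon)
    return jnp.mean(costs)

# Sensitivity analysis: reverse-mode differentiation through the whole rollout.
loss_and_grad = jax.jit(jax.value_and_grad(rollout_loss))

theta = jnp.zeros((2, N_X))
z0 = jnp.zeros(N_X)
losses = []
for _ in range(100):                      # policy optimization by gradient descent
    loss, grad = loss_and_grad(theta, z0)
    theta = theta - 0.05 * grad
    losses.append(float(loss))
print(losses[0], losses[-1])
```

The design point is that the gradient flows through `solver_step` at every time step, so the policy is trained against the closed-loop trajectory rather than against one-step targets.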

Note that part of the theoretical results on Zero-Shot Scalability rely on a conjecture that we are only empirically validating. For a more rigorous discussion about all the above points we suggest reading through our technical document.

Algorithm pseudocodes can be found below:

  • Centralized Policy Pseudocode:

[Figure: pseudocode of the centralized policy]

  • Decentralized Policy Pseudocode:

[Figure: pseudocode of the decentralized policy]


Numerical Experiments

The framework was validated on two primary physical systems:

  1. Linear Heat Equation: Focuses on temperature tracking and heat spreading.
  2. Nonlinear Fisher-KPP Equation: Models population dynamics and chemical fronts, where agents must overcome natural growth to achieve stability.
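
To make the "natural growth" in the Fisher-KPP case concrete, here is a minimal sketch of one explicit time step of the standard form $z_t = D z_{xx} + r z (1 - z) + \mathcal{B}$ on a periodic grid. The coefficients, grid and time step are hypothetical, and the solvers under `tesseracts/` may use a different discretization.

```python
import jax.numpy as jnp

def fkpp_step(z, forcing, dt=2e-3, dx=0.02, diff=0.01, rate=5.0):
    """One explicit Euler step of z_t = diff * z_xx + rate * z * (1 - z) + forcing."""
    lap = (jnp.roll(z, 1) - 2.0 * z + jnp.roll(z, -1)) / dx**2   # periodic Laplacian
    return z + dt * (diff * lap + rate * z * (1.0 - z) + forcing)

# Without any control, a small uniform population grows toward the stable state z = 1.
z = jnp.full(50, 0.1)
for _ in range(1000):
    z = fkpp_step(z, jnp.zeros(50))
print(float(z.mean()))
```

Setting `rate=0.0` recovers a forced linear heat step, which is one way to see the two test systems as members of the same family. With a positive rate, any controller that wants to hold the field below 1 has to keep counteracting the logistic growth term.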

Performance Summary

| Metric | Heat 1d (Centr.) | Heat 1d (Decentr.) | Heat 2d (Centr.) | Heat 2d (Decentr.) | Fisher-KPP (Centr.) | Fisher-KPP (Decentr.) |
| --- | --- | --- | --- | --- | --- | --- |
| Branch Input Dim | 200 | 40 | 1024 | 144 | 200 | 40 |
| Total Parameters | 21,794 | 11,298 | 2,116,003 | 158,531 | 21,794 | 11,298 |
| Final Tracking Loss | 5.2e-3 | 6.4e-3 | 7.8e-3 | 9.0e-3 | 7.0e-3 | 8.3e-3 |
| Scalability | Zero-shot | Zero-shot | Zero-shot | Zero-shot | Zero-shot | Zero-shot |
| Communication | Global | None | Global | None | Global | None |
| Training Time (500 ep.) | ~1 min | ~1 min | ~4 min | ~3 min | ~3 min | ~3 min |

Structure of this Repository

tesseract-hackathon/
├── examples/                       # High-level scripts for specific PDE problems
│   ├── fkpp1d/                     # Fisher-KPP 1D reaction-diffusion examples
│   │   ├── centralized/            # Training and visualization for global control
│   │   └── decentralized/          # Multi-agent/local control versions
│   ├── heat1d/                     # 1D Heat Equation examples
│   │   ├── centralized/            
│   │   └── decentralized/          
│   └── heat2D/                     # 2D Heat Equation examples   
│       ├── centralized/
│       └── decentralized/
│
├── models/                         # Core neural network architectures
│   └── policy.py                   # JAX implementation of the DPC policies
│
├── tesseracts/                     # The "Legacy" Simulator Wrappers
│   ├── solverFKPP_.../             # Solvers specifically for FKPP problems
│   ├── solverHeat_.../             # Solvers specifically for Heat problems (both 1d and 2d)
│   │   ├── solver.py               # The underlying physics engine logic
│   │   ├── tesseract_api.py        # Interface defining 'apply' and 'vjp' for JAX
│   │   └── tesseract_config.yaml
│   └── ...
│
├── requirements.txt                # Python dependencies
└── README.md                       # Project documentation

Getting Started

Prerequisites

  1. Clone the repository:
git clone https://github.com/PietroZanotta/Multi-Agent-DPC
cd Multi-Agent-DPC
  2. Set up a Python virtual environment:
python -m venv .venv

Activate the virtual environment:

  • Linux/MacOS:
source .venv/bin/activate
  • Windows (PowerShell):
.venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Tip

If you have GPU access and want to accelerate training, also install JAX with CUDA:

pip install "jax[cuda12]"
  4. Verify Tesseract installation:
which tesseract  # Linux/MacOS
# or
where tesseract  # Windows

Note

For Mac Users: If tesseract build conflicts with the Tesseract OCR binary, use the full path:

/path/to/venv/bin/tesseract build .

Building Tesseract Solvers

Build the differentiable PDE solvers (required only once). This step containerizes each solver with its neural network policy as a differentiable layer.

# Build Heat Equation (1D)
cd tesseracts/solverHeat_centralized && tesseract build .
cd ../solverHeat_decentralized && tesseract build .

# Build Fisher-KPP (1D reaction-diffusion)
cd ../solverFKPP_centralized && tesseract build .
cd ../solverFKPP_decentralized && tesseract build .

# Build Heat Equation (2D)
cd ../solverHeat2D_centralized && tesseract build .
cd ../solverHeat2D_decentralized && tesseract build .

# Return to project root
cd ../..

Note

Each tesseract build command creates a Docker image containing the PDE solver and its trained policy. Subsequent builds are cached. You can verify built images with:

docker images | grep solver

Quick Start: Visualizing Pre-trained Models

Pre-trained policy weights are included, so you can visualize results immediately without training:

Heat Equation - 1D

Centralized policy (global sensing):

cd examples/heat1d/centralized
python visualize_conference.py
# Generates: heat_dpc_visualization_*.png, heat_dpc_agents_*.png

Decentralized policy (local sensing, communication-free):

cd ../decentralized
python visualize_conference.py
# Generates: heat_dpc_decentralized_visualization_*.png

Example output (centralized):

Fisher-KPP Equation - 1D

Centralized policy:

cd ../../fkpp1d/centralized
python visualize_conference.py
# Generates: fkpp_dpc_visualization_*.png

Decentralized policy:

cd ../decentralized
python visualize_conference.py
# Generates: fkpp_dpc_decentralized_visualization_*.png

Heat Equation - 2D

Centralized policy:

cd ../../heat2D/centralized
python visualize.py
# Generates: heat2d_centralized_visualization.png/pdf

Decentralized policy:

cd ../decentralized
python visualize.py
# Generates: heat2d_decentralized_visualization.png/pdf

Example output (2D Heat - centralized):


Optional: Generate Animations

Create animated trajectories (.gif and .mp4) demonstrating the policy performance:

# Heat 1D - Centralized
cd examples/heat1d/centralized && python animate.py
# Generates: heat_dpc_animation.gif, heat_dpc_animation.mp4

# Fisher-KPP - Decentralized
cd ../../fkpp1d/decentralized && python animate.py
# Generates: fkpp_dpc_animation.gif, fkpp_dpc_animation.mp4

# Heat 2D - Centralized
cd ../../heat2D/centralized && python animate.py
# Generates: heat2d_animation.gif, heat2d_animation.mp4

Note

Animation generation requires FFmpeg. On most systems:

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows (with Chocolatey)
choco install ffmpeg

Example animations:

  • Fisher-KPP - Centralized:

  • Fisher-KPP - Decentralized:

  • Heat 2D - Centralized:

  • Heat 2D - Decentralized:


Optional: Train Custom Policies

To train policies on new datasets or modify architectures, use the training scripts. This requires significant compute (GPU recommended):

Make sure you are at project root:

# Example: Train 1D Heat centralized policy
cd examples/heat1d/centralized
python train.py                # Generates dataset and trains for 500 epochs (saves centralized_params.msgpack)
python visualize_conference.py # Visualize results
python animate.py              # Create animated trajectories

Full workflow for all experiments:

Make sure you are at project root:

# Heat 1D
for variant in centralized decentralized; do
  cd examples/heat1d/$variant
  python train.py && python visualize_conference.py && python animate.py
  cd ../../..
done

# Fisher-KPP 1D
for variant in centralized decentralized; do
  cd examples/fkpp1d/$variant
  python train.py && python visualize_conference.py && python animate.py
  cd ../../..
done


# Heat 2D
for variant in centralized decentralized; do
  cd examples/heat2D/$variant
  python train.py && python visualize.py && python animate.py
  cd ../../..
done

Advanced: Analyzing Scalability & Stigmergy

For decentralized policies, explore the self-normalization property and zero-shot scalability empirically:

Make sure you are at project root:

cd examples/fkpp1d/decentralized

# Analyze control effort across different effort penalty weights
python visualize_lambda_effort.py
# Tests how control effort scales as the number of agents increases beyond training size.
# Validates the self-normalization conjecture: individual control efforts u_i ~ O(1/N),
# so the total forcing norm ||B|| remains bounded as N increases.

# Test zero-shot scalability: deploy policy trained on N agents on M agents (M ≠ N)
python visualize_comparison.py
# Evaluates tracking MSE and control effort as agent count varies from training conditions.
# Demonstrates that policies generalize to unseen swarm sizes without retraining.
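
The self-normalization claim in the comments above can be sanity-checked numerically without any trained policy. The sketch below hand-sets $u_i = c/N$ for evenly spaced agents and measures the norm of the resulting forcing; the kernel width, grid and total effort `c` are assumptions, and this only illustrates why $O(1/N)$ controls keep $\|\mathcal{B}\|$ bounded, not that a trained policy behaves this way.

```python
import jax.numpy as jnp

def forcing_norm(n_agents, total_effort=1.0, sigma=0.05, n_x=401):
    """Discrete L2 norm of B when every agent applies u_i = total_effort / N."""
    x = jnp.linspace(0.0, 1.0, n_x)
    xi = (jnp.arange(n_agents) + 0.5) / n_agents           # evenly spaced agents
    u = jnp.full(n_agents, total_effort / n_agents)        # u_i ~ O(1/N)
    kernels = jnp.exp(-0.5 * ((x[None, :] - xi[:, None]) / sigma) ** 2)
    forcing = u @ kernels
    return jnp.sqrt(jnp.mean(forcing ** 2))

for n in (20, 40, 60, 120):
    print(n, float(forcing_norm(n)))
```

The printed norms stay nearly constant across swarm sizes, since the sum over agents approximates a fixed integral of the kernel over the domain. If each agent instead kept an $O(1)$ control, the norm would grow linearly with the number of agents.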

Troubleshooting

| Issue | Solution |
| --- | --- |
| Image solver_X:latest not found | Run tesseract build tesseracts/solverX/ first |
| tesseract command not found on Mac | Use full path: /path/to/venv/bin/tesseract build . |
| Training is slow on CPU | Install jax[cuda12] and verify GPU is detected: python -c "import jax; print(jax.devices())" |
| Out of memory errors | Reduce batch_size in train.py (default: 32) |
| Animations won't generate | Install FFmpeg (see section above) |

Future Work

There are various research directions we believe can stem from this project. Here is a list of the ones we believe are the most promising:

  • Understanding the benefits and limitations of casting policy synthesis as an operator learning problem.
  • Extending our theoretical analysis to a wider class of PDEs and formally proving our self-normalization conjecture.
  • Implementing shared-memory strategies (e.g. /dev/shm) to minimize the serialization cost of communication between the Python script and the Tesseract during policy training.

Tech Stack

  • Processor: Intel Core Ultra 9 275HX (24 cores, up to 5.4 GHz)
  • GPU: NVIDIA GeForce RTX 5090 Laptop GPU (24GB GDDR7 VRAM)
  • Operating System: Ubuntu 22.04 running under Windows Subsystem for Linux (WSL2)
  • Main Frameworks: JAX (v0.8.1) for numerical computing; Tesseract-JAX (v0.2.2) for differentiable PDE solvers
  • Hardware Acceleration: CUDA backend with NVIDIA driver v581.57

See our technical document for details about our experimental setup.


About

Multi-Agent Differentiable Predictive Control for Zero-Shot PDE Scalability. Won the 1st place @ Tesseract Hackathon 2025
