Skip to content

Multi-Agent Reinforcement Learning for Dynamic Resource Collection Optimization: A Dyson Swarm Simulation

Notifications You must be signed in to change notification settings

UmbertoFasci/MARL-DYSON

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 

Repository files navigation

MARL-Dyson

Multi-Agent Reinforcement Learning for Dynamic Resource Optimization: A Dyson Swarm Simulation

Project Status: Early Development

This project is actively under development. Features and documentation will be updated regularly.

Overview

MARL-Dyson simulates resource optimization using multi-agent reinforcement learning. It models autonomous agents optimizing their positions around a central energy source, inspired by the concept of a Dyson swarm.

Energy Distribution Generation

The energy distribution is generated by creating a uniform random field across a spherical coordinate grid, with values between 0 and 1. A Gaussian smoothing filter is then applied to this random field, creating continuous regions of varying energy levels. The smoothing parameter controls the transition gradient between these regions. The final distribution is normalized to ensure all values remain in the [0,1] range.

Discretization of the Spherical Domain

theta = np.linspace(0, np.pi, resolution)
phi = np.linspace(0, 2*np.pi, resolution)
self.theta, self.phi = np.meshgrid(theta, phi)

This creates a discretized grid over the entire spherical surface using standard spherical coordinates:

  • θ ∈ [0, π] spans from the north pole (θ=0) to south pole (θ=π)
  • φ ∈ [0, 2π] covers the full longitudinal rotation

The choice of uniform discretization with resolution points enables a computational tradeoff: higher values provide better spatial accuracy at the cost of increased computational complexity (O(resolution²)).

Stochastic Field Generation

random_field = np.random.rand(self.resolution, self.resolution)

This initializes a random field following a uniform distribution U(0,1). Mathematically, each point (i,j) in the field is assigned a value:

$$F_0(i,j) \sim \mathcal{U}(0,1)$$

This represents a white noise process with zero spatial correlation. The uniform distribution was selected rather than, for instance, Gaussian or power-law distributions, to ensure maximum entropy in the initial state.

Spatial Correlation via Gaussian Filtering

smoothed_field = gaussian_filter(random_field, sigma=self.smoothing)

This operation transforms the white noise field into a correlated field through convolution with a Gaussian kernel:

$$F_1(i,j) = \sum_{k,l} F_0(k,l) \cdot G_\sigma(i-k, j-l)$$

Where the Gaussian kernel is defined as:

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}}$$

The parameter sigma (σ) has profound effects on the resulting field:

  • It establishes the correlation length scale between points
  • It controls the characteristic size of energy "features" on the sphere
  • It determines the smoothness of gradients in the field

Normalization for Scale Invariance

return (smoothed_field - smoothed_field.min()) / (smoothed_field.max() - smoothed_field.min())

This performs min-max normalization, transforming the field to the range [0,1]:

$$F_2(i,j) = \frac{F_1(i,j) - \min(F_1)}{\max(F_1) - \min(F_1)}$$

This normalization serves several purposes:

  1. Creates a dimensionless energy measure independent of absolute scale
  2. Ensures consistency across different random initializations
  3. Simplifies agent reward calculations by bounding all possible values

Position-Based Energy Lookup

def get_energy_at_position(self, theta, phi):
    theta_idx = np.argmin(np.abs(np.linspace(0, np.pi, self.resolution) - theta))
    phi_idx = np.argmin(np.abs(np.linspace(0, 2*np.pi, self.resolution) - phi))
    return self.energy_field[phi_idx, theta_idx]

This implements nearest-neighbor interpolation in the spherical coordinate space. For any continuous position (θ,φ), it finds the closest discretized grid point using the L1-norm:

$$\text{idx}_\theta = \arg\min_i |θ_i - θ|$$

$$\text{idx}_\phi = \arg\min_j |\phi_j - \phi|$$

The lookup returns $F_2(\text{idx}\phi, \text{idx}\theta)$, effectively creating a piecewise-constant function over the sphere. This approach was chosen for computational efficiency, though alternative interpolation methods (bilinear, cubic) could provide smoother transitions at the cost of computational complexity.

References

Sherman, Michael. Spatial Statistics and Spatio-Temporal Data : Covariance Functions and Directional Properties / Michael Sherman. Hobooken, N.J: Wiley, 2010.

Swarm Agent Definition

The initial development of MARL system is based on the class definition of the swarm agent, where coordinate location, energy collection, directionality, and movement is defined. This is a simple implementation of the swarm agent rule set, and more will be experimented with in the future. These future experiements include a ruleset introduction where only a single agent can occupy a coordinate location at a time, thus limiting movement and generating a more dynamic environment.

About

Multi-Agent Reinforcement Learning for Dynamic Resource Collection Optimization: A Dyson Swarm Simulation

Topics

Resources

Stars

Watchers

Forks

Languages