This project is actively under development. Features and documentation will be updated regularly.
MARL-Dyson simulates resource optimization using multi-agent reinforcement learning. It models autonomous agents optimizing their positions around a central energy source, inspired by the concept of a Dyson swarm.
The energy distribution is generated by creating a uniform random field across a spherical coordinate grid, with values between 0 and 1. A Gaussian smoothing filter is then applied to this random field, creating continuous regions of varying energy levels. The smoothing parameter controls the transition gradient between these regions. The final distribution is normalized to ensure all values remain in the [0,1] range.
```python
theta = np.linspace(0, np.pi, resolution)
phi = np.linspace(0, 2*np.pi, resolution)
self.theta, self.phi = np.meshgrid(theta, phi)
```

This creates a discretized grid over the entire spherical surface using standard spherical coordinates:
- θ ∈ [0, π] spans from the north pole (θ=0) to south pole (θ=π)
- φ ∈ [0, 2π] covers the full longitudinal rotation
Uniform discretization with `resolution` points per angle sets up a computational tradeoff: higher values give better spatial accuracy, but the number of grid points, and with it memory and compute cost, grows as O(resolution²).
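
To make the grid layout concrete, here is a minimal sketch of what `np.meshgrid` returns for these inputs. The small `resolution` value and the variable names `theta_grid` and `phi_grid` are illustrative assumptions, not taken from the project code.

```python
import numpy as np

resolution = 4  # small illustrative value, not the project's setting

theta = np.linspace(0, np.pi, resolution)      # polar angle samples
phi = np.linspace(0, 2 * np.pi, resolution)    # azimuthal angle samples

# With the default 'xy' indexing, rows follow phi and columns follow theta.
theta_grid, phi_grid = np.meshgrid(theta, phi)

print(theta_grid.shape)   # (4, 4): resolution**2 grid points in total
print(theta_grid[0])      # theta varies along a row
print(phi_grid[:, 0])     # phi varies down a column
```

Because rows correspond to φ and columns to θ, a field built on this grid is indexed as `[phi_idx, theta_idx]`.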
```python
random_field = np.random.rand(self.resolution, self.resolution)
```

This initializes a random field following a uniform distribution U(0,1). Mathematically, each point (i,j) in the field is assigned a value:

$$F_0(i,j) \sim U(0,1)$$

This represents a white noise process with zero spatial correlation. The uniform distribution was selected over alternatives such as Gaussian or power-law distributions because it is the maximum-entropy distribution on a bounded interval, so the initial state carries no built-in structure.
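```python
smoothed_field = gaussian_filter(random_field, sigma=self.smoothing)
```

This operation transforms the white noise field into a correlated field through convolution with a Gaussian kernel:

$$F_1(i,j) = (F_0 * G_\sigma)(i,j) = \sum_{m,n} F_0(i-m,\, j-n)\, G_\sigma(m,n)$$

where the Gaussian kernel is defined as:

$$G_\sigma(m,n) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{m^2+n^2}{2\sigma^2}\right)$$
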
The parameter sigma (σ) has profound effects on the resulting field:
- It establishes the correlation length scale between points
- It controls the characteristic size of energy "features" on the sphere
- It determines the smoothness of gradients in the field
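
The effect of σ on smoothness can be checked numerically. The sketch below is an illustration under assumptions: the grid size, the seed, the σ values, and the helper `mean_neighbor_difference` are all hypothetical choices, not part of the project. It measures how much adjacent cells differ after smoothing.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)          # fixed seed, for reproducibility only
random_field = rng.random((64, 64))     # white noise drawn from U(0, 1)


def mean_neighbor_difference(field):
    """Average absolute difference between horizontally adjacent cells."""
    return float(np.mean(np.abs(np.diff(field, axis=1))))


roughness = {}
for sigma in (1.0, 3.0, 9.0):           # illustrative values only
    smoothed = gaussian_filter(random_field, sigma=sigma)
    roughness[sigma] = mean_neighbor_difference(smoothed)

# Larger sigma gives smaller neighbor-to-neighbor differences (a smoother field).
for sigma, value in roughness.items():
    print(f"sigma={sigma}: mean neighbor difference = {value:.4f}")
```

A larger σ averages over a wider neighborhood, so energy features become broader and the gradients between them gentler.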
```python
return (smoothed_field - smoothed_field.min()) / (smoothed_field.max() - smoothed_field.min())
```

This performs min-max normalization, transforming the field to the range [0,1]:

$$F_2(i,j) = \frac{F_1(i,j) - \min F_1}{\max F_1 - \min F_1}$$

This normalization serves several purposes:
- Creates a dimensionless energy measure independent of absolute scale
- Ensures consistency across different random initializations
- Simplifies agent reward calculations by bounding all possible values
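
One edge case worth noting: the min-max expression divides by zero when the field is constant. A minimal sketch of a guarded version follows. The helper name `min_max_normalize` and the choice to return zeros for a constant field are assumptions made for illustration, not the project's behavior.

```python
import numpy as np


def min_max_normalize(field):
    """Rescale a field to [0, 1]; hypothetical helper, not the project's API."""
    field = np.asarray(field, dtype=float)
    span = field.max() - field.min()
    if span == 0:
        # A constant field has no range to rescale; return zeros instead of NaN.
        return np.zeros_like(field)
    return (field - field.min()) / span


example = np.array([[0.42, 0.50], [0.46, 0.58]])
normalized = min_max_normalize(example)
print(normalized.min(), normalized.max())   # 0.0 1.0
```

In practice a smoothed random field is very unlikely to be exactly constant, so the guard matters mainly for tests and degenerate inputs.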
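```python
def get_energy_at_position(self, theta, phi):
    # Index of the grid sample closest to each requested angle
    theta_idx = np.argmin(np.abs(np.linspace(0, np.pi, self.resolution) - theta))
    phi_idx = np.argmin(np.abs(np.linspace(0, 2*np.pi, self.resolution) - phi))
    # The field is indexed [phi, theta], matching the meshgrid layout
    return self.energy_field[phi_idx, theta_idx]
```

This implements nearest-neighbor interpolation in the spherical coordinate space. For any continuous position (θ,φ), it finds the closest discretized grid point along each axis using the L1-norm:

$$\text{idx}_\theta = \arg\min_i |\theta_i - \theta|, \qquad \text{idx}_\phi = \arg\min_j |\phi_j - \phi|$$
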
The lookup returns $F_2(\text{idx}_\phi, \text{idx}_\theta)$, effectively creating a piecewise-constant function over the sphere. This approach was chosen for computational efficiency, though alternative interpolation methods (bilinear, cubic) could provide smoother transitions at the cost of additional computation.
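
For comparison, a bilinear lookup might look like the sketch below. This is not the project's method: the function `bilinear_energy` is hypothetical, and it assumes a field indexed as `[phi_idx, theta_idx]` with evenly spaced samples over θ ∈ [0, π] and φ ∈ [0, 2π], and at least two samples per axis.

```python
import numpy as np


def bilinear_energy(energy_field, theta, phi):
    """Bilinear lookup on a field indexed as [phi_idx, theta_idx].

    Hypothetical alternative to nearest-neighbor lookup; assumes evenly
    spaced samples with both endpoints of each angular range included.
    """
    n_phi, n_theta = energy_field.shape
    # Fractional grid coordinates of the query point.
    t = np.clip(theta / np.pi, 0.0, 1.0) * (n_theta - 1)
    p = np.clip(phi / (2 * np.pi), 0.0, 1.0) * (n_phi - 1)
    # Lower-left corner of the enclosing cell, kept inside the grid.
    t0 = min(int(np.floor(t)), n_theta - 2)
    p0 = min(int(np.floor(p)), n_phi - 2)
    wt, wp = t - t0, p - p0
    # Weighted average of the four surrounding grid values.
    return float(
        (1 - wp) * (1 - wt) * energy_field[p0, t0]
        + (1 - wp) * wt * energy_field[p0, t0 + 1]
        + wp * (1 - wt) * energy_field[p0 + 1, t0]
        + wp * wt * energy_field[p0 + 1, t0 + 1]
    )


field = np.array([[0.0, 1.0], [1.0, 2.0]])
print(bilinear_energy(field, np.pi / 2, np.pi))  # centre of the cell: 1.0
```

Unlike the nearest-neighbor version, the returned value changes continuously as an agent moves between grid points, at the cost of four array reads per lookup instead of one.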
Sherman, Michael. Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties. Hoboken, N.J.: Wiley, 2010.
The initial development of the MARL system is based on the class definition of the swarm agent, which defines coordinate location, energy collection, directionality, and movement. This is a simple implementation of the swarm agent rule set, and more rule sets will be tried in the future. These future experiments include a rule under which only a single agent can occupy a coordinate location at a time, limiting movement and producing a more dynamic environment.
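
The single-occupancy rule could be sketched as follows. Everything here is a hypothetical illustration: the `SwarmAgent` fields, the `try_move` helper, clamping at the poles, and wrapping the longitude index are assumptions, not the project's actual class.

```python
from dataclasses import dataclass


@dataclass
class SwarmAgent:
    """Hypothetical agent; field names are illustrative, not the project's."""
    theta_idx: int
    phi_idx: int
    energy_collected: float = 0.0

    @property
    def cell(self):
        return (self.phi_idx, self.theta_idx)


def try_move(agent, d_theta, d_phi, occupied, resolution):
    """Move the agent unless the target cell is already taken.

    `occupied` is the set of cells currently held by any agent.
    Returns True if the move happened.
    """
    new_theta = min(max(agent.theta_idx + d_theta, 0), resolution - 1)  # clamp at the poles
    new_phi = (agent.phi_idx + d_phi) % resolution                      # wrap in longitude (assumption)
    target = (new_phi, new_theta)
    if target in occupied:
        return False  # blocked: one agent per cell (also covers clamped no-op moves)
    occupied.discard(agent.cell)
    agent.theta_idx, agent.phi_idx = new_theta, new_phi
    occupied.add(target)
    return True


a = SwarmAgent(theta_idx=2, phi_idx=0)
b = SwarmAgent(theta_idx=3, phi_idx=0)
occupied = {a.cell, b.cell}
print(try_move(a, 1, 0, occupied, resolution=8))   # False: b already holds that cell
print(try_move(a, -1, 0, occupied, resolution=8))  # True: the target cell is free
```

Tracking occupied cells in a set keeps the collision check constant-time per move, which matters once many agents act in the same step.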
