The objective of this experiment is to develop a group of drones that can move together while avoiding collisions with each other. This is achieved by learning a policy that lets each drone decide its movement based on the relative positions of its neighbors. The experiment is conducted in an unbounded 2D environment, where each drone's neighborhood is limited to the five closest neighbors (a hyper-parameter). The drones can move in eight directions on a square grid: horizontally, vertically, and diagonally. The environment state, as perceived by an individual drone, is given by the relative distances between that drone and its closest neighbors. Using ScaFi, this state can be expressed concisely:
val state = foldhoodPlus(Seq.empty)(_ ++ _)(Set(nbrVector))
We designed a reward function based on two factors: collision factor and cohesion factor. We aim to learn a policy by which agents, initially spread in a very sparse way in the environment, move toward each other until reaching approximate
In this way, when the negative factor is taken into account, the system will tend to move nodes away from each other. However, if only this factor were used, the system would be disorganized. This is where the cohesion factor comes in. Given the neighbour with the maximum distance
All the files needed to describe the experiment are in the src/main/scala/experiment folder. As described in ScaRLib, to run a learning process the user must define:
- The action space
- The reward function
- The state
- The neural network used to approximate the Q-function
- The ScaFi logic
- The Alchemist specification
Finally, all these elements are merged to create the learning system in the file CohesionCollisionTraining.scala.
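As a rough illustration of the first three elements in the list above, the following plain-Scala sketch models the eight-direction action space, a state made of the relative positions of the k closest neighbours, and a two-factor reward. It does not use the ScaRLib or ScaFi APIs. The names `CohesionCollisionSketch`, `Vec`, `actions`, `state`, `reward` and `targetDistance`, as well as the linear penalty shapes, are assumptions made for illustration and are not taken from the experiment's source files.

```scala
object CohesionCollisionSketch {
  // A 2D vector: either a grid displacement or a neighbour's position
  // relative to the observing drone. Hypothetical helper type.
  final case class Vec(x: Double, y: Double) {
    def -(other: Vec): Vec = Vec(x - other.x, y - other.y)
    def norm: Double = math.hypot(x, y)
  }

  // Action space: the eight grid moves (horizontal, vertical, diagonal).
  // The "stay still" move (0, 0) is excluded.
  val actions: Seq[Vec] = for {
    dx <- Seq(-1, 0, 1)
    dy <- Seq(-1, 0, 1)
    if dx != 0 || dy != 0
  } yield Vec(dx.toDouble, dy.toDouble)

  // State: relative positions of the k closest neighbours, nearest first.
  // k = 5 mirrors the hyper-parameter mentioned in the text.
  def state(self: Vec, others: Seq[Vec], k: Int = 5): Seq[Vec] =
    others.map(_ - self).sortBy(_.norm).take(k)

  // Reward (assumed shape): a collision penalty when the closest neighbour
  // is nearer than targetDistance, plus a cohesion penalty when the farthest
  // neighbour is beyond it. Both penalties are linear here by assumption.
  def reward(neighbours: Seq[Vec], targetDistance: Double): Double =
    if (neighbours.isEmpty) 0.0
    else {
      val distances = neighbours.map(_.norm)
      val collision =
        if (distances.min < targetDistance) -(targetDistance - distances.min) else 0.0
      val cohesion =
        if (distances.max > targetDistance) -(distances.max - targetDistance) else 0.0
      collision + cohesion
    }
}
```

With this shape the reward is zero only when every observed neighbour sits exactly at `targetDistance`, so the penalty pushes drones apart when they are too close and pulls them together when they are too far.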
Since configuring ScalaPy may not be straightforward on all operating systems, we have integrated the possibility of executing the experiments through Docker using docker compose.
docker-compose.yml: this file describes the Docker services required to reproduce the experiment. There are three services:
- learning: performs the learning from scratch
- evaluation: performs the evaluation using a pre-trained neural network
- charts: generates all the charts
docker/: this folder contains all the Dockerfiles needed by docker compose
N.B. Since learning from scratch may take a long time on certain systems with low resources (e.g., without a GPU) we provided a pre-trained neural network (networks/network) which can be used to perform the evaluation and reproduce the charts presented in the paper.
Furthermore, we have also included the data extracted from the evaluation phase, so it is also possible to perform only the generation of the charts.
- Learning:
docker compose run --no-deps learning
- Evaluation:
docker compose run --no-deps evaluation
- Charts:
docker compose run --no-deps charts
N.B. By default the evaluation uses the pre-trained neural network. If you want to use a network trained from scratch, you need to change the snapshot path in the file evaluation/CohesionCollisionEval.scala.
Due to the usage of ScalaPy, some extra configuration may be needed; all the details can be found here (sections: Execution and Virtualenv). Tip: if you don't want to configure environment variables on your PC, you can pass the required arguments directly to the Gradle task by adding the following code to the build.gradle.kts file:
jvmArgs(
"-Dscalapy.python.library=${pythonVersion}",
//Other required parameters...
)
Before running the learning you must install the following dependencies:
pip install -r requirements.txt
To launch the learning only one change is needed: you must specify the path where the snapshots of the policy will be saved. You can do this by editing the following line of code in the file CohesionCollisionTraining.scala:
private val learningConfiguration = new LearningConfiguration(dqnFactory = new NNFactory, snapshotPath = "path-to-snapshot-folder")
After making this change, you can run the learning through a pre-configured Gradle task by launching the following command:
./gradlew runCohesionAndCollisionTraining
If you want to see the dynamics of the system during the learning you can use the following command:
./gradlew runCohesionAndCollisionTraining
You can follow the progress of the learning using TensorBoard by running the following command:
tensorboard --logdir=runs
The learning could take a while (hours on a modern machine), so we have already uploaded the last network snapshot in the network folder.
To verify the performance of the policy, you can run the evaluation task with:
./gradlew runCohesionAndCollisionEvaluation
If you want to see the GUI you can run the following command:
./gradlew runCohesionAndCollisionEvaluationGui
This will produce the data needed to plot the graphs in the data folder.
To plot the graphs you can run the following command:
python plotter.py
This will create the graphs in the charts folder.