FM3-MicRo

Model Testing on Experimental Setup

Installation

The code can be used for training new models or testing them. Testing requires fewer dependencies and does not need a CUDA-supported GPU.

Training

Ensure the dependencies listed in requirements.txt are installed.

To start training, modify and run one of the .\run-scripts\train_script_* files, or execute:

python src\FM3_MicRo\rl.py --exc train

For a list of available options and their descriptions, use:

python src\FM3_MicRo\rl.py --help

Testing

Testing does not require the torch libraries or the bitsandbytes library, so they can be removed from requirements.txt. If you do not have a CUDA-supported GPU or have not installed the CUDA Toolkit, their installation may fail. The other dependencies can remain, since their install time and size are negligible.
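
If you prefer not to edit requirements.txt by hand, a small helper along the following lines can produce a trimmed copy for test-only installs; the output file name requirements-test.txt is purely an illustrative choice.

# Hypothetical helper: copy requirements.txt while dropping the GPU-only
# packages mentioned above (the torch libraries and bitsandbytes).
EXCLUDE = ("torch", "bitsandbytes")

with open("requirements.txt") as src, open("requirements-test.txt", "w") as dst:
    for line in src:
        name = line.strip().lower().split("==")[0]
        if name.startswith(EXCLUDE):
            continue  # skip torch, torchvision, torchaudio, bitsandbytes, etc.
        dst.write(line)

The trimmed file can then be installed with pip install -r requirements-test.txt.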

With these removed, the remaining dependencies in requirements.txt are sufficient for testing.

To start testing, modify and run the .\run-scripts\test_script.bat file, or execute:

python src\FM3_MicRo\rl.py --exc test

For available options, use:

python src\FM3_MicRo\rl.py --help

Project Details

Introduction

This repository contains the Python code, .bat scripts, .sh SLURM scripts, media files, and log files for the thesis "Application of Foundation Model Based Methods in Micro-Robotics", carried out by Muhammad Usama Sattar at the Robotic Instruments Lab, Aalto University, in 2025, and for the subsequent conference paper "Application of Large Language Models in Magnetically Manipulated Microrobots", submitted to MARSS2025.

We employ a novel RL technique in which an LLM acts as the reward generator for controlling micro-robots via magnetic fields. At each timestep, the LLM receives the previous state, the current state, and the goal, and uses this information to generate a reward value. The idea is that LLMs can identify relevant workspace features that indicate good actions, eliminating the need for manual reward design.
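
As a rough sketch of this idea (not the exact prompt or interface used in rl.py; the model choice and the output parsing below are illustrative assumptions), a per-timestep reward call could look like this:

# Illustrative sketch of querying an LLM for a per-timestep reward. The prompt
# wording, model and parsing are placeholders, not the repository's actual code.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")

def llm_reward(prev_state, curr_state, goal):
    prompt = (
        "You are scoring the motion of a magnetically actuated micro-robot.\n"
        f"Previous particle position: {prev_state}\n"
        f"Current particle position: {curr_state}\n"
        f"Goal position: {goal}\n"
        "Reply with a single integer reward from -9 to 9."
    )
    output = generator(prompt, max_new_tokens=5)[0]["generated_text"]
    reply = output[len(prompt):]  # the pipeline returns prompt + completion
    tokens = [t for t in reply.split() if t.lstrip("+-").isdigit()]
    return int(tokens[0]) if tokens else 0  # fall back to a neutral reward if unparsable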

For more details, refer to: Reward Design with Language Models.

Magnetic Manipulation Setup and Simulator

The setup consists of 8 solenoids arranged in a circular configuration. A ferromagnetic particle is placed within the solenoid circle. Applying current to a solenoid attracts the particle toward it, which enables precise particle placement.

CAD model of the assembly
Section view of the solenoid block
Picture of the system

System Mechanics

The particle is accelerated toward the solenoids by the magnetic force. Fluid drag from the water surface reduces its momentum, while the meniscus effect pulls the particle toward the center of the petri dish; the latter is negligible due to the dish's size.

Thus, the system dynamics are governed only by magnetic and drag forces, as shown below:

Diagram of system mechanics
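
As a minimal numerical sketch of these two force terms (the attraction law and all coefficients below are illustrative assumptions, not the calibrated model used in the simulator):

import numpy as np

def step_particle(pos, vel, coil_pos, current, dt=0.01, k_mag=1e-6, drag=0.8):
    """One explicit-Euler step under magnetic attraction and linear fluid drag.
    The inverse-square attraction and the coefficient values are placeholders."""
    offset = coil_pos - pos
    dist = np.linalg.norm(offset) + 1e-9
    f_mag = k_mag * current * offset / dist**3   # pull toward the energised solenoid
    f_drag = -drag * vel                         # drag from the water surface
    acc = f_mag + f_drag                         # particle mass folded into the coefficients
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel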

Simulator

The simulator is built with pygame and mimics the experimental setup through a deep-learning-trained dynamics model. It enables simultaneous RL training runs on an HPC cluster.

Screenshot of the simulator main window
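
A compressed sketch of how a learned transition model can be wrapped in a Gymnasium-style environment for RL training (the class name, model path, observation/action layout, reward, and termination rule are illustrative assumptions, not the repository's actual interface):

import numpy as np
import gymnasium as gym
import torch

class MagneticEnv(gym.Env):
    """Illustrative environment: a trained network predicts the next particle
    position from the current position and the 8 solenoid currents."""

    def __init__(self, model_path="dynamics_model.pt"):  # placeholder path to a saved model
        self.model = torch.load(model_path)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))  # particle (x, y) + goal (x, y)
        self.action_space = gym.spaces.Box(0.0, 1.0, shape=(8,))        # per-solenoid currents
        self.pos = np.zeros(2, dtype=np.float32)
        self.goal = np.zeros(2, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(-1, 1, size=2).astype(np.float32)
        self.goal = np.zeros(2, dtype=np.float32)  # goal kept at the centre, as described below
        return np.concatenate([self.pos, self.goal]), {}

    def step(self, action):
        with torch.no_grad():
            inp = torch.tensor(np.concatenate([self.pos, action]), dtype=torch.float32)
            self.pos = self.model(inp).numpy()
        dist = float(np.linalg.norm(self.pos - self.goal))
        obs = np.concatenate([self.pos, self.goal])
        return obs, -dist, dist < 0.05, False, {}  # placeholder reward and termination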

Implementation

For RL, we use PPO from Stable-Baselines3. Local LLM inference is performed through the Transformers library. We utilize the following QWEN2.5-Instruct models (a minimal training sketch follows the list):

  • 3B
  • 7B
  • 14B
  • 32B
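
A minimal sketch of the training loop with Stable-Baselines3 PPO (the environment class comes from the illustrative simulator sketch above, and the hyperparameters shown are library defaults rather than the values used in this work):

from stable_baselines3 import PPO

env = MagneticEnv()                       # illustrative env from the simulator sketch above
model = PPO("MlpPolicy", env, verbose=1)  # default hyperparameters, for illustration only
model.learn(total_timesteps=1_000_000)    # 1M timesteps, as stated below
model.save("ppo_magnetic")                # placeholder output name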

The problem has been simplified by keeping the goal position at the center for every episode. Total training timesteps are 1M. The environment provides two types of rewards:

  • Global Rewards: Rewards that do not directly relate to local movements of the particle, such as the time spent in an episode, staying in the vicinity of the goal, and achieving the goal.
  • Local Rewards: Rewards that directly relate to local movements of the particle. In our baseline, delta_r, this reward is the normalized change in radial distance r (a minimal sketch follows the schematic below), while in LLM Augmented RL it is provided by the LLM.
Schematic of LLM Augmented RL
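
A minimal sketch of the delta_r baseline (the exact normalization is an assumption; here the change in radial distance is scaled by the workspace radius):

import numpy as np

def delta_r_reward(prev_pos, curr_pos, goal, workspace_radius):
    """Baseline local reward: normalized change in radial distance to the goal."""
    r_prev = np.linalg.norm(np.asarray(prev_pos) - np.asarray(goal))
    r_curr = np.linalg.norm(np.asarray(curr_pos) - np.asarray(goal))
    return (r_prev - r_curr) / workspace_radius  # positive when the particle moves toward the goal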

The models were trained on the Triton HPC cluster provided by Aalto University's School of Science as part of the "Science-IT" project. We employed nodes with H100 GPUs due to their large VRAM and high clock speeds.

Three types of prompts have been utilized:

  • Zero-shot: A description of the workspace and the task requirements is provided.
  • Five-shot: A description of the workspace and the task requirements is provided, along with five examples.
  • One-shot with Explanation: A description of the workspace and the task requirements is provided, along with one example and its explanation.

Each prompt has been further differentiated by its possible output values (a minimal parsing sketch follows the list):

  • Binary: The LLM can output either 0 or 1.
  • Continuous: The LLM can output any integer from -9 to +9.
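
A hedged sketch of how these two output conventions can be mapped to numeric rewards (the parsing is illustrative; the actual handling lives in rl.py):

def parse_llm_reward(reply, mode="binary"):
    """Map the LLM's text reply to a reward value. Illustrative parsing only."""
    tokens = [t for t in reply.split() if t.lstrip("+-").isdigit()]
    if not tokens:
        return 0                          # unparsable reply: fall back to a neutral reward
    value = int(tokens[0])
    if mode == "binary":
        return 1 if value >= 1 else 0     # binary prompts: the LLM outputs 0 or 1
    return max(-9, min(9, value))         # continuous prompts: clamp to the -9..+9 range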

Results

We only provide the key results in this README. You can find the complete set of plots in ./media/.

Reward Maps

Reward Maps illustrate the reward values for moving from one particular location to another. The shape of the maps reveals how performance varies with prompt type and model size. Key features are:

  • Black Cross: Goal Position
  • Black Dot: Initial Particle Position
  • Colored Dot: Final Particle Position
Reward Map for Zero-shot prompt at (112, 122)
Reward Map for Five-shot prompt at (112, 122)
Reward Map for One-shot with Explanation prompt at (112, 122)

We observe the following trends in the figures:

  • Performance improves with increasing model size
  • One-shot with Explanation > Five-shot > Zero-shot
  • Binary Rewards > Continuous Rewards

Model Evaluations

Each model was evaluated 100 times during its training, using delta_r rewards to maintain consistency.
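
One plausible way to schedule such periodic evaluations with Stable-Baselines3, continuing the training sketch above, is an EvalCallback; the environment, paths, and exact settings below are placeholders (with 1M training timesteps, an eval_freq of 10,000 steps would yield 100 evaluation points):

from stable_baselines3.common.callbacks import EvalCallback

eval_env = MagneticEnv()   # illustrative env, configured to report delta_r rewards
eval_callback = EvalCallback(eval_env, eval_freq=10_000, n_eval_episodes=5,
                             log_path="./logs/", deterministic=True)
model.learn(total_timesteps=1_000_000, callback=eval_callback)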

Model Evaluations for QWEN-32B-Instruct
Model Evaluations for Binary One-shot with Explanation

We observe the same trends in the evaluations as described in Reward Maps. Notably, Binary One-shot with Explanation performs even better than our baseline, delta_r. This underscores the benefit of utilizing LLMs as reward generators: their ability to extract additional features from the workspace results in improved convergence. Furthermore, utilizing QWEN-32B-Instruct allows even Zero-shot to converge, suggesting that at sufficient model sizes, accurate reward values can be generated without examples and/or explanations.

Training Times

Training Times for Model Sizes

The trend in the plot is counter-intuitive, since larger models generally take more time for inference. The only explanation we could find for the observed trend is that larger models are better able to exploit GPU and software optimizations.

Media Files

You can find videos of the trained model running on the simulator and the experimental setup, along with log files and various figures, in ./media/.

Figures and videos relating to the paper "Application of Large Language Models in Magnetically Manipulated Microrobots", submitted to MARSS2025, can be found in ./media/Paper Figures and Videos/.

About

Control of Magnetically Manipulated Microrobots using LLM Augmented RL.
