A robotics project featuring smolVLA (Small Vision-Language-Action model) integration with Robosuite and MuJoCo physics simulation environment. This repository demonstrates a complete setup for vision-language-based robotic manipulation tasks.
- ✅ smolVLA Integration - Vision-Language-Action model for robotic control
- ✅ Robosuite Environment - Comprehensive robotic manipulation suite
- ✅ MuJoCo Physics - High-performance physics simulation engine
- ✅ Docker Environment - Fully containerized development setup
- Docker Desktop installed and running
- GPU support (recommended for smolVLA inference)
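Before building, it can help to confirm that a GPU is actually visible on the host. The snippet below is a minimal sketch of such a check; it only probes for the `nvidia-smi` CLI (present whenever the NVIDIA driver is installed) and is not a definitive capability test:

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Heuristic GPU check: True if nvidia-smi is on PATH and exits cleanly."""
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True)
    return result.returncode == 0

print("GPU detected" if gpu_available() else "No GPU found; smolVLA inference will fall back to CPU")
```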
- Clone the Repository

```bash
git clone https://github.com/pradyai/Organizer-Robot.git
cd Organizer-Robot
```

- Build the Docker Image

```bash
docker build -t organizer-robot-env .
```

- Run the Container

Linux/macOS:

```bash
docker run -d --name robot-container \
  -v "$(pwd)/Mounted_Repo:/app/Mounted_Repo" \
  organizer-robot-env
```

Windows (PowerShell):

```powershell
docker run -d --name robot-container -v "${pwd}/Mounted_Repo:/app/Mounted_Repo" organizer-robot-env
```

- Access the Container
```bash
docker exec -it robot-container bash
```

```python
import mujoco
import gymnasium as gym

# Test basic MuJoCo functionality
env = gym.make('HalfCheetah-v4')
obs, info = env.reset()
print("✅ MuJoCo setup successful!")
```

```python
import robosuite as suite
from robosuite.controllers import load_controller_config

# Create a Robosuite environment
controller_config = load_controller_config(default_controller="OSC_POSE")
env = suite.make(
    env_name="Lift",
    robots="Panda",
    controller_configs=controller_config,
    has_renderer=False,
    has_offscreen_renderer=True,
    use_camera_obs=True,
)

# Reset and test
obs = env.reset()
print("✅ Robosuite environment ready!")
```

```python
# Example smolVLA integration
from smolvla import SmolVLA

# Initialize model
model = SmolVLA.from_pretrained("smolvla-base")

# Process vision-language commands
action = model.predict(
    image=obs['camera_image'],
    instruction="Pick up the red cube"
)
print("✅ smolVLA inference successful!")
```

```
Organizer-Robot/
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── src/
│   └── __init__.py
├── tests/
├── Mounted_Repo/        # Volume-mounted workspace
├── requirements.txt
├── requirements-dev.txt
└── README.md
```
For advanced workflows, use the provided docker-compose.yml:

```bash
# Development environment
docker-compose -f docker/docker-compose.yml up dev

# Jupyter Lab (accessible at http://localhost:8888)
docker-compose -f docker/docker-compose.yml up jupyter
```

- MuJoCo >= 2.3.0 - Physics simulation
- Gymnasium >= 0.28.0 - RL environment interface
- Robosuite - Robot manipulation environments
- smolVLA - Vision-Language-Action model
- NumPy >= 1.21.0
- Matplotlib >= 3.5.0
- SciPy >= 1.8.0
See requirements.txt for complete list.
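As a quick sanity check that the pinned packages are actually installed inside the container, a small version probe can be sketched with only the standard library (the distribution names below mirror the list above; adjust them to match your requirements.txt):

```python
from importlib import metadata

def pkg_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

for pkg in ("mujoco", "gymnasium", "robosuite", "numpy", "matplotlib", "scipy"):
    print(f"{pkg}: {pkg_version(pkg)}")
```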
The environment supports multiple rendering modes.

Headless (software) rendering:

```bash
export MUJOCO_GL=osmesa
```

GUI rendering with X11 forwarding:

```bash
docker run -it --rm \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  -e DISPLAY=$DISPLAY \
  organizer-robot-env
```

For remote development, consider setting up VNC inside the container for GUI access.
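MuJoCo reads `MUJOCO_GL` when the module is first imported, so the backend must be chosen before any `import mujoco`. A small helper sketch (defaulting to `osmesa` is an assumption suited to headless containers; it respects a value already set in the environment):

```python
import os

def select_mujoco_backend(preferred: str = "osmesa") -> str:
    """Set MUJOCO_GL before mujoco is first imported; keeps any existing value."""
    os.environ.setdefault("MUJOCO_GL", preferred)
    return os.environ["MUJOCO_GL"]

backend = select_mujoco_backend()
print(f"MuJoCo will render with: {backend}")
# Only after this point is it safe to `import mujoco`.
```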
```bash
# Inside the container
pytest tests/

# With coverage
pytest tests/ --cov=src --cov-report=html
```

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Install pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

- Make your changes and ensure tests pass
- Submit a pull request
```python
import numpy as np
import robosuite as suite

env = suite.make(
    "Stack",
    robots="Panda",
    has_renderer=True,
    use_camera_obs=True,
)

for episode in range(10):
    obs = env.reset()
    for step in range(200):
        # Robosuite envs expose action limits via action_spec,
        # not a gym-style action_space
        low, high = env.action_spec
        action = np.random.uniform(low, high)
        obs, reward, done, info = env.step(action)
        if done:
            break
```

```python
from smolvla import SmolVLA
import robosuite as suite

# Initialize environment and model
env = suite.make("PickPlace", robots="Panda", use_camera_obs=True)
model = SmolVLA.from_pretrained("smolvla-base")

# Execute natural language command
obs = env.reset()
action = model.predict(
    image=obs['frontview_image'],
    instruction="Place the blue block in the bin"
)
env.step(action)
```

- MuJoCo: High-performance physics at 500+ FPS
- Robosuite: Realistic manipulation tasks with diverse robots
- smolVLA: Efficient vision-language-action inference (~10 Hz)
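The throughput figures above depend heavily on hardware, so it is worth measuring steps-per-second on your own machine. The generic timing loop below is a minimal sketch, shown with a no-op stand-in for the step function; swap in a real `env.step` call for an actual measurement:

```python
import time

def measure_fps(step_fn, n_steps: int = 1000) -> float:
    """Time repeated calls to step_fn and return steps per second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# No-op stand-in for env.step; the result is the loop's own upper bound
print(f"{measure_fps(lambda: None):.0f} steps/sec")
```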
```bash
# Try different rendering backends
export MUJOCO_GL=glfw   # or egl, osmesa
```

```bash
# Stop and remove existing container
docker stop robot-container
docker rm robot-container

# Rebuild image
docker build -t organizer-robot-env . --no-cache
```

This project is open-source. Please check individual component licenses:
- MuJoCo: Apache 2.0
- Robosuite: MIT
- smolVLA: Check model license
- MuJoCo - DeepMind for open-sourcing the physics engine
- Robosuite - Stanford Vision and Learning Lab
- smolVLA - Vision-Language-Action research community
For questions or collaboration, please open an issue or reach out to the maintainers.
Status: ✅ MuJoCo Running | ✅ Robosuite Configured | ✅ smolVLA Integrated