- Gripper rotation and position (the current implementation fixes the gripper pose before running a task)
- Reactive replanning is not supported (if the target is moved while the robot arm is executing a task, the plan will not update to the new position)
- Whole-arm obstacle-avoidance planning
- Multi-depth-camera support? (the current implementation only gets the center point from one surface)
- Transfer the whole application to ROS?
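The depth-camera item above refers to back-projecting a depth pixel into a 3D camera-frame point. A minimal sketch of that pinhole-model deprojection, assuming an undistorted stream (this mirrors the math behind pyrealsense2's `rs2_deproject_pixel_to_point`; the intrinsics values below are illustrative, not this project's):

```python
import numpy as np

def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (meters) into a 3D point
    in the camera frame, using an undistorted pinhole model."""
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return np.array([x, y, depth_m])

# Example with illustrative D435i-like intrinsics:
center = deproject_pixel(320, 240, 0.75, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```

A multi-camera version would deproject one such point per camera and fuse them after transforming each into a common frame.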
- Robot arm (TCP connection)
- Gripper (Serial connection)
- Depth Cameras (RealSense camera)
- Workspace (remember to restrict the workspace bounds for the robot arm inside real_env.py)

Use the scripts under src/toolbox to test the connection of the external devices.
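For a quick sanity check before running those scripts, the robot arm's TCP link can be probed with a short socket helper. This is a generic sketch, not part of the project's toolbox; the address in the comment is a placeholder:

```python
import socket

def check_tcp(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder address -- substitute your robot arm's actual IP and port:
# print(check_tcp("192.168.1.100", 30002))
```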
- Obtain an OpenAI API key and put it inside `config.ini`.
- Install required submodules:

  ```bash
  # Clone submodules for vision components (XMem)
  git submodule update --init --recursive
  ```
- Create a conda environment:

  ```bash
  conda create -n voxposer-realworld-env python=3.10
  conda activate voxposer-realworld-env
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  You may need to run

  ```bash
  conda install -c conda-forge libstdcxx-ng
  ```

  if you encounter the following error:

  ```text
  libGL error: MESA-LOADER: failed to open iris: /usr/lib/dri/iris_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
  libGL error: failed to load driver: iris
  libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
  libGL error: failed to load driver: swrast
  [Open3D WARNING] GLFW Error: GLX: Failed to create context: GLXBadFBConfig
  [Open3D WARNING] Failed to create window
  [Open3D WARNING] [DrawGeometries] Failed creating OpenGL window.
  ```
- Download the following models from Ultralytics and XMem:
- sam2.1_b.pt
- yolo11-seg.pt
- yolo11x.pt
- XMem.pth
- Perform camera-to-robot calibration:

  ```bash
  python src/toolbox/perceptron/d435i/calibration/cal_transform_mat.py
  ```

  - Press `Space` to capture the current frame and process a calibration sample.
  - Move the robot arm to different positions (10 or more captures recommended) and repeat the previous step.
  - Press `d` to delete the last collected sample if needed.
  - Press `r` to reset all collected calibration data.
  - Press `Esc` to finish calibration and compute the transformation matrix.
  - Copy the resulting transformation matrix into `cam2robot.py` and test its accuracy.
  - Replace the transformation matrix in `run.py`.

  Note: For a multi-camera setup, calibration needs to be performed for each camera.
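The matrix the calibration script computes can be sketched as a rigid-transform fit (Kabsch/Umeyama without scaling) over the captured camera/robot point pairs. This is an illustrative reimplementation, not the actual code in `cal_transform_mat.py`:

```python
import numpy as np

def rigid_transform_3d(cam_pts, robot_pts):
    """Estimate the 4x4 rigid transform mapping camera-frame points onto
    robot-frame points via SVD (Kabsch algorithm, no scaling)."""
    cam_pts = np.asarray(cam_pts, dtype=float)
    robot_pts = np.asarray(robot_pts, dtype=float)
    cam_c = cam_pts - cam_pts.mean(axis=0)     # center both point sets
    rob_c = robot_pts - robot_pts.mean(axis=0)
    H = cam_c.T @ rob_c                        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = robot_pts.mean(axis=0) - R @ cam_pts.mean(axis=0)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

This is why 10 or more captures are recommended: with only the minimum of 3 non-collinear pairs, sensor noise in any single capture dominates the fit.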
- Start to play:

  You may need to adjust the code according to the devices you use.

  ```bash
  python src/run.py
  ```
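At runtime, the calibration matrix is used to map perceived points into the robot base frame with a standard homogeneous transform. A minimal sketch of that mapping (the function name here is illustrative, not an identifier from this codebase):

```python
import numpy as np

def cam_to_robot(points_cam, T_cam2robot):
    """Map an (N, 3) array of camera-frame points into the robot base
    frame using a 4x4 homogeneous transform from calibration."""
    pts = np.asarray(points_cam, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 4) homogeneous
    return (homo @ np.asarray(T_cam2robot, dtype=float).T)[:, :3]
```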
```text
.
├── .vscode/                     # VS Code configuration
│   └── launch.json              # Debug configurations
├── media/
├── src/                         # Main project implementation
│   ├── configs/                 # Configuration files
│   ├── model_weight/            # Pre-trained model weights
│   ├── prompts/                 # LLM prompt templates
│   │   └── rlbench/             # RLBench prompt templates
│   ├── toolbox/                 # Core functionality modules
│   │   ├── perceptron/          # Vision and perception tools
│   │   │   ├── XMem/            # Video object segmentation
│   │   │   ├── d435i/           # RealSense camera tools
│   │   │   │   └── calibration/ # Camera-robot calibration
│   │   │   └── ...              # Other perception tools
│   │   ├── my_prompt/           # Custom prompt templates
│   │   └── real_env.py          # Environment interface
│   ├── envs/                    # Environment definitions
│   └── run.py                   # Main execution entry point
├── requirements.txt             # Python dependencies
├── config.ini                   # API and config keys
└── README.md                    # Project documentation
```
This project is built on top of VoxPoser.