Team:
Team Number: 6
Team Name: Inverse Kinema-tricks
Team Members:
- Xiao Feng
- Giacomo Randazzo
- Nicolas Rodriguez
Summary:
We were inspired by recent advances in humanoid robotics, especially that cool Figure robot demo at an Amazon fulfillment center where it flips packages to expose barcodes for scanning:
We thought: "Hey, we can do something similar!" So we built our version using more affordable hardware and open-source VLA models, plus a healthy dose of creative engineering!
We designed and built a compact sorting station inspired by modern fulfillment centers:
- Robotic Arms: 2x SO101 units
- Cameras: 3 provided by the hackathon organizers
- Conveyor Belt:
- Custom-built using a stepper motor salvaged from a used 3D printer
- Belt crafted with tape in a perfectly hacky mindset
Our system can be used in a package delivery center to make sure barcodes are facing the right way (just like that Figure robot demo!). Plus, we can sort packages to send them in the right direction too.
- We're using basic skills that have been shown in hackathons before (pick and place, sorting), but we're combining them in a way that actually solves a real-world problem!
- The "flipping packages" skill has never been demonstrated before to our knowledge.
- Let's be honest, our DIY conveyor belt made from a salvaged 3D printer motor and tape? That's peak hackathon creativity right there!
- We developed a novel approach for fast policy switching during inference for Arm 2. Inspired by recent developments in RTC, we built a queuing system with 2 threads per arm: one thread runs policy inference and pushes actions to a queue, and one thread pulls from the queue and executes actions, listening to whichever policy (sorting or flipping) the computer vision and conveyor belt system selects. This lets us switch between policies without latency. See the Technical Implementation section for more details!
- Finally, a hackathon would not be a hackathon without a fancy webapp, so we built one to track our packages' color and delivery status:
We developed 3 policies in total:
- 1 policy for Arm 1: Picks up packages and drops them on the conveyor belt
- 2 policies for Arm 2: One for sorting packages into the correct color box, and one for flipping packages to expose barcodes
The computer vision system is responsible for selecting the appropriate policy for Arm 2. If it can detect the package color, it instructs Arm 2 to sort the package. If the color (and thus the barcode) is not visible, it instructs Arm 2 to flip the package.
Computer vision is used solely to start or stop each policy. The sorting policy for Arm 2 independently determines the correct color for sorting, without input from computer vision.
The computer vision system also controls the conveyor belt. The belt runs continuously and only stops when a package is detected at the end, ready for processing.
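To make the check described above concrete, here is a minimal sketch of the kind of color test the vision system performs, assuming OpenCV HSV thresholding. The color ranges, pixel threshold, and function names are illustrative placeholders, not our exact tuned values:

```python
import cv2
import numpy as np

# Illustrative HSV ranges for the package colors; the real thresholds
# were tuned on our camera feed and lighting conditions.
COLOR_RANGES = {
    "red": ((0, 120, 70), (10, 255, 255)),
    "green": ((40, 70, 70), (80, 255, 255)),
    "blue": ((100, 150, 0), (140, 255, 255)),
}

def detect_package_color(frame_bgr, min_pixels=500):
    """Return the dominant package color, or None if no color patch
    (and therefore no barcode side) is visible."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    best_color, best_count = None, 0
    for name, (lo, hi) in COLOR_RANGES.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        count = int(cv2.countNonZero(mask))
        if count > best_count:
            best_color, best_count = name, count
    return best_color if best_count >= min_pixels else None

def choose_policy(frame_bgr):
    # Color visible -> barcode side is up -> sort; otherwise flip first.
    return "sorting" if detect_package_color(frame_bgr) else "flipping"
```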
Our system runs on two computers that talk to each other over WebSocket:
Conveyor Belt Computer:
- Runs computer vision detection (checks if package is at the end and if barcode is visible)
- Has a state machine that decides what to do: RUNNING (picking packages), FLIPPING (barcode hidden), or SORTING (barcode visible)
- Controls the conveyor belt (starts/stops it based on state)
- Sends state changes to the Arms Computer via WebSocket
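As a rough illustration of that last point, here is a minimal sketch of pushing state changes to the Arms Computer with the `websockets` library. The address and the JSON message schema are assumptions for the example, not our exact protocol:

```python
import asyncio
import json

import websockets  # pip install websockets

ARMS_COMPUTER_URI = "ws://192.168.1.42:8765"  # illustrative address

async def send_state_changes(state_queue: asyncio.Queue):
    """Forward every state change (RUNNING / FLIPPING / SORTING) to the
    Arms Computer, reconnecting if the link drops."""
    while True:
        try:
            async with websockets.connect(ARMS_COMPUTER_URI) as ws:
                while True:
                    state = await state_queue.get()
                    await ws.send(json.dumps({"state": state}))
        except (OSError, websockets.ConnectionClosed):
            await asyncio.sleep(1.0)  # wait a moment, then reconnect
```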
Arms Computer:
- Receives state commands from the Conveyor Belt Computer via WebSocket
- Has an ArmController that maps states to which policies should be active
- Runs 2 threads per arm (4 threads total): one for inference (running the policy) and one for executing actions
- Uses a queuing system inspired by RTC - the inference thread pushes actions to a queue, and the action thread pulls from it. This architecture enables rapid policy switching (e.g., switching from flip to sort) without stopping the robot, ensuring smooth and continuous operation!
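Below is a simplified sketch of this queuing architecture for a single arm. The `policies`, `robot`, and `select_actions` objects are stand-ins for the LeRobot policy and robot interfaces, not our exact code:

```python
import queue
import threading
import time

class ArmController:
    """Two threads per arm: an inference thread runs the active policy and
    pushes its actions to a queue, while an action thread pops from the
    queue and sends commands to the arm. Switching policies only swaps a
    name and clears the queue, so the robot never has to stop."""

    def __init__(self, policies, robot, fps=30):
        self.policies = policies      # e.g. {"sorting": ..., "flipping": ...}
        self.robot = robot            # exposes capture_observation() / send_action()
        self.active = None            # name of the policy currently listened to
        self.actions = queue.Queue(maxsize=64)
        self.period = 1.0 / fps
        self._lock = threading.Lock()

    def set_active(self, name):
        """Called when a new state arrives from the Conveyor Belt Computer."""
        with self._lock:
            if name != self.active:
                self.active = name
                while not self.actions.empty():   # drop stale actions
                    self.actions.get_nowait()

    def inference_loop(self):
        while True:
            with self._lock:
                name = self.active
            if name is None:
                time.sleep(self.period)
                continue
            obs = self.robot.capture_observation()
            chunk = self.policies[name].select_actions(obs)  # one action chunk
            for action in chunk:
                with self._lock:
                    if self.active != name:
                        break                     # policy switched mid-chunk
                self.actions.put(action)

    def action_loop(self):
        while True:
            try:
                action = self.actions.get(timeout=self.period)
            except queue.Empty:
                continue
            self.robot.send_action(action)
            time.sleep(self.period)

    def start(self):
        threading.Thread(target=self.inference_loop, daemon=True).start()
        threading.Thread(target=self.action_loop, daemon=True).start()
```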
State Machine Flow:
- RUNNING → FLIPPING: Package detected at end, but barcode is hidden
- RUNNING → SORTING: Package detected at end, barcode is visible
- FLIPPING → SORTING: After flipping, barcode becomes visible
- SORTING → RUNNING: Package removed from detection zone
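The same transitions, written as a small illustrative function (the signal names `package_at_end` and `barcode_visible` are our shorthand for the two booleans the vision system produces):

```python
from enum import Enum

class State(Enum):
    RUNNING = "running"    # belt moving, Arm 1 picking packages
    FLIPPING = "flipping"  # belt stopped, Arm 2 flipping the package
    SORTING = "sorting"    # belt stopped, Arm 2 sorting the package

def next_state(state, package_at_end, barcode_visible):
    """Transition rules matching the list above; the belt only runs in RUNNING."""
    if state is State.RUNNING and package_at_end:
        return State.SORTING if barcode_visible else State.FLIPPING
    if state is State.FLIPPING and barcode_visible:
        return State.SORTING
    if state is State.SORTING and not package_at_end:
        return State.RUNNING
    return state
```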
We captured 150 episodes for each policy, making sure to have recordings for both daylight and nighttime conditions.
Our datasets can be found here:
We first trained ACT models for each policy. Training was quite fast on the MI300X GPU (only 50 minutes for 10,000 steps)!
Then we trained SmolVLA models (5 hours for 80,000 steps).
Here are our 3 final trained models:
- Arm 1 Pick packages from cardboard (SmolVLA)
- Arm 2 Flip packages (SmolVLA)
- Arm 2 Sort packages (SmolVLA)
All SmolVLA policies run on the AMD laptop using the Radeon 890M GPU, with both arms connected and controlled on the same laptop. The system demonstrates real-time performance with smooth policy switching and low-latency inference.
We also use RTC for all three of our policies to achieve smoother movements.
Here is a screenshot of htop running during inference. As shown below, our system uses only 8 GiB of memory during policy inference, leaving ample headroom out of a total of 30 GiB:
- Our system is easy to use and fully centralized, handling all arms, conveyor belt, computer vision, and state management in a unified architecture
- Leveraging VLA (Vision-Language-Action) models, we can quickly learn and adapt to new types of packages with minimal retraining
- Our webapp makes it easy to monitor package states, even when you're not in the room.
- README
- Video of our robot performing the task
- Models on HF
- Arm 1 Pick packages from cardboard (SmolVLA)
- Arm 2 Flip packages (SmolVLA)
- Arm 2 Sort packages (SmolVLA)
- Datasets on HF