AMD_Robotics_Hackathon_2025_Flip&Ship

AMD x Hugging Face Hackathon on Dec 12th - 14th 2025. Inverse Kinema-tricks team. We ended up winning 1st place 🥇

Team Information

Team:

Team Number: 6
Team Name: Inverse Kinema-tricks
Team Members:

  • Xiao Feng
  • Giacomo Randazzo
  • Nicolas Rodriguez

Summary:

Flip & Ship: Watch the Demo!

Watch the demo

We were inspired by recent advances in humanoid robotics, especially the cool Figure robot demo at an Amazon fulfillment center, where the robot flips packages to expose their barcodes for scanning:

Figure robot sorting packages

We thought: "Hey, we can do something similar!" So we built our version using more affordable hardware and open-source VLA models, plus a healthy dose of creative engineering!

Introducing Our Project: Flip & Ship!

System Overview

We designed and built a compact sorting station inspired by modern fulfillment centers:

Project schematic:
Project schematic

Real-world setup:
Real World Setup

Hardware

  • Robotic Arms: 2x SO101 units
  • Cameras: 3 provided by the hackathon organizers
  • Conveyor Belt:
    • Custom-built using a stepper motor salvaged from a used 3D printer
    • Belt crafted with tape in a perfectly hacky mindset

Submission Details

1. Mission Description

Our system can be used in a package delivery center to make sure barcodes are facing the right way (just like that Figure robot demo!). Plus, we can sort packages to send them in the right direction too.

2. Creativity

  • We're using basic skills that have been shown in hackathons before (pick and place, sorting), but we're combining them in a way that actually solves a real-world problem!
  • To our knowledge, the "flipping packages" skill has never been demonstrated before.
  • Let's be honest, our DIY conveyor belt made from a salvaged 3D printer motor and tape? That's peak hackathon creativity right there!
  • We developed a novel approach to fast policy switching during inference for Arm 2. Inspired by recent developments in RTC, we built a queuing system with 2 threads per arm: one runs inference and pushes actions to a queue, and one executes actions from the queue, choosing which policy to listen to (sorting or flipping) based on what the computer vision and conveyor belt system reports. This lets us switch between policies without added latency. See the Technical Implementation section for more details!
  • Finally, a hackathon would not be a hackathon without a fancy webapp, so we built one to track our packages' colors and delivery statuses (a minimal sketch of such a tracker is shown after the screenshot below):

Web App Screenshot
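We don't reproduce the webapp code in this README, but to give a flavor of what the tracking backend does, here is a hypothetical minimal sketch; the framework choice, routes, and field names are illustrative assumptions, not our actual implementation:

```python
# Hypothetical minimal package tracker; framework, routes and field names
# are illustrative assumptions, not our actual webapp.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
packages: dict = {}  # package_id -> {"color": ..., "status": ...}


class PackageUpdate(BaseModel):
    color: str   # e.g. "red", "blue", "unknown"
    status: str  # e.g. "on_belt", "flipped", "sorted", "shipped"


@app.post("/packages/{package_id}")
def update_package(package_id: int, update: PackageUpdate) -> dict:
    packages[package_id] = {"color": update.color, "status": update.status}
    return {"id": package_id, **packages[package_id]}


@app.get("/packages")
def list_packages() -> dict:
    return packages
```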

3. Technical Implementation

Policy Architecture

We developed 3 policies in total:

  • 1 policy for Arm 1: Picks up packages and drops them on the conveyor belt
  • 2 policies for Arm 2: One for sorting packages into the correct color box, and one for flipping packages to expose barcodes

The computer vision system is responsible for selecting the appropriate policy for Arm 2. If it can detect the package color, it instructs Arm 2 to sort the package. If the color (and thus the barcode) is not visible, it instructs Arm 2 to flip the package.

Computer vision is used solely to start or stop each policy. The sorting policy for Arm 2 independently determines the correct color for sorting, without input from computer vision.

The computer vision system also controls the conveyor belt. The belt runs continuously and only stops when a package is detected at the end, ready for processing.
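To make this concrete, here is a minimal sketch of what the end-of-belt check could look like; the region of interest, HSV ranges, and pixel thresholds below are made-up illustrative values, not the ones from our actual pipeline:

```python
# Illustrative end-of-belt check; ROI, HSV ranges and thresholds are made up.
import cv2
import numpy as np

END_OF_BELT_ROI = (200, 400, 100, 300)  # x0, x1, y0, y1 in pixels (illustrative)

# HSV ranges for the package colors we handle (illustrative values).
COLOR_RANGES = {
    "red": ((0, 120, 70), (10, 255, 255)),
    "blue": ((100, 120, 70), (130, 255, 255)),
}


def decide_next_state(frame: np.ndarray) -> str:
    """Return 'RUNNING', 'FLIPPING' or 'SORTING' from one camera frame."""
    x0, x1, y0, y1 = END_OF_BELT_ROI
    hsv = cv2.cvtColor(frame[y0:y1, x0:x1], cv2.COLOR_BGR2HSV)

    # Is there a package at the end of the belt at all?
    # (Here: a simple dark-foreground check against a bright, empty belt.)
    if cv2.countNonZero(cv2.inRange(hsv, (0, 0, 0), (180, 255, 120))) < 2000:
        return "RUNNING"  # nothing there yet: keep the belt moving

    # Is the colored (barcode) face visible?
    for low, high in COLOR_RANGES.values():
        if cv2.countNonZero(cv2.inRange(hsv, low, high)) > 2000:
            return "SORTING"  # color visible -> barcode side up -> sort
    return "FLIPPING"  # package present but color hidden -> flip first
```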

Architecture

Our system runs on two computers that talk to each other over WebSocket:

Conveyor Belt Computer:

  • Runs computer vision detection (checks if package is at the end and if barcode is visible)
  • Has a state machine that decides what to do: RUNNING (picking packages), FLIPPING (barcode hidden), or SORTING (barcode visible)
  • Controls the conveyor belt (starts/stops it based on state)
  • Sends state changes to the Arms Computer via WebSocket (a minimal sketch of this link follows this list)
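As an illustration, pushing those state changes takes only a few lines. The sketch below uses the `websockets` package; the URI and the JSON message format are assumptions, not our exact protocol:

```python
# Conveyor Belt Computer side of the link (illustrative sketch).
# The URI and the {"state": ...} message format are assumptions.
import asyncio
import json

import websockets


async def publish_states(state_queue: "asyncio.Queue[str]") -> None:
    """Forward every state change (RUNNING / FLIPPING / SORTING) to the Arms Computer."""
    async with websockets.connect("ws://arms-computer.local:8765") as ws:
        while True:
            state = await state_queue.get()
            await ws.send(json.dumps({"state": state}))
```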

Arms Computer:

  • Receives state commands from the Conveyor Belt Computer via WebSocket
  • Has an ArmController that maps states to which policies should be active
  • Runs 2 threads per arm (4 threads total): one for inference (running the policy) and one for executing actions
  • Uses a queuing system inspired by RTC: the inference thread pushes actions to a queue, and the action thread pulls from it. This architecture enables rapid policy switching (e.g., from flip to sort) without stopping the robot, ensuring smooth and continuous operation! (A sketch of this controller follows this list.)
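In practice, a stripped-down version of that controller could look like the sketch below; the class and method names are illustrative, and the real implementation wraps lerobot policies and robot interfaces that we don't reproduce here:

```python
# Illustrative per-arm controller: one thread runs inference, one executes
# actions, and the active policy can be swapped live. Names are made up.
import queue
import threading
from typing import Callable, Dict, Iterable


class ArmController:
    def __init__(self,
                 policies: Dict[str, Callable[[object], Iterable[object]]],
                 get_observation: Callable[[], object],
                 send_to_arm: Callable[[object], None]) -> None:
        self.policies = policies            # e.g. {"flip": ..., "sort": ...}
        self.get_observation = get_observation
        self.send_to_arm = send_to_arm
        self.active = next(iter(policies))  # currently selected policy
        self.queue: queue.Queue = queue.Queue(maxsize=50)
        self.lock = threading.Lock()

    def switch_policy(self, name: str) -> None:
        """Called by the WebSocket state handler; swap policies without stopping the arm."""
        with self.lock:
            self.active = name
            try:
                while True:
                    self.queue.get_nowait()  # drop stale actions from the old policy
            except queue.Empty:
                pass

    def inference_loop(self) -> None:
        while True:
            with self.lock:
                policy = self.policies[self.active]
            for action in policy(self.get_observation()):  # one action chunk
                self.queue.put(action)

    def execution_loop(self) -> None:
        while True:
            self.send_to_arm(self.queue.get())

    def start(self) -> None:
        threading.Thread(target=self.inference_loop, daemon=True).start()
        threading.Thread(target=self.execution_loop, daemon=True).start()
```

When the Conveyor Belt Computer announces a new state, the handler on the Arms Computer simply calls `switch_policy("sort")` or `switch_policy("flip")` on the Arm 2 controller.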

State Machine Flow:

  • RUNNING → FLIPPING: Package detected at end, but barcode is hidden
  • RUNNING → SORTING: Package detected at end, barcode is visible
  • FLIPPING → SORTING: After flipping, barcode becomes visible
  • SORTING → RUNNING: Package removed from detection zone
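For reference, these transitions fit in a tiny lookup table; the event names below are just shorthand for the conditions computed by the computer vision system:

```python
# Compact encoding of the state machine above; event names are shorthand.
TRANSITIONS = {
    ("RUNNING", "package_at_end_barcode_hidden"): "FLIPPING",
    ("RUNNING", "package_at_end_barcode_visible"): "SORTING",
    ("FLIPPING", "barcode_now_visible"): "SORTING",
    ("SORTING", "package_removed"): "RUNNING",
}


def next_state(state: str, event: str) -> str:
    """Stay in the current state unless the event triggers a transition."""
    return TRANSITIONS.get((state, event), state)


def belt_should_run(state: str) -> bool:
    """The belt only moves while we are waiting for the next package."""
    return state == "RUNNING"
```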

System Architecture

Teleoperation / Dataset Capture

We captured 150 episodes for each policy, recording in both daylight and nighttime conditions.

Our datasets can be found here:

Training

We first trained ACT models for each policy. Training was quite fast on the MI300X GPU (only 50 minutes for 10,000 steps)!

Loss Act

Then we trained SmolVLA models (5 hours for 80,000 steps).

Loss Smolvla

Here are our 3 final trained models:

Inference

All SmolVLA policies run on the AMD laptop using the Radeon 890M GPU, with both arms connected and controlled on the same laptop. The system demonstrates real-time performance with smooth policy switching and low-latency inference.

We also use RTC for all 3 of our policies to achieve smoother movements.
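As we understand it, RTC's main job is to compensate for inference latency so the arm never has to pause while the next action chunk is computed. The toy sketch below is not the RTC algorithm; it only illustrates the cross-fade intuition behind making the hand-over between two consecutive action chunks look continuous, with made-up numbers:

```python
# Toy illustration of smoothly joining two consecutive action chunks.
# This is NOT the RTC algorithm, only the cross-fade intuition.
import numpy as np


def blend_chunks(old_tail: np.ndarray, new_head: np.ndarray) -> np.ndarray:
    """Linearly cross-fade the overlapping actions of two chunks.

    old_tail and new_head have shape (overlap, action_dim): the last few
    actions of the chunk being executed and the first few of the new one.
    """
    overlap = old_tail.shape[0]
    w = np.linspace(0.0, 1.0, overlap)[:, None]  # 0 -> keep old, 1 -> use new
    return (1.0 - w) * old_tail + w * new_head


# Example with made-up joint targets for a 6-DoF arm:
old = np.tile(np.linspace(0.00, 0.10, 4)[:, None], (1, 6))
new = np.tile(np.linspace(0.12, 0.20, 4)[:, None], (1, 6))
print(blend_chunks(old, new))
```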

Here is a screenshot of htop running during inference. As shown below, our system uses only 8 GiB of the 30 GiB of available memory during policy inference, leaving ample headroom:

htop inference screenshot

4. Ease of Use

  • Our system is easy to use and fully centralized, handling the arms, the conveyor belt, computer vision, and state management in a unified architecture
  • Leveraging VLA (Vision-Language-Action) models, we can quickly learn and adapt to new types of packages with minimal retraining
  • Our webapp makes it easy to monitor package states, even when you're not in the room.

Additional Links

Mission 1

Mission 2
