
feat(model-driver-mediapipe): integrate MediaPipe as a motion capture tool #828

Merged
nekomeowww merged 8 commits into main from lilia/mediapipe on Dec 30, 2025


@Lilia-Chen
Contributor

Description

Add @proj-airi/mediapipe-workshop: a MediaPipe mocap workshop package + a devtools playground

  • New workspace package: packages/mediapipe-workshop (@proj-airi/mediapipe-workshop)

    • MediaPipe Tasks Vision backend (pose/hands/face) for webcam frames
    • A small “engine” that schedules jobs by Hz and drops frames when busy (keeps UI responsive)
    • Unified output types (PerceptionState) for stage-side consumption
    • Canvas overlay rendering for landmarks/connectors debugging
    • src/three/ functional pose→VRM retargeting PoC (bone targets + stability heuristics)
  • New Stage Web devtools page: apps/stage-web/src/pages/devtools/mediapipe-workshop.vue

    • Webcam + overlay preview
    • Runtime toggles for jobs + rates
    • Side-by-side VRM viewer to iterate on retargeting quickly
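
The Hz-based scheduling with frame dropping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual engine: `createScheduler`, `tick`, `done`, and `droppedFrames` are hypothetical names, and the real engine may track busy state per job rather than with one global flag.

```typescript
// Hypothetical sketch: none of these names come from the package itself.
interface JobState {
  name: string
  hz: number // target rate in runs per second
  lastRunAt: number // timestamp (ms) at which the job last started
}

interface Scheduler {
  /** Call once per video frame; returns the names of the jobs started for it. */
  tick: (nowMs: number) => string[]
  /** Marks the in-flight work as finished so later frames are accepted again. */
  done: () => void
  /** Number of frames skipped because work was still in flight. */
  droppedFrames: () => number
}

function createScheduler(jobs: Array<{ name: string, hz: number }>): Scheduler {
  const state: JobState[] = jobs.map(j => ({ ...j, lastRunAt: Number.NEGATIVE_INFINITY }))
  let busy = false
  let dropped = 0

  return {
    tick(nowMs) {
      // A job is due once at least 1000 / hz milliseconds have passed since it last started.
      const due = state.filter(j => j.hz > 0 && nowMs - j.lastRunAt >= 1000 / j.hz)
      if (due.length === 0)
        return []

      // Drop the frame instead of queueing it, so work never piles up behind a slow inference.
      if (busy) {
        dropped++
        return []
      }

      busy = true
      for (const j of due)
        j.lastRunAt = nowMs
      return due.map(j => j.name)
    },
    done() {
      busy = false
    },
    droppedFrames: () => dropped,
  }
}
```

In a browser, `tick` would typically be driven from a `requestAnimationFrame` loop, with `done` called once inference for the accepted frame has returned. A dropped frame does not advance `lastRunAt`, so the job is retried on the next frame.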

Design notes

  • Separation of concerns:
    • backends/*: MediaPipe-specific initialization + inference
    • engine: scheduling, frame dropping, unified PerceptionState
    • three/*: all new retargeting logic lives here (functional-first)
  • Minimal touch to stage-ui-three: only a small per-frame hook surface so the workshop can drive VRM without extending core renderer behaviour.
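
To make the "small per-frame hook surface" idea concrete, here is a minimal sketch of what such a surface could look like. It is an assumption for illustration only: `createFrameHooks`, `add`, and `run` are hypothetical names and do not reflect the actual hook API added to stage-ui-three.

```typescript
// Hypothetical sketch of a per-frame hook registry; not the stage-ui-three API.
type FrameHook = (deltaSeconds: number) => void

function createFrameHooks() {
  const hooks = new Set<FrameHook>()

  return {
    /** Registers a hook and returns a function that unregisters it. */
    add(hook: FrameHook): () => void {
      hooks.add(hook)
      return () => {
        hooks.delete(hook)
      }
    },
    /** Called by the renderer once per frame, before drawing. */
    run(deltaSeconds: number): void {
      for (const hook of hooks)
        hook(deltaSeconds)
    },
  }
}
```

With a surface like this, the renderer only needs to call `run` in its frame loop, while the workshop registers a hook that applies the latest retargeted bone rotations. The renderer itself stays unaware of MediaPipe.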

Next steps

  • Face/head capture
  • Motion animation capture & recording
  • Integrating the tool into the VRM model settings page

@Lilia-Chen Lilia-Chen self-assigned this Dec 24, 2025
@github-actions
Contributor

github-actions Bot commented Dec 24, 2025

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

Name Link
🔭 Waiting for approval For maintainers, approve here

Hey, @nekomeowww, @sumimakito, @luoling8192, @LemonNekoGH, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

@gemini-code-assist
Contributor

Summary of Changes

Hello @Lilia-Chen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new experimental package, @proj-airi/mediapipe-workshop, designed to integrate MediaPipe into AIRI as a motion capture tool. It provides a comprehensive solution for real-time webcam-based pose, hand, and face tracking, with direct retargeting capabilities for VRM models. A dedicated devtools playground has been added to stage-web to allow developers to easily test and configure the MediaPipe pipeline, offering runtime toggles and side-by-side VRM viewing. The core design emphasizes separation of concerns, efficient scheduling with frame dropping, and a minimal impact on existing rendering logic.

Highlights

  • New MediaPipe Workshop Package: Introduced @proj-airi/mediapipe-workshop, a new workspace package for integrating MediaPipe as a motion capture tool.
  • Webcam Motion Capture: Implemented webcam-based motion capture for pose, hands, and face using MediaPipe Tasks Vision backend.
  • Real-time VRM Retargeting: Added functional proof-of-concept for real-time pose-to-VRM retargeting, allowing VRM models to mimic detected movements.
  • Devtools Playground: Created a new devtools page in stage-web for the MediaPipe workshop, featuring webcam preview, overlay rendering, runtime configuration toggles, and a side-by-side VRM viewer.
  • Performance Optimization: Developed a small engine that schedules jobs by Hz and drops frames when busy, ensuring UI responsiveness despite synchronous MediaPipe operations.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant new feature: the @proj-airi/mediapipe-workshop package, which integrates MediaPipe for motion capture. The implementation is well-structured, adhering to the functional programming principles and separation of concerns outlined in the AGENTS.md document. The core components, including the engine, backend adapter, and VRM retargeting logic, are thoughtfully designed with a focus on performance and stability. The necessary modifications to the existing stage-ui-three package to support this new functionality are minimal and cleanly implemented via a hook system. I've included a couple of suggestions to enhance code robustness and readability. Overall, this is an excellent and well-executed addition to the project.

Comment thread apps/stage-web/src/pages/devtools/mediapipe-workshop.vue Outdated
Comment thread packages/model-driver-mediapipe/src/utils/overlay.ts Outdated
@nekomeowww nekomeowww changed the title from "feat(mediapipe-workshop): A new package to integrate MediaPipe into AIRI as a motion capture tool - initial demo" to "feat(model-driver-mediapipe): integrate MediaPipe as a motion capture tool" on Dec 24, 2025
Comment thread packages/mediapipe-workshop/package.json Outdated
Comment thread packages/mediapipe-workshop/package.json Outdated
Comment thread packages/model-driver-mediapipe/package.json
Comment thread packages/model-driver-mediapipe/src/backends/mediapipe.ts Outdated
Comment thread packages/model-driver-mediapipe/README.md
Comment thread packages/stage-ui-three/src/components/Model/VRMModel.vue
Comment thread packages/model-driver-mediapipe/src/utils/overlay.ts Outdated
Comment thread packages/model-driver-mediapipe/src/utils/overlay.ts Outdated
Comment thread packages/model-driver-mediapipe/src/utils/overlay.ts Outdated
Comment thread packages/model-driver-mediapipe/src/types.ts
Comment thread packages/model-driver-mediapipe/src/types.ts Outdated
Comment thread packages/model-driver-mediapipe/src/three/pose-to-vrm.ts Outdated
if (!config)
  throw new Error('MediaPipe backend not initialized (call init() first)')

busy = true
Member


Use Semaphore(1) or Mutex from es-toolkit (https://es-toolkit.dev/reference/promise/Semaphore.html) instead of a manual busy flag to guarantee that only one run is in flight at a time.
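
As a rough illustration of this suggestion, the sketch below serializes async calls so that at most one is in flight. To stay self-contained it uses a tiny hand-written stand-in (`SingleSlot`, a hypothetical name) rather than the es-toolkit classes; a library Semaphore with capacity 1 or a Mutex would fill the same role. Note that this version queues overlapping calls instead of rejecting them, which may or may not match what the backend wants.

```typescript
// Hypothetical stand-in for a capacity-1 semaphore; not the es-toolkit implementation.
class SingleSlot {
  private tail: Promise<void> = Promise.resolve()

  /** Runs `task` only after every previously queued task has settled. */
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task)
    // Swallow the outcome on the internal chain so one failed task does not block later ones.
    this.tail = result.then(() => undefined, () => undefined)
    return result
  }
}
```

A backend could then wrap its inference call as `slot.run(() => infer(frame))`, where `infer` is whatever async function performs the detection, and drop the manual `busy` bookkeeping.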

Comment thread packages/model-driver-mediapipe/src/backends/mediapipe.ts
@nekomeowww nekomeowww force-pushed the lilia/mediapipe branch 2 times, most recently from b5eeed4 to 3457825 Compare December 25, 2025 17:53
Comment thread packages/model-driver-mediapipe/tasks/prepare-tasks.ts
Member

@sumimakito sumimakito left a comment


Approving as the current code is in a working state, and I’m unsure of the author’s follow-up action. Please feel free to merge it if we agree and decide to revisit these details in the future.

@sumimakito sumimakito added this to the v0.9.1 milestone Dec 26, 2025
@Lilia-Chen
Contributor Author

> Approving as the current code is in a working state, and I’m unsure of the author’s follow-up action. Please feel free to merge it if we agree and decide to revisit these details in the future.

Sorry for the delay. I'm on holiday, and once I have time, I will:

  • Resolve all the review comments and complete this PR
  • Add the functionality needed to record and save captured motion (a complete demo page, maybe in another PR)
  • Integrate this demo into the Model settings page (maybe in another PR)

Let's do it step by step, so that no single PR has too much to do or to review. In my opinion, this also makes collaboration more flexible if someone wants to step in and take over this thread of work.

@nekomeowww nekomeowww force-pushed the lilia/mediapipe branch 3 times, most recently from 5bbbe53 to d6d25a5 Compare December 29, 2025 20:15
@nekomeowww
Member

nekomeowww commented Dec 30, 2025

Updates to address review notes:

  • MediaPipe assets now packaged locally: tasks/prepare-tasks.ts downloads task files + copies wasm into tasks/assets/wasm, and visionTaskWasmRoot now points there. Backend stays on import.meta.url paths to avoid runtime network requirements.
  • Overlay: switched to PoseLandmarker.POSE_CONNECTIONS / HandLandmarker.HAND_CONNECTIONS, added palette + style constants, and a TODO about per-frame perf / possibly using DrawingUtils later.
  • Types: Landmark2D/3D now reuse MediaPipe NormalizedLandmark / Landmark; presence/visibility access is guarded since TS defs don’t include those fields.
  • Concurrency: Semaphore(1) wraps backend run() to ensure single in-flight task.
  • Math helpers: added TODO to consider shared math utilities if more drivers need them.
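
The guarded presence/visibility access mentioned in the Types item could look something like the sketch below. It assumes, as that item states, that the landmark type definitions expose only the coordinates while the runtime objects may still carry the extra fields. `LandmarkLike` and `readOptionalNumber` are hypothetical names used for illustration, not identifiers from the package.

```typescript
// Assumed shape: typed fields are x/y/z only; `visibility` and `presence`
// may exist on the runtime object without being declared.
interface LandmarkLike {
  x: number
  y: number
  z: number
}

/** Returns the field when it is a finite number at runtime, otherwise undefined. */
function readOptionalNumber(
  landmark: LandmarkLike,
  key: 'visibility' | 'presence',
): number | undefined {
  const value = (landmark as unknown as Record<string, unknown>)[key]
  return typeof value === 'number' && Number.isFinite(value) ? value : undefined
}
```

Callers can then fall back explicitly, for example `readOptionalNumber(lm, 'visibility') ?? 1`, rather than letting an undefined value leak into the retargeting math.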

@nekomeowww
Member

Kind of laggy.

nekomeowww and others added 2 commits December 30, 2025 12:03
Co-authored-by: Makito <5277268+sumimakito@users.noreply.github.com>
@nekomeowww nekomeowww merged commit ecd5eee into main Dec 30, 2025
9 checks passed
@nekomeowww nekomeowww deleted the lilia/mediapipe branch December 30, 2025 04:11

3 participants