AI Learns How To Play Physically Simulated Tennis At Grandmaster Level By Watching Tennis Matches #263
FurkanGozukara
announced in
Tutorials
AI Learns How To Play Physically Simulated Tennis At Grandmaster Level By Watching Tennis Matches
Full tutorial: https://www.youtube.com/watch?v=m8W4l-peEBk
A system has been developed that can learn a range of physically simulated tennis skills from a vast collection of broadcast video demonstrations of tennis play. The system employs hierarchical models that combine a low-level imitation policy and a high-level motion planning policy to control the character's movements based on motion embeddings learned from the broadcast videos. By utilizing simple rewards and without the need for explicit annotations of stroke types, the system is capable of learning complex tennis shotmaking skills and stringing together multiple shots into extended rallies.
To account for the low quality of motions extracted from the broadcast videos, the system utilizes physics-based imitation to correct estimated motion and a hybrid control policy that overrides erroneous aspects of the learned motion embedding with corrections predicted by the high-level policy. The resulting controllers for physically-simulated tennis players are able to hit the incoming ball to target positions accurately using a diverse array of strokes (such as serves, forehands, and backhands), spins (including topspins and slices), and playing styles (such as one/two-handed backhands and left/right-handed play).
Overall, the system is able to synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics, demonstrating the effectiveness of the approach.
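The hierarchy described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dimensions, the stand-in linear "networks", and the PD-style tracking gain are all assumptions chosen only to show how the three pieces (high-level planner, motion embedding, low-level imitation policy) hand data to each other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions -- the paper does not specify these exact sizes.
LATENT_DIM, POSE_DIM = 8, 24

def high_level_policy(game_state):
    """Map game state (ball position/velocity, target) to a motion latent.
    Stand-in: a random linear layer instead of a trained network."""
    W = rng.standard_normal((LATENT_DIM, game_state.size))
    return np.tanh(W @ game_state)

def motion_embedding_decoder(latent, prev_pose):
    """Decode a latent into the next target kinematic pose,
    conditioned on the previous pose (sketch of the learned embedding)."""
    W = rng.standard_normal((POSE_DIM, LATENT_DIM))
    return prev_pose + 0.1 * (W @ latent)

def low_level_imitation(sim_pose, target_pose, kp=10.0):
    """PD-style tracking: the imitation policy pushes the simulated
    character toward the target pose produced from the embedding."""
    return kp * (target_pose - sim_pose)

# One control step of the hierarchy.
game_state = rng.standard_normal(6)               # ball + target features
prev_pose = np.zeros(POSE_DIM)
z = high_level_policy(game_state)                 # high level: pick a latent
target = motion_embedding_decoder(z, prev_pose)   # embedding: latent -> pose
torques = low_level_imitation(prev_pose, target)  # low level: track the pose
print(torques.shape)  # (24,)
```

In the actual system each of these stand-ins is a trained neural network, but the data flow per control step is the same: state in, latent out, pose out, joint actuation out.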
Paper link⤵️
https://research.nvidia.com/labs/toronto-ai/vid2player3d/
Our Discord server⤵️
https://bit.ly/SECoursesDiscord
If I have been of assistance to you and you would like to support my work, please consider becoming a patron 🥰⤵️
https://www.patreon.com/SECourses
Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews⤵️
https://www.youtube.com/playlist?list=PL_pbwdIyffsnkay6X91BWb9rrfLATUMr3
Playlist of StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img⤵️
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
00:00:00 Introduction to an amazing new AI technology that can learn to play tennis
00:00:18 Permission to upload the video
00:00:26 The paper's video begins with the introduction
00:01:08 Motion capture has been the most common source of motion data for character animation
00:02:13 System Overview
00:03:07 Approach
00:05:00 Complex and Diverse Skills
00:06:05 Task Performance
00:06:46 Styles from Different Players
00:07:16 Two-Player Rallies
00:08:13 Ablation of Physics Correction
00:08:36 Ablation of Hybrid Control
00:08:58 Effects of Removing Residual Force Control
Computer animation faces a major challenge in developing controllers for physics-based character simulation and control. In recent years, a combination of deep reinforcement learning (DRL) and motion imitation techniques has yielded simulated characters with lifelike motions and athletic abilities. However, these systems typically rely on costly motion capture (mocap) data as a source of kinematic motions to imitate. Fortunately, video footage of athletic events is abundant and offers a rich source of in-activity motion data. This inspired a research paper by Zhang et al. that explores how video data can be leveraged to learn tennis skills.
The authors seek to answer several key questions, including how to use large-scale video databases of 3D tennis motion to produce controllers that can play full tennis rallies with simulated racket and ball dynamics, how to use state-of-the-art methods in data-driven and physically-based character animation to learn skills from video data, and how to learn character controllers with a diverse set of skills without explicit skill annotations.
To tackle these challenges, the authors propose a system that builds upon recent ideas in hierarchical physics-based character control. Their approach involves leveraging motions produced by physics-based imitation of example videos to learn a rich motion embedding for tennis actions. They then train a high-level motion controller that steers the character in the latent motion space to achieve higher-level task objectives, with low-level movements controlled by the imitation controller.
The system also addresses motion quality issues in the learned motion embedding that stem from perception errors in the video-estimated motion.
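The paper notes that the controllers are trained with simple rewards and no stroke-type annotations. A minimal sketch of what such a task reward could look like is below; the exponential-falloff form, the scale constant, and the function name are my assumptions for illustration, not the paper's exact reward.

```python
import numpy as np

def rally_reward(landing_pos, target_pos, hit_ball, scale=2.0):
    """Sketch of a simple task reward: exponential falloff with the
    distance between where the returned ball lands and the desired
    target on the court. Zero reward if the ball was not returned.
    The exact form and constants are illustrative assumptions."""
    if not hit_ball:
        return 0.0
    dist = np.linalg.norm(np.asarray(landing_pos) - np.asarray(target_pos))
    return float(np.exp(-scale * dist))

print(rally_reward([0.0, 0.0], [0.0, 0.0], True))   # 1.0  (perfect placement)
print(rally_reward([0.0, 0.0], [0.0, 0.0], False))  # 0.0  (missed the ball)
```

A dense, annotation-free signal like this is what lets the system discover serves, forehands, and backhands on its own: any stroke that lands the ball near the target is rewarded equally.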
@Article{zhang2023vid2player3d,
author = {Zhang, Haotian and Yuan, Ye and Makoviychuk, Viktor and Guo, Yunrong and Fidler, Sanja and Peng, Xue Bin and Fatahalian, Kayvon},
title = {Learning Physically Simulated Tennis Skills from Broadcast Videos},
journal = {ACM Trans. Graph.},
issue_date = {August 2023},
numpages = {14},
doi = {10.1145/3592408},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {physics-based character animation, imitation learning, reinforcement learning},
}
Video Transcription
00:00:00 Greetings everyone. A group of brilliant researchers has recently published a new
00:00:05 research paper that enables AI to learn physically simulated tennis skills from broadcast videos.
00:00:11 Today I am excited to share their paper's amazing supplementary video in 4K super upscaled format. I
00:00:18 have obtained permission from the primary author of the paper and you can also find the link to
00:00:24 the paper in the video's description. In this paper, we present a system that allows physically
00:00:29 simulated characters to learn diverse and complex tennis skills from broadcast tennis videos.
00:00:37 Our simulated characters can hit consecutive incoming tennis balls
00:00:41 with a variety of tennis skills such as serve, forehand and backhand, topspin, and slice.
00:00:49 And the motions we generate resemble those of human players. The controllers can also be trained
00:00:55 using different players' motion data, enabling the characters to adopt different playing styles.
00:01:08 Motion capture has been the most common source of motion data for character animation. While
00:01:13 MoCap is able to record high-quality data, it can be difficult to use these systems to record
00:01:18 athletic motion, which can require large capture volumes and highly skilled actors.
00:01:25 On the other hand, human athletes are frequently recorded in videos, especially for sports. These
00:01:32 videos have the potential to be a valuable source of data for character animation by
00:01:37 providing a vast volume of in-activity data of highly specialized athletic motion. Despite being
00:01:44 large scale, the motions estimated from videos are usually of lower quality compared to mocap data.
00:01:51 While prior works have demonstrated learning skills from videos,
00:01:55 they are limited to reproducing short video clips. State-of-the-art, data-driven animation techniques
00:02:02 typically require high-quality motion data. Directly applying these methods to video data
00:02:08 may not produce natural human-like motions, and motions may not be precise enough to hit
00:02:13 incoming tennis balls close to desired locations. To enable characters to learn
00:02:19 skills from sports videos, we present a video imitation system that consists of four stages.
00:02:27 First, we estimate kinematic motions from source video clips. Secondly, a low-level imitation
00:02:34 policy is trained to imitate the kinematic motion for controlling the low-level behaviors
00:02:39 of the simulated character and generate physically corrected motion. Next, we fit conditional VAEs to
00:02:47 the corrected motion to learn a motion embedding that produces human-like tennis motions. Finally,
00:02:54 a high-level motion planning policy is trained to generate target kinematic motion from the
00:02:59 motion embedding, and then control a physically simulated character to perform a desired task.
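The four stages just described can be summarized as a chain of functions. These stubs are purely illustrative; the names, signatures, and return values are assumptions meant only to show the order of the stages and what each consumes and produces.

```python
# Hypothetical stage functions sketching the four-stage pipeline;
# names and signatures are illustrative, not from the paper's code.

def estimate_kinematics(video_clips):
    """Stage 1: 2D/3D pose estimation from broadcast clips (stub)."""
    return [{"clip": clip, "noisy": True} for clip in video_clips]

def physics_correction(noisy_motions):
    """Stage 2: a low-level imitation policy tracks the noisy motion in a
    physics simulator, yielding physically corrected motion (stub)."""
    return [{**m, "noisy": False} for m in noisy_motions]

def fit_motion_embedding(corrected_motions):
    """Stage 3: fit conditional VAEs to the corrected motion (stub)."""
    return {"embedding_of": len(corrected_motions)}

def train_high_level_policy(embedding):
    """Stage 4: train a planner that steers through the embedding (stub)."""
    return {"policy_over": embedding}

clips = ["rally_01", "rally_02"]
policy = train_high_level_policy(
    fit_motion_embedding(physics_correction(estimate_kinematics(clips))))
print(policy)  # {'policy_over': {'embedding_of': 2}}
```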
00:03:09 To build our tennis motion data set from raw videos, we use a combination of 2D and
00:03:15 3D pose estimators to reconstruct the player's poses and root trajectories.
00:03:23 However, the estimated kinematic motions are pretty noisy, with jittering and foot
00:03:28 skating artifacts. More importantly, the wrist motion for controlling the racket is inaccurate,
00:03:36 since it is difficult to estimate the wrist or the racket motion due to occlusion and motion blur.
00:03:47 To address these artifacts, we train a low-level imitation policy to control a physically
00:03:54 simulated character to track these noisy kinematic motions and output physically corrected motions.
00:04:01 The resulting motions after correction are more physically plausible and stable compared to the
00:04:06 original kinematic motions. With the corrected motion dataset, we can construct a kinematic
00:04:14 motion embedding by fitting conditional VAEs to the motion data. Given the same initial pose,
00:04:21 diverse motions can be generated by sampling different trajectories of latents.
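The sampling idea just mentioned — one initial pose, many motions — can be sketched as follows. The decoder here is a random linear stand-in for the trained conditional VAE decoder, and the dimensions are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM, POSE_DIM = 8, 24  # illustrative sizes, not the paper's

W = rng.standard_normal((POSE_DIM, LATENT_DIM))  # stand-in decoder weights

def decode(latent, prev_pose):
    """Stand-in for the conditional VAE decoder: produce the next pose
    from a latent, conditioned on the previous pose."""
    return prev_pose + 0.05 * (W @ latent)

def rollout(initial_pose, n_steps=30):
    """Sample one motion trajectory by drawing a fresh latent per step."""
    pose, traj = initial_pose, [initial_pose]
    for _ in range(n_steps):
        pose = decode(rng.standard_normal(LATENT_DIM), pose)
        traj.append(pose)
    return np.stack(traj)

start = np.zeros(POSE_DIM)
a, b = rollout(start), rollout(start)  # same initial pose...
print(np.allclose(a, b))               # ...different latents, different motions
```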
00:04:29 An additional benefit of the motion embedding is that it can
00:04:32 help smooth the motions and mitigate some of the jittering artifacts in the original motion data.
00:04:43 To address the inaccuracies in the wrist joint for precise control of the racket, we propose a hybrid
00:04:48 control structure where the full body motion is controlled by the reference trajectories
00:04:53 from the motion embedding, while the wrist motion is directly controlled by the high-level policy.
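Structurally, the hybrid control described here amounts to splicing two sources into one target pose. In this sketch the joint layout (which dimensions are "wrist") is an assumption for illustration; only the splicing pattern reflects the idea in the video.

```python
import numpy as np

# Joint layout is illustrative: first 21 dims full body, last 3 the wrist.
BODY, WRIST = slice(0, 21), slice(21, 24)

def hybrid_target(embedding_pose, wrist_correction):
    """Hybrid control sketch: keep the full-body reference from the
    motion embedding, but let the high-level policy directly set the
    wrist channels that the video estimates get wrong."""
    target = np.array(embedding_pose, dtype=float)
    target[WRIST] = wrist_correction  # override erroneous wrist motion
    return target

pose = np.ones(24)                                   # reference from embedding
corrected = hybrid_target(pose, np.array([0.2, -0.1, 0.3]))
print(corrected[21:])  # [ 0.2 -0.1  0.3]
```

The body channels are untouched, so the character keeps the human-like motion learned from video while the racket-critical wrist gets precise, task-driven control.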
00:05:03 With our system, various tennis skills can be learned such as serve,
00:05:07 forehand topspin, backhand topspin, and backhand slice. These skills are
00:05:15 learned using data from a right-handed player who used a one-handed backhand.
00:05:27 The simulated character can hit fast-coming tennis balls with diverse and complex skills.
00:05:38 When given a target spin direction,
00:05:40 such as a backspin, the character will hit the ball with a slice.
00:05:50 Here we visualize the skills with the character model used in our physics simulation.
00:06:09 The simulated characters can hit incoming tennis balls close to random target locations with high
00:06:15 precision. They can hit the same incoming tennis ball to various target locations,
00:06:25 or hit different incoming tennis balls to the same target.
00:06:31 In the extreme cases, the simulated characters can still complete the task with exceptional skill,
00:06:37 such as hitting consecutive balls that land on the court edges. When constructing the motion
00:06:44 embedding with different players' motion, the simulated character can learn tennis skills in
00:06:53 different styles, such as a two-hand backhand swing learned using data from a right-handed
00:06:58 player who used a two-hand backhand, or holding the racket with the left
00:07:04 hand learned using data from a left-handed player who also used a two-hand backhand.
00:07:18 The learned controllers can further generate novel animations of tennis rallies between two players.
00:07:26 This rally is generated using controllers learned from two right-handed players.
00:07:42 This rally is generated using controllers learned
00:07:45 from a left-handed player and a right-handed player.
00:08:12 The physics correction is essential for constructing a good motion embedding for
00:08:16 generating natural tennis motions. Directly training the embedding from the uncorrected
00:08:21 kinematic motions will result in physically implausible motion that exhibits artifacts
00:08:27 such as foot skating and jittering. It also decreases precision when hitting the tennis balls.
00:08:35 The proposed hybrid control is crucial for precisely controlling
00:08:39 the tennis racket. Without the hybrid control to correct the wrist motions,
00:08:43 the simulated character may hit the ball, but fail to return it close to the target.
00:09:20 More details are available in the paper. Thank you for watching.