
RotoNet: Rotoscoping-based Artistic Style Transfer Networks

Abstract: Conventional video style transfer techniques apply styles uniformly across entire frames, making it challenging to selectively transform specific objects. In this study, I propose RotoNet, a novel deep-learning framework that enables object-specific style transfer based on rotoscoping. RotoNet consists of an object tracking network and a style transfer network, aiming to selectively apply artistic styles to targeted objects within a video. By overcoming the limitations of existing style transfer models, RotoNet captures the distinctive aesthetic qualities of rotoscoping animation: precision in motion tracing, line expressiveness, and artistic interpretation of human movement.

1. Introduction

Rotoscoping is a traditional animation technique that involves manually tracing objects in live-action video frame by frame. While it enables highly realistic motion representation, it is time-consuming, labor-intensive, and requires pre-recorded footage, making it costly and difficult to scale for large projects. To address these limitations, I propose RotoNet, a deep learning–based framework for object-specific style transfer in videos. RotoNet aims to automate the rotoscoping process, reduce production time and cost, and improve the efficiency of video stylization.

2. Methods

The overall architecture of RotoNet consists of two main components designed to accurately track specific objects in a video and selectively apply style transformations. The object tracking network identifies the target object specified by the user in the initial frame and consistently segments and tracks the object throughout the entire sequence of video frames. The style transfer network utilizes the binary masks generated by the object tracking network to selectively apply style only to the designated object regions within the video.
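The two components meet at a simple masked compositing step: the stylized frame replaces the original only where the tracking network's binary mask is active. The sketch below illustrates that operation with NumPy; the function name and array conventions are my own assumptions, not taken from the RotoNet code.

```python
import numpy as np

def composite(frame, stylized, mask):
    """Blend a stylized frame into the original using a binary object mask.

    frame, stylized: (H, W, 3) float arrays in [0, 1]
    mask: (H, W) binary array, 1 inside the tracked object, 0 elsewhere
    (Illustrative sketch; names are not from the actual implementation.)
    """
    m = mask[..., None].astype(frame.dtype)  # broadcast mask over color channels
    return m * stylized + (1.0 - m) * frame

# Tiny example: a 2x2 black frame, fully white stylization,
# with only the top-left pixel marked as "the object".
frame = np.zeros((2, 2, 3))
stylized = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 0]])
out = composite(frame, stylized, mask)
```

Only the masked pixel takes on the stylized value; the rest of the frame is passed through unchanged, which is what makes the style transfer object-specific.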

2.1. Object Segmentation & Tracking

For accurate object segmentation and tracking in videos, I employ SAMURAI. SAMURAI introduces motion-based modeling and a motion-aware memory selection mechanism, enabling robust object tracking even in cluttered and dynamic environments. It supports zero-shot video object segmentation, allowing the target object to be segmented and tracked throughout the entire video using only a simple prompt, such as a box or mask, in the first frame, without any additional training. Built upon the Segment Anything Model 2 (SAM 2), SAMURAI ensures strong segmentation performance and provides spatiotemporal consistency tailored for the video domain.
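To make the prompt format concrete, the helper below turns a first-frame bounding-box prompt into a coarse binary mask. This is only an illustration of what the user supplies; in SAMURAI the prompt is refined into a precise segmentation by the SAM 2-based model, and the function name here is hypothetical.

```python
import numpy as np

def box_prompt_to_mask(height, width, box):
    """Convert a first-frame box prompt (x0, y0, x1, y1) into a coarse
    binary mask. Illustrative only: the real tracker refines this prompt
    into a tight per-frame segmentation of the object."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1  # mark the prompted region as foreground
    return mask

# A 4x4 frame with a 2x2 box prompt over the object.
mask0 = box_prompt_to_mask(4, 4, (1, 1, 3, 3))
```

The resulting mask seeds the tracker on frame one; every subsequent frame's mask is then produced automatically, with no per-frame annotation.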

2.2. Video Stylization

Directly applying image style transfer models to video often leads to a lack of temporal consistency, resulting in frame-to-frame variations known as the "popping effect." To mitigate this temporal discontinuity, I adopt a blending strategy that combines the current frame with the previously stylized frame at a fixed ratio. This creates a ghosting effect that enhances temporal coherence across frames, ensuring more consistent stylization and reducing visual artifacts throughout the video.
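The blending strategy amounts to an exponential moving average over stylized frames: each output mixes the current stylized frame with the previous blended result at a fixed ratio. A minimal sketch, assuming a blend ratio of 0.7 for the current frame (an illustrative value, not one stated in this README):

```python
import numpy as np

def blend_sequence(stylized_frames, alpha=0.7):
    """Suppress frame-to-frame 'popping' by blending each stylized frame
    with the previous blended output at a fixed ratio.

    stylized_frames: list of (H, W, 3) float arrays
    alpha: weight of the current frame; (1 - alpha) carries over the past,
    producing the intentional ghosting that smooths the stylization.
    """
    blended = [stylized_frames[0]]  # first frame has no predecessor
    for frame in stylized_frames[1:]:
        blended.append(alpha * frame + (1.0 - alpha) * blended[-1])
    return blended

# A hard black-to-white transition is softened to 70% white on frame two.
frames = [np.zeros((2, 2, 3)), np.ones((2, 2, 3))]
out = blend_sequence(frames, alpha=0.7)
```

Higher alpha preserves per-frame detail but more popping; lower alpha gives smoother motion at the cost of stronger ghosting trails.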

3. Experiments

(Figures omitted: original video, object tracking & segmentation, binary mask, and stylization results.)

4. Results

(Figures omitted: the style image and the final stylized video.)

Acknowledgment

This is the final project for the course "Introduction to Generative AI".
