Abstract: Conventional video style transfer techniques apply styles uniformly across entire frames, making it challenging to selectively transform specific objects. In this study, I propose RotoNet, a novel deep-learning framework that enables object-specific style transfer based on rotoscoping. RotoNet consists of an object tracking network and a style transfer network, and aims to selectively apply artistic styles to targeted objects within a video. By overcoming the limitations of existing style transfer models, RotoNet captures the distinctive aesthetic qualities of rotoscoping animation: precision in motion tracing, line expressiveness, and artistic interpretation of human movement.
Rotoscoping is a traditional animation technique that involves manually tracing objects in live-action video frame by frame. While it enables highly realistic motion representation, it is time-consuming, labor-intensive, and requires pre-recorded footage, making it costly and difficult to scale for large projects. To address these limitations, I propose RotoNet, a deep learning–based framework for object-specific style transfer in videos. RotoNet aims to automate the rotoscoping process, reduce production time and cost, and improve the efficiency of video stylization.
The overall architecture of RotoNet consists of two main components designed to accurately track specific objects in a video and selectively apply style transformations. The object tracking network identifies the target object specified by the user in the initial frame and consistently segments and tracks the object throughout the entire sequence of video frames. The style transfer network utilizes the binary masks generated by the object tracking network to selectively apply style only to the designated object regions within the video.
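The selective application described above amounts to compositing the stylized frame with the original frame through the binary mask. A minimal sketch of that step (the function name and toy values are illustrative, not part of RotoNet):

```python
import numpy as np

def apply_selective_style(frame, stylized, mask):
    """Keep stylized pixels inside the mask, original pixels outside.

    frame, stylized: (H, W, 3) float arrays; mask: (H, W) binary array.
    """
    mask3 = mask[..., None].astype(frame.dtype)  # broadcast mask over RGB channels
    return stylized * mask3 + frame * (1 - mask3)

# Toy example: the mask selects only the left column of a 2x2 frame.
frame = np.zeros((2, 2, 3), dtype=np.float32)      # original (all black)
stylized = np.ones((2, 2, 3), dtype=np.float32)    # stylized (all white)
mask = np.array([[1, 0], [1, 0]], dtype=np.float32)
out = apply_selective_style(frame, stylized, mask)
```

Only the masked region takes on the stylized appearance; everything outside the mask is passed through unchanged.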
For accurate object segmentation and tracking in videos, I employ SAMURAI. SAMURAI introduces motion-based modeling and a motion-aware memory selection mechanism, enabling robust object tracking even in cluttered and dynamic environments. It supports zero-shot video object segmentation, allowing the target object to be segmented and tracked throughout the entire video using only a simple prompt—such as a box or mask—in the first frame, without any additional training. Built upon the Segment Anything Model (SAM), SAMURAI ensures strong segmentation performance and provides spatiotemporal consistency tailored for the video domain.
Directly applying image style transfer models to video often leads to a lack of temporal consistency, resulting in frame-to-frame variations known as the "popping effect." To mitigate this temporal discontinuity, I adopt a blending strategy that combines the current frame with the previously stylized frame at a fixed ratio. This introduces a slight ghosting effect but enhances temporal coherence across frames, ensuring more consistent stylization and reducing visual artifacts throughout the video.
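The blending strategy can be sketched as an exponential moving average over stylized frames. The blend ratio below (`alpha = 0.8`) is an assumed value for illustration; the text specifies only that the ratio is fixed:

```python
import numpy as np

def blend_frames(stylized_current, prev_output, alpha=0.8):
    """Fixed-ratio blend of the current stylized frame with the previous output.

    alpha is an assumed ratio: higher alpha favors the current frame,
    lower alpha carries more of the previous frame (stronger ghosting).
    """
    return alpha * stylized_current + (1 - alpha) * prev_output

def stylize_video(stylized_frames, alpha=0.8):
    """Chain the blend over a sequence of per-frame stylization results."""
    outputs = [stylized_frames[0]]  # first frame has no predecessor
    for frame in stylized_frames[1:]:
        outputs.append(blend_frames(frame, outputs[-1], alpha))
    return outputs

# Toy example: an abrupt black-to-white change is softened by the blend.
frames = [np.zeros(3, dtype=np.float32), np.ones(3, dtype=np.float32)]
outs = stylize_video(frames, alpha=0.8)
```

Because each output also feeds into the next blend, abrupt per-frame changes decay smoothly instead of popping.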
| Original | Object Tracking & Segmentation |
|---|---|
| ![]() | ![]() |

| Binary Mask | Stylization |
|---|---|
| ![]() | ![]() |
This is the final project for the course *Introduction to Generative AI*.




