Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors

Beihang University
*Corresponding Author
🎃🎃🎃 Accepted by ACM MM 2025 👻👻👻


✨ Highlights

  • A novel two-stage pipeline for multi-object sketch animation with user-defined grouping and motion trajectory priors.
  • Group-based Displacement Network with Context-conditioned Feature Enhancement for improved temporal consistency.
  • Significantly outperforms existing methods in visual quality and temporal consistency for complex multi-object scenarios.

Abstract

We introduce GroupSketch, a novel pipeline for vector sketch animation that effectively handles multi-object interactions and complex motions. Existing approaches struggle with these scenarios, either being limited to single objects or suffering from temporal inconsistency and generalization issues. Our method addresses these limitations through a two-stage approach: Motion Initialization and Motion Refinement. The first stage allows users to divide sketches into semantic groups and define key frames, generating a coarse animation through interpolation. The second stage employs our Group-based Displacement Network (GDN) to refine this animation by predicting group-specific displacement fields, leveraging priors from a text-to-video model. GDN incorporates specialized components including Context-conditioned Feature Enhancement (CCFE) to improve temporal consistency. Extensive experimental results demonstrate that our approach significantly outperforms existing methods in generating high-quality, temporally consistent animations for complex multi-object sketches, expanding the practical applications of sketch animation.


Methodology


GroupSketch comprises two main stages: Motion Initialization and Motion Refinement.

Overview of the proposed method GroupSketch and the architecture of the Group-based Displacement Network (GDN).

(a) In the Motion Initialization stage, the model takes an input sketch and obtains the semantic groups and motion trajectories through a canvas-based interactive process. This stage outputs a coarse sketch animation. In the Motion Refinement stage, these groups are fed into GDN, which computes displacement fields to refine their motion. The refined motions are merged and rendered by a differentiable rasterizer, and the resulting loss is backpropagated to update GDN's parameters.
(b) The GDN architecture includes two components. (1) Context-conditioned Feature Enhancement (CCFE) consists of two modules: Frame-aware Positional Encoding (FPE), which encodes the temporal positions of input point sequences, and Motion Context Learning (MCL), which enhances the features by conditioning on context information extracted from all frames. (2) Group Displacement Field Prediction combines local and global grouping paths to produce the final displacements for each group.
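
The two stages in (a) can be illustrated with a minimal sketch. It is not the authors' implementation: the key-frame format (a dict from frame index to a 2D offset), the function names, the use of plain linear interpolation, and the zero displacement field standing in for GDN's output are all assumptions made for illustration.

```python
import numpy as np

def interpolate_keyframes(points, keyframes, num_frames):
    """Coarse animation for one group by linear key-frame interpolation.

    points:     (P, 2) control points of the group in the input sketch.
    keyframes:  {frame_index: (dx, dy)} user-defined offsets (hypothetical format).
    Returns an array of shape (num_frames, P, 2).
    """
    idx = sorted(keyframes)
    offsets = np.array([keyframes[i] for i in idx], dtype=float)  # (K, 2)
    frames = np.arange(num_frames)
    # Interpolate x and y offsets independently over the frame axis.
    dx = np.interp(frames, idx, offsets[:, 0])
    dy = np.interp(frames, idx, offsets[:, 1])
    trajectory = np.stack([dx, dy], axis=-1)                      # (T, 2)
    # Every point in the group follows the same coarse trajectory.
    return points[None, :, :] + trajectory[:, None, :]

def refine(coarse, displacement):
    """Add a per-frame, per-point displacement field to the coarse motion."""
    return coarse + displacement

# One group with two control points, moved by (8, 4) over five frames.
group = np.array([[0.0, 0.0], [1.0, 0.0]])
coarse = interpolate_keyframes(group, {0: (0, 0), 4: (8, 4)}, num_frames=5)

# Placeholder for the displacement field a trained network would predict.
displacement = np.zeros_like(coarse)
refined = refine(coarse, displacement)
```

In the pipeline described above, the displacement field would come from GDN for each group, and the merged result would pass through a differentiable rasterizer so that the loss can update the network.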


Multi-Object Results


We compare GroupSketch with FlipSketch and LiveSketch on multi-object cases.


Single-Object Results


We compare GroupSketch with FlipSketch and LiveSketch on single-object cases.


Comparison with Video Generation Models


We compare GroupSketch with the video generation models DynamiCrafter and I2VGen-XL.

Prompt: "The little dog walking on the grass saw the ball that was thrown out and quickly rushed towards it."
[Videos: input sketch (SVG or PNG), DynamiCrafter result, I2VGen-XL result, GroupSketch (ours) result]

Different Actions from the Same SVG with Varying Prompts


We show the results of our GroupSketch on the same SVG with different prompts.

[Videos: input sketch (SVG or PNG), followed by three GroupSketch results]
Result 1 (Ours)
Prompt: "The dolphin is chasing and eating the fish in front of it."
Result 2 (Ours)
Prompt: "The dolphin is chasing the small fish in front of it, but the small fish escaped."
Result 3 (Ours)
Prompt: "The harmless dolphin swam past the little fish."

Ablation Study


We perform an ablation study to evaluate the contribution of each component of our GroupSketch.

Prompt: "This man is shooting at the basketball hoop in front of him."
[Videos: input sketch (SVG or PNG), followed by one result per variant]
Full Model (Ours)
LiveSketch (Baseline)
w/o Motion Trajectory Priors
w/o Grouping
w/o Motion Context Learning
w/o Frame-aware Positional Encoding
with LLM Motion Priors