Alibaba introduces a new TikTok generator: DreaMoving can create personalized dance videos through image or text prompts.

The system is based on diffusion models and uses a Video ControlNet and a Content Guider. The Video ControlNet makes the generated video follow the specified motion sequence. The Content Guider controls the content of the generated videos, including the appearance of people and backgrounds.

DreaMoving also integrates motion blocks into both the Denoising U-Net and ControlNet to improve temporal consistency and motion fidelity. Users can use text or image prompts to control the desired look and feel of the video.

DreaMoving learns from 1,000 dance videos

The DreaMoving system was trained on over 1,000 dance videos, which were split into short clips of 8 to 10 seconds so that each clip shows continuous footage without transitions or special effects. The team used MiniGPT-v2 to caption individual frames of the clips, providing the text descriptions needed for multimodal training.
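
To give a rough idea of what such a clip split could look like, here is a minimal Python sketch. It is not the team's actual pipeline, which has not been described in detail: the function name `split_into_clips`, its parameters, and the simple rule of cutting at fixed 10-second steps are assumptions for illustration. A real pipeline would presumably also detect scene transitions and cut around them.

```python
def split_into_clips(duration_s, min_len=8.0, max_len=10.0):
    """Return (start, end) times in seconds for consecutive clips.

    Hypothetical helper: cuts clips of up to max_len seconds and drops
    a trailing remainder that is shorter than min_len.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("need 0 < min_len <= max_len")
    duration_s = float(duration_s)
    clips = []
    start = 0.0
    # Keep cutting while the remaining footage is long enough for a clip.
    while duration_s - start >= min_len:
        end = min(start + max_len, duration_s)
        clips.append((start, end))
        start = end
    return clips


# A 29-second video yields two 10-second clips and one 9-second clip.
print(split_into_clips(29.0))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 29.0)]
```

With this rule, a 25-second video would yield only two clips, because the remaining 5 seconds fall below the 8-second minimum and are discarded.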



Thanks to the training and the customized architecture, DreaMoving is able to generate realistic videos from text input, images or a combination of both. For example, the system can generate videos of a specific person wearing a specific piece of clothing provided by the user via an image.

More information is available on the DreaMoving project page. There is also a demo on HuggingFace where you can upload faces and animations or choose from a preselection.

