

Autoregressive visual synthesis for infinite-resolution images and long-form video generation.

NUWA-Infinity is a generative model developed by Microsoft Research Asia for synthesizing high-quality images and videos from text, image, or video inputs. Where most generative models are tied to a fixed output resolution, NUWA-Infinity uses an 'Autoregressive-over-Autoregressive' (AR-over-AR) architecture: a global autoregressive level models dependencies between patches, and a local autoregressive level generates the visual tokens within each patch. Because generation proceeds patch by patch, the output can in principle be extended to any size, which is what the 'infinite resolution' claim refers to. This makes the model well suited to tasks that need large spatial or temporal extension, such as outpainting and long-form video prediction. The architecture uses a Vector Quantized Variational Autoencoder (VQ-VAE) to compress visual data into discrete tokens, which are then processed by a multi-modal transformer. Its target applications are high-fidelity creative automation and professional visual effects. Although it is primarily a research project, its open-source components and academic releases have influenced commercial video generation platforms, particularly in how they handle temporal consistency and spatial resolution, and as of 2026 it is still regarded as an influential step in visual AI.
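The two nested autoregressive loops described above can be illustrated with a toy sketch. This is not NUWA-Infinity's code: `toy_next_token`, `nearby_context`, and `generate` are hypothetical names, and the "model" is a deterministic arithmetic stand-in rather than a trained transformer. It only shows the control flow, with an outer loop over patches and an inner loop over tokens, and how an existing canvas can be extended without regenerating what is already there.

```python
from typing import Dict, List, Optional, Tuple

Patch = List[int]                      # flattened discrete tokens of one patch
Canvas = Dict[Tuple[int, int], Patch]  # (row, col) -> patch tokens


def toy_next_token(context: List[int], prefix: List[int], vocab_size: int) -> int:
    """Hypothetical stand-in for the local model (deterministic, not learned)."""
    return (sum(context) + 31 * len(prefix) + sum(prefix)) % vocab_size


def nearby_context(canvas: Canvas, row: int, col: int, radius: int = 1) -> List[int]:
    """Collect tokens from patches that already exist around (row, col)."""
    tokens: List[int] = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            tokens.extend(canvas.get((r, c), []))
    return tokens


def generate(rows: int, cols: int, tokens_per_patch: int = 4,
             vocab_size: int = 16, canvas: Optional[Canvas] = None) -> Canvas:
    """Fill a rows x cols grid of patches, keeping any patches already given."""
    result: Canvas = dict(canvas or {})
    for row in range(rows):                  # global level: patch by patch
        for col in range(cols):
            if (row, col) in result:
                continue                     # existing content is left untouched
            context = nearby_context(result, row, col)
            patch: Patch = []
            for _ in range(tokens_per_patch):  # local level: token by token
                patch.append(toy_next_token(context, patch, vocab_size))
            result[(row, col)] = patch
    return result


# Generate a 2x3 canvas, then widen it to 2x5 while keeping the original patches.
small = generate(2, 3)
wide = generate(2, 5, canvas=small)
```

Because each new patch is conditioned only on nearby patches that already exist, the grid size never has to be fixed in advance; the same loop can keep adding rows or columns.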
A two-level autoregressive model: a global level handles patch-to-patch structure, and a local level generates the detail within each patch.
Uses spatial autoregression to extend an image in any direction (N, S, E, W) indefinitely.
Transformer-based temporal attention mechanism that ensures fluid movement across frames.
VQ-VAE quantization maps visual patches to discrete tokens in a latent space shared with text tokens.
Dynamic patch positioning allows the model to render in any aspect ratio without stretching.
Extends existing video clips by predicting future frames based on the previous context window.
Pre-trained on large-scale datasets (described as comparable in scale to LAION-5B) for deep semantic understanding.
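The vector quantization step mentioned in the features above can be sketched in a few lines. This is a minimal illustration of the general VQ idea (nearest-codebook lookup), assuming a tiny hand-written codebook; `quantize` and `dequantize` are hypothetical helper names, and a real VQ-VAE learns its codebook and operates on encoder feature vectors rather than hand-picked numbers.

```python
from typing import List, Sequence


def quantize(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector`
    (squared Euclidean distance)."""
    def distance(index: int) -> float:
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[index]))
    return min(range(len(codebook)), key=distance)


def dequantize(token: int, codebook: Sequence[Sequence[float]]) -> List[float]:
    """Map a discrete token back to its codebook vector."""
    return list(codebook[token])


# Illustrative 3-entry codebook with 2-dimensional entries (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
features = [[0.1, -0.2], [3.5, 4.2], [0.9, 1.2]]

tokens = [quantize(f, codebook) for f in features]
print(tokens)  # [0, 2, 1]
```

The transformer then works on the integer tokens only, which is what lets image patches and text share one discrete vocabulary.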
1. Clone the official NUWA-Infinity repository from GitHub.
2. Install Python 3.9+ and essential dependencies including PyTorch and CUDA.
3. Download pre-trained VQ-VAE and Transformer weights from the Microsoft Research portal.
4. Configure the environment variables and GPU memory allocation settings.
5. Prepare input data (text prompts or reference images) in the specified JSON format.
6. Initialize the AR-over-AR generation script via command line.
7. Define output resolution and patch-size parameters for infinite synthesis.
8. Execute the inference engine to begin patch-based visual generation.
9. Monitor GPU utilization and temporal consistency during video synthesis.
10. Use the provided post-processing scripts to stitch patches and export the final media.
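The final stitching step can be pictured as follows. The listing does not show the actual post-processing scripts, so this is only a sketch of what stitching a grid of patches involves; `stitch` is a hypothetical function, and patches are assumed to be equally sized, non-overlapping 2D lists of pixel values.

```python
from typing import List

Pixels = List[List[int]]  # one patch or image: rows of pixel values


def stitch(grid: List[List[Pixels]]) -> Pixels:
    """Join a 2D grid of equally sized patches into one image.

    grid[r][c] is the patch at patch-row r and patch-column c.
    """
    image: Pixels = []
    for patch_row in grid:
        patch_height = len(patch_row[0])
        for y in range(patch_height):
            line: List[int] = []
            for patch in patch_row:
                line.extend(patch[y])  # append row y of each patch, left to right
            image.append(line)
    return image


# Two 2x2 patches placed side by side become one 2x4 image.
left = [[1, 2], [3, 4]]
right = [[5, 6], [7, 8]]
print(stitch([[left, right]]))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

A production pipeline would typically work on decoded image arrays and may blend patch borders; neither is modeled here.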
Verified feedback from other users.
"Highly praised for its innovative outpainting and infinite synthesis capabilities, though noted for high VRAM requirements."
