

Multi-concept customization of text-to-image diffusion models.

Custom Diffusion is a method for fine-tuning text-to-image diffusion models like Stable Diffusion with a few images (4-20) of a new concept. It achieves fast training (~6 minutes on 2 A100 GPUs) by fine-tuning only a subset of model parameters—key and value projection matrices in the cross-attention layers. This reduces storage per concept to 75MB. The method allows combining multiple concepts, such as a new object and artistic style. It uses regularization images to prevent overfitting. The tool provides scripts for single and multi-concept fine-tuning and merging fine-tuned models. Custom Diffusion supports training and inference through the Diffusers library and offers a dataset of 101 concepts with evaluation prompts.
Fine-tunes only key and value projection matrices in cross-attention layers, reducing training time significantly.
Each additional concept requires only 75MB of extra storage due to the parameter-efficient fine-tuning approach.
Supports the combination of multiple concepts, such as new object + new artistic style, multiple new objects, and new object + new category.
Enables merging of two fine-tuned models using optimization techniques to create a single model.
Supports training and inference using the Diffusers library, providing a user-friendly interface and access to advanced features.
Uses a small set of around 200 regularization images to prevent overfitting during fine-tuning.
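The parameter-efficient selection behind these features can be illustrated with a small sketch: Custom Diffusion trains only the cross-attention key and value projections and freezes everything else. The `attn2.to_k` / `attn2.to_v` naming below follows the convention used for Stable Diffusion's UNet in common implementations; the helper function and example parameter names are hypothetical, for illustration only.

```python
# Sketch: pick out only the cross-attention K/V projection weights for
# fine-tuning, freezing all other parameters. In Stable Diffusion's UNet,
# cross-attention modules are commonly named "attn2", with key/value
# projections "to_k" and "to_v" (self-attention is "attn1").

def trainable_param_names(all_names):
    """Return the subset of parameter names Custom Diffusion would fine-tune."""
    return [n for n in all_names if "attn2.to_k" in n or "attn2.to_v" in n]

# Hypothetical parameter names, modeled on a Stable Diffusion UNet layout.
names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight",
]
print(trainable_param_names(names))
```

Because only these small matrices are updated, the per-concept delta checkpoint stays compact (the 75MB figure cited above), and training converges quickly.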
1. Clone the Custom Diffusion repository: `git clone https://github.com/adobe-research/custom-diffusion.git`
2. Navigate to the repository: `cd custom-diffusion`
3. Clone the Stable Diffusion repository: `git clone https://github.com/CompVis/stable-diffusion.git`
4. Navigate to the Stable Diffusion directory: `cd stable-diffusion`
5. Create a conda environment: `conda env create -f environment.yaml`
6. Activate the environment: `conda activate ldm`
7. Install clip-retrieval and tqdm: `pip install clip-retrieval tqdm`
8. Download the Stable Diffusion model checkpoint: `wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt`
9. For single-concept fine-tuning, download and unzip the dataset: `wget https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip && unzip data.zip`
10. Run the training script: `bash scripts/finetune_real.sh "cat" data/cat real_reg/samples_cat cat finetune_addtoken.yaml <pretrained-model-path>`
11. Save updated model weights: `python src/get_deltas.py --path logs/<folder-name> --newtoken 1`
12. Sample the fine-tuned model: `python sample.py --prompt "<new1> cat playing with a ball" --delta_ckpt logs/<folder-name>/checkpoints/delta_epoch=000004.ckpt --ckpt <pretrained-model-path>`
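The model-merging feature mentioned above can be sketched as a least-squares problem: find a single weight matrix that reproduces each concept's fine-tuned outputs on that concept's own prompt embeddings. This is a simplified stand-in for the constrained optimization the repository actually solves; all matrix shapes, variable names, and data here are invented for illustration.

```python
import numpy as np

# Toy merge of two fine-tuned K/V projection matrices into one.
# W1, W2: per-concept fine-tuned projections; C1, C2: columns of text
# embeddings for each concept's prompts (shapes are made up).
rng = np.random.default_rng(0)
d, e, n = 8, 6, 4                          # output dim, embedding dim, prompts
W0 = rng.normal(size=(d, e))               # pretrained projection
W1 = W0 + 0.1 * rng.normal(size=(d, e))    # concept-1 fine-tuned weights
W2 = W0 + 0.1 * rng.normal(size=(d, e))    # concept-2 fine-tuned weights
C1 = rng.normal(size=(e, n))
C2 = rng.normal(size=(e, n))

# Least-squares merge: minimize ||(W - W1) C1||^2 + ||(W - W2) C2||^2.
# Setting the gradient to zero gives W A = B with:
A = C1 @ C1.T + C2 @ C2.T
B = W1 @ (C1 @ C1.T) + W2 @ (C2 @ C2.T)
W = B @ np.linalg.inv(A)

# Residuals of the merged weights on each concept's prompt embeddings.
err1 = np.linalg.norm(W @ C1 - W1 @ C1)
err2 = np.linalg.norm(W @ C2 - W2 @ C2)
```

By construction the merged `W` attains the smallest combined residual of any single matrix, which is the intuition behind combining two concept checkpoints into one model.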
