
Diffusion model inference in pure C/C++ for various image and video models.

stable-diffusion.cpp is a C/C++ implementation, built on ggml, for running diffusion models such as Stable Diffusion. It supports image models including SD1.x, SD2.x, SDXL, SD3, FLUX, Qwen Image, and Z-Image, as well as video models such as Wan2.1 and Wan2.2. Features include ControlNet and LoRA support, latent consistency models (LCM), and faster decoding with TAESD. Available backends are CPU (with AVX, AVX2, and AVX512), CUDA, Vulkan, Metal, OpenCL, and SYCL. Supported weight formats are PyTorch checkpoints, Safetensors, and GGUF. The project aims for cross-platform reproducibility through configurable RNGs. It also provides a stable-diffusion-webui style tokenizer and VAE tiling, which reduces memory usage during decoding, making it suitable for local diffusion applications.
Reduces memory usage during inference by using flash attention.
Supports stable-diffusion-webui style token weighting in prompts, including negative prompts.
Processes VAE in tiles to reduce memory consumption during latent decoding.
Provides selectable RNGs so that the same seed gives consistent results across backends such as CUDA and CPU.
Allows loading LoRA weights to customize a base model's output at inference time.
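The idea behind the tiled VAE decoding listed above can be illustrated with a toy sketch. This is not the project's implementation: `decode_tile` is a hypothetical stand-in decoder (nearest-neighbour upscaling), and the tiles here do not overlap, whereas a real convolutional decoder generally needs overlapping tiles with blending to avoid visible seams. The point is only that each step holds one tile's worth of decoder work in memory rather than the whole latent.

```python
from typing import List

Grid = List[List[float]]


def decode_tile(tile: Grid, scale: int = 2) -> Grid:
    """Hypothetical stand-in for a VAE decoder: nearest-neighbour upscaling."""
    out: Grid = []
    for row in tile:
        wide = [value for value in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out


def decode_tiled(latent: Grid, tile_size: int, scale: int = 2) -> Grid:
    """Decode `latent` one tile at a time and stitch the results together."""
    height, width = len(latent), len(latent[0])
    output: Grid = [[0.0] * (width * scale) for _ in range(height * scale)]
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            # Cut out one tile; edge tiles may be smaller than tile_size.
            tile = [row[left:left + tile_size]
                    for row in latent[top:top + tile_size]]
            decoded = decode_tile(tile, scale)
            # Write the decoded tile into its place in the full output.
            for dy, decoded_row in enumerate(decoded):
                y = top * scale + dy
                x = left * scale
                output[y][x:x + len(decoded_row)] = decoded_row
    return output


latent = [[float(x + 10 * y) for x in range(5)] for y in range(3)]
full = decode_tile(latent)                 # decode everything at once
tiled = decode_tiled(latent, tile_size=2)  # decode in 2x2 tiles
```

Because the toy decoder works on each value independently, the tiled result matches the full decode exactly; with a real decoder the two would differ near tile borders unless overlap and blending are used.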
1. Download a prebuilt executable from the releases page, or build it from source.
2. Download model weights (.ckpt, .safetensors, or .gguf) from a source such as Hugging Face.
3. Place the model weights in a models directory of your choice.
4. On the command line, pass the model path with the `-m` flag and the prompt with the `-p` flag.
5. Run the command, e.g., `./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"`.
6. Refer to the CLI documentation for advanced options and performance tuning.
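For scripted use, the steps above could be wrapped in a small Python helper. This is a minimal sketch, not part of the project: `build_sd_command` and `run_if_available` are hypothetical names, and only the `-m` and `-p` flags from the example command are used. The paths are the ones from step 5 and should be adjusted to your own layout.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_sd_command(binary: str, model: str, prompt: str) -> List[str]:
    """Build the argument list for the CLI using only -m and -p."""
    return [binary, "-m", model, "-p", prompt]


def run_if_available(command: List[str]) -> bool:
    """Run the command only if the executable exists; report whether it ran."""
    if not Path(command[0]).exists():
        return False
    subprocess.run(command, check=True)
    return True


cmd = build_sd_command(
    "./bin/sd-cli",
    "../models/v1-5-pruned-emaonly.safetensors",
    "a lovely cat",
)

if __name__ == "__main__":
    # Show the shell-quoted form, then run it if the binary is present.
    print(shlex.join(cmd))
    if not run_if_available(cmd):
        print("executable not found; nothing was run")
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.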
Verified feedback from other users.
"Highly efficient and versatile C++ implementation of Stable Diffusion, praised for its performance and cross-platform compatibility."