
InstantMesh
High-fidelity 3D mesh generation from single images in under 10 seconds
Efficient 3D mesh generation from single images using sparse-view large reconstruction models.

InstantMesh represents a significant leap in the feed-forward 3D reconstruction domain, leveraging a dual-stage architecture to transform single 2D images into high-fidelity 3D meshes in under 10 seconds. Built upon a Sparse-view Large Reconstruction Model (LRM), the framework first utilizes a multi-view diffusion model to generate spatially consistent views from a single input. These views are then processed by a transformer-based architecture that predicts a triplane representation for volumetric rendering and subsequent mesh extraction.

As of 2026, InstantMesh has become a cornerstone for rapid prototyping in game development and AR/VR workflows due to its superior balance between inference speed and geometric accuracy compared to previous optimization-based methods like DreamFusion. Its architecture is specifically optimized for NVIDIA's Ada Lovelace and Blackwell architectures, ensuring minimal latency when deployed on high-end consumer GPUs or enterprise-grade H100/B200 clusters.

The open-source nature of the project allows for deep integration into DCC (Digital Content Creation) tools like Blender and Unreal Engine 5, providing a robust pipeline for procedural asset generation.
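To make the triplane representation concrete, the following is a minimal NumPy sketch of a triplane feature lookup. It is illustrative only, not the actual InstantMesh implementation: the grid resolution, channel count, and nearest-neighbour sampling are assumptions (the real model uses learned feature planes, bilinear sampling, and an MLP decoder that maps the aggregated features to density and colour).

```python
import numpy as np

# Illustrative triplane lookup: three 2D feature grids sit on the
# axis-aligned XY, XZ and YZ planes; each 3D query point is projected
# onto all three planes and the fetched features are aggregated.
RES, CHANNELS = 64, 8
rng = np.random.default_rng(0)
# One feature grid per plane: (plane, height, width, channels).
triplanes = rng.standard_normal((3, RES, RES, CHANNELS))

def to_index(coord, res=RES):
    """Map a coordinate in [-1, 1] to a nearest-neighbour grid index."""
    return np.clip(((coord + 1.0) * 0.5 * (res - 1)).round().astype(int), 0, res - 1)

def sample_triplane(points):
    """Sum the features of each point's three plane projections."""
    x, y, z = to_index(points[:, 0]), to_index(points[:, 1]), to_index(points[:, 2])
    f_xy = triplanes[0, y, x]   # projection onto the XY plane
    f_xz = triplanes[1, z, x]   # projection onto the XZ plane
    f_yz = triplanes[2, z, y]   # projection onto the YZ plane
    return f_xy + f_xz + f_yz

pts = rng.uniform(-1, 1, size=(5, 3))
features = sample_triplane(pts)
print(features.shape)  # (5, 8)
```

In the full pipeline, a decoder network turns these per-point features into densities for volumetric rendering; the triplane factorization is what keeps memory linear in resolution rather than cubic.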
InstantMesh focuses on three core tasks: generating a 3D mesh from a single 2D image, synthesizing spatially consistent multi-view images, and integrating with Blender and Unreal Engine. This narrow domain focus lets it deliver optimized results for each requirement.
Uses a transformer-based Large Reconstruction Model to infer 3D structure from only a handful of generated views.
Employs a fine-tuned Stable Diffusion model to ensure generated perspectives of an object are spatially aligned.
Integrates an efficient marching cubes implementation to extract surface geometry from triplane density volumes.
Capable of generating basic Physically Based Rendering maps including roughness and metallic properties.
Automatically generates UV coordinates for the extracted mesh based on generated textures.
Weights are distributed in safetensors format for compatibility across different inference engines.
Optimizes 3D generation within the latent space of the diffusion model for faster convergence.
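The mesh-extraction step above consumes a scalar density volume sampled from the triplane decoder. The sketch below builds such a volume with NumPy, using a sphere's signed distance field as a stand-in for the decoder output (an assumption for illustration); a marching cubes routine such as skimage.measure.marching_cubes would then triangulate the zero level set.

```python
import numpy as np

# Stand-in for the density volume that mesh extraction consumes.
# The "density" here is the signed distance to a radius-0.5 sphere,
# so the location of the isosurface is easy to verify by hand.
RES = 32
lin = np.linspace(-1.0, 1.0, RES)
x, y, z = np.meshgrid(lin, lin, lin, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

# Voxels straddling the zero level set form the thin shell that
# marching cubes would triangulate into the output mesh.
voxel = 2.0 / (RES - 1)
shell = np.abs(sdf) < voxel
inside = sdf < 0
print(inside.sum(), shell.sum())
```

The resolution of this grid trades extraction fidelity against memory and runtime, which is one reason feed-forward pipelines like this stay fast relative to per-scene optimization.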
Clone the official GitHub repository and navigate to the project root.
Create a virtual environment using Python 3.10+ and install PyTorch with CUDA support.
Install dependencies via pip install -r requirements.txt, including xformers and diffusers.
Download the pre-trained model weights for the diffusion and LRM components from Hugging Face.
Configure the config.yaml file to specify hardware acceleration parameters (e.g., half-precision/FP16).
Pre-process input images by removing backgrounds using tools like rembg or Segment Anything (SAM).
Execute the inference script to generate multi-view consistent images.
Run the reconstruction pipeline to synthesize the 3D volume and extract the mesh.
Optional: Use the provided Gradio script for a local web-based UI experience.
Export the generated .obj or .glb files into your preferred 3D software for final cleanup.
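The background-removal step in the workflow above typically produces an RGBA cutout; before inference, that cutout is usually composited onto a plain background and padded to a centred square. The following Pillow sketch shows one plausible version of this pre-processing; the function name, target size, and background colour are assumptions, and the repository's own script may differ.

```python
from PIL import Image
import numpy as np

def prepare(cutout: Image.Image, size: int = 320, bg=(255, 255, 255)) -> Image.Image:
    """Composite an RGBA cutout onto a solid background and pad to a square.

    Hypothetical helper mirroring common pre-processing for image-to-3D
    models: rembg/SAM output goes in, a centred square RGB image comes out.
    """
    canvas = Image.new("RGBA", cutout.size, bg + (255,))
    canvas.alpha_composite(cutout)          # flatten transparency onto bg
    rgb = canvas.convert("RGB")
    side = max(rgb.size)
    square = Image.new("RGB", (side, side), bg)
    square.paste(rgb, ((side - rgb.size[0]) // 2, (side - rgb.size[1]) // 2))
    return square.resize((size, size))

# A synthetic 200x120 cutout with a transparent border, standing in for
# real rembg output.
arr = np.zeros((120, 200, 4), dtype=np.uint8)
arr[30:90, 60:140] = (200, 30, 30, 255)
out = prepare(Image.fromarray(arr, "RGBA"))
print(out.size)  # (320, 320)
```

Centring and padding matter because the multi-view diffusion stage was trained on object-centric crops; off-centre or tightly cropped inputs tend to degrade view consistency.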
Verified feedback from other users.
"Highly praised for its industry-leading speed and ability to handle complex textures compared to SV3D."
