Sourcify
Effortlessly find and manage open-source dependencies for your projects.

The industry-standard open-source implementation of Contrastive Language-Image Pre-training (CLIP).

OpenCLIP is a high-performance, open-source reproduction of OpenAI's CLIP (Contrastive Language-Image Pre-training) architecture, maintained primarily by the mlfoundations team with contributors from the LAION project. It serves as a foundational framework for building state-of-the-art multimodal systems, enabling researchers and developers to train and deploy models on massive datasets such as LAION-5B. The architecture supports a wide range of vision backbones, including Vision Transformers (ViT) up to giant scales (ViT-g/G) as well as ConvNeXt and ResNet variants, and is designed for large-scale parallel training across GPU clusters with PyTorch. It underpins applications in semantic image search, automated content moderation, and generative AI guidance. By democratizing access to weights and training code, OpenCLIP has matched or surpassed the original proprietary models, with its largest variants exceeding OpenAI's CLIP on zero-shot ImageNet accuracy while remaining robust on out-of-distribution benchmarks. Its modular design allows seamless integration into production pipelines via Hugging Face Transformers or direct use of the library, making it a primary choice for teams seeking to avoid vendor lock-in with closed-source vision APIs.
Core capabilities: image classification, visual feature extraction, and zero-shot image classification.
Ability to classify images into arbitrary categories without specific training on those labels by leveraging natural language descriptions.
Supports ViT-B, ViT-L, ViT-H, ViT-g, and ConvNeXt architectures for varying performance/latency trade-offs.
Access to weights trained on the largest publicly available image-text dataset.
Optimized DistributedDataParallel (DDP) and FSDP support for training across hundreds of GPUs.
Support for specialized tokenizers beyond the standard CLIP tokenizer for domain-specific applications.
Integration with multilingual text encoders to support image-text matching in 100+ languages.
Built-in tools to freeze the backbone and train a simple linear classifier for downstream tasks.
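The zero-shot capability above comes down to a cosine-similarity comparison in the shared image-text embedding space. The sketch below illustrates just that scoring step with random stand-in vectors (the 512-dimension size matches ViT-B models; in practice the embeddings come from the image and text encoders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: in real use these come from encode_image / encode_text.
image_emb = rng.normal(size=(1, 512))   # one image
text_embs = rng.normal(size=(3, 512))   # one row per candidate label prompt

# L2-normalize so the dot product equals cosine similarity.
image_emb /= np.linalg.norm(image_emb, axis=-1, keepdims=True)
text_embs /= np.linalg.norm(text_embs, axis=-1, keepdims=True)

# CLIP scales similarities by a learned temperature (~100) before the softmax.
logits = 100.0 * image_emb @ text_embs.T
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()

predicted_label = int(probs.argmax())   # index of the best-matching prompt
```

Because the labels are expressed as text prompts rather than fixed output neurons, swapping in a new category list requires no retraining, only re-encoding the prompts.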
Environment setup using Python 3.10+ and PyTorch 2.x installation.
Repository cloning via git clone https://github.com/mlfoundations/open_clip.
Installation of dependencies including timm, ftfy, and regex via pip.
Selection of a pre-trained model variant (e.g., ViT-L-14) using open_clip.create_model_and_transforms.
Loading weights from sources like Hugging Face Hub or OpenAI directly.
Image preprocessing using the provided transform pipeline to match training distribution.
Text tokenization using the open_clip.get_tokenizer for semantic alignment.
Inference execution to generate image and text features in a shared latent space.
Similarity calculation using cosine similarity between image and text tensors.
Model quantization or export to ONNX/TensorRT for production deployment.
Verified feedback from other users.
"Universally praised by ML engineers for its reproducibility and the quality of pre-trained weights. It is considered the 'gold standard' for open multimodal research."