
CIFAR-10 and CIFAR-100 Datasets
Labeled subsets of the 80 million tiny images dataset for machine learning research.

A large-sized Vision Transformer model pre-trained on ImageNet for image classification tasks.
The Vision Transformer (ViT) Large model is a transformer encoder pre-trained on ImageNet-21k (14 million images, 21,843 classes) and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes), both at a resolution of 224x224. Each image is split into a sequence of fixed-size 16x16 patches, which are linearly embedded. A classification token ([CLS]) is prepended, position embeddings are added, and the resulting sequence is fed into the transformer encoder. Because self-attention lets every patch attend to every other patch, the model can capture global relationships within the image, which makes it suitable for a range of downstream image classification tasks. The model weights were converted from JAX to PyTorch by Ross Wightman.
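To make the patch arithmetic concrete, here is a minimal stdlib-only Python sketch, assuming an image stored as nested lists of shape height x width x channels. The names `patchify` and `sequence_length` are hypothetical illustrations, not the model's actual preprocessing code, and the sketch stops before the learned linear projection and the position embeddings.

```python
# Hypothetical helpers illustrating ViT-style patch tokenization.
# This is NOT the model's real preprocessing pipeline; it only shows
# how an image becomes a sequence of flattened fixed-size patches.

def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping patches.

    Patches are ordered left-to-right, top-to-bottom. Each patch is
    flattened row-major into one list of patch_size * patch_size * C values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            # Walk the patch row by row, appending every channel value.
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    flat.extend(pixel)
            patches.append(flat)
    return patches


def sequence_length(image_size, patch_size):
    """Number of encoder tokens for a square image: one per patch plus [CLS]."""
    per_side = image_size // patch_size
    return per_side * per_side + 1


# Usage: a blank 224x224 RGB image with 16x16 patches.
size, patch, channels = 224, 16, 3
image = [[[0.0] * channels for _ in range(size)] for _ in range(size)]
patches = patchify(image, patch)
print(len(patches))                  # number of patches
print(len(patches[0]))               # values per flattened patch
print(sequence_length(size, patch))  # tokens including [CLS]
```

For a 224x224 RGB input with 16x16 patches, this gives 14 x 14 = 196 patches of 16 x 16 x 3 = 768 values each, and 197 encoder tokens once the [CLS] token is prepended. In the real model, each flattened patch is then mapped by a learned linear layer into the encoder's embedding space.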
Task categories: image classification, feature extraction.


A pure ConvNet model constructed entirely from standard ConvNet modules, designed for the 2020s.

A suite of libraries, tools, and APIs for applying AI and ML techniques across multiple platforms and modalities.

Vision Transformer and MLP-Mixer architectures for image recognition and processing.
Discover and deploy pre-trained AI models for fashion-related tasks.
Pre-trained Vision Transformer models for fashion image classification and analysis.