BoT-SORT
Robust Associations Multi-Pedestrian Tracking using motion and appearance information with camera-motion compensation.
Pre-trained Vision Transformer models for fashion image classification and analysis.
Hugging Face hosts a variety of Vision Transformer (ViT) models fine-tuned for fashion-related image classification. These models are published with PyTorch, TensorFlow, or JAX weights and are designed to recognize visual attributes in fashion imagery. Key use cases include identifying clothing types, detecting perspectives, predicting the gender and age group a garment is associated with, and categorizing pack types. The architecture typically starts from a pre-trained ViT backbone that is then fine-tuned on datasets such as FashionMNIST. Users can download and deploy these models from the Hugging Face Hub, most commonly through the Transformers library. The Hub also lists inference providers such as Groq, Novita, and Cerebras, and models can be run on either CPU or GPU; which providers are available depends on the individual model. Weights are commonly distributed in the safetensors format, which loads without executing arbitrary code, and the ecosystem includes tools for training and optimization such as PEFT (parameter-efficient fine-tuning) and bitsandbytes (quantization).
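As a rough illustration of what the classification step produces, the sketch below turns raw classifier scores (logits) into ranked label probabilities, which is the kind of output an image-classification model typically returns. It loads no model: the logits are made-up values, the helper names `softmax` and `top_k` are hypothetical, and the label list is the standard FashionMNIST class set.

```python
import math

# Standard FashionMNIST class names, indexed by label id.
FASHION_MNIST_LABELS = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels=FASHION_MNIST_LABELS, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits, standing in for what a fine-tuned classifier head
# might emit for a single image. These are not real model outputs.
example_logits = [0.2, -1.3, 0.5, 0.1, 0.4, -2.0, 0.9, 3.1, -0.7, 1.8]

predictions = top_k(example_logits)
for label, prob in predictions:
    print(f"{label}: {prob:.3f}")
```

In a real deployment the logits would come from a model downloaded from the Hub, and the label list would come from that model's own configuration rather than a hard-coded table.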
Pluggable SOTA multi-object tracking modules for segmentation, object detection, and pose estimation models.

A simple, fast, and strong multi-object tracker that associates every detection box.

Labeled subsets of the 80 million tiny images dataset for machine learning research.

A large-scale street fashion dataset with polygon annotations for computer vision research.

A pure ConvNet model constructed entirely from standard ConvNet modules, designed for the 2020s.