by Alibaba · Released January 2025 · Cutoff September 2024
Qwen2.5-VL 72B is Alibaba's flagship multimodal model, excelling in vision-language tasks such as image and video understanding, document parsing, and visual reasoning. It builds on the Qwen2.5 language model with enhanced visual perception and dynamic resolution support.
Input cost
Free (open source)
Output cost
Free (open source)
Context window
131072 tokens
Max output
8192 tokens
Modalities
Parameters
72B
License
Apache-2.0
Best suited to complex multimodal reasoning tasks that require high accuracy in visual understanding and document analysis.
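The context window and max output figures listed above can be combined into a rough prompt budget. The sketch below is only illustrative: it assumes the prompt (text plus any visual tokens) and the generated output share the same 131072-token window, which this card does not state, and the helper names `input_budget` and `fits` are hypothetical.

```python
# Figures copied from the spec list above.
CONTEXT_WINDOW = 131072  # tokens
MAX_OUTPUT = 8192        # tokens


def input_budget(reserved_output: int = MAX_OUTPUT) -> int:
    """Tokens left for the prompt after reserving room for the output.

    Assumption (not stated in the card): prompt and output share one window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if a prompt of this size leaves the reserved output room."""
    return 0 <= prompt_tokens <= input_budget(reserved_output)


if __name__ == "__main__":
    print(input_budget())   # 131072 - 8192 = 122880
    print(fits(120_000))    # within the budget under the assumption above
```

Under that assumption, reserving the full 8192 output tokens leaves 122880 tokens for the prompt. How many tokens a given image or video frame consumes depends on its resolution and is not covered here.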