by Alibaba · Released January 2025 · Cutoff December 2024
Qwen2.5-VL 7B is a multimodal vision-language model from Alibaba's Qwen series that supports image and video understanding. It excels at visual reasoning, document parsing, and real-time video analysis, offering strong performance at a compact 7B-parameter size.
Input cost
Free (open source)
Output cost
Free (open source)
Context window
131072 tokens
Max output
8192 tokens
Modalities
Text, image, video
Parameters
7B
License
Apache-2.0
Best suited to visual reasoning tasks such as document parsing and video analysis, and to multimodal chat applications.
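The context window and max output figures listed above can be turned into a simple request-size check. The sketch below is illustrative only: it assumes prompt tokens and generated tokens share the same 131072-token window, which the listing does not state, and the helper names `max_prompt_tokens` and `fits` are hypothetical, not part of any Qwen API.

```python
# Limits taken from the listing above.
CONTEXT_WINDOW = 131072  # tokens
MAX_OUTPUT = 8192        # tokens


def max_prompt_tokens(requested_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the requested output.

    Assumption: prompt and output tokens count against one shared window.
    """
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output


def fits(prompt_tokens: int, requested_output: int = MAX_OUTPUT) -> bool:
    """Return True if a prompt of this size fits under the assumed budget."""
    return 0 <= prompt_tokens <= max_prompt_tokens(requested_output)


if __name__ == "__main__":
    print(max_prompt_tokens())   # budget when reserving the full max output
    print(fits(100_000))         # a 100k-token prompt under that budget
```

Image and video inputs are also converted to tokens by the model's processor, so in practice the prompt count would need to include them; how many tokens a given image or clip consumes is not specified here.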