by Alibaba · Released January 2025
Qwen2.5-VL 3B is a compact vision-language model in Alibaba's Qwen series, built for efficient image and video understanding. It handles visual question answering, document parsing, and video analysis while keeping a footprint small enough for deployment on edge devices.
Input cost: Free (open source)
Output cost: Free (open source)
Context window: 128K tokens
Max output: not specified
Modalities: text, image, video
Parameters: 3B
License: Apache-2.0
Best suited to efficient vision-language tasks on resource-constrained devices.