by Alibaba · Released August 2023 · Cutoff June 2023
Qwen-VL is a multimodal large language model developed by Alibaba Cloud that takes images and text as input and generates text in response. It combines vision and language understanding to support tasks such as image captioning, visual question answering, and document understanding. As part of the Qwen series, it is designed to perform well in both Chinese and English.
Input cost
Free (open source)
Output cost
Free (open source)
Context window
32K tokens
Max output
2048 tokens
Modalities
Image and text input, text output
Parameters
7B
License
Apache-2.0
Best suited to multimodal tasks that require understanding images and text together, such as visual question answering and image captioning.
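The context window and max output figures listed above imply a simple prompt budget. The following is a minimal sketch, not an official formula. It assumes "32K" means 32,768 tokens and that the prompt and the reply share one window, neither of which this card states. `max_prompt_tokens` is a hypothetical helper name. Image inputs also consume tokens, at a rate this card does not give, so the real text budget will be smaller.

```python
# Figures taken from the card; the reading of "32K" is an assumption.
CONTEXT_WINDOW = 32 * 1024  # assumption: "32K" read as 32,768 tokens
MAX_OUTPUT = 2048           # max output tokens, as listed on the card


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the tokens left for the prompt after reserving room for the reply.

    Assumes prompt and reply share a single context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


if __name__ == "__main__":
    # Budget when the full 2048-token reply is reserved.
    print(max_prompt_tokens())
```

Reserving fewer output tokens, for example `max_prompt_tokens(512)`, leaves more room for the prompt under the same assumptions.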