by Microsoft · Released August 2024 · Cutoff August 2024
Phi-3.5 Vision is a lightweight multimodal model that processes both text and images. It excels at reasoning over images, extracting information from charts and tables, and understanding video frames. As part of the Phi-3 family, it offers strong performance for its compact size (4.2B parameters), which makes it suitable for resource-constrained environments.
Input cost
Free (open source)
Output cost
Free (open source)
Context window
128K tokens
Max output
—
Modalities
Text, image
Parameters
4.2B
License
MIT
Best suited to multimodal reasoning tasks that require image understanding from a small, efficient model.
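To make "resource-constrained" more concrete, the sketch below gives a rough estimate of how much memory the weights alone would occupy at a few common numeric precisions. It uses the 4.2B parameter count listed above. The function names and the precision table are illustrative assumptions, not part of any Microsoft tooling. The estimate ignores activations, the KV cache for the 128K context, and framework overhead, so real usage will be higher.

```python
# Rough weight-memory estimate. Illustrative only: covers weights, not
# activations, KV cache, or runtime overhead.

# Approximate parameter count taken from the spec sheet (4.2B).
PARAMS = 4_200_000_000

# Bytes needed to store one weight at common precisions (assumed table).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def estimate_all(params: int = PARAMS) -> dict[str, float]:
    """Map each precision name to its estimated weight size in GB."""
    return {
        name: weight_memory_gb(params, nbytes)
        for name, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name:>10}: ~{gb:.1f} GB")
```

Under these assumptions, half-precision weights come to roughly 8.4 GB and 4-bit quantized weights to roughly 2.1 GB. Whether a quantized build of this model is available or preserves quality is not something this sketch addresses.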