by Google · Released May 2024 · Cutoff Early 2024
Gemini 1.5 Pro is Google's most advanced multimodal model, able to understand and process text, images, audio, video, and code. Its 1 million token context window lets it analyze extremely long documents, videos, or codebases. It excels at complex reasoning, long-context tasks, and multimodal understanding.
Input cost
$3.50 per 1M tokens (text up to 128K tokens), $7.00 per 1M tokens (text over 128K tokens), $10.50 per 1M tokens (audio/image/video up to 128K tokens), $21.00 per 1M tokens (audio/image/video over 128K tokens)
Output cost
$10.50 per 1M tokens (text up to 128K tokens), $21.00 per 1M tokens (text over 128K tokens)
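The tiered text rates above can be turned into a rough per-request estimate. The sketch below is illustrative only and rests on assumptions the listing does not state: that "128K" means 128,000 tokens, and that the size of the input prompt selects the tier for both the input and the output rate on the whole request. The function name `estimate_text_cost` is hypothetical and not part of any Google SDK.

```python
# Rough cost estimator for text-only requests, using the rates listed above.
# Assumptions (not confirmed by the listing):
#   - "128K" is taken to mean 128,000 tokens.
#   - The input prompt size picks the tier for the entire request.

TIER_THRESHOLD = 128_000      # assumed meaning of "128K"
CONTEXT_WINDOW = 1_048_576    # tokens, from the listing
MAX_OUTPUT = 8_192            # tokens, from the listing

# USD per 1M tokens, text only.
INPUT_RATE = {"short": 3.50, "long": 7.00}
OUTPUT_RATE = {"short": 10.50, "long": 21.00}


def estimate_text_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD for one text-only request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the listed context window")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the listed max output")

    # Pick the pricing tier from the prompt length (assumed rule).
    tier = "short" if input_tokens <= TIER_THRESHOLD else "long"
    cost = input_tokens * INPUT_RATE[tier] + output_tokens * OUTPUT_RATE[tier]
    return cost / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 1,000-token reply falls in the lower tier.
    print(f"${estimate_text_cost(100_000, 1_000):.4f}")
    # A 200,000-token prompt with a 2,000-token reply falls in the higher tier.
    print(f"${estimate_text_cost(200_000, 2_000):.4f}")
```

Under these assumptions, a prompt that crosses the threshold doubles both rates, so trimming a prompt to just under 128K tokens roughly halves the cost of the request. Audio, image, and video inputs are billed at the separate rates listed above and are not covered by this sketch.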
Context window
1,048,576 tokens
Max output
8,192 tokens
Modalities
License
proprietary
Complex reasoning tasks requiring understanding of very long documents, videos, or multimodal data.