Active · frontier · multimodal · Proprietary

Gemini 1.5 Pro

by Google · Released May 2024 · Cutoff Early 2024

Gemini 1.5 Pro is a frontier multimodal model from Google that understands and processes text, images, audio, video, and code. Its 1 million token context window allows very long documents, videos, or codebases to be analyzed in a single request. It is strongest at complex reasoning, long-context tasks, and multimodal understanding.


Input cost

$3.50 per 1M tokens (text, up to 128K tokens)
$7.00 per 1M tokens (text, over 128K tokens)
$10.50 per 1M tokens (audio/image/video, up to 128K tokens)
$21.00 per 1M tokens (audio/image/video, over 128K tokens)

Output cost

$10.50 per 1M tokens (text, up to 128K tokens)
$21.00 per 1M tokens (text, over 128K tokens)
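
The tiered rates above can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, "128K" is assumed to mean 128,000 tokens, and the tier is assumed to be chosen by prompt length and applied to the whole request. Check the provider's current pricing page before relying on the numbers.

```python
from decimal import Decimal

# Assumption: "128K" in the listing means 128,000 tokens.
TIER_THRESHOLD = 128_000

# USD per 1M tokens, copied from the listing: (up to 128K, over 128K).
INPUT_RATES = {
    "text": (Decimal("3.50"), Decimal("7.00")),
    "media": (Decimal("10.50"), Decimal("21.00")),  # audio/image/video
}
OUTPUT_RATES = (Decimal("10.50"), Decimal("21.00"))


def estimate_cost(prompt_tokens: int, output_tokens: int, kind: str = "text") -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: the pricing tier is selected by the prompt length and
    applies to every input and output token in the request.
    """
    tier = 1 if prompt_tokens > TIER_THRESHOLD else 0
    input_cost = INPUT_RATES[kind][tier] * prompt_tokens / 1_000_000
    output_cost = OUTPUT_RATES[tier] * output_tokens / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    print(estimate_cost(100_000, 1_000))   # short text prompt
    print(estimate_cost(500_000, 8_192))   # long text prompt, full-length output
```

Under these assumptions, crossing the 128K threshold doubles the rate for the entire request, which is why long-context jobs cost disproportionately more than short ones.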

Context window

1,048,576 tokens

Max output

8,192 tokens
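
A quick budget check against these two limits might look like the sketch below. It is deliberately conservative: it assumes the prompt and the requested output must fit in the context window together, which this listing does not state. The helper name `remaining_headroom` is hypothetical.

```python
CONTEXT_WINDOW = 1_048_576  # tokens, from the listing
MAX_OUTPUT = 8_192          # tokens, from the listing


def remaining_headroom(prompt_tokens: int, requested_output: int = MAX_OUTPUT) -> int:
    """Return the unused tokens in the window, or raise ValueError.

    Assumption (conservative): prompt and output share one context window.
    """
    if requested_output > MAX_OUTPUT:
        raise ValueError(f"requested output exceeds {MAX_OUTPUT} tokens")
    total = prompt_tokens + requested_output
    if total > CONTEXT_WINDOW:
        raise ValueError(f"request needs {total} tokens, window is {CONTEXT_WINDOW}")
    return CONTEXT_WINDOW - total


if __name__ == "__main__":
    print(remaining_headroom(1_000_000))
```

If the limits turn out to be independent, the check only becomes stricter than necessary; it never admits a request that would be rejected.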

Modalities

text, image, audio, video

License

proprietary

Capabilities

Function Calling, Vision, Code Generation, Streaming, JSON Mode, Long Context, Multimodal Understanding, Audio Processing

Best For

Complex reasoning tasks requiring understanding of very long documents, videos, or multimodal data.

Strengths

Industry-leading 1M token context window
Strong multimodal understanding across text, image, audio, and video
Excellent at complex reasoning and long-context tasks

Limitations

Higher latency compared to smaller models
Pricing increases significantly for long context and multimodal inputs
Not open source; proprietary model

Use Cases

Analyzing entire codebases or long documents
Summarizing hours-long videos or meetings
Multimodal question answering over large datasets
Complex data extraction from mixed media
Long-form content generation with consistent context
Advanced chatbots that keep the entire conversation history in context
Research and analysis of long scientific papers

Improvements Over Previous Model

Context window increased from 32K (Gemini 1.0 Pro) to 1M tokens
Native multimodal support added (text, image, audio, video) vs text-only predecessor
Significantly improved reasoning and coding benchmarks
Lower pricing compared to Gemini 1.0 Pro
Faster inference and lower latency than predecessor
