by OpenAI · Released October 2024 · Cutoff October 2023
GPT-4o-audio-preview is a multimodal model that extends GPT-4o with native audio input and output capabilities, enabling real-time voice interactions and audio processing. It is designed for applications requiring low-latency speech-to-speech or audio understanding, such as voice assistants and audio transcription with reasoning.
Input cost
$5.00 per 1M tokens
Output cost
$15.00 per 1M tokens
Context window
128K tokens
Max output
4096 tokens
Modalities
Text, audio
License
proprietary
Best suited to real-time voice applications and other audio-based interactions that need low-latency multimodal understanding and generation.
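The listed prices and limits can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost_usd` is hypothetical, "128K" is assumed to mean 128,000 tokens, the context window is assumed to cover input and output together, and the listing's single rate per direction is applied to all tokens without distinguishing text from audio.

```python
# Figures copied from the listing above; treat them as a snapshot, not live pricing.
INPUT_USD_PER_MILLION = 5.00
OUTPUT_USD_PER_MILLION = 15.00
CONTEXT_WINDOW_TOKENS = 128_000  # assumption: "128K" read as 128,000
MAX_OUTPUT_TOKENS = 4096


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed max output")
    # Assumption: input and output share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the listed context window")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # 10,000 input tokens and 1,000 output tokens
    print(f"${estimate_cost_usd(10_000, 1_000):.3f}")
```

With these figures, a request of 10,000 input tokens and 1,000 output tokens comes to about $0.065 (10,000 × $5.00 / 1M plus 1,000 × $15.00 / 1M).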