Browse and compare AI models across providers, modalities, and use cases.
Isolate audio tracks using ElevenLabs' advanced audio isolation technology.
Generate text from speech using ElevenLabs' advanced speech-to-text model.
Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.
[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!
State-of-the-art multilingual voice changer model (Speech to Speech)
This is a preview release of the GPT-4o Audio models. These models accept audio inputs, produce audio outputs, and can be used in the Chat Completions REST API.
Pricing
Input: $2.50 / 1M tokens
Output: $10.00 / 1M tokens
Context
128K tokens
This is a preview release of the GPT-4o Realtime model, capable of responding to audio and text inputs in real time over WebRTC or a WebSocket interface.
Pricing
Input: $5.00 / 1M tokens
Output: $20.00 / 1M tokens
Context
128K tokens
GPT-4o Transcribe is a speech-to-text model that uses GPT-4o to transcribe audio. Compared to the original Whisper models, it offers a lower word error rate and better language recognition and accuracy. Use it for more accurate transcripts.
Pricing
Input: $2.50 / 1M tokens
Output: $10.00 / 1M tokens
Context
16K tokens
This is a preview release of the smaller GPT-4o Audio mini model. It is designed to accept audio inputs or produce audio outputs via the REST API.
Pricing
Input: $0.15 / 1M tokens
Output: $0.60 / 1M tokens
Context
128K tokens
This is a preview release of the GPT-4o-mini Realtime model, capable of responding to audio and text inputs in real time over WebRTC or a WebSocket interface.
Pricing
Input: $0.60 / 1M tokens
Output: $2.40 / 1M tokens
Context
128K tokens
GPT-4o mini Transcribe is a speech-to-text model that uses GPT-4o mini to transcribe audio. Compared to the original Whisper models, it offers a lower word error rate and better language recognition and accuracy. Use it for more accurate transcripts.
Pricing
Input: $1.25 / 1M tokens
Output: $5.00 / 1M tokens
Context
16K tokens
Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks.
Pricing
Input: $0.07 / 1M tokens
Output: $0.30 / 1M tokens
Context
1M tokens
Gemini 1.5 Flash-8B is a small model designed for lower-intelligence tasks.
Pricing
Input: $0.04 / 1M tokens
Output: $0.15 / 1M tokens
Context
1M tokens
Try Gemini 2.5 Pro Preview, our most advanced Gemini model to date.
Pricing
Input: $1.25 / 1M tokens
Output: $5.00 / 1M tokens
Context
2.1M tokens
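
The per-1M-token prices listed above can be turned into a rough per-request cost with simple arithmetic. The sketch below is only an illustration: the function name `estimate_cost` and the token counts are hypothetical, and it assumes billing is strictly linear in input and output tokens, which the listing does not confirm (providers may bill audio, caching, or long contexts differently).

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimate the USD cost of one request.

    Prices are USD per 1M tokens, passed as strings so Decimal keeps
    them exact. Assumes purely linear per-token billing (an assumption,
    not something the listing states).
    """
    total = (Decimal(input_tokens) * Decimal(input_price)
             + Decimal(output_tokens) * Decimal(output_price))
    return total / TOKENS_PER_MILLION


# Hypothetical request: 200,000 input tokens and 50,000 output tokens,
# at the listed $2.50 input / $10.00 output per 1M tokens.
cost = estimate_cost(200_000, 50_000, "2.50", "10.00")
print(f"${cost:.2f}")  # prints "$1.00"
```

Using `Decimal` rather than floats avoids rounding drift when many small per-request costs are summed.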