Browse and compare AI models across providers, modalities, and use cases.
Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.
An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts.
Isolate audio tracks using ElevenLabs' advanced audio isolation technology.
Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.
Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.
Generate speech from text prompts in different voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech audio.
MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.
Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.
Add immersive sound effects and background music to your videos using PixVerse sound effects generation.
Video-to-video remix endpoint for Sora 2, OpenAI's advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure.
Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.