Browse and compare AI models across providers, modalities, and use cases.
Generate natural-sounding multi-speaker dialogue and audio. Well suited to expressive outputs, storytelling, games, animations, and interactive media.
Stable, production-ready model, recommended for most users. Offers reliable performance with well-tested features.
Experimental model with highly conversational output, natural pacing, better filler words, and instant voice cloning. Higher latency than Aurora.
Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step.
An audio understanding model that analyzes audio content and answers questions about what is happening in the audio based on user prompts.
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI.
Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
Enhance speech audio by removing background noise and upsampling to 48 kHz.
Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.
Clone dialogue voices from an audio sample and generate dialogue from text prompts using Dia TTS, a high-quality text-to-speech model.
DiffRhythm is a fast model for transforming lyrics into full songs. It can generate a full song in less than 30 seconds.
Isolate audio tracks using ElevenLabs' advanced audio isolation technology.
Generate sound effects using ElevenLabs' advanced sound effects model.