Models Directory

Browse and compare AI models across providers, modalities, and use cases.

Showing 20 of 115 models

Active filter: Output modality: Audio

PlayAI Text-to-Speech Dialog

Generate natural-sounding multi-speaker dialogue and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

Aurora

Stable, production-ready model, recommended for most users. Offers reliable performance with well-tested features.

Blizzard

Experimental model with highly conversational output, natural pacing, better filler words, and instant voice cloning. Higher latency than Aurora.

ACE-Step

Generate music from lyrics and example audio using ACE-Step

ACE-Step

Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step

ACE-Step

Generate music from a simple prompt using ACE-Step

ACE-Step

Modify a portion of provided audio with lyrics and/or style using ACE-Step

ACE-Step

Generate music with lyrics from text using ACE-Step

Audio Understanding

An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts.

CSM-1B

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

Chatterbox

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

Chatterbox

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.

Chatterboxhd

Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.

DeepFilterNet 3

Enhance speech audio by removing background noise and upsampling to 48 kHz

Demucs

State-of-the-art stem-separation model for voice, drums, bass, guitar, and more.

Dia

Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.

Dia TTS

Clone dialog voices from sample audio and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

DiffRhythm: Lyrics to Song

DiffRhythm is a blazing-fast model for transforming lyrics into full songs. It can generate a full song in less than 30 seconds.

ElevenLabs Audio Isolation

Isolate audio tracks using ElevenLabs' advanced audio isolation technology.

ElevenLabs Sound Effects

Generate sound effects using ElevenLabs' advanced sound effects model.
