Models Directory

Kling TTS

Generate speech from text prompts in a choice of voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech.

MMAudio V2 Text to Audio

MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.

MiniMax Music

Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

Music Generation

Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.

PixVerse

Add immersive sound effects and background music to your videos using PixVerse sound effects generation.

Sora 2

Video-to-video remix endpoint for Sora 2, OpenAI's advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure.

Sora 2

Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Showing 20 of 24 models

Active Filters: audio

PlayAI Text-to-Speech Dialog

Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.

Audio Understanding

An audio understanding model that analyzes audio content and answers questions about what is happening in the audio based on user prompts.

Bytedance

Generate videos with audio using Seedance 1.5.

Bytedance

Generate videos with audio using Seedance 1.5 (supports start and end frames).

Demucs

State-of-the-art stem separation model for vocals, drums, bass, guitar, and more.

ElevenLabs Audio Isolation

Isolate audio tracks using ElevenLabs' advanced audio isolation technology.

ElevenLabs TTS Multilingual v2

Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.

ElevenLabs TTS Turbo v2.5

Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.

ElevenLabs

Generate text-to-speech audio using Eleven-v3 from ElevenLabs.

ElevenLabs

Generate realistic audio dialogues using Eleven-v3 from ElevenLabs.
