Models Directory

Showing 20 of 110 models

Active filter: Modality In: Audio

ACE-Step

Generate music from lyrics and an example audio clip using ACE-Step.

ACE-Step

Extend the beginning or end of provided audio, guided by lyrics and/or style, using ACE-Step.

ACE-Step

Modify a portion of provided audio, guided by lyrics and/or style, using ACE-Step.

Audio Understanding

An audio understanding model that analyzes audio content and answers questions about what is happening in the audio, based on user prompts.

DeepFilterNet 3

Enhance speech audio by removing background noise and upsampling it to 48 kHz.

Demucs

State-of-the-art stem separation model for vocals, drums, bass, guitar, and more.

Dia TTS

Clone dialog voices from a sample audio clip and generate dialog from text prompts using Dia TTS, a high-quality text-to-speech model.

ElevenLabs Audio Isolation

Isolate audio tracks using ElevenLabs' advanced audio isolation technology.

ElevenLabs Speech to Text

Transcribe speech to text using ElevenLabs' advanced speech-to-text model.

ElevenLabs Speech to Text - Scribe V2

Run blazingly fast speech-to-text inference with ElevenLabs Scribe V2!

ElevenLabs Voice Changer

Change the voices in your audio to ElevenLabs voices!

FFmpeg API [Merge Audios]

Merge multiple audio files into a single audio file using the FFmpeg API!

Kling Video Create Voice

Create voices for use with Kling models' voice control.

Nemotron

Transcribe speech with the speed and pinpoint accuracy of Nemotron.

Nemotron

Transcribe speech with the speed and pinpoint accuracy of Nemotron.

Nova SR

Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz audio.

Pipecat's Smart Turn model

An open-source, community-driven, native-audio turn detection model by Pipecat AI.

SAM Audio

Audio separation with SAM Audio: isolate any sound using natural language. Professional-grade audio editing made simple for creators, researchers, and accessibility applications.

SAM Audio

Audio separation with SAM Audio: isolate any sound using natural language. Professional-grade audio editing made simple for creators, researchers, and accessibility applications.

Silero VAD

Detect speech presence and timestamps quickly and accurately using the ultra-lightweight Silero VAD model.
