Browse and compare AI models across providers, modalities, and use cases.
Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step
An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts.
Enhance speech audio by removing background noise and upsampling to 48 kHz
Clone dialog voices from an audio sample and generate dialog from text prompts using Dia TTS, a high-quality text-to-speech model.
Isolate audio tracks using ElevenLabs' advanced audio isolation technology.
Generate text from speech using ElevenLabs' advanced speech-to-text model.
Run fast speech-to-text inference with Scribe-V2 from ElevenLabs.
An open-source, community-driven, native audio turn detection model by Pipecat AI.
Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.
Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model