Models Directory

Browse and compare AI models across providers, modalities, and use cases.

Showing 20 of 31 models

Active filter: multimodal

Any VLM

Use any vision-language model from our selected catalogue (powered by OpenRouter).

Bagel

Bagel is a 7B-parameter multimodal model from ByteDance-Seed that can generate both text and images.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.

Isaac 0.1

Isaac 0.1 is a multimodal vision-language model from Perceptron for a range of vision-language tasks.

Isaac 0.1 [OpenAI Compatible Endpoint]

An OpenAI-spec-compatible endpoint for Isaac 0.1, a multimodal vision-language model from Perceptron for a range of vision-language tasks.

LLaVA v1.6 34B

Vision

MiniCPM-V 2.6

Multimodal vision-language model for single- and multi-image understanding.

