
Explore The World's AI™

Use open- and closed-source models and workflows from leading partners and the community.

🔥 Trending Models


DeepSeek-R1-0528-Qwen3-8B

DeepSeek-R1-0528 improves reasoning and logic through better computation and optimization, approaching the performance of top models such as OpenAI's o3 and Gemini 2.5 Pro; this 8B variant distills those capabilities into a Qwen3-8B base.

Llama-3_2-3B-Instruct

Llama 3.2 (3B) is a multilingual, instruction-tuned LLM by Meta, optimized for dialogue, retrieval, and summarization. It uses an autoregressive transformer with SFT and RLHF for improved alignment and outperforms many industry models.

claude-sonnet-4

Claude Sonnet 4 is a state-of-the-art large language model from Anthropic, available on the Clarifai platform. It supports text and image inputs and can generate high-quality, context-aware text completions, summaries, and more.

Qwen3-14B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models with significant advances in reasoning, instruction following, and multilingual support.

Devstral-Small-2505_gguf-4bit

Devstral is an agentic LLM for software engineering, developed by Mistral AI and All Hands AI. It’s designed to explore codebases, edit multiple files, and support engineering agents.

general-image-recognition

Identifies a variety of concepts in images and video including objects, themes, and more. Trained with over 10,000 concepts and 20M images.
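Models like this can be called over Clarifai's REST API. The sketch below follows the classic v2 `POST /v2/models/{model_id}/outputs` predict-by-URL pattern; treat the exact base URL, header format, and any additional required fields (such as `user_app_id`) as assumptions to verify against the current Clarifai documentation.

```python
# Hedged sketch: predicting concepts in an image with a Clarifai-hosted
# model (e.g. general-image-recognition) via the v2 REST API.
# The endpoint shape and payload are assumptions based on the classic
# v2 predict pattern; check current Clarifai docs before relying on them.
import json
import urllib.request

API_BASE = "https://api.clarifai.com/v2"  # assumed base URL


def build_predict_request(model_id: str, image_url: str, pat: str):
    """Return (url, headers, body) for a predict-by-URL call."""
    url = f"{API_BASE}/models/{model_id}/outputs"
    headers = {
        "Authorization": f"Key {pat}",  # personal access token
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {"inputs": [{"data": {"image": {"url": image_url}}}]}
    ).encode("utf-8")
    return url, headers, body


def predict(model_id: str, image_url: str, pat: str) -> dict:
    """POST the request and return the parsed JSON response."""
    url, headers, body = build_predict_request(model_id, image_url, pat)
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a valid PAT; not executed here):
# result = predict("general-image-recognition",
#                  "https://samples.clarifai.com/metro-north.jpg",
#                  "YOUR_PAT")
# for concept in result["outputs"][0]["data"]["concepts"]:
#     print(concept["name"], concept["value"])
```

A successful response contains an `outputs` list whose `data.concepts` entries pair each predicted concept name with a confidence value.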

grok-3

Grok-3 is a state-of-the-art large language model (LLM) developed by xAI. Their most sophisticated model to date, it combines robust reasoning with vast pretraining knowledge.

gpt-4o

GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.

gpt-4_1

GPT-4.1 is OpenAI's advanced LLM, optimized for coding, instruction following, and processing extended contexts up to 1 million tokens. It delivers enhanced performance, making it ideal for complex and long-form content generation tasks.

gemini-2_5-flash

Gemini 2.5 Flash Preview is the next iteration in Google's Gemini series of highly capable, natively multimodal reasoning models, and Google's first fully hybrid reasoning model.

claude-3_5-haiku

Claude 3.5 Haiku is a fast, cost-effective large language model from Anthropic, available on the Clarifai platform. It supports text and multimodal inputs and can generate high-quality, context-aware text completions, summaries, and more.

Qwen3-30B-A3B-GGUF

Qwen3 is the newest in the Qwen series, featuring dense and MoE models with major improvements in reasoning, instruction-following, agent tasks, and multilingual support.

gemini-2_0-flash

Gemini 2.0 Flash is a fast, low-latency multimodal model with enhanced performance and new capabilities.

gemma-3-12b-it

Gemma 3 (12B) is a multilingual, multimodal open model by Google, handling text and image inputs with a 128K context window. It excels in tasks like QA and summarization while being efficient for deployment on limited-resource devices.

Phi-4-reasoning-plus

Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model finetuned from Phi-4 using supervised fine-tuning on a dataset of chain-of-thought traces and reinforcement learning.

MiniCPM3-4B

MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and it is comparable with many recent 7B–9B models.

phi-4-mini-instruct

Phi-4-mini-instruct is a lightweight open model from the Phi-4 family, optimized for reasoning with high-quality data. It supports a 128K context window and uses fine-tuning for precise instruction adherence and safety.

Qwen2_5-VL-7B-Instruct

Qwen2.5-VL is a vision-language model designed for AI agents, finance, and commerce. It excels in visual recognition, reasoning, long video analysis, object localization, and structured data extraction.

Large Language Models


pixtral-12b

Pixtral 12B is a natively multimodal model excelling in multimodal reasoning, instruction following, and text benchmarks, with a 12B-parameter architecture that supports variable image sizes and long-context inputs.

gpt-4o

GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.

DeepSeek-R1-Distill-Qwen-7B

DeepSeek-R1-Distill-Qwen-7B is a 7B-parameter dense model distilled from DeepSeek-R1 based on Qwen-7B.

claude-3_5-sonnet

Claude 3.5 Sonnet is a high-speed, advanced AI model excelling in reasoning, knowledge, coding, and visual tasks, ideal for complex applications.

llama-3_2-11b-vision-instruct

Llama-3.2-11B-Vision-Instruct is a multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text and image inputs with 11B parameters.

DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Qwen-32B is a 32B-parameter dense model distilled from DeepSeek-R1 based on Qwen-32B.

Vision Language Models


pixtral-12b

Pixtral 12B is a natively multimodal model excelling in multimodal reasoning, instruction following, and text benchmarks, with a 12B-parameter architecture that supports variable image sizes and long-context inputs.

gpt-4o

GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.

gemini-2_0-flash-lite

Gemini 2.0 Flash-Lite is Google's fastest and most cost-efficient Flash model. It's an upgrade path for 1.5 Flash users who want better quality at the same price and speed.

gemini-2_0-flash

Gemini 2.0 Flash is a fast, low-latency multimodal model with enhanced performance and new capabilities.

llama-3_2-11b-vision-instruct

Llama-3.2-11B-Vision-Instruct is a multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text and image inputs with 11B parameters.

minicpm-o-2_6

MiniCPM-o is the latest series of end-side multimodal LLMs (MLLMs), upgraded from MiniCPM-V. The models can take images, video, text, and audio as inputs and provide high-quality text and speech outputs in an end-to-end fashion.

Popular Workflows


General

A general image workflow that combines detection, classification, and embedding to identify general concepts including objects, themes, moods, etc.

rag-agent-gpt4-turbo-React-few-shot

RAG Agent uses the GPT-4 Turbo model with ReAct prompting, optimizing dynamic reasoning and action planning.

Face-Sentiment

Multi-model workflow that combines face detection with sentiment classification across eight concepts: anger, disgust, fear, neutral, happiness, sadness, contempt, and surprise.

Demographics

Multi-model workflow that detects, crops, and recognizes demographic characteristics of faces, visually classifying age, gender, and multicultural appearance.
© 2025 Clarifai. All rights reserved.