Explore The World's AI™
Use any open and closed source models and workflows from leading partners and the community.
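Any of the models below can be invoked programmatically. As a minimal sketch, the following assumes Clarifai's public v2 REST API shape (a `POST` to a model's `/outputs` route, authorized with a personal access token); the model ID, prompt, and token shown are placeholders, not values from this page.

```python
# Sketch: calling a hosted text model through the platform's REST API.
# Assumes the Clarifai v2 predict route; IDs and the PAT are placeholders.
import json
import urllib.request

API_BASE = "https://api.clarifai.com/v2"

def build_text_payload(prompt: str) -> dict:
    """Build the JSON body for a text-in, text-out predict call."""
    return {"inputs": [{"data": {"text": {"raw": prompt}}}]}

def predict(model_id: str, prompt: str, pat: str) -> dict:
    """POST the prompt to the model's /outputs endpoint; return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{API_BASE}/models/{model_id}/outputs",
        data=json.dumps(build_text_payload(prompt)).encode(),
        headers={
            "Authorization": f"Key {pat}",  # personal access token (placeholder)
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a real token; response path is the conventional v2 shape):
#   reply = predict("gpt-4o", "Summarize transformers in one sentence.", "YOUR_PAT")
#   print(reply["outputs"][0]["data"]["text"]["raw"])
```

The payload builder is separated from the network call so the request body can be inspected or reused independently of authentication.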
🔥 Trending Models
DeepSeek-R1-0528-Qwen3-8B
DeepSeek-R1-0528 improves reasoning and logic through deeper computation at inference time and algorithmic optimization, approaching the performance of top models such as o3 and Gemini 2.5 Pro.
Llama-3_2-3B-Instruct
Llama 3.2 (3B) is a multilingual, instruction-tuned LLM by Meta, optimized for dialogue, retrieval, and summarization. It uses an autoregressive transformer with SFT and RLHF for improved alignment and outperforms many open and closed chat models on common industry benchmarks.
claude-sonnet-4
Claude Sonnet 4 is a state-of-the-art large language model from Anthropic, available on the Clarifai platform. It supports text and image inputs and can generate high-quality, context-aware text completions, summaries, and more.
Qwen3-14B
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers marked advances in reasoning, instruction following, and multilingual capability.
Devstral-Small-2505_gguf-4bit
Devstral is an agentic LLM for software engineering, developed by Mistral AI and All Hands AI. It’s designed to explore codebases, edit multiple files, and support engineering agents.
general-image-recognition
Identifies a variety of concepts in images and video including objects, themes, and more. Trained with over 10,000 concepts and 20M images.
grok-3
Grok-3 is a state-of-the-art large language model (LLM) developed by xAI, its most sophisticated model to date, combining robust reasoning with vast pretraining knowledge.
gpt-4o
GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.
gpt-4_1
GPT-4.1 is OpenAI's advanced LLM, optimized for coding, instruction following, and processing extended contexts up to 1 million tokens. It delivers enhanced performance, making it ideal for complex and long-form content generation tasks.
gemini-2_5-flash
Gemini 2.5 Flash Preview is the next iteration beyond the Gemini 2.0 series of models, part of a suite of highly capable, natively multimodal reasoning models. It is Google's first fully hybrid reasoning model.
claude-3_5-haiku
Claude 3.5 Haiku is a fast, cost-effective large language model from Anthropic, available on the Clarifai platform. It supports text and multimodal inputs and can generate high-quality, context-aware text completions, summaries, and more.
Qwen3-30B-A3B-GGUF
Qwen3 is the newest in the Qwen series, featuring dense and MoE models with major improvements in reasoning, instruction-following, agent tasks, and multilingual support.
gemini-2_0-flash
Gemini 2.0 Flash is a fast, low-latency multimodal model with enhanced performance and new capabilities.
gemma-3-12b-it
Gemma 3 (12B) is a multilingual, multimodal open model by Google, handling text and image inputs with a 128K context window. It excels in tasks like QA and summarization while being efficient for deployment on limited-resource devices.
Phi-4-reasoning-plus
Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model, fine-tuned from Phi-4 with supervised fine-tuning on a dataset of chain-of-thought traces followed by reinforcement learning.
MiniCPM3-4B
MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and it is comparable to many recent 7B–9B models.
phi-4-mini-instruct
Phi-4-mini-instruct is a lightweight open model from the Phi-4 family, optimized for reasoning with high-quality data. It supports a 128K context window and uses fine-tuning for precise instruction adherence and safety.
Qwen2_5-VL-7B-Instruct
Qwen2.5-VL is a vision-language model designed for AI agents, finance, and commerce. It excels in visual recognition, reasoning, long video analysis, object localization, and structured data extraction.
Large Language Models
pixtral-12b
Pixtral 12B is a natively multimodal model excelling in multimodal reasoning, instruction following, and text benchmarks, with a 12B-parameter architecture supporting variable image sizes and long-context inputs.
gpt-4o
GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.
DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Qwen-7B is a 7B-parameter dense model distilled from DeepSeek-R1 based on Qwen-7B.
claude-3_5-sonnet
Claude 3.5 Sonnet is a high-speed, advanced AI model excelling in reasoning, knowledge, coding, and visual tasks, ideal for complex applications.
llama-3_2-11b-vision-instruct
Llama-3.2-11B-Vision-Instruct is a multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text + image inputs with 11B parameters.
DeepSeek-R1-Distill-Qwen-32B
DeepSeek-R1-Distill-Qwen-32B is a 32B-parameter dense model distilled from DeepSeek-R1 based on Qwen-32B.
Vision Language Models
pixtral-12b
Pixtral 12B is a natively multimodal model excelling in multimodal reasoning, instruction following, and text benchmarks, with a 12B-parameter architecture supporting variable image sizes and long-context inputs.
gpt-4o
GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.
gemini-2_0-flash-lite
Gemini 2.0 Flash-Lite is Google's fastest and most cost-efficient Flash model. It's an upgrade path for 1.5 Flash users who want better quality at the same price and speed.
gemini-2_0-flash
Gemini 2.0 Flash is a fast, low-latency multimodal model with enhanced performance and new capabilities.
llama-3_2-11b-vision-instruct
Llama-3.2-11B-Vision-Instruct is a multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text + image inputs with 11B parameters.
minicpm-o-2_6
MiniCPM-o is the latest series of end-side multimodal LLMs (MLLMs), upgraded from MiniCPM-V. The models can take images, video, text, and audio as inputs and provide high-quality text and speech outputs in an end-to-end fashion.
Popular Workflows
General
A general image workflow that combines detection, classification, and embedding to identify general concepts including objects, themes, moods, etc.
rag-agent-gpt4-turbo-React-few-shot
RAG Agent uses the GPT-4 Turbo model with ReAct prompting, optimizing dynamic reasoning and action planning.
Face-Sentiment
Multi-model workflow that combines face detection and sentiment classification of 8 concepts: anger, disgust, fear, neutral, happiness, sadness, contempt, and surprise.
Demographics
Multi-model workflow that detects, crops, and recognizes demographic characteristics of faces, visually classifying age, gender, and multicultural appearance.
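Workflows like the ones above chain several models behind a single call. As a minimal sketch, assuming the platform's v2 REST shape for workflow results (a `POST` to `/workflows/{id}/results`), where the workflow ID, image URL, and token are placeholders:

```python
# Sketch: running a multi-model workflow (e.g. face detection + sentiment)
# on a single image URL. Assumes the Clarifai v2 workflow-results route;
# the workflow ID, image URL, and PAT are placeholders.
import json
import urllib.request

API_BASE = "https://api.clarifai.com/v2"

def build_image_payload(image_url: str) -> dict:
    """Build the JSON body for a workflow call on one image URL."""
    return {"inputs": [{"data": {"image": {"url": image_url}}}]}

def run_workflow(workflow_id: str, image_url: str, pat: str) -> dict:
    """POST the image to the workflow's /results endpoint; return the parsed JSON."""
    req = urllib.request.Request(
        f"{API_BASE}/workflows/{workflow_id}/results",
        data=json.dumps(build_image_payload(image_url)).encode(),
        headers={
            "Authorization": f"Key {pat}",  # personal access token (placeholder)
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a real token); each result conventionally carries one
# output per model in the workflow chain:
#   out = run_workflow("Face-Sentiment", "https://example.com/face.jpg", "YOUR_PAT")
#   for output in out["results"][0]["outputs"]:
#       print(output["model"]["id"])
```

Because the workflow owns the model chain, the caller sends one input and reads back one output per stage, rather than orchestrating the detection and classification models itself.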