Model Explorer

Explore benchmark performance of various AI models

Models

Claude 3 Opus
Claude 3 Sonnet
Claude 3.5 Haiku
Claude 3.5 Sonnet
Claude 3.7 Sonnet
Claude 3.7 Sonnet (Thinking)
Claude 4.0 Opus
Claude 4.0 Opus (Thinking)
Claude 4.0 Sonnet
Claude 4.0 Sonnet (Thinking)
Claude 4.1 Opus
Claude 4.1 Opus (Thinking)
Claude Haiku 4.5
Claude Haiku 4.5 (Thinking)
Claude Opus 4.5
Claude Opus 4.5 (Thinking)
Claude Opus 4.6
Claude Opus 4.6 (Thinking)
Claude Sonnet 4.5
Claude Sonnet 4.5 (Thinking)
Claude Sonnet 4.6
Cohere Command A
Cohere Command R+
DeepSeek R1
DeepSeek V3
DeepSeek V3 (Mar 2025)
DeepSeek V3.2
DeepSeek V3.2 (Thinking)
Devstral 2
Devstral Small 2
GLM-4.7
GLM-5
GPT-3.5 Turbo
GPT-4o Mini
GPT-4.1
GPT-4.5
GPT-4o
GPT-5
GPT-5 (Thinking)
GPT-5 Mini
GPT-5 Mini (Thinking)
GPT-5 Nano
GPT-5 Nano (Thinking)
GPT-5.1 Codex Max
GPT-5.2
GPT-5.2 Codex
GPT-5.3 Codex
GPT-5.4
GPT-5.4 Mini
GPT-5.4 Nano
GPT OSS 120B
Gemini 1.5 Pro
Gemini 2.0 Flash
Gemini 2.0 Flash (Thinking)
Gemini 2.0 Flash Thinking (Jan 2025)
Gemini 2.0 Pro
Gemini 2.5 Flash
Gemini 2.5 Flash (Thinking)
Gemini 2.5 Pro
Gemini 2.5 Pro (Jun 2025)
Gemini 2.5 Pro (Thinking)
Gemini 3.0 Flash
Gemini 3.1 Flash Lite Preview
Gemini 3.1 Pro Preview
Granite 3.0
Grok 3
Grok 3 Mini
Grok 3 (Thinking)
Grok 4
Grok 4 (Thinking)
Grok 4.20
Grok 4.20 (Reasoning)
Kimi K2
Kimi K2.5
Llama 2 13B
Llama 2 70B
Llama 2 7B
Llama 3.1 405B
Llama 3.3 70B
Llama 4 Maverick 17B
Magistral Medium 3.1
MiMo V2 Flash
MiniMax M2.1
MiniMax M2.5
MiniMax M2.7
Mistral Large
Mistral Large 2
Mistral Large 3
Mistral Medium 3.1
OpenAI o1
OpenAI o1 Mini
OpenAI o3 (High Effort)
OpenAI o3 (Medium Effort)
OpenAI o3 Mini (High Effort)
OpenAI o3 Mini (Medium Effort)
OpenAI o4 Mini (High Effort)
OpenAI o4 Mini (Medium Effort)
Phi-4
Qwen 3
Qwen 3 Max Preview
Qwen 3 Max (Thinking)
Qwen 3 (Thinking)

OpenAI o4 Mini (Medium Effort)

OpenAI's o4 Mini model, run with medium reasoning effort (compute)
Released: 2025-05-15
Balanced Benchmark Score: 65.3%
⚡ Speed: 122.0 tokens/second
💰 Cost: $1.93 per 1M tokens
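As a rough illustration of how the speed and cost figures above combine, the sketch below estimates the price and generation time for a given number of tokens. It assumes that the quoted cost is a single blended rate per 1M tokens (the card does not split input from output pricing) and that throughput stays constant at the quoted speed, ignoring time-to-first-token. The function name `estimate_run` is hypothetical.

```python
# Figures copied from the o4 Mini (Medium Effort) card above.
SPEED_TOKENS_PER_SEC = 122.0
COST_USD_PER_MILLION_TOKENS = 1.93


def estimate_run(num_tokens: int) -> tuple[float, float]:
    """Return (cost_usd, seconds) for processing num_tokens tokens.

    Assumption: one blended per-token price and constant throughput;
    time-to-first-token and any input/output price split are ignored.
    """
    cost_usd = num_tokens / 1_000_000 * COST_USD_PER_MILLION_TOKENS
    seconds = num_tokens / SPEED_TOKENS_PER_SEC
    return cost_usd, seconds


cost, secs = estimate_run(50_000)
print(f"~${cost:.4f}, ~{secs:.1f} s")  # prints "~$0.0965, ~409.8 s"
```

Under these assumptions, a 50,000-token job costs about ten cents and takes just under seven minutes; real figures will differ if input and output tokens are priced separately.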

Performance by Benchmark

(Charts: Safety Benchmarks; Capability & Safety Benchmarks)