Model Trust Scores Leaderboard

Discover the top-performing AI models for your use case across four dimensions: capability, safety, affordability, and speed. Credo AI's Model Trust Scores synthesizes benchmarks from across the evaluation ecosystem and interprets them in light of the deployment context. Learn more about Credo AI or read more about our methodology.

Scoring Presets

Equally weights the four dimensions of capability, safety, affordability, and speed for a well-rounded evaluation.
Prioritizes capability and speed for maximum performance, with reduced emphasis on cost and safety.
Heavily emphasizes capability and safety to achieve the best possible outcomes, regardless of cost or speed.
Prioritizes safety and security for enterprise environments where risk mitigation is critical.
Balances capability with affordability to maximize value while keeping operational costs under control.
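The page does not state how the four dimension scores combine into the overall score. The following is a minimal sketch, assuming a weighted geometric mean with a small floor for zero scores. Under equal weights, this assumption appears to reproduce the overall scores in the ranking below to within rounding, including the roughly 4% overall scores of models whose speed score is 0.0%. The floor value, the non-equal entry in `PRESET_WEIGHTS`, and the function name `overall_score` are hypothetical and are not Credo AI's documented methodology.

```python
import math

# The four dimensions named on this page, each scored in [0, 1].
DIMENSIONS = ("capability", "safety", "affordability", "speed")

# Hypothetical preset weights. Only the equal-weight preset is described
# precisely on the page; the second entry is an illustrative guess.
PRESET_WEIGHTS = {
    "equal": {"capability": 1.0, "safety": 1.0, "affordability": 1.0, "speed": 1.0},
    "capability_speed_example": {"capability": 2.0, "safety": 0.5, "affordability": 0.5, "speed": 2.0},
}

# Assumed floor: a 0.0 score is replaced by this value so the logarithm stays
# defined, while the overall score is still dragged down sharply.
FLOOR = 1e-5


def overall_score(scores, weights, floor=FLOOR):
    """Weighted geometric mean of the dimension scores (an assumed aggregation)."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_logs = sum(weights[d] * math.log(max(scores[d], floor)) for d in DIMENSIONS)
    return math.exp(weighted_logs / total_weight)


if __name__ == "__main__":
    # Dimension scores copied from the GPT OSS 120B row of the ranking.
    gpt_oss_120b = {"capability": 0.52, "safety": 0.81, "affordability": 0.99, "speed": 1.00}
    for preset, weights in PRESET_WEIGHTS.items():
        print(preset, round(overall_score(gpt_oss_120b, weights), 3))
```

A geometric mean penalizes a single weak dimension far more than an arithmetic mean does. For Gemini 2.5 Pro (60%, 69%, 100%, 0%), an equally weighted arithmetic mean would be about 57%, whereas the displayed overall score is 4.5%.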
Ranking models for all industries
Model | Overall Score | Capability | Safety | Affordability | Speed
🥇 GPT OSS 120B | 80.5% | 52.0% | 81.0% | 99.0% | 100.0%
🥈 Grok 4.20 (Reasoning) | 73.2% | 54.0% | 65.0% | 90.0% | 91.0%
🥉 Gemini 2.5 Flash | 72.1% | 53.0% | 63.0% | 97.0% | 83.0%
GPT-5 | 70.4% | 63.0% | 78.0% | 89.0% | 57.0%
Grok 4.20 | 69.5% | 48.0% | 63.0% | 90.0% | 85.0%
Gemini 2.5 Flash (Thinking) | 67.7% | 37.0% | 63.0% | 97.0% | 93.0%
Grok 3 Mini | 67.1% | 55.0% | 49.0% | 99.0% | 75.0%
OpenAI o4 Mini (High Effort) | 66.4% | 59.0% | 74.0% | 94.0% | 48.0%
OpenAI o4 Mini (Medium Effort) | 66.2% | 58.0% | 74.0% | 94.0% | 48.0%
GPT-5 Nano | 65.4% | 51.0% | 67.0% | 100.0% | 53.0%
GPT-5 Nano (Thinking) | 65.2% | 51.0% | 64.0% | 100.0% | 55.0%
OpenAI o3 Mini (Medium Effort) | 64.8% | 53.0% | 69.0% | 94.0% | 52.0%
Gemini 3.1 Pro Preview | 64.6% | 68.0% | 69.0% | 85.0% | 43.0%
Gemini 2.5 Pro (Thinking) | 64.4% | 58.0% | 69.0% | 89.0% | 49.0%
OpenAI o3 Mini (High Effort) | 63.7% | 52.0% | 69.0% | 94.0% | 49.0%
Llama 4 Maverick 17B | 62.5% | 47.0% | 67.0% | 98.0% | 49.0%
Claude Haiku 4.5 | 61.9% | 50.0% | 85.0% | 93.0% | 38.0%
GPT-5 (Thinking) | 59.5% | 62.0% | 78.0% | 89.0% | 29.0%
OpenAI o3 (Medium Effort) | 59.5% | 61.0% | 77.0% | 88.0% | 30.0%
Claude Haiku 4.5 (Thinking) | 58.5% | 37.0% | 84.0% | 93.0% | 40.0%
GPT-5 Mini | 57.5% | 57.0% | 68.0% | 98.0% | 28.0%
GPT-5 Mini (Thinking) | 56.7% | 59.0% | 66.0% | 98.0% | 27.0%
OpenAI o3 (High Effort) | 56.4% | 49.0% | 77.0% | 88.0% | 30.0%
GLM-4.7 | 56.3% | 54.0% | 62.0% | 97.0% | 31.0%
Gemini 3.0 Flash | 56.2% | 37.0% | 43.0% | 96.0% | 64.0%
Mistral Medium 3.1 | 55.9% | 49.0% | 68.0% | 97.0% | 30.0%
GPT-4.1 | 55.8% | 53.0% | 64.0% | 88.0% | 32.0%
GPT-5.4 Mini | 54.4% | 30.0% | 37.0% | 94.0% | 83.0%
GPT-3.5 Turbo | 53.6% | 46.0% | 50.0% | 98.0% | 37.0%
GPT-5.4 Nano | 53.6% | 30.0% | 37.0% | 98.0% | 75.0%
GPT-5.2 | 53.4% | 55.0% | 68.0% | 84.0% | 26.0%
GLM-5 | 52.6% | 54.0% | 63.0% | 95.0% | 24.0%
GPT-5.4 | 52.5% | 56.0% | 64.0% | 81.0% | 26.0%
Claude Sonnet 4.5 (Thinking) | 52.0% | 58.0% | 81.0% | 80.0% | 19.0%
Claude 4.0 Sonnet (Thinking) | 51.9% | 56.0% | 81.0% | 80.0% | 20.0%
Kimi K2.5 | 50.4% | 57.0% | 67.0% | 96.0% | 18.0%
Llama 3.3 70B | 50.3% | 48.0% | 41.0% | 98.0% | 34.0%
Grok 3 | 50.3% | 53.0% | 47.0% | 80.0% | 32.0%
Gemini 3.1 Flash Lite Preview | 49.8% | 26.0% | 26.0% | 98.0% | 92.0%
Mistral Large 3 | 49.7% | 51.0% | 64.0% | 98.0% | 19.0%
Claude Sonnet 4.5 | 49.3% | 54.0% | 81.0% | 80.0% | 17.0%
Claude 4.0 Sonnet | 48.9% | 52.0% | 80.0% | 80.0% | 17.0%
Claude Sonnet 4.6 | 48.8% | 60.0% | 69.0% | 80.0% | 17.0%
Claude Opus 4.6 (Thinking) | 48.3% | 64.0% | 72.0% | 67.0% | 18.0%
Qwen 3 | 48.0% | 50.0% | 53.0% | 96.0% | 21.0%
Qwen 3 Max Preview | 47.8% | 50.0% | 61.0% | 92.0% | 19.0%
Kimi K2 | 47.8% | 50.0% | 70.0% | 97.0% | 15.0%
Claude Opus 4.5 (Thinking) | 47.6% | 59.0% | 64.0% | 67.0% | 21.0%
GPT-4o | 47.4% | 52.0% | 51.0% | 75.0% | 25.0%
DeepSeek V3.2 (Thinking) | 46.9% | 57.0% | 66.0% | 99.0% | 13.0%
Claude Opus 4.5 | 45.7% | 55.0% | 65.0% | 67.0% | 18.0%
DeepSeek V3.2 | 45.7% | 53.0% | 66.0% | 99.0% | 13.0%
GPT-4o Mini | 45.1% | 49.0% | 55.0% | 99.0% | 15.0%
Qwen 3 (Thinking) | 44.9% | 53.0% | 53.0% | 91.0% | 16.0%
Claude Opus 4.6 | 44.4% | 57.0% | 63.0% | 67.0% | 16.0%
Grok 4 | 44.3% | 59.0% | 42.0% | 80.0% | 19.0%
Grok 4 (Thinking) | 44.3% | 59.0% | 42.0% | 80.0% | 19.0%
GPT-5.1 Codex Max | 42.9% | 33.0% | 24.0% | 89.0% | 49.0%
Llama 3.1 405B | 41.3% | 50.0% | 56.0% | 88.0% | 12.0%
Mistral Large 2 | 37.9% | 46.0% | 32.0% | 90.0% | 16.0%
OpenAI o1 | 36.7% | 56.0% | 73.0% | 12.0% | 36.0%
Cohere Command A | 36.4% | 42.0% | 28.0% | 85.0% | 17.0%
Qwen 3 Max (Thinking) | 36.1% | 31.0% | 33.0% | 92.0% | 18.0%
Llama 2 7B | 35.5% | 15.0% | 22.0% | 100.0% | 46.0%
GPT-5.2 Codex | 35.0% | 19.0% | 21.0% | 84.0% | 46.0%
Phi-4 | 34.0% | 46.0% | 62.0% | 99.0% | 5.0%
GPT-5.3 Codex | 32.7% | 19.0% | 27.0% | 84.0% | 27.0%
MiniMax M2.5 | 30.8% | 17.0% | 26.0% | 98.0% | 21.0%
MiniMax M2.7 | 30.1% | 17.0% | 25.0% | 98.0% | 20.0%
MiniMax M2.1 | 30.0% | 16.0% | 23.0% | 98.0% | 23.0%
MiMo V2 Flash | 24.1% | 9.0% | 8.0% | 100.0% | 50.0%
Devstral Small 2 | 23.2% | 4.0% | 9.0% | 100.0% | 73.0%
Devstral 2 | 22.6% | 6.0% | 15.0% | 100.0% | 31.0%
Gemini 2.5 Pro | 4.5% | 60.0% | 69.0% | 100.0% | 0.0%
GPT-4.5 | 4.5% | 55.0% | 72.0% | 100.0% | 0.0%
Granite 3.0 | 4.4% | 49.0% | 76.0% | 100.0% | 0.0%
Gemini 2.0 Pro | 4.4% | 55.0% | 67.0% | 100.0% | 0.0%
Gemini 2.0 Flash (Thinking) | 4.3% | 56.0% | 64.0% | 100.0% | 0.0%
Gemini 2.0 Flash Thinking (Jan 2025) | 4.3% | 55.0% | 64.0% | 100.0% | 0.0%
Claude 3.5 Sonnet | 4.3% | 51.0% | 85.0% | 80.0% | 0.0%
Claude 3.5 Haiku | 4.3% | 45.0% | 81.0% | 95.0% | 0.0%
Claude 3.7 Sonnet (Thinking) | 4.3% | 56.0% | 75.0% | 80.0% | 0.0%
Claude 3 Sonnet | 4.3% | 52.0% | 80.0% | 80.0% | 0.0%
DeepSeek V3 (Mar 2025) | 4.3% | 51.0% | 67.0% | 96.0% | 0.0%
Gemini 2.0 Flash | 4.2% | 50.0% | 64.0% | 99.0% | 0.0%
Gemini 2.5 Pro (Jun 2025) | 4.2% | 58.0% | 63.0% | 89.0% | 0.0%
Claude 3.7 Sonnet | 4.2% | 53.0% | 75.0% | 80.0% | 0.0%
Gemini 1.5 Pro | 4.2% | 50.0% | 60.0% | 100.0% | 0.0%
Llama 2 70B | 4.1% | 47.0% | 63.0% | 100.0% | 0.0%
Llama 2 13B | 4.0% | 46.0% | 55.0% | 100.0% | 0.0%
OpenAI o1 Mini | 3.9% | 52.0% | 45.0% | 100.0% | 0.0%
DeepSeek R1 | 3.9% | 54.0% | 47.0% | 92.0% | 0.0%
DeepSeek V3 | 3.7% | 50.0% | 38.0% | 98.0% | 0.0%
Grok 3 (Thinking) | 3.5% | 33.0% | 47.0% | 100.0% | 0.0%
Cohere Command R+ | 3.0% | 40.0% | 24.0% | 80.0% | 0.0%
Claude 4.0 Opus (Thinking) | 2.8% | 58.0% | 78.0% | 0.0% | 13.0%
Claude 4.0 Opus | 2.7% | 55.0% | 78.0% | 0.0% | 13.0%
Claude 4.1 Opus (Thinking) | 2.6% | 57.0% | 60.0% | 0.0% | 14.0%
Mistral Large | 2.6% | 22.0% | 26.0% | 80.0% | 0.0%
Claude 4.1 Opus | 2.6% | 54.0% | 68.0% | 0.0% | 13.0%
Magistral Medium 3.1 | 1.1% | 3.0% | 6.0% | 100.0% | 0.0%
Claude 3 Opus | 0.3% | 53.0% | 79.0% | 0.0% | 0.0%

Acknowledgments

Model Trust Scores builds on the tireless work of the evaluation ecosystem. We would like to thank: