Model Trust Scores Leaderboard

Discover the top-performing AI models across capability, safety, affordability, and speed for your use case. Credo AI's Model Trust Scores is a synthesis of benchmarks from across the evaluation ecosystem, contextualized for your deployment. Learn more about Credo AI or read about our methodology.

Scoring Presets

- Equally weights the four dimensions of capability, safety, affordability, and speed for a well-rounded evaluation (a scoring sketch follows this list)
- Prioritizes capability and speed for maximum performance, with reduced emphasis on cost and safety
- Heavily emphasizes capability and safety to achieve the best possible outcomes, regardless of cost or speed
- Prioritizes safety and security features for enterprise environments where risk mitigation is critical
- Balances capability with affordability to maximize value while controlling costs and operational expenses
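The presets suggest that each model's overall score is a weighted combination of its four dimension scores. As a rough illustration only: the published overalls behave multiplicatively (a 0.0% affordability score collapses the Opus models' overalls to near zero), so the sketch below uses a weighted geometric mean. The preset names, weights, and aggregation here are assumptions for illustration; Credo AI's methodology page is the authoritative description.

```python
from math import prod

# Illustrative preset weights (assumptions, not Credo AI's published values).
# The "balanced" preset equally weights the four dimensions, per the preset
# description above; the "cost_focused" weighting is hypothetical.
PRESETS = {
    "balanced": {"capability": 0.25, "safety": 0.25, "affordability": 0.25, "speed": 0.25},
    "cost_focused": {"capability": 0.25, "safety": 0.15, "affordability": 0.40, "speed": 0.20},
}

def overall_score(dims, weights):
    """Weighted geometric mean of the four dimension scores (0-1 scale).

    The geometric-mean aggregation is an assumption inferred from the table
    (a 0% dimension score collapses the overall score); the official
    methodology may differ.
    """
    total = sum(weights.values())
    return prod(dims[d] ** (w / total) for d, w in weights.items())

# Example: OpenAI-O3-medium's dimension scores from the leaderboard below.
o3_medium = {"capability": 0.67, "safety": 0.76, "affordability": 0.88, "speed": 0.79}
print(round(overall_score(o3_medium, PRESETS["balanced"]), 3))  # ~0.771 (leaderboard shows 0.774)
```

Recomputing this score under a different weight set and re-sorting the rows is how a preset would re-order the rankings below.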
Model rankings for all industries
| Rank | Model | Overall Score | Capability | Safety | Affordability | Speed |
|------|-------|---------------|------------|--------|---------------|-------|
| 🥇 1 | OpenAI-O3-medium | 0.774 | 67.0% | 76.0% | 88.0% | 79.0% |
| 🥈 2 | OpenAI-O3-high | 0.760 | 68.0% | 70.0% | 88.0% | 79.0% |
| 🥉 3 | Gemini-2.5-Flash-Thinking | 0.753 | 59.0% | 56.0% | 97.0% | 100.0% |
| 4 | Gemini-2.5-Flash | 0.744 | 59.0% | 62.0% | 97.0% | 87.0% |
| 5 | GPT-5-nano | 0.732 | 57.0% | 59.0% | 100.0% | 87.0% |
| 6 | GPT-OSS-120B | 0.721 | 54.0% | 81.0% | 99.0% | 62.0% |
| 7 | OpenAI-O3-mini-medium | 0.703 | 59.0% | 69.0% | 94.0% | 64.0% |
| 8 | Gemini-2.0-Flash | 0.692 | 53.0% | 64.0% | 99.0% | 68.0% |
| 9 | OpenAI-O3-mini-high | 0.677 | 59.0% | 65.0% | 94.0% | 58.0% |
| 10 | Llama-4-Maverick-17B | 0.666 | 51.0% | 67.0% | 99.0% | 58.0% |
| 11 | OpenAI-O4-mini-medium | 0.664 | 64.0% | 72.0% | 94.0% | 45.0% |
| 12 | Gemini-2.5-Pro-0325 | 0.664 | 66.0% | 67.0% | 89.0% | 49.0% |
| 13 | Grok-3-Mini-Beta | 0.662 | 61.0% | 46.0% | 99.0% | 70.0% |
| 14 | GPT-5 | 0.660 | 68.0% | 66.0% | 89.0% | 48.0% |
| 15 | Gemini-2.5-Pro-Thinking | 0.655 | 64.0% | 67.0% | 89.0% | 48.0% |
| 16 | OpenAI-O4-mini-high | 0.651 | 65.0% | 66.0% | 94.0% | 45.0% |
| 17 | GPT-5-mini | 0.650 | 67.0% | 67.0% | 98.0% | 41.0% |
| 18 | Gemini-2.5-Pro-0605 | 0.649 | 66.0% | 62.0% | 89.0% | 49.0% |
| 19 | OpenAI-O1-mini | 0.645 | 56.0% | 38.0% | 94.0% | 86.0% |
| 20 | Llama-2-7B | 0.626 | 52.0% | 66.0% | 100.0% | 45.0% |
| 21 | GPT-5-nano-Thinking | 0.619 | 54.0% | 64.0% | 100.0% | 43.0% |
| 22 | GPT-4.1 | 0.614 | 58.0% | 62.0% | 88.0% | 45.0% |
| 23 | Claude-4.0-Sonnet | 0.589 | 60.0% | 80.0% | 80.0% | 31.0% |
| 24 | Magistral-Medium-3.1 | 0.586 | 50.0% | 53.0% | 91.0% | 50.0% |
| 25 | Cohere-Command-A | 0.584 | 47.0% | 53.0% | 85.0% | 55.0% |
| 26 | GPT-5-mini-Thinking | 0.571 | 63.0% | 77.0% | 98.0% | 22.0% |
| 27 | Claude-3.7-Sonnet-Thinking | 0.570 | 62.0% | 75.0% | 80.0% | 29.0% |
| 28 | GPT-3.5-Turbo | 0.559 | 49.0% | 48.0% | 98.0% | 42.0% |
| 29 | Claude-3.5-Sonnet-1022 | 0.552 | 53.0% | 85.0% | 80.0% | 26.0% |
| 30 | Claude-3.7-Sonnet | 0.549 | 57.0% | 75.0% | 80.0% | 26.0% |
| 31 | Llama-3.3-70B | 0.540 | 49.0% | 45.0% | 98.0% | 40.0% |
| 32 | GPT-5-Thinking | 0.535 | 67.0% | 56.0% | 89.0% | 25.0% |
| 33 | Claude-4.0-Sonnet-Thinking | 0.529 | 64.0% | 80.0% | 80.0% | 19.0% |
| 34 | Qwen-3 | 0.503 | 62.0% | 52.0% | 91.0% | 22.0% |
| 35 | Kimi-K2-Instruct | 0.492 | 59.0% | 70.0% | 96.0% | 15.0% |
| 36 | Mistral-Medium-3.1 | 0.486 | 52.0% | 54.0% | 97.0% | 20.0% |
| 37 | GPT-4-mini | 0.485 | 49.0% | 55.0% | 99.0% | 21.0% |
| 38 | GPT-4o-0513 | 0.483 | 52.0% | 51.0% | 75.0% | 28.0% |
| 39 | Claude-3.5-Haiku | 0.483 | 46.0% | 57.0% | 95.0% | 22.0% |
| 40 | Gemini-2.0-Pro-0121 | 0.478 | 55.0% | 66.0% | 100.0% | 15.0% |
| 41 | Qwen-3-Thinking | 0.472 | 60.0% | 52.0% | 91.0% | 17.0% |
| 42 | Grok-4 | 0.467 | 64.0% | 38.0% | 80.0% | 24.0% |
| 43 | Phi-4 | 0.448 | 49.0% | 57.0% | 99.0% | 15.0% |
| 44 | Grok-3-Beta | 0.434 | 59.0% | 43.0% | 80.0% | 17.0% |
| 45 | OpenAI-O1-1217 | 0.427 | 60.0% | 72.0% | 12.0% | 62.0% |
| 46 | Llama-3.1-405B | 0.417 | 53.0% | 56.0% | 89.0% | 12.0% |
| 47 | DeepSeek-V3-0324 | 0.410 | 57.0% | 63.0% | 98.0% | 8.0% |
| 48 | Grok-4-Thinking | 0.405 | 64.0% | 38.0% | 80.0% | 14.0% |
| 49 | Mistral-Large-2 | 0.389 | 51.0% | 32.0% | 90.0% | 16.0% |
| 50 | DeepSeek-R1 | 0.376 | 61.0% | 47.0% | 97.0% | 7.0% |
| 51 | Cohere-Command-R-Plus | 0.358 | 42.0% | 24.0% | 80.0% | 20.0% |
| 52 | Claude-4.1-Opus-Thinking | 0.025 | 64.0% | 74.0% | 0.0% | 9.0% |
| 53 | Claude-3-Opus | 0.024 | 50.0% | 78.0% | 0.0% | 9.0% |
| 54 | Claude-4.0-Opus-Thinking | 0.024 | 63.0% | 78.0% | 0.0% | 7.0% |
| 55 | Claude-4.0-Opus | 0.024 | 60.0% | 78.0% | 0.0% | 7.0% |

Acknowledgments

Model Trust Scores is built on the tireless work of the evaluation ecosystem. We would like to thank: