Model Trust Scores Leaderboard

Discover the top-performing AI models across capability, safety, affordability, and speed for your use case. Credo AI's Model Trust Scores synthesize benchmarks from across the evaluation ecosystem and contextualize them for your deployment context. Learn more about Credo AI or read about our methodology.

Scoring Presets

- Equally weights the four dimensions of capability, safety, affordability, and speed for a well-rounded evaluation
- Prioritizes capability and speed for maximum performance, with reduced emphasis on cost and safety
- Heavily emphasizes capability and safety to achieve the best possible outcomes, regardless of cost or speed
- Prioritizes safety and security features for enterprise environments where risk mitigation is critical
- Balances capability with affordability to maximize value while controlling costs and operational expenses
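To illustrate how a preset could combine the four dimensions, here is a minimal sketch that treats the overall score as a weighted average of the per-dimension scores. The function name and the weight tuples are hypothetical stand-ins for the presets above; Credo AI's actual aggregation methodology may differ, so the numbers this produces will not necessarily match the leaderboard.

```python
def trust_score(capability, safety, affordability, speed,
                weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical overall score: a weighted average of the four
    dimension scores, each expressed as a fraction in [0, 1].

    The default weights mirror an "equally weighted" preset; a
    performance-oriented preset might instead upweight capability
    and speed. These weights are illustrative, not Credo AI's.
    """
    dims = (capability, safety, affordability, speed)
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, dims)) / total

# Equal weights, using Llama-4-Maverick-17B's dimension scores:
balanced = trust_score(0.51, 0.67, 0.99, 0.58)

# A performance-style preset: capability and speed dominate.
performance = trust_score(0.51, 0.67, 0.99, 0.58,
                          weights=(0.4, 0.1, 0.1, 0.4))
```

Normalizing by the weight sum keeps the result in [0, 1] even when a preset's weights do not add up to one.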
Ranking models for all industries
Model                   | Overall | Capability | Safety | Affordability | Speed
🥇 Llama-4-Maverick-17B | 66.6%   | 51.0%      | 67.0%  | 99.0%         | 58.0%
🥈 Llama-2-7B           | 62.6%   | 52.0%      | 66.0%  | 100.0%        | 45.0%
🥉 Llama-3.3-70B        | 54.0%   | 49.0%      | 45.0%  | 98.0%         | 40.0%
Llama-3.1-405B          | 41.7%   | 53.0%      | 56.0%  | 89.0%         | 12.0%
DeepSeek-R1             | 37.6%   | 61.0%      | 47.0%  | 97.0%         | 7.0%

Acknowledgments

Model Trust Scores builds on the tireless work of the evaluation ecosystem. We would like to thank: