About Credo AI Labs

Credo AI Labs builds experimental applications that showcase new AI governance concepts, risk assessment methodologies, and compliance frameworks. Labs is the innovation arm of Credo AI, the leading enterprise AI governance platform, which helps organizations adopt generative AI, AI agents, and third-party AI vendors with proven governance, risk management, and compliance.

Four Key Dimensions

🎯 Capability

Task performance, accuracy, and effectiveness. Measures how well the model performs its intended functions across various benchmarks.

🛡️ Safety

Risk mitigation, ethical alignment, and bias prevention. Evaluates the model's ability to operate safely and responsibly.

💰 Affordability

Cost per million tokens. Direct operational cost metrics from third-party providers to help optimize your AI spending.

⚡ Speed

Processing throughput in tokens per second. Real-world performance metrics to ensure your applications meet latency requirements.

Model Trust Scores Methodology

📊 Multi-Source Data Integration

We aggregate benchmark data from leading evaluation providers to create a comprehensive model × benchmark matrix. This includes data from vals.ai, AIR-Bench, Artificial Analysis, LiveBench, and other trusted sources, covering over 90 models and dozens of benchmarks.
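As a rough illustration of what a model × benchmark matrix looks like, the sketch below pivots flat score records into a nested table, leaving cells with no reported score as None. The record fields, the source names, and the `build_matrix` helper are hypothetical, chosen only for this example; they do not describe the actual data pipeline.

```python
def build_matrix(records):
    """Pivot flat records into {model: {benchmark: score or None}}.

    Each record is a dict with hypothetical keys: source, model, benchmark, score.
    If two sources report the same cell, the later record wins in this sketch.
    """
    models = sorted({r["model"] for r in records})
    benchmarks = sorted({r["benchmark"] for r in records})
    # Start with every cell empty so missing data is explicit.
    matrix = {m: {b: None for b in benchmarks} for m in models}
    for r in records:
        matrix[r["model"]][r["benchmark"]] = r["score"]
    return matrix


# Made-up records from two made-up sources.
records = [
    {"source": "source_a", "model": "model_x", "benchmark": "bench_1", "score": 0.82},
    {"source": "source_a", "model": "model_y", "benchmark": "bench_1", "score": 0.74},
    {"source": "source_b", "model": "model_x", "benchmark": "bench_2", "score": 0.61},
]
matrix = build_matrix(records)
```

Here `matrix["model_y"]["bench_2"]` stays None, which is the kind of gap an imputation step would later have to fill.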

🤖 AI-Powered Relevance Scoring

Our system uses advanced AI to determine how relevant each benchmark is to your specific use case. Each benchmark receives separate relevance scores for capability and safety dimensions, ensuring context-aware evaluation.

📈 Smart Imputation

Missing benchmark data is intelligently estimated using statistical models. This allows us to provide early insights on new models before complete third-party evaluations are available, with clear confidence indicators.
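The page does not say which statistical models are used, so the sketch below substitutes a deliberately simple stand-in: a missing score is estimated as the benchmark's average plus the model's average offset from the benchmark averages on the benchmarks it does have. The `impute_missing` helper and the example data are hypothetical, and scores are assumed to lie in [0, 1].

```python
def impute_missing(matrix):
    """Fill None cells in {model: {benchmark: score or None}}.

    Stand-in estimator (an assumption, not the production method):
    benchmark mean + the model's mean offset on its observed benchmarks.
    Returns (filled_matrix, set of (model, benchmark) cells that were imputed).
    """
    benchmarks = sorted({b for row in matrix.values() for b in row})
    bench_mean = {}
    for b in benchmarks:
        observed = [row[b] for row in matrix.values() if row.get(b) is not None]
        bench_mean[b] = sum(observed) / len(observed) if observed else None

    filled, imputed = {}, set()
    for model, row in matrix.items():
        # How far above or below average this model tends to score.
        offsets = [row[b] - bench_mean[b] for b in benchmarks if row.get(b) is not None]
        offset = sum(offsets) / len(offsets) if offsets else 0.0
        filled[model] = {}
        for b in benchmarks:
            if row.get(b) is not None:
                filled[model][b] = row[b]
            elif bench_mean[b] is not None:
                # Clamp the estimate to the assumed [0, 1] score range.
                filled[model][b] = min(1.0, max(0.0, bench_mean[b] + offset))
                imputed.add((model, b))
            else:
                filled[model][b] = None  # nobody reported this benchmark
    return filled, imputed


scores = {
    "m1": {"a": 0.8, "b": 0.6},
    "m2": {"a": 0.6, "b": 0.4},
    "m3": {"a": 0.9, "b": None},  # new model, benchmark "b" not yet reported
}
filled, imputed = impute_missing(scores)
```

Returning the set of imputed cells alongside the filled matrix keeps estimated values distinguishable from measured ones, which is what a confidence indicator needs.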

⚖️ Weighted Aggregation

Trust scores for each dimension are calculated using relevance-weighted aggregation of benchmark results. The overall score is then calculated as the geometric mean of the dimension scores, which prevents any single dimension from dominating. The weights assigned to the key dimensions are customizable.
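The two aggregation steps can be sketched as follows, assuming scores and relevance weights are normalized to [0, 1]. The function names, benchmark names, and numbers are hypothetical and only illustrate the general technique of a relevance-weighted average followed by a weighted geometric mean; the exact formulas used in production are not given here.

```python
import math


def relevance_weighted_score(scores, relevance):
    """Relevance-weighted average of benchmark scores for one dimension."""
    total = sum(relevance[b] for b in scores)
    if total == 0:
        raise ValueError("at least one benchmark needs non-zero relevance")
    return sum(scores[b] * relevance[b] for b in scores) / total


def weighted_geometric_mean(dimension_scores, weights=None):
    """Weighted geometric mean across dimensions (equal weights by default)."""
    if weights is None:
        weights = {d: 1.0 for d in dimension_scores}
    # A zero in any weighted dimension drives the whole product to zero.
    if any(dimension_scores[d] <= 0 and weights[d] > 0 for d in dimension_scores):
        return 0.0
    total = sum(weights[d] for d in dimension_scores)
    log_sum = sum(
        weights[d] * math.log(dimension_scores[d])
        for d in dimension_scores
        if weights[d] > 0
    )
    return math.exp(log_sum / total)


# Dimension score: benchmark "bench_a" matters three times as much as "bench_b".
capability = relevance_weighted_score(
    {"bench_a": 0.8, "bench_b": 0.4},
    {"bench_a": 0.75, "bench_b": 0.25},
)

# Overall score: a strong dimension cannot fully offset a weak one.
overall = weighted_geometric_mean({"capability": 0.9, "safety": 0.4})
```

With these made-up numbers the overall score is about 0.60, below the arithmetic mean of 0.65, which shows how the geometric mean penalizes an imbalance between dimensions. Passing a custom `weights` dictionary corresponds to the customizable dimension weighting.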

🎯 Use Case Specific

Evaluations are tailored to specific use cases across industries including Healthcare, Finance, Legal, Manufacturing, and more. Each use case has unique relevance weights that reflect its specific requirements.

🔍 Confidence Scoring

Every score includes a confidence indicator based on data completeness and relevance. This transparency helps you understand the reliability of each evaluation and make informed decisions.
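One simple way to combine data completeness and relevance, shown here purely as an assumption, is to report the share of total relevance weight that is backed by observed (not imputed) benchmark results. The `confidence` helper and its inputs are hypothetical and are not the published formula.

```python
def confidence(relevance, observed):
    """Fraction of relevance weight covered by benchmarks with real data.

    relevance: {benchmark: relevance weight}
    observed:  set of benchmarks with a measured (non-imputed) score
    """
    total = sum(relevance.values())
    if total == 0:
        return 0.0
    covered = sum(w for b, w in relevance.items() if b in observed)
    return covered / total


# The most relevant benchmark is observed, so confidence stays fairly high
# even though one of the three benchmarks is missing.
conf_value = confidence({"a": 0.6, "b": 0.3, "c": 0.1}, observed={"a", "c"})
```

Under this sketch, missing a highly relevant benchmark lowers confidence more than missing a marginal one.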

For a deeper dive into our methodology and latest research insights, check out our blog post.

Data Sources

Model Trust Scores is built on the tireless work of the evaluation ecosystem.

Have Feedback?

If you have feedback, please send it to Ian Eisenberg at ian@credo.ai.