Credo AI Labs develops cutting-edge approaches to AI governance through experimental applications that showcase new governance concepts, risk assessment methodologies, and compliance frameworks. Labs is the innovation arm of Credo AI, the leading enterprise AI governance platform, which enables organizations to adopt generative AI, AI agents, and third-party AI vendors with proven governance, risk management, and compliance.
We aggregate benchmark data from leading evaluation providers to create a comprehensive model × benchmark matrix. This includes data from vals.ai, AIR-Bench, Artificial Analysis, LiveBench, and other trusted sources, covering over 90 models and dozens of benchmarks.
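As a rough illustration of the matrix idea, the sketch below merges per-source results into a nested mapping of model to benchmark to score, leaving None wherever no provider has reported a result. The record shape, the assumption that scores are already normalized to [0, 1], and the choice to average pairs reported by more than one source are all hypothetical. They are not a description of how Credo AI Labs actually reconciles its sources.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional

# Hypothetical record shape: (source, model, benchmark, score), with scores
# assumed to be already normalized to the range [0, 1].
Record = tuple[str, str, str, float]
Matrix = dict[str, dict[str, Optional[float]]]


def build_matrix(records: Iterable[Record]) -> Matrix:
    """Merge per-source results into a model x benchmark matrix.

    Pairs reported by several sources are averaged (an illustrative
    assumption); pairs that no source reports are left as None.
    """
    collected: dict[tuple[str, str], list[float]] = defaultdict(list)
    models: set[str] = set()
    benchmarks: set[str] = set()
    for _source, model, benchmark, score in records:
        collected[(model, benchmark)].append(score)
        models.add(model)
        benchmarks.add(benchmark)

    return {
        model: {
            bench: mean(collected[(model, bench)]) if (model, bench) in collected else None
            for bench in sorted(benchmarks)
        }
        for model in sorted(models)
    }
```

The None cells are exactly the gaps that the estimation step has to fill, which is why the sketch keeps them explicit instead of dropping incomplete models.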
Our system uses advanced AI to determine how relevant each benchmark is to your specific use case. Each benchmark receives separate relevance scores for capability and safety dimensions, ensuring context-aware evaluation.
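To make the two relevance dimensions concrete, here is a minimal sketch assuming each benchmark carries a capability relevance and a safety relevance in [0, 1] for a given use case. The names BenchmarkRelevance and relevance_weights are hypothetical, and the sketch says nothing about how the AI system actually produces the scores. It only shows how one benchmark can matter a great deal for safety and very little for capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRelevance:
    """Hypothetical relevance of one benchmark to one use case, each in [0, 1]."""
    capability: float
    safety: float

    def __post_init__(self) -> None:
        # Reject values outside the assumed [0, 1] relevance scale.
        for name in ("capability", "safety"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} relevance must be in [0, 1], got {value}")


def relevance_weights(
    relevance: dict[str, BenchmarkRelevance], dimension: str
) -> dict[str, float]:
    """Return the benchmark -> weight mapping for one dimension."""
    if dimension not in ("capability", "safety"):
        raise ValueError(f"unknown dimension: {dimension}")
    return {bench: getattr(rel, dimension) for bench, rel in relevance.items()}
```

Keeping the two scores separate means the same benchmark result can feed the capability and safety dimensions with different weights, rather than forcing a single relevance number onto both.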
Missing benchmark data is estimated using statistical models. This allows us to provide early insights on new models before complete third-party evaluations are available, with clear confidence indicators.
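The specific statistical models are not described here. As a stand-in, the sketch below uses a simple two-way additive model: a missing score is predicted as the benchmark's observed mean plus the model's average offset from the benchmark means on the benchmarks it does have. The function name impute_additive, the clipping to [0, 1], and the use of observed coverage as a crude confidence signal are assumptions for illustration only.

```python
from statistics import mean
from typing import Optional

Matrix = dict[str, dict[str, Optional[float]]]


def impute_additive(matrix: Matrix) -> tuple[Matrix, dict[str, float]]:
    """Fill missing cells with benchmark mean + model offset (two-way additive model).

    Returns the filled matrix and, per model, the fraction of benchmarks that
    were actually observed (a crude confidence signal).
    """
    benchmarks = sorted({b for row in matrix.values() for b in row})

    # Benchmark (column) means, computed over observed cells only.
    bench_mean: dict[str, float] = {}
    for b in benchmarks:
        observed = [row[b] for row in matrix.values() if row.get(b) is not None]
        if observed:
            bench_mean[b] = mean(observed)

    filled: Matrix = {}
    coverage: dict[str, float] = {}
    for model, row in matrix.items():
        # Model offset: how far, on average, the model sits above or below each benchmark mean.
        residuals = [row[b] - bench_mean[b] for b in benchmarks if row.get(b) is not None]
        offset = mean(residuals) if residuals else 0.0
        filled[model] = {}
        for b in benchmarks:
            value = row.get(b)
            if value is None and b in bench_mean:
                # Clip so estimates stay on the assumed [0, 1] score scale.
                value = min(1.0, max(0.0, bench_mean[b] + offset))
            filled[model][b] = value
        coverage[model] = len(residuals) / len(benchmarks) if benchmarks else 0.0
    return filled, coverage
```

Low-rank matrix completion or regression on correlated benchmarks would be natural upgrades; the additive model is used here only because it is short and easy to check by hand.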
Trust scores are calculated using relevance-weighted aggregation, and the overall score is calculated as a geometric mean across dimensions, which prevents any single dimension from dominating. The weights on the key dimensions are customizable.
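A minimal sketch of this two-stage scoring, assuming per-benchmark scores in [0, 1] and non-negative weights: each dimension score is a relevance-weighted mean, S_d = sum_b(r_bd * s_b) / sum_b(r_bd), and the overall score is a weighted geometric mean, exp(sum_d(w_d * ln S_d) / sum_d(w_d)). The function names and the handling of zero scores are hypothetical choices, not the production implementation.

```python
import math


def dimension_score(scores: dict[str, float], relevance: dict[str, float]) -> float:
    """Relevance-weighted mean of benchmark scores for one dimension."""
    total = sum(relevance.get(b, 0.0) for b in scores)
    if total == 0:
        raise ValueError("no relevant benchmarks for this dimension")
    return sum(relevance.get(b, 0.0) * s for b, s in scores.items()) / total


def overall_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean of per-dimension scores; weights are normalized here."""
    total_weight = sum(weights.get(d, 0.0) for d in dim_scores)
    if total_weight <= 0:
        raise ValueError("dimension weights must sum to a positive value")
    log_sum = 0.0
    for d, score in dim_scores.items():
        w = weights.get(d, 0.0)
        if w == 0:
            continue  # a zero-weight dimension does not affect the result
        if score <= 0:
            return 0.0  # a zero score in a weighted dimension zeroes the geometric mean
        log_sum += w * math.log(score)
    return math.exp(log_sum / total_weight)
```

For example, with equal weights, a capability score of 0.25 and a safety score of 1.0 combine to sqrt(0.25 × 1.0) = 0.5, whereas an arithmetic mean would give 0.625. A strong result in one dimension cannot fully offset a weak one, and changing the entries of weights is how the dimension weighting would be customized.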
Evaluations are tailored to specific use cases across industries including Healthcare, Finance, Legal, Manufacturing, and more. Each use case has unique relevance weights that reflect its specific requirements.
Every score includes a confidence indicator based on data completeness and relevance. This transparency helps you understand the reliability of each evaluation and make informed decisions.
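One plausible way to combine completeness and relevance into a single indicator, sketched here as an assumption rather than the documented formula, is the share of total benchmark relevance carried by benchmarks with real (non-estimated) results. The function names and the band thresholds in confidence_label are hypothetical.

```python
def confidence(observed: dict[str, bool], relevance: dict[str, float]) -> float:
    """Share of total relevance carried by benchmarks with real (non-estimated) data."""
    total = sum(relevance.get(b, 0.0) for b in observed)
    if total == 0:
        return 0.0
    covered = sum(relevance.get(b, 0.0) for b, is_real in observed.items() if is_real)
    return covered / total


def confidence_label(value: float, thresholds: tuple[float, float] = (0.5, 0.8)) -> str:
    """Map a confidence value to a band; the default thresholds are hypothetical."""
    low, high = thresholds
    if value >= high:
        return "high"
    if value >= low:
        return "medium"
    return "low"
```

For example, if a highly relevant benchmark (relevance 3.0) has real data and a less relevant one (relevance 1.0) is estimated, confidence is 3.0 / 4.0 = 0.75, so a gap in a marginal benchmark costs less than a gap in a central one.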
For a deeper dive into our methodology and latest research insights, check out our blog post.
If you have feedback, please send it to Ian Eisenberg at ian@credo.ai.