Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

SimpleBench

A multiple-choice text benchmark that tests basic reasoning on questions where non-specialist humans (high school level) consistently outperform state-of-the-art language models. It covers spatio-temporal reasoning, social intelligence, and linguistic adversarial robustness.

Model Performance