Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Average)

Measures a model's overall performance across all LiveBench tasks, spanning reasoning, coding, math, data analysis, language, and instruction following.
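
As a rough illustration of what an "average" score like this might mean, the sketch below assumes the overall figure is an unweighted mean of the six per-category scores. That aggregation rule is an assumption, not something this page states. The function name `livebench_average` and all of the category scores are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean

# Hypothetical per-category scores (percent) for a single model.
# These values are made up for illustration; they are not real results.
category_scores = {
    "reasoning": 80.0,
    "coding": 70.0,
    "math": 75.0,
    "data_analysis": 65.0,
    "language": 60.0,
    "instruction_following": 85.0,
}


def livebench_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-category scores, to one decimal.

    Assumption: every category counts equally toward the overall score.
    """
    return round(mean(scores.values()), 1)


average = livebench_average(category_scores)
print(f"{average}%")
```

Under this assumption, a model that is weak in one category can still rank well overall if it is strong in the others, which is one reason to look at per-category scores alongside the average.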

Model Performance

Rank  Score
#2    78.6%
#11   70.7%
#12   70.5%
#13   70.1%
#14   69.4%
#18   65.9%
#20   64.4%
#21   62.5%
#22   62.4%
#27   59.1%
#28   58.7%
#29   58.5%
#31   56.5%
#32   55.9%
#33   54.7%
#34   54.4%
#35   53.7%
#37   53.0%
#38   51.8%
#39   51.8%
#40   51.0%
#41   50.6%
#42   49.3%
#43   48.9%
#44   48.8%
#46   48.1%
#48   46.1%
#49   45.3%
#50   44.2%
#51   43.6%
#52   43.3%
#53   39.5%
#54   30.1%