Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Average)

Measures the model's overall performance across all LiveBench tasks, providing a comprehensive assessment of capabilities across reasoning, coding, math, data analysis, language, and instruction following.
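As a rough illustration of how an overall score like this could be aggregated, the sketch below takes an unweighted mean over the six categories named above. The equal weighting, the function name `overall_average`, and every per-category number are assumptions made for the example; they are not taken from LiveBench or from any model on this page.

```python
from statistics import mean

# Hypothetical per-category scores (percent) for a single model.
# These values are illustrative only and do not describe any real model.
category_scores = {
    "reasoning": 80.0,
    "coding": 70.0,
    "math": 75.0,
    "data_analysis": 65.0,
    "language": 60.0,
    "instruction_following": 85.0,
}


def overall_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the category scores, rounded to one decimal.

    Equal weighting across categories is an assumption of this sketch.
    """
    if not scores:
        raise ValueError("at least one category score is required")
    return round(mean(scores.values()), 1)


print(f"{overall_average(category_scores)}%")
```

Under this assumption, a model that is strong in one category and weak in another can land on the same average as a uniformly mid-range model, so the per-category breakdown is still worth checking.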

Model Performance

Rank    Score
#1      78.6%
#2      74.6%
#6      72.0%
#8      70.7%
#9      70.1%
#13     65.9%
#14     64.4%
#15     63.7%
#16     63.4%
#17     62.7%
#18     62.5%
#19     62.4%
#20     58.7%
#21     58.5%
#23     56.5%
#24     55.9%
#25     54.7%
#26     54.6%
#28     50.6%
#29     49.3%
#31     44.2%
#32     43.3%
#33     39.5%