Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

MMLU Pro

Exact-match accuracy on an enhanced version of the MMLU dataset, with more challenging, reasoning-focused questions and more answer choices per question (up to ten, compared with four in the original MMLU).
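As a rough illustration of what an exact-match accuracy score means, the sketch below counts a prediction as correct only when its answer letter equals the reference letter. The function name `exact_match_accuracy` and the normalization step (trimming whitespace, ignoring case) are assumptions made for this example. They are not taken from the benchmark's official evaluation code, which may extract and compare answers differently.

```python
def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions that exactly match their reference.

    Hypothetical helper: answers are compared as letters after trimming
    whitespace and upper-casing, which is an assumed normalization.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")

    correct = sum(
        pred.strip().upper() == ref.strip().upper()
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Toy data, not real benchmark output: three of the four answers match.
predicted = ["A", "c", "J", "B"]
expected = ["A", "C", "I", "B"]
print(f"{exact_match_accuracy(predicted, expected):.1%}")  # prints 75.0%
```

Under this definition there is no partial credit: a score such as 88.3% means that 88.3% of the questions were answered with exactly the reference choice.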

Model Performance

Rank   Score
#2     88.3%
#4     87.3%
#5     87.1%
#10    85.7%
#11    85.6%
#12    85.0%
#13    85.0%
#15    84.6%
#16    84.4%
#17    84.2%
#18    84.0%
#21    83.7%
#22    83.7%
#24    82.8%
#26    82.3%
#29    81.7%
#30    81.7%
#31    81.3%
#32    81.2%
#34    80.6%
#35    80.6%
#37    80.4%
#38    79.3%
#39    78.9%
#40    78.9%
#41    78.6%
#44    78.3%
#45    78.3%
#46    78.2%
#47    78.2%
#48    77.8%
#49    77.7%
#50    77.7%
#53    77.1%
#55    76.8%
#61    76.0%
#62    75.4%
#64    74.9%
#65    74.6%
#66    72.6%
#67    72.0%
#68    71.6%
#69    71.3%
#70    70.9%
#71    68.2%
#72    67.9%
#73    66.6%
#74    66.3%
#75    66.2%
#76    64.4%
#77    61.3%
#78    60.1%
#79    58.6%
#80    55.6%
#81    53.2%
#83    43.8%
#85    40.2%
#86    37.7%
#87    34.1%
#88    34.0%