Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIME

Measures model accuracy on the 60 problems from the 2024 and 2025 American Invitational Mathematics Examination (AIME). The AIME is an invitation-only competition for high-school students who score in roughly the top 5% on the AMC 12. Every answer is an integer from 0 to 999.
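As a rough illustration of how accuracy on integer-answer problems like these could be computed, here is a minimal sketch. The function names (`is_valid_aime_answer`, `aime_accuracy`) and the exact-match rule are assumptions made for this example; the page does not describe the actual grading harness or how model output is parsed.

```python
def is_valid_aime_answer(value: int) -> bool:
    """AIME answers are integers from 0 to 999 inclusive."""
    return 0 <= value <= 999


def aime_accuracy(predictions, answer_key):
    """Return the percentage of predictions that exactly match the answer key.

    Hypothetical helper: predictions may be ints or strings such as "073".
    Anything that does not parse to an integer in 0..999 counts as wrong.
    """
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")

    correct = 0
    for pred, gold in zip(predictions, answer_key):
        try:
            value = int(str(pred).strip())
        except ValueError:
            continue  # unparseable output is scored as incorrect
        if is_valid_aime_answer(value) and value == gold:
            correct += 1
    return 100.0 * correct / len(answer_key)


# Example with made-up answers (not real AIME data)
score = aime_accuracy(["073", "12", "abc", "999"], [73, 13, 5, 999])
print(f"{score:.1f}%")
```

With 60 questions, a single run under this scheme can only land on multiples of 1/60, so finer-grained percentages would have to come from something like averaging over repeated runs; that is an inference, not something the page states.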

Model Performance

Rank  Accuracy
#1    90.8%
#2    90.6%
#3    90.6%
#8    85.3%
#9    85.3%
#10   85.0%
#11   84.0%
#12   84.0%
#17   74.0%
#18   71.5%
#19   62.7%
#20   60.7%
#21   58.7%
#22   58.7%
#23   52.2%
#25   44.2%
#26   42.3%
#27   41.3%
#28   39.6%
#29   38.5%
#30   29.8%
#31   27.5%
#33   22.3%
#34   18.7%
#35   16.0%
#36   13.3%
#37   11.5%
#39   9.2%
#40   3.3%