Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

Chatbot Arena AAII

Evaluates model performance on the Artificial Analysis Intelligence Index (AAII), an aggregate intelligence benchmark that combines results from multiple challenging evaluations into a single composite score. The index is displayed on the Chatbot Arena leaderboard alongside Arena Elo ratings.

Model Performance

Rank   Score
#4     72.0%
#5     71.0%
#6     70.0%
#8     70.0%
#9     69.0%
#10    68.0%
#11    68.0%
#12    66.0%
#13    66.0%
#14    66.0%
#17    65.0%
#18    65.0%
#23    63.0%
#25    62.0%
#26    62.0%
#27    62.0%
#29    60.0%
#31    59.0%
#34    56.0%
#35    56.0%
#38    52.0%
#39    52.0%
#41    50.0%
#42    50.0%
#43    49.0%
#44    48.0%
#45    47.0%
#46    47.0%
#47    45.0%
#48    45.0%
#50    44.0%
#51    44.0%
#52    44.0%
#53    42.0%
#54    42.0%
#56    41.0%
#58    40.0%
#60    40.0%
#61    40.0%
#63    37.0%
#64    36.0%
#65    36.0%
#66    35.0%
#67    32.0%
#68    31.0%
#69    30.0%
#70    28.0%
#71    26.0%
#72    25.0%
#73    22.0%
#74    21.0%
#75    14.0%
#76    9.0%
#78    6.0%
#79    6.0%