Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

Chatbot Arena (Win Rate)

Measures how often a model's response is preferred in head-to-head comparisons with other models in the Chatbot Arena, where human judges compare two responses and pick the better one.
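
As a rough illustration of the metric described above, the sketch below computes a per-model win rate from a list of pairwise battle outcomes. The page does not say how its win rate is actually derived, so this is only one plausible reading: the `win_rates` helper, the choice to count a tie as half a win, and the model names `model-x`, `model-y`, `model-z` are all assumptions made for the example.

```python
from collections import defaultdict


def win_rates(battles):
    """Return {model: win rate in percent} from pairwise battles.

    Each battle is (model_a, model_b, winner), where winner is
    "a", "b", or "tie". Ties count as half a win for each side
    (an assumption, not something the page specifies).
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for model_a, model_b, winner in battles:
        games[model_a] += 1
        games[model_b] += 1
        if winner == "a":
            wins[model_a] += 1.0
        elif winner == "b":
            wins[model_b] += 1.0
        else:
            wins[model_a] += 0.5
            wins[model_b] += 0.5
    return {model: 100.0 * wins[model] / games[model] for model in games}


# Hypothetical votes from human judges.
battles = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-z", "b"),
    ("model-x", "model-y", "tie"),
]
rates = win_rates(battles)

# Print models from highest to lowest win rate.
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.1f}%")
```

A raw win rate like this depends on which opponents a model happened to face, so it is best read alongside the rank rather than as a standalone score.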

Model Performance

Rank   Win Rate
#3     99.4%
#5     98.6%
#7     95.0%
#8     94.9%
#9     94.6%
#11    94.3%
#12    94.3%
#14    93.1%
#15    93.0%
#16    92.1%
#17    92.0%
#18    91.7%
#19    91.7%
#21    90.1%
#22    89.7%
#25    89.0%
#27    88.6%
#28    88.4%
#29    88.3%
#30    88.0%
#31    88.0%
#32    87.9%
#33    87.6%
#34    87.4%
#35    85.4%
#38    83.0%
#39    82.9%
#40    82.6%
#42    82.4%
#45    82.1%
#46    81.9%
#47    81.4%
#48    81.3%
#49    81.1%
#50    80.9%
#51    80.9%
#52    80.9%
#56    77.1%
#58    76.9%
#59    76.4%
#60    76.1%
#62    75.3%
#63    74.4%
#64    74.3%
#66    74.0%
#68    71.7%
#69    71.6%
#70    71.3%
#72    67.0%
#73    66.4%
#74    65.1%
#75    61.9%
#76    60.4%
#77    60.3%
#78    49.3%
#79    46.0%
#80    44.0%
#81    41.9%