Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

Chatbot Arena Coding

Evaluates a model's coding abilities through head-to-head comparisons on the Chatbot Arena platform, where human judges assess code quality, correctness, and implementation approach.

Model Performance

Rank   Score
#5     94.6%
#6     94.6%
#7     91.9%
#8     91.9%
#9     91.7%
#10    91.7%
#11    91.6%
#12    91.3%
#14    88.6%
#17    86.6%
#18    86.4%
#19    85.1%
#21    84.4%
#22    84.4%
#23    83.9%
#26    83.4%
#29    83.1%
#31    82.9%
#32    82.6%
#33    81.6%
#34    81.4%
#35    80.9%
#38    79.9%
#40    77.9%
#41    77.3%
#43    76.6%
#44    73.0%
#45    72.4%
#46    67.0%
#47    61.7%
#49    52.0%
#50    39.6%