Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Agentic Coding)

Evaluates the model's ability to operate in a real development environment to solve real-world repository issues.

Model Performance

Rank    Score
#1      43.3%
#2      36.7%
#5      31.7%
#8      28.3%
#9      28.3%
#10     26.7%
#12     25.0%
#14     21.7%
#15     21.7%
#16     20.0%
#17     18.3%
#18     18.3%
#19     15.0%
#21     15.0%
#22     15.0%
#23     13.3%
#25     13.3%
#26     10.0%
#27     10.0%
#28     6.7%
#30     5.0%
#31     5.0%
#34     1.7%