Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Agentic Coding)

Evaluates the model's ability to operate in a real development environment to solve real-world repository issues.

Model Performance

Rank   Score
#3     63.3%
#8     53.3%
#9     51.7%
#10    48.3%
#13    46.7%
#14    43.3%
#17    40.0%
#19    38.3%
#21    33.3%
#24    31.7%
#25    31.7%
#28    28.3%
#29    26.7%
#34    21.7%
#35    21.7%
#36    18.3%
#37    18.3%
#38    16.7%
#39    15.0%
#40    15.0%
#41    15.0%
#42    15.0%
#43    13.3%
#44    13.3%
#45    13.3%
#46    13.3%
#47    6.7%
#49    5.0%
#50    5.0%
#51    3.3%
#54    1.7%