Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Language)

Evaluates language comprehension through tasks including Connections word puzzles, typo removal, and unscrambling movie synopses drawn from recent sources.

Model Performance

Rank  Score
#1    80.8%
#2    76.1%
#3    76.0%
#6    73.5%
#9    68.8%
#11   67.2%
#13   64.8%
#14   64.8%
#15   63.9%
#16   63.2%
#18   59.8%
#19   59.1%
#20   57.0%
#21   55.1%
#22   54.5%
#25   51.0%
#27   49.4%
#28   44.7%
#29   43.1%
#30   40.5%
#31   39.7%
#32   36.7%
#34   30.7%