Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Instruction Following)

Assesses a model's ability to follow specific instructions while processing recent news articles. Tasks include paraphrasing, simplification, and story generation.

Model Performance

Rank  Score
#1    88.1%
#2    86.2%
#3    85.2%
#5    84.4%
#6    84.3%
#7    84.3%
#9    82.9%
#10   82.5%
#16   80.0%
#17   79.6%
#18   78.7%
#20   78.4%
#21   77.2%
#22   77.0%
#23   76.9%
#24   76.5%
#26   73.2%
#27   72.3%
#28   71.9%
#29   71.4%
#31   67.9%
#32   66.3%
#33   61.9%