Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LiveBench (Data Analysis)

Measures performance on data analysis tasks (table reformatting, join prediction, and column type annotation) built from recent Kaggle and Socrata datasets.

Model Performance

Rank   Score
#1     71.9%
#2     71.6%
#4     71.5%
#7     69.8%
#12    68.3%
#13    68.2%
#14    67.0%
#15    66.5%
#16    66.5%
#17    66.4%
#19    65.4%
#20    64.7%
#21    64.6%
#22    64.4%
#23    63.4%
#24    60.2%
#25    60.1%
#26    60.0%
#27    56.8%
#29    54.7%
#30    54.2%
#31    54.1%
#33    48.5%