Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LegalBench

A legal reasoning benchmark that evaluates models across six categories of legal analysis: issue-spotting, rule-recall, rule-conclusion, rule-application, interpretation, and rhetorical understanding. The tasks are crowd-sourced and range from identifying relevant legal principles to analyzing their application and understanding legal argumentation.

Model Performance

Rank   Score
#2     86.9%
#3     86.0%
#4     86.0%
#7     83.6%
#9     83.5%
#10    83.4%
#11    83.4%
#13    82.8%
#14    82.8%
#17    82.1%
#18    82.0%
#19    82.0%
#20    82.0%
#21    81.9%
#23    81.7%
#25    81.7%
#26    81.5%
#27    81.4%
#31    80.5%
#32    80.3%
#34    80.1%
#36    79.7%
#38    79.2%
#39    79.2%
#40    79.1%
#43    79.0%
#45    78.8%
#46    78.7%
#47    78.7%
#48    78.1%
#49    78.0%
#50    77.9%
#51    77.6%
#52    77.5%
#54    76.2%
#56    75.9%
#57    73.0%
#58    72.3%
#59    71.6%
#62    70.5%
#63    69.9%
#64    66.0%
#65    64.8%
#66    61.3%
#67    59.4%
#68    51.6%
#69    49.4%
#71    49.3%