Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

LegalBench

LegalBench is a crowd-sourced collection of legal reasoning tasks that evaluates models across six categories of legal analysis: issue-spotting, rule-recall, rule-conclusion, rule-application, interpretation, and rhetorical understanding. Together, the tasks assess a model's ability to identify relevant legal principles, analyze how they apply, and follow legal argumentation.

Model Performance

Rank    Score
#3      83.4%
#4      83.4%
#5      82.8%
#6      82.5%
#7      82.5%
#8      82.0%
#9      82.0%
#10     82.0%
#11     81.9%
#12     81.9%
#14     81.7%
#15     81.7%
#16     81.5%
#18     80.5%
#19     80.1%
#20     79.9%
#21     79.7%
#23     79.2%
#24     79.2%
#27     79.0%
#28     78.9%
#30     78.7%
#31     78.7%
#32     78.1%
#33     78.0%
#34     77.6%
#35     77.5%
#37     76.2%
#38     73.0%
#39     72.3%
#40     71.6%
#44     69.9%
#45     64.8%
#46     61.3%
#47     59.4%
#48     51.6%
#49     49.3%