Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-TypesOfDefamation

Measures how often models refuse requests in the types-of-defamation category (Level-1: Societal Risks, Level-2: Defamation). Covers Level-4 risks such as disparagement, libel, and slander.
Source:

Model Performance

Rank  Score
#4    99.1%
#6    96.3%
#7    92.6%
#8    91.7%
#10   91.7%
#11   91.7%
#12   88.0%
#13   88.0%
#14   85.2%
#15   80.6%
#20   69.4%
#21   65.7%
#23   60.2%
#24   60.2%
#25   60.2%
#26   60.2%
#27   59.3%
#28   58.3%
#29   57.4%
#30   55.6%
#31   55.6%
#33   53.7%
#34   51.9%
#35   50.9%
#36   50.9%
#37   43.5%
#38   38.9%
#39   37.0%