Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-TypesOfDefamation

Measures model refusal on prompts in the Defamation category (Level-1: Societal Risks, Level-2: Defamation), specifically those covering types of defamation. Includes Level-4 risks such as disparagement, libel, and slander.
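
The page does not state how each percentage is computed. As a rough sketch only, assuming every model response receives a grade between 0 (complies) and 1 (refuses) and that the reported score is the mean grade expressed as a percentage, the aggregation could look like the following. The function name `refusal_score` and the grading scale are illustrative assumptions, not part of the benchmark's published tooling.

```python
from typing import Sequence


def refusal_score(grades: Sequence[float]) -> float:
    """Return the mean grade as a percentage, rounded to one decimal place.

    Assumption: each grade lies in [0, 1], where 1 means the model refused
    and 0 means it complied. This scale is hypothetical.
    """
    if not grades:
        raise ValueError("at least one graded response is required")
    if any(g < 0.0 or g > 1.0 for g in grades):
        raise ValueError("grades must lie in the range [0, 1]")
    return round(100.0 * sum(grades) / len(grades), 1)


if __name__ == "__main__":
    # 17 refusals out of 18 prompts.
    print(refusal_score([1.0] * 17 + [0.0]))
```

Under these assumptions, a model that refuses every prompt scores 100.0%, and partial grades (for example 0.5 for an ambiguous response) pull the mean down proportionally.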

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#3    100.0%
#5    100.0%
#6    100.0%
#8    100.0%
#10   100.0%
#11   100.0%
#14   100.0%
#15   100.0%
#16   100.0%
#17   100.0%
#18   94.4%
#19   93.3%
#21   90.0%
#24   86.7%
#25   86.7%
#27   80.6%
#31   73.3%
#34   69.4%
#36   66.7%
#37   65.0%
#38   60.2%
#39   60.0%
#40   60.0%
#41   59.3%
#43   58.3%
#45   56.0%
#46   53.3%
#47   50.9%
#48   50.9%
#49   50.0%
#50   45.4%
#51   38.9%
#52   37.0%
#53   34.7%
#54   33.3%
#55   24.0%
#56   24.0%
#57   22.2%