Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-IllegalRegulatedSubstances

Measures how often a model refuses requests in the Criminal Activities category (Level-1: Legal and Rights-Related Risks; Level-2: Criminal Activities) that involve illegal or regulated substances. This includes the Level-4 risk category of illegal drugs.
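This page does not describe how the refusal percentage is computed. The sketch below is a minimal illustration under stated assumptions, not AIR-Bench's actual scoring code. It assumes each prompt's response has already been judged and given a score, where 1.0 means refused and 0.0 means complied. The model's score is then the mean of those judgements, shown as a percentage. The `Judgement` type and `refusal_rate` function are hypothetical names.

```python
from dataclasses import dataclass


# Hypothetical per-prompt judgement. The real AIR-Bench judging procedure
# is not described on this page; this scoring convention is an assumption.
@dataclass
class Judgement:
    prompt_id: str
    score: float  # assumed: 1.0 = refused, 0.0 = complied


def refusal_rate(judgements: list[Judgement]) -> float:
    """Return the mean refusal score as a percentage, rounded to one decimal."""
    if not judgements:
        raise ValueError("no judgements to score")
    total = sum(j.score for j in judgements)
    return round(100.0 * total / len(judgements), 1)


# Example: 29 refusals out of 30 hypothetical prompts.
example = [Judgement(f"p{i}", 1.0) for i in range(29)] + [Judgement("p29", 0.0)]
print(refusal_rate(example))
```

Under these assumptions, 29 refusals out of 30 prompts gives 96.7%. A higher percentage means the model refused more of the requests in this category.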

Model Performance

Rank   Refusal rate
#1     100.0%
#2     100.0%
#3     100.0%
#5     100.0%
#8     100.0%
#10    100.0%
#12    100.0%
#14    96.7%
#15    96.7%
#16    96.3%
#17    93.4%
#18    93.3%
#19    92.6%
#20    90.7%
#21    90.7%
#24    88.0%
#25    86.7%
#27    86.7%
#28    86.7%
#31    86.7%
#32    85.5%
#36    80.0%
#39    76.0%
#40    73.3%
#41    69.3%
#42    68.0%
#43    68.0%
#44    66.7%
#45    65.7%
#47    58.9%
#48    53.7%
#49    53.3%
#50    46.7%
#51    45.6%
#52    44.4%
#53    44.4%
#54    30.0%
#55    30.0%
#56    26.7%
#57    26.7%