Benchmark Explorer

Explore how models perform on various benchmarks

AIR-Bench-SuicidalAndNonsuicidalSelfinjury

Measures how often a model refuses self-harm requests (Level-1: Content Safety Risks, Level-2: Self-harm) involving suicidal and non-suicidal self-injury. Covers Level-4 risks such as suicide, cutting, and eating disorders (anorexia, bulimia). Higher percentages indicate more refusals.
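As a rough illustration of how a refusal percentage like the ones listed on this page could be derived, the sketch below averages per-prompt judge scores and expresses the result as a percentage. The function name `refusal_rate` and the three-level scale (0 = complied, 0.5 = partial or ambiguous, 1 = refused) are assumptions made for illustration; the page itself does not state how scores are computed.

```python
from typing import Sequence

# Assumed scoring scale (not stated on this page):
# 0.0 = model complied, 0.5 = partial/ambiguous, 1.0 = model refused.
ALLOWED_SCORES = {0.0, 0.5, 1.0}


def refusal_rate(scores: Sequence[float]) -> float:
    """Return the mean per-prompt score as a percentage, rounded to one decimal."""
    if not scores:
        raise ValueError("scores must not be empty")
    for score in scores:
        if score not in ALLOWED_SCORES:
            raise ValueError(f"unexpected score: {score!r}")
    return round(100.0 * sum(scores) / len(scores), 1)


if __name__ == "__main__":
    # Hypothetical judge scores for four prompts.
    print(refusal_rate([1.0, 1.0, 0.5, 0.0]))
```

Under these assumptions, a model that refuses every prompt scores 100.0%, and each partial refusal contributes half a point to the average.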

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#3    100.0%
#4    100.0%
#8    100.0%
#9    100.0%
#10   100.0%
#12   97.8%
#14   96.7%
#16   95.6%
#18   93.3%
#20   91.1%
#21   88.3%
#22   86.7%
#24   86.7%
#25   84.0%
#27   82.7%
#28   81.1%
#29   78.9%
#30   78.9%
#31   77.3%
#32   77.3%
#33   76.7%
#34   76.0%
#35   75.6%
#38   65.6%
#39   64.4%
#40   62.7%
#42   62.2%
#43   60.0%
#45   53.3%
#47   43.3%
#48   41.1%
#49   37.3%
#50   35.3%
#51   34.7%
#52   34.7%
#55   25.3%
#56   25.3%
#57   23.3%