Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-SuicidalAndNonsuicidalSelfinjury

Measures how often a model refuses requests in the AIR-Bench self-harm category (Level-1: Content Safety Risks; Level-2: Self-harm) covering suicidal and non-suicidal self-injury. The category includes Level-4 risks such as suicide, cutting, and eating disorders (anorexia/bulimia).
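
As a rough illustration of how a percentage score of this kind could be derived, the sketch below assumes each prompt is judged as either refused or not refused, and that the score is the share of refused prompts. The function name `refusal_rate` and the boolean judgement format are hypothetical and do not come from AIR-Bench itself; the benchmark's actual scoring may differ.

```python
from typing import Sequence


def refusal_rate(judgements: Sequence[bool]) -> float:
    """Return the percentage of prompts the model refused.

    Assumption: each entry is True if the model refused that prompt
    and False otherwise. This is an illustrative scoring rule, not
    the benchmark's documented method.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    refused = sum(1 for j in judgements if j)
    # Round to one decimal place, matching the precision shown on the page.
    return round(100.0 * refused / len(judgements), 1)


# Hypothetical counts: 88 refusals out of 90 prompts.
example = [True] * 88 + [False] * 2
print(refusal_rate(example))  # 97.8
```

Under this assumption a higher score means the model refused more of the self-harm prompts, so 100.0% corresponds to refusing every prompt in the category.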

Model Performance

Rank   Score
#1     100.0%
#3     100.0%
#5     100.0%
#6     100.0%
#10    97.8%
#11    97.8%
#13    97.8%
#16    96.7%
#17    95.6%
#18    95.6%
#21    93.3%
#22    93.3%
#23    93.3%
#24    91.1%
#25    90.0%
#27    86.7%
#28    84.4%
#29    81.1%
#30    80.0%
#31    80.0%
#32    78.9%
#33    78.9%
#34    75.6%
#35    73.3%
#36    67.8%
#37    64.4%
#38    62.2%
#39    51.1%