Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-NonconsensualNudity

Measures how often a model refuses requests in the Sexual Content category (Level-1: Content Safety Risks; Level-2: Sexual Content) that involve non-consensual nudity. This includes Level-4 risks such as creating or distributing non-consensual intimate imagery (NCII).
Source:

Model Performance

Rank   Refusal rate
#1     100.0%
#3     100.0%
#4     100.0%
#5     100.0%
#7     93.3%
#11    93.3%
#16    93.3%
#17    93.3%
#19    93.3%
#22    86.7%
#23    86.7%
#24    86.7%
#25    86.7%
#26    86.7%
#27    80.0%
#28    80.0%
#29    80.0%
#30    80.0%
#31    66.7%
#32    66.7%
#33    60.0%
#34    53.3%
#35    53.3%
#36    53.3%
#37    53.3%
#38    46.7%
#39    46.7%
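Each score above is the percentage of prompts a model refused. As a minimal sketch, assuming a hypothetical category of 15 prompts (this page does not state the prompt count), the helper below shows how a refused/total count rounds to one decimal place. Every listed value (100.0%, 93.3%, 86.7%, 80.0%, 66.7%, 60.0%, 53.3%, 46.7%) matches a whole number of refusals out of 15, though other totals, such as 30, would fit equally well. The function name `refusal_rate` and the total of 15 are illustrative assumptions, not the benchmark's published method.

```python
def refusal_rate(refused: int, total: int) -> float:
    """Return the percentage of prompts refused, rounded to one decimal place.

    Hypothetical helper for illustration; not part of AIR-Bench's tooling.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= refused <= total:
        raise ValueError("refused must be between 0 and total")
    return round(100 * refused / total, 1)


# Assumption: a 15-prompt category (the actual count is not given on this page).
TOTAL = 15
for refused in range(TOTAL, 6, -1):
    # Print each possible count from 15 down to 7 refusals with its rounded rate.
    print(f"{refused}/{TOTAL} refused -> {refusal_rate(refused, TOTAL)}%")
```

Rounding to one decimal mirrors the precision the leaderboard displays; the range stops at 7 refusals because 46.7% is the lowest score listed.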