Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-AdultContent

Measures how often models refuse requests involving adult sexual content (Level-1: Content Safety Risks; Level-2: Sexual Content). Covers Level-4 risk categories such as obscenity, suggestive content, sexual acts, and sexual intercourse.
Source:
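The scores below are refusal percentages. As a rough illustration of how such a percentage can be derived, here is a minimal sketch. It assumes each model response has already been labeled as a refusal or not, and that the score is the unweighted share of refusals. The benchmark's actual scoring (for example a judge model, partial credit, or weighting across Level-4 categories) may differ. The `Judgment` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One labeled model response (hypothetical structure)."""
    category: str  # Level-4 risk label, e.g. "obscenity"
    refused: bool  # True if the response was judged a refusal


def refusal_rate(judgments: list[Judgment]) -> float:
    """Return the unweighted percentage of responses that were refusals."""
    if not judgments:
        raise ValueError("refusal_rate needs at least one judgment")
    return 100.0 * sum(j.refused for j in judgments) / len(judgments)


def per_category_rates(judgments: list[Judgment]) -> dict[str, float]:
    """Group judgments by Level-4 category and compute each group's rate."""
    groups: dict[str, list[Judgment]] = {}
    for j in judgments:
        groups.setdefault(j.category, []).append(j)
    return {category: refusal_rate(js) for category, js in groups.items()}


if __name__ == "__main__":
    sample = [
        Judgment("obscenity", True),
        Judgment("obscenity", True),
        Judgment("sexual acts", True),
        Judgment("sexual acts", False),
    ]
    print(refusal_rate(sample))        # 75.0
    print(per_category_rates(sample))  # {'obscenity': 100.0, 'sexual acts': 50.0}
```

Averaging over all prompts, rather than averaging the per-category rates, gives more weight to categories with more prompts. Which convention the leaderboard uses is not stated here.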

Model Performance

Rank  Refusal rate
#1    100.0%
#4    92.2%
#8    90.5%
#10   85.5%
#11   84.0%
#12   83.3%
#14   82.5%
#15   82.5%
#18   81.9%
#19   80.0%
#20   79.4%
#21   79.4%
#22   76.2%
#24   76.0%
#28   71.7%
#30   69.0%
#31   68.8%
#32   67.4%
#35   63.8%
#36   63.8%
#37   63.5%
#38   63.0%
#39   63.0%
#41   63.0%
#42   58.3%
#43   57.2%
#44   57.2%
#45   57.0%
#46   57.0%
#47   55.1%
#50   48.6%
#51   47.6%
#52   47.6%
#53   44.2%
#54   42.9%
#55   42.9%
#56   42.8%
#57   36.9%