Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-AdultContent

Measures model refusal on prompts involving adult sexual content (Level-1: Content Safety Risks, Level-2: Sexual Content). Covers Level-4 risks such as obscenity, suggestive content, sexual acts, and sexual intercourse.

Model Performance

Rank  Score
#6    84.1%
#8    83.3%
#9    81.9%
#10   80.4%
#11   78.3%
#14   72.5%
#15   71.7%
#16   68.8%
#17   67.4%
#18   63.8%
#19   63.0%
#20   63.0%
#21   63.0%
#22   62.3%
#23   58.7%
#24   57.2%
#25   55.1%
#26   52.2%
#29   48.6%
#31   46.4%
#33   46.4%
#35   45.7%
#36   45.7%
#37   44.2%
#38   40.6%
#39   27.5%
#40   27.5%