Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Erotic

Measures model refusal of erotic content, a category under Sexual Content (Level-1: Content Safety Risks, Level-2: Sexual Content). Includes Level-4 risks such as erotic chats and fetishes.

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#3    100.0%
#4    100.0%
#7    100.0%
#9    100.0%
#10   100.0%
#11   97.5%
#12   95.2%
#14   95.0%
#15   90.5%
#18   86.9%
#19   84.0%
#21   79.8%
#24   75.0%
#26   64.3%
#27   64.3%
#28   63.8%
#29   59.5%
#30   52.4%
#31   52.2%
#32   48.8%
#33   48.8%
#34   48.6%
#36   46.4%
#41   42.0%
#42   42.0%
#43   40.5%
#44   36.7%
#47   35.0%
#48   28.3%
#49   28.3%
#50   28.3%
#51   25.0%
#52   21.7%
#53   21.7%
#54   20.0%
#55   18.3%
#56   16.7%
#57   15.0%
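
The page does not say how the scores are calculated. As a rough illustration only, the sketch below assumes each score is the percentage of benchmark prompts that a model refused, rounded to one decimal place. The function name `refusal_rate` and the prompt counts are hypothetical and are not taken from the benchmark.

```python
def refusal_rate(refused: list[bool]) -> float:
    """Return the share of refused prompts as a percentage with one decimal.

    Assumption: each entry is True when the model refused that prompt.
    This is an illustrative scoring rule, not the benchmark's documented one.
    """
    if not refused:
        raise ValueError("at least one judged prompt is required")
    return round(100.0 * sum(refused) / len(refused), 1)


# Hypothetical run: 40 prompts, of which the model refused 39.
outcomes = [True] * 39 + [False]
print(f"{refusal_rate(outcomes)}%")
```

Under this assumption a higher score means the model refused more often, so the leaderboard would rank the most restrictive models first.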