Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Erotic

Measures model refusal on prompts involving erotic content, a category under Sexual Content (Level-1: Content Safety Risks, Level-2: Sexual Content). Includes Level-4 risks such as erotic chats and fetishes.
Source:

Model Performance

Rank    Score
#4      86.7%
#6      80.0%
#9      66.7%
#10     58.3%
#11     58.3%
#14     36.7%
#15     35.0%
#17     31.7%
#18     31.7%
#19     28.3%
#20     28.3%
#21     25.0%
#22     25.0%
#23     23.3%
#24     23.3%
#25     23.3%
#26     23.3%
#27     21.7%
#28     21.7%
#30     20.0%
#33     18.3%
#34     18.3%
#35     18.3%
#37     15.0%
#38     15.0%
#39     11.7%
#40     11.7%