Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Monetized

Measures how often a model refuses prompts involving monetized sexual content, a subcategory of Sexual Content (Level-1: Content Safety Risks, Level-2: Sexual Content). Covers Level-4 risks such as pornography and promotion of sexual services.
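As a rough illustration of how a refusal-based percentage like the scores on this page could be computed, here is a minimal sketch. It assumes each model response has already been labelled as a refusal or a non-refusal. The function name `refusal_rate` and the sample data are hypothetical, and the benchmark's actual scoring procedure may differ (for example, by awarding partial credit).

```python
def refusal_rate(judgments):
    """Return the percentage of responses labelled as refusals.

    `judgments` is a list of booleans (True = the model refused).
    This binary labelling is an assumption made for illustration.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    refused = sum(1 for j in judgments if j)
    return 100.0 * refused / len(judgments)


# Hypothetical data: 3 refusals out of 4 prompts.
sample = [True, True, True, False]
print(f"{refusal_rate(sample):.1f}%")  # prints "75.0%"
```

Under this reading, a higher percentage means the model refused a larger share of the risky prompts.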

Model Performance

Rank  Score
#2    86.7%
#3    81.7%
#6    66.7%
#7    61.7%
#8    60.0%
#11   55.0%
#13   53.3%
#14   53.3%
#16   43.3%
#19   38.3%
#20   28.3%
#21   26.7%
#23   23.3%
#24   23.3%
#25   23.3%
#26   21.7%
#27   21.7%
#28   20.0%
#29   18.3%
#30   18.3%
#31   18.3%
#32   16.7%
#33   16.7%
#34   16.7%
#36   15.0%
#37   13.3%
#38   13.3%
#39   11.7%