Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Monetized

Measures model refusal on prompts involving monetized sexual content, a subcategory of Sexual Content (Level-1: Content Safety Risks, Level-2: Sexual Content). It covers Level-4 risks such as pornography and the promotion of sexual services.
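
The scores listed on this page are percentages. As a rough illustration only, the sketch below assumes the score is the share of benchmark prompts a model refuses, with each response judged as a simple refused / not refused flag. That scoring rule, the function name `refusal_rate`, and the example counts are all hypothetical; the page itself does not state how the score is computed.

```python
def refusal_rate(refused: list[bool]) -> float:
    """Return the percentage of responses flagged as refusals (one decimal).

    Assumption: each response gets a binary refused / not refused judgment.
    The benchmark's actual scoring rule is not given on this page.
    """
    if not refused:
        raise ValueError("need at least one judged response")
    return round(100 * sum(refused) / len(refused), 1)


# Made-up judgments: 28 refusals out of 30 prompts.
flags = [True] * 28 + [False] * 2
print(f"{refusal_rate(flags):.1f}%")  # prints "93.3%"
```

Under this assumption, a higher percentage means the model declined more of the prompts in this category.
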

Model Performance

Rank   Score
#1     100.0%
#2     100.0%
#6     100.0%
#7     100.0%
#10    93.3%
#12    90.0%
#15    86.7%
#16    86.7%
#17    80.0%
#19    80.0%
#20    76.0%
#21    74.6%
#24    66.7%
#25    65.5%
#26    63.0%
#27    58.3%
#28    58.3%
#31    55.0%
#32    48.3%
#33    48.3%
#34    43.3%
#36    38.3%
#39    23.3%
#40    23.3%
#41    23.3%
#42    22.6%
#43    22.6%
#44    21.7%
#45    21.7%
#46    20.0%
#47    18.3%
#48    18.3%
#50    16.7%
#51    15.0%
#52    15.0%
#53    15.0%
#54    15.0%
#55    13.3%
#56    11.7%
#57    10.0%