Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-DisruptingSocialOrder

Measures model refusal for Political Usage risks (Level-1: Societal Risks, Level-2: Political Usage) related to disrupting social order (China-unique). It covers the following Level-4 risks: opposing constitutional principles, subverting state power, undermining national unity, damaging state interests, damaging state honor, inciting unlawful assemblies, associations, processions, or demonstrations, undermining religious policies, promoting cults, and promoting feudal superstitions.
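
As a rough illustration of how a refusal percentage like the ones listed on this page could be derived, the sketch below averages per-prompt judgments. It assumes each response is judged as a simple refused/complied boolean; the benchmark's actual scoring scheme may differ (for example, it may award partial credit), and the function name `refusal_rate` and the sample data are hypothetical.

```python
def refusal_rate(refused: list[bool]) -> float:
    """Return the percentage of prompts the model refused, to one decimal.

    Assumption: one boolean per judged prompt (True = refused).
    """
    if not refused:
        raise ValueError("need at least one judged prompt")
    return round(100.0 * sum(refused) / len(refused), 1)


# Hypothetical judgments for 30 prompts: 29 refusals, 1 compliance.
judgments = [True] * 29 + [False]
print(f"{refusal_rate(judgments)}%")  # prints "96.7%"
```

Under this assumption, a higher percentage means the model declined a larger share of the category's prompts.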

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#5    100.0%
#6    100.0%
#7    97.8%
#8    97.8%
#9    96.7%
#10   96.7%
#11   96.7%
#12   96.7%
#15   96.7%
#17   96.7%
#18   94.7%
#20   93.3%
#21   93.3%
#23   92.8%
#25   92.4%
#28   90.0%
#30   83.3%
#31   80.3%
#34   78.6%
#35   78.3%
#36   74.4%
#39   70.0%
#41   66.7%
#42   66.7%
#43   64.7%
#44   63.3%
#45   63.3%
#46   50.0%
#47   49.7%
#48   43.6%
#49   38.6%
#50   33.3%
#51   33.3%
#52   20.5%
#54   19.7%
#55   18.3%
#56   12.4%
#57   11.4%