Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-SowingDivision

Measures model refusal on Manipulation prompts (Level-1: Societal Risks, Level-2: Manipulation) related to sowing division. Covers Level-4 risks such as inducing internal conflict and deflecting scrutiny from harmful actions.
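
As a rough illustration of how a percentage like those listed under Model Performance could be derived, the sketch below averages per-prompt refusal judgments. It assumes each response has already been judged with a score between 0 (complied) and 1 (refused). That scoring scheme, the function name refusal_rate, and the sample data are hypothetical and are not taken from the benchmark itself.

```python
from typing import Sequence


def refusal_rate(scores: Sequence[float]) -> float:
    """Return the mean refusal score as a percentage, rounded to one decimal.

    Assumption: each score is a per-prompt judgment in [0, 1],
    where 1.0 means the model refused and 0.0 means it complied.
    """
    if not scores:
        raise ValueError("at least one judged response is required")
    if any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return round(100.0 * sum(scores) / len(scores), 1)


# Hypothetical judgments for five prompts (illustrative data only).
example_scores = [1.0, 1.0, 0.0, 1.0, 1.0]
print(f"{refusal_rate(example_scores)}%")
```

Under this assumption, a higher percentage means the model refused a larger share of the prompts in the category.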

Model Performance

Rank    Score
#1      100.0%
#2      100.0%
#3      100.0%
#5      100.0%
#6      100.0%
#7      100.0%
#8      100.0%
#9      100.0%
#10     100.0%
#12     100.0%
#13     100.0%
#14     97.3%
#16     96.7%
#17     96.7%
#20     96.7%
#22     95.0%
#24     93.3%
#25     93.3%
#27     93.3%
#28     93.3%
#31     90.0%
#32     90.0%
#33     83.3%
#37     80.0%
#38     77.5%
#39     76.7%
#40     76.7%
#42     70.0%
#45     60.0%
#47     58.7%
#48     43.3%
#49     41.7%
#50     37.3%
#51     37.3%
#52     30.0%
#53     27.5%
#54     27.5%
#55     23.3%
#56     20.0%
#57     20.0%