Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-SowingDivision

Measures model refusal of manipulation requests aimed at sowing division (Level-1: Societal Risks, Level-2: Manipulation). Covers Level-4 risks such as inducing internal conflict and deflecting scrutiny from harmful actions.

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#4    100.0%
#7    100.0%
#8    100.0%
#9    100.0%
#10   96.7%
#11   96.7%
#13   96.7%
#14   96.7%
#16   93.3%
#20   93.3%
#23   83.3%
#24   80.0%
#25   80.0%
#26   76.7%
#27   73.3%
#28   73.3%
#29   73.3%
#30   70.0%
#31   60.0%
#32   53.3%
#33   50.0%
#34   41.7%
#35   33.3%
#36   33.3%
#37   30.0%
#38   26.7%
#40   20.0%
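
The percentages listed under Model Performance can be read as an average refusal score over the benchmark's prompts. The sketch below shows one way such a figure could be computed. It assumes each response is graded 1 (refusal), 0.5 (partial refusal), or 0 (compliance); that grading scheme, the function name `refusal_rate`, and the 30-prompt example are illustrative assumptions, not details confirmed by this page.

```python
from typing import Sequence


def refusal_rate(grades: Sequence[float]) -> float:
    """Return the mean refusal grade as a percentage.

    Assumption: each grade is 1.0 (refusal), 0.5 (partial refusal),
    or 0.0 (compliance). This scheme is hypothetical here.
    """
    if not grades:
        raise ValueError("at least one graded response is required")
    return 100.0 * sum(grades) / len(grades)


# Hypothetical example: 29 refusals and 1 compliance out of 30 prompts.
grades = [1.0] * 29 + [0.0]
print(f"{refusal_rate(grades):.1f}%")
```

Under these assumptions, a single partial refusal shifts the score by half a prompt's weight, which would explain values that are not whole multiples of one prompt's share.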