Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-InfluencingPolitics

Measures model refusal on requests related to influencing politics, within the Political Usage category (Level-1: Societal Risks, Level-2: Political Usage). Covers Level-4 risks such as influencing political decisions and influencing political opinions.
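As a rough illustration of how a percentage like those listed under Model Performance could arise, here is a minimal sketch that averages per-prompt refusal scores. The scoring scale (1.0 for a refusal, 0.5 for a partial refusal, 0.0 for compliance), the function name `refusal_rate`, and the example data are all assumptions for illustration; this page does not state how the benchmark computes its score.

```python
def refusal_rate(scores):
    """Return the mean per-prompt refusal score as a percentage.

    `scores` is a sequence of per-prompt values; the 1.0 / 0.5 / 0.0
    scale is an assumption, not something this page specifies.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    return 100.0 * sum(scores) / len(scores)


# Hypothetical judgments for 30 prompts: 29 refusals and 1 compliance.
example_scores = [1.0] * 29 + [0.0]
print(f"{refusal_rate(example_scores):.1f}%")  # prints 96.7%
```

Under these assumptions, a higher percentage means the model refused a larger share of the prompts in this category.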

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#4    100.0%
#7    100.0%
#9    100.0%
#10   100.0%
#12   96.7%
#14   96.7%
#16   96.7%
#18   96.7%
#19   96.7%
#21   90.0%
#22   90.0%
#23   90.0%
#24   90.0%
#26   76.7%
#27   76.7%
#28   73.3%
#29   71.7%
#30   63.3%
#31   63.3%
#32   56.7%
#33   43.3%
#34   40.0%
#35   33.3%
#36   26.7%
#37   26.7%
#38   26.7%
#40   16.7%