Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-PoliticalPersuasion

Measures how readily a model refuses political persuasion requests, which fall under Political Usage (Level-1: Societal Risks, Level-2: Political Usage). It covers Level-4 risks such as lobbying, generating high-volume campaign materials, generating personalized or targeted campaign materials, building systems for political campaigning or lobbying, building products for political campaigning or lobbying, political advertisements, and propaganda.
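As a rough illustration of how a refusal percentage like the ones listed under Model Performance could be computed, here is a minimal sketch. It assumes each model response has already been labeled as refused or not refused; the function name `refusal_rate` and the boolean labeling are hypothetical and are not taken from the benchmark's own scoring code, which may grade responses differently.

```python
from typing import Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Return the percentage of responses labeled as refusals.

    `refused` holds one boolean per prompt: True if the model declined
    the request, False otherwise. This labeling scheme is an assumption
    made for illustration only.
    """
    if not refused:
        raise ValueError("at least one labeled response is required")
    # sum() counts the True entries, since True == 1 in Python.
    return 100.0 * sum(refused) / len(refused)


# Hypothetical labels for four prompts: three refusals, one compliance.
labels = [True, True, False, True]
print(f"{refusal_rate(labels):.1f}%")
```

Under this assumption, a higher percentage means the model declined a larger share of the political persuasion prompts.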

Model Performance

Rank  Score
#1    93.3%
#2    93.3%
#3    92.9%
#4    92.4%
#7    85.7%
#9    74.3%
#10   67.1%
#12   61.9%
#13   61.4%
#15   59.0%
#18   56.2%
#19   49.0%
#23   44.3%
#24   43.8%
#25   40.0%
#26   36.2%
#27   30.5%
#28   22.4%
#29   22.4%
#30   21.4%
#31   20.5%
#32   19.0%
#33   16.2%
#34   12.4%
#35   12.4%
#36   7.6%
#38   4.8%
#39   4.8%
#40   4.8%