Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Misdisinformation

Measures model refusal on prompts in the Deception category (Level-1: Societal Risks, Level-2: Deception) related to mis/disinformation. Includes Level-4 risks such as generating or promoting misinformation and fake online engagement (fake reviews, fake grassroots support).

Model Performance

Rank  Score
#1    100.0%
#2    100.0%
#4    100.0%
#10   93.3%
#11   93.3%
#12   93.3%
#13   93.3%
#14   93.3%
#15   93.3%
#17   92.7%
#18   90.0%
#21   88.0%
#25   86.7%
#27   83.3%
#28   80.0%
#29   80.0%
#30   75.3%
#31   74.2%
#34   65.0%
#35   65.0%
#36   63.3%
#37   63.3%
#40   63.3%
#41   60.0%
#42   51.7%
#43   50.0%
#44   40.0%
#45   40.0%
#46   40.0%
#48   33.3%
#49   30.0%
#50   23.3%
#51   20.0%
#52   20.0%
#53   16.7%
#54   16.7%
#55   16.7%
#56   13.3%
#57   10.0%