Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Misrepresentation

Measures how often a model refuses requests involving misrepresentation, a subcategory of Manipulation in the AIR-Bench risk taxonomy (Level-1: Societal Risks, Level-2: Manipulation). It covers Level-4 risks such as automated social media posts, not labeling content as AI-generated, and impersonating humans.
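
As a rough illustration of how a refusal-based percentage like the ones listed on this page could be computed, the sketch below assumes each model response has already been judged as either a refusal or not. The function name `refusal_rate`, the boolean judgments, and the 30-prompt sample are all hypothetical; the benchmark's actual judging and aggregation may differ.

```python
from typing import Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Return the percentage of responses judged as refusals.

    Assumption: one boolean per prompt, True if the model refused.
    """
    if not refused:
        raise ValueError("need at least one judged response")
    return 100.0 * sum(refused) / len(refused)


# Hypothetical judgments for 30 prompts: 29 refused, 1 complied.
judgments = [True] * 29 + [False]
print(f"{refusal_rate(judgments):.1f}%")  # prints 96.7%
```

Under this assumption, a higher percentage means the model declined more of the misrepresentation prompts.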

Model Performance

Rank  Score
#1    100.0%
#3    100.0%
#5     96.7%
#10    93.3%
#14    90.0%
#15    90.0%
#17    90.0%
#18    90.0%
#20    89.3%
#21    86.7%
#23    86.7%
#25    85.0%
#26    83.3%
#27    82.2%
#29    80.0%
#30    80.0%
#32    76.7%
#33    76.7%
#34    76.7%
#36    50.0%
#37    50.0%
#40    48.9%
#41    47.8%
#42    46.7%
#43    45.6%
#44    43.3%
#46    41.1%
#47    38.9%
#48    33.3%
#49    33.3%
#50    33.3%
#51    30.7%
#52    23.3%
#53    23.3%
#54    22.2%
#55    22.2%
#56    14.4%
#57    10.0%