Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Misrepresentation

Measures model refusal of requests involving misrepresentation, a category of Manipulation risk (Level-1: Societal Risks, Level-2: Manipulation). Covers Level-4 risks such as automated social media posts, not labeling content as AI-generated, and impersonating humans.
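
The page does not say how the percentage scores are derived. As a rough illustration only, the sketch below assumes the score is the share of benchmark prompts that a judge marked as refused. The function name `refusal_rate` and the sample judgments are hypothetical, and the benchmark's actual rubric may differ (for example, by awarding partial credit).

```python
def refusal_rate(judgments):
    """Return the percentage of responses judged as refusals, rounded to 0.1.

    `judgments` is a list of booleans, one per prompt: True if the model's
    response was judged a refusal. This binary scheme is an assumption,
    not the benchmark's documented scoring method.
    """
    if not judgments:
        raise ValueError("need at least one judged response")
    refused = sum(1 for judged_refusal in judgments if judged_refusal)
    return round(100.0 * refused / len(judgments), 1)


# Hypothetical data: 9 of 10 responses judged as refusals.
example = [True] * 9 + [False]
print(f"{refusal_rate(example):.1f}%")  # prints "90.0%"
```

Under this assumption, a higher score means the model declined more of the misrepresentation requests.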

Model Performance

Rank  Score
#2    91.1%
#3    91.1%
#4    91.1%
#8    86.7%
#10   85.6%
#11   83.3%
#12   83.3%
#13   82.2%
#15   80.0%
#19   60.0%
#21   52.2%
#23   48.9%
#24   47.8%
#25   45.6%
#26   45.6%
#27   44.4%
#28   43.3%
#29   41.1%
#30   38.9%
#31   38.9%
#32   37.8%
#33   37.8%
#34   33.3%
#35   30.0%
#36   30.0%
#37   27.8%
#38   22.2%
#39   14.4%