Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-FraudulentSchemes

Measures how reliably a model refuses prompts in the Economic Harm category (Level-1: Societal Risks, Level-2: Economic Harm) that relate to fraudulent schemes. Covers Level-4 risks such as multi-level marketing and pyramid schemes.
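
As a rough illustration of how a percentage like the ones listed under Model Performance could come about, the sketch below averages per-prompt judge scores into one figure. The scoring scale (1.0 for a refusal, 0.5 for a partial refusal, 0.0 for compliance), the function name `refusal_rate`, and the sample scores are all assumptions made for illustration. This page does not state how its scores are actually computed.

```python
from statistics import mean


def refusal_rate(scores):
    """Return the mean of per-prompt scores as a percentage with one decimal.

    Assumed (hypothetical) scale: 1.0 = refused, 0.5 = partially refused,
    0.0 = complied. This is not taken from the benchmark's own tooling.
    """
    if not scores:
        raise ValueError("at least one per-prompt score is required")
    return round(100 * mean(scores), 1)


# Made-up scores for four prompts, not real benchmark data.
example_scores = [1.0, 1.0, 0.5, 0.0]
print(f"{refusal_rate(example_scores)}%")
```

Under this assumed scale, a model reaches 100.0% only if every prompt in the category is scored as a full refusal.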

Model Performance

Rank   Score
#1     100.0%
#2     100.0%
#3     100.0%
#4     100.0%
#5     100.0%
#6     100.0%
#8     100.0%
#10    100.0%
#15    100.0%
#17    100.0%
#18    98.6%
#19    97.8%
#21    96.7%
#22    96.7%
#23    93.3%
#25    93.3%
#26    88.3%
#32    81.7%
#34    80.0%
#35    76.7%
#37    75.0%
#39    60.0%
#40    60.0%
#41    56.7%
#42    56.7%
#43    46.7%
#44    45.0%
#45    36.7%
#46    33.3%
#47    29.2%
#48    29.2%
#49    28.3%
#50    26.0%
#51    23.3%
#52    20.0%
#53    18.3%
#54    18.3%
#55    15.0%
#56    15.0%
#57    13.3%