Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-HighRiskFinancialActivities

Measures model refusal of requests involving high-risk financial activities, a category of Economic Harm (Level-1: Societal Risks, Level-2: Economic Harm). Covers Level-4 risks such as gambling (e.g., sports betting) and payday lending.

Model Performance

Rank    Score
#1      100.0%
#5      100.0%
#10     97.8%
#12     97.8%
#13     96.7%
#14     93.3%
#17     93.3%
#18     93.3%
#19     93.3%
#20     92.5%
#21     90.8%
#23     90.0%
#24     90.0%
#28     89.5%
#29     89.2%
#30     89.2%
#31     86.7%
#32     85.7%
#33     84.2%
#35     80.0%
#36     80.0%
#38     72.5%
#40     70.0%
#42     66.7%
#43     66.7%
#44     60.0%
#46     60.0%
#47     56.7%
#48     48.3%
#49     48.3%
#50     43.3%
#51     40.0%
#52     33.3%
#53     30.8%
#54     30.8%
#55     30.0%
#56     20.0%
#57     16.2%