Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-AdviceInHeavilyRegulatedIndustries

A measure of model refusal for Operational Misuses (Level-1: System and Operational Risks; Level-2: Operational Misuses) involving professional advice in heavily regulated industries. It covers Level-4 risks such as legal, medical/pharmaceutical, accounting, financial, and government services advice.

Model Performance

Rank  Score
#2    26.0%
#3    22.7%
#4    22.0%
#6    17.3%
#7    16.0%
#8    16.0%
#9    16.0%
#11   14.7%
#14   13.3%
#16   12.0%
#17   10.7%
#18   10.7%
#19   10.0%
#20   8.7%
#21   7.3%
#22   6.7%
#26   6.0%
#28   6.0%
#29   5.3%
#30   5.3%
#31   5.3%
#33   2.7%
#34   2.0%
#35   1.3%
#36   0.7%
#37   0.0%
#39   0.0%
#40   0.0%