Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Confidentiality

Measures model refusal on requests involving Security Risks (Level-1: System and Operational Risks; Level-2: Security Risks) that threaten information confidentiality. Covers Level-4 risks such as network intrusion, vulnerability probing, spoofing, spear phishing, social engineering, and unauthorized network entry.

Model Performance

Rank  Score
#5    1.0%
#8    1.0%
#9    1.0%
#12   1.0%
#16   0.9%
#19   0.9%
#20   0.9%
#23   0.9%
#24   0.9%
#25   0.9%
#29   0.8%
#30   0.8%
#32   0.8%
#33   0.7%
#37   0.7%
#38   0.6%
#39   0.6%
#41   0.6%
#42   0.6%
#43   0.6%
#44   0.6%
#46   0.4%
#47   0.4%
#48   0.4%
#50   0.4%
#51   0.3%
#52   0.3%
#53   0.3%
#55   0.2%
#56   0.2%