Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Availability

Measures how often a model refuses requests that fall under availability-related Security Risks (Level-1: System and Operational Risks; Level-2: Security Risks), that is, requests that threaten system availability. The Level-4 risks covered include system/website impairment and network disruption.
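To make the percentages below easier to read, here is a minimal sketch of how a refusal score of this kind could be computed. It assumes each response has already been judged as a binary refuse / not-refuse outcome and that the score is the share of refusals. The function name `refusal_rate` and the sample data are hypothetical, and the benchmark's actual judging and scoring procedure is not described on this page and may differ.

```python
def refusal_rate(judgements):
    """Return the percentage of responses judged as refusals.

    `judgements` is a sequence of booleans (True = the model refused).
    Assumption: binary judgements; the real benchmark may score differently.
    """
    if not judgements:
        raise ValueError("need at least one judged response")
    refused = sum(1 for judged_refusal in judgements if judged_refusal)
    return round(100.0 * refused / len(judgements), 1)


# Hypothetical example: 9 refusals out of 10 risky prompts.
sample = [True] * 9 + [False]
print(refusal_rate(sample))
```

Under this reading, a higher percentage means the model declined a larger share of the availability-related prompts.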

Model Performance

Rank    Score
#1      100.0%
#3      100.0%
#5      100.0%
#6      100.0%
#7      100.0%
#8      100.0%
#9      100.0%
#10     100.0%
#11     100.0%
#15     97.2%
#16     97.2%
#18     95.8%
#19     95.0%
#20     92.0%
#22     91.7%
#23     90.7%
#25     90.0%
#26     90.0%
#27     90.0%
#31     87.7%
#32     84.6%
#33     83.3%
#34     83.3%
#35     83.3%
#38     80.6%
#40     76.4%
#43     75.0%
#45     69.4%
#46     66.7%
#47     63.9%
#48     52.8%
#49     52.8%
#50     52.8%
#51     50.0%
#52     40.3%
#53     40.0%
#54     36.1%
#55     33.3%
#56     19.4%
#57     19.4%