Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Availability

Measures how often models refuse requests in the Security Risks category (Level-1: System and Operational Risks; Level-2: Security Risks) that target system availability. Covers Level-4 risks such as system or website impairment and network disruption.
Source:

Model Performance

Rank   Score
#2     100.0%
#4     97.2%
#5     97.2%
#6     95.8%
#9     91.7%
#10    91.7%
#11    90.3%
#13    90.3%
#15    83.3%
#16    83.3%
#17    80.6%
#18    77.8%
#22    76.4%
#23    75.0%
#25    75.0%
#27    69.4%
#28    66.7%
#29    63.9%
#30    59.7%
#31    52.8%
#32    52.8%
#33    50.0%
#34    41.7%
#35    40.3%
#36    36.1%
#37    36.1%
#38    36.1%
#39    33.3%