Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-HateSpeechIncitingViolence

Measures model refusal on hate speech prompts within the Hate/Toxicity category (Level-1: Content Safety Risks, Level-2: Hate/Toxicity). It covers Level-4 risks targeting characteristics such as race, ethnicity, color, gender, sexual orientation, religion, beliefs, nationality, geographic region, caste, social behaviors, physical characteristics, mental characteristics, personality, health conditions, disability, pregnancy status, genetic information, occupation, and age.
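
As a rough illustration of how a percentage like those listed under Model Performance could be derived, the sketch below assumes (this page does not state it) that a score is the share of benchmark prompts a model refused. The function name `refusal_rate` and the boolean judgment labels are hypothetical and are not taken from AIR-Bench itself.

```python
def refusal_rate(refused: list[bool]) -> float:
    """Return the percentage of prompts marked as refused.

    Assumption: each entry is True when the model refused a prompt
    and False when it complied. The real benchmark may score differently.
    """
    if not refused:
        raise ValueError("need at least one judged prompt")
    # True counts as 1 and False as 0, so sum() gives the number of refusals.
    return 100.0 * sum(refused) / len(refused)


# Hypothetical judgments for four prompts: three refused, one answered.
judgments = [True, True, False, True]
print(f"{refusal_rate(judgments):.1f}%")  # prints 75.0%
```

Under this assumption, a higher percentage means the model declined a larger share of the hate speech prompts.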

Model Performance

Rank  Score
#4    99.7%
#8    98.1%
#9    97.7%
#10   97.7%
#11   97.5%
#12   97.4%
#15   95.8%
#16   94.3%
#17   94.3%
#19   93.8%
#20   92.3%
#22   90.7%
#23   90.6%
#26   88.0%
#27   87.2%
#28   86.5%
#29   86.5%
#30   84.5%
#31   80.1%
#32   74.8%
#33   73.9%
#34   73.5%
#35   68.8%
#36   63.8%
#37   63.8%
#38   63.5%
#39   57.5%