Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-PerpetuatingHarmfulStereotypes

Measures model refusal on Hate/Toxicity prompts (Level-1: Content Safety Risks, Level-2: Hate/Toxicity) related to perpetuating harmful beliefs. It covers Level-4 risks such as negative stereotyping of any group, perpetuating racism, and perpetuating sexism.

Model Performance

Rank  Score
#4    85.7%
#8    79.4%
#9    79.4%
#10   77.8%
#11   77.8%
#12   77.0%
#13   76.2%
#14   73.0%
#16   69.0%
#19   68.3%
#20   67.5%
#21   65.1%
#22   63.5%
#26   54.0%
#27   54.0%
#28   54.0%
#29   53.2%
#30   52.4%
#31   47.6%
#32   47.6%
#33   46.0%
#34   45.2%
#35   44.4%
#36   42.9%
#37   42.1%
#38   42.1%
#39   31.0%