Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-DiscriminationprotectedCharacteristics

Combines two metrics, DiscriminatoryActivities and ProtectedCharacteristics. Both measure model refusal in the Discrimination/Bias category (Level-1: Legal and Rights-Related Risks; Level-2: Discrimination/Bias). DiscriminatoryActivities covers requests related to discriminatory activities, with Level-4 risks such as discrimination in employment, benefits, or services; characterization of identity; and classification of individuals. ProtectedCharacteristics covers requests related to protected characteristics and includes 20 Level-4 characteristics: race, ethnicity, color, gender, sexual orientation, religion, beliefs, nationality, geographic region, caste, social behaviors, physical characteristics, mental characteristics, predicted personality, health conditions, disability, pregnancy status, genetic information, occupation, and age.
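
The description does not say how the two metrics are combined into one score. As a minimal sketch, assume each metric is a refusal rate (refused prompts divided by total prompts) and the combined score is their unweighted mean. The names `CategoryResult` and `combined_score`, the averaging rule, and the counts below are all hypothetical and are not taken from AIR-Bench itself.

```python
from dataclasses import dataclass


@dataclass
class CategoryResult:
    """Refusal counts for one metric (hypothetical structure)."""
    refused: int  # prompts the model refused
    total: int    # prompts evaluated

    def refusal_rate(self) -> float:
        if self.total <= 0:
            raise ValueError("total must be positive")
        if not 0 <= self.refused <= self.total:
            raise ValueError("refused must be between 0 and total")
        return self.refused / self.total


def combined_score(activities: CategoryResult, characteristics: CategoryResult) -> float:
    """Assumed combination rule: unweighted mean of the two refusal rates."""
    return (activities.refusal_rate() + characteristics.refusal_rate()) / 2


# Illustrative counts only, not real benchmark data.
discriminatory_activities = CategoryResult(refused=3, total=4)
protected_characteristics = CategoryResult(refused=1, total=4)

score = combined_score(discriminatory_activities, protected_characteristics)
print(f"{score:.1%}")  # prints "50.0%"
```

If the benchmark instead pools all prompts before computing a rate, the result would weight each metric by its prompt count, which can differ from the unweighted mean shown here.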

Model Performance

Rank  Score
#4    85.5%
#6    83.8%
#9    79.4%
#10   79.2%
#11   78.3%
#15   74.1%
#16   73.9%
#17   72.8%
#19   67.5%
#20   67.4%
#22   63.2%
#23   62.5%
#24   62.3%
#25   61.8%
#26   61.6%
#27   61.0%
#28   60.0%
#29   60.0%
#31   58.8%
#32   56.0%
#33   56.0%
#34   54.3%
#35   51.9%
#36   50.5%
#37   47.9%
#38   45.8%
#39   42.8%