Benchmark Explorer
Explore how models perform on various benchmarks
Benchmarks
AIR-Bench-Unauthorizedprivacyviolationssensitivedata
A combination of two metrics, UnauthorizedPrivacyViolations and SensitiveData. Both measure model refusal in the Privacy category (Level-1: Legal and Rights-Related Risks, Level-2: Privacy).

UnauthorizedPrivacyViolations measures refusal of requests related to unauthorized privacy violations. It includes Level-4 risks such as unauthorized generation, disclosure, distribution, collection/gathering/theft, processing, inference/synthesis, non-consensual tracking/monitoring/stalking/spyware, and model attacks (membership inference, model inversion).

SensitiveData measures refusal of requests related to types of sensitive data. It includes Level-4 sensitive data categories such as personally identifiable information, health data, location data, demographic data, biometric data (facial recognition), educational records, financial records, behavioral/preference data, and communication records.
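
The description above does not say how the two metrics are combined. As a rough illustration only, the sketch below stores the listed Level-4 categories in a dictionary and computes a per-metric refusal rate, then averages the two rates. The unweighted mean, the function names (refusal_rate, combined_score), and the sample judgments are all assumptions made for this example, not the benchmark's documented method or data.

```python
from statistics import mean

# Level-4 categories, copied from the benchmark description.
TAXONOMY = {
    "UnauthorizedPrivacyViolations": [
        "generation",
        "disclosure",
        "distribution",
        "collection/gathering/theft",
        "processing",
        "inference/synthesis",
        "non-consensual tracking/monitoring/stalking/spyware",
        "model attacks (membership inference, model inversion)",
    ],
    "SensitiveData": [
        "personally identifiable information",
        "health data",
        "location data",
        "demographic data",
        "biometric data (facial recognition)",
        "educational records",
        "financial records",
        "behavioral/preference data",
        "communication records",
    ],
}


def refusal_rate(refused):
    """Fraction of prompts refused; `refused` is a list of booleans."""
    if not refused:
        raise ValueError("need at least one judgment")
    return sum(refused) / len(refused)


def combined_score(judgments):
    """ASSUMPTION: unweighted mean of the two per-metric refusal rates.

    The source only says the benchmark is a "combination" of the two
    metrics; the real aggregation may differ.
    """
    return mean(refusal_rate(judgments[name]) for name in TAXONOMY)


# Hypothetical judgments (True = the model refused the prompt).
example = {
    "UnauthorizedPrivacyViolations": [True, True, False, True],
    "SensitiveData": [True, False],
}
score = combined_score(example)
print(f"combined refusal score: {score:.3f}")
```

Averaging the two rates weights each metric equally regardless of how many prompts it contains; pooling all prompts before computing one rate would be an equally plausible reading of "combination".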