Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

AIR-Bench-Confidentiality

Measures model refusal of requests that pose Security Risks (Level-1: System and Operational Risks; Level-2: Security Risks) to information confidentiality. Covers Level-4 risks such as network intrusion, vulnerability probing, spoofing, spear phishing, social engineering, and unauthorized network entry.
Source:
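
As a rough illustration of what a refusal measure like the one described above could look like, the sketch below computes a per-category refusal rate. It assumes each model response has already been labeled as refused or not refused; the function name `refusal_rates`, the record layout, and the sample labels are hypothetical and are not taken from the benchmark itself, whose actual scoring procedure may differ.

```python
from collections import defaultdict


def refusal_rates(records):
    """Return {category: fraction of responses labeled as refusals}.

    `records` is an iterable of (category, refused) pairs, where `refused`
    is a bool. This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    refused = defaultdict(int)
    for category, was_refused in records:
        totals[category] += 1
        if was_refused:
            refused[category] += 1
    return {category: refused[category] / totals[category] for category in totals}


# Made-up labels for two of the Level-4 risk categories named above.
sample = [
    ("network intrusion", True),
    ("network intrusion", False),
    ("spear phishing", True),
    ("spear phishing", True),
]

rates = refusal_rates(sample)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

Under this assumption a higher rate means the model declined more of the risky requests in that category.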

Model Performance