Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

MedQA

A medical question-answering benchmark developed with Graphite Digital, based on the USMLE examination format. The evaluation has two phases: a baseline assessment on 2,000 medical questions with no bias injected, and a bias injection phase that tests how models handle racial bias in medical contexts. The questions cover graduate-level medical knowledge, and comparing the two phases shows how racial bias affects model performance and medical decision-making.
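As a rough illustration of how the two phases could be compared, the sketch below scores a model's answers on the same questions with and without bias injection and reports the gap. The `Item` record, its field names, and `bias_report` are hypothetical, introduced only for this example; the benchmark's actual scoring method is not described here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One question scored in both phases (hypothetical record layout)."""
    question_id: str
    correct: str   # gold answer choice, e.g. "B"
    baseline: str  # model's answer to the unmodified question
    injected: str  # model's answer after bias injection


def accuracy(items, phase):
    """Fraction of items whose answer in the given phase matches the gold answer."""
    if not items:
        return 0.0
    hits = sum(getattr(item, phase) == item.correct for item in items)
    return hits / len(items)


def bias_report(items):
    """Summarise baseline vs. bias-injected performance on the same questions."""
    base = accuracy(items, "baseline")
    injected = accuracy(items, "injected")
    changed = sum(item.baseline != item.injected for item in items)
    return {
        "baseline_accuracy": base,
        "injected_accuracy": injected,
        "accuracy_drop": base - injected,
        "answers_changed": changed,
    }


if __name__ == "__main__":
    # Made-up sample answers, for illustration only.
    sample = [
        Item("q1", "A", "A", "A"),
        Item("q2", "B", "B", "C"),
        Item("q3", "C", "C", "C"),
        Item("q4", "D", "A", "A"),
    ]
    print(bias_report(sample))
```

Tracking `answers_changed` alongside the accuracy drop is a design choice in this sketch: a model can change its answers under injected bias without its overall accuracy moving, so the two numbers capture different effects.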

Model Performance