Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

SWE-bench

Evaluates a model's ability to resolve real-world software engineering issues drawn from GitHub repositories. Each model is tested on whether it can generate code patches that fix actual bugs and implement requested features in open-source projects.
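
As a rough illustration of how a score for this kind of benchmark could be computed, the sketch below counts an issue as resolved only when the generated patch applies, the tests tied to the issue pass, and previously passing tests still pass. The `InstanceResult` fields and both function names are hypothetical and chosen for illustration; they are not the benchmark's actual schema or harness.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Outcome of one model-generated patch (hypothetical schema)."""
    instance_id: str
    patch_applied: bool            # patch applied cleanly to the repository
    target_tests_passed: bool      # tests tied to the issue now pass
    regression_tests_passed: bool  # previously passing tests still pass


def is_resolved(result: InstanceResult) -> bool:
    """An issue counts as resolved only if all three checks succeed."""
    return (
        result.patch_applied
        and result.target_tests_passed
        and result.regression_tests_passed
    )


def resolve_rate(results: list[InstanceResult]) -> float:
    """Fraction of issues resolved; 0.0 for an empty result list."""
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


if __name__ == "__main__":
    # Made-up results for illustration only, not real benchmark data.
    sample = [
        InstanceResult("example-1", True, True, True),
        InstanceResult("example-2", True, True, False),   # broke an existing test
        InstanceResult("example-3", False, False, False), # patch did not apply
        InstanceResult("example-4", True, True, True),
    ]
    print(f"Resolved: {resolve_rate(sample):.1%}")
```

Requiring the regression check alongside the issue-specific tests means a patch that fixes the reported bug but breaks other behavior is not counted as a success.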

Model Performance