Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

Vibe Code Bench

Evaluates an LLM's ability to generate functional applications from natural-language descriptions, testing the emerging "vibe coding" paradigm, in which developers describe a desired application in plain language and the model produces complete, working code. The benchmark assesses end-to-end code generation quality, including the correctness, functionality, and usability of the generated applications.
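The core loop such a benchmark implies — prompt a model with a plain-language description, then functionally test the code it returns — can be sketched roughly as below. This is a minimal illustration, not Vibe Code Bench's actual harness: `generate_app` is a hypothetical stand-in for a real model call, and the single print-check stands in for the benchmark's fuller correctness, functionality, and usability assessment.

```python
import subprocess
import sys
import tempfile
import textwrap

def generate_app(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real harness would
    send `prompt` to a model and return its generated source code."""
    return textwrap.dedent("""\
        def add(a, b):
            return a + b

        if __name__ == "__main__":
            print(add(2, 3))
    """)

def run_generated(code: str) -> str:
    """Functional check: write the generated program to disk,
    execute it in a subprocess, and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip()

# End-to-end: natural-language task in, observable behavior out.
code = generate_app("Write a program that prints the sum of 2 and 3.")
print(run_generated(code))  # "5" if the generated program behaves correctly
```

A real evaluation would replace the stub with an actual model API call and score many tasks against per-task checks rather than a single expected output.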
Source:

Model Performance