Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

TaxEval

TaxEval is a tax analysis benchmark created in collaboration with financial and tax experts. It contains over 1,500 questions split across validation and test sets. The evaluation scores both answer correctness and structured reasoning, and covers six main question types: Application and Compliance, Comparative Analysis, Numerical Reasoning, Problem Solving and Critical Thinking, Semantic Analysis, and Updates and Current Affairs.

Model Performance