Benchmark Explorer

Explore how models perform on various benchmarks

Benchmarks

MGSM

The Multilingual Grade School Math (MGSM) benchmark evaluates language models' mathematical reasoning across multiple languages. It contains 250 carefully translated math word problems drawn from the GSM8K dataset, each available in 10 typologically diverse languages, including underrepresented ones such as Bengali, Telugu, and Swahili.
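
As a rough illustration of how a single percentage can summarize a multilingual benchmark like this one, the sketch below scores responses by exact match on the final number and then averages the per-language accuracies. The function names (`extract_final_number`, `mgsm_style_accuracy`), the record layout, and the choice of an unweighted average across languages are all assumptions made for illustration; this page does not state how its scores are computed.

```python
import re
from collections import defaultdict


def extract_final_number(text):
    """Return the last number found in a response as a float, or None.

    Hypothetical helper: thousands separators are stripped before matching.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def mgsm_style_accuracy(records):
    """Compute per-language accuracy and their unweighted mean.

    `records` is an iterable of (language, model_response, gold_answer).
    Assumption: a response is correct when its final number equals the gold answer.
    """
    counts = defaultdict(lambda: [0, 0])  # language -> [correct, total]
    for lang, response, gold in records:
        pred = extract_final_number(response)
        counts[lang][0] += int(pred is not None and pred == float(gold))
        counts[lang][1] += 1
    per_language = {lang: c / n for lang, (c, n) in counts.items()}
    overall = sum(per_language.values()) / len(per_language)
    return per_language, overall


# Toy records, invented purely for this example.
records = [
    ("bn", "The answer is 18.", 18),
    ("bn", "So the total is 5", 7),
    ("sw", "Jibu ni 1,200", 1200),
    ("te", "I am not sure", 3),
]
per_language, overall = mgsm_style_accuracy(records)
print(per_language, overall)
```

Averaging per language rather than pooling all problems gives each language equal weight, which matters when a model is much weaker in some languages than in others.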

Model Performance

Rank  Score
#2    94.2%
#3    93.8%
#8    92.6%
#10   92.5%
#11   92.5%
#13   92.4%
#14   92.4%
#15   92.3%
#18   92.1%
#19   91.7%
#20   91.7%
#21   91.7%
#22   91.3%
#25   91.3%
#26   91.1%
#27   90.9%
#28   90.9%
#30   90.9%
#31   90.9%
#32   90.4%
#34   89.8%
#35   89.3%
#36   89.2%
#37   89.0%
#38   87.7%
#39   87.2%
#40   86.2%
#41   85.7%
#42   84.6%