sanity·bench

Ling-mini-2.0

Ant Group / InclusionAI
MIT MoE sigmoid-routing aux-loss-free MTP
Type: MoE
Total params: 16.0B
Active params: 1.4B
Sparsity: 1/32
Context: 32,768
Train tokens: 20.0T

Benchmarks

Benchmark      Category    Measured   Claimed        Setup
GPQA Diamond   reasoning   37.88      not reported   0-shot
GSM8K          math        80.89      not reported   5-shot
MMLU-Pro       knowledge   53.34      65.10          5-shot
HumanEval+     code        72.56      not reported   0-shot
AIME 2024      math        16.70      not reported   0-shot

Claimed vs measured

How the vendor's published numbers compare to what I measured. A bar to the left (red) means the model card over-claimed; a bar to the right (blue) means the model beat its own claim.

measured − claimed: MMLU-Pro −11.76
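The delta above is just measured minus claimed for each benchmark that has a vendor number. A minimal sketch of that arithmetic follows; the names `RESULTS` and `claim_deltas` are illustrative and not part of any published tooling, and the values are copied from the Benchmarks table.

```python
# Rows copied from the Benchmarks table: (measured, claimed).
# None stands for "not reported" in the Claimed column.
RESULTS = {
    "GPQA Diamond": (37.88, None),
    "GSM8K": (80.89, None),
    "MMLU-Pro": (53.34, 65.10),
    "HumanEval+": (72.56, None),
    "AIME 2024": (16.70, None),
}


def claim_deltas(results):
    """Return {benchmark: measured - claimed} for rows that have a claim."""
    return {
        name: round(measured - claimed, 2)
        for name, (measured, claimed) in results.items()
        if claimed is not None
    }


deltas = claim_deltas(RESULTS)
for name, delta in deltas.items():
    # Negative delta: the model card over-claimed (bar to the left).
    side = "over-claimed" if delta < 0 else "met or beat its claim"
    print(f"{name}: {delta:+.2f} ({side})")
```

Only MMLU-Pro has a claimed value, so it is the only row that yields a delta; the other four benchmarks are skipped and are not treated as zero.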
This is the first independent benchmark of this model. The custom bailing_moe architecture requires transformers==4.57.0 and vllm==0.10.0 with the bailing_moe_v2 patch. The model card claims to match 7-8B dense performance with only 1.4B active params.
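Because the setup depends on two exact version pins, it can help to verify the environment before launching a run. The sketch below uses the standard-library `importlib.metadata` to compare installed versions against the pins quoted above; `PINS` and `pin_mismatches` are hypothetical names, the comparison is a strict string match (a build with a local suffix would be flagged), and it does not detect whether the bailing_moe_v2 patch has been applied.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins quoted in the note above.
PINS = {"transformers": "4.57.0", "vllm": "0.10.0"}


def pin_mismatches(pins, get_version=version):
    """Return {package: installed_version_or_None} for every unmet pin.

    get_version is injectable so the check can be exercised without
    the packages being installed.
    """
    problems = {}
    for package, wanted in pins.items():
        try:
            found = get_version(package)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            problems[package] = found
    return problems


if __name__ == "__main__":
    for package, found in pin_mismatches(PINS).items():
        print(f"{package}: want {PINS[package]}, found {found or 'not installed'}")
```

An empty result means both pins are satisfied; anything else lists the package alongside the version that was actually found.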