Ling-mini-2.0

Ant Group / InclusionAI

MIT MoE sigmoid-routing aux-loss-free MTP

Type

MoE

Total params

16.0B

Active params

1.4B

Sparsity

1/32

Context

32,768

Train tokens

20.0T

Benchmarks

Benchmark	Category	Measured	Claimed	Setup
GPQA Diamond	reasoning	37.88	not reported	0-shot
GSM8K	math	80.89	not reported	5-shot
MMLU-Pro	knowledge	53.34	65.10	5-shot
HumanEval+	code	72.56	not reported	0-shot
AIME 2024	math	16.70	not reported	0-shot

Claimed vs measured

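How the vendor's published numbers compare with what I measured. A red bar extending left means the model card over-claimed; a blue bar extending right means the model beat its own claim. Only MMLU-Pro has a published claim to compare against (65.10 claimed, 53.34 measured).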
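As a rough illustration of the comparison described here, the gap can be computed as measured minus claimed, in points. The snippet below is a minimal sketch, not the code behind this page: the `ROWS` list and the `claim_gap` helper are hypothetical names, and the values are copied from the Benchmarks table.

```python
# Rows copied from the Benchmarks table: (benchmark, measured, claimed).
# A claimed value of None stands for "not reported".
ROWS = [
    ("GPQA Diamond", 37.88, None),
    ("GSM8K", 80.89, None),
    ("MMLU-Pro", 53.34, 65.10),
    ("HumanEval+", 72.56, None),
    ("AIME 2024", 16.70, None),
]


def claim_gap(measured, claimed):
    """Return measured minus claimed in points, or None when no claim exists.

    A negative gap means the model card over-claimed.
    """
    if claimed is None:
        return None
    return round(measured - claimed, 2)


gaps = {name: claim_gap(measured, claimed) for name, measured, claimed in ROWS}

for name, gap in gaps.items():
    label = "not reported" if gap is None else f"{gap:+.2f}"
    print(f"{name}: {label}")
```

With the table as given, only the MMLU-Pro row yields a number; the other four rows have no vendor claim to compare against.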

This is the first independent benchmark of this model. The custom bailing_moe architecture requires transformers==4.57.0 and vllm==0.10.0 with the bailing_moe_v2 patch. The vendor claims the model matches 7-8B dense performance with only 1.4B active params.
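Because the setup depends on exact package versions, it may help to confirm the environment before running anything. The following is a minimal sketch that uses the standard-library `importlib.metadata`; the `PINS` dictionary and the `check_pins` helper are hypothetical names, and the check does not verify that the bailing_moe_v2 patch has been applied.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins taken from the note above. The bailing_moe_v2 patch is
# NOT checked here; this only compares installed package versions.
PINS = {"transformers": "4.57.0", "vllm": "0.10.0"}


def check_pins(pins, get_version=version):
    """Map each package to 'ok', 'missing', or 'found X, want Y'."""
    report = {}
    for package, wanted in pins.items():
        try:
            found = get_version(package)
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        if found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"found {found}, want {wanted}"
    return report


if __name__ == "__main__":
    for package, status in check_pins(PINS).items():
        print(f"{package}: {status}")
```

The comparison is an exact string match, so a build with a local version suffix would be reported as a mismatch; that strictness is a choice made for this sketch.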