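Rank | Model | Developer | Score | Change
#1 | Gemini 3 Pro | Google | 95.2 | +0.1
#3 | Claude Opus 4.5 | Anthropic | 94.1 | 0.0
#5 | Gemini 3 Flash | Google | 92.8 | +0.4 ▲
#6 | Ernie 5.0 | Baidu | 91.2 | +1.8 🔥
#7 | Claude Sonnet 4.5 | Anthropic | 90.7 | 0.0
#8 | DeepSeek V3.2 | DeepSeek | 89.3 | +0.5 ▲
#9 | Qwen 3 Max | Alibaba | 88.1 | +0.2
#10 | Llama 4 Maverick | Meta | 86.4 | -0.3 ▼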
💻 Coding: Gemini 3 Flash (76.2%) | 🧠 Human Preference: Gemini 3 Pro (1489 Elo) | 💰 Cost-Efficiency: DeepSeek V3.2
"That's your moneyball situation."
— Chip Matthews
Ernie 5.0
Baidu | +1.8 points | Rank #6
Biggest single-week gain. China's showing up to play.
"Kid came outta nowhere..."
— Chip Matthews
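The "biggest single-week gain" claim can be checked with a few lines of Python against the changes listed in the leaderboard. This is only a sketch: it covers the entries shown on this page (ranks 2 and 4 are not listed), and the helper name `biggest_gain` is hypothetical.

```python
# Week-over-week changes as listed in the leaderboard (ranks 2 and 4 not shown).
changes = {
    "Gemini 3 Pro": 0.1,
    "Claude Opus 4.5": 0.0,
    "Gemini 3 Flash": 0.4,
    "Ernie 5.0": 1.8,
    "Claude Sonnet 4.5": 0.0,
    "DeepSeek V3.2": 0.5,
    "Qwen 3 Max": 0.2,
    "Llama 4 Maverick": -0.3,
}


def biggest_gain(deltas: dict[str, float]) -> tuple[str, float]:
    """Hypothetical helper: return the (model, change) pair with the largest gain."""
    model = max(deltas, key=deltas.get)
    return model, deltas[model]


print(biggest_gain(changes))  # ('Ernie 5.0', 1.8)
```

Among the listed models, Ernie 5.0's +1.8 is more than three times the next-largest move (DeepSeek V3.2 at +0.5).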
ARC-AGI-2 Novel Reasoning
Best Baseline: 31% | With Refinement: 54% | Human: 60%
Getting closer, but not there yet.
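One way to put "getting closer" in numbers is to ask what share of the baseline-to-human gap refinement closes. The "gap closed" framing below is our own assumption, not a metric this page reports; the inputs are the three ARC-AGI-2 percentages above.

```python
# ARC-AGI-2 percentages from the line above.
baseline, refined, human = 31, 54, 60


def gap_closed(baseline: float, refined: float, human: float) -> float:
    """Assumed framing: fraction of the baseline-to-human gap closed by refinement."""
    return (refined - baseline) / (human - baseline)


print(f"Remaining gap to human: {human - refined} points")  # 6 points
print(f"Share of gap closed: {gap_closed(baseline, refined, human):.0%}")  # 79%
```

By that reading, refinement closes 23 of the 29 points separating the baseline from human performance, leaving a 6-point gap.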
Full methodology at /trsmethodology | Updated Thursdays