TRAINING RUN

THE SCOREBOARD

WEEK 3

Week of January 19, 2026

RANK | MODEL | COMPANY | TRS | CHANGE
1 | Gemini 3 Pro | Google | 95.2 | +0.1
2 | Grok 4.1 | xAI | 94.6 | +0.3 ▲
3 | Claude Opus 4.5 | Anthropic | 94.1 | 0.0
4 | GPT-5.1 | OpenAI | 93.4 | -0.2 ▼
5 | Gemini 3 Flash | Google | 92.8 | +0.4 ▲
6 | Ernie 5.0 | Baidu | 91.2 | +1.8 🔥
7 | Claude Sonnet 4.5 | Anthropic | 90.7 | 0.0
8 | DeepSeek V3.2 | DeepSeek | 89.3 | +0.5 ▲
9 | Qwen 3 Max | Alibaba | 88.1 | +0.2
10 | Llama 4 Maverick | Meta | 86.4 | -0.3 ▼
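
For anyone who wants to check the math on the board, here is a minimal Python sketch. It assumes CHANGE is the week-over-week difference in TRS points, which the page does not state outright. The `rows` list, the `last_week` helper, and the other names are illustrative only and are not part of any TRS tooling.

```python
# Scoreboard rows copied from the Week 3 board: (model, company, TRS, change).
rows = [
    ("Gemini 3 Pro", "Google", 95.2, 0.1),
    ("Grok 4.1", "xAI", 94.6, 0.3),
    ("Claude Opus 4.5", "Anthropic", 94.1, 0.0),
    ("GPT-5.1", "OpenAI", 93.4, -0.2),
    ("Gemini 3 Flash", "Google", 92.8, 0.4),
    ("Ernie 5.0", "Baidu", 91.2, 1.8),
    ("Claude Sonnet 4.5", "Anthropic", 90.7, 0.0),
    ("DeepSeek V3.2", "DeepSeek", 89.3, 0.5),
    ("Qwen 3 Max", "Alibaba", 88.1, 0.2),
    ("Llama 4 Maverick", "Meta", 86.4, -0.3),
]


def last_week(trs, change):
    """Back out the prior score, ASSUMING change = this week minus last week."""
    return round(trs - change, 1)


# Biggest single-week gain on the board.
top_mover = max(rows, key=lambda r: r[3])

# Re-rank by the backed-out prior scores to see who passed whom.
prior_order = sorted(rows, key=lambda r: last_week(r[2], r[3]), reverse=True)
prior_rank = {r[0]: i + 1 for i, r in enumerate(prior_order)}

print(f"Top mover: {top_mover[0]} ({top_mover[3]:+.1f})")
print(f"Ernie 5.0 implied rank last week: {prior_rank['Ernie 5.0']}")
```

If the assumption about CHANGE holds, the backed-out scores are only as precise as the one-decimal figures on the board, so treat the implied prior ranks as a rough check rather than an official history.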

CATEGORY LEADERS

💻 Coding: Gemini 3 Flash (76.2%) | 🧠 Human Preference: Gemini 3 Pro (1489 Elo) | 💰 Cost-Efficiency: DeepSeek V3.2

"That's your moneyball situation."

— Chip Matthews

Ernie 5.0 (Baidu) | +1.8 points | Rank #6

Biggest single-week gain. China's showing up to play.

"Kid came outta nowhere..."

— Chip Matthews

Best Baseline: 31% | With Refinement: 54% | Human: 60%

Getting closer, but not there yet.

Full methodology at /trsmethodology | Updated Thursdays