eXalt Value presents
Benchmarks Business
Compare LLMs on concrete business tasks: translation quality, long-document analysis, cognitive reasoning, and spreadsheet intelligence.

| # | Model | Price (In/Out) | BLEU | COMET | Context |
|---|---|---|---|---|---|
| 🥇 | Gemini 3.1 Pro | $2.00/$12.00 | 62.0 | 91.5 | 10M |
| 🥈 | Gemini 3 Pro | $2.00/$12.00 | 61.0 | 91.2 | 10M |
| 🥉 | GPT-5.2 | $1.75/$14.00 | 61.2 | 91.0 | 400K |
| 4 | GPT-5 | $1.25/$10.00 | 60.5 | 90.2 | 400K |
| 5 | Gemini 2.5 Pro | $1.25/$10.00 | 59.5 | 90.0 | 1M |
| 6 | Claude Opus 4.6 | $5.00/$25.00 | 58.3 | 89.0 | 200K |
| 7 | Grok 4 | $3.00/$15.00 | 58.0 | 88.8 | 256K |
| 8 | Claude Opus 4.5 | $5.00/$25.00 | 57.8 | 88.5 | 200K |
| 9 | Qwen3 Coder 480B | $0.90/$0.90 | 57.0 | 88.0 | 262K |
| 10 | Claude Sonnet 4.6 | $3.00/$15.00 | 56.5 | 87.8 | 1M |
| 11 | DeepSeek V3.2 Thinking | $0.27/$1.10 | 56.0 | 87.5 | 128K |
| 12 | OpenAI o3 | $2.00/$8.00 | 56.8 | 87.5 | 200K |
| 13 | Gemini 3 Flash | $0.50/$3.00 | 56.0 | 87.0 | 1M |
| 14 | DeepSeek V3.2 | $0.27/$0.41 | 55.5 | 87.0 | 128K |
| 15 | Claude Sonnet 4.5 | $3.00/$15.00 | 55.0 | 86.9 | 200K |
| 16 | GPT-4.1 | $2.00/$8.00 | 55.0 | 86.5 | 1M |
| 17 | Mistral Large 3 | $2.00/$6.00 | 54.5 | 86.0 | 131K |
| 18 | Kimi K2.5 | $0.40/$1.75 | 54.0 | 85.5 | 256K |
| 19 | GPT-5 Mini | $0.25/$2.00 | 53.0 | 85.5 | 200K |
| 20 | Kimi K2 Thinking | $0.40/$1.75 | 53.5 | 85.0 | 256K |
| 21 | Gemini 2.5 Flash | $0.15/$0.60 | 52.0 | 84.5 | 1M |
| 22 | Claude Haiku 4.5 | $1.00/$5.00 | 51.2 | 84.0 | 200K |
| 23 | GLM 4.7 | $0.38/$1.75 | 52.0 | 84.0 | 203K |
| 24 | MiniMax M2.5 | $0.30/$1.20 | 50.0 | 83.0 | 1M |
| 25 | MiniMax M2.1 | $0.23/$0.90 | 48.0 | 81.5 | 197K |
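
One way to read the price and quality columns together is to estimate what a translation job would cost on each model and then pick the cheapest model that clears a quality bar. The sketch below does this for four rows copied from the translation table. It assumes the "Price (In/Out)" figures are USD per one million input and output tokens, which the page does not state; the names `Model`, `job_cost`, and `cheapest_above` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    price_in: float   # ASSUMPTION: USD per 1M input tokens
    price_out: float  # ASSUMPTION: USD per 1M output tokens
    comet: float      # COMET score as listed in the translation table


# A few rows copied from the translation table above.
MODELS = [
    Model("Gemini 3.1 Pro", 2.00, 12.00, 91.5),
    Model("GPT-5.2", 1.75, 14.00, 91.0),
    Model("DeepSeek V3.2", 0.27, 0.41, 87.0),
    Model("Gemini 2.5 Flash", 0.15, 0.60, 84.5),
]


def job_cost(model: Model, tokens_in: int, tokens_out: int) -> float:
    """Estimated USD cost of one job under the per-million-token assumption."""
    return (tokens_in * model.price_in + tokens_out * model.price_out) / 1_000_000


def cheapest_above(
    models: Sequence[Model], min_comet: float, tokens_in: int, tokens_out: int
) -> Optional[Model]:
    """Cheapest model whose COMET score is at least min_comet, or None."""
    eligible = [m for m in models if m.comet >= min_comet]
    return min(
        eligible,
        key=lambda m: job_cost(m, tokens_in, tokens_out),
        default=None,
    )


if __name__ == "__main__":
    # Hypothetical workload: 1M tokens in, 1M tokens out.
    for m in MODELS:
        print(f"{m.name}: ${job_cost(m, 1_000_000, 1_000_000):.2f}")
    pick = cheapest_above(MODELS, 87.0, 1_000_000, 1_000_000)
    print("Cheapest with COMET >= 87.0:", pick.name if pick else "none")
```

The quality threshold and the token counts are placeholders; a real comparison would use the workload's own input/output ratio, since models with similar input prices can differ widely on output price.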

| # | Model | Price (In/Out) | LongBench v2 | RULER | Context |
|---|---|---|---|---|---|
| 🥇 | Gemini 3.1 Pro | $2.00/$12.00 | 60.5% | 95.8% | 10M |
| 🥈 | GPT-5.2 | $1.75/$14.00 | 58.5% | 98.0% | 400K |
| 🥉 | Gemini 3 Pro | $2.00/$12.00 | 58.0% | 95.0% | 10M |
| 4 | GPT-5.2 Codex | $1.75/$14.00 | 57.0% | 96.0% | 400K |
| 5 | GPT-5 | $1.25/$10.00 | 56.8% | 92.0% | 400K |
| 6 | Gemini 2.5 Pro | $1.25/$10.00 | 55.0% | 95.8% | 1M |
| 7 | Claude Opus 4.6 | $5.00/$25.00 | 54.2% | 76.0% | 200K |
| 8 | Gemini 3 Flash | $0.50/$3.00 | 52.5% | 88.0% | 1M |
| 9 | Claude Opus 4.5 | $5.00/$25.00 | 52.1% | 72.0% | 200K |
| 10 | OpenAI o3 | $2.00/$8.00 | 51.0% | 80.0% | 200K |
| 11 | Claude Sonnet 4.6 | $3.00/$15.00 | 50.8% | 82.0% | 1M |
| 12 | Grok 4 | $3.00/$15.00 | 50.5% | 78.0% | 256K |
| 13 | DeepSeek V3.2 Thinking | $0.27/$1.10 | 50.0% | 72.0% | 128K |
| 14 | Qwen3 Coder 480B | $0.90/$0.90 | 49.5% | 76.0% | 262K |
| 15 | GPT-5.1 Codex | $1.25/$10.00 | 49.0% | 85.0% | 200K |
| 16 | DeepSeek V3.2 | $0.27/$0.41 | 48.7% | 70.0% | 128K |
| 17 | Claude Sonnet 4.5 | $3.00/$15.00 | 48.5% | 68.0% | 200K |
| 18 | Kimi K2.5 | $0.40/$1.75 | 48.0% | 76.0% | 256K |
| 19 | Gemini 2.5 Flash | $0.15/$0.60 | 48.0% | 82.0% | 1M |
| 20 | Kimi K2 Thinking | $0.40/$1.75 | 47.0% | 74.0% | 256K |
| 21 | GPT-4.1 | $2.00/$8.00 | 46.5% | 80.0% | 1M |
| 22 | MiniMax M2.5 | $0.30/$1.20 | 46.0% | 72.0% | 1M |
| 23 | GPT-5 Mini | $0.25/$2.00 | 45.2% | 78.0% | 200K |
| 24 | GLM 4.7 | $0.38/$1.75 | 44.5% | 68.0% | 203K |
| 25 | Claude Haiku 4.5 | $1.00/$5.00 | 44.0% | 60.0% | 200K |
| 26 | Mistral Large 3 | $2.00/$6.00 | 43.0% | 62.0% | 131K |
| 27 | MiniMax M2.1 | $0.23/$0.90 | 42.0% | 65.0% | 197K |

| # | Model | Price (In/Out) | MMLU-Pro | GPQA Diamond | Context |
|---|---|---|---|---|---|
| 🥇 | Gemini 3.1 Pro | $2.00/$12.00 | 90.2% | 94.3% | 10M |
| 🥈 | Gemini 3 Pro | $2.00/$12.00 | 89.8% | 91.9% | 10M |
| 🥉 | Claude Opus 4.5 | $5.00/$25.00 | 89.5% | 86.5% | 200K |
| 4 | Claude Opus 4.6 | $5.00/$25.00 | 88.2% | 89.0% | 200K |
| 5 | Claude Sonnet 4.6 | $3.00/$15.00 | 87.5% | 89.9% | 1M |
| 6 | Claude Sonnet 4.5 | $3.00/$15.00 | 87.5% | 84.2% | 200K |
| 7 | GPT-5.2 | $1.75/$14.00 | 87.4% | 92.4% | 400K |
| 8 | Gemini 2.5 Pro | $1.25/$10.00 | 87.2% | 85.0% | 1M |
| 9 | GPT-5 | $1.25/$10.00 | 87.1% | 87.0% | 400K |
| 10 | GPT-5.1 Codex | $1.25/$10.00 | 87.0% | 88.1% | 200K |
| 11 | GPT-4.1 | $2.00/$8.00 | 86.5% | 82.0% | 1M |
| 12 | Grok 4 | $3.00/$15.00 | 86.4% | 88.9% | 256K |
| 13 | OpenAI o3 | $2.00/$8.00 | 86.0% | 85.0% | 200K |
| 14 | DeepSeek V3.2 Thinking | $0.27/$1.10 | 85.9% | 85.3% | 128K |
| 15 | Kimi K2 Thinking | $0.40/$1.75 | 84.6% | 80.0% | 256K |
| 16 | Gemini 3 Flash | $0.50/$3.00 | 84.5% | 90.4% | 1M |
| 17 | Qwen3 Coder 480B | $0.90/$0.90 | 83.0% | 80.0% | 262K |
| 18 | DeepSeek V3.2 | $0.27/$0.41 | 82.5% | 78.0% | 128K |
| 19 | Mistral Large 3 | $2.00/$6.00 | 81.0% | 75.0% | 131K |
| 20 | GPT-5 Mini | $0.25/$2.00 | 79.5% | 74.0% | 200K |
| 21 | Kimi K2.5 | $0.40/$1.75 | 78.5% | 75.0% | 256K |
| 22 | Claude Haiku 4.5 | $1.00/$5.00 | 78.0% | 72.5% | 200K |
| 23 | GLM 4.7 | $0.38/$1.75 | 78.0% | 72.0% | 203K |
| 24 | MiniMax M2.5 | $0.30/$1.20 | 76.0% | 70.0% | 1M |
| 25 | Gemini 2.5 Flash | $0.15/$0.60 | 76.0% | 70.0% | 1M |
| 26 | MiniMax M2.1 | $0.23/$0.90 | 72.0% | 65.0% | 197K |

| # | Model | Price (In/Out) | SpreadsheetBench | FinSheet-Bench | Context |
|---|---|---|---|---|---|
| 🥇 | Gemini 3.1 Pro | $2.00/$12.00 | 52.0% | 82.4% | 10M |
| 🥈 | GPT-5.2 | $1.75/$14.00 | 48.2% | 80.4% | 400K |
| 🥉 | Claude Opus 4.6 | $5.00/$25.00 | 42.9% | 80.2% | 200K |
| 4 | Gemini 3 Pro | $2.00/$12.00 | 50.5% | 80.2% | 10M |
| 5 | GPT-5 | $1.25/$10.00 | 45.5% | 78.5% | 400K |
| 6 | Claude Opus 4.5 | $5.00/$25.00 | 41.5% | 78.0% | 200K |
| 7 | Claude Sonnet 4.6 | $3.00/$15.00 | 40.2% | 76.5% | 1M |
| 8 | Gemini 2.5 Pro | $1.25/$10.00 | 46.0% | 75.5% | 1M |
| 9 | Grok 4 | $3.00/$15.00 | 41.0% | 74.5% | 256K |
| 10 | OpenAI o3 | $2.00/$8.00 | 42.0% | 74.0% | 200K |
| 11 | Claude Sonnet 4.5 | $3.00/$15.00 | 38.5% | 73.0% | 200K |
| 12 | Gemini 3 Flash | $0.50/$3.00 | 44.0% | 72.0% | 1M |
| 13 | DeepSeek V3.2 Thinking | $0.27/$1.10 | 38.0% | 70.0% | 128K |
| 14 | GPT-4.1 | $2.00/$8.00 | 40.0% | 70.0% | 1M |
| 15 | DeepSeek V3.2 | $0.27/$0.41 | 36.0% | 68.0% | 128K |
| 16 | Qwen3 Coder 480B | $0.90/$0.90 | 37.0% | 66.0% | 262K |
| 17 | Kimi K2 Thinking | $0.40/$1.75 | 34.0% | 66.0% | 256K |
| 18 | GPT-5 Mini | $0.25/$2.00 | 35.0% | 65.0% | 200K |
| 19 | Kimi K2.5 | $0.40/$1.75 | 35.0% | 64.0% | 256K |
| 20 | Claude Haiku 4.5 | $1.00/$5.00 | 32.0% | 62.0% | 200K |
| 21 | Mistral Large 3 | $2.00/$6.00 | 34.0% | 62.0% | 131K |
| 22 | Gemini 2.5 Flash | $0.15/$0.60 | 38.0% | 60.0% | 1M |
| 23 | MiniMax M2.5 | $0.30/$1.20 | 33.0% | 58.0% | 1M |
| 24 | GLM 4.7 | $0.38/$1.75 | 32.0% | 58.0% | 203K |
| 25 | MiniMax M2.1 | $0.23/$0.90 | 30.0% | 52.0% | 197K |

■ Best
■ Good
■ Average
■ Weak