Coding LLM Benchmark

Model Comparison

Compare coding models by benchmark scores and pricing

| Model | Value | Input $/1M | Output $/1M | LiveCode | Aider | SWE | BFCL | Votes | Context |
|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | | $5.00 | $25.00 | n/a | n/a | n/a | 77.47% | 1561 | 200K |
| Claude Opus 4.5 | Low | $5.00 | $25.00 | 73.8% | 89.4% | 80.9% | 73.24% | 1469 | 200K |
| MiniMax M2.5 | | $0.30 | $1.20 | n/a | n/a | n/a | 57.51% | 1453 | |
| Gemini 3 Pro | Mid | $2.00 | $12.00 | 79.7% | n/a | 76.2% | 66.46% | 1444 | 10M |
| Kimi K2 Thinking | Good value | $0.40 | $1.75 | 83.1% | 59.1% | 71.3% | 59.42% | 1442 | 256K |
| Gemini 3 Flash | Good value | $0.50 | $3.00 | 79.7% | n/a | n/a | 60.61% | 1441 | 1M |
| GPT-5.2 | Low | $1.75 | $14.00 | 66.9% | n/a | 80.0% | 63.01% | 1395 | 400K |
| GPT-5 | Good value | $1.25 | $10.00 | 84.6% | 88.0% | 74.9% | 66.21% | 1393 | 400K |
| Claude Sonnet 4.5 | Low | $3.00 | $15.00 | 59.0% | n/a | 82.0% | 60.67% | 1386 | 200K |
| DeepSeek V3.2 Thinking | Best value | $0.27 | $1.10 | 89.6% | 74.2% | n/a | 62.11% | 1371 | 128K |
| GPT-5.1 Codex | Mid | $1.25 | $10.00 | 84.9% | n/a | 76.3% | 65.18% | 1328 | 200K |
| DeepSeek V3.2 | Best value | $0.27 | $0.41 | 59.3% | 70.2% | n/a | 52.56% | 1315 | 128K |
| Claude Haiku 4.5 | | $1.00 | $5.00 | n/a | n/a | 73.3% | 54.84% | 1305 | 200K |
| Mistral Large 3 | | $2.00 | $6.00 | n/a | n/a | n/a | 39.17% | 1223 | 131K |
| Gemini 2.5 Pro | Mid | $1.25 | $10.00 | 69.0% | 82.2% | 59.6% | 54.41% | 1205 | 1M |
| GPT-5 Mini | Best value | $0.25 | $2.00 | 83.8% | n/a | n/a | 58.29% | n/a | 200K |
| OpenAI o3 | Good value | $2.00 | $8.00 | 80.8% | 81.3% | 69.1% | 68.09% | n/a | 200K |
| Grok 4 | Low | $3.00 | $15.00 | 79.0% | 79.6% | 75.0% | 62.9% | n/a | 256K |
| Gemini 2.5 Flash | Best value | $0.15 | $0.60 | 63.5% | 55.1% | n/a | 45.18% | n/a | 1M |
| GPT-4.1 | Mid | $2.00 | $8.00 | 52.0% | 52.4% | 55.0% | 50.18% | n/a | 1M |
Value legend: Best, Good, Mid, Low
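
Because input and output tokens are priced separately, the cheapest model for a given job depends on the mix of tokens it consumes and produces. The following is a minimal sketch of that arithmetic, using four rows of prices copied from the table. The function name `job_cost` and the workload of 2M input and 500K output tokens are illustrative assumptions, and this is not the method behind the Value column, which the page does not describe.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "Claude Opus 4.5": (5.00, 25.00),
    "GPT-5": (1.25, 10.00),
    "DeepSeek V3.2 Thinking": (0.27, 1.10),
    "Gemini 2.5 Flash": (0.15, 0.60),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the listed per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    n_in, n_out = 2_000_000, 500_000
    # List the models from cheapest to most expensive for this workload.
    for model in sorted(PRICES, key=lambda m: job_cost(m, n_in, n_out)):
        print(f"{model:<24} ${job_cost(model, n_in, n_out):.2f}")
```

For this workload GPT-5 comes to $7.50 and Claude Opus 4.5 to $22.50. A workload with a different input-to-output ratio can reorder models whose two prices differ in proportion, so it may be worth rerunning the calculation with your own token counts.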