Coding LLM Benchmark

eXalt Value presents

The Leaderboard of LLM Coding

Compare coding models by score and price

Find the ideal coding model

Describe what you need to build, and the agent will suggest models from the leaderboard.

Top models across programming benchmarks

Each chart plots score (%) on an axis from 30% to 85%; models are listed in chart order.

Overall score (leaderboard average): Claude Opus 4.6, GPT-5.4, Claude Opus 4.5, GPT-5.2-Codex, Qwen3.6-Plus

Issue Resolution (fixing GitHub bugs): Claude Opus 4.6, Claude Opus 4.5, GPT-5.4, Minimax-2.7, Gemini-3.1-Pro

Frontend (UI with visual context): Gemini-3.1-Pro, Claude Opus 4.6, Claude Opus 4.5, GPT-5.4, Claude Sonnet 4.5

Greenfield (apps from scratch): Claude Opus 4.6, GPT-5.4, GPT-5.2-Codex, Qwen3.6-Plus, Claude Sonnet 4.6

Testing (test generation and quality): Claude Opus 4.6, Claude Opus 4.5, GPT-5.2, GPT-5.4, Gemini-3-Flash

Information Gathering (research and retrieval): Claude Opus 4.6, GPT-5.4, Gemini-3.1-Pro, Claude Sonnet 4.5, Qwen3.6-Plus

Cost / Performance

Overall Cost/Performance

Average score vs. average cost per problem (USD). Lower-right is better value.
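One way to read a score-versus-cost chart like this is to look for the models that no other model beats on both axes at once (the Pareto frontier). The sketch below is illustrative only: the `Point` type, the `pareto_frontier` helper, and the sample values are hypothetical and are not taken from the leaderboard, which does not list per-problem costs on this page.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    score: float  # average benchmark score, in percent
    cost: float   # average cost per problem, in USD


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points that no other point beats on both score and cost."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, higher score first.
    for p in sorted(points, key=lambda p: (p.cost, -p.score)):
        # A point survives only if it scores higher than every cheaper point.
        if p.score > best_score:
            frontier.append(p)
            best_score = p.score
    return frontier


# Placeholder values for illustration only; NOT leaderboard data.
sample = [
    Point("model-a", score=70.0, cost=1.50),
    Point("model-b", score=65.0, cost=0.40),
    Point("model-c", score=60.0, cost=0.90),
    Point("model-d", score=50.0, cost=0.10),
]

for p in pareto_frontier(sample):
    print(f"{p.name}: {p.score:.1f}% at ${p.cost:.2f} per problem")
```

In this made-up sample, model-c drops out because model-b is both cheaper and higher scoring; the remaining three each represent a different trade-off between cost and score.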

Explore

Agentic Leaderboard

Full rankings of coding AI agents: issue resolution, frontend, greenfield, testing and information gathering. Powered by OpenHands Index data.

Benchmarks: SWE-bench, Commit0, SWT-bench, GAIA

Business Benchmarks

Compare LLMs on real-world business tasks: translation quality, long document analysis, cognitive reasoning, and spreadsheet intelligence.

Benchmarks: FLORES-200, LongBench, MMLU-Pro, SpreadsheetBench

LLM Leaderboard

Model	Value	Input ($/1M tokens)	Output ($/1M tokens)	LiveCode	Aider	SWE	BFCL	Votes	Context
Claude Opus 4.6	—	$5.00	$25.00	n/a	n/a	n/a	77.47%	1561	200K
Claude Opus 4.5	Low	$5.00	$25.00	73.8%	89.4%	80.9%	73.24%	1469	200K
MiniMax M2.5	—	$0.30	$1.20	n/a	n/a	n/a	57.51%	1453	1M
Gemini 3 Pro	Mid	$2.00	$12.00	79.7%	n/a	76.2%	66.46%	1444	10M
Kimi K2 Thinking	Good value	$0.40	$1.75	83.1%	59.1%	71.3%	59.42%	1442	256K
Gemini 3 Flash	Good value	$0.50	$3.00	79.7%	n/a	n/a	60.61%	1441	1M
GPT-5.2	Low	$1.75	$14.00	66.9%	n/a	80.0%	63.01%	1395	400K
GPT-5	Good value	$1.25	$10.00	84.6%	88.0%	74.9%	66.21%	1393	400K
Claude Sonnet 4.5	Low	$3.00	$15.00	59.0%	n/a	82.0%	60.67%	1386	200K
DeepSeek V3.2 Thinking	Best value	$0.27	$1.10	89.6%	74.2%	n/a	62.11%	1371	128K
GPT-5.1 Codex	Mid	$1.25	$10.00	84.9%	n/a	76.3%	65.18%	1328	200K
DeepSeek V3.2	Best value	$0.27	$0.41	59.3%	70.2%	n/a	52.56%	1315	128K
Claude Haiku 4.5	—	$1.00	$5.00	n/a	n/a	73.3%	54.84%	1305	200K
Mistral Large 3	—	$2.00	$6.00	n/a	n/a	n/a	39.17%	1223	131K
Gemini 2.5 Pro	Mid	$1.25	$10.00	69.0%	82.2%	59.6%	54.41%	1205	1M
Claude Sonnet 4.6	—	$3.00	$15.00	n/a	n/a	n/a	n/a	n/a	1M
GPT-5.2 Codex	—	$1.75	$14.00	n/a	n/a	n/a	n/a	n/a	400K
Gemini 3.1 Pro	—	$2.00	$12.00	n/a	n/a	n/a	n/a	n/a	10M
Kimi K2.5	—	$0.40	$1.75	n/a	n/a	n/a	n/a	n/a	256K
MiniMax M2.1	—	$0.23	$0.90	n/a	n/a	n/a	n/a	n/a	197K
Qwen3 Coder 480B	—	$0.90	$0.90	n/a	n/a	n/a	n/a	n/a	262K
GLM 4.7	—	$0.38	$1.75	n/a	n/a	n/a	n/a	n/a	203K
GPT-5 Mini	Best value	$0.25	$2.00	83.8%	n/a	n/a	58.29%	n/a	200K
OpenAI o3	Good value	$2.00	$8.00	80.8%	81.3%	69.1%	68.09%	n/a	200K
Grok 4	Low	$3.00	$15.00	79.0%	79.6%	75.0%	62.9%	n/a	256K
Gemini 2.5 Flash	Best value	$0.15	$0.60	63.5%	55.1%	n/a	45.18%	n/a	1M
GPT-4.1	Mid	$2.00	$8.00	52.0%	52.4%	55.0%	50.18%	n/a	1M

Value ratings: Best, Good, Mid, Low
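To compare rows of the table programmatically, a minimal sketch follows. It parses a few tab-separated rows copied from the table, treats `n/a` as missing, and sorts the models that report an SWE score by a blended token price. The field names, the `to_number` and `blended_price` helpers, and the 75% input / 25% output token mix are all assumptions made for illustration; the leaderboard does not say how its Value column is computed, and this is not a reconstruction of it.

```python
import csv
import io
from typing import Optional

# Four rows copied from the leaderboard table, tab-separated as on the page.
ROWS = (
    "Claude Opus 4.5\tLow\t$5.00\t$25.00\t73.8%\t89.4%\t80.9%\t73.24%\t1469\t200K\n"
    "GPT-5\tGood value\t$1.25\t$10.00\t84.6%\t88.0%\t74.9%\t66.21%\t1393\t400K\n"
    "DeepSeek V3.2 Thinking\tBest value\t$0.27\t$1.10\t89.6%\t74.2%\tn/a\t62.11%\t1371\t128K\n"
    "Claude Sonnet 4.5\tLow\t$3.00\t$15.00\t59.0%\tn/a\t82.0%\t60.67%\t1386\t200K\n"
)
# Hypothetical field names chosen for this sketch.
FIELDS = ["model", "value", "price_in", "price_out",
          "livecode", "aider", "swe", "bfcl", "votes", "context"]


def to_number(cell: str) -> Optional[float]:
    """Convert '$5.00', '73.8%' or '1469' to a float; 'n/a' becomes None."""
    cell = cell.strip()
    if cell in ("n/a", ""):
        return None
    return float(cell.strip("$%"))


def blended_price(row: dict, input_share: float = 0.75) -> float:
    """Price per 1M tokens for an ASSUMED input/output token mix."""
    return (input_share * to_number(row["price_in"])
            + (1 - input_share) * to_number(row["price_out"]))


rows = list(csv.DictReader(io.StringIO(ROWS), fieldnames=FIELDS, delimiter="\t"))

# Keep models with a reported SWE score, then sort cheapest first.
with_swe = [r for r in rows if to_number(r["swe"]) is not None]
ranked = sorted(with_swe, key=blended_price)

for r in ranked:
    print(f'{r["model"]}: SWE {r["swe"]}, about ${blended_price(r):.2f} per 1M tokens')
```

The token mix matters: a workload that generates far more output than it reads would shift the ordering toward models with cheap output tokens, so `input_share` should be set from your own usage rather than taken as given.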