eXalt Value presents
AI Coding Agents Leaderboard
What each category measures:
- Issue Resolution — Fixing bugs in real GitHub issues (SWE-Bench)
- Frontend — UI development with visual context (SWE-Bench Multimodal)
- Greenfield — Building new applications from scratch (Commit0)
- Testing — Test generation and quality (SWT-Bench)
- Information Gathering — Research and information retrieval (GAIA)
Leaderboard
| Model | Avg Score ↓ | Avg Cost / run | Avg Runtime | Context | Bugs | Frontend | Greenfield | Testing | Information |
|---|---|---|---|---|---|---|---|---|---|
| | 66.7 | $1.14 | 410s | 200K | 76.8% ($0.77) | 41.8% ($2.37) | 56.2% ($1.70) | 78.8% ($0.43) | 80.0% ($0.44) |
| | 63.8 | $1.82 | 372s | — | 75.6% ($1.36) | 36.8% ($4.24) | 56.2% ($2.19) | 71.4% ($0.55) | 78.8% ($0.74) |
| | 60.6 | $1.77 | 378s | 200K | 76.6% ($1.82) | 41.2% ($2.54) | 37.5% ($2.54) | 78.5% ($1.38) | 69.1% ($0.55) |
| | 59.5 | $1.43 | 819s | 400K | 73.8% ($0.94) | 35.9% ($2.97) | 50.0% ($2.02) | 67.0% ($0.66) | 70.9% ($0.55) |
| | 57.9 | $2.12 | 819s | — | 74.2% ($1.52) | 30.9% ($2.27) | 50.0% ($4.40) | 62.1% ($2.04) | 72.1% ($0.34) |
| | 56.3 | $1.20 | 596s | 400K | 74.6% ($0.86) | 30.9% ($2.77) | 37.5% ($1.34) | 73.2% ($0.56) | 65.5% ($0.48) |
| | 55.7 | $0.80 | 883s | 10M | 75.4% ($0.63) | 44.1% ($1.24) | 18.8% ($1.52) | 64.0% ($0.50) | 76.4% ($0.12) |
| | 53.0 | $1.57 | 583s | 200K | 74.2% ($1.19) | 36.8% ($1.89) | 12.5% ($2.90) | 68.8% ($0.98) | 72.7% ($0.87) |
| | 49.4 | $0.97 | 1323s | — | 73.4% ($1.06) | 35.3% ($0.58) | 31.2% ($1.96) | 47.3% ($0.91) | 60.0% ($0.36) |
| | 49.2 | $1.09 | 854s | 256K | 68.8% ($0.48) | 32.8% ($1.58) | 18.8% ($2.86) | 61.9% ($0.42) | 63.6% ($0.13) |
| | 49.0 | $1.42 | 1091s | 10M | 70.6% ($0.95) | 36.8% ($1.46) | 25.0% ($3.18) | 68.6% ($1.01) | 44.2% ($0.50) |
| | 49.0 | $0.64 | 726s | 1M | 74.6% ($0.42) | 22.1% ($0.80) | 18.8% ($1.28) | 70.7% ($0.30) | 58.8% ($0.38) |
| | 46.5 | $0.13 | 715s | 1M | 72.6% ($0.10) | 25.0% ($0.15) | 18.8% ($0.29) | 68.1% ($0.07) | 47.9% ($0.02) |
| | 44.6 | $0.36 | 1032s | — | 75.6% ($0.17) | 27.9% ($0.33) | 25.0% ($0.94) | 69.1% ($0.13) | 25.5% ($0.21) |
| | 44.4 | $0.13 | 1124s | 128K | 71.6% ($0.16) | 27.9% ($0.19) | 18.8% ($0.12) | 53.6% ($0.12) | 50.3% ($0.06) |
| | 43.3 | $1.29 | 501s | 1M | 74.4% ($1.03) | 30.9% ($2.24) | 43.8% ($1.88) | 54.0% ($0.87) | 13.3% ($0.41) |
| | 41.0 | $0.44 | 968s | 203K | 73.4% ($0.56) | 22.1% ($0.66) | 6.2% ($0.47) | 49.4% ($0.37) | 53.9% ($0.15) |
| | 39.9 | $0.22 | 1211s | 197K | 68.8% ($0.14) | 16.2% ($0.21) | 12.5% ($0.61) | 61.4% ($0.11) | 40.6% ($0.02) |
| | 39.7 | $1.76 | 1484s | 256K | 69.2% ($2.00) | 32.4% ($2.31) | 6.2% ($2.47) | 47.3% ($1.39) | 43.6% ($0.65) |
| Nemotron-3-Super | 36.2 | $0.57 | 1329s | — | 62.0% ($0.44) | 20.6% ($0.73) | 12.5% ($1.23) | 45.7% ($0.32) | 40.0% ($0.11) |
| | 30.9 | $0.92 | 502s | 262K | 62.4% ($1.26) | 23.5% ($2.09) | 0.0% ($0.01) | 34.9% ($0.97) | 33.9% ($0.28) |
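The page does not state how the two summary columns are derived, but the figures are consistent with an unweighted mean across the five categories. The sketch below checks that assumption against the Nemotron-3-Super row; the `summarize` helper and the dictionary layout are illustrative, not part of the leaderboard's own tooling.

```python
from statistics import mean

# (score in %, cost in $) per category, copied from the Nemotron-3-Super row.
nemotron = {
    "Issue Resolution": (62.0, 0.44),
    "Frontend": (20.6, 0.73),
    "Greenfield": (12.5, 1.23),
    "Testing": (45.7, 0.32),
    "Information Gathering": (40.0, 0.11),
}


def summarize(row):
    """Return (average score, average cost), assuming an unweighted mean."""
    scores = [score for score, _ in row.values()]
    costs = [cost for _, cost in row.values()]
    return round(mean(scores), 1), round(mean(costs), 2)


avg_score, avg_cost = summarize(nemotron)
print(f"Avg Score: {avg_score}, Avg Cost / run: ${avg_cost:.2f}")
```

Under this assumption the result matches the 36.2 and $0.57 listed for that row. Because the mean is unweighted, a weak result in one category (Greenfield here) lowers the overall score as much as a weak result in any other.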
OpenHands Team (2025). OpenHands Index: A Comprehensive Leaderboard for AI Coding Agents. index.openhands.dev