| Rank (overall score) | Model | Type | 💻 Coding | 🧠 Reasoning | 🤖 Agents & Tools | 💬 Conversation | 🔢 Math | 👁️ Multimodal | 🧠 Knowledge | Price | Speed | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
#1 80.9% | GPT 5.4 High | Proprietary | 81.8% * | 83.0% * | 83.8% * | 68.6% * | 88.3% * | 72.8% * | 81.4% * | $8.75 | 75.3 t/s | |
#2 80.3% | Claude Opus 4.6 Thinking | Proprietary | 81.8% * | 78.1% * | — | 79.7% * | — | — | 89.2% * | $15.00 | 67.8 t/s | |
#3 80.2% | Gemini 3.1 Pro Preview | Proprietary | 80.7% * | 81.9% * | — | — | — | 68.2% * | 89.5% * | $7.00 | 130 t/s | |
#7 74.6% | Claude Opus 4.5 Thinking | Proprietary | 80.3% * | 67.6% | 78.0% | 72.3% | 82.6% | 61.0% | 77.6% | $15.00 | 35 t/s | |
#4 77.1% | GPT 5.2 Pro | Proprietary | 80.0% * | 75.7% * | 78.9% | 64.8% * | 88.7% * | 71.3% * | 77.7% * | $94.50 | 28 t/s | |
#9 74.4% | GPT 5.2 | Proprietary | 79.2% * | 73.2% * | 63.8% * | 75.0% * | 85.0% * | 70.2% * | 81.2% * | $7.88 | 187 t/s | |
#12 71.0% | Claude Opus 4.5 | Proprietary | 78.9% * | 64.8% | 70.2% | 69.2% | 79.5% | 56.9% | 73.3% * | $15.00 | 65 t/s | |
#18 68.0% | Claude Sonnet 4.5 Thinking | Proprietary | 78.8% * | 58.4% | 65.8% | 68.5% | 77.2% | 54.4% * | 71.2% | $9.00 | 45 t/s | |
#8 74.5% | Gemini 3 Flash Thinking | Proprietary | 78.6% * | 68.1% | 78.5% * | 72.2% | 78.1% * | 67.5% * | 84.1% * | $1.75 | 180 t/s | |
#14 70.4% | Grok 4.1 Thinking | Proprietary | 78.2% * | 63.1% * | 58.5% * | 67.8% * | 80.6% * | 86.6% * | 78.0% * | $9.00 | 45 t/s | |
#5 74.8% | GPT 5.2 High | Proprietary | 78.2% * | 71.9% | 77.3% | 62.4% | 87.9% | 68.9% | 75.6% | $7.88 | 45 t/s | |
#15 70.2% | Gemini 3 Flash | Proprietary | 78.0% * | 55.6% * | 78.1% * | 71.3% | 69.1% * | 67.3% * | 84.0% | $1.75 | 218 t/s | |
#10 74.2% | Gemini 3 Pro | Proprietary | 77.4% * | 68.3% | 71.5% | 75.2% | 84.5% | 69.8% | 85.7% | $7.00 | 128 t/s | |
#20 65.8% | Claude Sonnet 4.5 | Proprietary | 76.3% * | 54.3% * | 64.4% | 64.1% | 73.9% | 62.8% | 70.9% | $9.00 | 77 t/s | |
#13 70.9% | Kimi K2.5 Thinking | Open Source | 75.4% | 59.4% | — | — | 85.2% * | — | 80.9% * | $1.55 | 45 t/s | |
#17 68.1% | GPT 5.1 High | Proprietary | 75.4% * | 61.7% | 58.8% * | 68.2% | 83.5% | 63.8% | 75.9% | $67.50 | 40 t/s | |
#6 74.7% | Claude Opus 4.6 | Proprietary | 75.2% * | 70.7% * | — * | 78.0% * | — * | — | 89.4% * | $15.00 | 67.8 t/s | |
#16 68.4% | Grok 4.1 | Proprietary | 74.1% * | 58.8% * | 61.1% * | 66.2% * | 79.5% * | 86.2% | 76.6% * | $9.00 | 95 t/s | |
#25 62.7% | Claude Opus 4.1 | Proprietary | 74.0% | 49.3% * | 64.4% | 63.9% | 63.7% | 60.3% * | 66.3% * | $45.00 | 52 t/s | |
#11 72.4% | Gemini 3.1 Pro Preview Base | Proprietary | 73.2% * | 73.7% * | — | — | — * | 61.4% * | 80.9% * | $7.00 | 130 t/s | |
#19 67.7% | Kimi K2 Thinking | Open Source | 73.0% * | 51.7% * | 78.9% * | 61.8% * | 79.4% * | — | 73.6% * | $1.55 | 45 t/s | |
#24 62.9% | OpenAI o3 | Proprietary | 71.4% * | 52.7% * | 58.2% * | 58.3% * | 79.4% * | 59.3% | 76.5% * | $25.00 | 35 t/s | |
#— — | MiniMax M2.1 | Open Source | 70.5% * | — | — | — | — | — | — | $0.75 | 148 t/s | |
#21 65.8% | o4-mini | Proprietary | 70.2% * | 50.1% * | — | — | 84.0% * | 83.0% * | 59.6% * | $10.00 | 100 t/s | |
#23 65.0% | Kimi K2.5 Instant | Open Source | 69.9% * | 54.2% * | — * | — * | 77.2% * | — * | 73.1% * | $1.55 | 85 t/s | |
#29 60.9% | Kimi K2.5 Thinking | Open Source | 68.9% * | 49.1% * | — * | 59.1% * | 76.4% * | — | 48.8% * | $1.55 | 85 t/s | |
#28 61.2% | GPT 5.1 | Proprietary | 68.4% * | 46.8% * | 54.1% * | 73.9% * | 73.0% * | 59.3% * | 79.0% * | $3.75 | 120 t/s | |
#30 60.5% | Qwen3 Max Preview | Proprietary | 68.1% * | 37.0% | 67.4% * | 62.2% * | 76.4% * | 67.0% * | 75.6% * | $3.60 | 85 t/s | |
#27 62.4% | Gemini 2.5 Pro | Proprietary | 67.3% | 55.3% | 53.1% | 64.9% | 77.0% | 61.4% | 78.8% | $3.13 | 165 t/s | |
#26 62.7% | DeepSeek V3.2 Thinking | Open Source | 67.1% | 53.9% | 55.6% | 60.4% | 73.6% | 81.2% | 70.2% | $0.69 | 60 t/s | |
#35 58.8% | MiniMax M2 | Open Source | 65.8% * | 60.2% * | — | 42.2% * | — | — | 55.1% * | $0.75 | 100 t/s | |
#36 58.5% | Kimi K2 | Open Source | 65.6% * | 46.7% * | — | 56.3% * | 76.1% * | — | 46.5% * | $1.55 | 85 t/s | |
#34 59.6% | DeepSeek V3.2 | Open Source | 65.5% | 48.8% | 55.8% | 54.8% | 66.2% | 81.7% | 68.8% | $0.69 | 120 t/s | |
#33 59.6% | Qwen3 235B | Open Source | 65.1% * | 52.9% * | 55.3% * | 59.4% * | 73.2% * | 50.3% * | 73.1% * | Free | 75 t/s | |
#31 60.2% | OpenAI o3-mini | Proprietary | 61.3% * | 53.3% | 58.9% * | — | 79.4% | — | 52.5% * | $2.75 | 115 t/s | |
#38 56.5% | Longcat Flash Chat | Open Source | 60.0% * | 36.5% | 67.1% * | 57.6% * | 81.1% * | — | 42.7% * | $0.45 | 100 t/s | |
#32 59.8% | DeepSeek R1 | Open Source | 59.8% | 49.1% | 55.1% * | 60.6% | 77.4% | 79.3% | 68.1% | $1.37 | 85 t/s | |
#22 65.7% | Claude Sonnet 4.6 Thinking | Proprietary | 57.5% * | 72.5% * | — | — | 60.2% | 69.4% * | 88.3% * | $9.00 | 45 t/s | |
#42 48.8% | Mistral Large 3 | Open Source | 56.6% * | 24.9% | — | 57.0% * | 75.3% * | — | 62.6% * | $1.00 | 90 t/s | |
#41 52.0% | Qwen3 32B | Open Source | 53.8% * | 46.8% * | 48.0% * | 44.9% * | 68.4% * | 62.4% | 53.4% * | Free | 145 t/s | |
#39 53.5% | Gemini 2.5 Flash | Proprietary | 53.8% * | 44.5% * | 51.1% * | 59.8% * | 71.0% * | 49.2% * | 65.2% * | $0.38 | 372 t/s | |
#43 45.6% | Llama 4 Maverick | Open Source | 53.1% * | 38.3% * | 49.7% * | 41.7% * | 46.4% * | 36.5% * | 54.9% * | Free | 155 t/s | |
#37 58.5% | Claude Sonnet 4.6 | Proprietary | 51.9% * | 62.5% * | — * | — | 54.2% * | 62.7% * | 83.7% * | $9.00 | 77 t/s | |
#— — | MiniMax M2.5 | Open Source | 51.4% | 51.1% * | — | — | — | — | — | $0.75 | 39.3 t/s | |
#40 53.2% | GPT-4.5 | Proprietary | 48.7% * | 42.7% * | 51.7% * | 67.2% * | 68.5% * | 55.4% * | 75.0% * | $7.50 | 85 t/s | |
#45 38.3% | Llama 4 Scout | Open Source | 42.5% * | 32.6% * | 42.0% * | 38.6% * | 38.5% * | 28.4% * | 50.3% * | Free | 2.6k t/s | |
#44 42.0% | GPT-4o | Proprietary | 40.7% * | 34.1% * | 48.5% * | 45.6% * | 44.9% * | 40.9% * | 56.5% * | $6.25 | 110 t/s | |
#— — | Grok 4.20 Thinking | Proprietary | 36.2% | — | — | — | — | — | — | $9.00 | 100 t/s | |
#— — | Grok 4.20 | Proprietary | 32.5% * | — * | — | — | — * | — | — * | $9.00 | 100 t/s | |
#— — | MiMo v2 Pro | Proprietary | — | — | — | — | — | — | — | $2.00 | 94.5 t/s | |
#— — | MiniMax M2.7 | Open Source | — | — | — | — | — | — | — | $0.75 | 53.6 t/s | |
#— — | Qwen 3.5 Plus | Proprietary | — | — | — | — | — | — | — | $1.36 | 85.3 t/s | |