Leaderboard

| Model | Main Problem (%) | Subproblems (%) |
| --- | --- | --- |
| 🥇 | **10.8** | 33.3 |
| 🥈 OpenAI o3-mini-high | **9.2** | 34.4 |
| 🥉 OpenAI o3-mini-medium | **9.2** | 33.0 |
| OpenAI o1-preview | **7.7** | 28.5 |
| Deepseek-R1 | **4.6** | 28.5 |
| Claude3.5-Sonnet | **4.6** | 26.0 |
| Claude3.5-Sonnet (new) | **4.6** | 25.3 |
| Deepseek-v3 | **3.1** | 23.7 |
| Deepseek-Coder-v2 | **3.1** | 21.2 |
| GPT-4o | **1.5** | 25.0 |
| GPT-4-Turbo | **1.5** | 22.9 |
| OpenAI o1-mini | **1.5** | 22.2 |
| Gemini 1.5 Pro | **1.5** | 21.9 |
| Claude3-Opus | **1.5** | 21.5 |
| Llama-3.1-405B-Chat | **1.5** | 19.8 |
| Claude3-Sonnet | **1.5** | 17.0 |
| Qwen2-72B-Instruct | **1.5** | 17.0 |
| Llama-3.1-70B-Chat | **0.0** | 17.0 |
| Mixtral-8x22B-Instruct | **0.0** | 16.3 |
| Llama-3-70B-Chat | **0.0** | 14.6 |
**Note: If models tie on the Main Problem resolve rate, the Subproblem resolve rate is used as the tiebreaker.**
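The ranking rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the leaderboard's actual scoring code; the entries below are a sample of the rows from the table.

```python
# Rank models by Main Problem resolve rate, breaking ties with the
# Subproblem resolve rate (both sorted in descending order).
entries = [
    ("OpenAI o3-mini-medium", 9.2, 33.0),
    ("OpenAI o3-mini-high", 9.2, 34.4),
    ("Claude3.5-Sonnet", 4.6, 26.0),
    ("Deepseek-R1", 4.6, 28.5),
]

# The key is a (main, sub) tuple, so ties on main fall back to sub.
ranked = sorted(entries, key=lambda e: (e[1], e[2]), reverse=True)

for rank, (model, main, sub) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: main={main}, sub={sub}")
```

Both o3-mini variants tie at 9.2 on the main problem, so the high variant's 34.4 subproblem score places it first.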
How to submit
Want to submit your own model? Open a request via a GitHub issue.