# SciCode Leaderboard

| Model                    | Main Problem Resolve Rate (%) | Subproblem Resolve Rate (%) |
|--------------------------|-------------------------------|-----------------------------|
| 🥇 OpenAI o3-mini-low    | **10.8**                      | 33.3                        |
| 🥈 OpenAI o3-mini-high   | **9.2**                       | 34.4                        |
| 🥉 OpenAI o3-mini-medium | **9.2**                       | 33.0                        |
| OpenAI o1-preview        | **7.7**                       | 28.5                        |
| Deepseek-R1              | **4.6**                       | 28.5                        |
| Claude3.5-Sonnet         | **4.6**                       | 26.0                        |
| Claude3.5-Sonnet (new)   | **4.6**                       | 25.3                        |
| Deepseek-v3              | **3.1**                       | 23.7                        |
| Deepseek-Coder-v2        | **3.1**                       | 21.2                        |
| GPT-4o                   | **1.5**                       | 25.0                        |
| GPT-4-Turbo              | **1.5**                       | 22.9                        |
| OpenAI o1-mini           | **1.5**                       | 22.2                        |
| Gemini 1.5 Pro           | **1.5**                       | 21.9                        |
| Claude3-Opus             | **1.5**                       | 21.5                        |
| Llama-3.1-405B-Chat      | **1.5**                       | 19.8                        |
| Claude3-Sonnet           | **1.5**                       | 17.0                        |
| Qwen2-72B-Instruct       | **1.5**                       | 17.0                        |
| Llama-3.1-70B-Chat       | **0.0**                       | 17.0                        |
| Mixtral-8x22B-Instruct   | **0.0**                       | 16.3                        |
| Llama-3-70B-Chat         | **0.0**                       | 14.6                        |

**Note: If models tie on the Main Problem resolve rate, ties are broken by the Subproblem resolve rate.**
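The tie-breaking rule amounts to sorting on a two-part key: the Main Problem resolve rate first, then the Subproblem rate. A minimal sketch of that ordering (with made-up entries for illustration, not the official evaluation code):

```python
# Hypothetical (model, main_rate, sub_rate) tuples; not the real data pipeline.
entries = [
    ("GPT-4o", 1.5, 25.0),
    ("OpenAI o3-mini-low", 10.8, 33.3),
    ("Claude3-Sonnet", 1.5, 17.0),
]

# Rank by Main Problem resolve rate first; break ties with the Subproblem rate.
ranked = sorted(entries, key=lambda e: (e[1], e[2]), reverse=True)

for name, main, sub in ranked:
    print(f"{name}: main={main}, sub={sub}")
```

Here GPT-4o and Claude3-Sonnet tie at 1.5 on the main metric, so GPT-4o's higher Subproblem rate (25.0 vs. 17.0) places it ahead.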

## How to submit

Want to submit your own model? Open a request as a GitHub issue.