SciCode Leaderboard
Model | Main Problem Resolve Rate (%) | Subproblem Resolve Rate (%)
---|---|---
🥇 OpenAI o1-preview | 7.7 | 28.5
🥈 Claude3.5-Sonnet | 4.6 | 26.0
🥉 Claude3.5-Sonnet (new) | 4.6 | 25.3
Deepseek-Coder-v2 | 3.1 | 21.2
GPT-4o | 1.5 | 25.0
GPT-4-Turbo | 1.5 | 22.9
OpenAI o1-mini | 1.5 | 22.2
Gemini 1.5 Pro | 1.5 | 21.9
Claude3-Opus | 1.5 | 21.5
Llama-3.1-405B-Chat | 1.5 | 19.8
Claude3-Sonnet | 1.5 | 17.0
Qwen2-72B-Instruct | 1.5 | 17.0
Llama-3.1-70B-Chat | 0.0 | 17.0
Mixtral-8x22B-Instruct | 0.0 | 16.3
Llama-3-70B-Chat | 0.0 | 14.6
Note: When models tie on the Main Problem resolve rate, ties are broken by the Subproblem resolve rate.
How to submit
Want to submit your own model? Head over to the documentation.