SciCode Leaderboard

| Models | Main Problem Resolve Rate (%) | Subproblem Resolve Rate (%) |
|---|---|---|
| 🥇 OpenAI o3-mini-low | 10.8 | 33.3 |
| 🥈 OpenAI o3-mini-high | 9.2 | 34.4 |
| 🥉 OpenAI o3-mini-medium | 9.2 | 33.0 |
| OpenAI o1-preview | 7.7 | 28.5 |
| Deepseek-R1 | 4.6 | 28.5 |
| Claude3.5-Sonnet | 4.6 | 26.0 |
| Claude3.5-Sonnet (new) | 4.6 | 25.3 |
| Deepseek-v3 | 3.1 | 23.7 |
| Deepseek-Coder-v2 | 3.1 | 21.2 |
| GPT-4o | 1.5 | 25.0 |
| GPT-4-Turbo | 1.5 | 22.9 |
| OpenAI o1-mini | 1.5 | 22.2 |
| Gemini 1.5 Pro | 1.5 | 21.9 |
| Claude3-Opus | 1.5 | 21.5 |
| Llama-3.1-405B-Chat | 1.5 | 19.8 |
| Claude3-Sonnet | 1.5 | 17.0 |
| Qwen2-72B-Instruct | 1.5 | 17.0 |
| Llama-3.1-70B-Chat | 0.0 | 17.0 |
| Mixtral-8x22B-Instruct | 0.0 | 16.3 |
| Llama-3-70B-Chat | 0.0 | 14.6 |

Note: Models are ranked by Main Problem resolve rate. If models tie on that rate, the tie is broken by Subproblem resolve rate.
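
The ranking rule in the note can be sketched as a two-key sort. This is a minimal illustration under assumptions, not the leaderboard's actual code: the `Entry` type and the `rank` function are hypothetical names, and the three sample rows are copied from the table above.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    main: float  # Main Problem resolve rate
    sub: float   # Subproblem resolve rate


def rank(entries: list[Entry]) -> list[Entry]:
    """Order by Main Problem rate, breaking ties by Subproblem rate (both descending)."""
    return sorted(entries, key=lambda e: (e.main, e.sub), reverse=True)


# Sample rows from the table, deliberately out of order.
rows = [
    Entry("OpenAI o3-mini-medium", 9.2, 33.0),
    Entry("OpenAI o3-mini-low", 10.8, 33.3),
    Entry("OpenAI o3-mini-high", 9.2, 34.4),
]

ranked = rank(rows)
for place, entry in enumerate(ranked, start=1):
    print(place, entry.model, entry.main, entry.sub)
```

Here o3-mini-high and o3-mini-medium tie at 9.2 on the Main Problem rate, so the Subproblem rate (34.4 versus 33.0) decides their order.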

How to submit

Want to submit your own model? Open a request as a GitHub issue.