SciCode Leaderboard

| Model | Main Problem Resolve Rate (%) | Subproblem Resolve Rate (%) |
|---|---|---|
| 🥇 OpenAI o1-preview | 7.7 | 28.5 |
| 🥈 Claude3.5-Sonnet | 4.6 | 26.0 |
| 🥉 Claude3.5-Sonnet (new) | 4.6 | 25.3 |
| Deepseek-Coder-v2 | 3.1 | 21.2 |
| GPT-4o | 1.5 | 25.0 |
| GPT-4-Turbo | 1.5 | 22.9 |
| OpenAI o1-mini | 1.5 | 22.2 |
| Gemini 1.5 Pro | 1.5 | 21.9 |
| Claude3-Opus | 1.5 | 21.5 |
| Llama-3.1-405B-Chat | 1.5 | 19.8 |
| Claude3-Sonnet | 1.5 | 17.0 |
| Qwen2-72B-Instruct | 1.5 | 17.0 |
| Llama-3.1-70B-Chat | 0.0 | 17.0 |
| Mixtral-8x22B-Instruct | 0.0 | 16.3 |
| Llama-3-70B-Chat | 0.0 | 14.6 |

Note: Models are ranked by Main Problem resolve rate. When two models tie on that rate, the tie is broken by their Subproblem resolve rate.
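The ranking rule above amounts to a two-key sort: main problem rate first, subproblem rate second, both in descending order. The following is a minimal sketch of that ordering, assuming both rates are stored as percentages; the `Entry` type and `rank` function are hypothetical illustrations, not part of any official SciCode tooling.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    main_rate: float  # Main Problem resolve rate (%)
    sub_rate: float   # Subproblem resolve rate (%)


def rank(entries: list[Entry]) -> list[Entry]:
    """Order by main rate, breaking ties by subproblem rate (both descending).

    Python's sorted() is stable, so rows that tie on both keys keep
    their input order.
    """
    return sorted(entries, key=lambda e: (-e.main_rate, -e.sub_rate))


# A few rows taken from the table, deliberately out of order.
entries = [
    Entry("GPT-4o", 1.5, 25.0),
    Entry("Claude3.5-Sonnet (new)", 4.6, 25.3),
    Entry("Deepseek-Coder-v2", 3.1, 21.2),
    Entry("Claude3.5-Sonnet", 4.6, 26.0),
]

for position, e in enumerate(rank(entries), start=1):
    print(position, e.model, e.main_rate, e.sub_rate)
```

This reproduces the table's ordering for these rows. For example, both Claude3.5-Sonnet variants score 4.6 on main problems, so the one with the higher subproblem rate (26.0) ranks first. GPT-4o's higher subproblem rate (25.0) does not lift it above Deepseek-Coder-v2, because the main problem rate is compared first.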

How to submit

Want to submit your own model? Head over to the documentation.