Commit a1cd2ec

Deployed bec0f80 with MkDocs version: 1.6.0
1 parent 9e59bc3 commit a1cd2ec

1 file changed (+1 -1 lines changed)

index.html

+1-1
@@ -1133,7 +1133,7 @@ <h1 id="scicode-a-research-coding-benchmark-curated-by-scientists">SciCode: A Re
 <h2 id="introduction">Introduction</h2>
 <p>SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of <strong>6</strong> domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains <strong>338</strong> subproblems decomposed from <strong>80</strong> challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only <strong>4.6%</strong> of the problems in the most realistic setting. </p>
 <h2 id="overview">Overview</h2>
-<p><a class="glightbox" href="https://github.com/scicode-bench/website-draft/blob/main/docs/figures/SciCode_example_problem.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="alt text" src="https://github.com/scicode-bench/website-draft/blob/main/docs/figures/SciCode_example_problem.png" /></a></p>
+<p><a class="glightbox" href="https://github.com/scicode-bench/website-draft/blob/main/docs/figures/SciCode_example_problem.png/600x400/" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" loading="lazy" src="https://github.com/scicode-bench/website-draft/blob/main/docs/figures/SciCode_example_problem.png/600x400/" /></a></p>
 <h2 id="benchmark-statistics">Benchmark Statistics</h2>
 <table>
 <thead>
