Commit f4e280d

Deployed 8d9add1 with MkDocs version: 1.6.0
1 parent 9ad2c5d commit f4e280d

2 files changed: +11 -10 lines changed

index.html

+10 -9
@@ -593,13 +593,13 @@
 </li>

 <li class="md-nav__item">
-<a href="#example-problem" class="md-nav__link">
+<a href="#example-problem-calculate-chern-numbers-for-the-haldane-model" class="md-nav__link">
 <span class="md-ellipsis">
-Example Problem
+Example Problem: Calculate Chern numbers for the Haldane Model
 </span>
 </a>

-<nav class="md-nav" aria-label="Example Problem">
+<nav class="md-nav" aria-label="Example Problem: Calculate Chern numbers for the Haldane Model">
 <ul class="md-nav__list">

 <li class="md-nav__item">
@@ -1016,13 +1016,13 @@
 </li>

 <li class="md-nav__item">
-<a href="#example-problem" class="md-nav__link">
+<a href="#example-problem-calculate-chern-numbers-for-the-haldane-model" class="md-nav__link">
 <span class="md-ellipsis">
-Example Problem
+Example Problem: Calculate Chern numbers for the Haldane Model
 </span>
 </a>

-<nav class="md-nav" aria-label="Example Problem">
+<nav class="md-nav" aria-label="Example Problem: Calculate Chern numbers for the Haldane Model">
 <ul class="md-nav__list">

 <li class="md-nav__item">
@@ -1131,9 +1131,10 @@ <h1 id="scicode-a-research-coding-benchmark-curated-by-scientists">SciCode: A Re
 </ul>
 </div>
 <h2 id="introduction">Introduction</h2>
-<p>SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of <strong>6</strong> domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains <strong>338</strong> subproblems decomposed from <strong>80</strong> challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only <strong>4.6%</strong> of the problems in the most realistic setting. </p>
+<p>SciCode is a challenging benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of <strong>6</strong> domains: Physics, Math, Material Science, Biology, and Chemistry. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains <strong>338</strong> subproblems decomposed from <strong>80</strong> challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only <strong>4.6%</strong> of the problems in the most realistic setting. </p>
 <h2 id="overview">Overview</h2>
-<p><a class="glightbox" href="figures/SciCode_example_problem.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/SciCode_example_problem.png" /></a></p>
+<p>SciCode sources challenging and realistic research-level coding problems across 6 natural science disciplines, covering a total of 16 subfields. This diverse selection ensures a comprehensive representation of the natural sciences, where extensive code development is essential. SciCode is mainly drawn from the scripts that scientists use in their everyday workflow. Many of these have been used in one or more publications, demonstrating their robustness and correctness. Among various coding necessities, SciCode mainly focuses on 1. Numerical methods 2. Simulation of systems 3. Scientific calculation. These are the tasks we believe require intense scientific knowledge and reasoning to best test an LM’s scientific capability. In designing test cases for evaluation, we incorporate domain-specific test cases in addition to numerical cases. These tests are extracted from real scientific workflows: scientists must design domain-specific test cases to verify code accuracy by reproducing results published in papers or matching analytical solutions derived from theoretical models. Each problem goes through 3 rounds of validation (i.e., by in-domain scientists, out-of-domain scientists, GPT4) for quality control.
+<a class="glightbox" href="figures/SciCode_example_problem.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/SciCode_example_problem.png" /></a></p>
 <h2 id="benchmark-statistics">Benchmark Statistics</h2>
 <table>
 <thead>
@@ -1262,7 +1263,7 @@ <h3 id="molecular-modeling">Molecular Modeling</h3>
 4.
 5.
 6.</p>
-<h2 id="example-problem">Example Problem</h2>
+<h2 id="example-problem-calculate-chern-numbers-for-the-haldane-model">Example Problem: Calculate Chern numbers for the Haldane Model</h2>
 <h3 id="main-problem-and-dependencies">Main Problem and Dependencies</h3>
 <p><strong>1. Generate an array of Chern numbers for the Haldane model on a hexagonal lattice by sweeping the following parameters: the on-site energy to next-nearest-neighbor coupling constant ratio (<span class="arithmatex">\(m/t_2\)</span> from -6 to 6 with <span class="arithmatex">\(N\)</span> samples) and the phase (<span class="arithmatex">\(\phi\)</span> from -<span class="arithmatex">\(\pi\)</span> to <span class="arithmatex">\(\pi\)</span> with <span class="arithmatex">\(N\)</span> samples) values. Given the lattice spacing <span class="arithmatex">\(a\)</span>, the nearest-neighbor coupling constant <span class="arithmatex">\(t_1\)</span>, the next-nearest-neighbor coupling constant <span class="arithmatex">\(t_2\)</span>, the grid size <span class="arithmatex">\(\delta\)</span> for discretizing the Brillouin zone in the <span class="arithmatex">\(k_x\)</span> and <span class="arithmatex">\(k_y\)</span> directions (assuming the grid sizes are the same in both directions), and the number of sweeping grid points <span class="arithmatex">\(N\)</span> for <span class="arithmatex">\(m/t_2\)</span> and <span class="arithmatex">\(\phi\)</span>.</strong></p>
 <p><div class="highlight"><pre><span></span><code><span class="sd">&#39;&#39;&#39;</span>
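The diff cuts off before the benchmark's own solution, so for orientation here is a minimal sketch of one common way to compute a Chern number for a Haldane-type two-band model: the lattice (plaquette) method of Fukui, Hatsugai and Suzuki. It is not the SciCode gold-standard solution. The function names (`haldane_lower_band`, `chern_number`), the parameter `nk`, and the use of reduced coordinates `theta1`, `theta2` (phases along the two reciprocal-lattice directions, so the lattice spacing \(a\) and grid size \(\delta\) from the problem statement do not appear) are all assumptions made for illustration. The overall sign of the result depends on orientation conventions.

```python
import numpy as np


def haldane_lower_band(theta1, theta2, m, phi, t1=1.0, t2=1.0):
    """Lower-band eigenvectors of a Haldane-type Bloch Hamiltonian.

    Assumed convention: theta_j = k . A_j for lattice vectors A_1, A_2, so
    the Hamiltonian is exactly 2*pi-periodic in both arguments.
    """
    # Nearest-neighbor hopping (off-diagonal term).
    f = t1 * (1.0 + np.exp(1j * theta1) + np.exp(1j * theta2))
    # Next-nearest-neighbor sum over three bonds with the same circulation.
    g = np.sin(theta1) - np.sin(theta2) + np.sin(theta2 - theta1)
    dz = m - 2.0 * t2 * np.sin(phi) * g
    # The term proportional to the identity is omitted: it shifts the
    # energies but does not change the eigenvectors.
    h = np.empty(np.shape(theta1) + (2, 2), dtype=complex)
    h[..., 0, 0] = dz
    h[..., 0, 1] = f
    h[..., 1, 0] = np.conj(f)
    h[..., 1, 1] = -dz
    _, vecs = np.linalg.eigh(h)  # eigenvalues are returned in ascending order
    return vecs[..., :, 0]


def chern_number(m, phi, t1=1.0, t2=1.0, nk=30):
    """Chern number of the lower band from plaquette Berry fluxes."""
    theta = 2.0 * np.pi * np.arange(nk) / nk
    th1, th2 = np.meshgrid(theta, theta, indexing="ij")
    u = haldane_lower_band(th1, th2, m, phi, t1, t2)  # shape (nk, nk, 2)

    def link(a, b):
        # Normalized overlap <a|b> between neighboring grid points.
        overlap = np.sum(np.conj(a) * b, axis=-1)
        return overlap / np.abs(overlap)

    # Periodic shifts by one grid step along each direction.
    u_1 = np.roll(u, -1, axis=0)
    u_2 = np.roll(u, -1, axis=1)
    u_12 = np.roll(u_1, -1, axis=1)

    # Berry flux through each plaquette, taken on the principal branch.
    flux = np.angle(
        link(u, u_1) * link(u_1, u_12)
        * np.conj(link(u_2, u_12)) * np.conj(link(u, u_2))
    )
    return flux.sum() / (2.0 * np.pi)
```

Under these assumptions the sweep described in the problem statement reduces to calling `chern_number(m, phi, t1, t2)` for each pair on an \(N \times N\) grid of \(m/t_2\) and \(\phi\) values and rounding the result to the nearest integer. Points that sit exactly on a gap closing are ill-defined for this method and would need separate handling.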
