Commit 8c74c76

Deployed 69dac59 with MkDocs version: 1.6.0
1 parent 2aae336 commit 8c74c76

2 files changed: +26 -3 lines changed

index.html

+25 -2
@@ -440,7 +440,16 @@
 </span>
 </a>
 
-<nav class="md-nav" aria-label="Benchmark Statistics">
+</li>
+
+<li class="md-nav__item">
+<a href="#experiment-results" class="md-nav__link">
+<span class="md-ellipsis">
+Experiment Results
+</span>
+</a>
+
+<nav class="md-nav" aria-label="Experiment Results">
 <ul class="md-nav__list">
 
 <li class="md-nav__item">
@@ -863,7 +872,16 @@
 </span>
 </a>
 
-<nav class="md-nav" aria-label="Benchmark Statistics">
+</li>
+
+<li class="md-nav__item">
+<a href="#experiment-results" class="md-nav__link">
+<span class="md-ellipsis">
+Experiment Results
+</span>
+</a>
+
+<nav class="md-nav" aria-label="Experiment Results">
 <ul class="md-nav__list">
 
 <li class="md-nav__item">
@@ -1170,6 +1188,11 @@ <h2 id="benchmark-statistics">Benchmark Statistics</h2>
 </table>
 <p><a class="glightbox" href="figures/SciCode_chart.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/SciCode_chart.png" /></a>
 <strong>Left:</strong> Distribution of Main Problems <strong>Right:</strong> Distribution of Subproblems</p>
+<h2 id="experiment-results">Experiment Results</h2>
+<p>We evaluate our model using zero-shot prompts. We keep the prompts general and design different ones for different evaluation setups only to inform the model about the tasks. We keep prompts the same across models and fields, and they contain the model’s main and sub-problem instructions and code for previous subproblems. The standard setup means the model is tested without background knowledge and carrying over generated solutions to previous subproblems. The scientists' annotated background provides the necessary knowledge and reasoning steps to solve the problems, shifting the evaluation’s focus more towards the models’ coding and instruction-following capabilities.
+<a class="glightbox" href="figures/Standard_Setup.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/Standard_Setup.png" /></a>
+<a class="glightbox" href="figures/Standard_Background.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/Standard_Background.png" /></a>
+<a class="glightbox" href="figures/Performance_Gain.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="Image Title" src="figures/Performance_Gain.png" /></a></p>
 <h3 id="numerical-linear-algebra">Numerical Linear Algebra</h3>
 <p>1_Conjugate_Gradient</p>
 <p>3_Gauss_Seidel</p>
