faq/index.html
+5 -8
@@ -594,15 +594,12 @@ <h1 id="faq">FAQ</h1>
594
594
The role of out-of-domain validation is not to ensure scientific correctness but rather to verify that the problem is presented in a way that is clear, complete, and accessible to someone outside the specific field. The primary focus of out-of-domain validators is to ensure that the combination of the question and its background information provides enough context for someone to solve the problem, even if they lack domain-specific expertise.</p>
595
595
</li>
596
596
<li>
597
-
<p>**It seems that in the prompt template definition, the prompts with and without backgrounds are assigned the other way around:</p>
597
+
<p><strong>It seems that in the prompt template definition, the prompts with and without backgrounds are assigned the other way around:
Are the numbers reported in the paper run with these prompts?</strong></p>
601
+
<p>Yes, DEFAULT_PROMPT_TEMPLATE is our standard setup where we ask the model to generate the related background itself. BACKGOUND_PROMPT_TEMPLATE is the template where we will put in the scientist-annotated background.</p>
<p>Are the numbers reported in the paper run with these prompts?**</p>
603
-
<pre><code>Yes, DEFAULT_PROMPT_TEMPLATE is our standard setup where we ask the model to generate the related background itself. BACKGOUND_PROMPT_TEMPLATE is the template where we will put in the scientist-annotated background.
604
-
</code></pre>
605
-
<ul>
606
603
<li>
607
604
<p><strong>For subproblems 13.6, 62.1, 76.2, it seems like the model-generated outputs are ignored and replaced with the files in the eval folder - is this how the evaluations were run in the paper? And why are these problems evaluated this way?</strong>
608
605
These three problem-code pairs are provided as given context in order to control uncertainty and reduce the degrees of freedom in the evaluation process. By doing so, we limit the model’s randomness in problem-solving. Without this context, the evaluation would allow for too many possible solutions, leading to inconsistent results.</p>