`_posts/2025-01-10-paper-review-cot.md`
We have seen a range of benefits when increasing the size of the Language Model, including:
2. in-context few-shot learning via prompting, where one can "prompt" a model with a few input-output exemplars demonstrating the task (this has been successful with question-answering tasks)
What is few-shot learning?
A machine learning technique where a model is trained to make predictions from only a very small amount of labeled data.
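As a quick illustration (my own sketch, not from the paper), a standard few-shot prompt simply concatenates a handful of input-output exemplars ahead of the new question; the exemplars and helper below are invented for this example, and the chain-of-thought format described next extends this by adding reasoning steps to each exemplar.

```python
# Sketch of standard few-shot prompting (illustrative only): each exemplar is
# an (input, output) pair that the model sees before the unanswered question.
exemplars = [
    ("Q: What is 4 + 5?", "A: 9"),
    ("Q: What is 12 - 7?", "A: 5"),
]

def build_few_shot_prompt(pairs, question):
    """Concatenate the labeled exemplars, then append the new question."""
    parts = [f"{q}\n{a}" for q, a in pairs]
    parts.append(f"{question}\nA:")
    return "\n\n".join(parts)

print(build_few_shot_prompt(exemplars, "Q: What is 8 + 6?"))
```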
Each prompt consists of triples: <input, chain-of-thought, output> <br>
A chain-of-thought is a series of intermediate natural language reasoning steps that lead to the final output. A prompting-only approach is important because it does not require a large training dataset, and a single model checkpoint can perform many tasks without loss of generality.
We can think of this prompting as similar to how humans think when breaking down a problem with multiple layers: <br>
*After Jane gives 2 flowers to her mom she has 10...then after she gives 3 to her dad she will have 7...so the answer is 7.*
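To show how the <input, chain-of-thought, output> triples described above could be turned into a prompt, here is a minimal sketch (my own illustration, not the paper's code) that uses the flower example as the lone exemplar:

```python
# Sketch of a chain-of-thought prompt built from <input, chain-of-thought, output>
# triples; the exemplar paraphrases the flower example and is illustrative only.
cot_exemplars = [
    (
        "Q: Jane has 12 flowers. She gives 2 to her mom and 3 to her dad. "
        "How many flowers does she have left?",
        "After Jane gives 2 flowers to her mom she has 10. "
        "Then after she gives 3 to her dad she will have 7.",
        "The answer is 7.",
    ),
]

def build_cot_prompt(triples, question):
    """Unlike plain few-shot prompting, each exemplar shows the reasoning
    steps (the chain of thought) before its final answer."""
    parts = [f"{inp}\nA: {chain} {answer}" for inp, chain, answer in triples]
    parts.append(f"{question}\nA:")
    return "\n\n".join(parts)

print(build_cot_prompt(cot_exemplars, "Q: Tom has 9 apples and eats 4. How many are left?"))
```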
Properties that make this method attractive: <br>
* Allows models to decompose multi-step problems into intermediate steps, which means additional computation can be allocated to problems that require more reasoning steps
* Provides a window into how the model thinks, giving us an idea of how it arrived at a specific output from a specific input
* Can be used for tasks such as math word problems, commonsense reasoning, and symbolic manipulation
* Can be readily elicited from models simply by including chain-of-thought exemplars in the prompt
Important results
The study finds that this prompting is an emergent ability, meaning it only becomes apparent at larger model scales (roughly 100B parameters or more); smaller models do not benefit. The prompting yields larger gains on more complex problems, as seen with the **GSM8K** dataset, while gains are negative or nonexistent on the simpler **SingleOp** dataset. <br>
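To make the dataset comparison concrete, below is a rough sketch (not from the paper) of how answers on a benchmark like GSM8K are typically scored: the final number is extracted from the generated chain of thought and compared to the gold answer by exact match. The `generate` argument is a hypothetical stand-in for whatever model call is used.

```python
import re

def extract_final_answer(completion):
    """Take the last number mentioned in a completion,
    e.g. '...so the answer is 7.' -> '7'."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def exact_match_accuracy(problems, generate):
    """problems: list of (prompt, gold_answer) pairs.
    generate: hypothetical function that calls the language model and
    returns its text completion (not from the paper)."""
    correct = 0
    for prompt, gold in problems:
        prediction = extract_final_answer(generate(prompt))
        correct += prediction == str(gold)
    return correct / len(problems) if problems else 0.0
```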
Why does CoT prompting work?
* Clarifies complex problems: the model reasons through each step logically
* Improves accuracy: errors are reduced in larger models
* Activates pretrained knowledge: sequential reasoning helps the model utilize its prior training