+description: Grade School Math -- for every problem we generate a plan, then execute and evaluate it.
+defs:
+  problems:
+    read: ./test.jsonl
+    parser: jsonl
+
+  MAX_ITERATIONS: 50
+
+  planning:
+    function:
+      problem: str
+    return:
+      text:
+      - >
+        Please generate a high-level plan for solving the following question.
+        As the first step, just say what method and idea you will use to solve the question.
+        You can reorganize the information in the question. Do not do the actual calculation.
+        Keep your response concise and within 80 words.
+        Question:
+      - ${ problem }
+      - "\nThe plan is:\n"
+      - model: ollama/granite3.2:8b
+
+  solve:
+    function:
+      plan: str
+    return:
+      text:
+      - ${ plan }
+      - >
+        The plan looks good! Now, use real numbers and do the calculation. Please solve the question
+        step-by-step according to the high-level plan. Give me the final answer. Make your response short.
+      - "\nThe answer is:\n"
+      - model: ollama/granite3.2:8b
+
+  extract_final_answer:
+    function:
+      solution: str
+    return:
+      lastOf:
+      - ${ solution }
+      - Extract the result from the above solution into a JSON object with field "result" and a float as value. Remove any dollar signs or other symbols.
+      - model: ollama/granite3.2:8b
+        parser: json
+        def: result
+        spec: { "result": float }
+        fallback:
+          data:
+            result: 0
+
+  compare_to_ground_truth:
+    function:
+      result: obj
+      truth: str
+    return:
+      lastOf:
+      - data: ${ truth }
+        parser:
+          regex: "(.|\n)*#### (?P<answer>([0-9])*)\n*"
+          spec:
+            answer: str
+        def: ground_truth
+      - if: ${ result.result|float == ground_truth.answer|float}
+        then:
+          1
+        else:
+          0
+
+text:
+- for:
+    problem: ${ problems }
+  repeat:
+    call: ${ planning }
+    args:
+      pdl_context: []
+      problem: ${ problem.question }
+  max_iterations: ${ MAX_ITERATIONS }
+  def: plans
+  join:
+    as: array
+
+- for:
+    plan: ${ plans }
+  repeat:
+    call: ${ solve }
+    args:
+      pdl_context: []
+      plan: ${ plan }
+  max_iterations: ${ MAX_ITERATIONS }
+  def: solutions
+  join:
+    as: array
+
+- for:
+    solution: ${ solutions }
+  repeat:
+    call: ${ extract_final_answer }
+    args:
+      pdl_context: []
+      solution: ${ solution }
+  max_iterations: ${ MAX_ITERATIONS }
+  def: results
+  join:
+    as: array
+
+- for:
+    result: ${ results }
+    problem: ${ problems[:MAX_ITERATIONS] }
+  repeat:
+    call: ${ compare_to_ground_truth }
+    args:
+      pdl_context: []
+      result: ${ result }
+      truth: ${ problem.answer }
+  max_iterations: ${ MAX_ITERATIONS }
+  def: stats
+  join:
+    as: array
+
+- "\nAccuracy: ${ stats|sum / MAX_ITERATIONS * 100}% "
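
For readers who want to check the scoring logic outside the PDL runtime, the comparison and accuracy steps can be approximated in plain Python. This is a minimal sketch, not the program's actual implementation: the function names (`extract_ground_truth`, `score`, `accuracy`) and the sample answer text are hypothetical. The pattern is a simplified form of the one in `compare_to_ground_truth`, and using `re.search` is an assumption about how the match is applied.

```python
import re

# Simplified form of the pattern in compare_to_ground_truth: the digits that
# follow the "#### " marker at the end of a GSM8K-style answer (assumption).
GROUND_TRUTH_RE = re.compile(r"#### (?P<answer>[0-9]*)")


def extract_ground_truth(answer_text: str) -> float:
    """Return the number that follows '#### ' in the reference answer."""
    match = GROUND_TRUTH_RE.search(answer_text)
    if match is None or match.group("answer") == "":
        raise ValueError("no '#### <digits>' marker found in the answer text")
    return float(match.group("answer"))


def score(result: dict, answer_text: str) -> int:
    """Return 1 if the extracted result equals the ground truth, else 0."""
    return 1 if float(result["result"]) == extract_ground_truth(answer_text) else 0


def accuracy(stats: list, total: int) -> float:
    """Percentage of correct answers, mirroring stats|sum / MAX_ITERATIONS * 100."""
    return sum(stats) / total * 100


# Hypothetical answer text in the "#### <number>" style, for illustration only.
sample = "She has 3 + 4 = 7 apples.\n#### 7"
print(score({"result": 7.0}, sample))
print(accuracy([1, 0, 1, 1], 4))
```

Unlike the PDL version, this sketch raises an error when no digits follow the marker instead of silently comparing against a default value.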