Learning Log

This log is organized by learning progress instead of assumed calendar days.

Why this format:

one real-world day may include multiple chapters
a chapter may span multiple sessions
using fixed dates per chapter created inaccurate future-dated entries

Logging rule going forward:

record by Session or Chapter milestone
only use an explicit date when it is certain from the current session
prefer Last updated over inventing a new study date

Last updated: 2026-04-07

Session 01

Focus

Introduction

What I studied

what prompt engineering is and why it matters
the guide's experiment setup: gpt-3.5-turbo, temperature=1, top_p=1
why prompt results vary across models and settings

Key insights

prompt engineering is not just writing better prompts; it is the systematic design of inputs for LLM tasks
prompt quality should be evaluated with an experimental mindset because outputs change with model and sampling settings
temperature controls randomness; top_p controls how much of the probability mass is considered during token sampling
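
A minimal sketch of the temperature insight above, assuming the usual formulation in which logits are divided by the temperature before the softmax. The function name and the three example logits are illustrative assumptions, not taken from the guide or from any particular API.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, and higher temperatures flatten the distribution, which is why sampled output becomes more varied.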

Confusions / open questions

why this guide uses temperature=1 and top_p=1 as defaults
when to change temperature versus when to change top_p

Session 02

Focus

Introduction review
LLM Settings warm-up

What I studied

reviewed Introduction through self-test and explanation in my own words
clarified the difference between temperature and top_p
confirmed when to lower temperature for stable structured outputs
started the LLM Settings chapter warm-up

Key insights

prompt engineering is about designing inputs for real tasks, not only polishing wording
lower temperature is better for predictable outputs such as JSON
top_p limits token choice to the smallest set of tokens whose cumulative probability reaches p, while temperature rescales the whole distribution before sampling
in production, changing multiple decoding knobs at once makes behavior harder to predict and debug
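
To build intuition for top_p by example, here is a rough sketch of nucleus filtering over a toy next-token distribution. The function name, the token strings, and the probabilities are all made up for illustration; real implementations work on logits over a full vocabulary.

```python
def top_p_filter(probs, top_p):
    """Keep the highest-probability tokens until their mass reaches top_p,
    then renormalize the survivors so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.9))  # the low-probability tail ("zebra") is dropped
print(top_p_filter(probs, 0.5))  # only the single most likely token survives
```

With top_p=1 nothing is filtered, so the guide's default leaves the full distribution available to the sampler.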

Confusions / open questions

build stronger intuition for top_p through examples instead of definitions alone
learn how model choice, max tokens, and stop sequences affect output quality

Session 03

Focus

Prompt Elements

What I studied

broke prompts into instruction, context, input data, and output indicator
reviewed a sentiment classification prompt and identified which element each line belongs to
passed two rounds of self-test on prompt element identification and failure diagnosis

Key insights

not every prompt needs all four elements
the role of each element is different: task definition, steering information, task payload, and output constraint
output indicators are small but powerful because they shape answer format and task completion
a translation prompt can work with only instruction, input data, and output indicator when the task is simple and unambiguous
audience framing such as "for an engineering manager" usually acts as context because it steers style and relevance rather than defining the base task
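
The four elements can be sketched as a small prompt builder in which context and output indicator are optional. The function name, the ordering of the parts, and the "Text:" label are assumptions made for this illustration.

```python
def build_prompt(instruction, input_data, context=None, output_indicator=None):
    """Assemble a prompt from the four elements; only the instruction
    and the input data are required in this sketch."""
    parts = []
    if context:
        parts.append(context)            # steering information
    parts.append(instruction)            # task definition
    parts.append(f"Text: {input_data}")  # task payload
    if output_indicator:
        parts.append(output_indicator)   # output constraint
    return "\n".join(parts)


prompt = build_prompt(
    instruction="Classify the text into neutral, negative, or positive.",
    input_data="I think the food was okay.",
    output_indicator="Sentiment:",
)
print(prompt)
```

Leaving `context` out here mirrors the point that not every prompt needs all four elements.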

Confusions / open questions

when should context be included versus omitted for simpler tasks?
how much output formatting guidance is enough before it becomes over-specified?

Session 04

Focus

General Tips for Designing Prompts

What I studied

reviewed five design principles: start simple, write strong instructions, be specific, avoid imprecise wording, and prefer telling the model what to do
compared weak and improved prompt examples
passed one self-test on prompt design principles and one rewrite-focused review round

Key insights

prompt design is iterative, so a simple starting point is usually better than an overloaded first draft
specificity improves reliability, but only when the details are relevant to the task
direct instructions work better than vague style constraints
negative-only instructions often fail; positive directions are easier for the model to follow
large compound tasks should usually be split into smaller prompts so each step stays clear and testable
vague phrases like "briefly", "not too technical", and "not too long" should be replaced by concrete constraints

Confusions / open questions

how specific is too specific before the prompt becomes noisy?
when should a task be split into subtasks instead of adding more detail to one prompt?

Session 05

Focus

Zero-Shot Prompting

What I studied

learned that zero-shot prompting means asking the model to do a task directly without examples
reviewed the sentiment classification example as a zero-shot task
connected zero-shot capability to instruction tuning and instruction-following behavior
confirmed the distinction between zero-shot and few-shot through self-check and terminology review

Key insights

zero-shot works when the model already understands the task pattern from pretraining and instruction tuning
zero-shot prompts rely heavily on clear instructions because there are no demonstrations to steer formatting
if zero-shot performance is weak, the next step is usually few-shot prompting rather than adding random wording
adding output constraints like JSON format does not break zero-shot as long as no examples are provided

Confusions / open questions

how to build stronger intuition for the meaning and naming of zero-shot versus few-shot

Result

zero-shot pass
ready to move on to few-shot

Session 06

Focus

Few-Shot Prompting

What I studied

how few-shot prompting uses in-context demonstrations to guide model behavior
why format and label distribution in examples matter more than label correctness (Min et al. 2022)
the 1-shot word-usage example (whatpu / farduddle) showing in-context learning
why even randomly assigned labels still improve over no examples
the failure case: few-shot cannot reliably solve multi-step reasoning (odd numbers sum)
the connection to chain-of-thought prompting as the next technique

Key insights

few-shot works through in-context learning: no weight updates, just soft conditioning from examples
label correctness matters less than label space and input distribution
demonstrations primarily teach format and pattern, not facts
the failure mode of few-shot is exactly where chain-of-thought helps: intermediate reasoning steps
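
A small sketch of how in-context demonstrations are laid out, assuming a simple "Text / Sentiment" format. The helper name and the demonstration pairs are illustrative assumptions; the point is that every example and the final query share one consistent format.

```python
def build_few_shot_prompt(demonstrations, query,
                          input_label="Text", output_label="Sentiment"):
    """Format (input, label) demonstrations followed by an unlabeled query."""
    lines = []
    for text, label in demonstrations:
        lines.append(f"{input_label}: {text}")
        lines.append(f"{output_label}: {label}")
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical demonstrations.
demos = [
    ("This is awesome!", "Positive"),
    ("This is bad!", "Negative"),
    ("Wow that movie was rad!", "Positive"),
]
print(build_few_shot_prompt(demos, "What a horrible show!"))
```

Passing an empty list of demonstrations yields a zero-shot prompt in the same format, which makes the two settings easy to compare.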

Confusions / open questions

how many examples is enough before adding more stops helping or hurts?
how to select representative examples when the input distribution is varied?

Result

few-shot pass on 2026-04-02
key insight: "the model learns the pattern, not the answer"; it extracts pattern and format, not label correctness
understood why reasoning fails: missing intermediate steps, not missing problem description
ready to move on to Chain-of-Thought prompting

Session 07

Focus

Chain-of-Thought (CoT) Prompting

What I studied

how CoT adds intermediate reasoning steps to fix the failure mode of few-shot on reasoning tasks
three variants: Few-shot CoT, Zero-shot CoT ("Let's think step by step"), and Auto-CoT
why Zero-shot CoT works: explicit instruction + activation of reasoning patterns from pretraining
why CoT is an emergent ability that appears only in sufficiently large models
how Auto-CoT uses question clustering + Zero-shot CoT to auto-generate diverse demonstrations

Key insights

CoT teaches the model how to solve problems, not just what format to answer in
"Let's think step by step" does more than issue an instruction; it activates reasoning patterns learned in pretraining
Zero-shot CoT vs Few-shot CoT is a trade-off between diversity/autonomy and control/convergence
Auto-CoT's clustering step is critical — diversity prevents error propagation across demonstrations
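
As a rough illustration of what "intermediate reasoning steps" means for the odd-numbers task, the sketch below spells out the steps a CoT demonstration would write down and shows a Zero-shot CoT wrapper. The function names, the Q/A layout, and the sample numbers are assumptions for this example.

```python
def odd_sum_rationale(numbers):
    """Write out the intermediate steps: pick the odd numbers, add them,
    then check the parity of the sum."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    parity = "even" if total % 2 == 0 else "odd"
    rationale = (
        f"The odd numbers are {', '.join(map(str, odds))}. "
        f"Their sum is {total}, which is {parity}."
    )
    return rationale, total % 2 == 0


def zero_shot_cot(question):
    """Append the Zero-shot CoT trigger phrase to a question."""
    return f"Q: {question}\nA: Let's think step by step."


rationale, is_even = odd_sum_rationale([15, 32, 5, 13, 82, 7, 1])
print(rationale)
print(is_even)
print(zero_shot_cot("Do the odd numbers in 15, 32, 5, 13, 82, 7, 1 add up to an even number?"))
```

In a Few-shot CoT prompt, a rationale string like this would sit between each demonstration question and its answer.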

Confusions / open questions

how to evaluate whether a CoT reasoning chain is actually correct vs. plausible-sounding but wrong?

Result

CoT pass on 2026-04-02
ready to move on to Self-Consistency prompting

Session 08

Focus

Meta Prompting

What I studied

how Meta Prompting focuses on structure and syntax rather than specific content examples
the difference between content-driven (few-shot) and structure-driven (meta) approaches
why Meta Prompting is token-efficient: cognitive work of extracting structure is shifted to the prompt writer
why it can be seen as a zero-shot variant: no concrete content examples
its failure condition: relies on model's prior knowledge of the task domain

Key insights

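Meta Prompting shifts the work of inferring structure from examples onto the human; the model receives only structural instructions

its failure boundary is similar to zero-shot: when the model lacks prior knowledge, it cannot fill in the correct content

for mathematical derivations, choose CoT over Meta Prompting: CoT works through each computation step, whereas Meta Prompting only constrains the form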

Result

Meta Prompting pass on 2026-04-04
ready to move on to Self-Consistency prompting

Session 09

Focus

Self-Consistency

What I studied

how Self-Consistency improves on CoT by sampling multiple reasoning paths and using majority voting
why single-path CoT decoding has no error correction mechanism
why majority voting works: error paths are diverse (divergent), correct paths converge to the same answer
why Self-Consistency is inapplicable to open-ended tasks: no objective correct answer to vote on
the engineering trade-off: token cost multiplies with number of samples

Key insights

"in writing, no one ranks first": open-ended tasks have no ground truth, so voting has no meaning
Self-Consistency is not a default technique; the high cost means it's reserved for high-stakes reasoning tasks
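discarded reasoning paths consume compute yet contribute nothing to the final answer, which is an engineering trade-off to weigh

more precisely, it is an enhancement to how CoT is sampled and decoded, not a redefinition of CoT itself as greedy decoding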
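
The voting step can be sketched in a few lines. Here `sample_answer` is a hypothetical callable standing in for "run one sampled CoT path and extract its final answer"; the stub below returns canned answers so the example runs without a model.

```python
from collections import Counter


def self_consistency(sample_answer, question, n_samples=5):
    """Sample several final answers and return the majority answer
    together with the fraction of samples that agreed with it."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Stub sampler: canned final answers from five imagined reasoning paths.
canned = iter(["67", "35", "67", "67", "41"])


def fake_sampler(question):
    return next(canned)


answer, agreement = self_consistency(fake_sampler, "hypothetical question", n_samples=5)
print(answer, agreement)
```

The two minority answers cost as many tokens as the others but do not affect the result, and the cost grows linearly with `n_samples`.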

Result

Self-Consistency pass on 2026-04-07

Session 10

Focus

Tree of Thoughts (ToT)

What I studied

how ToT generalizes CoT from a single chain into a search tree over intermediate thoughts
why ToT is different from Self-Consistency: path selection happens during reasoning, not only at the end
the four key components: thought decomposition, thought generation, state evaluation, and search algorithm
how BFS and DFS can be used to expand, prune, and backtrack over candidate reasoning states
why ToT is useful for planning/search-heavy tasks such as Game of 24, creative writing planning, and mini crosswords

Key insights

CoT is a chain; ToT is a tree
Self-Consistency compares complete chains after generation, while ToT evaluates branches during generation
ToT is best understood as a reasoning-plus-search framework, not just a prompt wording trick
the power of ToT comes from branching, evaluation, and backtracking, but that also creates its main cost
ToT should be reserved for tasks where planning or search materially matters
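
A toy sketch of the BFS variant, assuming thought generation, state evaluation, and goal checking are plain Python callables (in a real ToT setup these would be model calls). The number-reaching task, the function names, and the beam width are invented for illustration; DFS with backtracking is not shown.

```python
def tot_bfs(root, propose, evaluate, is_goal, beam_width=2, max_depth=5):
    """Breadth-first search over thoughts: expand every state in the
    frontier, return the first goal state found, otherwise keep only the
    best-scoring candidates (pruning) and go one level deeper."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            break
    return None


TARGET = 10  # toy task: reach 10 from 1 using "+3" and "*2"


def propose(state):
    value, path = state
    return [(value + 3, path + ["+3"]), (value * 2, path + ["*2"])]


def evaluate(state):
    return -abs(TARGET - state[0])  # closer to the target scores higher


def is_goal(state):
    return state[0] == TARGET


result = tot_bfs((1, []), propose, evaluate, is_goal)
print(result)  # (10, ['+3', '+3', '+3'])
```

Unlike Self-Consistency, weak branches are dropped at every level rather than after complete chains have been generated, and each level costs one evaluation per candidate.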

Result

Tree of Thoughts pass on 2026-04-15
ready to move on to RAG
