Testing and Acceptance Checklist

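1. Usage Principles

Do not run the full suite after every small change.
Run the smallest relevant test first, and widen the scope only once it is green.
At each stage, look only at that stage's failures.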

2. Per-Module Test Commands

A. Math and Utility Functions

Functions covered:
- run_silu
- run_softmax
- run_cross_entropy
- run_gradient_clipping

Commands:

uv run pytest tests/test_nn_utils.py -q
uv run pytest tests/test_model.py::test_silu_matches_pytorch -q

Pass criteria:
- Results match PyTorch numerically (within tolerance).
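
What "within tolerance" means can be sketched in plain Python. This is only an illustration, not the test suite's code: `silu`, `softmax`, and `all_close` are hypothetical helpers, the tolerance values are arbitrary, and the comparison rule |actual - expected| <= atol + rtol * |expected| is the one `torch.allclose` documents.

```python
import math


def silu(x: float) -> float:
    """SiLU activation, x * sigmoid(x). Fine for moderate x; exp(-x) overflows for very negative x."""
    return x / (1.0 + math.exp(-x))


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def all_close(actual, expected, rtol=1e-5, atol=1e-8) -> bool:
    """Elementwise check of |a - e| <= atol + rtol * |e|."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )
```

A large-magnitude input such as `softmax([1000.0, 1000.0])` is a useful extra case, since a naive implementation overflows there.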

B. Data Sampling

Functions covered:
- run_get_batch

Commands:

uv run pytest tests/test_data.py -q

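Pass criteria:
- x and y have the correct shapes
- y equals x shifted by one position (the next-token targets)
- Sampled indices fall in the expected range and are reasonably distributed
- An invalid device raises an error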
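
The shape, shift, and index-range criteria above can be illustrated with a pure-Python stand-in. `sample_batch` is a hypothetical helper that works on plain lists rather than tensors, so it does not cover the device check; it is a sketch of the sampling idea, not the assignment's implementation.

```python
import random


def sample_batch(data, batch_size, context_length, rng=random):
    """Sample (x, y) pairs where y is x shifted by one token.

    Hypothetical list-based stand-in for a tensor-based get_batch.
    """
    # Valid starts leave room for context_length inputs plus one extra target token.
    max_start = len(data) - context_length - 1
    if max_start < 0:
        raise ValueError("data is too short for the requested context_length")
    xs, ys = [], []
    for _ in range(batch_size):
        start = rng.randint(0, max_start)  # inclusive on both ends
        xs.append(data[start : start + context_length])
        ys.append(data[start + 1 : start + context_length + 1])
    return xs, ys
```

With `data = list(range(100))` each token equals its own index, which makes the shift and the start-index range easy to assert on directly.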

C. Model Components and Full Model

Functions covered:
- run_linear
- run_embedding
- run_rmsnorm
- run_swiglu
- run_scaled_dot_product_attention
- run_rope
- run_multihead_self_attention
- run_multihead_self_attention_with_rope
- run_transformer_block
- run_transformer_lm

Commands:

uv run pytest tests/test_model.py -q

Pass criteria:
- Outputs match the snapshots (within the specified atol/rtol).

D. Optimizer and Learning Rate

Functions covered:
- get_adamw_cls
- run_get_lr_cosine_schedule

Commands:

uv run pytest tests/test_optimizer.py -q

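Pass criteria:
- After training, the AdamW parameters match the reference implementation or are sufficiently close
- The learning-rate schedule matches the expected values element by element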
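
For the schedule criterion, it helps to have the expected curve in mind. The sketch below assumes a common formulation: linear warmup, cosine decay down to a minimum, then a constant floor. The function name and parameter names are assumptions made for illustration and may not match the adapter's signature.

```python
import math


def lr_cosine_schedule(it, max_lr, min_lr, warmup_iters, cosine_cycle_iters):
    """Learning rate at iteration `it` (assumed warmup + cosine + floor form)."""
    if it < warmup_iters:
        # Linear warmup from 0 up to max_lr.
        return max_lr * it / warmup_iters
    if it < cosine_cycle_iters:
        # Cosine decay from max_lr down towards min_lr.
        progress = (it - warmup_iters) / (cosine_cycle_iters - warmup_iters)
        return min_lr + 0.5 * (1.0 + math.cos(math.pi * progress)) * (max_lr - min_lr)
    # At and after the end of the cycle, hold the minimum.
    return min_lr
```

Off-by-one mistakes at the two boundaries (`it == warmup_iters` and `it == cosine_cycle_iters`) are a likely cause of element-by-element mismatches, so those iterations are worth checking first.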

E. Serialization

Functions covered:
- run_save_checkpoint
- run_load_checkpoint

Commands:

uv run pytest tests/test_serialization.py -q

Pass criteria:
- Model parameters are restored exactly
- Optimizer state is restored exactly
- The iteration count is restored correctly

F. Tokenizer and BPE Training

Functions covered:
- get_tokenizer
- run_train_bpe

Commands:

uv run pytest tests/test_tokenizer.py -q
uv run pytest tests/test_train_bpe.py -q

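Pass criteria:
- encode/decode agree with tiktoken (within the tested scenarios)
- Special tokens are handled correctly (including overlapping special tokens)
- train_bpe produces the correct merges and vocab
- The speed test passes (test_train_bpe_speed)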
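
The overlapping special-token case is easy to get wrong, so here is one possible way to handle it, offered as a sketch rather than the expected implementation. `split_on_special_tokens` is a hypothetical helper; the idea is to try longer special tokens first so that a token which is a prefix of another does not win the match.

```python
import re


def split_on_special_tokens(text, special_tokens):
    """Split text into ordinary chunks and special tokens, longest match first."""
    if not special_tokens:
        return [text] if text else []
    # Longer tokens go first so that alternation prefers them over their prefixes.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    # The capturing group makes re.split keep the matched special tokens.
    return [part for part in re.split(pattern, text) if part]
```

With the special tokens `<|endoftext|>` and `<|endoftext|><|endoftext|>`, the doubled token in the input is kept as a single piece instead of being split into two single ones.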

3. Full Acceptance Run

Command:

uv run pytest -q

Goals:
- All non-skipped tests pass
- A typical result on macOS is 46 passed, 2 skipped

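4. Failure Triage Order

Look at the first failure first; do not try to fix ten at once.
Print the key shapes (especially in attention/mha/rope).
Check against the smallest test for the same module and get that single test green first.
Then run the whole test file, and only then the full suite.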
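
The narrow-to-wide order above can be automated with a small driver. The sketch below is one possible arrangement: the commands are taken from this checklist, while `STAGES` and `run_stages` are hypothetical names, and the `run` parameter exists only so the logic can be exercised without actually invoking pytest.

```python
import subprocess

# Stages ordered from narrowest to widest; the first entry is a single test.
STAGES = [
    ["uv", "run", "pytest", "tests/test_model.py::test_silu_matches_pytorch", "-q"],
    ["uv", "run", "pytest", "tests/test_model.py", "-q"],
    ["uv", "run", "pytest", "-q"],
]


def run_stages(stages=STAGES, run=subprocess.run):
    """Run each stage in order and stop at the first failure.

    Returns the failing command, or None if every stage passed.
    """
    for cmd in stages:
        result = run(cmd)
        if result.returncode != 0:
            return cmd
    return None
```

Within a single stage, pytest's `-x` flag gives the same stop-at-first-failure behavior at the level of individual tests.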

5. Final Pre-Submission Checklist

tests/adapters.py contains no NotImplementedError
uv run pytest -q reaches the target pass count
Key notes are filled in:
- note/03-model-components/
- note/05-optimization/
- note/06-tokenizer/
