
Commit 59ca9da

docs(site): fix perf section formatting; add code package links (#15)
1 parent 89f4cf1 commit 59ca9da

File tree: 2 files changed (+6, -5 lines)


docs/README.md (3 additions, 2 deletions)

@@ -27,15 +27,16 @@ We performed some tests on a [Vertex AI Training Cluster](https://docs.cloud.goo
 These tests were conducted using ML Flashpoint _alongside_ NeMo's recommended checkpointing (as you would in production), where NeMo's default checkpointing used a 7-10 TB [Filestore](https://cloud.google.com/filestore) instance.
 
 Observations when comparing the hybrid of ML Flashpoint (every 5 steps) and NeMo checkpointing (every 50 steps) to just NeMo's regular checkpointing (every 10 steps):
+
 * Data write times that are up to 20-30x faster, with little to no optimization.
 This is expected to further improve with additional optimizations.
 * Total checkpoint recovery times that are ~7-10x faster (includes the time it takes to do checkpoint detection, cross-node coordination, replication, read into model state and be ready to resume training).
 * For _async_ checkpointing: improvements averaging **3-6%** for _overall job time_, with peaks of **5-10%** improvements.
 These improvements only account for checkpoint save efficiency, representing a "worst case" in the sense that checkpointing purely adds overhead and isn't actually used.
 Any job interruptions will also benefit from the improved checkpoint recovery times.
 
-While [ML runtime goodput](https://cloud.google.com/blog/products/ai-machine-learning/goodput-metric-as-measure-of-ml-productivity) is important, we focus on overall job time as an end-to-end metric, as it is most transparent and accounts for actual cost.
-Goodput can be misleading if improvements to unproductive time actually worsen productive time.
+While [ML runtime goodput](https://cloud.google.com/blog/products/ai-machine-learning/goodput-metric-as-measure-of-ml-productivity) is important, we focus on overall job time as an end-to-end metric, as it is simpler and allows for straightforward _total_ cost comparisons.
+Runtime goodput alone can be misleading if improvements to unproductive time actually worsen productive (active training) time, and the change in total evaluation period (job time) is not taken into account.
 
 ## Design Philosophy
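The goodput caveat added in the README change above can be made concrete with a small numeric sketch. The hours below are hypothetical illustration values, not measurements from the commit; goodput is taken as productive time divided by total job time, per the linked Google Cloud definition:

```python
# Hypothetical numbers showing why runtime goodput alone can mislead:
# goodput can improve while the overall job takes longer.

def goodput(productive_h: float, unproductive_h: float) -> float:
    """Fraction of total job time spent on productive (active training) work."""
    return productive_h / (productive_h + unproductive_h)

# Baseline: 90 h of training plus 10 h of checkpoint/restart overhead.
base = goodput(90.0, 10.0)   # goodput 0.90, total job time 100 h

# "Optimized": overhead cut to 2 h, but the change slows training to 103 h.
opt = goodput(103.0, 2.0)    # goodput ~0.98, total job time 105 h

assert opt > base                    # goodput improved...
assert 103.0 + 2.0 > 90.0 + 10.0     # ...yet the job got slower end to end
```

This is why the reworded paragraph focuses on overall job time: it captures the change in the total evaluation period that goodput, on its own, can hide.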

docs/user-guide.md (3 additions, 3 deletions)

@@ -28,7 +28,7 @@ See the project's [README](http://cs/h/cloud-mlnet/ml-flashpoint/+/main:README.m
 
 ### NeMo 2.0 & Pytorch Lightning
 
-Code: See the `ml_flashpoint.adapter.nemo` package.
+Code: See the [`ml_flashpoint.adapter.nemo`](https://github.com/google/ml-flashpoint/tree/main/src/ml_flashpoint/adapter/nemo) package.
 
 !!! note
 
@@ -107,7 +107,7 @@ This reduces blocking time by avoiding duplicate work, at the cost of having a l
 
 ### Megatron-LM
 
-Code: See the `ml_flashpoint.adapter.megatron` package.
+Code: See the [`ml_flashpoint.adapter.megatron`](https://github.com/google/ml-flashpoint/tree/main/src/ml_flashpoint/adapter/megatron) package.
 
 The Megatron strategies depend on the PyTorch DCP implementations.
 Below are instructions for setting up ML Flashpoint checkpointing, which you should configure alongside regular checkpointing to long-term storage.
@@ -195,4 +195,4 @@ else:
 
 ### PyTorch DCP
 
-Code: See the `ml_flashpoint.adapter.pytorch` package.
+Code: See the [`ml_flashpoint.adapter.pytorch`](https://github.com/google/ml-flashpoint/tree/main/src/ml_flashpoint/adapter/pytorch) package.
