docs(site): clarify perf points

g-husam · web-flow · commit c991bfb54e56 · 2026-02-02T11:21:24.000-05:00
diff --git a/docs/README.md b/docs/README.md
@@ -33,12 +33,12 @@ When comparing
 1. the hybrid of ML Flashpoint (every 5 steps) and NeMo checkpointing (every 50 steps), to
 1. NeMo's regular checkpointing (every 10 steps - so half as often)
 
-the hybrid approach resulted in:
+We observe:
 
-* Data write times that are up to 20-30x faster, with little to no optimization.
+* Data write times that are up to 20-30x faster for ML Flashpoint, with little to no optimization.
 This is expected to further improve with additional optimizations.
-* Total checkpoint recovery times that are ~7-10x faster (includes the time it takes to do checkpoint detection, cross-node coordination, replication, read into model state and be ready to resume training).
-* For _async_ checkpointing: improvements averaging **3-6%** for _overall job time_, and reaching **5-10%** when NeMo checkpointing is deferred to the end (300th step) instead of being done every 50 steps.
+* Total checkpoint recovery times that are ~7-10x faster for ML Flashpoint (this includes the time for checkpoint detection, cross-node coordination, replication, and reading into model state, up to the point where training is ready to resume).
+* For _async_ checkpointing: improvements averaging **3-6%** for _overall job time_ in the hybrid approach, and reaching **5-10%** when NeMo checkpointing is deferred to the end (300th step) instead of being done every 50 steps.
 These improvements only account for checkpoint _save_ efficiency, representing a "lower bound" value as it doesn't account for the speedups in _recovery_ time.
 Any job interruptions would also benefit from ML Flashpoint's recovery performance gains.