First and Second Moment Quantization #1229
Closed
shadygm wants to merge 1 commit into
Pull request overview
This PR adds quantized Adam first/second moment storage (uint8 + per-row scales) and updates the FastGS fused-Adam CUDA path to consume/update those quantized moments, reducing optimizer-state memory bandwidth/footprint while keeping the parameter updates on GPU.
Changes:
- Introduces quantized-moment Adam kernels/APIs (row-wise uint8 moments + per-row scale factors), plus utilities to quantize existing float moments and to zero quantized rows.
- Updates FastGS rasterization backward kernels to perform fused Adam updates via a row-wise dynamic update helper that supports quantized moments.
- Extends optimizer state serialization format (version bump) to persist quantized moments + scales and supports loading legacy (v1) float moments by quantizing on load.
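The row-wise storage described in the list above can be sketched on the host as follows. This is only an illustration, not the PR's kernel code: the `QuantizedRows` struct, the function names, and the encoding (symmetric absmax scaling with code 128 standing for zero) are assumptions, since the overview does not specify how values are mapped to uint8.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one uint8 code per element, one float scale per row.
struct QuantizedRows {
    std::vector<std::uint8_t> codes;  // rows * cols entries
    std::vector<float> scales;        // rows entries
    std::size_t cols = 0;
};

// Assumed encoding for signed values such as the first moment:
// scale = row absmax / 127, and code 128 stands for zero.
QuantizedRows quantize_rows_signed(const std::vector<float>& data,
                                   std::size_t rows, std::size_t cols) {
    assert(data.size() == rows * cols);
    QuantizedRows out;
    out.codes.resize(rows * cols);
    out.scales.resize(rows);
    out.cols = cols;
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = data.data() + r * cols;
        float absmax = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            absmax = std::max(absmax, std::fabs(row[c]));
        }
        const float scale = absmax / 127.0f;
        out.scales[r] = scale;
        for (std::size_t c = 0; c < cols; ++c) {
            // An all-zero row has scale 0; it maps straight to the zero code.
            const float q = scale > 0.0f ? std::round(row[c] / scale) : 0.0f;
            const float code = std::min(std::max(q + 128.0f, 1.0f), 255.0f);
            out.codes[r * cols + c] = static_cast<std::uint8_t>(code);
        }
    }
    return out;
}

// Recover an approximate float value from its code and the row's scale.
float dequantize_signed(const QuantizedRows& q, std::size_t r, std::size_t c) {
    return (static_cast<int>(q.codes[r * q.cols + c]) - 128) * q.scales[r];
}
```

Under this assumed scheme the round-trip error of each element is bounded by about half of its row's scale, so rows with a small dynamic range keep more precision than a single global scale would give them.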
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/training/rasterization/fastgs/rasterization/src/backward.cu | Fixes invisible-Adam launch dimensions to operate per “row” (primitive) instead of per-element for multi-attribute params. |
| src/training/rasterization/fastgs/rasterization/include/kernels_backward.cuh | Switches multiple per-component Adam updates to row-wise updates and refactors invisible-step kernel to update entire rows (supports quantized path). |
| src/training/rasterization/fastgs/rasterization/include/kernel_utils.cuh | Adds quantize/dequant helpers and adam_step_row_dynamic supporting float and quantized moments; batches SH gradient updates into a row-wise call. |
| src/training/rasterization/fastgs/rasterization/include/fused_adam_types.h | Extends FusedAdamParam with quantized moment buffers + per-row scales and a quantized flag. |
| src/training/rasterization/fastgs/optimizer/src/adam.cu | Adds a quantized Adam entrypoint and kernel launch error reporting. |
| src/training/rasterization/fastgs/optimizer/src/adam_api.cu | Adds raw CUDA API for quantized Adam, moment quantization, and zeroing quantized rows. |
| src/training/rasterization/fastgs/optimizer/include/adam.h | Declares the quantized Adam API. |
| src/training/rasterization/fastgs/optimizer/include/adam_kernels.cuh | Implements quantized Adam step, float→quantized moment conversion, and zeroing quantized rows kernels. |
| src/training/rasterization/fastgs/optimizer/include/adam_api.h | Exposes raw APIs for quantized step, quantization, and zeroing quantized rows. |
| src/training/rasterization/fast_rasterizer.cpp | Plumbs new fused-Adam quantized pointers/scales into rasterization backward. |
| src/training/optimizer/adam_optimizer.hpp | Changes optimizer state moment tensors to quantized (UInt8) + adds scale tensors; extends fused param struct for quantized buffers. |
| src/training/optimizer/adam_optimizer.cpp | Allocates/maintains quantized moment state, performs quantized step, adds checkpoint v2 format with backward-compat conversion from v1 float moments. |
| src/core/tensor/tensor_masking_ops.cpp | Extends append_gather CUDA path to support UInt8/Bool tensors (needed by quantized optimizer state growth/gather paths). |
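To make the row-wise update summarized in the table concrete, here is a rough host-side sketch of one Adam step over a single row with quantized moments: dequantize, update, recompute the row scales, requantize. The function name `adam_step_row_quantized`, the `AdamConfig` struct, and the encoding (signed codes offset by 128 for the first moment, unsigned codes for the second) are assumptions for illustration; the PR's `adam_step_row_dynamic` may differ in all of these.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hyperparameter bundle with common Adam defaults.
struct AdamConfig {
    float lr = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float eps = 1e-8f;
};

// One Adam step for a single row whose moments are stored as uint8 codes.
// Assumed encoding: m = (code - 128) * m_scale, v = code * v_scale.
void adam_step_row_quantized(float* param, const float* grad,
                             std::uint8_t* m_q, std::uint8_t* v_q,
                             float& m_scale, float& v_scale,
                             std::size_t cols, int step, const AdamConfig& cfg) {
    assert(cols > 0 && step >= 1);
    std::vector<float> m(cols), v(cols);
    float m_absmax = 0.0f, v_max = 0.0f;

    // Dequantize the old moments and apply the usual exponential averages.
    for (std::size_t i = 0; i < cols; ++i) {
        const float m_old = (static_cast<int>(m_q[i]) - 128) * m_scale;
        const float v_old = static_cast<float>(v_q[i]) * v_scale;
        m[i] = cfg.beta1 * m_old + (1.0f - cfg.beta1) * grad[i];
        v[i] = cfg.beta2 * v_old + (1.0f - cfg.beta2) * grad[i] * grad[i];
        m_absmax = std::max(m_absmax, std::fabs(m[i]));
        v_max = std::max(v_max, v[i]);
    }

    // Bias-corrected parameter update, using the full-precision moments.
    const float bc1 = 1.0f - std::pow(cfg.beta1, static_cast<float>(step));
    const float bc2 = 1.0f - std::pow(cfg.beta2, static_cast<float>(step));
    for (std::size_t i = 0; i < cols; ++i) {
        param[i] -= cfg.lr * (m[i] / bc1) / (std::sqrt(v[i] / bc2) + cfg.eps);
    }

    // Recompute the per-row scales and requantize the updated moments.
    m_scale = m_absmax / 127.0f;
    v_scale = v_max / 255.0f;
    for (std::size_t i = 0; i < cols; ++i) {
        const long mq = m_scale > 0.0f ? std::lround(m[i] / m_scale) + 128 : 128;
        const long vq = v_scale > 0.0f ? std::lround(v[i] / v_scale) : 0;
        m_q[i] = static_cast<std::uint8_t>(std::min(std::max(mq, 1L), 255L));
        v_q[i] = static_cast<std::uint8_t>(std::min(std::max(vq, 0L), 255L));
    }
}
```

Processing a whole row per call is what allows the scale to be recomputed from the updated values before requantizing, which a per-element update could not do without an extra reduction pass.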
Comment on lines +45 to +48

    lfs::core::Tensor exp_avg;          // Quantized first moment (m), uint8
    lfs::core::Tensor exp_avg_sq;       // Quantized second moment (v), uint8
    lfs::core::Tensor exp_avg_scale;
    lfs::core::Tensor exp_avg_sq_scale;
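As a rough guide to what these four tensors cost compared with two float moment tensors, the arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes the per-row scales are stored as 4-byte floats, which the snippet above does not state.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte float scales");

// Two float tensors (m and v), one float per element.
constexpr std::size_t float_moment_bytes(std::size_t rows, std::size_t cols) {
    return 2 * rows * cols * sizeof(float);
}

// Two uint8 tensors (m and v) plus one assumed float scale per row for each.
constexpr std::size_t quantized_moment_bytes(std::size_t rows, std::size_t cols) {
    return 2 * (rows * cols * sizeof(std::uint8_t) + rows * sizeof(float));
}
```

For example, 1,000 rows of 45 values need 360,000 bytes as float moments and 98,000 bytes in the quantized layout under these assumptions. The saving shrinks for narrow rows, because the per-row scale overhead is fixed.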
Collaborator
Author
Closing, since the new swizzle SH layout saved a bit more VRAM. Will look into this in the future.
No description provided.