First and Second Moment Quantization #1229

Closed
shadygm wants to merge 1 commit into master from feat/quantization-8

shadygm (Collaborator) commented May 19, 2026

No description provided.

Copilot AI review requested due to automatic review settings May 19, 2026 20:59
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds quantized Adam first/second moment storage (uint8 + per-row scales) and updates the FastGS fused-Adam CUDA path to consume/update those quantized moments, reducing optimizer-state memory bandwidth/footprint while keeping the parameter updates on GPU.
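
To put a rough number on the footprint claim, here is a minimal sketch of the arithmetic. It assumes one `float` scale per row per moment, which matches the "uint8 + per-row scales" description but is not confirmed against the kernels; the function names are hypothetical and do not come from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes for both Adam moments (m and v) stored as float32.
// rows = number of primitives, cols = values per primitive.
std::size_t float_moment_bytes(std::size_t rows, std::size_t cols) {
    return 2 * rows * cols * sizeof(float);
}

// Bytes for both moments stored as uint8 codes plus per-row scales.
// Assumption: exactly one float scale per row for each moment.
std::size_t quantized_moment_bytes(std::size_t rows, std::size_t cols) {
    const std::size_t codes = rows * cols * sizeof(std::uint8_t);
    const std::size_t scales = rows * sizeof(float);
    return 2 * (codes + scales);
}
```

For an illustrative tensor of 1,000,000 rows with 45 values each, this gives 360,000,000 bytes for float moments versus 98,000,000 bytes for the quantized layout. The saving shrinks for narrow rows, since the per-row scale is amortized over fewer values, and it covers optimizer state only, not parameters or gradients.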

Changes:

  • Introduces quantized-moment Adam kernels/APIs (row-wise uint8 moments + per-row scale factors), plus utilities to quantize existing float moments and to zero quantized rows.
  • Updates FastGS rasterization backward kernels to perform fused Adam updates via a row-wise dynamic update helper that supports quantized moments.
  • Extends optimizer state serialization format (version bump) to persist quantized moments + scales and supports loading legacy (v1) float moments by quantizing on load.
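
For readers unfamiliar with the technique, the row-wise update described above can be sketched on the CPU as: dequantize a row, apply the standard Adam update, then requantize with a fresh per-row scale. This is an illustration under stated assumptions, not the PR's kernel. The encoding (zero point 128 with scale `max|m| / 127` for the signed first moment, scale `max(v) / 255` for the non-negative second moment) and all names (`QuantRow`, `quantize_signed`, `adam_step_row`) are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One quantized row: uint8 codes plus a single float scale (assumed layout).
struct QuantRow {
    std::vector<std::uint8_t> codes;
    float scale = 0.0f;
};

// Assumed encoding for the signed first moment: value = (code - 128) * scale.
inline float dequant_signed(std::uint8_t code, float scale) {
    return static_cast<float>(static_cast<int>(code) - 128) * scale;
}

// Assumed encoding for the non-negative second moment: value = code * scale.
inline float dequant_unsigned(std::uint8_t code, float scale) {
    return static_cast<float>(code) * scale;
}

QuantRow quantize_signed(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    QuantRow q;
    q.scale = max_abs / 127.0f;
    q.codes.assign(x.size(), static_cast<std::uint8_t>(128)); // 128 encodes zero
    if (q.scale > 0.0f) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            const long c = std::lround(x[i] / q.scale) + 128;
            q.codes[i] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, c)));
        }
    }
    return q;
}

QuantRow quantize_unsigned(const std::vector<float>& x) {
    float max_val = 0.0f;
    for (float v : x) max_val = std::max(max_val, v);
    QuantRow q;
    q.scale = max_val / 255.0f;
    q.codes.assign(x.size(), static_cast<std::uint8_t>(0));
    if (q.scale > 0.0f) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            const long c = std::lround(x[i] / q.scale);
            q.codes[i] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, c)));
        }
    }
    return q;
}

// Standard Adam with bias correction, applied to one row.
// `step` is the 1-based iteration count.
void adam_step_row(std::vector<float>& param, const std::vector<float>& grad,
                   QuantRow& m_q, QuantRow& v_q,
                   float lr, float beta1, float beta2, float eps, int step) {
    const std::size_t n = param.size();
    std::vector<float> m(n), v(n);
    for (std::size_t i = 0; i < n; ++i) {
        m[i] = beta1 * dequant_signed(m_q.codes[i], m_q.scale) + (1.0f - beta1) * grad[i];
        v[i] = beta2 * dequant_unsigned(v_q.codes[i], v_q.scale) + (1.0f - beta2) * grad[i] * grad[i];
    }
    const float bc1 = 1.0f - static_cast<float>(std::pow(beta1, step));
    const float bc2 = 1.0f - static_cast<float>(std::pow(beta2, step));
    for (std::size_t i = 0; i < n; ++i) {
        param[i] -= lr * (m[i] / bc1) / (std::sqrt(v[i] / bc2) + eps);
    }
    // Requantize with a scale derived from the updated row.
    m_q = quantize_signed(m);
    v_q = quantize_unsigned(v);
}
```

Note that in this sketch the parameter update uses the freshly computed float moments, so quantization error only enters through the state carried over from the previous step. With a single scale per row, small entries in a row that also contains a large entry lose precision, which is worth checking for the second moment in particular.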

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/training/rasterization/fastgs/rasterization/src/backward.cu | Fixes invisible-Adam launch dimensions to operate per “row” (primitive) instead of per-element for multi-attribute params. |
| src/training/rasterization/fastgs/rasterization/include/kernels_backward.cuh | Switches multiple per-component Adam updates to row-wise updates and refactors invisible-step kernel to update entire rows (supports quantized path). |
| src/training/rasterization/fastgs/rasterization/include/kernel_utils.cuh | Adds quantize/dequant helpers and adam_step_row_dynamic supporting float and quantized moments; batches SH gradient updates into a row-wise call. |
| src/training/rasterization/fastgs/rasterization/include/fused_adam_types.h | Extends FusedAdamParam with quantized moment buffers + per-row scales and a quantized flag. |
| src/training/rasterization/fastgs/optimizer/src/adam.cu | Adds a quantized Adam entrypoint and kernel launch error reporting. |
| src/training/rasterization/fastgs/optimizer/src/adam_api.cu | Adds raw CUDA API for quantized Adam, moment quantization, and zeroing quantized rows. |
| src/training/rasterization/fastgs/optimizer/include/adam.h | Declares the quantized Adam API. |
| src/training/rasterization/fastgs/optimizer/include/adam_kernels.cuh | Implements quantized Adam step, float→quantized moment conversion, and zeroing quantized rows kernels. |
| src/training/rasterization/fastgs/optimizer/include/adam_api.h | Exposes raw APIs for quantized step, quantization, and zeroing quantized rows. |
| src/training/rasterization/fast_rasterizer.cpp | Plumbs new fused-Adam quantized pointers/scales into rasterization backward. |
| src/training/optimizer/adam_optimizer.hpp | Changes optimizer state moment tensors to quantized (UInt8) + adds scale tensors; extends fused param struct for quantized buffers. |
| src/training/optimizer/adam_optimizer.cpp | Allocates/maintains quantized moment state, performs quantized step, adds checkpoint v2 format with backward-compat conversion from v1 float moments. |
| src/core/tensor/tensor_masking_ops.cpp | Extends append_gather CUDA path to support UInt8/Bool tensors (needed by quantized optimizer state growth/gather paths). |

Comment on lines +45 to +48
lfs::core::Tensor exp_avg; // Quantized first moment (m), uint8
lfs::core::Tensor exp_avg_sq; // Quantized second moment (v), uint8
lfs::core::Tensor exp_avg_scale;
lfs::core::Tensor exp_avg_sq_scale;
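
As a way to picture how these four fields relate, here is a small stand-in that uses flat row-major `std::vector` buffers instead of `lfs::core::Tensor`, together with a row-zeroing helper like the one the overview mentions. The shapes (one scale per row) follow the PR summary; the zero codes (128 for the signed first moment, 0 for the second) and the names `QuantizedAdamState`, `make_state`, and `zero_row` are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed codes that decode to 0.0f (hypothetical encoding, not from the PR).
constexpr std::uint8_t kZeroCodeM = 128; // signed first moment, zero point 128
constexpr std::uint8_t kZeroCodeV = 0;   // non-negative second moment

struct QuantizedAdamState {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::uint8_t> exp_avg;    // rows * cols codes for m
    std::vector<std::uint8_t> exp_avg_sq; // rows * cols codes for v
    std::vector<float> exp_avg_scale;     // one scale per row for m
    std::vector<float> exp_avg_sq_scale;  // one scale per row for v
};

// Allocate a state whose moments all decode to zero.
QuantizedAdamState make_state(std::size_t rows, std::size_t cols) {
    QuantizedAdamState s;
    s.rows = rows;
    s.cols = cols;
    s.exp_avg.assign(rows * cols, kZeroCodeM);
    s.exp_avg_sq.assign(rows * cols, kZeroCodeV);
    s.exp_avg_scale.assign(rows, 0.0f);
    s.exp_avg_sq_scale.assign(rows, 0.0f);
    return s;
}

// Reset one row, e.g. when a primitive is replaced and its
// optimizer history should be discarded.
void zero_row(QuantizedAdamState& s, std::size_t row) {
    for (std::size_t c = 0; c < s.cols; ++c) {
        s.exp_avg[row * s.cols + c] = kZeroCodeM;
        s.exp_avg_sq[row * s.cols + c] = kZeroCodeV;
    }
    s.exp_avg_scale[row] = 0.0f;
    s.exp_avg_sq_scale[row] = 0.0f;
}
```

Documenting the shape and dtype of the two scale fields next to their declarations, as the first two fields already do, would make the layout clear without reading the kernels.
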
shadygm (Collaborator, Author) commented May 27, 2026

Closing, since the new swizzled SH layout saved a bit more VRAM. I will look into this again in the future.

shadygm closed this May 27, 2026
MrNeRF deleted the feat/quantization-8 branch May 31, 2026 14:29

2 participants