
Quantization using distribution of embeddings on pre-training dataset #477

Open
favyen2 wants to merge 5 commits into main from favyen/20260129-quantization

Conversation


@favyen2 (Collaborator) commented Feb 2, 2026

Try quantizing to 8/4/2/1-bit using the distribution of embeddings on the pre-training dataset.
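For readers skimming the diff, here is a minimal sketch of the idea, assuming a single quantile table shared across embedding dimensions (the thread below notes that all bands follow almost the same distribution); `build_quantile_table`, `quantize`, and `dequantize` are illustrative names, not the PR's actual API:

```python
import torch


def build_quantile_table(sample: torch.Tensor, bits: int) -> dict:
    """Estimate bucket boundaries and per-bucket midpoints from embeddings."""
    num_buckets = 2**bits
    # num_buckets + 1 boundaries at evenly spaced probability levels.
    # (torch.quantile caps input size, so subsample very large datasets.)
    probs = torch.linspace(0.0, 1.0, num_buckets + 1)
    quantiles = torch.quantile(sample.flatten().float(), probs)
    # Midpoints reconstruct a representative value for each bucket.
    midpoints = (quantiles[:-1] + quantiles[1:]) / 2
    return {"quantiles": quantiles, "midpoints": midpoints}


def quantize(x: torch.Tensor, table: dict) -> torch.Tensor:
    # Bucket by the interior boundaries; values past the extremes clamp
    # into the first/last bucket, so indices land in [0, num_buckets).
    inner = table["quantiles"][1:-1].contiguous()
    return torch.bucketize(x.float(), inner).to(torch.uint8)


def dequantize(q: torch.Tensor, table: dict) -> torch.Tensor:
    return table["midpoints"][q.long()]
```

By construction, each of the 2^bits buckets holds an equal share of the empirical mass (about 0.4% per bucket at 8 bits), rather than an equal slice of the value range.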

```python
    return config


def quantize_embeddings_percentile(
```
Contributor
Did this percentile-based bucketing approach behave substantially differently than the statistics-naive quantization scheme in https://github.com/allenai/olmoearth_run/blob/006496243c8f00ada3b74a77874e87a93bfa661e/src/olmoearth_run/runner/tools/postprocessors/combine_geotiff.py#L48?

I imagine it's probably pretty important with the very low-bit quantizations, but do you have a sense at int8?
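For comparison, a generic fixed-range linear quantizer of the statistics-naive flavor; the constants in combine_geotiff.py aren't reproduced here, so the clip range below is purely an illustrative assumption:

```python
import torch

CLIP = 3.0  # assumed clip range, not the constant from combine_geotiff.py


def quantize_fixed(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    # Map [-CLIP, CLIP] uniformly onto the integer levels [0, 2**bits - 1].
    levels = 2**bits - 1
    x = x.clamp(-CLIP, CLIP)
    return torch.round((x + CLIP) / (2 * CLIP) * levels).to(torch.uint8)


def dequantize_fixed(q: torch.Tensor, bits: int = 8) -> torch.Tensor:
    levels = 2**bits - 1
    return q.float() / levels * (2 * CLIP) - CLIP
```

Unlike percentile bucketing, bucket widths here are uniform in value space, so at very low bit widths most of the probability mass collapses into a few central levels, which is presumably where the two schemes diverge.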

@favyen2 (Collaborator, Author) commented Feb 19, 2026

AFAIK neither approach showed any drop in performance at int8; here are Mike's results:
https://github.com/allenai/olmoearth_pretrain/blob/main/scripts/archived/2026-01-024_embedding_analysis/quant_comparison_rounded.csv

For the platform, I think you should just go with the simple fixed quantization.

@favyen2 (Collaborator, Author)

Also, I computed the per-band distribution here, but I found that all of the bands follow almost the same distribution.
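A sketch of how such a per-band table could be computed, assuming embeddings stacked as an (N, dim) tensor; `per_band_quantiles` is an illustrative name:

```python
import torch


def per_band_quantiles(emb: torch.Tensor, bits: int) -> torch.Tensor:
    """emb: (N, dim) -> per-band boundaries of shape (dim, num_buckets + 1)."""
    probs = torch.linspace(0.0, 1.0, 2**bits + 1)
    # torch.quantile over dim=0 yields (num_buckets + 1, dim); transpose to
    # match the (dim, num_buckets + 1) layout used in the diff.
    return torch.quantile(emb.float(), probs, dim=0).t().contiguous()
```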

Contributor

Normal distribution centered over 0.0? That would be convenient for int1 😅

@favyen2 (Collaborator, Author)

Something like this:

[Image: quantiles_pdf, per-band PDF of the embedding values]

Definitely centered at 0, so I do think for 1-bit you can just do < 0 vs. >= 0. I didn't think it was normal, but I guess it does look roughly normal.
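So 1-bit could reduce to a sign test, as sketched below; the reconstruction magnitude here is a placeholder, though for a standard normal the per-side mean of |x| is sqrt(2/pi) ≈ 0.8:

```python
import torch


def quantize_1bit(x: torch.Tensor) -> torch.Tensor:
    # Keep only the sign: 1 where x >= 0, else 0.
    return (x >= 0).to(torch.uint8)


def dequantize_1bit(bits: torch.Tensor, magnitude: float = 0.8) -> torch.Tensor:
    # Map {0, 1} back to {-magnitude, +magnitude}; for minimum squared error,
    # magnitude should be the empirical mean of |x| on each side of zero.
    return bits.float() * (2 * magnitude) - magnitude
```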

Contributor

Guess it doesn't need to be a normal distribution when there are only two quantiles.

@yawenzzzz (Collaborator) left a comment

LGTM! Just a small question about int8 vs. uint8.

Comment on lines +68 to +69:

```python
- "quantiles": torch.Tensor of shape (dim, num_buckets+1)
- "midpoints": torch.Tensor of shape (dim, num_buckets)
```
Collaborator

Suggestion: add a note saying that quantiles are for quantization and midpoints are for dequantization.
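To spell out the suggested note, a sketch of how each tensor would be used, assuming embeddings of shape (N, dim); the function names are illustrative, not the PR's:

```python
import torch


def quantize_per_dim(x: torch.Tensor, quantiles: torch.Tensor) -> torch.Tensor:
    """Quantization uses `quantiles`: x (N, dim) -> bucket indices (N, dim)."""
    # Interior boundaries only, so indices fall in [0, num_buckets).
    inner = quantiles[:, 1:-1].contiguous()              # (dim, num_buckets - 1)
    idx = torch.searchsorted(inner, x.t().contiguous())  # (dim, N)
    return idx.t().to(torch.uint8)


def dequantize_per_dim(q: torch.Tensor, midpoints: torch.Tensor) -> torch.Tensor:
    """Dequantization uses `midpoints`: indices (N, dim) -> values (N, dim)."""
    return torch.gather(midpoints, 1, q.t().long()).t()
```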


```python
# Flatten to (N_total, dim)
# Convert to uint8 first to handle int8 wrap-around (128-255 stored as -128 to -1)
flat = quantized.reshape(-1, dim).to(torch.uint8).long()
```
Collaborator

I'm wondering why we don't just quantize to uint8; with that, there's no need to convert to uint8 during dequantization.
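For what it's worth, the round-trip in the diff works because integer casts wrap modulo 256; a small self-contained check:

```python
import torch

# Bucket indices 0..255 stored in an int8 tensor read back as two's-complement
# values: 128..255 appear as -128..-1.
idx = torch.arange(256, dtype=torch.int64)
stored = idx.to(torch.int8)            # e.g. index 200 is stored as -56

# Casting through uint8 wraps modulo 256 and recovers the original index.
recovered = stored.to(torch.uint8).long()
assert torch.equal(recovered, idx)
```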
