Add Zipf distribution support for skewed index generation in ModelInput #3708
Summary:
Add Zipf distribution support for generating embedding indices in test utilities.
This change adds an optional `zipf_alpha` parameter that, when set, uses `np.random.zipf()` instead of uniform random sampling to generate indices. This produces a skewed access pattern where some embedding rows are accessed much more frequently than others,
enabling benchmarking scenarios with hot/cold data characteristics.
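A minimal sketch of how such skewed index generation could look. The function names, the `seed` argument, and the modulo fold into the table's row range are assumptions for illustration only; the actual `ModelInput._generate_zipf_indices()` in this PR may map samples differently. The sketch also uses a local `RandomState` rather than the global `np.random.zipf()` call so that the example is reproducible.

```python
def generate_zipf_indices(num_rows, num_indices, zipf_alpha, seed=None):
    """Hypothetical helper: draw `num_indices` row ids in [0, num_rows) with Zipf skew."""
    # Lazy import, so importing this module does not require numpy.
    import numpy as np

    if zipf_alpha <= 1.0:
        # np.random.zipf is only defined for a > 1.
        raise ValueError("zipf_alpha must be greater than 1.0")

    rng = np.random.RandomState(seed)
    # Zipf samples are integers >= 1 with a heavy tail; small values dominate.
    samples = rng.zipf(zipf_alpha, size=num_indices)
    # Assumption: shift to 0-based ids and fold the tail back into the table.
    return (samples - 1) % num_rows


def top_fraction_share(indices, num_rows, fraction=0.2):
    """Share of all accesses that land on the hottest `fraction` of rows."""
    import numpy as np

    counts = np.bincount(indices, minlength=num_rows)
    hottest = np.sort(counts)[::-1][: int(num_rows * fraction)]
    return hottest.sum() / counts.sum()


indices = generate_zipf_indices(num_rows=1000, num_indices=10_000, zipf_alpha=1.2, seed=0)
share = top_fraction_share(indices, num_rows=1000)
```

Here `share` gives a quick check of how concentrated the accesses are for a given `zipf_alpha`; the exact value depends on the table size and on how out-of-range samples are mapped back into the table.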
With alpha ≈ 1.1-1.2, approximately 20% of embedding rows receive ~80% of accesses. This can be useful for testing:
Implementation details
- `ModelInput._generate_zipf_indices()` static method
- `numpy` is lazy-imported to avoid breaking backward compatibility when `numpy` is unavailable
- […] `numpy` import fails
- […] `numpy`'s random state independently of PyTorch's seed
Changes
- `input_config.py`: Add `zipf_alpha` field to `InputConfig` dataclass
- `model_input.py`: Add `_generate_zipf_indices()` helper; plumb `zipf_alpha` through `generate()` and `_create_features_lengths_indices()`
- `test_model.py`: Add `_generate_zipf_indices()` helper; add `zipf_alpha` support to `generate()` for both regular and weighted tables
- `tests/test_model_input.py`: Add unit tests for Zipf distribution functionality
- `tests/BUCK`: Add test target for `test_model_input`
Differential Revision: D91909007