
[Benchmark] Benchmark refactor example #1199

Open
lowdy1 wants to merge 4 commits into linkedin:main from lowdy1:bmk_eg

Conversation


@lowdy1 lowdy1 commented Apr 24, 2026

Summary

The current benchmark scripts contain significant boilerplate when constructing common_configs for run_benchmarks. Although compute_model_config_sweep_config and compute_seq_len_sweep_config provide the core sweep logic, each script still:

  • Reimplements probe logic
  • Manually builds extra_benchmark_config
  • Duplicates common_configs assembly
  • Defines redundant helpers like _resolve_* and *_model_config variants

This PR removes that duplication by introducing higher-level builders that standardize how benchmarks are defined.


Proposal

1. Introduce higher-level sweep builders

Add two unified helper functions in benchmark_model_configs.py:

build_model_config_sweep(...)
build_token_length_sweep(...)

These functions:

  • Wrap existing sweep utilities (compute_*_sweep_config)

  • Internally handle memory probing via setup_fn + forward_fn

  • Automatically construct extra_benchmark_config from:

    • model_keys (dynamic model attributes)
    • extra_configs (static overrides)
  • Return a fully-formed common_configs dict

So benchmark scripts reduce to:

common_configs = build_*(...)
run_benchmarks(**common_configs)
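A minimal sketch of what such a builder assembles, assuming a simplified shape for the returned dict (the real builders in benchmark_model_configs.py handle probing and sweep logic on top of this; all names here are illustrative):

```python
# Hypothetical, simplified sketch of a sweep builder's output assembly.
# The real build_* helpers also run memory probing; here we only show
# how model attributes and static overrides merge into one config dict.
def build_sweep_sketch(kernel_name, x_values, model_attrs, extra_configs=None):
    # Static extra_configs override dynamic model attributes on key clash.
    extra = {**model_attrs, **(extra_configs or {})}
    return {
        "kernel_name": kernel_name,
        "x_name": "T",
        "x_values": x_values,
        "extra_benchmark_config": extra,
    }

common_configs = build_sweep_sketch(
    "layer_norm",
    x_values=[1024, 2048, 4096],
    model_attrs={"hidden_size": 4096, "dtype": "bf16"},
    extra_configs={"eps": 1e-6},
)
```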

2. Standardize kernel definition via setup_fn

Instead of manually writing probe_fn, all kernels now define:

setup_fn: SingleBenchmarkRunInput -> Tuple[Any, ...]
forward_fn: Tuple[Any, ...] -> torch.Tensor  (optional)

The builders handle:

setup_out = setup_fn(input)
output = forward_fn(*setup_out)

A default is provided:

forward_fn = lambda x, layer: layer(x)

This removes duplicated forward/probe logic across scripts.
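The setup_fn/forward_fn contract above can be sketched as follows; `run_probe` and the plain-callable stand-ins for `SingleBenchmarkRunInput` and the layer are illustrative assumptions, not the PR's actual implementation:

```python
from typing import Any, Callable, Tuple

# Default forward: treat setup output as (input tensor, layer) and call it.
def default_forward_fn(x, layer):
    return layer(x)

# Illustrative builder-side probe: run setup, then forward on its outputs.
def run_probe(setup_fn: Callable[[Any], Tuple[Any, ...]],
              forward_fn: Callable[..., Any] = default_forward_fn,
              run_input: Any = None):
    setup_out = setup_fn(run_input)
    return forward_fn(*setup_out)

# Example: a trivial "layer" that doubles its input.
out = run_probe(lambda inp: (3, lambda x: x * 2))
```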


3. Eliminate redundant helpers

The following patterns are removed across benchmark scripts:

  • probe_fn definitions
  • extra_config_fn
  • _resolve_model_config_*
  • bench_*_model_config variants

All are now handled centrally by the builders.


New APIs

build_model_config_sweep

  • Sweeps across model configurations (x-axis = model name)
  • Keeps total tokens (B * T) approximately constant
  • Uses setup_fn + forward_fn to estimate memory per model
build_model_config_sweep(
    kernel_name,
    all_model_configs,
    setup_fn,
    model_keys,
    forward_fn=...,
    probe_provider="torch",
    extra_configs=None,
    bt=2048,
    overwrite=False,
)

build_token_length_sweep

  • Sweeps across sequence length (x-axis = T)
  • Automatically adjusts batch size based on memory estimation
  • Uses the same setup_fn + forward_fn abstraction
build_token_length_sweep(
    kernel_name,
    probe_seq_len,
    model,
    setup_fn,
    model_keys,
    extra_configs=None,
    forward_fn=...,
    probe_provider="torch",
    x_values_fn=...,
    overwrite=False,
)

Example (after refactor)

common_configs = build_token_length_sweep(
    kernel_name="layer_norm",
    probe_seq_len=1024,
    model=model,
    setup_fn=_setup_layer_norm,
    model_keys=["hidden_size", "dtype"],
    extra_configs={"eps": 1e-6},
    probe_provider="huggingface",
)

common_configs["kernel_providers"] = ["liger", "huggingface"]

run_benchmarks(..., **common_configs)

Benchmark Command Examples

python ./benchmark/scripts/benchmark_swiglu.py --sweep-mode model_config [--model llama_3_8b]
python ./benchmark/scripts/benchmark_swiglu.py [--sweep-mode token_length] [--bt 2048]

Notes

  • model_config: sweeps across different model configurations (fixed total tokens)
  • token_length: sweeps across sequence lengths / batch sizes (fixed model); this is the default and can be omitted
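The --sweep-mode dispatch implied by the command examples can be sketched with argparse; the flag names and defaults follow the commands and notes above, everything else is illustrative:

```python
import argparse

# Sketch of the CLI surface shown in the benchmark command examples.
# token_length is the default sweep mode and may be omitted.
parser = argparse.ArgumentParser()
parser.add_argument("--sweep-mode", choices=["model_config", "token_length"],
                    default="token_length")
parser.add_argument("--bt", type=int, default=2048,
                    help="total tokens (B * T) kept approximately constant")

args = parser.parse_args([])  # no flags: defaults to the token_length sweep
```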

  • Hardware Type: A100-80G-PCIe
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Comment thread benchmark/scripts/benchmark_swiglu.py Outdated
Comment on lines +130 to +132
def x_values_fn(config):
    return [2**i for i in range(10, int(math.log2(config.seq_len)) + 1)]

Collaborator

Should we put it in build_token_length_sweep? I feel we can set this function as default x range.
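The default x-value range the reviewer suggests folding into build_token_length_sweep could look like the following (a sketch of the snippet under discussion, taking a bare seq_len rather than a config object):

```python
import math

# Powers of two from 1024 (2**10) up to and including seq_len,
# mirroring the x_values_fn in the snippet above.
def default_x_values(seq_len):
    return [2**i for i in range(10, int(math.log2(seq_len)) + 1)]

xs = default_x_values(4096)  # -> [1024, 2048, 4096]
```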

Comment thread benchmark/scripts/benchmark_swiglu.py Outdated
Comment on lines +121 to +128
def extra_config_fn(config):
    return {
        "bsz": config.batch_size,
        "hidden_size": model.hidden_size,
        "intermediate_size": model.intermediate_size,
        "hidden_act": "silu",
        "dtype": model.dtype,
    }
Collaborator

How about we pass a key list to build_token_length_sweep and let it query those keys from model configs?
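The key-list approach the reviewer proposes could be sketched like this; the ModelConfig fields and `extract_model_keys` helper are hypothetical stand-ins for whatever the builder would actually query:

```python
from dataclasses import dataclass

# Illustrative model config; real configs carry more fields.
@dataclass
class ModelConfig:
    hidden_size: int
    intermediate_size: int
    dtype: str

# The builder would pull only the requested attributes off the model config,
# replacing per-script extra_config_fn definitions.
def extract_model_keys(model, model_keys):
    return {k: getattr(model, k) for k in model_keys}

cfg = ModelConfig(hidden_size=4096, intermediate_size=11008, dtype="bf16")
extra = extract_model_keys(cfg, ["hidden_size", "dtype"])
```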

Comment thread benchmark/scripts/benchmark_swiglu.py Outdated
Comment on lines 106 to 119
def _probe():
    x, layer = _setup_swiglu(probe_input)
    return layer(x)
Collaborator

Same idea, add arguments probe_length/provider, and put probe_fn in build_token_length_sweep

Comment thread benchmark/scripts/benchmark_swiglu.py Outdated
"model_configs": model_configs_info,
"bsz": sweep.batch_size,
"seq_len": sweep.seq_len,
def probe_fn(model_cfg, probe_seq_len):
Collaborator

Ditto. Could it merge into build_model_config_sweep?

@lowdy1 lowdy1 force-pushed the bmk_eg branch 3 times, most recently from 5648ac6 to f7e3e18 Compare April 25, 2026 10:00
@lowdy1 lowdy1 force-pushed the bmk_eg branch 7 times, most recently from ef7d096 to 83c76fc Compare April 28, 2026 07:02