
[MIRROR][Feature Request] Use rotating buffers in benchmark functions #8

@yzh119

Description


Copied from flashinfer-ai#2187

flashinfer.testing.bench_gpu_time_with_cupti today supports flushing the L2 cache before each run, but bench_gpu_time_with_cuda_event does not.

This means we can only microbenchmark performance with a cold L2 cache in environments where cupti-python is installed, and such environments are not yet widespread (cupti-python requires CUDA 13).

It would be helpful to implement rotating buffers in the benchmark functions: on benchmark round `round`, use input[round % N] and output[round % N], where N is chosen according to the L2 cache size so that each run starts with a cold L2.
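A minimal sketch of the proposed rotation follows. Every name in it is hypothetical and not part of flashinfer: `num_rotating_buffers`, `bench_with_rotating_buffers`, `wall_clock_timer`, and the `safety_factor` heuristic are all illustrative. The sizing rule assumes that a buffer set is evicted once the other N - 1 sets have together touched comfortably more bytes than the L2 holds. The default timer is a CPU wall-clock stand-in. A real GPU version would record a pair of `torch.cuda.Event(enable_timing=True)` events around the call, synchronize, and read `elapsed_time`, and it would take the L2 size from the device properties.

```python
import math
import time
from typing import Callable, List, Sequence, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def num_rotating_buffers(bytes_per_round: int, l2_cache_bytes: int,
                         safety_factor: float = 2.0) -> int:
    """Hypothetical helper: choose N so that between two uses of the same
    buffer set, the other N - 1 sets touch >= safety_factor * L2 bytes.
    The margin is a heuristic, since L2 replacement is not strictly LRU."""
    if bytes_per_round <= 0 or l2_cache_bytes <= 0:
        raise ValueError("sizes must be positive")
    return 1 + math.ceil(safety_factor * l2_cache_bytes / bytes_per_round)


def wall_clock_timer(thunk: Callable[[], None]) -> float:
    """Stand-in timer returning milliseconds. A GPU benchmark would use
    CUDA events around the kernel launch instead of perf_counter."""
    start = time.perf_counter()
    thunk()
    return (time.perf_counter() - start) * 1e3


def bench_with_rotating_buffers(
    fn: Callable[[In, Out], None],
    inputs: Sequence[In],
    outputs: Sequence[Out],
    num_iters: int,
    timer: Callable[[Callable[[], None]], float] = wall_clock_timer,
) -> List[float]:
    """Time fn(inputs[round % N], outputs[round % N]) for each round."""
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("need the same, nonzero number of inputs and outputs")
    n = len(inputs)
    times = []
    for round_ in range(num_iters):
        # Rotate through the pre-allocated sets so that consecutive rounds
        # never reuse the buffers that the previous round left in cache.
        inp, out = inputs[round_ % n], outputs[round_ % n]
        times.append(timer(lambda: fn(inp, out)))
    return times


# Usage with made-up sizes: a 40 MiB L2 and 8 MiB touched per round.
n = num_rotating_buffers(8 * 2**20, 40 * 2**20)  # 1 + ceil(2 * 40 / 8) = 11

# Plain lists stand in for device tensors; fn records which set it saw.
seen: List[int] = []
inputs = [[i] for i in range(n)]
outputs = [[None] for _ in range(n)]


def copy_kernel(inp, out):
    out[0] = inp[0]
    seen.append(inp[0])


times = bench_with_rotating_buffers(copy_kernel, inputs, outputs, num_iters=25)
print(n, seen[:13])
```

The choice of N trades memory for coldness. Each extra set costs `bytes_per_round` of device memory, while too small an N lets a set stay resident in L2 between its uses. In exchange, rotation needs no separate flush kernel between timed runs.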
