Copied from flashinfer-ai#2187
flashinfer.testing.bench_gpu_time_with_cupti today supports flushing the L2 cache before each run, but bench_gpu_time_with_cuda_event does not.
As a result, cold-L2 microbenchmarks are only possible in environments where cupti-python is installed, which is not widespread today (it requires CUDA 13).
It would be helpful to implement a rotating buffer scheme, indexing input[round % N] and output[round % N], where N is chosen from the L2 cache size so that every round reads and writes buffers that are no longer resident in L2.
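A minimal sketch of the rotation logic described above. It is deliberately pure Python (no torch, no GPU) so the indexing and buffer-count math stand on their own; in an actual implementation the buffer pool would hold CUDA tensors and `run` would launch the kernel being timed with CUDA events. The names `num_rotating_buffers` and `bench_rounds` are hypothetical, not part of the flashinfer API.

```python
def num_rotating_buffers(l2_cache_bytes: int, buffer_bytes: int) -> int:
    """Choose N so that cycling through N buffers evicts each one from L2
    before it is reused: the pool's total footprint must exceed L2."""
    # +1 makes the working set strictly larger than the L2 cache;
    # at least 2 buffers are needed for any rotation at all.
    return max(2, l2_cache_bytes // buffer_bytes + 1)


def bench_rounds(run, inputs, outputs, n_rounds):
    """Invoke run(inp, out) n_rounds times, selecting buffers with
    round % N so that no buffer is touched twice within one L2's worth
    of traffic -- i.e. each round starts from a cold L2."""
    n = len(inputs)
    for r in range(n_rounds):
        run(inputs[r % n], outputs[r % n])


# Example sizing: a 50 MiB L2 (H100-class, an assumed figure) with
# 8 MiB per input buffer needs a pool of 7 rotating buffers.
n = num_rotating_buffers(50 * 1024 * 1024, 8 * 1024 * 1024)
```

Sizing N from the device's L2 capacity (e.g. `torch.cuda.get_device_properties(...)`, if available) rather than hard-coding it would keep the scheme portable across GPUs with different cache sizes.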