[Tuner] Type inconsistencies between libtuner and rocm backend #2908

@RattataKing

The tuner hits a runtime error when run with --benchmark-timing-method=rocprof:

File "amd-shark-ai/amdsharktuner/amdsharktuner/libtuner.py", line 1113, in get_valid_time_us
    if result.is_valid()
       ^^^^^^^^^^^^^^^
AttributeError: 'RocProfBenchmarkResult' object has no attribute 'is_valid'

This happens because rocm_common.RocProfBenchmarkResult does not inherit from libtuner.BenchmarkResult, so the tool-agnostic benchmark design breaks at:

benchmark_tool_config = rocm_common.RocProfConfig(benchmark_fn=rocm_common.run_rocprof_command)
result = benchmark_tool_config.benchmark_fn(BenchmarkPack(...))  # returns a RocProfBenchmarkResult
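
To make the failure mode concrete, here is a minimal sketch that reproduces the same AttributeError outside the tuner. The two classes are simplified stand-ins for libtuner.BenchmarkResult and rocm_common.RocProfBenchmarkResult. Their fields, and the body of get_valid_time_us, are assumptions for illustration and are not copied from the real code.

```python
from dataclasses import dataclass
import math


@dataclass
class BenchmarkResult:
    # Stand-in for libtuner.BenchmarkResult; the fields are assumed.
    candidate_id: int
    time: float

    def is_valid(self) -> bool:
        return math.isfinite(self.time)


@dataclass
class RocProfBenchmarkResult:
    # Stand-in for rocm_common.RocProfBenchmarkResult.
    # It does not inherit from BenchmarkResult, so it has no is_valid().
    candidate_id: int
    time: float


def get_valid_time_us(result):
    # Hypothetical body; only the is_valid() call matches the traceback.
    # The untyped parameter lets the wrong result type through unnoticed.
    return result.time if result.is_valid() else None
```

Calling get_valid_time_us with the stand-in RocProfBenchmarkResult raises AttributeError at the is_valid() call, while the stand-in BenchmarkResult works as expected.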

There are multiple ways to fix this bug:

  1. Add is_valid() to RocProfBenchmarkResult
  2. Move BenchmarkResult from libtuner to common, and use it in place of RocProfBenchmarkResult in rocm_common
  3. Add benchmark_tool_config.convert_benchmark_result
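
As a rough sketch of what options 1 and 2 aim for, the rocprof result can share one base type with the tuner so that is_valid() is always available. The field names and the kernel_name attribute below are hypothetical, and the classes are simplified stand-ins rather than the real libtuner or rocm_common definitions.

```python
from dataclasses import dataclass
import math
from typing import Optional


@dataclass
class BenchmarkResult:
    # Simplified stand-in for the shared result type; the fields are assumed.
    candidate_id: int
    time: float

    def is_valid(self) -> bool:
        return math.isfinite(self.time)


@dataclass
class RocProfBenchmarkResult(BenchmarkResult):
    # Hypothetical rocprof-specific extra field, defaulted for convenience.
    kernel_name: str = ""


def get_valid_time_us(result: BenchmarkResult) -> Optional[float]:
    # Any subclass of BenchmarkResult is accepted and has is_valid().
    return result.time if result.is_valid() else None
```

With a shared base class, the annotation on get_valid_time_us also gives a type checker something to verify at every call site.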

This bug slipped past the mypy check because the typing between the tuner and its target backend is inconsistent and overly loose. Other risky code I found:

run_rocprof_command(benchmark_pack: Any)
benchmark_fn: Callable
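
One possible way to tighten these two signatures is sketched below. BenchmarkToolConfig, the fields of BenchmarkPack and BenchmarkResult, and the constant timing value are all hypothetical stand-ins. The point is only that a parameterized Callable and a concrete parameter type give a type checker enough information to compare the backend function against what the tuner expects.

```python
from dataclasses import dataclass
import math
from typing import Callable


@dataclass
class BenchmarkPack:
    # Stand-in for the tuner's BenchmarkPack; the field is assumed.
    candidate_id: int


@dataclass
class BenchmarkResult:
    # Stand-in for the tuner's BenchmarkResult; the fields are assumed.
    candidate_id: int
    time: float

    def is_valid(self) -> bool:
        return math.isfinite(self.time)


@dataclass
class BenchmarkToolConfig:
    # Hypothetical config type: the callable's input and output are spelled out
    # instead of a bare Callable.
    benchmark_fn: Callable[[BenchmarkPack], BenchmarkResult]


def run_rocprof_command(benchmark_pack: BenchmarkPack) -> BenchmarkResult:
    # Placeholder body: a real implementation would invoke the profiler.
    return BenchmarkResult(benchmark_pack.candidate_id, 12.5)


config = BenchmarkToolConfig(benchmark_fn=run_rocprof_command)
result = config.benchmark_fn(BenchmarkPack(candidate_id=3))
```

Under these annotations, a backend function declared to return an unrelated result class should be flagged by a type checker when it is passed as benchmark_fn, rather than failing later at runtime.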

If needed, we can also move some of the classes in libtuner together into a single location, for example BenchmarkResult, BenchmarkPack, CandidateTracker, and CompilePack. The destination does not have to be common.py, because these classes are not used by other files.
