
expose max-autotune in configs for better perf #2373

Open
@felipemello1

Description

TLDR: when compiling, max-autotune improves runtime perf but takes longer to compile. If the user has a long training job to do, the one-time compilation cost is definitely worth it.
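For context, this is just the standard mode argument on torch.compile; the ask is to let torchtune configs pass it through. A minimal sketch of the difference (plain PyTorch, not torchtune code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Default mode: quickest to compile, good baseline performance.
compiled_default = torch.compile(model)

# max-autotune: Inductor benchmarks several kernel candidates (e.g. Triton matmul
# templates) at compile time, so compilation is slower but steady-state throughput
# is usually better -- a good trade-off for long training runs.
compiled_tuned = torch.compile(model, mode="max-autotune")
```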

TODO:

  1. Define the flag in just one config
  2. Find all places where we compile (probably: flex attention utils, compile model, compile loss)
  3. Run a test without max-autotune and log it with metric_logger=torchtune.training.metric_logging.WandBLogger
  4. Run a test with max-autotune
  5. Share the results in a PR
  6. If accepted, implement it for every config/recipe (ideally at the utility level, not the recipe level; see the sketch after the example command below)
Example command for the test runs:

tune run full_finetune_single_device --config llama3_2/1B_full_single_device dataset.packed=True tokenizer.max_seq_len=4096 dataset.split=train[:5%] metric_logger=torchtune.training.metric_logging.WandBLogger
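A rough sketch of what the utility-level hookup could look like (compile_mode is a hypothetical config field, and compile_model below is an illustrative stand-in, not the existing torchtune utility):

```python
from typing import Optional

import torch
import torch.nn as nn


def compile_model(model: nn.Module, compile_mode: Optional[str] = None) -> nn.Module:
    # compile_mode would be read from the YAML config, e.g. compile_mode: max-autotune
    # (hypothetical field name). None falls back to torch.compile's default mode.
    return torch.compile(model, mode=compile_mode)


# The same keyword could be threaded into the loss and flex-attention compile calls,
# so every recipe picks the flag up from one place instead of needing per-recipe changes.
```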


Labels: community help wanted (we would love the community's help completing this issue)
