Command: tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device
Output:
INFO:torchtune.utils._logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:
batch_size: 2
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
  checkpoint_files:
  - model-00001-of-00004.safetensors
  - model-00002-of-00004.safetensors
  - model-00003-of-00004.safetensors
  - model-00004-of-00004.safetensors
  model_type: LLAMA3
  output_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 64
log_every_n_steps: 1
log_peak_memory_stats: false
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: /tmp/lora_finetune_output
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  apply_lora_to_mlp: false
  apply_lora_to_output: false
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  lora_dropout: 0.0
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  lr: 0.0003
  weight_decay: 0.01
output_dir: /tmp/lora_finetune_output
profiler:
  _component_: torchtune.training.setup_torch_profiler
  active_steps: 2
  cpu: true
  cuda: true
  enabled: false
  num_cycles: 1
  output_dir: /tmp/lora_finetune_output/profiling_outputs
  profile_memory: false
  record_shapes: true
  wait_steps: 5
  warmup_steps: 5
  with_flops: false
  with_stack: false
resume_from_checkpoint: false
save_adapter_weights_only: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model
DEBUG:torchtune.utils._logging:Setting manual seed to local seed 3188944798. Local seed is seed + rank = 3188944798 + 0
Writing logs to /tmp/lora_finetune_output/log_1727379753.txt
Traceback (most recent call last):
  File "/home/kailash/miniconda3/envs/llm/bin/tune", line 8, in <module>
    sys.exit(main())
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/_cli/run.py", line 185, in _run_cmd
    self._run_single_device(args)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/_cli/run.py", line 94, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/recipes/lora_finetune_single_device.py", line 739, in <module>
    sys.exit(recipe_main())
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/recipes/lora_finetune_single_device.py", line 733, in recipe_main
    recipe.setup(cfg=cfg)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/recipes/lora_finetune_single_device.py", line 215, in setup
    checkpoint_dict = self.load_checkpoint(cfg_checkpointer=cfg.checkpointer)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/recipes/lora_finetune_single_device.py", line 148, in load_checkpoint
    self._checkpointer = config.instantiate(
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/config/_instantiate.py", line 106, in instantiate
    return _instantiate_node(OmegaConf.to_object(config), *args)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/config/_instantiate.py", line 31, in _instantiate_node
    return _create_component(_component_, args, kwargs)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/config/_instantiate.py", line 20, in _create_component
    return _component_(*args, **kwargs)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/training/checkpointing/_checkpointer.py", line 348, in __init__
    self._checkpoint_paths = self._validate_hf_checkpoint_files(checkpoint_files)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/training/checkpointing/_checkpointer.py", line 389, in _validate_hf_checkpoint_files
    checkpoint_path = get_path(self._checkpoint_dir, f)
  File "/home/kailash/miniconda3/envs/llm/lib/python3.9/site-packages/torchtune/training/checkpointing/_utils.py", line 95, in get_path
    raise ValueError(f"No file with name: (unknown) found in {input_dir}.")
ValueError: No file with name: model-00001-of-00004.safetensors found in /tmp/Meta-Llama-3.1-8B-Instruct.
Can anyone help me with this?
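For reference, this is how I can check which of the expected shards are actually on disk (a minimal sketch; the directory and file names are taken only from the checkpoint_dir and checkpoint_files entries in the config above, not from torchtune itself):

```python
from pathlib import Path

# checkpoint_dir from the resolved config above
ckpt_dir = Path("/tmp/Meta-Llama-3.1-8B-Instruct")

# The four safetensors shards listed under checkpoint_files
expected = [f"model-{i:05d}-of-00004.safetensors" for i in range(1, 5)]

# Any shard in this list is one the checkpointer would fail to find
missing = [name for name in expected if not (ckpt_dir / name).exists()]
print("missing shards:", missing if missing else "none")
```

If any shard shows up as missing, the download into that directory was incomplete or went to a different path than the one the config points at.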