Labels: bug (Something isn't working)
Description
Python Version
Python 3.10
Appreciate any help solving the issue...
(I've seen people in other threads blaming this type of crash on CPU memory, but a g4dn.12xlarge has 192 GB of RAM. So unless there's a hard threshold in the code, that should be plenty for my 1000/200 train/validation dataset.)
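For reference, here is a quick environment check (a minimal sketch using torch and psutil, both of which appear in the pip freeze below) that prints the system RAM together with each GPU's compute capability and bf16 support, which is what the attention dispatcher complains about in the trace:

```python
# Minimal environment check: system RAM plus per-GPU compute capability
# and bf16 support (torch and psutil are both listed in the pip freeze below).
import psutil
import torch

ram_gib = psutil.virtual_memory().total / 1024**3
print(f"System RAM: {ram_gib:.0f} GiB")

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")

# The bf16 attention kernels in xformers require compute capability >= 8.0
# (A100 or newer); the T4s in a g4dn.12xlarge report (7, 5).
print("bf16 supported:", torch.cuda.is_bf16_supported())
```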
Here is the error trace:
$:~/mistral-finetune$ torchrun --nproc-per-node 4 --master_port $RANDOM -m train example/7B.yaml
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING]
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] *****************************************
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] *****************************************
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 3
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 0
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 0
2024-08-18 10:17:48 (UTC) - 0:00:04 - train - INFO - Going to init comms...
2024-08-18 10:17:48 (UTC) - 0:00:04 - train - INFO - Run dir: /home/usr/mistral-finetune/7B
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 2
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 1
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 2
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 1
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:50 (UTC) - 0:00:06 - train - INFO - TrainArgs: {'batch_size': 1,
'checkpoint': True,
'ckpt_freq': 100,
'data': {'data': '',
'eval_instruct_data': 'kbp_validation_set_prepared_full_mistral.jsonl',
'instruct': {'dynamic_chunk_fn_call': True, 'shuffle': True},
'instruct_data': 'kbp_training_set_prepared_full_mistral.jsonl',
'shuffle': False},
'eval_freq': 100,
'log_freq': 1,
'lora': {'dropout': 0.0, 'enable': True, 'rank': 64, 'scaling': 2.0},
'max_norm': 1.0,
'max_steps': 300,
'mlflow': {'experiment_name': None, 'tracking_uri': None},
'model_id_or_path': '/home/usr/mistral-finetune/mistral_models',
'no_ckpt': False,
'no_eval': False,
'num_ckpt_keep': 3,
'num_microbatches': 1,
'optim': {'lr': 6e-05, 'pct_start': 0.05, 'weight_decay': 0.1},
'run_dir': '/home/usr/mistral-finetune/7B',
'save_adapters': True,
'seed': 0,
'seq_len': 32768,
'wandb': {'key': '',
'offline': True,
'project': 'csa-project',
'run_name': 'csa-run-1'},
'world_size': 4}
wandb: Currently logged in as: ll (ll-itu). Use `wandb login --relogin` to force relogin
2024-08-18 10:17:51 (UTC) - 0:00:07 - metrics_logger - INFO - initializing wandb
wandb: WARNING Changes to your `wandb` environment variables will be ignored because your `wandb` session has already started. For more information on how to modify your settings with `wandb.init()` arguments, please refer to https://wandb.me/wandb-init.
wandb: Tracking run with wandb version 0.17.7
wandb: Run data is saved locally in /home/usr/mistral-finetune/7B/wandb/run-20240818_101751-ebgqmgpj
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run csa-run-1
wandb: ⭐️ View project at https://wandb.ai/ll-itu/csa-project
wandb: 🚀 View run at https://wandb.ai/ll-itu/csa-project/runs/ebgqmgpj
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Reloading model from /home/usr/mistral-finetune/mistral_models/consolidated.safetensors ...
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Converting model to dtype torch.bfloat16 ...
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Loaded model on cpu!
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Initializing lora layers ...
2024-08-18 10:17:52 (UTC) - 0:00:08 - finetune.wrapped_model - INFO - Finished initialization!
2024-08-18 10:17:52 (UTC) - 0:00:08 - finetune.wrapped_model - INFO - Sharding model over 4 GPUs ...
2024-08-18 10:18:02 (UTC) - 0:00:18 - finetune.wrapped_model - INFO - Model sharded!
2024-08-18 10:18:02 (UTC) - 0:00:18 - finetune.wrapped_model - INFO - 167,772,160 out of 7,415,795,712 parameters are finetuned (2.26%).
2024-08-18 10:18:02 (UTC) - 0:00:18 - dataset - INFO - Loading kbp_training_set_prepared_full_mistral.jsonl ...
2024-08-18 10:18:03 (UTC) - 0:00:19 - dataset - INFO - kbp_training_set_prepared_full_mistral.jsonl loaded and tokenized.
2024-08-18 10:18:03 (UTC) - 0:00:19 - dataset - INFO - Shuffling kbp_training_set_prepared_full_mistral.jsonl ...
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: metrics_logger
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/usr/mistral-finetune/train.py", line 327, in <module>
fire.Fire(train)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/usr/mistral-finetune/train.py", line 64, in train
_train(args, exit_stack)
File "/home/usr/mistral-finetune/train.py", line 243, in _train
output = model(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: metrics_logger
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: eval_logger
exec(code, run_globals)
File "/home/usr/mistral-finetune/train.py", line 327, in <module>
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: metrics_logger
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
fire.Fire(train)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/usr/mistral-finetune/train.py", line 327, in <module>
component, remaining_args = _CallAndUpdateTrace(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
fire.Fire(train)return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component = fn(*varargs, **kwargs)
File "/home/usr/mistral-finetune/train.py", line 64, in train
_train(args, exit_stack)
component, remaining_args = _CallAndUpdateTrace( File "/home/usr/mistral-finetune/train.py", line 243, in _train
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 220, in forward
output = model(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
h = layer(h, freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
component = fn(*varargs, **kwargs)
File "/home/usr/mistral-finetune/train.py", line 64, in train
_train(args, exit_stack)
File "/home/usr/mistral-finetune/train.py", line 243, in _train
output = model(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
return self._call_impl(*args, **kwargs) File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
return self.checkpoint_fn( # type: ignore[misc]
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ret = function(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 220, in forward
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 220, in forward
h = layer(h, freqs_cis, att_mask)h = layer(h, freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
return self.checkpoint_fn( # type: ignore[misc]
return forward_call(*args, **kwargs) File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return self.checkpoint_fn( # type: ignore[misc]
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 146, in forward
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
r = self.attention(self.attention_norm(x), freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
ret = function(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ret = function(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 87, in forward
return self._call_impl(*args, **kwargs)
output = memory_efficient_attention(xq, key, val, mask) File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
return _memory_efficient_attention(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
return _fMHA.apply(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return forward_call(*args, **kwargs)
return super().apply(*args, **kwargs) # type: ignore[misc] File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
out, op_ctx = _memory_efficient_attention_forward_requires_grad(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
op = _dispatch_fw(inp, True)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
output = self._fsdp_wrapped_module(*args, **kwargs)
return _run_priority_list(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 32768, 32, 128) (torch.bfloat16)
key : shape=(1, 32768, 32, 128) (torch.bfloat16)
value : shape=(1, 32768, 32, 128) (torch.bfloat16)
attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
p : 0.0
`[email protected]` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
operator wasn't built - see `python -m xformers.info` for more info
triton is not available
requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
dtype=torch.bfloat16 (supported: {torch.float32})
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
unsupported embed per head: 128
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 146, in forward
r = self.attention(self.attention_norm(x), freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 146, in forward
r = self.attention(self.attention_norm(x), freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 87, in forward
output = memory_efficient_attention(xq, key, val, mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
return self._call_impl(*args, **kwargs)
return _memory_efficient_attention(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
return _fMHA.apply(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
out, op_ctx = _memory_efficient_attention_forward_requires_grad(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 87, in forward
op = _dispatch_fw(inp, True)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
output = memory_efficient_attention(xq, key, val, mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
return _run_priority_list(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
return _memory_efficient_attention(raise NotImplementedError(msg)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 32768, 32, 128) (torch.bfloat16)
key : shape=(1, 32768, 32, 128) (torch.bfloat16)
value : shape=(1, 32768, 32, 128) (torch.bfloat16)
attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
p : 0.0
`[email protected]` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
operator wasn't built - see `python -m xformers.info` for more info
triton is not available
requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
dtype=torch.bfloat16 (supported: {torch.float32})
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
unsupported embed per head: 128
return _fMHA.apply(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
out, op_ctx = _memory_efficient_attention_forward_requires_grad(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
op = _dispatch_fw(inp, True)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
return _run_priority_list(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 32768, 32, 128) (torch.bfloat16)
key : shape=(1, 32768, 32, 128) (torch.bfloat16)
value : shape=(1, 32768, 32, 128) (torch.bfloat16)
attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
p : 0.0
`[email protected]` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
operator wasn't built - see `python -m xformers.info` for more info
triton is not available
requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
dtype=torch.bfloat16 (supported: {torch.float32})
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
unsupported embed per head: 128
wandb:
wandb: 🚀 View run csa-run-1 at: https://wandb.ai/ll-itu/csa-project/runs/ebgqmgpj
wandb: ⭐️ View project at: https://wandb.ai/ll-itu/csa-project
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./7B/wandb/run-20240818_101751-ebgqmgpj/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closed: metrics_logger
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/usr/mistral-finetune/train.py", line 327, in <module>
fire.Fire(train)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/usr/mistral-finetune/train.py", line 64, in train
_train(args, exit_stack)
File "/home/usr/mistral-finetune/train.py", line 243, in _train
output = model(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 220, in forward
h = layer(h, freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
return self.checkpoint_fn( # type: ignore[misc]
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
ret = function(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
output = self._fsdp_wrapped_module(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 146, in forward
r = self.attention(self.attention_norm(x), freqs_cis, att_mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/mistral-finetune/model/transformer.py", line 87, in forward
output = memory_efficient_attention(xq, key, val, mask)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
return _memory_efficient_attention(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
return _fMHA.apply(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
out, op_ctx = _memory_efficient_attention_forward_requires_grad(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
op = _dispatch_fw(inp, True)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
return _run_priority_list(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 32768, 32, 128) (torch.bfloat16)
key : shape=(1, 32768, 32, 128) (torch.bfloat16)
value : shape=(1, 32768, 32, 128) (torch.bfloat16)
attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
p : 0.0
`[email protected]` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
operator wasn't built - see `python -m xformers.info` for more info
triton is not available
requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
dtype=torch.bfloat16 (supported: {torch.float32})
attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
bf16 is only supported on A100+ GPUs
unsupported embed per head: 128
[2024-08-18 10:18:08,858] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 6934 closing signal SIGTERM
[2024-08-18 10:18:10,074] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 6935) of binary: /home/usr/mistral-finetune/mistenv/bin/python3.10
Traceback (most recent call last):
File "/home/usr/mistral-finetune/mistenv/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-08-18_10:18:08
host : ip-172-31-x-x.ec2.internal
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 6936)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-08-18_10:18:08
host : ip-172-31-x-x.ec2.internal
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 6937)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-08-18_10:18:08
host : ip-172-31-x-x.ec2.internal
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 6935)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
$:~/mistral-finetune$
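If it helps with triage, the failure should be reproducible outside the trainer with a direct call to xformers' memory_efficient_attention in bf16. A minimal sketch, assuming the same venv and using much smaller shapes than the (1, 32768, 32, 128) tensors in the trace:

```python
# Minimal sketch to isolate the operator-dispatch failure above,
# independent of mistral-finetune (assumes the same torch/xformers venv).
import torch
from xformers.ops import memory_efficient_attention
from xformers.ops.fmha.attn_bias import BlockDiagonalCausalMask

dtype = torch.bfloat16  # same dtype the trainer converts the model to
seqlens = [128, 128]    # tiny stand-in for the 32768-token packed batch

q = torch.randn(1, sum(seqlens), 32, 128, device="cuda", dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)
mask = BlockDiagonalCausalMask.from_seqlens(seqlens)

# On a T4 (sm75) this is expected to raise the same NotImplementedError,
# since the bf16 flash/cutlass kernels require compute capability >= 8.0.
out = memory_efficient_attention(q, k, v, attn_bias=mask)
print(out.shape)
```

`python -m xformers.info` (referenced in the error message) also lists which operators the installed wheel was built with.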
Pip Freeze
$:~/mistral-finetune$ pip3 freeze
absl-py==2.1.0
annotated-types==0.7.0
attrs==24.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
docker-pycreds==0.4.0
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
gitdb==4.0.11
GitPython==3.1.43
grpcio==1.65.5
idna==3.7
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
Markdown==3.7
MarkupSafe==2.1.5
mistral_common==1.3.4
mpmath==1.3.0
networkx==3.3
numpy==1.25.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
platformdirs==4.2.2
protobuf==5.27.3
psutil==6.0.0
pydantic==2.8.2
pydantic_core==2.20.1
PyYAML==6.0.2
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
safetensors==0.4.4
sentencepiece==0.2.0
sentry-sdk==2.13.0
setproctitle==1.3.3
simple_parsing==0.1.5
six==1.16.0
smmap==5.0.1
sympy==1.13.2
tensorboard==2.17.1
tensorboard-data-server==0.7.2
termcolor==2.4.0
tiktoken==0.7.0
torch==2.2.0
tqdm==4.66.5
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
wandb==0.17.7
Werkzeug==3.0.3
xformers==0.0.24
Reproduction Steps
- Install libraries and dependencies
- export CUDA_VISIBLE_DEVICES=0,1,2,3
- configure absolute paths in the 7B.yaml file
- pass the dataset validation test (a rough sanity-check sketch follows this list)
- torchrun --nproc-per-node 4 --master_port $RANDOM -m train example/7B.yaml
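For the dataset validation step, the check is roughly equivalent to the following JSONL sanity pass (illustrative only, not the repo's validation script; the "messages" key is an assumption about the instruct schema, and the file name comes from the config above):

```python
# Illustrative JSONL sanity check for the instruct dataset, not the repo's
# official validation script. The "messages" key is an assumption about the
# expected instruct schema.
import json

path = "kbp_training_set_prepared_full_mistral.jsonl"

with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc})") from exc
        if "messages" not in record:
            print(f"line {lineno}: no 'messages' key; check the expected schema")

print("basic JSONL check passed")
```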
Expected Behavior
A successful training session.
Additional Context
No response
Suggested Solutions
No response