
Process automatically killed with no error message at all #6687

Open
@duyu09

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

llamafactory-cli env

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-3.10.0-693.el7.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • PyTorch version: 2.5.0a0+b465a5843b.nv24.09 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 2080 Ti

Reproduction

  • The command executed:
FORCE_TORCHRUN=0 CUDA_VISIBLE_DEVICES=1 llamafactory-cli train ../qwen_pretrain.yaml
  • The configuration in qwen_pretrain.yaml:
### model
model_name_or_path: /home/s-duy20/qwen
trust_remote_code: true

### method
stage: pt
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: pretrain
cutoff_len: 500
max_samples: 127
overwrite_cache: true
# preprocessing_num_workers: 1

### output
output_dir: /home/s-duy20/saves/qwen/lora/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
# ddp_timeout: 180000000
lora_rank: 7

### eval
val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
  • The latter part of the output log:
[INFO|tokenization_utils_base.py:2204] 2025-01-17 11:01:20,810 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2204] 2025-01-17 11:01:20,810 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2204] 2025-01-17 11:01:20,810 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2204] 2025-01-17 11:01:20,810 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2204] 2025-01-17 11:01:20,810 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2204] 2025-01-17 11:01:20,810 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2470] 2025-01-17 11:01:21,138 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-01-17 11:01:21] llamafactory.data.loader:157 >> Loading dataset pretrain.json...
Generating train split: 0 examples [00:00, ? examples/s]Killed
  • Machine specs:
    About 100 GB of RAM free (more than enough), roughly 12 GB of GPU memory, and ample disk space. (A quick memory-cap check is sketched right after this list.)
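
Since the machine reports about 100 GB of free RAM yet the process is still killed during dataset preprocessing, it is worth ruling out a per-cgroup or per-process memory cap, which is common on shared servers. A minimal check, assuming the cgroup v1 layout that ships with the 3.10.0 CentOS 7 kernel listed above:

# Which memory cgroup is this shell running in?
cat /proc/self/cgroup
# Hard limit enforced on that cgroup (cgroup v1 root path shown;
# append the group printed above if you are not in the root group):
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# Per-process virtual-memory cap, if the admin set one:
ulimit -v

If either value is far below 100 GB, the kill can fire long before system RAM is exhausted.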

As the log shows, the process is Killed outright with no error message at all. What causes this, and how can it be fixed?
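
A silent "Killed" with no Python traceback usually means the kernel terminated the process from outside, typically the OOM killer or a cgroup limit. A quick way to confirm, assuming you can read the kernel log on this machine:

# Look for OOM-killer records around the time of the kill:
dmesg -T | grep -iE 'killed process|out of memory' | tail -n 20
# On systemd hosts, the kernel journal keeps the same records:
journalctl -k --since today | grep -i oom

If a matching record appears, memory spiked while datasets was generating the train split; the usual mitigations are lowering preprocessing_num_workers (already present, commented out, in the YAML above) or enabling dataset streaming, if this LLaMA-Factory version supports it.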

Others

No response

Metadata

Labels

bug (Something isn't working), pending (This problem is yet to be addressed)
