Halo Strix 1500-step LoRA for ComfyUI Flux.1 dev on GMTEK EVO2, 128 GB total, 64 GB for VRAM #2260
bkpaine1 started this conversation in Show and tell
Have you tried enabling torch inductor? Also, set TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1.
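A quick sketch of how you might try that on this box (not tested here): export the flag before launching the trainer, then sanity-check that torch.compile's inductor backend actually works on this ROCm build. Whether SimpleTuner itself picks up a compile option depends on its own flags, so the small matmul probe below is only a toolchain check and the function name is made up.

export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
python << 'PYTHONCODE'
import torch

# Made-up probe: compile a trivial matmul with the inductor backend.
@torch.compile(backend="inductor")
def matmul_test(a, b):
    return a @ b

x = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
# First call triggers compilation; an error here means inductor isn't usable.
print(matmul_test(x, x).shape)
PYTHONCODE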
[RANK 0] 2025-12-27 05:35:43,274 [INFO] Moving AutoencoderKL to accelerator, converting from torch.float32 to torch.bfloat16
[RANK 0] 2025-12-27 05:35:43,547 [INFO] [Rank 0] Processing 1 local validation work items (distributed_mode=False, total=1, work_item_prompts=[''])
Epoch 1/1, Steps: 60%|███████▏ | 900/1500 [11:06:47<7:17:18, 43.73s/it, grad_absmax=0.000372, lr=0.0001, step_loss=0.292]
[RANK 0] 2025-12-27 06:49:08,997 [INFO] Loading AutoencoderKL from black-forest-labs/FLUX.1-dev
[RANK 0] 2025-12-27 06:49:09,333 [INFO] Moving AutoencoderKL to accelerator, converting from torch.float32 to torch.bfloat16
[RANK 0] 2025-12-27 06:49:09,617 [INFO] [Rank 0] Processing 1 local validation work items (distributed_mode=False, total=1, work_item_prompts=[''])
Epoch 1/1, Steps: 66%|████████▌ | 993/1500 [12:15:07<6:09:20, 43.71s/it, grad_absmax=0.00028, lr=0.0001, step_loss=0.248]
Works just great! Here's the startup script if you don't want to fight the GUI.
cat > ~/venvbc/SimpleTuner/config/config.json << 'EOF'
{
  "--data_backend_config": "config/multidatabackend.json",
  "--output_dir": "/home/test/venvbc/ST-OUTPUT/test_v1",
  "--model_type": "lora",
  "--model_family": "flux",
  "--pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev",
  "--train_batch_size": 1,
  "--max_train_steps": 1500,
  "--num_train_epochs": 0,
  "--learning_rate": "1e-4",
  "--lora_rank": 16,
  "--mixed_precision": "bf16",
  "--gradient_checkpointing": true,
  "--resolution": 1024,
  "--seed": 42,
  "--optimizer": "adamw_bf16"
}
EOF
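The config above points at config/multidatabackend.json, which isn't shown in the post. A minimal local-folder dataloader might look roughly like the sketch below; the field names follow SimpleTuner's dataloader docs, but the dataset id, paths, and caption strategy are placeholders, and your SimpleTuner version may expect slightly different keys.

cat > ~/venvbc/SimpleTuner/config/multidatabackend.json << 'EOF'
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/home/test/datasets/my-dataset",
    "caption_strategy": "textfile",
    "resolution": 1024,
    "resolution_type": "pixel",
    "cache_dir_vae": "cache/vae/my-dataset",
    "crop": false
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "local",
    "cache_dir": "cache/text/my-dataset",
    "default": true
  }
]
EOF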
cd ~/venvbc/SimpleTuner
source ~/venvbc/bin/activate
# Leave the allocator at its defaults and apply ROCm/HSA workarounds that help
# on this iGPU: copy via blit kernels instead of SDMA, and disable SVM.
unset PYTORCH_HIP_ALLOC_CONF
unset PYTORCH_ALLOC_CONF
export HSA_ENABLE_SDMA=0
export HSA_USE_SVM=0
python << 'PYTHONCODE'
import torch
# Allow TF32 matmuls and relax float32 matmul precision for extra speed.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.set_float32_matmul_precision('medium')
# Run SimpleTuner's trainer module, roughly like `python -m simpletuner.train`.
import runpy
runpy.run_module('simpletuner.train', run_name='main', alter_sys=True)
PYTHONCODE
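When training finishes, the LoRA ends up under the --output_dir set above. To use it in ComfyUI, copy the resulting .safetensors into ComfyUI's models/loras folder and load it with the usual LoRA loader node. The exact output filename and the ~/ComfyUI path below are assumptions; check your output directory and install location.

# Assumed filename and ComfyUI path; adjust to your setup.
cp ~/venvbc/ST-OUTPUT/test_v1/pytorch_lora_weights.safetensors \
   ~/ComfyUI/models/loras/test_v1.safetensors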