Halo Strix 1500-step LoRA for ComfyUI Flux.1 dev on GMTEK EVO2, 128 GB total, 64 GB for VRAM #2260
bkpaine1 started this conversation in Show and tell
Have you tried enabling torch inductor? Also, set TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1.
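A quick sketch of how you might try that on this box (not tested here): export the flag before launching the trainer, then sanity-check that torch.compile's inductor backend actually works on this ROCm build. Whether SimpleTuner itself picks up a compile option depends on its own flags, so the small matmul probe below is only a toolchain check and the function name is made up.

export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
python << 'PYTHONCODE'
import torch

# Made-up probe: compile a trivial matmul with the inductor backend.
@torch.compile(backend="inductor")
def matmul_test(a, b):
    return a @ b

x = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
# First call triggers compilation; an error here means inductor isn't usable.
print(matmul_test(x, x).shape)
PYTHONCODE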
[RANK 0] 2025-12-27 05:35:43,274 [INFO] Moving AutoencoderKL to accelerator, converting from torch.float32 to torch.bfloat16
[RANK 0] 2025-12-27 05:35:43,547 [INFO] [Rank 0] Processing 1 local validation work items (distributed_mode=False, total=1, work_item_prompts=[''])
Epoch 1/1, Steps: 60%|███████▏ | 900/1500 [11:06:47<7:17:18, 43.73s/it, grad_absmax=0.000372, lr=0.0001, step_loss=0.292]
[RANK 0] 2025-12-27 06:49:08,997 [INFO] Loading AutoencoderKL from black-forest-labs/FLUX.1-dev
[RANK 0] 2025-12-27 06:49:09,333 [INFO] Moving AutoencoderKL to accelerator, converting from torch.float32 to torch.bfloat16
[RANK 0] 2025-12-27 06:49:09,617 [INFO] [Rank 0] Processing 1 local validation work items (distributed_mode=False, total=1, work_item_prompts=[''])
Epoch 1/1, Steps: 66%|████████▌ | 993/1500 [12:15:07<6:09:20, 43.71s/it, grad_absmax=0.00028, lr=0.0001, step_loss=0.248]
Works just great! Here's the startup script if you don't want to fight the GUI.
cat > ~/venvbc/SimpleTuner/config/config.json << 'EOF'
{
  "--data_backend_config": "config/multidatabackend.json",
  "--output_dir": "/home/test/venvbc/ST-OUTPUT/test_v1",
  "--model_type": "lora",
  "--model_family": "flux",
  "--pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev",
  "--train_batch_size": 1,
  "--max_train_steps": 1500,
  "--num_train_epochs": 0,
  "--learning_rate": "1e-4",
  "--lora_rank": 16,
  "--mixed_precision": "bf16",
  "--gradient_checkpointing": true,
  "--resolution": 1024,
  "--seed": 42,
  "--optimizer": "adamw_bf16"
}
EOF
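The config above points at config/multidatabackend.json, which isn't shown in the post. A minimal local-folder dataloader might look roughly like the sketch below; the field names follow SimpleTuner's dataloader docs, but the dataset id, paths, and caption strategy are placeholders, and your SimpleTuner version may expect slightly different keys.

cat > ~/venvbc/SimpleTuner/config/multidatabackend.json << 'EOF'
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/home/test/datasets/my-dataset",
    "caption_strategy": "textfile",
    "resolution": 1024,
    "resolution_type": "pixel",
    "cache_dir_vae": "cache/vae/my-dataset",
    "crop": false
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "local",
    "cache_dir": "cache/text/my-dataset",
    "default": true
  }
]
EOF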
cd ~/venvbc/SimpleTuner
source ~/venvbc/bin/activate
# Leave the allocator at its defaults and apply ROCm/HSA workarounds that help
# on this iGPU: copy via blit kernels instead of SDMA, and disable SVM.
unset PYTORCH_HIP_ALLOC_CONF
unset PYTORCH_ALLOC_CONF
export HSA_ENABLE_SDMA=0
export HSA_USE_SVM=0
python << 'PYTHONCODE'
import torch
# Allow TF32 matmuls and relax float32 matmul precision for extra speed.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.set_float32_matmul_precision('medium')
# Run SimpleTuner's trainer module, roughly like `python -m simpletuner.train`.
import runpy
runpy.run_module('simpletuner.train', run_name='main', alter_sys=True)
PYTHONCODE
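When training finishes, the LoRA ends up under the --output_dir set above. To use it in ComfyUI, copy the resulting .safetensors into ComfyUI's models/loras folder and load it with the usual LoRA loader node. The exact output filename and the ~/ComfyUI path below are assumptions; check your output directory and install location.

# Assumed filename and ComfyUI path; adjust to your setup.
cp ~/venvbc/ST-OUTPUT/test_v1/pytorch_lora_weights.safetensors \
   ~/ComfyUI/models/loras/test_v1.safetensors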