System Info
accelerate versions 1.11.0, 1.12.0, 1.13.0
Information
Tasks
One of the no_trainer scripts in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
I'm trying to load the model evenly onto 2 of my 4 GPUs, but inferring the device map is difficult.
Reproduction
To reproduce:

import torch
from accelerate import init_empty_weights, infer_auto_device_map
from accelerate.utils import get_balanced_memory
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("google/gemma-4-E4B-it")
with init_empty_weights():
    # from_config expects a config object, not a model id string
    empty_model = AutoModelForCausalLM.from_config(config)

n_devices = torch.cuda.device_count()  # 4 GPUs on this machine
# Give GPUs 2 and 3 zero memory so the model is balanced across GPUs 0 and 1 only
memalloc = {"cpu": "8GiB", 0: "48GiB", 1: "48GiB", 2: "0GiB", 3: "0GiB"}
balanced_memory = get_balanced_memory(empty_model, max_memory=memalloc)
device_map = infer_auto_device_map(empty_model, max_memory=balanced_memory)
# The resulting device_map handles layer 8 strangely: its submodules are
# mapped, but model.language_model.layers.8.layer_scalar is left out
model = AutoModelForCausalLM.from_pretrained("google/gemma-4-E4B-it", device_map=device_map)
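Before calling from_pretrained, you can check which parameters the inferred map fails to cover by walking each parameter name up the dotted module path and looking for any prefix that appears in the device_map. This is a hypothetical diagnostic sketch (find_unmapped and the toy map below are illustrative, not part of accelerate's API):

```python
def find_unmapped(param_names, device_map):
    """Return parameter names that no device_map entry (or prefix of one) covers."""
    unmapped = []
    for name in param_names:
        node = name
        # Walk up the dotted module path looking for a matching map entry.
        while node and node not in device_map:
            node = node.rsplit(".", 1)[0] if "." in node else ""
        if not node:
            unmapped.append(name)
    return unmapped

# Toy illustration mirroring the failure in this issue:
toy_map = {
    "model.language_model.layers.7": 0,
    "model.language_model.layers.8.mlp": 1,
}
params = [
    "model.language_model.layers.7.mlp.weight",
    "model.language_model.layers.8.layer_scalar",
]
print(find_unmapped(params, toy_map))
```

Running this against the actual device_map produced above should flag model.language_model.layers.8.layer_scalar as uncovered.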
Expected behavior
This raises:

ValueError: The device_map provided does not give any device for the following parameters: model.language_model.layers.8.layer_scalar
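As a stopgap until the inference bug is fixed, one possible workaround is to patch the inferred device_map before loading, placing each missing parameter on the same device as its nearest mapped ancestor. This is a sketch under the assumption that the stray parameter should live with its parent layer; patch_device_map is a hypothetical helper, not an accelerate API:

```python
def patch_device_map(device_map, missing_params):
    """Assign each missing parameter to the device of its nearest mapped ancestor."""
    for name in missing_params:
        node = name
        # Walk up the dotted module path until we hit an entry in the map.
        while node and node not in device_map:
            node = node.rsplit(".", 1)[0] if "." in node else ""
        if node:
            device_map[name] = device_map[node]
    return device_map

# Toy illustration using the parameter from this issue's traceback:
toy_map = {"model.language_model.layers.8": 1}
patched = patch_device_map(dict(toy_map), ["model.language_model.layers.8.layer_scalar"])
print(patched["model.language_model.layers.8.layer_scalar"])  # 1
```

The patched map can then be passed as device_map= to from_pretrained in place of the inferred one.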