Describe the bug

Running this:
> CUDA_VISIBLE_DEVICES="0" PYTORCH_ALLOC_CONF=expandable_segments:True KEYSVALS_LOG_DIR="/home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2/eval_logs0"; python3 keys_values/__main__.py eval_long /home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2 --model_type lora --devices 1 --batch_size 4 --kv_cache.name h2o-torch-quantized8 --kv_cache.cache_length 32768 --kv_cache.chunk_size 1024 --verbose some --attention_forward_temp_size_gb 8 --lora_dropout 0
> CUDA_VISIBLE_DEVICES="1" PYTORCH_ALLOC_CONF=expandable_segments:True KEYSVALS_LOG_DIR="/home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2/eval_logs1"; python3 keys_values/__main__.py eval_long /home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2 --model_type lora --devices 1 --batch_size 4 --kv_cache.name h2o-torch-quantized8 --kv_cache.cache_length 32768 --kv_cache.chunk_size 1024 --verbose some --attention_forward_temp_size_gb 8 --lora_dropout 0
> CUDA_VISIBLE_DEVICES="2" PYTORCH_ALLOC_CONF=expandable_segments:True KEYSVALS_LOG_DIR="/home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2/eval_logs2"; python3 keys_values/__main__.py eval_long /home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2 --model_type lora --devices 1 --batch_size 4 --kv_cache.name h2o-torch-quantized8 --kv_cache.cache_length 32768 --kv_cache.chunk_size 1024 --verbose some --attention_forward_temp_size_gb 8 --lora_dropout 0
> CUDA_VISIBLE_DEVICES="3" PYTORCH_ALLOC_CONF=expandable_segments:True KEYSVALS_LOG_DIR="/home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2/eval_logs3"; python3 keys_values/__main__.py eval_long /home/ubuntu/out/finetune/ml_ws/lora/qwen2_5_0_5b/variant0_copy2 --model_type lora --devices 1 --batch_size 4 --kv_cache.name h2o-torch-quantized8 --kv_cache.cache_length 32768 --kv_cache.chunk_size 1024 --verbose some --attention_forward_temp_size_gb 8 --lora_dropout 0
The runs on devices 0 and 1 complete fine. The runs on devices 2 and 3 fail with:
  File "/home/ubuntu/sync/keys_values/keys_values/__main__.py", line 140, in main
    auto_cli(PARSER_DATA)
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/jsonargparse/_cli.py", line 129, in auto_cli
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/jsonargparse/_cli.py", line 227, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/finetune/longcontext_eval.py", line 274, in setup
    fabric.launch(
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 1010, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 1121, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 1126, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/finetune/longcontext_eval.py", line 481, in main
    raise ex
  File "/home/ubuntu/sync/keys_values/keys_values/finetune/longcontext_eval.py", line 448, in main
    loss_values = model(batch[INPUT_IDS_NAME], batch["targets"])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/virtenvs/keysvals/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/long_context.py", line 645, in forward
    return self._forward_only(input_ids, targets, scale_factor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/long_context.py", line 1001, in _forward_only
    loss_full = self._forward_internal(input_ids, targets, scale_factor)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/long_context.py", line 810, in _forward_internal
    result = self._forward_internal_no_check(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/long_context.py", line 917, in _forward_internal_no_check
    y = block.forward(
        ^^^^^^^^^^^^^^
  File "/home/ubuntu/sync/keys_values/keys_values/kvcache/stack_layers.py", line 58, in forward
    self._check_kv_cache(cache, block_idx, batch_size, chunk_len)
  File "/home/ubuntu/sync/keys_values/keys_values/kvcache/stack_layers.py", line 107, in _check_kv_cache
    raise ValueError(
ValueError: KV cache for layer 0: chunk_len = 32768, must be <= max_forward_length() = 1024 (input_pos = 33792)
Also, no output logs are written for devices 2 and 3.
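
To make the failing condition concrete, here is a minimal sketch of the kind of guard the error message implies. It is reconstructed from the message text alone and is not the actual code in `stack_layers.py`; the function name `check_chunk_len` and its parameters are hypothetical. The numbers are the ones from the traceback: the rejected `chunk_len` equals `--kv_cache.cache_length` (32768), the limit equals `--kv_cache.chunk_size` (1024), and `input_pos` is their sum.

```python
def check_chunk_len(
    layer_idx: int, chunk_len: int, max_forward_length: int, input_pos: int
) -> None:
    """Hypothetical guard; only mirrors the wording of the reported ValueError."""
    if chunk_len > max_forward_length:
        raise ValueError(
            f"KV cache for layer {layer_idx}: chunk_len = {chunk_len}, "
            f"must be <= max_forward_length() = {max_forward_length} "
            f"(input_pos = {input_pos})"
        )


# Values from the command line and the traceback above.
cache_length = 32768  # --kv_cache.cache_length
chunk_size = 1024     # --kv_cache.chunk_size
input_pos = 33792     # reported in the error message

# The reported input_pos is exactly cache_length + chunk_size.
assert input_pos == cache_length + chunk_size

try:
    # A chunk of cache_length tokens arrives where at most chunk_size is allowed.
    check_chunk_len(0, cache_length, chunk_size, input_pos)
except ValueError as ex:
    message = str(ex)
    print(message)
```

If this reading is right, the runs on devices 2 and 3 pass a full `cache_length`-sized chunk at a position where only `chunk_size` tokens are accepted; why that happens only on those two devices is the open question.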