I saw these lines in the source code:

    _ = self.forward(input_ids, seq_ids, [], ignore_kvcache=True)
    torch.cuda.synchronize()

    # peak_memory = torch.cuda.max_memory_allocated()
    # total_memory = torch.cuda.get_device_properties(0).total_memory
    free_memory, total_memory = torch.cuda.mem_get_info()
    peak_memory = total_memory - free_memory

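By the time `forward` returns, some memory has already been released, for example the memory for intermediate activations and for the input IDs, so `total_memory - free_memory` is measured after the real peak.
Could this calculation therefore produce a higher number of blocks than the GPU can actually hold?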
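
To make the concern concrete, here is a minimal sketch of the arithmetic. The function `estimate_num_blocks`, its `utilization` parameter, and all the memory figures are hypothetical and chosen only for illustration; this is not swiftLLM's actual formula. It shows that if the value used as the peak is lower than the true peak during `forward`, the estimated block count goes up.

```python
def estimate_num_blocks(total_memory: int, peak_memory: int,
                        block_size_bytes: int, utilization: float = 0.9) -> int:
    """Hypothetical block-count estimate (an assumption, not swiftLLM's code).

    Memory left after reserving `peak_memory` for the model and its
    activations is divided into fixed-size KV cache blocks.
    """
    usable = int(total_memory * utilization) - peak_memory
    return max(usable, 0) // block_size_bytes


GiB = 1024 ** 3
MiB = 1024 ** 2

total = 24 * GiB           # assumed device memory
true_peak = 18 * GiB       # assumed usage while forward() is still running
after_forward = 16 * GiB   # assumed usage read after forward() has returned
block_size = 2 * MiB       # assumed size of one KV cache block

blocks_true = estimate_num_blocks(total, true_peak, block_size)
blocks_after = estimate_num_blocks(total, after_forward, block_size)

# The later reading treats the 2 GiB of freed activations as available.
print("blocks from true peak:      ", blocks_true)
print("blocks from later reading:  ", blocks_after)
print("extra blocks over-allocated:", blocks_after - blocks_true)
```

This only matters if the freed memory is actually visible to `torch.cuda.mem_get_info()`. PyTorch's caching allocator may keep freed blocks reserved rather than returning them to the driver, in which case the reading after `forward` could still match the peak.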
Thanks.
swiftLLM/swiftllm/worker/model.py
Lines 116 to 122 in 682cf9a