Summary
Unexpected batch-processing behavior: increasing num_shots (the batch size passed to decode_batch) does not increase VRAM usage or show other signs of parallel processing.
Observed Behavior
When testing with different batch sizes, we found that VRAM usage does not increase with larger batches. Is this expected behaviour?
Test Code
import cudaq_qec as qec
import numpy as np
import stim
from beliefmatching.belief_matching import detector_error_model_to_check_matrices
try:
    import GPUtil
except ImportError:
    GPUtil = None

def get_gpu_usage():
    # Return (used VRAM in MB, GPU load in %) for the first GPU, or (None, None).
    if GPUtil is not None:
        gpus = GPUtil.getGPUs()
        if gpus:
            gpu = gpus[0]
            return gpu.memoryUsed, gpu.load * 100
    return None, None

def parse_detector_error_model(detector_error_model):
    # Convert the stim DEM into dense check/observable matrices plus prior probabilities.
    matrices = detector_error_model_to_check_matrices(detector_error_model)
    out_H = np.zeros(matrices.check_matrix.shape)
    matrices.check_matrix.astype(np.float64).toarray(out=out_H)
    out_L = np.zeros(matrices.observables_matrix.shape)
    matrices.observables_matrix.astype(np.float64).toarray(out=out_L)
    return out_H, out_L, [float(p) for p in matrices.priors]

# Test with the same circuit parameters, different batch sizes
test_configs = [
    {"distance": 3, "rounds": 3, "num_shots": 5, "label": "Batch size 5"},
    {"distance": 3, "rounds": 3, "num_shots": 25, "label": "Batch size 25"},
    {"distance": 3, "rounds": 3, "num_shots": 50, "label": "Batch size 50"},
    {"distance": 3, "rounds": 3, "num_shots": 100, "label": "Batch size 100"},
]

for config in test_configs:
    print(f"\n--- {config['label']} ---")

    circuit = stim.Circuit.generated("surface_code:rotated_memory_z",
                                     rounds=config["rounds"],
                                     distance=config["distance"],
                                     after_clifford_depolarization=0.001,
                                     after_reset_flip_probability=0.01,
                                     before_measure_flip_probability=0.01,
                                     before_round_data_depolarization=0.01)
    detector_error_model = circuit.detector_error_model(decompose_errors=True)
    H, logicals, noise_model = parse_detector_error_model(detector_error_model)

    decoder = qec.get_decoder(
        "tensor_network_decoder",
        H,
        logical_obs=logicals,
        noise_model=noise_model,
        contract_noise_model=True,
    )

    sampler = circuit.compile_detector_sampler()
    detection_events, observable_flips = sampler.sample(config["num_shots"],
                                                        separate_observables=True)

    vram_before, _ = get_gpu_usage()
    if vram_before is not None:
        print(f"Before decoding: VRAM: {vram_before:.0f} MB")

    res = decoder.decode_batch(detection_events)

    vram_after, _ = get_gpu_usage()
    if vram_after is not None:
        print(f"After decoding: VRAM: {vram_after:.0f} MB")

Test Results
Surface code (d=3, r=3):
- shots=5: VRAM: 5 MB → 556 MB (+551 MB during decoding)
- shots=25: VRAM: 556 MB → 620 MB (+64 MB during decoding)
- shots=50: VRAM: 620 MB → 620 MB (no change during decoding)
- shots=100: VRAM: 620 MB → 620 MB (no change during decoding)
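One way to probe whether decode_batch contracts the shots in parallel or loops over them sequentially is to look at wall-clock time per shot rather than VRAM alone. Below is a minimal sketch (reusing the decoder, sampler, and get_gpu_usage helper from the test code above): if the shots are processed in parallel on the GPU, the per-shot time should drop as the batch grows, whereas a roughly constant per-shot time would suggest sequential processing.

import time

for num_shots in [5, 25, 50, 100]:
    detection_events, _ = sampler.sample(num_shots, separate_observables=True)

    vram_before, _ = get_gpu_usage()
    t0 = time.perf_counter()
    decoder.decode_batch(detection_events)
    elapsed = time.perf_counter() - t0
    vram_after, _ = get_gpu_usage()

    # Report total time, per-shot time, and the VRAM delta (if GPUtil is available).
    vram_str = ""
    if vram_before is not None and vram_after is not None:
        vram_str = f", VRAM {vram_before:.0f} -> {vram_after:.0f} MB"
    print(f"shots={num_shots:4d}: {elapsed:.3f} s total, "
          f"{elapsed / num_shots * 1e3:.1f} ms/shot{vram_str}")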
Environment
- Packages: cudaq-qec[tensor-network-decoder], stim, beliefmatching, gputil
- Hardware: NVIDIA A6000 Ada
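
For completeness, the exact installed versions can be captured at runtime with the standard-library importlib.metadata; a minimal sketch:

import importlib.metadata as md

# Print the exact versions of the packages involved in this report.
for pkg in ["cudaq-qec", "stim", "beliefmatching", "GPUtil"]:
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")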