Commit 6faa897
fix(refit): use torch.Tensor APIs in scalar constant_mapping path
The fast-refit path on TensorRT-RTX was failing with "Fast refit failed on TensorRT-RTX: N of N engine weight(s) had no entry in weight_name_map" for any model containing scalar constants (e.g. batch-norm eps). Since #3573, `weight_refit_map` values are torch.Tensor, but two consumer call sites still used the old np.ndarray API:

* `_TRTInterpreter._construct_refit_mapping` filtered scalars with `v.size == 1`. `Tensor.size` is a method, so the comparison was always False and `constant_mapping` was always empty; scalar constants never reached the cached `weight_name_map["constant_mapping"]`. Fixed by switching to `v.numel() == 1`.

* `_refit_single_trt_engine_with_gm` rehydrated those values via `torch.from_numpy(val).cuda()`, which raises TypeError on a Tensor. Fixed by using `val.cuda()` directly and renaming the local from `np_weight_type` to `weight_dtype` to reflect the actual type.

With both fixes, the engine-cache-hit + fast-refit path now covers scalar constants on TRT-RTX without falling back to GraphModule.forward, and the formerly-skipped refit tests pass.
1 parent ce60a5d commit 6faa897

2 files changed: 5 additions & 5 deletions

py/torch_tensorrt/dynamo/_refit.py (4 additions & 4 deletions)
@@ -177,10 +177,10 @@ def _refit_single_trt_engine_with_gm(
     constant_mapping_with_type = {}

     for constant_name, val in constant_mapping.items():
-        np_weight_type = val.dtype
-        val_tensor = torch.from_numpy(val).cuda()
-        trt_dtype = dtype._from(np_weight_type).to(trt.DataType)
-        torch_dtype = dtype._from(np_weight_type).to(torch.dtype)
+        weight_dtype = val.dtype
+        val_tensor = val.cuda()
+        trt_dtype = dtype._from(weight_dtype).to(trt.DataType)
+        torch_dtype = dtype._from(weight_dtype).to(torch.dtype)
         constant_mapping_with_type[constant_name] = (
             val_tensor.clone().reshape(-1).contiguous().to(torch_dtype),
             trt_dtype,
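The rehydration bug fixed above can be reproduced without a GPU: `torch.from_numpy` only accepts an np.ndarray and raises TypeError when handed a torch.Tensor, which is what `weight_refit_map` has stored since #3573. A minimal sketch (using the plain Tensor in place of `.cuda()` so it runs on CPU):

```python
import torch

# Since #3573, constant_mapping values are torch.Tensor, not np.ndarray.
val = torch.tensor(1e-5)  # e.g. a batch-norm eps scalar constant

# Old path: assumed NumPy input, fails on a Tensor.
failed = False
try:
    torch.from_numpy(val)
except TypeError:
    failed = True

# Fixed path: the value already is a Tensor, so use it directly
# (the real code moves it to the GPU with val.cuda()).
val_tensor = val

print(failed)               # True: from_numpy rejects a Tensor
print(val_tensor.numel())   # 1
```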

py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py (1 addition & 1 deletion)
@@ -486,7 +486,7 @@ def _save_weight_mapping(self) -> None:
         sd = {k: v.to(torch_device) for k, v in self.module.state_dict().items()}
         weight_name_map: dict[str, Any] = {}
         weight_refit_map = self.ctx.weight_refit_map
-        constant_mapping = {k: v for k, v in weight_refit_map.items() if v.size == 1}
+        constant_mapping = {k: v for k, v in weight_refit_map.items() if v.numel() == 1}
         net = self.ctx.net
         for i in range(net.num_layers):
             layer = net[i]
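The filter bug is easy to demonstrate: `ndarray.size` is an integer attribute, but `Tensor.size` is a bound method, so comparing it to 1 is always False and the comprehension silently produced an empty dict. A minimal sketch with hypothetical map keys:

```python
import numpy as np
import torch

arr = np.array(1e-5)
t = torch.tensor(1e-5)

# ndarray.size is an int; Tensor.size is a method object.
print(arr.size == 1)   # True
print(t.size == 1)     # False -- a method never equals 1
print(t.numel() == 1)  # True: the correct Tensor API

# With the fixed predicate, scalar constants are kept
# (keys here are illustrative, not the real map contents).
refit_map = {"bn.eps": t, "conv.weight": torch.ones(3, 3)}
constant_mapping = {k: v for k, v in refit_map.items() if v.numel() == 1}
print(list(constant_mapping))  # ['bn.eps']
```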
