Commit f4100ca

Version 1.1.14
Included in Version 1.1.14

## Summary

We implemented a complete memory-management upgrade for ETUR refiners (PRO + CE) to support low-spec systems and very large tile counts.

This combined plan delivers:

1. meaningful 3-profile memory modes,
2. clear PRO tooltip text explaining tradeoffs,
3. CE forced to lowest-memory behavior,
4. RAM-based automatic fallback (cross-platform),
5. strict `Ultra Low` mode that unloads/reloads models between stages/tiles for minimum VRAM usage.

Target scenario: 100 tiles, 2048x2048 input, denoise mask + image stabilizer + Redux + 3 ControlNets.

---

### PRO node input change

We replaced the boolean `Low_Vram` with enum `VRAM_Profile`.

### Final profile names

1. `Fast Cache (Max Speed)`
2. `Low VRAM Cache (Unload Models)`
3. `Ultra Low Memory (Per-Tile Streaming)`

### PRO tooltip (final text)

- `Fast Cache (Max Speed)`: Precomputes full tile conditioning (text + Redux + ControlNet) for all tiles and keeps models loaded. Fastest sampling, highest RAM/VRAM usage.
- `Low VRAM Cache (Unload Models)`: Precomputes full tile conditioning, then unloads models to reduce VRAM. RAM can still be high with many tiles.
- `Ultra Low Memory (Per-Tile Streaming)`: Caches repeated text conditioning only; Redux/ControlNet are rebuilt per tile and released immediately. Also unloads/reloads models between steps/tiles for minimum VRAM. Slowest mode; best for very low-spec systems.

### CE node behavior

- No new CE UI input.
- Force CE internally to `Ultra Low Memory (Per-Tile Streaming)`.

### Backward compatibility

Map legacy workflows with `Low_Vram`:

- `True` -> `Low VRAM Cache (Unload Models)`
- `False` -> `Fast Cache (Max Speed)`
- If CE path has no field -> force Ultra Low.

---

## 2) Runtime Profile Semantics

### A. Fast Cache (Max Speed)

- Full precompute for all selected tiles:
  - text conditioning
  - Redux conditioning
  - ControlNet conditioning
- Keep models resident.
- Highest memory usage, best speed.

### B. Low VRAM Cache (Unload Models)

- Same full precompute as Fast Cache.
- Then unload models / soft-empty cache before tile sampling loop.
- Lower VRAM than Fast Cache, RAM still high due to full caches.

### C. Ultra Low Memory (Per-Tile Streaming)

- Precompute text-only deduplicated cache.
- For each tile:
  - build dynamic conditioning on demand (Redux + ControlNet)
  - consume immediately
  - drop refs + unload models + flush cache
- Between stages and between tiles:
  - aggressive unload/reload cycle.
- Slowest, lowest VRAM and significantly lower RAM.

---

## 3) RAM-Based Auto Fallback (PRO)

### Scope

- RAM-based only (not VRAM-based), per your requirement.
- Applied to PRO before heavy precompute starts.

### Cross-platform RAM probing

Primary:

- `psutil.virtual_memory().available` and `.total`

Fallbacks if `psutil` unavailable:

- Windows: `ctypes` `GlobalMemoryStatusEx`
- Linux: parse `/proc/meminfo` (`MemAvailable`)
- macOS: `vm_stat`/page-size parsing (+ `sysctl hw.memsize` for total)

If all probes fail:

- Log warning; run selected profile without auto-fallback.

### Decision rule

- Estimate required RAM for selected profile.
- Safety threshold:
  - require `available_ram >= estimate * 1.25 + 1.5GB`
- If unsafe:
  - auto-switch to `Ultra Low Memory (Per-Tile Streaming)`
  - log decision and numbers.
- If selected profile already Ultra Low:
  - no upward fallback.
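The decision rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from `TBG_Refiner.py` (whose diff is not rendered on this page): the names `is_ram_safe` and `choose_profile` are hypothetical, and `1.5GB` is interpreted here as 1.5 GiB.

```python
GIB = 1024 ** 3
ULTRA_LOW = "Ultra Low Memory (Per-Tile Streaming)"


def is_ram_safe(available_bytes, estimate_bytes, factor=1.25,
                headroom_bytes=int(1.5 * GIB)):
    """Safety threshold: available_ram >= estimate * 1.25 + 1.5GB.

    Assumption: "1.5GB" is treated as 1.5 GiB (1.5 * 1024**3 bytes).
    """
    return available_bytes >= estimate_bytes * factor + headroom_bytes


def choose_profile(selected, available_bytes, estimate_bytes):
    """Return the profile to run (hypothetical helper, not the real API)."""
    if selected == ULTRA_LOW:
        # Already the lowest-memory profile: no upward fallback.
        return selected
    if available_bytes is None:
        # All RAM probes failed: keep the selected profile, no auto-fallback.
        return selected
    if is_ram_safe(available_bytes, estimate_bytes):
        return selected
    # Unsafe: auto-switch to the streaming profile.
    return ULTRA_LOW
```

For example, with an 8 GiB estimate the threshold works out to 8 × 1.25 + 1.5 = 11.5 GiB, so a machine reporting 10 GiB available would be switched to Ultra Low while one reporting 16 GiB would keep the selected profile.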
---

## 4) Estimation Model

### Fast / Low VRAM Cache estimate

- Baseline process overhead
- + per-tile unified conditioning cache estimate (all tiles):
  - sample tensor dimensions from representative tile
  - dtype factor
  - conditioning structure overhead multiplier

### Ultra Low estimate

- Baseline
- + text-only cache estimate
- + one-tile transient working set (not multiplied by tile count)

## 5) Code Changes

### Files

- [`py/nodes/UpscalerRefiner/TBG_Nodes_PRO.py`](a:/SD/ComfyUI_070_cu128/ComfyUI/custom_nodes/ComfyUI-TBG-ETUR/py/nodes/UpscalerRefiner/TBG_Nodes_PRO.py)
- [`py/nodes/UpscalerRefiner/TBG_Nodes_CE.py`](a:/SD/ComfyUI_070_cu128/ComfyUI/custom_nodes/ComfyUI-TBG-ETUR/py/nodes/UpscalerRefiner/TBG_Nodes_CE.py)
- [`py/nodes/UpscalerRefiner/TBG_Refiner.py`](a:/SD/ComfyUI_070_cu128/ComfyUI/custom_nodes/ComfyUI-TBG-ETUR/py/nodes/UpscalerRefiner/TBG_Refiner.py)
- [`py/nodes/UpscalerRefiner/inc/vram_optimizing.py`](a:/SD/ComfyUI_070_cu128/ComfyUI/custom_nodes/ComfyUI-TBG-ETUR/py/nodes/UpscalerRefiner/inc/vram_optimizing.py)
- New helper: `inc/memory_probe.py` (or equivalent)

Patched the skipped ControlNet log bug in ETUR.

### What I changed

I updated `vram_optimizing.py`:

- Removed the invalid `SELF.INPUTS.controlnetimage` gate. That field is never set in this path, so the gate only triggered a wrong log message that confused users.
- ControlNet preprocessors now validate real sources:
  - use `custom_controlnet_image` (`noise_image`) if provided
  - otherwise use the per-tile image
- Fixed a second bug: it now applies `filtered_pipe` (not the original pipe).

### Why your log looked contradictory

You saw both:

- `Skipping ... no controlnetimage`
- the Depth model loading on each tile

That happened because the code logged a skip during filtering but still executed the original, unfiltered ControlNet pipe afterward. The patch fixes that mismatch.

### Changes made

Updated `cnet.py`:

- Logs at tile start: how many ControlNet pipe entries will be processed.
- Logs per entry before apply:
  - model
  - preprocessor
  - source (`tile_image` vs `custom_controlnet_image`)
  - effective strength, start, end
  - image_shape
- Logs per entry after apply: conditioning applied.

Updated `vram_optimizing.py`:

- Logs selected filtered entries per tile before precompute.
- Keeps the earlier fix:
  - no bogus `controlnetimage` gate
  - uses `filtered_pipe` for execution.
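The filter/apply mismatch described in the commit message can be illustrated with a small sketch. It is not the code from `vram_optimizing.py` (that diff is not shown on this page); `filter_pipe`, `precompute_tile`, and `apply_fn` are hypothetical names, and each pipe entry is assumed to be a dict with a `noise_image` key, as in `cnet.py`.

```python
def filter_pipe(cnetpipe, tile_image):
    """Keep only entries that have a real image source.

    An entry uses its custom ControlNet image (noise_image) when one is
    provided, otherwise the per-tile image. Entries with neither are skipped.
    """
    filtered = []
    for control in cnetpipe or []:
        source = control.get("noise_image")
        if source is None:
            source = tile_image
        if source is None:
            continue  # nothing to preprocess for this entry
        filtered.append(control)
    return filtered


def precompute_tile(cnetpipe, tile_image, apply_fn):
    """Apply ControlNets for one tile using the filtered pipe.

    The bug pattern was calling apply_fn(cnetpipe) here: the log reported
    skips, but the unfiltered pipe was still executed.
    """
    filtered_pipe = filter_pipe(cnetpipe, tile_image)
    return apply_fn(filtered_pipe)
```

Passing the same list to the logging step and to the execution step means the log and the actual work cannot disagree.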
1 parent 9f5aa5f commit f4100ca

8 files changed

Lines changed: 656 additions & 175 deletions

py/nodes/UpscalerRefiner/TBG_Nodes_CE.py

Lines changed: 7 additions & 6 deletions
@@ -236,9 +236,10 @@ def INPUT_TYPES(self):
     OUTPUT_NODE = True
     FUNCTION = "fn"
 
-    @classmethod
-    def fn(self, **kwargs):
-        return {
-            "ui": {"value": [f"{kwargs.get('seed', None)}"]},
-            "result": (TBG_Refiner_v1.fn(**kwargs))
-        }
+    @classmethod
+    def fn(self, **kwargs):
+        kwargs["VRAM_Profile"] = "Ultra Low Memory (Per-Tile Streaming)"
+        return {
+            "ui": {"value": [f"{kwargs.get('seed', None)}"]},
+            "result": (TBG_Refiner_v1.fn(**kwargs))
+        }

py/nodes/UpscalerRefiner/TBG_Nodes_PRO.py

Lines changed: 20 additions & 2 deletions
@@ -415,8 +415,26 @@ def INPUT_TYPES(self):
                 "Selected_Tiles_By_Numbers": ("STRING", {"label": "Selected_Tiles_Index_Numbers to process", "default": '',
                                                          "tooltip": "You can set a list of selected tiles to process like 1,2,3,6 and activate Selected_Tiles_Only"}),
 
-                "Low_Vram": ("BOOLEAN", {"label": "Low Vram", "default": True, "label_on": "on", "label_off": "off",
-                                         "tooltip": "Reduced VRAM Usage achieves lower memory consumption through conditioning compression, process separation, model unloading, and automatic memory cleanup."}),
+                "VRAM_Profile": (
+                    [
+                        "Fast Cache (Max Speed)",
+                        "Low VRAM Cache (Unload Models)",
+                        "Ultra Low Memory (Per-Tile Streaming)",
+                    ],
+                    {
+                        "label": "VRAM Profile",
+                        "default": "Low VRAM Cache (Unload Models)",
+                        "tooltip": (
+                            "Fast Cache (Max Speed): Precomputes full tile conditioning (text + Redux + ControlNet) "
+                            "for all tiles and keeps models loaded. Fastest sampling, highest RAM/VRAM usage. "
+                            "Low VRAM Cache (Unload Models): Same full precompute, then unloads models to reduce VRAM. "
+                            "RAM can still be high with many tiles. "
+                            "Ultra Low Memory (Per-Tile Streaming): Caches repeated text conditioning only; Redux/ControlNet "
+                            "are rebuilt per tile and released immediately. Also unloads/reloads models between steps/tiles "
+                            "for minimum VRAM. Slowest mode; best for very low-spec systems."
+                        ),
+                    },
+                ),
 
             },
             "hidden": {
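
The backward-compatibility mapping from the commit message (legacy `Low_Vram` boolean to `VRAM_Profile`) might look roughly like the sketch below. The helper name `resolve_vram_profile` and its `is_ce` flag are hypothetical; falling back to the node default when neither field is present is also an assumption, chosen to match the `"default"` value in the diff above.

```python
FAST = "Fast Cache (Max Speed)"
LOW_VRAM = "Low VRAM Cache (Unload Models)"
ULTRA_LOW = "Ultra Low Memory (Per-Tile Streaming)"
PROFILES = (FAST, LOW_VRAM, ULTRA_LOW)


def resolve_vram_profile(kwargs, is_ce=False):
    """Pick a VRAM profile from node inputs (hypothetical helper)."""
    if is_ce:
        # CE has no UI field and is always forced to the streaming profile.
        return ULTRA_LOW
    profile = kwargs.get("VRAM_Profile")
    if profile in PROFILES:
        return profile
    if "Low_Vram" in kwargs:
        # Legacy workflows: True -> Low VRAM Cache, False -> Fast Cache.
        return LOW_VRAM if kwargs["Low_Vram"] else FAST
    # Assumption: neither field present, so use the PRO node default.
    return LOW_VRAM
```

A workflow saved before this commit with `Low_Vram=False` would resolve to `Fast Cache (Max Speed)` under this sketch, while any CE call resolves to the streaming profile regardless of its inputs.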

py/nodes/UpscalerRefiner/TBG_Refiner.py

Lines changed: 214 additions & 69 deletions
Large diffs are not rendered by default.

py/nodes/UpscalerRefiner/inc/cnet.py

Lines changed: 68 additions & 28 deletions
@@ -11,7 +11,20 @@
 
 
 
-import comfy.model_management as model_management
+import comfy.model_management as model_management
+from ....utils.log import log
+
+
+def _controlnet_label(controlnet_model):
+    if controlnet_model is None:
+        return "None"
+    name = getattr(controlnet_model, "name", None)
+    if name:
+        return str(name)
+    model_file = getattr(controlnet_model, "model_file", None)
+    if model_file:
+        return str(model_file)
+    return type(controlnet_model).__name__
 
 
 def stitch(
@@ -177,14 +190,22 @@ def stitch(
 
 
 
-def apply_controlnets_from_pipe(self,SELF, cnetpipe, positive, negative, full_image, tile_image, vae, index):
-    controlnet_node = nodes.ControlNetApplyAdvanced()
-    for control in cnetpipe:
-        controlnet_model = control["controlnet"]
-        strength = control["strength"]
-        start = control["start"]
-        end = control["end"]
-        preprocessor = control["preprocessor"]
+def apply_controlnets_from_pipe(self,SELF, cnetpipe, positive, negative, full_image, tile_image, vae, index):
+    controlnet_node = nodes.ControlNetApplyAdvanced()
+    tile_no = index + 1
+    total_entries = len(cnetpipe) if cnetpipe is not None else 0
+    log(
+        f"[TBG][ControlNet] Tile {tile_no}: precompute start with {total_entries} pipe entries",
+        None,
+        None,
+        f"Node {self.node_id}",
+    )
+    for entry_idx, control in enumerate(cnetpipe):
+        controlnet_model = control["controlnet"]
+        strength = control["strength"]
+        start = control["start"]
+        end = control["end"]
+        preprocessor = control["preprocessor"]
         canny_high_threshold = control["canny_high_threshold"]
         canny_low_threshold = control["canny_low_threshold"]
         noise_image = control["noise_image"]
@@ -197,27 +218,46 @@ def apply_controlnets_from_pipe(self,SELF, cnetpipe, positive, negative, full_im
         cnet_image = np.array(cnet_image)
 
         grid_cnetstrength = self.PROMPTER.output_cnet_js[index]
-        if grid_cnetstrength is not None and isinstance(grid_cnetstrength, (float, int)):
-            strength = grid_cnetstrength * self.KSAMPLER.cnet_multiply
-        else:
-
-            strength = strength * self.KSAMPLER.cnet_multiply
-
-        # Preprocessero
-
-        if preprocessor == "DepthAnythingV2":
-            model = DepthAnythingV2Detector.from_pretrained(filename="depth_anything_v2_vitl.pth").to(model_management.get_torch_device())
+        if grid_cnetstrength is not None and isinstance(grid_cnetstrength, (float, int)):
+            strength = grid_cnetstrength * self.KSAMPLER.cnet_multiply
+        else:
+
+            strength = strength * self.KSAMPLER.cnet_multiply
+
+        source = "custom_controlnet_image" if noise_image is not None else "tile_image"
+        model_label = _controlnet_label(controlnet_model)
+        cnet_shape = tuple(cnet_image.shape) if hasattr(cnet_image, "shape") else "unknown"
+        log(
+            f"[TBG][ControlNet] Tile {tile_no} entry {entry_idx + 1}/{total_entries}: "
+            f"model={model_label}, preprocessor={preprocessor}, source={source}, "
+            f"strength={float(strength):.4f}, start={float(start):.3f}, end={float(end):.3f}, "
+            f"image_shape={cnet_shape}",
+            None,
+            None,
+            f"Node {self.node_id}",
+        )
+
+        # Preprocessero
+
+        if preprocessor == "DepthAnythingV2":
+            model = DepthAnythingV2Detector.from_pretrained(filename="depth_anything_v2_vitl.pth").to(model_management.get_torch_device())
             cnet_image = common_annotator_call(model, cnet_image, resolution=1024, max_depth=1)
             del model
         if preprocessor == "Canny Edge":
             cnet_image = common_annotator_call(CannyDetector(), cnet_image, canny_low_threshold=canny_low_threshold, canny_high_threshold=canny_high_threshold, resolution=1024)
-        positive, negative = controlnet_node.apply_controlnet(
-            positive, negative, controlnet_model, cnet_image, strength, start, end, vae
-        )
-        if preprocessor == "ControlNetInpaintingAliMama":
-            from .image import TBG_Image
-            inpaintmask = \
-                TBG_Image().mask_get_fusion_mask(SELF,
+        positive, negative = controlnet_node.apply_controlnet(
+            positive, negative, controlnet_model, cnet_image, strength, start, end, vae
+        )
+        log(
+            f"[TBG][ControlNet] Tile {tile_no} entry {entry_idx + 1}/{total_entries}: conditioning applied",
+            None,
+            None,
+            f"Node {self.node_id}",
+        )
+        if preprocessor == "ControlNetInpaintingAliMama":
+            from .image import TBG_Image
+            inpaintmask = \
+                TBG_Image().mask_get_fusion_mask(SELF,
                                                  cnet_image.shape[1],
                                                  cnet_image.shape[2],
                                                  SELF.SIZE.cols_qty,
@@ -243,8 +283,8 @@ def apply_controlnets_from_pipe(self,SELF, cnetpipe, positive, negative, full_im
 
         #if self.lowvram:
         # UnloadOneModelNode.route(controlnet_model)
-
-    return positive, negative
+
+    return positive, negative
 
 def get_canny_mask_inverted(image,canny_low_threshold=100,canny_high_threshold=150):
     resolution = min(image.shape[1],image.shape[2])
Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+import ctypes
+import os
+import platform
+import subprocess
+
+
+def _ram_info_psutil():
+    try:
+        import psutil
+
+        vm = psutil.virtual_memory()
+        return {
+            "ok": True,
+            "available_bytes": int(vm.available),
+            "total_bytes": int(vm.total),
+            "source": "psutil",
+        }
+    except Exception as e:
+        return {"ok": False, "reason": f"psutil failed: {e}"}
+
+
+def _ram_info_windows_ctypes():
+    class MEMORYSTATUSEX(ctypes.Structure):
+        _fields_ = [
+            ("dwLength", ctypes.c_ulong),
+            ("dwMemoryLoad", ctypes.c_ulong),
+            ("ullTotalPhys", ctypes.c_ulonglong),
+            ("ullAvailPhys", ctypes.c_ulonglong),
+            ("ullTotalPageFile", ctypes.c_ulonglong),
+            ("ullAvailPageFile", ctypes.c_ulonglong),
+            ("ullTotalVirtual", ctypes.c_ulonglong),
+            ("ullAvailVirtual", ctypes.c_ulonglong),
+            ("sullAvailExtendedVirtual", ctypes.c_ulonglong),
+        ]
+
+    stat = MEMORYSTATUSEX()
+    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
+    if ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat)):
+        return {
+            "ok": True,
+            "available_bytes": int(stat.ullAvailPhys),
+            "total_bytes": int(stat.ullTotalPhys),
+            "source": "windows_ctypes",
+        }
+    return {"ok": False, "reason": "GlobalMemoryStatusEx failed"}
+
+
+def _ram_info_linux_proc():
+    path = "/proc/meminfo"
+    if not os.path.exists(path):
+        return {"ok": False, "reason": "/proc/meminfo not found"}
+    values = {}
+    with open(path, "r", encoding="utf-8") as f:
+        for line in f:
+            if ":" not in line:
+                continue
+            k, v = line.split(":", 1)
+            num = v.strip().split()[0]
+            if num.isdigit():
+                values[k] = int(num) * 1024
+    if "MemAvailable" in values and "MemTotal" in values:
+        return {
+            "ok": True,
+            "available_bytes": int(values["MemAvailable"]),
+            "total_bytes": int(values["MemTotal"]),
+            "source": "linux_proc_meminfo",
+        }
+    return {"ok": False, "reason": "MemAvailable/MemTotal not found"}
+
+
+def _ram_info_macos_vmstat():
+    try:
+        total = int(
+            subprocess.check_output(["sysctl", "-n", "hw.memsize"], text=True).strip()
+        )
+        vm_out = subprocess.check_output(["vm_stat"], text=True)
+        page_size = 4096
+        for line in vm_out.splitlines():
+            if "page size of" in line:
+                parts = line.split("page size of", 1)[1].strip().split(" ", 1)
+                if parts and parts[0].isdigit():
+                    page_size = int(parts[0])
+                break
+
+        free_pages = 0
+        inactive_pages = 0
+        speculative_pages = 0
+        for line in vm_out.splitlines():
+            norm = line.strip().replace(".", "")
+            if ":" not in norm:
+                continue
+            key, val = norm.split(":", 1)
+            num = val.strip().split(" ")[0].replace(".", "")
+            if not num.isdigit():
+                continue
+            n = int(num)
+            if key == "Pages free":
+                free_pages = n
+            elif key == "Pages inactive":
+                inactive_pages = n
+            elif key == "Pages speculative":
+                speculative_pages = n
+
+        available = (free_pages + inactive_pages + speculative_pages) * page_size
+        return {
+            "ok": True,
+            "available_bytes": int(available),
+            "total_bytes": int(total),
+            "source": "macos_vm_stat",
+        }
+    except Exception as e:
+        return {"ok": False, "reason": f"vm_stat failed: {e}"}
+
+
+def get_ram_info():
+    probe = _ram_info_psutil()
+    if probe.get("ok"):
+        return probe
+
+    system = platform.system()
+    if system == "Windows":
+        return _ram_info_windows_ctypes()
+    if system == "Linux":
+        return _ram_info_linux_proc()
+    if system == "Darwin":
+        return _ram_info_macos_vmstat()
+    return {"ok": False, "reason": f"unsupported platform: {system}"}
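
As a usage illustration, a caller could turn the dictionary returned by `get_ram_info()` into a log line and treat a failed probe as "no auto-fallback", as the commit message describes. The helper `describe_probe` below is hypothetical and works only on the dictionary shape visible in the diff (`ok`, `available_bytes`, `total_bytes`, `source`, `reason`); it does not import the module itself.

```python
GIB = 1024 ** 3


def describe_probe(probe):
    """Format a RAM probe result for logging (hypothetical helper)."""
    if not probe.get("ok"):
        reason = probe.get("reason", "unknown")
        # A failed probe means the selected profile runs without auto-fallback.
        return f"RAM probe failed ({reason}); auto-fallback disabled"
    available = probe["available_bytes"] / GIB
    total = probe["total_bytes"] / GIB
    return (
        f"RAM available {available:.2f} GiB of {total:.2f} GiB "
        f"(source: {probe['source']})"
    )


# Example with a hand-written probe result shaped like get_ram_info() output.
sample = {
    "ok": True,
    "available_bytes": 8 * GIB,
    "total_bytes": 16 * GIB,
    "source": "psutil",
}
print(describe_probe(sample))
```

Keeping the formatting separate from the probing keeps the decision logic easy to test with hand-built dictionaries, without needing `psutil` or a specific operating system.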
