Flux2TEModel_ GGUF appears to be offloaded to RAM instead of fully released, causing OOM on low-RAM systems #14433

@Nonameagai

Your question

Environment:

  • ComfyUI v0.24
  • Google Colab
  • Tesla T4 (15GB VRAM)
  • ~13GB system RAM
  • Flux2 Klein 9B GGUF Q8
  • Flux2TEModel_ GGUF text encoder (Qwen 3 8B GGUF Q8)

Observed behavior:

After text encoding, cleanup/unload nodes are executed.

ComfyUI reports:

  • Flux2TEModel_ loaded (~10GB)
  • Partial unload from VRAM
  • Flux UNet then loads (~9.7GB)

In many runs, the text encoder seems to be moved from VRAM into system RAM rather than being fully released.

RAM usage often reaches 90-96%.

Two outcomes are observed:

  1. The RAM pressure cache eventually frees memory, RAM usage drops back to ~30-40%, and the workflow completes successfully.
  2. RAM usage never drops, and the process is OOM-killed while loading the UNet.
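
A rough memory budget built only from the figures reported above may help frame the problem. This is a back-of-the-envelope sketch, not a measurement: it assumes the entire text encoder is moved to system RAM, whereas the log only says "partial unload", and the names `fits_together` and `encoder_ram_share` are illustrative.

```python
# Rough memory budget using the figures reported above (all values in GB).
VRAM_TOTAL = 15.0      # Tesla T4
RAM_TOTAL = 13.0       # approximate system RAM on the Colab instance
TEXT_ENCODER = 10.0    # Flux2TEModel_ size as reported by ComfyUI
UNET = 9.7             # Flux UNet size as reported by ComfyUI


def fits_together(*sizes, capacity):
    """Return True if all the given model sizes fit within capacity at once."""
    return sum(sizes) <= capacity


# Both models cannot be resident in VRAM at the same time.
both_in_vram = fits_together(TEXT_ENCODER, UNET, capacity=VRAM_TOTAL)

# ASSUMPTION: the whole encoder is offloaded to system RAM. This is its share
# of total RAM before counting Python, ComfyUI itself, and file buffers.
encoder_ram_share = TEXT_ENCODER / RAM_TOTAL

print(f"both fit in VRAM: {both_in_vram}")
print(f"encoder share of RAM if fully offloaded: {encoder_ram_share:.0%}")
```

Under that assumption the encoder alone would occupy roughly three quarters of system RAM, which is in line with the 90-96% usage seen once the rest of the process is included.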

Additional notes:

  • Reproduced with multiple unload/cleanup nodes.
  • ComfyUI logs indicate RAM pressure cache is active.
  • The issue seems related to model retention/offloading rather than pure VRAM exhaustion.
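
To tell the two outcomes apart in a log, RAM usage could be sampled before and after each stage. The following is a minimal sketch using only the Python standard library and the Linux `/proc/meminfo` file (which Colab exposes); `ram_used_percent` and `log_ram` are hypothetical helper names, not ComfyUI functions.

```python
from pathlib import Path


def ram_used_percent(meminfo_text: str) -> float:
    """Compute the used-RAM percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def log_ram(label: str) -> None:
    """Print current RAM usage with a label (Linux only)."""
    path = Path("/proc/meminfo")
    if path.exists():
        print(f"[{label}] RAM used: {ram_used_percent(path.read_text()):.1f}%")
    else:
        print(f"[{label}] /proc/meminfo is not available on this platform")
```

Calling `log_ram("after text encode")`, `log_ram("after unload node")`, and `log_ram("before UNet load")` from a notebook cell or a small custom node would show whether usage falls between stages or stays near the ceiling.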

Question:

Is there a supported mechanism to force complete disposal of a GGUF text encoder instead of CPU offloading?

Are there known cases where RAM pressure cache retains GGUF models or Flux2TEModel_ references longer than expected?

Any recommended debugging steps to determine which object/reference prevents memory reclamation?
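
As an illustration of the kind of reference check meant here, the sketch below uses only Python's standard `gc` and `weakref` modules. `FakeTextEncoder` and `cache` are hypothetical stand-ins, not ComfyUI objects, and `gc.get_referrers` only reports containers tracked by Python's garbage collector, so memory held at the native level would not show up this way.

```python
import gc
import types
import weakref


class FakeTextEncoder:
    """Hypothetical stand-in for a large model object."""

    def __init__(self):
        self.weights = bytearray(1024)


def find_referrers(ref: weakref.ref) -> list:
    """Return type names of GC-tracked objects still holding the target."""
    gc.collect()
    target = ref()
    if target is None:
        return []  # the object has already been freed
    names = [
        type(holder).__name__
        for holder in gc.get_referrers(target)
        # Skip stack frames, which may hold the temporary local reference.
        if not isinstance(holder, types.FrameType)
    ]
    del target
    return names


# A hypothetical cache that keeps the model alive after it is "unloaded".
cache = {"te": FakeTextEncoder()}
ref = weakref.ref(cache["te"])

holders_before = find_referrers(ref)  # expected to include 'dict' (the cache)
cache.clear()                         # drop the only strong reference
holders_after = find_referrers(ref)   # empty once the object is gone

print("before clear:", holders_before)
print("after clear: ", holders_after)
```

Applied to a real session, the same pattern would mean taking a `weakref.ref` to the loaded text encoder object, running the unload nodes, and then listing whatever still refers to it.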

Labels: User Support (a user needs help with something, probably not a bug)