
feat: Implement cleanup methods for model handlers and enhance memory management#818

Closed
rajeshgangireddy wants to merge 3 commits into open-edge-platform:main from rajeshgangireddy:feature/cleaner_model_memory

Conversation

@rajeshgangireddy
Contributor

@rajeshgangireddy rajeshgangireddy commented Mar 4, 2026

Solves #798

Ensure GPU/device memory is fully released when switching models at runtime. Previously, loading a new model (e.g. on a prompt or config change) kept the old model in device memory, causing out-of-memory errors after three or four model switches.

This PR adds explicit cleanup() methods to all ModelHandler subclasses and wires an unload-before-load pattern into the pipeline manager so the old model's memory is reclaimed before the replacement is loaded.

  • ModelHandler (base) - Added cleanup() no-op method so every handler is cleanable
  • TorchModelHandler - cleanup() nulls model & reference batch, calls gc.collect(), then torch.cuda.empty_cache() or torch.xpu.empty_cache()
  • OpenVINOModelHandler - cleanup() nulls model & reference batch, calls gc.collect()
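
The handler changes above can be sketched as follows. This is an illustrative outline, not the PR's exact code: the class bodies and attribute names are assumptions based on the summary, and the torch import is guarded so the sketch runs even without torch installed.

```python
import gc


class ModelHandler:
    def cleanup(self) -> None:
        """Base hook: no-op so every handler is cleanable."""


class TorchModelHandler(ModelHandler):
    def __init__(self, model=None, reference_batch=None, device: str = "cpu"):
        self._model = model
        self._reference_batch = reference_batch
        self._device = device

    def cleanup(self) -> None:
        # Drop the strong references so the old tensors become collectable.
        self._model = None
        self._reference_batch = None
        gc.collect()
        try:
            import torch
            if self._device.startswith("cuda") and torch.cuda.is_available():
                torch.cuda.empty_cache()
            elif self._device.startswith("xpu") and getattr(torch, "xpu", None):
                torch.xpu.empty_cache()
        except ImportError:
            pass  # no torch: nothing is cached on a device


h = TorchModelHandler(model=object(), reference_batch=object())
h.cleanup()
h.cleanup()  # idempotent: a second call is safe
print(h._model is None)  # → True
```

Note that `empty_cache()` only returns the allocator's cached blocks to the device; the references must be dropped (and collected) first, which is why `cleanup()` nulls the attributes before clearing the cache.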

Unload-before-load wiring

  • Processor._stop() - Now calls self._model_handler.cleanup() and guards against an un-setup broadcaster
  • Pipeline.stop() - Clears the _components and _threads dicts after stopping all components
  • Pipeline.stop_component() - New method to stop and remove a single component type
  • PipelineManager._update_pipeline_components() - Calls pipeline.stop_component(Processor) before creating the new processor
  • PipelineManager.on_config_change() - Sets self._pipeline = None after stop on activation/deactivation to drop the reference
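
The wiring above can be sketched like this. The Pipeline and PipelineManager names follow the PR summary, but the bodies and the factory interface are assumptions for illustration, exercised here with tiny fakes to show the ordering:

```python
class Pipeline:
    """Holds live components keyed by their type."""

    def __init__(self):
        self._components = {}
        self._threads = {}

    def stop_component(self, component_type) -> None:
        # No-op if the component was never created (missing-component case).
        component = self._components.pop(component_type, None)
        if component is not None:
            component.stop()  # Processor.stop() ends in model_handler.cleanup()

    def stop(self) -> None:
        for component in self._components.values():
            component.stop()
        self._components.clear()  # drop refs so old models are collectable
        self._threads.clear()


class PipelineManager:
    def __init__(self, pipeline, factory):
        self._pipeline = pipeline
        self._factory = factory

    def _update_pipeline_components(self, processor_type) -> None:
        # Unload first: the old model's memory is reclaimed before the new load.
        self._pipeline.stop_component(processor_type)
        self._pipeline._components[processor_type] = self._factory.create_processor()


events = []

class FakeProcessor:
    def stop(self):
        events.append("stop_old")

class FakeFactory:
    def create_processor(self):
        events.append("create_new")
        return FakeProcessor()

pipeline = Pipeline()
pipeline._components[FakeProcessor] = FakeProcessor()
PipelineManager(pipeline, FakeFactory())._update_pipeline_components(FakeProcessor)
print(events)  # → ['stop_old', 'create_new']
```

The key design point is that `stop_component()` runs to completion before `create_processor()` is called, so peak device memory never holds two models at once.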

Tests (10 new)

  • TorchModelHandler: cleanup frees refs, clears CUDA cache, clears XPU cache, idempotent call
  • OpenVINOModelHandler: cleanup frees refs, idempotent call
  • release_device_memory: CUDA, CUDA with index, XPU, CPU no-op
  • Processor: stop triggers cleanup, stop without setup is safe, PassThrough handler is safe
  • Pipeline: stop clears dicts, stop_component removes single component, noop for missing component
  • PipelineManager: processor update calls stop_component before create_processor (ordering verified)
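
A hypothetical shape of the release_device_memory() helper those tests exercise; the real signature in torch_model.py may differ. It dispatches on the device string: "cuda" / "cuda:&lt;idx&gt;" and "xpu" clear the matching cache, "cpu" is a no-op, and a missing torch install is tolerated:

```python
def release_device_memory(device: str) -> None:
    try:
        import torch
    except ImportError:
        return  # no torch: nothing is cached on a device
    if device.startswith("cuda") and torch.cuda.is_available():
        torch.cuda.empty_cache()  # covers plain "cuda" and indexed "cuda:1"
    elif device.startswith("xpu") and getattr(torch, "xpu", None) is not None:
        torch.xpu.empty_cache()
    # "cpu" falls through: there is no device cache to release


release_device_memory("cpu")  # safe no-op on any machine
```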

Memory lifecycle during a model switch:

1. User changes prompt / config
2. PipelineManager receives ComponentConfigChangeEvent(PROCESSOR)
3. pipeline.stop_component(Processor)
   └─ Processor.stop()
      └─ _stop()
         ├─ broadcaster.unregister()
         └─ model_handler.cleanup()
             ├─ self._model = None
             ├─ self._reference_batch = None
             ├─ gc.collect()
             └─ torch.cuda.empty_cache()   # old model memory freed
4. factory.create_processor(...)            # new model loaded into now-free memory
5. pipeline.set_processor(new_processor)
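
Step 3's guarded teardown can be sketched minimally; the Processor internals here are assumed from the summary (the broadcaster may be None if setup() never ran, which is the pre-setup case the tests cover):

```python
class Processor:
    def __init__(self, model_handler, broadcaster=None):
        self._model_handler = model_handler
        self._broadcaster = broadcaster  # None until setup() registers one

    def _stop(self) -> None:
        if self._broadcaster is not None:
            self._broadcaster.unregister()  # skip if setup() never ran
        self._model_handler.cleanup()       # frees the old model's memory


class FakeHandler:
    cleaned = False
    def cleanup(self):
        self.cleaned = True


handler = FakeHandler()
Processor(handler)._stop()  # no broadcaster set: no AttributeError
print(handler.cleaned)  # → True
```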


@rajeshgangireddy rajeshgangireddy marked this pull request as ready for review March 4, 2026 13:25
Copilot AI review requested due to automatic review settings March 4, 2026 13:25
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses GPU/device OOMs during runtime model switching by adding explicit model-handler cleanup hooks and enforcing an unload-before-load workflow in the runtime pipeline so old model memory is released before loading a replacement.

Changes:

  • Add cleanup() to ModelHandler and implement cleanup behaviors for Torch/OpenVINO handlers (incl. gc.collect() and device cache clearing for Torch).
  • Update processor/pipeline/pipeline-manager lifecycle so switching the processor stops/unregisters/cleans up the old handler before creating the new one; drop pipeline references on activation/deactivation.
  • Add unit tests covering cleanup behavior, device cache release, and unload-before-load ordering.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

  • application/backend/app/runtime/core/components/base.py - Adds ModelHandler.cleanup() base hook (no-op).
  • application/backend/app/runtime/core/components/models/torch_model.py - Adds release_device_memory() and Torch handler cleanup() to drop refs, GC, and clear the CUDA/XPU cache.
  • application/backend/app/runtime/core/components/models/openvino_model.py - Adds OpenVINO handler cleanup() to drop refs and run GC.
  • application/backend/app/runtime/core/components/processor.py - Ensures Processor.stop() is safe without setup and triggers model_handler.cleanup().
  • application/backend/app/runtime/core/components/pipeline.py - Clears component/thread registries on stop(); adds stop_component() for targeted component unload.
  • application/backend/app/runtime/pipeline_manager.py - Wires unload-before-load for processor updates; drops the pipeline reference on activation/deactivation stops.
  • application/backend/app/runtime/core/components/factories/model.py - Passes device into TorchModelHandler so cleanup can release the correct device cache.
  • application/backend/tests/unit/runtime/test_pipeline_manager.py - Adds an ordering test ensuring processor unload happens before new processor creation; asserts the pipeline ref is dropped on deactivation.
  • application/backend/tests/unit/runtime/core/components/test_processor.py - Adds tests ensuring stop() triggers handler cleanup and is safe pre-setup / with a passthrough handler.
  • application/backend/tests/unit/runtime/core/components/test_pipeline.py - Adds tests verifying stop() clears internal registries and stop_component() behavior.
  • application/backend/tests/unit/runtime/core/components/models/test_torch_model.py - Adds tests for Torch cleanup + release_device_memory() behavior across devices.
  • application/backend/tests/unit/runtime/core/components/models/test_openvino_model.py - Adds tests for OpenVINO cleanup and idempotency.
  • application/backend/tests/unit/runtime/core/components/factories/test_model.py - Updates expectations for Torch handler construction with device=.

Signed-off-by: rajeshgangireddy <rajesh.gangireddy@intel.com>
@rajeshgangireddy
Contributor Author

This PR will be closed, as #907 takes care of this.
