feat: Implement cleanup methods for model handlers and enhance memory management #818
Closed
rajeshgangireddy wants to merge 3 commits into open-edge-platform:main from
Conversation
Pull request overview
This PR addresses GPU/device OOMs during runtime model switching by adding explicit model-handler cleanup hooks and enforcing an unload-before-load workflow in the runtime pipeline so old model memory is released before loading a replacement.
Changes:
- Add `cleanup()` to `ModelHandler` and implement cleanup behaviors for the Torch/OpenVINO handlers (incl. `gc.collect()` and device cache clearing for Torch).
- Update the processor/pipeline/pipeline-manager lifecycle so that switching the processor stops, unregisters, and cleans up the old handler before creating the new one; drop pipeline references on activation/deactivation.
- Add unit tests covering cleanup behavior, device cache release, and unload-before-load ordering.
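Based on the summary above, the base hook and the OpenVINO handler's cleanup might look roughly like this (a sketch only; class and attribute names such as `_reference_batch` are assumptions drawn from this description, not the actual source):

```python
import gc


class ModelHandler:
    """Hypothetical base class mirroring the names in this PR's description."""

    def cleanup(self) -> None:
        # No-op by default so callers can invoke cleanup() on any handler
        # uniformly; subclasses override this to release model memory.
        pass


class OpenVINOModelHandler(ModelHandler):
    def __init__(self, model=None):
        self._model = model
        self._reference_batch = None

    def cleanup(self) -> None:
        # Drop the strong references first, then force a collection so the
        # underlying buffers are actually reclaimed.
        self._model = None
        self._reference_batch = None
        gc.collect()
```

Making the base method a no-op means callers never need to know which handler type they hold, and calling `cleanup()` twice stays safe (idempotent).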
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| application/backend/app/runtime/core/components/base.py | Adds ModelHandler.cleanup() base hook (no-op). |
| application/backend/app/runtime/core/components/models/torch_model.py | Adds release_device_memory() and Torch handler cleanup() to drop refs, GC, and clear CUDA/XPU cache. |
| application/backend/app/runtime/core/components/models/openvino_model.py | Adds OpenVINO handler cleanup() to drop refs and run GC. |
| application/backend/app/runtime/core/components/processor.py | Ensures Processor.stop() is safe without setup and triggers model_handler.cleanup(). |
| application/backend/app/runtime/core/components/pipeline.py | Clears component/thread registries on stop(); adds stop_component() for targeted component unload. |
| application/backend/app/runtime/pipeline_manager.py | Wires unload-before-load for processor updates; drops pipeline reference on activation/deactivation stops. |
| application/backend/app/runtime/core/components/factories/model.py | Passes device into TorchModelHandler so cleanup can release the correct device cache. |
| application/backend/tests/unit/runtime/test_pipeline_manager.py | Adds ordering test to ensure processor unload happens before new processor creation; asserts pipeline ref is dropped on deactivation. |
| application/backend/tests/unit/runtime/core/components/test_processor.py | Adds tests ensuring stop() triggers handler cleanup and is safe pre-setup / with passthrough handler. |
| application/backend/tests/unit/runtime/core/components/test_pipeline.py | Adds tests verifying stop() clears internal registries and stop_component() behavior. |
| application/backend/tests/unit/runtime/core/components/models/test_torch_model.py | Adds tests for Torch cleanup + release_device_memory() behavior across devices. |
| application/backend/tests/unit/runtime/core/components/models/test_openvino_model.py | Adds tests for OpenVINO cleanup and idempotency. |
| application/backend/tests/unit/runtime/core/components/factories/test_model.py | Updates expectations for Torch handler construction with device=. |
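The ordering check described for `test_pipeline_manager.py` can be sketched as follows (a simplified stand-in, not the repository's actual test; the `stop_component`/`create_processor` call shapes are inferred from the descriptions above):

```python
from unittest.mock import MagicMock

# Record the order of the two operations under test.
calls = []

pipeline = MagicMock()
pipeline.stop_component.side_effect = lambda kind: calls.append("unload")


def create_processor():
    # Stands in for the factory that builds the replacement processor.
    calls.append("load")
    return MagicMock()


# The update step under test: the old processor must be unloaded first
# so its model memory is released before the new one allocates.
pipeline.stop_component("processor")
new_processor = create_processor()

assert calls == ["unload", "load"]
```

Asserting on a shared call log is a common way to verify ordering across two separate collaborators without inspecting their internals.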
(A now-outdated review comment on `application/backend/app/runtime/core/components/models/torch_model.py` was resolved.)
Signed-off-by: rajeshgangireddy <rajesh.gangireddy@intel.com>
Author (rajeshgangireddy): PR will be closed as #907 takes care of this.
Solves #798
Ensure GPU/device memory is fully released when switching models at runtime. Previously, loading a new model (e.g. on a prompt or config change) kept the old model in device memory, causing out-of-memory errors after three or four model switches.
This PR adds explicit `cleanup()` methods to all `ModelHandler` subclasses and wires an unload-before-load pattern into the pipeline manager so the old model's memory is reclaimed before the replacement is loaded.

- `ModelHandler` (base): added a `cleanup()` no-op method so every handler is cleanable.
- `TorchModelHandler`: `cleanup()` nulls the model and reference batch, calls `gc.collect()`, then `torch.cuda.empty_cache()` or `torch.xpu.empty_cache()`.
- `OpenVINOModelHandler`: `cleanup()` nulls the model and reference batch and calls `gc.collect()`.

Unload-before-load wiring
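The Torch-side cleanup described above might be sketched like this (attribute names and the `device` string handling are assumptions based on this summary; the import guard keeps the sketch runnable without torch installed):

```python
import gc

try:
    import torch  # optional: cache clearing only runs when torch is present
except ImportError:
    torch = None


class TorchModelHandler:
    """Hypothetical handler; names follow the PR description, not the source."""

    def __init__(self, model=None, device="cpu"):
        self._model = model
        self._reference_batch = None
        self._device = device

    def release_device_memory(self) -> None:
        # Return freed allocator blocks to the driver for this handler's device.
        if torch is None:
            return
        device = str(self._device)
        if device.startswith("cuda") and torch.cuda.is_available():
            torch.cuda.empty_cache()
        elif device.startswith("xpu") and hasattr(torch, "xpu") and torch.xpu.is_available():
            torch.xpu.empty_cache()

    def cleanup(self) -> None:
        # Order matters: drop the Python references first so gc.collect() can
        # free the tensors, then ask the backend to release its cached memory.
        self._model = None
        self._reference_batch = None
        gc.collect()
        self.release_device_memory()
```

Note that `empty_cache()` only releases memory the allocator has already cached; without dropping the model references first, the tensors would still be live and nothing would be returned to the device.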
- `Processor._stop()`: now calls `self._model_handler.cleanup()` and guards against an un-setup broadcaster.
- `Pipeline.stop()`: clears the `_components` and `_threads` dicts after stopping all components.
- `Pipeline.stop_component()`: new method to stop and remove a single component type.
- `PipelineManager._update_pipeline_components()`: calls `pipeline.stop_component(Processor)` before creating the new processor.
- `PipelineManager.on_config_change()`: sets `self._pipeline = None` after stop on activation/deactivation to drop the reference.

Tests (10 new)
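The pipeline side of the unload-before-load wiring could look roughly like this (a minimal sketch; the registry names `_components`/`_threads` come from this description, everything else is illustrative):

```python
class Pipeline:
    """Hypothetical minimal pipeline holding components and their threads."""

    def __init__(self):
        self._components = {}
        self._threads = {}

    def stop_component(self, component_type) -> None:
        # Stop and unregister a single component; a no-op if that
        # component type was never registered.
        component = self._components.pop(component_type, None)
        if component is not None:
            component.stop()  # e.g. Processor.stop() -> model_handler.cleanup()
        self._threads.pop(component_type, None)

    def stop(self) -> None:
        # Stop everything, then clear both registries so no stale
        # references keep model memory alive after the pipeline is down.
        for component in self._components.values():
            component.stop()
        self._components.clear()
        self._threads.clear()
```

With this shape, the manager can call `pipeline.stop_component(Processor)` and only then construct the replacement, which is exactly the unload-before-load ordering the PR enforces.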
- `TorchModelHandler`: cleanup frees refs, clears CUDA cache, clears XPU cache, idempotent call.
- `OpenVINOModelHandler`: cleanup frees refs, idempotent call.
- `release_device_memory`: CUDA, CUDA with index, XPU, CPU no-op.
- `Processor`: stop triggers cleanup, stop without setup is safe, PassThrough handler is safe.
- `Pipeline`: stop clears dicts, `stop_component` removes a single component, no-op for a missing component.
- `PipelineManager`: processor update calls `stop_component` before `create_processor` (ordering verified).

Memory lifecycle during a model switch:
Description
Type of Change
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation
- `refactor`: Code refactoring
- `test`: Tests
- `chore`: Maintenance

Related Issues
Breaking Changes
Examples
Screenshots