cluster_yaml is not cleaned up #5011

Open
@cg505

During review for #4980, it was noted that CloudVmRayBackend.remove_cluster_config has a bug that prevents it from actually deleting the config:

handle.cluster_yaml = None
global_user_state.update_cluster_handle(handle.cluster_name, handle)
common_utils.remove_file_if_exists(handle.cluster_yaml) # but cluster_yaml was already set to None...

The fix is obvious, but we should be careful since it could introduce a leak if other code was unintentionally relying on this.
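A minimal sketch of the reordering, assuming the fix is simply to capture the path before clearing it. `FakeHandle`, the `saved_handles` dict, and the local `remove_file_if_exists` are hypothetical stand-ins for the real handle class, `global_user_state.update_cluster_handle`, and `common_utils.remove_file_if_exists`. Their behavior here is assumed, not copied from the codebase.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeHandle:
    """Hypothetical stand-in for the real cluster handle."""
    cluster_name: str
    cluster_yaml: Optional[str]


def remove_file_if_exists(path: Optional[str]) -> None:
    """Stand-in for common_utils.remove_file_if_exists (behavior assumed)."""
    if path is None:
        return
    try:
        os.remove(os.path.expanduser(path))
    except FileNotFoundError:
        pass


def remove_cluster_config(handle: FakeHandle,
                          saved_handles: Dict[str, FakeHandle]) -> None:
    # Capture the path first, so clearing the attribute does not lose it.
    cluster_yaml_path = handle.cluster_yaml
    handle.cluster_yaml = None
    # Stand-in for global_user_state.update_cluster_handle(name, handle).
    saved_handles[handle.cluster_name] = handle
    # Delete using the captured path rather than the (now None) attribute.
    remove_file_if_exists(cluster_yaml_path)
```

Calling it a second time on the same handle is a no-op in this sketch, since the captured path is then `None`.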

While looking at this, I noticed that post_teardown_cleanup calls remove_cluster_config in the terminate case:

self.remove_cluster_config(handle)
Later in the same function there is an if statement that depends on handle.cluster_yaml:
if handle.cluster_yaml is not None:
_detect_abnormal_non_terminated_nodes(handle)
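Because remove_cluster_config has already set handle.cluster_yaml to None, this branch is never reached in the terminate case.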
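The ordering problem can be reproduced with a small stand-in. This is only a sketch of the terminate path as described above: `FakeHandle` and `terminate_path_sketch` are hypothetical names, and the real function does much more than these two steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeHandle:
    """Hypothetical stand-in for the real cluster handle."""
    cluster_yaml: Optional[str]


def terminate_path_sketch(handle: FakeHandle) -> List[str]:
    """Records which steps run, in order, on the terminate path."""
    steps = []
    # Stand-in for self.remove_cluster_config(handle), which clears the field.
    handle.cluster_yaml = None
    steps.append("remove_cluster_config")
    # The later check can no longer pass, whatever the handle held before.
    if handle.cluster_yaml is not None:
        steps.append("_detect_abnormal_non_terminated_nodes")
    return steps
```

Even when the handle starts with a path set, the second step is never recorded.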

We need to look carefully at this function and actually specify the expected invariant for when handle.cluster_yaml will/won't be set and when the file will/won't exist.
