Description
During review for #4980, it was noted that CloudVmRayBackend.remove_cluster_config
has a bug that prevents it from actually deleting the config:
    handle.cluster_yaml = None
    global_user_state.update_cluster_handle(handle.cluster_name, handle)
    common_utils.remove_file_if_exists(handle.cluster_yaml)  # but cluster_yaml was already set to None...
The fix is obvious, but we should be careful since it could introduce a leak if other code was unintentionally relying on this.
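For reference, a minimal sketch of what the fix could look like: capture the path before clearing it on the handle, then delete the file. `FakeHandle` and the local `remove_file_if_exists` are hypothetical stand-ins so the snippet runs on its own. They are not the real SkyPilot classes or helpers, and the real helper may treat `None` differently.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeHandle:
    """Hypothetical stand-in for the real cluster handle."""
    cluster_name: str
    cluster_yaml: Optional[str]


def remove_file_if_exists(path: Optional[str]) -> None:
    """Stand-in for common_utils.remove_file_if_exists.

    Assumption: a None path or an already-missing file is a no-op.
    """
    if path is None:
        return
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def remove_cluster_config(handle: FakeHandle) -> None:
    # Capture the path first, so clearing the field does not lose it.
    cluster_yaml_path = handle.cluster_yaml
    handle.cluster_yaml = None
    # The real code would persist the updated handle here, e.g.
    # global_user_state.update_cluster_handle(handle.cluster_name, handle)
    remove_file_if_exists(cluster_yaml_path)
```

With this ordering, the handle is cleared and the file is removed in the same call, and calling it a second time does nothing.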
While looking at this, I noticed that in post_teardown_cleanup we call remove_cluster_config in the terminate case:
skypilot/sky/backends/cloud_vm_ray_backend.py, line 4294 at a7f9295
skypilot/sky/backends/cloud_vm_ray_backend.py, lines 4360 to 4361 at a7f9295
We need to look carefully at this function and actually specify the expected invariant for when handle.cluster_yaml will/won't be set and when the file will/won't exist.
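As a starting point for discussion, here is one candidate invariant written as a check. This is only an assumption about what we might want: either the handle records no path, or the recorded path exists on disk. The function name `cluster_yaml_invariant_holds` is hypothetical and is not part of the codebase.

```python
import os
from typing import Optional


def cluster_yaml_invariant_holds(cluster_yaml: Optional[str]) -> bool:
    """Candidate invariant (an assumption, not confirmed behavior).

    Holds when no path is recorded, or when the recorded path exists.
    A recorded path pointing at a missing file violates it.
    """
    return cluster_yaml is None or os.path.exists(cluster_yaml)
```

A check like this could be asserted in tests around post_teardown_cleanup to catch the case where the file is removed but the handle still points at it. It does not detect the opposite case from this issue, where the handle is cleared but the file is left behind.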