Description
I've been noticing terraform getting stuck trying to interact with ECP a lot lately.
For example, I ran make clean to destroy a cluster that had failed to deploy completely. terraform destroy started running and eventually stalled (presumably due to a network/VPN hiccup). Four hours later, nothing had progressed, the process could not be terminated without kill -9, and terraform had left the resources in a 'locked' state.
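One way to keep a stalled destroy from hanging for hours might be to run it under a deadline. The following is a minimal sketch, not something catapult does today. It assumes a POSIX system, and the helper name run_with_deadline, the timeout values, and the build directory path are all hypothetical. It sends an interrupt first, since terraform treats an interrupt as a request to stop gracefully, and only falls back to a hard kill if that is ignored.

```python
import signal
import subprocess


def run_with_deadline(cmd, timeout_s, grace_s=30.0, cwd=None):
    """Run cmd; on timeout send SIGINT, then SIGKILL if it still has not exited.

    Returns the process return code (non-zero or negative if it was stopped).
    """
    proc = subprocess.Popen(cmd, cwd=cwd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Ask the process to stop gracefully first.
        proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        # Last resort, equivalent to kill -9; state may be left locked.
        proc.kill()
        return proc.wait()


# Hypothetical usage (the path is a placeholder, not a real catapult path):
# run_with_deadline(
#     ["terraform", "destroy", "-auto-approve", "-lock-timeout=60s"],
#     timeout_s=1800,
#     cwd="path/to/build-directory",
# )
```

This only bounds how long the process can hang. If the hard kill is reached, the state can still end up locked, as described above.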
I don't know of a way to recover from this and have already wasted a ton of time trying to address it. terraform does have a force-unlock subcommand, but attempting to run it yields "Local state cannot be unlocked by another process". When this happened previously, I manually deleted the lockfile, but that didn't allow terraform destroy to run again either.
I ended up having to delete the build directory manually and spend about 2 hours drilling into resources in the openstack console to make sure everything was cleaned up. I've now hit this again, so we should do one of two things: determine a graceful way to recover from this kind of scenario that still allows terraform to clean things up, or provide another subcommand in catapult that cleans resources from ECP using the openstack CLIs instead of terraform.
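As a starting point for the second option, here is a rough sketch of what an openstack-CLI-based cleanup could look like. Everything in it is an assumption rather than existing catapult behavior: the function names, the deletion order, and in particular the idea that all of a cluster's resources share a common name prefix. It relies only on the openstack client's list commands with -f json and its delete commands.

```python
import json
import subprocess

# Assumed deletion order: servers first, then the network pieces they used.
RESOURCE_ORDER = ["server", "router", "network", "security group"]


def list_resources(kind):
    """Return the parsed output of `openstack <kind> list -f json`."""
    result = subprocess.run(
        ["openstack", *kind.split(), "list", "-f", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def plan_deletes(listings, prefix):
    """Build delete commands for resources whose Name starts with prefix.

    listings maps a resource kind to the list of dicts returned by
    list_resources(kind). Nothing is executed here.
    """
    commands = []
    for kind in RESOURCE_ORDER:
        for item in listings.get(kind, []):
            if (item.get("Name") or "").startswith(prefix):
                commands.append(["openstack", *kind.split(), "delete", item["ID"]])
    return commands


# Hypothetical usage: print the plan for review before running anything.
# listings = {kind: list_resources(kind) for kind in RESOURCE_ORDER}
# for cmd in plan_deletes(listings, "mycluster-"):
#     print(" ".join(cmd))
```

Keeping the planning step separate from execution allows the commands to be reviewed before anything is deleted. A real implementation would need more than this: routers with attached subnets, for example, cannot be deleted until those interfaces are removed, and volumes, ports, and floating IPs are not covered.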