VCF 9.0.1 Hangs during VCF Automation Deployment #103
Replies: 31 comments
-
|
I do see this in the deployment and it references the following KB: However, since this is a fully automated deployment, I am not sure non-lowercase names plays a part. |
Beta Was this translation helpful? Give feedback.
-
|
There are a few ways to troubleshoot this.
|
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
|
.GenericStartVCFTask","finishState":"","errorState":"","properties":{},"uiProperties":{"displayText":"overall fips status collection for vcf","displayKey":"vmf::sm::overallfipsstatuscollectionforvcf"},"nodes":[{"symbolicName":"com.vmware.vrealize.lcm.vcf.plugin.tasks.GenericStartVCFTask","symbolicNameTxt":"overallfipsstatuscollectionforvcf-com.vmware.vrealize.lcm.vcf.plugin.tasks.GenericStartVCFTask","type":"SIMPLE","task":"com.vmware.vrealize.lcm.vcf.plugin.tasks.GenericStartVCFTask","properties":{},"uiProper Here is an error I found occurring. I am attaching extended lines of the log in the logs. |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
I don't see anything specific here. Can you try #91 (comment) |
Beta Was this translation helpful? Give feedback.
-
Can you try step 2 and see what the exact error stack is? #91 (comment) |
Beta Was this translation helpful? Give feedback.
-
|
Here is |
Beta Was this translation helpful? Give feedback.
-
This file does not have the deployment logs. Are there any other domainmanager log files in that folder? |
Beta Was this translation helpful? Give feedback.
-
|
These are the folders and files located in the path. What could be helpfully? thank you
|
Beta Was this translation helpful? Give feedback.
-
|
You will want to untar the domainmanager.2026... files and look at those log files to see what the error is. Another option is to retry the deployment from the VCF Installer UI and then the domainmanager.log file will get new logs added including the error log but deployment/failure may take time. |
Beta Was this translation helpful? Give feedback.
-
the uncompressed logs don't show any error. Is there any log form other appliance helpfully? |
Beta Was this translation helpful? Give feedback.
-
yes, you can follow step 1 |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
|
If you go into the Components tab shown in your screenshot and look at VCF Automation, do you see any details? |
Beta Was this translation helpful? Give feedback.
-
Hi, this is the new log file with errors. |
Beta Was this translation helpful? Give feedback.
-
|
As you can see in the file shared: Binary upload for VCFA to VCF Ops is failing. This is not really a Holodeck thing, but a VCF thing. Are you using an offline depot or online? If offline depot, can you verify the checksum for the VCFA binary that you've placed in the offline depot and compare it with the checksum from the Broadcom Support Portal value? If they don't match, you will need to fix the binary. If that is not the issue, can you share the full domain manager log and not just the grepped Error log to see the stack trace has any more details |
Beta Was this translation helpful? Give feedback.
-
Is the online depot... no network or internet restrictions. actually y see some timeskew errors |
Beta Was this translation helpful? Give feedback.
-
|
I am using the online repository as well. |
Beta Was this translation helpful? Give feedback.
-
|
There are a few errors I can see from the logs: The first error says NO_CONTENT. You can log into VCF Ops CLI and check if the file actually exists in the path or not. This error is visible only once and moves on to the next error which is repeated, so I'm thinking the file was available after thee first error. In this case the connection was reset by peer. This is seen multiple times and I'm not really sure why. |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
|
When I restarted the deployment, the SDDC manager in the lab produced these errors in the GUI when I was watching it. Not sure if this is helpful or not. |
Beta Was this translation helpful? Give feedback.
-
|
Seems like it has now gotten past that portion of the deployment and continued. |
Beta Was this translation helpful? Give feedback.
-
|
Great! |
Beta Was this translation helpful? Give feedback.
-
|
So, I have now been able to duplicate this issue. While I had gotten past the error reported above, I started having trouble and failures on the NSX that I reported as bug #94. So, with having previous problems, I ran Remove-HoloDeckInstance [-ResetHoloRouter] and decided to start over. I ran the same deployment command shown above, and all seemed to be going well until I hit the automation section, where it seemed to hang with the same error originally reported. The system is configured with the workaround published in bug #89 and deploying with command "New-HoloDeckInstance -Version '9.0.1.0' -InstanceID 'gray' -WorkloadDomainType 'SharedSSO' -NsxEdgeClusterMgmtDomain -NsxEdgeClusterWkldDomain -DeployVcfAutomation -DeploySupervisor" One thing I noticed this time was that when I tried to log into the GUI for the installer, I received a "not authorized" message for the admin@local account. I ended up switching to root and running "sudo systemctl restart commonsvcs" which fixed the login issue. As soon as the deployment times out, I am going to see if running the command again completes the installation of the VCF Automation as before. |
Beta Was this translation helpful? Give feedback.
-
|
So, it kept hangs at the Automation deployment as before. I rebooted both the Holorouter and the app installer appliances. I restarted the deployment, and as shown above, the Automation deployment completed. However, I when I got to the NSX deployment, I started hitting the same error I reported in #94. |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
|
VCFA deployment generally fails when there is high CPU. But here's how you can check further:
and you can look for more error details at the k8s level. |
Beta Was this translation helpful? Give feedback.
-
|
So, restarting from the failed state seemed to have gotten past that point and moved onto the NSX Edge deployment. So, the automation portion is now deployed. |
Beta Was this translation helpful? Give feedback.
-
|
Great. Moving this to a discussion as we did not find any specific bug in Holodeck, but this is a good guide for troubleshooting VCFA deployment |
Beta Was this translation helpful? Give feedback.




Uh oh!
There was an error while loading. Please reload this page.
-
Describe the bug
I am trying to deploy a full stack with the command:
New-HoloDeckInstance -Version '9.0.1.0' -InstanceID 'gray' -WorkloadDomainType 'SharedSSO' -NsxEdgeClusterMgmtDomain -NsxEdgeClusterWkldDomain -DeployVcfAutomation -DeploySupervisor
This works fine until it goes to deploy VCF Automation and ends up showing the error:
10-02-2026 05:57:47 SddcMgmtDomain[2175]: [ERROR] Management Domain deployment failed. Check the logs below for more details
10-02-2026 05:57:47 SddcMgmtDomain[2175]: [ERROR] @{name=Retrieve the status of VCF Automation Deployment request; description=Retrieve the status of VCF Automation Deployment request; status=COMPLETED_WITH_FAILURE; creationTimestamp=02/09/2026 19:54:19; updateTimestamp=02/10/2026 05:56:32; errors=System.Object[]}
The deployment retries and then hangs with the holo router showing:
10-02-2026 12:43:56 SddcMgmtDomain[2175]: [INFO] Current task in progress: Retrieve the status of VCF Automation Deployment request
10-02-2026 12:43:56 SddcMgmtDomain[2175]: [INFO] Management Domain is not ready yet. Sleeping for 5 mins
And has been showing this for over 12 hours.
Reproduction steps
1.New-HoloDeckInstance -Version '9.0.1.0' -InstanceID 'gray' -WorkloadDomainType 'SharedSSO' -NsxEdgeClusterMgmtDomain -NsxEdgeClusterWkldDomain -DeployVcfAutomation -DeploySupervisor
2.
3.
...
Expected behavior
Complete the deployment without issue.
Additional context
This is my second attempt, after deleting all previous VM's, including the Holotrouter and redeploying. Seems to hang as the same point. I had been having some DNS issues, etc. with my previous attempt and thought it maybe related. I deleted everything related to the VCF 9 deployment, including the Holo Router and started over. This time it all worked fine until I hit this step again.
Beta Was this translation helpful? Give feedback.
All reactions