power-failure.html.md.erb

---
title: Checking Pivotal Platform State after a Power Failure on vSphere
owner:
---

<% current_page.data.title = "Checking " + vars.product_name + " State after a Power Failure on vSphere" %>

This topic describes how to check <%= vars.first_product_name  %> state after a power failure in an on-premises vSphere installation. 

If you have a procedure at your company for handling power failure scenarios and would to like add steps for checking that <%= vars.product_name %> is in a good state, you can use this procedure as a template.  

## <a id="overview"></a> Overview  

This section describes the process used by <%= vars.product_name %> to recover from power failures and exceptions to that process.

### <a id="recovery-process"></a> Automatic Recovery Process   

When power returns after a failure, vSphere and <%= vars.product_name %> automatically do the following to recover your environment:

1. vSphere High Availability (HA) recovers VMs. 
2. BOSH ensures the processes on those VMs are healthy, with the exception of the Ops Manager VM and the BOSH VM itself. <%= vars.product_name %> uses BOSH to deploy and manage its VMs. For more information, see [BOSH](https://bosh.io/docs). 
3. The Diego runtime of Pivotal Application Service (PAS) recovers apps that were running on the VMs. For more information, see [Diego](../concepts/diego/diego-architecture.html). 

### <a id="manual-recovery-scenarios"></a> Scenarios that Require Manual Intervention

There are two scenarios that can require manual intervention when recovering your environment after a power failure:

* If PAS is configured to use a MySQL cluster instead of a single node, the cluster does not recover automatically. 
* If you have Ops Manager v2.5.3 or earlier and encounter the following known issue in the BOSH Director: [Monit inaccurately reports the health of UAA](https://docs.pivotal.io/pivotalcf/2-5/pcf-release-notes/opsmanager-rn.html#monit). 

The procedure in this topic includes more detail about addressing these scenarios.

## <a id="checklist"></a> Checklist  

Use the checklist in this section to ensure <%= vars.product_name %> is in a good state after a power failure. It includes links to sections that contain more detail about each phase. 

This checklist assumes your <%= vars.product_name %> on vSphere installation is set up for vSphere HA and you have the BOSH Resurrector enabled.

<table>
<tr>
<th>Phase</th>
<th>Component</th>
<th>Action</th>
</tr>
<tr>
<td>1</td>
<td>vSphere</td>
<td><a href="#check-vSphere">Ensure vSphere is Running</a></td>
</tr>
<tr>
<td>2</td>
<td>Ops Manager</td>
<td><a href="#check-ops-manager">Ensure Ops Manager is Running</a></td>
</tr>
<tr>
<td>3</td>
<td>BOSH Director</td>
<td><a href="#check-bosh">Ensure BOSH Director is Running</a></td>
</tr>
<tr>
<td>4</td>
<td>BOSH Director</td>
<td><a href="#resurrector">Ensure BOSH Resurrector Finished Recovering</a></td>
</tr>
<tr>
<td>5</td>
<td>PAS</td>
<td><a href="#check-pas">Ensure PAS VMs are Running</a> (This may include manually recovering the MySQL cluster)</td>
</tr>
<tr>
<td>6</td>
<td>PAS</td>
<td><a href="#check-apps">Ensure Apps Hosted on PAS are Running</a></td>
</tr>
<tr>
<td>7</td>
<td><%= vars.product_name %> Healthwatch</td>
<td><a href="#check-hw">Check the Healthwatch Dashboard</a></td>
</tr>
</table>

## <a id="check-vSphere"></a> Phase 1: Ensure vSphere is Running

Ensure that vSphere is running and has fully recovered from the power failure. Check your internal vSphere monitoring dashboard. 

## <a id="check-ops-manager"></a> Phase 2: Ensure Ops Manager is Running

To ensure Ops Manager is running, do the following:

1. Open vCenter and navigate to the resource pool that hosts your <%= vars.product_name %> deployment.

1. Select the **Related Objects**, and then **Virtual Machines**. 

1. Locate the VM with the name `OpsMan-VERSION`, such as `OpsMan-2.6`. 

1. Review the **State** and **Status** columns for the Ops Manager VM. If Ops Manager is running, they say **Powered On** and **Normal**. If this is not the case, restart the VM. 

## <a id="check-bosh"></a> Phase 3: Ensure BOSH Director is Running

To ensure BOSH Director is running, do the following:

1. In a browser, navigate to the <%= vars.product_name %> Ops Manager UI and select the **BOSH Director for vSphere** tile. 
	<p class="note"><strong>Note</strong>: If you do not know the URL of the Ops Manager VM, you can use the IP address from vCenter.</p>

1. Select **Status**. 

1. In the **BOSH Director** row, record the **CID**. The CID is the cloud ID and corresponds to the VM name in vSphere. 

1. Navigate to the vCenter resource pool or cluster that hosts your <%= vars.product_name %> deployment.

1. Select **Related Objects**, and then **Virtual Machines**.

1. Locate the VM with the name that corresponds to the **CID** value you copied. 

1. Review the **State** and **Status** columns for the VM. If the **State** is not **Powered On**, restart the VM. 

1. If the VM is **Powered On** but **Status** does not display **Normal**, it may be due the following known issue: [Monit inaccurately reports the health of UAA](https://docs.pivotal.io/pivotalcf/2-5/pcf-release-notes/opsmanager-rn.html#monit). To resolve this issue, do the following:
	1. SSH into the BOSH Director VM using the instructions in [SSH into the BOSH Director VM](../customizing/trouble-advanced.html#bosh-director-ssh). 
	1. Run the following command to see that all processes are running:

		```
		monit summary
		```

	1. If the `uaa` process is not running, run the following command:

		```
		monit restart UAA
		``` 

## <a id="resurrector"></a> Phase 4: Ensure BOSH Resurrector Finished Recovering

If enabled, the BOSH Resurrector re-creates any VMs in a problematic state after being recovered by vSphere HA. 

To ensure BOSH Resurrector finished recovering, do the following:

1. Log in to the Ops Manager VM with SSH using the instructions in [Log in to the Ops Manager VM with SSH](../customizing/trouble-advanced.html#ssh). 

1. Authenticate with the BOSH Director VM using the instructions in [Authenticate with the BOSH Director VM](../customizing/trouble-advanced.html#log-in). 

1. Run the following command to see if there is any currently running or queued Resurrector activity:

	```
	bosh tasks --all -d ''
	``` 
	Look for `scan` and `fix` in the task description. If there are no tasks running, it is likely that BOSH Director has finished recovering. You can also run `bosh tasks --recent --all -d ''` to view finished tasks.

## <a id="check-pas"></a> Phase 5: Ensure PAS VMs are Running

<p class="note"><strong>Note</strong>: You can also apply the steps in this section to any <%= vars.product_name %> services. To further ensure the health of <%= vars.product_name %> services, use the <%= vars.product_name %> Healthwatch dashboard and the documentation for each service.</p>

To ensure PAS VMs are running, do the following:

1. Run the following command to confirm that VMs are running:

	```
	bosh vms
	``` 
	BOSH lists VMs by deployment. The deployment with the `cf-` prefix is the PAS deployment. 


1. If the `mysql` VM is not running, it is likely because it is a cluster and not a single node. Clusters require manual intervention after an outage. See [Manually Recover PAS MySQL (Clusters Only)](#check-mysql) to confirm and recover the cluster. 

1. If any other VMs are not running, run the following command:
	
	```
	bosh cck -d DEPLOYMENT
	```
	This command scans for problems and provides options for recovering VMs. For more information, see [IaaS Reconciliation](https://bosh.io/docs/cck/) in the BOSH documentation. 

1. If you cannot get all VMs running, contact [Pivotal Support](https://support.pivotal.io) for assistance. Provide the following information:
	* You have started this checklist to recover from a power failure on vSphere
	* A list of failing VMs
	* Your <%= vars.product_name %> version

### <a id="check-mysql"></a> Manually Recover PAS MySQL (Clusters Only)

To manually recover PAS MySQL, do the following:

1. In a browser, navigate to the <%= vars.product_name %> Ops Manager UI and select the **Pivotal Application Service** tile. 

1. Select the **Resource Config** pane. 

1. Review the **INSTANCES** column of the **MySQL Server** job. If the number of instances is greater than `1`, manually recover MySQL by following this procedure: [Recovering From MySQL Cluster Downtime](../mysql/bootstrap-mysql.html). 

## <a id="check-apps"></a> Phase 7: Ensure Apps Hosted on PAS are Running

To ensure apps hosted on PAS are running, do the following:

1. Check the status of an app your company runs on <%= vars.product_name %>. Run any healthchecks that the app has  or visit the URL of the app to see that it is working. 

1. Push an app to <%= vars.product_name %>. 


## <a id="check-hw"></a> Phase 8: Check the Healthwatch Dashboard

You can use <%= vars.product_name %> Healthwatch to further assess the state of <%= vars.product_name %>. For more information, see [Using <%= vars.product_name %> Healthwatch](https://docs.pivotal.io/pcf-healthwatch/using.html).