It’s been a month. I was busy making our environment more stable: a lot of troubleshooting, WebEx sessions and discussions. A few days ago I noticed that random VMs kept vMotioning constantly. Some VMs ended up in a strange state, showing as orphaned, invalid or unknown, while still being online.
I couldn’t find any evidence of why the VMs went into these states. One more thing I noticed was that the CPU and memory utilization of the ESXi 5.1 hosts showed 0 in vCenter Server 5.1.
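One way to spot affected hosts is to look for any host that reports 0 for both CPU and memory while it still has powered-on VMs. Here is a minimal Python sketch of that check. The `HostStats` record and the `suspicious_hosts` function are hypothetical names I made up for illustration, and the sample numbers are invented; in practice the values would come from however you export host statistics from vCenter.

```python
from typing import Iterable, List, NamedTuple


class HostStats(NamedTuple):
    """Hypothetical per-host record; field names are assumptions."""
    name: str
    cpu_usage_mhz: int
    memory_usage_mb: int
    powered_on_vms: int


def suspicious_hosts(hosts: Iterable[HostStats]) -> List[str]:
    """Return names of hosts that run VMs but report 0 CPU and 0 memory."""
    return [
        h.name
        for h in hosts
        # A host with running VMs cannot really be using nothing,
        # so 0/0 here suggests stale or missing statistics.
        if h.powered_on_vms > 0
        and h.cpu_usage_mhz == 0
        and h.memory_usage_mb == 0
    ]


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = [
        HostStats("esx01", 8200, 65536, 14),
        HostStats("esx02", 0, 0, 11),   # looks like the problem host
        HostStats("esx03", 0, 0, 0),    # empty host, 0/0 is plausible
    ]
    print(suspicious_hosts(sample))
```

An empty host is deliberately skipped, since 0 utilization there can be legitimate; only hosts that claim to run VMs at zero cost are flagged.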
What follows is not a firm conclusion; it’s my inference based on DRS, HA and that particular 0 value for CPU/memory. I also discussed it with VMware BCS support.
The VMs changed to an abnormal status because vMotion was interrupted by something, most likely HA kicking in after an intermittent network/storage failure. That became much more likely because DRS kept trying to move heavy-workload VMs to the hosts reporting 0 CPU/memory.
You have to upgrade to the latest version of ESXi 5.1 or to vCenter Server 5.1 Update 1c to permanently fix this problem.
Workaround:
Choose one of the following options. Both are temporary solutions; the issue will come back.
1. Restart the ESXi management agents.
2. Disconnect and reconnect the ESXi host in the vSphere Client.
Update: you have to upgrade both the ESXi hosts and vCenter Server to permanently fix the problem.