• Error 12711 VMM cannot complete the WMI operation on the server because of an error

    I finally implemented Hyper-V 2012 and SCVMM 2012 R2 in my lab. Unfortunately, FreeNAS does not support the SCSI-3 persistent reservations that Windows Server 2012 R2 requires (see bug #4003), so my iSCSI storage could not be brought online in the Failover Cluster. I had to find an alternative.

    I eventually decided to use a Windows Server file server instead of iSCSI. There are a bunch of benefits to doing that, since it leverages the new SMB 3.0 technology. The key one is that it supports high availability.

    Following the guide, I successfully created the first shares for the Hyper-V cluster. I then created a test VM but could not power it on. It showed me:

    Error (12711)
    VMM cannot complete the WMI operation on the server (dcahyv02.contoso.com) because of an error: [MSCluster_Resource.Name="SCVMM test (1)"] The cluster resource could not be brought online by the resource monitor.

    The cluster resource could not be brought online by the resource monitor (0x139A)

    Recommended Action
    Resolve the issue and then try the operation again.

    I opened Failover Cluster Manager on a Hyper-V host, and the event logs showed:

    Cluster resource ‘SCVMM test (1)’ of type ‘Virtual Machine’ in clustered role ‘SCVMM test Resources (1)’ failed. The error code was ‘0x80004005’ (‘Unspecified error’).

    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

    Initially I suspected a problem with the new file server, or an SCVMM bug. But the VM could not be brought up even when I created it on the Hyper-V host directly. It gave me this error:

    Virtual machine ‘test’ could not be started because the hypervisor is not running (Virtual machine ID AE786CAA-C74B-4F9E-8867-30191197087B). The following actions may help you resolve the problem: 1) Verify that the processor of the physical computer has a supported version of hardware-assisted virtualization. 2) Verify that hardware-assisted virtualization and hardware-assisted data execution protection are enabled in the BIOS of the physical computer.  (If you edit the BIOS to enable either setting, you must turn off the power to the physical computer and then turn it back on.  Resetting the physical computer is not sufficient.) 3) If you have made changes to the Boot Configuration Data store, review these changes to ensure that the hypervisor is configured to launch automatically.

    “Hypervisor is not running” indicates something related to the virtualization layer. I finally realized that I was running the Hyper-V host on VMware Workstation on a laptop, so the test VM was nested twice. Something was bound to go wrong!

    I followed my article How to configure nested Hyper-V VM on VMware Workstation to fix this problem.


  • Windows cannot be installed on drive 0 partition 1

    I think Windows Server 2012 will be the next popular server OS, just like Windows Server 2008, and it is also a nice hypervisor OS in the virtual world. What do you think?

    Installation is the first step to experiencing this wonderful OS, and you may see some strange problems during that step, just as I did. Today’s topic happened a long time ago; I just want to share it with people who may face a similar issue.

    It was an HP blade system with a local disk attached; you may see a similar problem on other vendors’ hardware. When you select the disk to install the OS on, the installer may say “Windows can’t be installed on drive 0 partition 1”, or “Windows cannot be installed on this disk. This computer’s hardware may not support booting to this disk. Ensure that the disk’s controller is enabled in the computer’s BIOS menu.”

    That’s because the boot volume is not set on the array controller. On HP servers, for example, you have to reboot and press F8 after the BIOS checks the array controller to enter the array controller management interface. Then go to Select Boot Volume in the main menu, select Direct Attached Storage, and select the disk you want to install the OS on. Follow the wizard to continue booting.

    If the problem persists, go to the array controller management interface, rebuild the array, and select the boot volume again. That should fix the problem.


  • Google AdSense available on my blog

    About a month ago I applied for Google AdSense for my blog, and I almost forgot about the request because life was so busy. My friend Saju told me his IT blog has Google AdSense, which reminded me that I had a pending application. The ad slot was blank after I set it up on my blog; this morning it finally shows ads… It’s not about the money, it’s just part of an IT blog. lol

    I still remember that my first Google AdSense check arrived 10 years ago, and that it was for $200. My friend and I were so excited when we learned the check had arrived in China. That was the first time I earned USD, and probably the first time I saw what USD looks like. 🙂 Google AdSense… it brought back memories. It was a tough time in my life, but I still want to thank my family, my friends, and everyone who supported me.

    Times have changed. What happened back then is no longer hatred and pain in my heart; it is a small part of my life’s journey, an experience, and the kind of setback a man ought to go through. I hope the future will be better.


  • Nodes in the ESXi cluster may report corruption after reboot host or attach device

    VCE just released a new KB, vce2563, to describe the issue.

    If your ESXi 5.x hosts are connected to a VMAX running Enginuity 5876.159.102 or later, you may see this issue after rebooting an ESXi host or attaching storage, if you have enabled the VAAI block delete feature.

    To check the option’s status, you can run the following command in PowerCLI (replace "cluster name" with your own cluster):

     Get-VMHost -Location "cluster name" | Get-VMHostAdvancedConfiguration -Name VMFS3.EnableBlockDelete


  • Error 2931 The connection to the VMM agent on the virtualization server was lost

    Windows Server 2012 is the biggest competitor of VMware vSphere. There are good reasons to use Hyper-V 2012 instead of vSphere 5.x, but it is still very hard for a newbie. We spent more than 30 hours trying to figure out how to create a cluster in SCVMM 2012 SP1. The software is easy to install but hard to configure. I saw “failed” everywhere; it is not a mature product in my view.

    We installed Windows Server 2012 Datacenter edition on HP BL460 blades; the storage is a NetApp FAS2240 (I may be wrong, I’m not a storage guy). We got the following error message when we created a Hyper-V cluster in SCVMM 2012 SP1.

    Error (2931)
    VMM is unable to complete the request. The connection to the VMM agent on the virtualization server (xxx) was lost.
    Unknown error (0x80338029)

    Recommended Action
    Ensure that the Windows Remote Management (WS-Management) service and the VMM agent are installed and running and that a firewall is not blocking HTTPS traffic.

    This can also happen due to DNS issues. Try and see if the server (dcahyv04.amat.com) is reachable over the network and can be looked up in DNS. You can ping the virtualization server from VMM management server and make sure that the IP address returned matches the IP address locally obtained from the virtualization server.

    If the error still persists, restart the virtualization server, and then try the operation again.

    The SCVMM job failed at the step “Mount storage disk on xxxx”.

    Initially I thought something was wrong with the services. I checked the Windows Remote Management service mentioned above, but it was up and running. Then I noticed that the WINS servers were not set, but setting them brought no luck either.

    Why did the job always fail on mounting storage? Maybe it was related to disk operations? The SCVMM server is a remote server, so it must operate on the disks remotely. I therefore tried connecting to the Hyper-V server remotely with the Computer Management tool, and it told me RPC was unavailable when I clicked the Disk Management node. Aha… a firewall problem: the firewall was disabled on the SCVMM server but enabled on the Hyper-V server, so the RPC ports were blocked on the Hyper-V server side.

    Cluster creation sometimes succeeded after I disabled the firewall, but the Hyper-V servers still seemed to have a hard time mounting storage.

    SCVMM mounts and unmounts storage on each Hyper-V host during cluster creation, and it took a very long time to mount storage before the job failed, so we suspected something related to storage. Finally, we installed NetApp Host Utilities 6.0.1 and NTAP MPIO 4.0.1, which solved the problem.

    To summarize: you must enable remote management on the Hyper-V hosts (the Remote Registry service, etc.), allow the required ports in Windows Firewall, and install the storage MPIO plugin. BTW, you should disable UAC on Windows Server 2012; the procedure is different from Windows Server 2008, see http://social.technet.microsoft.com/wiki/contents/articles/13953.windows-server-2012-deactivating-uac.aspx.

    That’s just the first step to making Hyper-V successful. 🙂


  • How to Add VMware PowerCLI to Standard PowerShell Environment

    1. Create a file named “Profile.ps1” at %windir%\system32\WindowsPowerShell\v1.0\profile.ps1

    2. Add the following content to the file.

    # Adds the base cmdlets
    Add-PSSnapin VMware.VimAutomation.Core
    # Add the following if you want to do things with Update Manager
    #Add-PSSnapin VMware.VumAutomation
    # This script adds some helper functions and sets the appearance. You can pick and choose parts of this file for a fully custom appearance.
    . "C:\Program Files\VMware\Infrastructure\vSphere PowerCLI\Scripts\Initialize-PowerCLIEnvironment.ps1"

    You need administrator permission to create a file in the system32 location.


  • How to configure vSAN on nested ESXi hosts with SSD hard disk

    There are lots of articles that introduce the vSAN feature and give step-by-step guides. I referred to William Lam’s article and Duncan’s article to configure vSAN in my lab. I am sure I followed the steps exactly, but I could not see anything in the disk field under Disk Management.

    Please note: the following steps do not work for ESXi 6.0 RC on VMware Workstation 10. You have to set scsix:y.virtualssd = 0 in the vmx file to mark the disk as non-SSD. Please refer to William’s article for details.

    After looking into it more deeply, I found something interesting:

    esxcli storage core device list

    I got this output:

    mpx.vmhba1:C0:T1:L0
    Display Name: Local VMware, Disk (mpx.vmhba1:C0:T1:L0)
    Has Settable Display Name: false
    Size: 5120
    Device Type: Direct-Access
    Multipath Plugin: NMP
    Devfs Path: /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0
    Vendor: VMware,
    Model: VMware Virtual S
    Revision: 1.0
    SCSI Level: 2
    Is Pseudo: false
    Status: on
    Is RDM Capable: false
    Is Local: true
    Is Removable: false
    Is SSD: true
    Is Offline: false
    Is Perennially Reserved: false
    Queue Full Sample Size: 0
    Queue Full Threshold: 0
    Thin Provisioning Status: unknown
    Attached Filters:
    VAAI Status: unsupported
    Other UIDs: vml.0000000000766d686261313a313a30
    Is Local SAS Device: false
    Is Boot USB Device: false
    No of outstanding IOs with competing worlds: 32

    Initially, I thought the disk was marked as SSD because I had run the command to enable SSD. That was not the case: it shows as SSD because my physical hard disk is an SSD! I did not have to run the command from the articles to turn SSD on; the disk is natively SSD. lol
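
    To see the “Is SSD” flag for every device at a glance, the long output can be filtered. This is only a sketch: list_ssd_flags is a helper name made up for illustration, it assumes each device block starts with an identifier such as mpx.…, naa.…, eui.… or t10.…, and it assumes awk is available in the shell you run it from.

    ```shell
    # Hypothetical helper: reads "esxcli storage core device list" style text
    # on stdin and prints "<device> <Is SSD value>" for each device block.
    list_ssd_flags() {
        awk '
            { sub(/^[ \t]+/, "") }                        # ignore indentation
            /^(mpx|naa|eui|t10)\./ { dev = $1; next }     # device identifier line
            /^Is SSD:/             { print dev, $3 }      # property of that device
        '
    }
    ```

    On a host, the real command would be piped into it: esxcli storage core device list | list_ssd_flags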

    What I needed to do was actually the total opposite. These are the steps I used to enable vSAN:

    1. Create two disks.

    2. Log in to the ESXi hosts via SSH.

    3. Run the following command and find the two disks you want to use for vSAN. Record their runtime names.

    esxcli storage core device list

    4. Run the following command to disable SSD for one disk.

    esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device vmhba1:C0:T2:L0 --option "disable_ssd"

    5. Follow the articles above to enable the vSAN ports, create the cluster, enable vSAN on the cluster, and join the ESXi hosts to the cluster.


  • How to setup NTP services by PowerCLI

    The NTP service is very important for troubleshooting: vmkernel log timestamps are incorrect if the NTP service is not running and the ESXi system time is wrong. It can also affect VM system time even if you disable time synchronization in VMware Tools, since a VM still syncs time with ESXi after it resumes from suspend, finishes a vMotion, or reverts to a snapshot.

    I know it is simple to configure the NTP service on a single host, but what if you want to configure it on many hosts?

    Basically there are 3 steps to make sure the NTP service works properly:

    • Configure the NTP server IP address.
    • Start the NTP service.
    • Set the service to start along with the ESXi system.

    Let’s try PowerCLI:

    Get-VMHost -Location "Cluster Name" | Add-VMHostNtpServer -NtpServer "NTP server address"

    Get-VMHost -Location "Cluster Name" | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"} | Start-VMHostService

    Get-VMHost -Location "Cluster Name" | Get-VMHostService | Where-Object {$_.Key -eq "ntpd"} | Set-VMHostService -Policy On


  • Travel to Chengdu again

    It has been about 5 years since I last visited Chengdu. It is a beautiful city; people say “you’re gonna love it, and you’ll want to live there once you come to Chengdu”. People seem to live a very relaxed life in Chengdu: they drink tea in the park, play Mahjong, and enjoy having professionals clean their ears (much like an ear massage). I stayed in Chengdu for 3 months, so I’m fairly familiar with the city, but all my memories are from 5 years ago.

    I was excited to get my Raspberry Pi in the morning, and I planned to play with it all day. But my wife wanted to discuss travel plans over lunch. She told me she had visited Chengdu several times before, but none of those was a real trip: they just went to the city, picked up goods, and went back. She only knew one place, “He Hua Chi”, a clothing market. They bought clothes there and sold them in their own city.

    In the end, we took a 3-day trip to Chengdu! That was a crazy plan for me, since I had never planned a trip and left on the same day! We flew to Chengdu at night and checked in to a great hotel. Since it was close to Chinese New Year, there were not many people or much traffic, and it felt like an empty city. A lot changes in 5 years; I could feel my heart pounding when I saw buildings I was familiar with. I had not been back to the city since I got a new job in Xi’an, and there were a lot of things I missed every day… He’s, Minto, Chunxi Road, JinLi, etc.

    In the taxi, looking at the buildings outside, everything felt familiar yet distant and blurred; those familiar buildings could still bring back bits and pieces of the old days. Five years ago I was a young guy, an unemployed youth who did not know where tomorrow would be, a recent graduate knocked back by the job market, a junior network administrator who had just been tormented by an awful company, a young man whose startup had failed. Five years later, every bit of success, failure, setback, and glory has become part of my life experience as a man. Painful or happy, it is all precious.

    Chengdu, I hope to have the chance to come back. 🙂


  • How to decode ESXi 5.x SCSI error code

    Storage is a critical component for virtualization, and a lot of VM performance issues are related to storage latency. In some cases you may see an error message like this in the vmkernel log:

    2014-02-11T07:18:20.541Z cpu8:425351)ScsiDeviceIO: 2331: Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5 from world 602789 to dev “naa.514f0c5c11a00025” failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0

    It looked like a language from another planet the first time I saw it. 🙂 Let’s see how to “translate” it into human language.

    First, I split it into several sections:

    a) 2014-02-11T07:18:20.541Z cpu8:425351)

    b) ScsiDeviceIO: 2331: Cmd(0x4124425bc700) 0x2a, CmdSN 0xd5

    c) from world 602789

    d) to dev “naa.514f0c5c11a00025”

    e) failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0

    Section A shows the UTC time when the error occurred.

    Section B shows which command was sent. (Actually I don’t know what the command means; please let me know if you do.)

    Section C shows which world the command relates to.

    You can find out which world it is with the following command:

    ps | grep 602789

    Section D shows which storage device reported the error.

    You can identify which datastore it is with the following command, if your datastore consists of a single LUN:

    esxcfg-scsidevs -m naa.514f0c5c11a00025

    You can also check the LUN settings and information with the following commands:

    esxcli storage core device list -d naa.514f0c5c11a00025

    esxcli storage nmp device list -d naa.514f0c5c11a00025

    Section E shows the SCSI sense code. That’s the part I want to explain in more detail.

    It breaks down into two parts:

    SCSI status code: H:0x0 D:0x2 P:0x0

    H means host status

    D means device status

    P means plugin status

    Sense data: 0x4 0x44 0x0

    0x4 means Sense Key

    0x44 means Additional Sense Code

    0x0 means ASC Qualifier
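
    The split above can be sketched as a small shell function. This is an illustration only, not a VMware tool: parse_scsi_line is a made-up name, and it assumes the line has exactly the layout of the example in this post (“from world …”, “to dev …”, “H: D: P:”, “sense data:”).

    ```shell
    # Hypothetical helper: pull world ID, device, status codes and sense data
    # out of one ScsiDeviceIO line passed as the first argument.
    parse_scsi_line() {
        line=$1
        hex='0x[0-9a-fA-F]*'
        # Section C: the world the command came from
        world=$(printf '%s\n' "$line" | sed -n 's/.*from world \([0-9]*\).*/\1/p')
        # Section D: the device name, skipping whatever quote character precedes it
        dev=$(printf '%s\n' "$line" | sed -n 's/.*to dev [^A-Za-z0-9]*\([A-Za-z0-9.:]*\).*/\1/p')
        # Section E, part 1: host / device / plugin status
        status=$(printf '%s\n' "$line" | sed -n "s/.* H:\($hex\) D:\($hex\) P:\($hex\).*/\1 \2 \3/p")
        # Section E, part 2: sense key, ASC, ASC qualifier
        sense=$(printf '%s\n' "$line" | sed -n "s/.*sense data: \($hex\) \($hex\) \($hex\).*/\1 \2 \3/p")
        echo "world=$world dev=$dev status=$status sense=$sense"
    }
    ```

    Passing the example line from this post as the argument should give back world 602789, device naa.514f0c5c11a00025, status 0x0 0x2 0x0 and sense 0x4 0x44 0x0.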

    Before decoding, you should translate each code to NNNh notation: 0xNNN = NNNh. For example, 0x7a = 7Ah and 0x77 = 77h.
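
    If you have many codes to convert, the shell’s printf can do the notation change. A minimal sketch: to_t10 is a made-up helper name, and the optional second argument (digit width, default 2) is my own addition so that a sense key can be printed with a single digit.

    ```shell
    # Hypothetical helper: convert 0xNN to the NNh notation used in the T10 lists.
    # $1 = value such as 0x7a, $2 = number of hex digits to print (default 2)
    to_t10() {
        width=${2:-2}
        # %X prints upper-case hex; the 0<width> part pads with leading zeros
        printf "%0${width}Xh\n" "$1"
    }
    ```

    For example, to_t10 0x7a prints 7Ah, and to_t10 0x4 1 prints 4h.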

    The SCSI status code is easy to decode. You just need to change the format and look the code up at http://www.t10.org/lists/2status.htm.

    In our example, H:0x0 D:0x2 P:0x0, host code 0x0 (00h) means the ESX host side is good, device code 0x2 (02h) means the device is not ready, and plugin status code 0x0 (00h) means the LUN plugin is good. (Clarification: device code 0x2 actually means “Check Condition”, not really “device is not ready”. I wrote it that way to make it easier to understand, but it is confusing because “Check Condition” and “Device is not Ready” have different meanings. Thanks to Tony for pointing that out.)

    Sense data is a little more complicated. You have to refer to two links: http://www.t10.org/lists/2sensekey.htm and http://www.t10.org/lists/asc-num.txt.

    In our example, 0x4 0x44 0x0: Sense Key 0x4 (4h) means HARDWARE ERROR, the Additional Sense Code is 0x44 (44h), and the ASC Qualifier is 0x0 (00h). Combine the two codes into 44h/00h, which means INTERNAL TARGET FAILURE.

    Okay, now let’s put all the decoded pieces together:

    ESX host side is good, device is not ready (Check Condition), LUN plugin is good, because HARDWARE ERROR: INTERNAL TARGET FAILURE
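
    The lookups can also be scripted for the codes you meet most often. The sketch below is deliberately tiny: the function names are made up, only a handful of entries from the T10 lists linked above are included, and anything else falls through to UNKNOWN, so treat it as a starting point rather than a complete decoder.

    ```shell
    # Hypothetical helpers: map a few codes to their T10 names.
    device_status_name() {            # $1 = D: value, e.g. 0x2
        case $(printf '%02x' "$1") in
            00) echo "GOOD" ;;
            02) echo "CHECK CONDITION" ;;
            08) echo "BUSY" ;;
            18) echo "RESERVATION CONFLICT" ;;
            28) echo "TASK SET FULL" ;;
            *)  echo "UNKNOWN" ;;
        esac
    }
    sense_key_name() {                # $1 = sense key, e.g. 0x4
        case $(printf '%x' "$1") in
            0) echo "NO SENSE" ;;
            2) echo "NOT READY" ;;
            3) echo "MEDIUM ERROR" ;;
            4) echo "HARDWARE ERROR" ;;
            5) echo "ILLEGAL REQUEST" ;;
            6) echo "UNIT ATTENTION" ;;
            b) echo "ABORTED COMMAND" ;;
            *) echo "UNKNOWN" ;;
        esac
    }
    asc_name() {                      # $1 = ASC, $2 = ASC qualifier
        case $(printf '%02x/%02x' "$1" "$2") in
            44/00) echo "INTERNAL TARGET FAILURE" ;;
            29/00) echo "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED" ;;
            *)     echo "UNKNOWN" ;;
        esac
    }
    decode_scsi() {                   # $1 = D: value, $2 = sense key, $3 = ASC, $4 = ASCQ
        echo "device: $(device_status_name "$1"); sense: $(sense_key_name "$2"), $(asc_name "$3" "$4")"
    }
    ```

    With the values from our example, decode_scsi 0x2 0x4 0x44 0x0 reports CHECK CONDITION as the device status, which matches the clarification earlier in this post.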

    Actually I took this code from a case of incompatible fnic firmware and driver. Does it make your troubleshooting easier? 🙂

    You can also refer to the following links for more detail:

    Understanding SCSI device/target NMP errors/conditions in ESX/ESXi 4.x and ESXi 5.x

    Understanding SCSI host-side NMP errors/conditions in ESX 4.x and ESXi 5.x

    Interpreting SCSI sense codes in VMware ESXi and ESX