Linux virtual machine hang on ESXi 5.5 host

English Version

Something is wrong with ESXi 5.5 again! Please don't upgrade VMware Tools to 5.5 if you run Debian or Red Hat Linux virtual machines on your ESXi 5.5 hosts. There is an unresolved bug in the vmmemctl driver (the balloon driver) of VMware Tools 5.5 that can cause Linux virtual machines to hang.

On a hung Linux virtual machine you may see output similar to the following:

crash> bt
PID: 9709 TASK: ffff8100a0459080 CPU: 0 COMMAND: "vmmemctl"
#0 [ffff810120095b70] crash_kexec at ffffffff800b1509
#1 [ffff810120095c30] __die at ffffffff80065137
#2 [ffff810120095c70] do_page_fault at ffffffff80067430
#3 [ffff810120095d60] error_exit at ffffffff8005ddf9
[exception RIP: Balloon_QueryAndExecute+493]
RIP: ffffffff8820bd7d RSP: ffff810120095e10 RFLAGS: 00010297
RAX: 00000000ffffffff RBX: ffff81008627ff48 RCX: 0000000000000001
RDX: 000000000000006c RSI: 0000000000000202 RDI: ffffffff88216fc0
RBP: ffffffff88216fc0 R8: ffff810120094000 R9: 000000000000003c
R10: ffff81013fc14068 R11: 00002ae6787fedc8 R12: ffff81008627e000
R13: 0000000000000282 R14: ffff810122f71de8 R15: ffffffff800a3d4a
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff810120095e28] Balloon_GetStats at ffffffff8820ba32 [vmmemctl]
#5 [ffff810120095e58] Balloon_QueryAndExecute at ffffffff8820bbb8 [vmmemctl]
#6 [ffff810120095e68] OS_UnmapPage at ffffffff8820b716 [vmmemctl]
#7 [ffff810120095ee8] kthread at ffffffff80032c68
#8 [ffff810120095f48] kernel_thread at ffffffff8005dfc1
crash>

You can work around the issue by disabling the balloon driver (refer to the KB article: Disabling the balloon driver). I don't recommend doing that, since you will lose the memory optimization capability when the ESXi host comes under memory pressure.

To check your balloon driver version, run the following command:

strings /lib/modules/2.6.18-371.1.2.el5/misc/vmmemctl.ko | grep bora-vmsoft

You will get output similar to the following:

/build/mts/release/bora-1768286/bora-vmsoft/lib/kernelStubs/kernelStubsLinux.c

The number after "bora-" is the build number of the driver. It should be less than 1768286.
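If you have many virtual machines to check, the comparison can be scripted. The following is only a minimal sketch, not something from VMware: the function names (extract_build, classify_build, check_module) and the labels "ok", "suspect" and "unknown" are made up for illustration. It assumes the build number is the first "bora-<digits>" component in the strings output, and it treats any build that is not lower than 1768286 as "suspect", because this post only says the number should be lower. It cannot tell whether a later build contains a fix.

```shell
#!/bin/sh
# Build number quoted in this post; builds below it are treated as fine.
THRESHOLD_BUILD=1768286

# Read text on stdin and print the first "bora-<digits>" build number found.
extract_build() {
    sed -n 's/.*bora-\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# classify_build BUILD
# Prints "ok" (lower than the threshold), "suspect" (equal or higher),
# or "unknown" (empty or not a number). These labels are illustrative only.
classify_build() {
    case "$1" in
        ''|*[!0-9]*) echo "unknown" ;;
        *)
            if [ "$1" -lt "$THRESHOLD_BUILD" ]; then
                echo "ok"
            else
                echo "suspect"
            fi
            ;;
    esac
}

# check_module PATH_TO_VMMEMCTL_KO
# Runs the same strings/grep pipeline as above and reports the result.
check_module() {
    build=$(strings "$1" | grep bora-vmsoft | extract_build)
    echo "build=${build:-none} status=$(classify_build "$build")"
}

# Only run the check when a module path is given as the first argument.
if [ -n "${1:-}" ]; then
    check_module "$1"
fi
```

Saved as, say, check-vmmemctl.sh (a hypothetical name), it could be run as: sh check-vmmemctl.sh /lib/modules/$(uname -r)/misc/vmmemctl.ko. That assumes the module sits under misc/ for the running kernel, as it does in the example above; adjust the path if your distribution places it elsewhere.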

Chinese Version (English translation)

Argh! ESXi 5.5 has run into trouble again! If you run Red Hat, Debian or other Linux virtual machines on ESXi 5.5, it is best not to upgrade VMware Tools to the 5.5 version. That version has an unresolved bug that can cause Linux virtual machines to hang. The bug is related to the vmmemctl driver (the memory balloon driver).

A hung Linux virtual machine will show an error similar to this:

crash> bt
PID: 9709 TASK: ffff8100a0459080 CPU: 0 COMMAND: "vmmemctl"
#0 [ffff810120095b70] crash_kexec at ffffffff800b1509
#1 [ffff810120095c30] __die at ffffffff80065137
#2 [ffff810120095c70] do_page_fault at ffffffff80067430
#3 [ffff810120095d60] error_exit at ffffffff8005ddf9
[exception RIP: Balloon_QueryAndExecute+493]
RIP: ffffffff8820bd7d RSP: ffff810120095e10 RFLAGS: 00010297
RAX: 00000000ffffffff RBX: ffff81008627ff48 RCX: 0000000000000001
RDX: 000000000000006c RSI: 0000000000000202 RDI: ffffffff88216fc0
RBP: ffffffff88216fc0 R8: ffff810120094000 R9: 000000000000003c
R10: ffff81013fc14068 R11: 00002ae6787fedc8 R12: ffff81008627e000
R13: 0000000000000282 R14: ffff810122f71de8 R15: ffffffff800a3d4a
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff810120095e28] Balloon_GetStats at ffffffff8820ba32 [vmmemctl]
#5 [ffff810120095e58] Balloon_QueryAndExecute at ffffffff8820bbb8 [vmmemctl]
#6 [ffff810120095e68] OS_UnmapPage at ffffffff8820b716 [vmmemctl]
#7 [ffff810120095ee8] kthread at ffffffff80032c68
#8 [ffff810120095f48] kernel_thread at ffffffff8005dfc1
crash>

You can also work around the problem temporarily by disabling the balloon driver (for details, see the knowledge base article: Disabling the balloon driver). I don't recommend this, though, because your virtual machines will lose the memory optimization feature when the ESXi host runs short of memory.

Run the following command to check the vmmemctl version:

strings /lib/modules/2.6.18-371.1.2.el5/misc/vmmemctl.ko | grep bora-vmsoft

You will get output like this:

/build/mts/release/bora-1768286/bora-vmsoft/lib/kernelStubs/kernelStubsLinux.c

The number after "bora-" should be less than 1768286.