Blue Screen with Bug Check 50 on ESXi 5.x

Some of our critical VMs hit a blue screen over the last few weeks. After working with the OS and hardware vendors, we eventually found the root cause: a CPU problem affecting Intel v2 CPUs in the E3, E5 and E7 families. The details are documented in the VMware KB article "Windows 2008 R2 and Solaris 10 64-bit virtual machines blue screen or kernel panic when running on ESXi 5.x with an Intel E5 v2 series processor".

I would like to add a few comments on this issue. On HP ProLiant servers, it appears to affect Generation 8 and later, and it is fixed in SPP2014.02 or later. Blade server and PC server BIOS firmware should be updated to version 10/02/2014 or later. I recommend upgrading to SPP2014.02 or later for all components, not just the BIOS. No additional action is required after the firmware upgrade.

On Cisco B-series servers, the issue is also documented in the bug "VMs running on Intel E5 v2 CPUs may experience unexpected reloads". It is fixed in firmware version 2.2(2c), but that is not a Cisco suggested release, so I would wait for the next suggested release that includes the fix.

For hardware from other vendors who haven't released a fix yet, you have to upgrade to ESXi 5.1U2 or use the software MMU. You may be concerned about performance degradation with the software MMU; here is a white paper with a performance evaluation.
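To make the software MMU workaround a bit more concrete, here is a minimal sketch in Python. It assumes the per-VM setting is the .vmx option monitor.virtual_mmu with the value "software"; please confirm the exact option against the VMware KB before relying on it. The helper name with_software_mmu is my own and only rewrites text that you pass in, it does not touch a host or a datastore.

```python
import re

# Assumption: the per-VM option is the .vmx key monitor.virtual_mmu,
# and the value "software" forces the software MMU.
MMU_KEY = "monitor.virtual_mmu"


def with_software_mmu(vmx_text: str) -> str:
    """Return the .vmx text with monitor.virtual_mmu set to "software"."""
    wanted = f'{MMU_KEY} = "software"'
    pattern = re.compile(rf'^\s*{re.escape(MMU_KEY)}\s*=.*$', re.IGNORECASE)
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        if pattern.match(line):
            # Replace the first existing entry and drop any duplicates.
            if not replaced:
                out.append(wanted)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        # The key was not present, so append it.
        out.append(wanted)
    return "\n".join(out) + "\n"
```

The same setting should also be reachable in the vSphere Client through the VM's CPU/MMU Virtualization option. Either way, change it only while the VM is powered off, and keep a copy of the original .vmx file.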

Please also check my post "HP Blade Firmware Upgrading Best Practices for ESXi Host" before you update firmware on HP blade systems.

 

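Over the past few weeks, some of our more important virtual machines hit a blue screen. Working with the operating system and hardware vendors, we eventually found the cause. It is a problem with the v2 CPUs of the Intel E3, E5 and E7 series. For details, see the VMware KB article "Windows 2008 R2 and Solaris 10 64-bit virtual machines blue screen or kernel panic when running on ESXi 5.x with an Intel E5 v2 series processor".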

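I would like to say a little more about this issue. On HP ProLiant servers, it only affects Generation 8 and later servers, and it is fixed in SPP2014.02. Blade server and PC server BIOS firmware needs to be updated to version 2014.10.02 or later. I recommend upgrading all components, not just the BIOS, to SPP2014.02 or later; the issue is resolved once the upgrade is done. On Cisco B-series servers, the issue is also recorded in "VMs running on Intel E5 v2 CPUs may experience unexpected reloads". The fault is fixed in firmware 2.2(2c), but that firmware version is not a Cisco suggested version. I suggest waiting for Cisco to publish the next suggested version before upgrading.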

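For other hardware vendors that have not released a patch yet, you can fix the issue by upgrading ESXi to 5.5U2, or by using the software MMU. The software MMU may have some impact on performance, though; see here for details.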

Before you upgrade an HP blade system, please see my post "HP Blade Firmware Upgrading Best Practices for ESXi Host".