Oracle Utilizes 50% of Physical Processors on HPE Server

The DBA team told me Oracle was running slowly on an HPE server. I observed that CPU utilization was about 50% of overall capacity, and whenever the Oracle database load went up, the system slowed down.

Digging further into the issue, I saw that the Oracle workload ran on only one physical processor, although the server has two. The Windows Server 2012 R2 resource manager showed that the system used Processor Groups, with the two physical processors split into separate groups. This technology is described in a Microsoft MSDN article.

To fix the issue, change the value of “NUMA Group Size Optimization” to “Flat” in the BIOS. Please refer to the HPE article for detailed steps.

Details of the HPE server behavior are documented here. Note that the article says it affects ProLiant Gen9 servers with Intel E5-26xx v3 processors, but it actually also affects Intel E5-26xx v4 processors and Synergy blades.
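The arithmetic behind the 50% symptom can be sketched with a small shell function. The model is my own simplification, not taken from the HPE or Microsoft articles: a Windows processor group holds at most 64 logical processors; with “Clustered”, each NUMA node (assumed here to be one per socket) gets its own group(s); with “Flat”, logical processors are packed into as few groups as possible. A process that is not processor-group aware stays inside one group, so it can use roughly 1/groups of the machine. The function names are hypothetical.

```sh
#!/bin/sh
# Simplified model (assumption, not HPE's algorithm):
#   - one Windows processor group holds at most 64 logical processors
#   - "clustered": groups follow NUMA nodes, one node per socket
#   - "flat": logical processors are packed into as few groups as possible
# Usage: processor_groups SOCKETS CORES_PER_SOCKET THREADS_PER_CORE MODE
processor_groups() {
    sockets=$1 cores=$2 threads=$3 mode=$4
    total=$((sockets * cores * threads))
    case "$mode" in
        flat)
            # ceil(total / 64)
            echo $(( (total + 63) / 64 )) ;;
        clustered)
            per_node=$((cores * threads))
            # every NUMA node needs at least one group of its own
            echo $(( sockets * ((per_node + 63) / 64) )) ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

# Percentage of all logical processors a single-group (not group-aware)
# process can use, assuming the groups are of equal size.
single_group_share() {
    groups=$(processor_groups "$@") || return 1
    echo $((100 / groups))
}
```

For two 12-core sockets with Hyper-Threading (48 logical processors), `processor_groups 2 12 2 clustered` gives 2 groups and `single_group_share 2 12 2 clustered` gives 50, while `flat` gives a single group. Under this model, a host with more than 64 logical processors still needs two groups even with “Flat”, so the setting only helps when the whole machine fits in one group.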

HPE SPP2016.04 is released!

HPE SPP2016.04 has been released. I usually test the April release, since it has always been more stable than the October release or other earlier versions.


ESXi 5.5 and Emulex OneConnect 10Gb NIC


Suppose you run ESXi 5.5 on HP ProLiant BL460c G7 or Gen8 blades with Emulex-chipset NICs, using driver version 10.x.x.x. The host may randomly lose connectivity to vCenter Server, with its status shown as “Not Responding”, and you cannot ping any virtual machine hosted on the blade. After the problem occurs, a high pause-frame count is observed on the HP Virtual Connect module downlinks, and related errors appear in the vmkernel log.
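If you have many hosts to check, a small filter can flag NICs on the 10.x.x.x driver line described above. This sketch assumes you have already collected one `vmnic version` pair per line, for example by reading the driver version reported by `esxcli network nic get -n vmnicN` on each host; that input format and the function name are hypothetical. It only matches the version prefix and does not decide which 10.x build is safe.

```sh
#!/bin/sh
# Reads "vmnicN driver_version" pairs on stdin (hypothetical input format)
# and prints the NICs whose driver version starts with "10.".
flag_emulex_10x() {
    while read -r nic version; do
        [ -z "$nic" ] && continue          # skip blank lines
        case "$version" in
            10.*) echo "$nic $version: driver in the 10.x range" ;;
        esac
    done
}
```

Usage: `printf 'vmnic0 %s\n' "$ver" | flag_emulex_10x`, or pipe in a file with one line per NIC gathered from all hosts.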


Blue Screen with Bug Check 50 on ESXi 5.x

Some critical VMs hit a blue screen over the last few weeks. After working with the OS and hardware vendors, we eventually found the root cause: a CPU problem related to Intel v2 processors of the E3, E5, and E7 families. Details are documented in the VMware KB article “Windows 2008 R2 and Solaris 10 64-bit virtual machines blue screen or kernel panic when running on ESXi 5.x with an Intel E5 v2 series processor”.
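To find which hosts carry the CPU families mentioned above, a brand-string match is enough for a first pass. This sketch assumes you already have the processor brand string (on ESXi it should appear in the “Brand” field of `esxcli hardware cpu list`); the function name is hypothetical, and the pattern simply looks for an E3/E5/E7 model number followed by “v2” or “V2”. A match does not prove a host is affected; check the VMware KB for the actual conditions.

```sh
#!/bin/sh
# Succeeds when the CPU brand string names an Intel Xeon E3/E5/E7 v2 part,
# e.g. "... E5-2680 v2 @ ...". Letter suffixes such as W or L are allowed,
# and "v2" must not be followed by another digit.
is_affected_v2_cpu() {
    echo "$1" | grep -Eq 'E[357]-[0-9]{4}[A-Z]* [vV]2([^0-9]|$)'
}
```

Usage: `is_affected_v2_cpu "$brand" && echo "$host: review VMware KB"` inside whatever loop you use to collect brand strings.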


Firmware upgrading is pending on install in progress on HP ESXi 5.5 blade by HP SUM

In my post “HP Blade Firmware Upgrading Best Practices for ESXi Host” I mentioned that HP releases firmware and drivers as SPP images. I have set my ESXi 5.5 baseline to SPP2014.06 because I tested it in my lab environment, and it looks stable.


How to get HP ProLiant blade server and enclosure information

An enterprise infrastructure administrator needs to run plenty of reports on firmware, software versions, or other infrastructure data in day-to-day operations. Some vendors provide powerful tools to pull data out of their solutions, but what if you don’t have such tools? Collecting data manually is painful, especially for a large number of servers. I’m going to share my trick with you. I’ll use the HP ProLiant blade system as an example, as it’s a very common case in enterprise datacenters.
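As a generic starting point (not necessarily the trick the full post describes), a shell loop can save OA CLI output for every enclosure into one text file per enclosure. This sketch assumes SSH access to each active OA with key-based login, that the OA accepts a command on the ssh command line, and that `show enclosure info` and `show server info all` give the data you need. The list file, the default account name, and the `SSH_CMD` override (useful for dry runs) are hypothetical.

```sh
#!/bin/sh
# collect_oa_info LIST_FILE [USER] [OUTDIR]
# LIST_FILE: one OA host name per line; blank lines and "#" lines are skipped.
collect_oa_info() {
    list=$1 user=${2:-Administrator} outdir=${3:-./oa-reports}
    mkdir -p "$outdir" || return 1
    while read -r oa; do
        case "$oa" in ''|'#'*) continue ;; esac
        for cmd in "show enclosure info" "show server info all"; do
            echo "### $cmd"
            # </dev/null stops ssh from swallowing the rest of LIST_FILE
            ${SSH_CMD:-ssh} "$user@$oa" "$cmd" </dev/null
        done > "$outdir/$oa.txt" 2>&1
    done < "$list"
}
```

Usage: `collect_oa_info enclosures.txt Administrator ./oa-reports`, then grep the resulting files for the fields you report on.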


HP Blade Firmware Upgrading Best Practices for ESXi Host

I discussed this topic with a group. Some people think a firmware upgrade is not required if the ESXi host is working fine. That may suit a small business, but I think an enterprise can do better.

My ESXi hosts run on HP blades, so I’ll use that platform as an example to share my thoughts and experience.

Why do you need a plan for upgrading HP blade firmware on ESXi hosts?

The first thing I hear in my head is “We suggest you upgrade the firmware to the latest version.” You may have had the same experience when calling HP for help; it sounds like HP’s official statement whenever a problem is suspected to be hardware related. 😉 You know how hard it is to upgrade a large number of ESXi hosts just to troubleshoot a network or storage problem, and if your hosts are running older versions, it can be extremely time consuming. Keeping firmware up to date saves troubleshooting time and makes your life easier. 🙂

Even when there is no hardware issue, you may still need to upgrade software. It is rare, but some software may conflict with old firmware, and in that scenario you should plan for significant downtime if you have to upgrade firmware on a server that is running an older version.

A reboot is required for most firmware upgrades.

HP blade firmware upgrading tools for ESXi host

HP’s statement has a point: their firmware has a lifecycle, and the official HP policy is to support updating only to a new version that is two versions newer than the currently installed version.

HP has recently been replacing its old firmware tools with the HP Service Pack for ProLiant (SPP). SPP is an all-in-one image file that includes firmware, drivers, and management tools for ProLiant servers. Thanks, HP: upgrading the old way was pretty confusing, and now it’s easy to know exactly which firmware level your servers are on.

You can upgrade an ESXi host in the two ways below; online upgrading is recommended. Refer to “HP ProLiant Gen8 and later Servers – Understanding the Differences between Online and Offline Modes in HP SUM”.

Online upgrading – ESXi 5.x is the first version to support online firmware upgrading, which is a real benefit for production ESXi hosts. On the other hand, SPP doesn’t support online upgrading for every component on an ESXi host (power management, for example), and you have to install the HP customized ESXi image to use online upgrading.

Offline upgrading – the conventional method for all operating systems; about 30 minutes of downtime is required for each blade.

Click here for more details on SPP.

Best practices for HP blade firmware upgrading

This is the process I use now; it may give you some ideas on how to plan firmware upgrades for ESXi hosts.

Before implementing firmware upgrades

Ensure HBA firmware is supported by storage vendor.
Ensure NIC firmware is supported by OS and switch.
Check the VMware Compatibility Guide.
Create SPP server.
You may have multiple datacenters in different locations. Prepare a server at each location to store the SPP image; loading the image from a local server reduces load time.
Create firmware baselines.
To keep ESXi host firmware up to date, I suggest creating a baseline: every ESXi host must be upgraded to exactly the same firmware defined by the baseline. An enterprise datacenter may have thousands of ESXi hosts, and unified firmware makes the environment more stable. Troubleshooting also becomes more efficient, since you can identify hardware issues quickly.
Create rollback plans.
HP firmware can be force-rolled back, but this is not 100% successful, so prepare alternatives such as vendor support after a failed upgrade, data recovery from tape, etc.
Create update plan.
Which SPP will you use?
Which ESXi version goes with the baseline?
How will you upgrade the ESXi hosts?
Create testing environment.
I recommend testing if you want the whole upgrade to go smoothly. At least run the upgrade on one ESXi host, keep it running for 72 hours, and monitor the vmkernel log for any issues.
Generate firmware report.
A firmware report is required to understand the whole picture.
You can generate the reports with the HP SUM (Smart Update Manager) version bundled in the SPP image, or download SUM from the HP website and run it on a server. The bundled version has problems generating reports for some blade models, so the latest version is preferred.
Identify hotfixes and critical advisories.
Reading the SPP release notes and HP Customer Advisories (CAs) to understand known issues and workarounds will make your IT life beautiful. 🙂
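Once you have a firmware report, the baseline rule (every host on exactly the same firmware) can be checked mechanically. This sketch compares a report against a baseline and prints every host/component that deviates. Both CSV layouts (`component,version` for the baseline, `host,component,version` for the report) and the function name are hypothetical; HP SUM report exports look different, so adapt the field numbers to what your SUM version produces. It compares versions as plain strings, so “1.0” and “1.00” count as different.

```sh
#!/bin/sh
# check_baseline BASELINE_CSV REPORT_CSV
#   BASELINE_CSV: component,version         (hypothetical layout)
#   REPORT_CSV:   host,component,version    (hypothetical layout)
# Prints one line per host/component that does not match the baseline.
check_baseline() {
    awk -F',' '
        NR == FNR { want[$1] = $2; next }       # first file: the baseline
        !($2 in want) { print $1 ": " $2 " not in baseline"; next }
        $3 != want[$2] { print $1 ": " $2 " is " $3 ", baseline " want[$2] }
    ' "$1" "$2"
}
```

Usage: `check_baseline baseline.csv report.csv`; empty output means every listed component matches the baseline.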

Pre-checks before upgrading OA/VC

HP blades are installed in an enclosure, are managed by the enclosure’s Onboard Administrator (OA), and connect to the network and storage through Virtual Connect modules (VCM). Blade firmware must also be compatible with the OA and VCM firmware versions.

Before upgrading, spend some time verifying the enclosure’s health and versions with the following steps.

Perform a health check on the VC modules by Virtual Connect Support Utility.
If OA firmware is 1.x, it must be updated to 2.32 before updating to newer versions.
If the target VC firmware is later than 3.00, OA must be upgraded to 3.00 first.
Run the HP Virtual Connect Pre 3.30 Analyzer if the VC version is 3.x and you are upgrading to 3.30.
Make sure that the VC modules are set up in a redundant configuration. Stacking links should be configured.
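For many enclosures, the version rules in the list above can be turned into a small checker. This is a sketch with my own interpretation of two rules: “If VC firmware is greater than 3.00” is read as the target VC version, and the Pre 3.30 Analyzer rule is read as applying when the current VC is a 3.x release below 3.30 and the target is 3.30 or later. Versions must be written the way HP prints them (two-digit minor, e.g. 3.30, 4.01), because the comparison is numeric per dotted field. Function names are hypothetical.

```sh
#!/bin/sh
# ver_lt A B: succeed (exit 0) when dotted version A is lower than B,
# comparing each dot-separated field numerically (missing fields count as 0).
ver_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1    # equal versions are not lower
    }'
}

# oa_vc_prechecks OA_NOW VC_NOW VC_TARGET
# Prints the pre-check steps that apply to one enclosure.
oa_vc_prechecks() {
    oa=$1 vc_now=$2 vc_target=$3
    echo "Check VC module health with the Virtual Connect Support Utility"
    case "$oa" in
        1.*) echo "Update OA to 2.32 before any newer OA version"
             oa=2.32 ;;   # treat the 2.32 step as done for the next rule
    esac
    # Assumed reading: the target VC version is later than 3.00.
    if ver_lt 3.00 "$vc_target" && ver_lt "$oa" 3.00; then
        echo "Update OA to 3.00 before moving VC past 3.00"
    fi
    # Assumed reading: current VC is 3.x below 3.30, target is 3.30 or later.
    case "$vc_now" in
        3.*) if ver_lt "$vc_now" 3.30 && ! ver_lt "$vc_target" 3.30; then
                 echo "Run the HP Virtual Connect Pre 3.30 Analyzer"
             fi ;;
    esac
    echo "Confirm the VC modules are redundant and stacking links are configured"
}
```

Usage: `oa_vc_prechecks 3.70 3.15 3.30` prints the health check, the Pre 3.30 Analyzer step, and the redundancy check, but no OA step, because OA is already past 3.00.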

You also need to make sure the blade drivers are updated from the same SPP image before upgrading.

Firmware upgrading

As mentioned above, blade firmware must be compatible with the OA/VCM firmware, so the upgrade sequence is very important: a blade may lose communication with the OA/VCM if you upgrade in the wrong sequence.

If VC is earlier than 1.34:
Sequence is VC -> OA -> Blade.
If VC is 1.34 or later:
Online mode sequence is OA -> Blade -> VC. (This applies to firmware upgrades from the SPP image.)
Offline mode sequence is OA -> VC -> Blade. (This applies to upgrades from the CLI or in offline mode.)
Insert the SPP image via iLO. (You can also extract the image to the local disk of the target server if it runs Windows.)
Boot from the CD-ROM if you mounted the image via iLO.
I recommend selecting Interactive Mode if this is the first time you are doing it for a particular hardware specification.
Follow the wizard to the review stage.
Make sure all hardware is listed in the update list.
Reboot after the upgrade completes.
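The sequence rule at the top of the list above depends only on the current VC version and the upgrade mode, so it is easy to encode when you plan many enclosures. A minimal sketch; the function names are hypothetical, and the VC version must use HP’s two-digit minor format (e.g. 1.34, 3.60).

```sh
#!/bin/sh
# ver_lt A B: succeed when dotted version A is lower than B (numeric fields).
ver_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1
    }'
}

# upgrade_sequence VC_VERSION MODE   (MODE: online | offline)
upgrade_sequence() {
    if ver_lt "$1" 1.34; then
        echo "VC -> OA -> Blade"          # VC earlier than 1.34
    elif [ "$2" = online ]; then
        echo "OA -> Blade -> VC"          # SPP image, online mode
    else
        echo "OA -> VC -> Blade"          # CLI or offline mode
    fi
}
```

Usage: `upgrade_sequence 3.60 online` prints `OA -> Blade -> VC`.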

Note: If your blade firmware/drivers are from SPP2013.02 or earlier (this version included), you must first upgrade VC to 4.01 or later, and then upgrade the blades.
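The note above compares an SPP release against SPP2013.02 and a VC version against 4.01. A minimal sketch of that check, assuming SPP names in the `SPPYYYY.MM` form used in this post; the function names are hypothetical.

```sh
#!/bin/sh
# ver_lt A B: succeed when dotted version A is lower than B (numeric fields).
ver_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1
    }'
}

# "SPP2013.02" -> 201302, so SPP releases compare as plain integers.
spp_num() {
    echo "$1" | sed -e 's/^SPP//' -e 's/\.//'
}

# vc_first_check BLADE_SPP VC_NOW
vc_first_check() {
    # SPP2013.02 itself is included, hence -le
    if [ "$(spp_num "$1")" -le "$(spp_num SPP2013.02)" ] && ver_lt "$2" 4.01; then
        echo "Upgrade VC to 4.01 or later before the blades"
    else
        echo "No VC prerequisite from the SPP2013.02 note"
    fi
}
```

Usage: `vc_first_check SPP2012.10 3.70` reports that VC must go to 4.01 or later first.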

Those are the best practices I use; please let me know if you have a better idea.

Windows cannot be installed on drive 0 partition 1

I think Windows Server 2012 will be the next popular server OS, just like Windows Server 2008, and it’s also a nice hypervisor OS in the virtual world. What do you think?

Installation is the first step to experiencing this wonderful OS, and you may see some strange problems during that step, just like I did. Today’s topic happened a long time ago; I just want to share it with people who may face a similar issue.

It happened on an HP blade system with local disks attached, but you may see a similar problem on other vendors’ hardware. When you select the disk to install the OS on, the installer may say “Windows can’t be installed on drive 0 partition 1” or “Windows cannot be installed to this disk. This computer’s hardware may not support booting to this disk. Ensure that the disk’s controller is enabled in the computer’s BIOS menu.”

That’s because the boot volume is not set on the array controller. On HP servers, for example, reboot and press F8 after the BIOS checks the array controller to enter the array controller management interface. Then go to Select Boot Volume in the main menu, select Direct Attached Storage, and select the disk you want to install the OS on. Follow the wizard to continue booting.

If the problem persists, go back to the array controller management interface, rebuild the array, and select the boot volume again; that should fix the problem.

All paths lost on HBA port

HP is a great company. I like the hardware design of HP ProLiant servers; they make datacenter maintenance and operation pretty easy. Do you like them? Today I’ll introduce a storage issue on HP ProLiant BL460 and BL480 blades. The issue happened on QLogic HBAs with VC-FC modules. I have two dual-port QLogic HBAs in each ESXi 5.x host, and one port of each HBA is zoned together on the SAN switch.

For example, vmhba1 and vmhba3 are zoned for LUN allocation, and each LUN has two paths on each HBA port.

I observed that all LUNs sometimes disappeared from a random HBA port. It doesn’t happen very often, but if a storage outage occurs while the LUNs are missing, it can lead to ALL VMs DEAD!!! The problem becomes more frequent as your virtual infrastructure grows.

These are the symptoms while the issue is happening:

If you log in to the SSH console and check the HBA status with:

less /proc/scsi/qla2xxx/[Device ID]

you will find the following differences between the two HBA ports:

See? All targets show Offline status on the problem HBA:

scsi-qla3-target-0=500a09859d812da0:030098:1000:<Offline>
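On a host with many LUNs, paging through the whole file with less is slow. The sketch below counts target lines and how many of them end in `<Offline>`. It assumes only the line shape shown above (`scsi-qlaN-target-M=...`) and makes no assumption about how online targets are printed; the function names are hypothetical.

```sh
#!/bin/sh
# list_offline_targets FILE: print the target lines marked <Offline>.
list_offline_targets() {
    grep '^[[:space:]]*scsi-qla[0-9]*-target-[0-9]*=.*<Offline>' "$1"
}

# target_summary FILE: "N of M targets offline" for one HBA port.
target_summary() {
    total=$(grep -c '^[[:space:]]*scsi-qla[0-9]*-target-[0-9]*=' "$1")
    offline=$(grep -c '^[[:space:]]*scsi-qla[0-9]*-target-[0-9]*=.*<Offline>' "$1")
    echo "$offline of $total targets offline"
}
```

Usage: run `target_summary` on each entry under `/proc/scsi/qla2xxx/`; a port where every target is offline is the one to reset.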

You have two options to fix it:

Reseat the blade. This requires downtime and local (on-site) resources.
Reset the HBA with the following steps:

Record the device ID (use it in place of adapter_id below), and force the HBA to rescan:

echo "scsi-qlascan" > /proc/scsi/qla2xxx/adapter_id

Wait a few seconds, then force a LIP login:

echo "scsi-qlalip" > /proc/scsi/qla2xxx/adapter_id

Wait a few minutes, and the LUNs come back online. 🙂 Refer to KB 1031199 for more details.
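The two echo commands with a wait in between can be wrapped in one function so the steps are not mistyped under pressure. This is a sketch: the adapter number is the device ID recorded earlier, the 10-second default for `SCAN_WAIT` is a placeholder for “a few seconds”, and `QLA_PROC` (hypothetical) lets you point the function at a scratch directory for a dry run. Run it on the host as root, only against the adapter whose targets are offline.

```sh
#!/bin/sh
# reset_qla_port ADAPTER: rescan, wait, then force a LIP login on one port.
reset_qla_port() {
    adapter=$1
    node="${QLA_PROC:-/proc/scsi/qla2xxx}/$adapter"
    [ -e "$node" ] || { echo "no such adapter entry: $node" >&2; return 1; }
    echo "scsi-qlascan" > "$node" || return 1   # force the HBA to rescan
    sleep "${SCAN_WAIT:-10}"                    # "wait a few seconds"
    echo "scsi-qlalip" > "$node" || return 1    # force a LIP login
    echo "LIP issued on $node; allow a few minutes, then re-check the targets"
}
```

Usage: `reset_qla_port 3` for the port backed by `/proc/scsi/qla2xxx/3`, then check the target status again after a few minutes.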

This is a temporary remediation; the problem will recur. I’ll show you a permanent solution in my next post.