I used to see memory degrading on Cisco UCS blades. But less see on HPE blades. I thought it maybe quality control problem of Cisco manufacture. Today I read two articles in Cisco website, it explains why we see memory degrading and how it works. I attached the articles below.
Managing Correctable Memory Errors on Cisco UCS Servers
UCS Enhanced Memory Error Management
The conduction in the whitepaper is not only specific for Cisco UCS, but also for any modern servers. Following is summary of why memory errors rates is going high nowadays.
- Larger memory systems contain more bits
- Higher capacity DRAM chips require smaller bit cells which result in fewer stored charges per bit
- Lower operating voltages can lead to reduced noise margin
- Higher operating speeds can lead to reduced timing margin
New B200 M4 blades can running on Intel v4 processors. You may see discovery issue if your UCSM firmware version lower than 2.2.7c. I hit that problem few days ago when I install a new M4 blade. The FSM hung on 58% a real long time and failed eventually.
Cisco UCS blade system is the best blade system I used so far. Whatever the hardware, software or support is perfect. I recommend leverage the system for primary system of virtualization. UCS blade system architecture is different with HP. I feel it more likes a network system. Fabric Interconnect (FI) modules exchange data between uplinks and internal components. IOMs on each chassis controls data routing. Architecture is complicate, but it’s powerful to manage large datacenter. Talking about large datacenter, you may have hundred chassis or blades. Data goes through FIs, IOMs and blades, you could see issues on any layer. It’s hard to find out where exactly the problem is. UCS Manager provides statistics for ports just like how Cisco does on network switches. You can show statistics of a particular port. But it doesn’t tell you when and which layer it happened. I tested Cisco UCS adapter for vRealize Operation Manager before I reviewed NetApp adapter for vRealize Operation Manager. It’s developed by same company Blue Medora. I’d like to introduce few of this product, it’s just my personal review.
I noticed UCS Manager got unexpected failover after we upgraded firmware to 2.2(2c). Looks like it hits a bug CSCuo11700. Firmware should be upgraded to 2.2.(3a) to fix the issue.
I just found a Cisco KB descripts a firmware issue may impact to ESXi FC storage performance. Please have a look whether your Cisco UCS system firmware is 2.1(2a), 2.1(2c), or 2.1(2d).
Whatever you configure on MDS, whatever you configure on Cisco UCS FIs, whatever you do for port channel on both side, the Cisco UCS uplink ports always down with error message Initilize failed, or Error disabled.
Congratulation! your device hit MDS firmware bug…https://tools.cisco.com/bugsearch/bug/CSCtr01652/?reffering_site=dumpcr.
Few weeks ago, I tried to standardize networking of a cluster, there were 4 VLANs for production virtual machines, I binded the VLANs on one virtual switch which had 4 physical vmnic.
Then I created 4 port groups with different VLAN ID, but for some reason virtual machines unreachable via some vmnics. Network team verified port channel was good.
I tried on several ESXi 5.0 hosts in the cluster, all had same problem, finally we found that’s a Cisco switch bug….you could find detail information and work around here.