You may see the following error in Cisco UCS Manager:
default Keyring’s certificate is invalid
The reason is that the certificate of the default keyring (Admin -> Key Management -> KeyRing default) has expired. It’s not possible to delete or change this keyring in the GUI. You have to log in to Cisco UCS Manager over SSH and run the following commands (type only the text after “#”):
lab-B# scope security
lab-B /security # scope keyring default
lab-B /security/keyring # set regenerate yes
lab-B /security/keyring* # commit-buffer
lab-B /security/keyring #
This disconnects the Cisco UCS Manager GUI on your client computer. Just refresh the page after about 5 seconds. There is no impact on the blades.
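To make the point about typing only the text after “#” concrete, here is a minimal Python sketch that pulls the commands out of a pasted session transcript. The function name commands_from_transcript is made up for this illustration, and the sketch assumes every prompt ends with “#” as in the session above.

```python
def commands_from_transcript(lines):
    """Return the commands typed in a UCS Manager CLI transcript.

    Assumes every prompt ends with '#', as in the session above.
    Lines with nothing after '#' (an empty prompt) are skipped.
    """
    commands = []
    for line in lines:
        # Split at the first '#': prompt on the left, command on the right.
        _prompt, sep, command = line.partition("#")
        command = command.strip()
        if sep and command:
            commands.append(command)
    return commands


transcript = [
    "lab-B# scope security",
    "lab-B /security # scope keyring default",
    "lab-B /security/keyring # set regenerate yes",
    "lab-B /security/keyring* # commit-buffer",
    "lab-B /security/keyring #",
]

for command in commands_from_transcript(transcript):
    print(command)
```

The printed lines are what you would type, one per line, in the SSH session.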
A few days ago, I deleted some older firmware packages in Cisco UCS Manager. Suddenly more than 100 warnings were generated. The error messages are similar to the one below:
blade-controller image with vendor Cisco System Inc……is deleted
Clearly, the warnings were triggered by the package deletion. But all of my service profiles and service profile templates were using firmware packages that still existed. The deleted packages were not used anywhere.
I also deleted the download tasks and cleaned up everything I could. The warnings still persisted. After reading a blog article, I figured out that they were caused by the default host firmware package.
If you are facing the same issue, go to Servers -> Policies -> Host Firmware Packages -> default, click Modify Package Versions, and change it to an available version.
You may see “The IP address to reach the server is not set” when opening the KVM console in Cisco UCS Manager. The issue persists even when Cisco UCS Manager has enough IP addresses available for management. Re-acknowledging the server or resetting the CIMC does not fix the problem.
The fix is to go to “Equipment” -> select the server -> “General” tab -> “Server Maintenance” -> “Decommission” the server.
Wait for the decommission to complete, then re-acknowledge the server. An IP address will be assigned to the server once the acknowledgement completes.
Older versions of UCS Manager use a Java application. The UI fonts can be extremely small on a high-DPI screen. The fix is:
- Go to “C:\Program Files (x86)\Java\jre1.8.0_171\bin”.
- Go to “Properties” of “jp2launcher.exe”.
- “Compatibility” tab -> “Change high DPI settings”.
- Check “Override high DPI scaling behavior….”.
- Select “System (Enhanced)” or “System“.
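If you need to apply the same setting on several machines, the GUI steps above can probably be scripted. The Python sketch below rests on two assumptions that are not from the steps above: that the Compatibility tab stores this per-user setting under the AppCompatFlags\Layers registry key, and that the layer strings shown match the two GUI choices. Please verify both on a test machine first. The function names are hypothetical.

```python
import sys

# Assumption: per-user registry key where the Compatibility tab keeps its settings.
LAYERS_KEY = r"Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers"

# Assumption: layer strings that correspond to the two GUI choices.
LAYER_VALUES = {
    "System": "~ DPIUNAWARE",
    "System (Enhanced)": "~ GDIDPISCALING DPIUNAWARE",
}


def layer_value(mode):
    """Return the registry string for a GUI choice; raises KeyError if unknown."""
    return LAYER_VALUES[mode]


def apply_dpi_override(exe_path, mode="System (Enhanced)"):
    """Write the DPI override for exe_path into the current user's registry."""
    if sys.platform != "win32":
        raise OSError("This setting only exists on Windows")
    import winreg  # Windows-only module, imported lazily on purpose

    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, LAYERS_KEY) as key:
        # The value name is the full path of the executable.
        winreg.SetValueEx(key, exe_path, 0, winreg.REG_SZ, layer_value(mode))


# Example call, using the path from the steps above (not run automatically):
# apply_dpi_override(r"C:\Program Files (x86)\Java\jre1.8.0_171\bin\jp2launcher.exe")
```

The JRE folder name depends on the Java version you have installed, so adjust the path before calling apply_dpi_override.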
I often saw memory degradation on Cisco UCS blades, but rarely on HPE blades. I thought it might be a quality control problem in Cisco manufacturing. Today I read two articles on the Cisco website that explain why we see memory degradation and how the error handling works. The articles are listed below.
Managing Correctable Memory Errors on Cisco UCS Servers
UCS Enhanced Memory Error Management
The conclusion in the whitepaper is not specific to Cisco UCS; it applies to any modern server. The following is a summary of why memory error rates are rising nowadays.
- Larger memory systems contain more bits
- Higher capacity DRAM chips require smaller bit cells which result in fewer stored charges per bit
- Lower operating voltages can lead to reduced noise margin
- Higher operating speeds can lead to reduced timing margin
New B200 M4 blades can run Intel v4 processors. You may see a discovery issue if your UCSM firmware version is lower than 2.2(7c). I hit that problem a few days ago when I installed a new M4 blade. The FSM hung at 58% for a really long time and eventually failed.
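Before installing such a blade, it may help to compare your firmware version against that minimum. The following Python sketch parses version strings written either as 2.2(7c) or as 2.2.7c and compares them. The function names and the ordering rule (numbers first, then the patch letter alphabetically) are my own assumptions for illustration, not a Cisco tool.

```python
import re

# Accepts both "2.2(7c)" and "2.2.7c" style strings.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)[(.](\d+)([a-z]*)\)?$")

# Minimum UCSM version mentioned above for B200 M4 blades with v4 processors.
MIN_VERSION_FOR_V4 = "2.2(7c)"


def parse_ucsm_version(text):
    """Turn '2.2(7c)' into a comparable tuple: (2, 2, 7, 'c')."""
    match = _VERSION_RE.match(text.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognized UCSM version: {text!r}")
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance), patch)


def is_at_least(version, minimum=MIN_VERSION_FOR_V4):
    """Return True if version is the same as or newer than minimum."""
    return parse_ucsm_version(version) >= parse_ucsm_version(minimum)


for candidate in ["2.2(6e)", "2.2(7c)", "3.1(1e)"]:
    status = "OK" if is_at_least(candidate) else "upgrade UCSM first"
    print(candidate, "->", status)
```

Tuples compare element by element, so 2.2(7b) sorts below 2.2(7c) and any 3.x release sorts above both.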
Cisco UCS is the best blade system I have used so far. The hardware, software and support are all excellent. I recommend using it as the primary platform for virtualization. The UCS blade architecture is different from HP’s; to me it feels more like a network system. Fabric Interconnect (FI) modules exchange data between the uplinks and the internal components, and the IOMs in each chassis control data routing. The architecture is complicated, but it is powerful for managing a large datacenter. In a large datacenter you may have hundreds of chassis and blades. Data passes through FIs, IOMs and blades, and issues can occur at any layer, so it’s hard to find out exactly where a problem is. UCS Manager provides port statistics just like Cisco network switches do, and you can show the statistics of a particular port, but it doesn’t tell you when or at which layer a problem happened. I tested the Cisco UCS adapter for vRealize Operations Manager before I reviewed the NetApp adapter for vRealize Operations Manager. Both are developed by the same company, Blue Medora. I’d like to introduce a few features of this product; this is just my personal review.
I noticed that UCS Manager had an unexpected failover after we upgraded the firmware to 2.2(2c). It looks like it hit bug CSCuo11700. The firmware should be upgraded to 2.2(3a) to fix the issue.
I just found a Cisco KB article that describes a firmware issue that may impact ESXi FC storage performance. Please check whether your Cisco UCS system firmware is 2.1(2a), 2.1(2c), or 2.1(2d).
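If you keep a firmware inventory somewhere, the check can be as simple as the Python sketch below. The component names and the inventory dictionary are hypothetical sample data; only the three affected versions come from the KB mentioned above.

```python
# The three firmware versions named above.
AFFECTED_VERSIONS = {"2.1(2a)", "2.1(2c)", "2.1(2d)"}


def affected_components(inventory):
    """Return the names of components running an affected firmware version.

    inventory maps a component name to its firmware version string.
    """
    return sorted(
        name
        for name, version in inventory.items()
        if version.strip() in AFFECTED_VERSIONS
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "fabric-interconnect-A": "2.1(2a)",
    "chassis-1/blade-1": "2.1(3a)",
    "chassis-1/blade-2": "2.1(2d)",
}

for name in affected_components(inventory):
    print(name, "runs an affected firmware version")
```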
Whatever you configure on the MDS, whatever you configure on the Cisco UCS FIs, and whatever you do with the port channel on both sides, the Cisco UCS uplink ports are always down with the error message Initialize failed, or Error disabled.
Congratulations! Your device has hit an MDS firmware bug: https://tools.cisco.com/bugsearch/bug/CSCtr01652/?reffering_site=dumpcr
Your vHBAs or other PCI devices may stop responding in ESXi 5.x when the Interrupt Remapping feature is in use.
This issue only affects UCS blade BIOS version 1.4(3c); it has been fixed in 1.4(3j).
Please refer to http://kb.vmware.com/kb/1030265 to see how to disable the Interrupt Remapping feature in ESXi 5.x.
Also refer to https://tools.cisco.com/bugsearch/bug/CSCty96722.