Category: English

English version of my posts.

  • [Quick Note] Failed to install pywinrm on CentOS 8

    You may see the error message “Running setup.py install for pykerberos … error” when installing pywinrm on CentOS 8. The accompanying errors are “unable to execute 'gcc': No such file or directory” and “command 'gcc' failed with exit status 1”.

    The reason is that gcc is missing on the machine. Run the following command to install gcc, then try installing pywinrm again.

    # yum install gcc
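
    If you script the installation, a small pre-flight check can catch the missing compiler before pip fails. This is only a sketch in Python and not part of the original fix: the function name has_compiler is made up for illustration, and it only checks that an executable called gcc can be found on the PATH.

```python
import shutil


def has_compiler(name="gcc"):
    """Return True if an executable with this name is found on the PATH."""
    return shutil.which(name) is not None


if not has_compiler():
    print("gcc not found; run 'yum install gcc' before installing pywinrm")
```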

  • Balanced Memory Configuration for ESXi Servers

    Background

    ESXi servers are fundamental infrastructure. The underlying hardware performance has a butterfly effect on the virtual machines and applications in the upper layers. Since the virtualization layer adds less than 10% performance overhead, it’s valuable to get full performance out of the hardware layer. Memory is a big player in hardware performance tuning, and a balanced memory configuration is very important. I wrote an article, “LRDIMM or RDIMM on ESXi hosts?”, a long time ago, which reflected my understanding in 2016. I did some more research recently, and I hope the following study helps with your hardware decisions. This study is based on Intel Xeon 2nd Generation Scalable Processors.

    1. Basic Concepts

    Before we talk about balanced and optimized memory, let’s take a look at some basic concepts used in this article.

    1.1 Memory Channel

    Memory channels carry the read and write traffic between the CPU and the memory modules. Think of them as the lanes of a highway connecting the CPU and memory. There are 6 memory channels on Intel Xeon 2nd Generation Scalable processors. This is different from Intel Xeon E7, which only has 4 memory channels. If you are an E7 user moving to Scalable, note that a memory size that is balanced on E7 is not balanced on Scalable processors.

    1.2 DIMM

    DIMM stands for Dual In-line Memory Module. It’s a small printed circuit board with multiple memory chips mounted on it. In everyday use we call the whole module a DIMM. A DIMM is installed in a DIMM slot on the server’s motherboard. Each memory channel has two DIMM slots on Intel Xeon 2nd Generation Scalable Processors.

    1.3 DIMM Type

    The two major types are RDIMM and LRDIMM. The main difference is that LRDIMM has an additional buffer on the DIMM, which makes LRDIMM slower than RDIMM. In return, LRDIMM supports larger sizes per DIMM, such as 128 GB or higher. RDIMM is usually 16 GB, 32 GB, or 64 GB.

    1.4 Rank

    A rank is one set of memory chips that are accessed together while writing to or reading from the memory. A small DIMM may need only a single rank; larger DIMMs need more ranks. Refer to this article to learn more about rank. According to hardware vendors, dual-rank DIMMs perform better than quad-rank DIMMs.

    1.5 Interleave

    Memory interleaving allows a CPU to efficiently spread memory accesses across multiple DIMMs. All DIMMs should be in one interleaved set, creating a single uniform memory region that is spread across as many DIMMs as possible. If DIMMs of different types or sizes are mixed, the memory subsystem has to interleave multiple times to access the data.

    2. Balanced Memory

    Memory balance refers to the memory population rules on physical servers. There are 6 memory channels on an Intel Xeon 2nd Generation Scalable Processor, and each channel handles two DIMM slots. Enterprise users usually use dual-processor ESXi servers, so there are 24 DIMM slots in the server in total, 12 per processor. DIMMs should be populated in pairs and in even numbers. For example, if you have 12 DIMMs, 6 should be assigned to the first processor’s DIMM slots and the other 6 to the second processor’s. The 6 DIMMs should also be populated in the same slots on each processor. This is called a “balanced memory” population. Memory balance is also related to NUMA balance on ESXi: an imbalanced memory population causes imbalanced NUMA nodes. This is very important in virtualization performance tuning.

    DIMM size and type also impact memory balance. The DIMM type and size should be the same on both processors. Some customers may want to upgrade an existing balanced configuration to a higher capacity. The suggestion is to use DIMMs of the same type and size for the upgrade.
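
    The population rules above can be sketched as a small check. This is a Python sketch under my own assumptions, not a vendor tool: the function name is_balanced and the (cpu, channel, slot) -> size-in-GB layout are hypothetical, and it only covers the rules stated here (the same slots populated on each processor, one DIMM size everywhere). DIMM type is not modeled.

```python
def is_balanced(population, cpus=(0, 1)):
    """population maps (cpu, channel, slot) -> DIMM size in GB (assumed layout)."""
    per_cpu = {cpu: {} for cpu in cpus}
    for (cpu, channel, slot), size_gb in population.items():
        per_cpu.setdefault(cpu, {})[(channel, slot)] = size_gb
    layouts = list(per_cpu.values())
    # Every processor must have the same slots populated with the same sizes.
    same_layout = all(layout == layouts[0] for layout in layouts)
    # All DIMMs in the server should be the same size.
    one_size = len(set(population.values())) == 1
    return bool(population) and same_layout and one_size


# 12 x 32 GB DIMMs: slot 0 of all 6 channels on both processors.
balanced = {(cpu, ch, 0): 32 for cpu in (0, 1) for ch in range(6)}

# Same DIMM count, but one DIMM moved from processor 1 to processor 0.
unbalanced = dict(balanced)
del unbalanced[(1, 5, 0)]
unbalanced[(0, 0, 1)] = 32

print(is_balanced(balanced), is_balanced(unbalanced))
```

    The second layout still has 12 DIMMs, but the processors no longer mirror each other, which is the kind of population that leads to imbalanced NUMA nodes.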

    3. Memory Optimization

    Balanced memory is easy. Hardware vendors’ pre-sales teams usually provide a balanced memory configuration. But the catch is that balanced doesn’t mean optimized. According to Lenovo, a balanced but non-optimized server may deliver only 35% of the performance of a balanced and optimized server.

    Fully populating the DIMM slots is the first rule. Each interleaved access grabs data from all DIMM slots. Think of the 24-lane highway again: you can move more goods if you dispatch 24 trucks at the same time. One benefit of virtualization is hardware consolidation, so we usually see high memory utilization on ESXi. That makes this an important point when optimizing ESXi performance.

    Use the same type and size of DIMM. Do not mix different types or sizes of DIMM in the server, since different types or sizes mean multiple interleaves. To reuse the example: the trucks can only carry one type of goods per trip. If there are two types of goods, the trucks have to make two trips even if they are only half loaded.

    Rank is another factor that weighs on performance. Dual-rank DIMMs give the best performance. Quad-rank is usually found on large DIMMs, and its performance is lower than dual-rank.

    DIMM type also impacts performance. RDIMM performs better than LRDIMM. The reason is that LRDIMM has a buffer on the DIMM to handle data I/O between the processor and memory, which slows down the I/O.
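
    The four rules above can be turned into a quick checklist. The following Python sketch is my own illustration: the Dimm class, the optimization_warnings function, and the 24-slot default are assumptions based on the dual-processor server described in this post, not something taken from a vendor document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    size_gb: int
    kind: str   # "RDIMM" or "LRDIMM"
    ranks: int  # 1, 2 or 4


def optimization_warnings(dimms, total_slots=24):
    """Return one warning per rule from this section that the list of DIMMs breaks."""
    warnings = []
    if len(dimms) < total_slots:
        warnings.append(f"only {len(dimms)} of {total_slots} slots populated")
    if len({(d.size_gb, d.kind) for d in dimms}) > 1:
        warnings.append("mixed DIMM types or sizes")
    if any(d.ranks != 2 for d in dimms):
        warnings.append("not all DIMMs are dual-rank")
    if any(d.kind == "LRDIMM" for d in dimms):
        warnings.append("LRDIMM present; RDIMM is faster")
    return warnings


optimized = [Dimm(32, "RDIMM", 2)] * 24
half_populated = [Dimm(128, "LRDIMM", 4)] * 12

print(optimization_warnings(optimized))
print(optimization_warnings(half_populated))
```

    A configuration can pass the balance rules and still collect several warnings here, which is the gap between balanced and optimized.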

    Other things that impact ESXi performance are BIOS settings and power settings. These are out of scope for this post. Basically, you need to set both the BIOS and power settings to high performance to achieve the best memory performance.

    4. Reference

    Lenovo: Balanced Memory Configurations with Second-Generation Intel Xeon Scalable Processors
    
    Dell: How to Balance Memory on 2nd Generation Intel® Xeon™ Scalable Processors
    Dell: Balanced Memory is Best: 2nd Generation AMD EPYC Processors for PowerEdge Servers
    
    Cisco: Intel Xeon Scalable 2nd Generation Processor Recommendations for Cisco UCS M5 Servers (Login required)
    
    HPE: Server Memory and Persistent Memory Population Rules for HPE Gen10 Servers With Intel Xeon Scalable Processors
    
    VMware: A Performance Comparison of Hypervisors

  • MAC Address Conflict with ESXi vmkernel NIC on Cisco UCS Blades

    Background

    I worked on an interesting case a few months back. An ESXi blade could not be brought up because its management IP address didn’t respond to ping. We tried reconfiguring the IP address, re-acknowledging the blade, rebuilding the network, and even replacing the motherboard, with no luck. Eventually we figured out that another ESXi host’s management network had somehow been configured with the same MAC address, which caused a MAC address conflict on the network.

    This guide shows some tips on how to troubleshoot MAC address conflicts at the ESXi and Cisco UCS levels.

    Some Reference

    The first article you should read is “vmk0 management network MAC address is not updated when NIC card is replaced or vmkernel has duplicate MAC address”. It helps you understand why the vmkernel MAC address is not updated. The solution in the KB is to change the MAC address manually on ESXi or to re-create the management network.

    But in reality we usually don’t know where the conflict comes from. We only know that this Cisco UCS blade has ESXi installed and doesn’t respond to ping. So you may suspect a hardware issue, like I did.

    Check MAC address conflicts on Cisco UCS

    There are some ways to check MAC address conflicts on Cisco UCS.

    • Log in to UCS Manager by SSH and check the MAC address status.
    • Export the UCS Manager log and check for MAC address conflicts in the fwm_trace_log file.

    # Log in to UCS Manager
    # Run the following command to show the MAC address status.
    show platform fwm info mac <mac address> <vlan id>
    
    # Sample
    show platform fwm info mac 0025.0050.11.11 141
    

    Log in to the UCS Manager GUI to generate a support log.

    Admin -> All -> Faults, Events and Audit Log -> TechSupport Files

    Generate a UCSM log bundle, then download and extract it. There are two major files in the log bundle: UCSM_A_TechSupport.tar.gz and UCSM_B_TechSupport.tar.gz. The files correspond to their respective Fabric Interconnect.

    A MAC address conflict usually shows up on only one Fabric Interconnect, so you may need to check both of them. I use the A side as an example. Go to the extracted folder -> UCSM_A_TechSupport -> sw_trace_logs -> fwm_trace_log.current

    Search for the keyword “REGMAC seen on border port” in the log, and repeat the same search in the log of the other FI. If you find entries with recent timestamps, it indicates that the MAC address conflicts with a device outside the UCS domain.
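
    If you have to search several extracted bundles, the keyword search can be scripted. This is a minimal Python sketch, assuming only that the trace log is a text file and that the keyword appears verbatim on the matching lines; the function names are made up for illustration, and the sample lines are placeholders, not real UCS log entries.

```python
KEYWORD = "REGMAC seen on border port"


def find_regmac_entries(lines):
    """Return the lines that contain the keyword, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if KEYWORD in line]


def scan_log(path):
    """Scan one extracted trace log file; undecodable bytes are replaced."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_regmac_entries(handle)


# Placeholder lines only; a real log line will look different.
sample = [
    "made-up line without the keyword\n",
    "made-up prefix REGMAC seen on border port made-up suffix\n",
]
print(find_regmac_entries(sample))
```

    Call scan_log once per Fabric Interconnect, pointing it at the fwm_trace_log.current file from each extracted bundle, and then check the timestamps of whatever it returns.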

    There may be other causes of MAC address issues. I wrote about one in Error: No NIC found with MAC address…

  • Quick Note: Microsoft Remote Desktop Connection Manager Windows overfit High DPI Screen

    4K screens have become popular in recent years. They can be a challenge for legacy applications such as “Microsoft Remote Desktop Connection Manager”. Its development stopped in 2014, but it’s still a useful tool for server administrators.

    You may see its window overflow the screen on a 4K display. The fix is:

    1. Go to the properties of “RDCMan.exe”.
    2. Open the “Compatibility” tab.
    3. Click “Change high DPI settings”.
    4. Uncheck “Override high DPI scaling behavior”.
  • “Session data container is missing. It must’ve been destroyed, probably due to logout”

    Quick post. You may see “Session data container is missing. It must’ve been destroyed, probably due to logout.” after rebooting a vCenter PSC server that was newly joined to a domain. It usually occurs when you log in to the PSC web client.

    The cause is stale cookies in the client browser. Clearing the cookies fixes the issue.

  • Access Deny When Run PowerShell Scripts

    You may get access denied errors when modifying particular parts of Windows Server, such as some registry keys or system directories.

    The reason is that Windows Server protects sensitive parts of the operating system. This is similar to running a command without root permission on Linux. You have to run as administrator to work around the access denied problem.

    I faced this issue when running a guest command in an Embotics Commander workflow. There seems to be no official documentation about this issue. The workaround is to disable UAC on Windows Server. Following are some helpful references.

    Please refer to Disabling User Account Control (UAC) on Windows Server to understand the impact of disabling UAC.

    There are plenty of articles on the internet about how to disable UAC.

    There are two steps:

    • Disable UAC notification in Control Panel.
    • Change the value of the key EnableLUA from 1 to 0 in the registry path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System.

    You may need to reboot the server or wait a few minutes.

    To validate, run a command; you should see a reminder message.

  • Get SSD Hard Disk Information by PowerShell to HPE Servers

    Summary

    HPE recently published an advisory for an SSD issue. The issue impacts the most popular ProLiant servers in the world. The remediation is to upgrade the firmware. Unfortunately, HPE doesn’t have a product that can easily report hard disk models for Gen9 and earlier servers; I have tried HPE OneView and OneView Global Dashboard. However, we can get SSD hard disk information with PowerShell through the API.

    Solution

    The following procedure helps you get SSD hard disk information in a large environment.

    1. Make sure you have the same credential available on the iLO of all ProLiant servers. It can be a local or domain credential.
    2. Prepare a Windows Server 2016 or Windows 10 computer with the latest patches and internet access.
    3. Install HPEiLOCmdlets with the following PowerShell command:
    Install-Module -Name HPEiLOCmdlets
    4. Connect to HPE iLO.
    $Conn = Connect-HPEiLO -IP xxx -User xxx -Password xxx -DisableCertificateAuthentication
    5. Retrieve the HPE Smart Array storage controller information.
    $HardDisks = Get-HPEiLOSmartArrayStorageController -Connection $Conn
    6. Run the following command to get the physical disk information.
    $HardDisks.Controllers.PhysicalDrives

    Conclusion

    The PowerShell API is very flexible and can retrieve almost any hardware information. The solution above is only the core part. Of course, you can leverage ForEach-Object to automate the report and export it to a CSV file. PowerShell is not the only method; you can also get SSD hard disk information through other APIs.

  • Machine Learning Basic – Calculate Euclidean Distance by PowerShell

    The core of Machine Learning is to find rules in a set of data. One basic operation of Machine Learning is “clustering”, or simply “classifying data”. For example, say there are 1000 records of toy sales data. It would be useful for predicting a new customer’s behavior if we could classify the data into multiple groups (such as a “buy stuffed toys” group and a “buy electronic toys” group).

    So we need a partitioning method to classify the sales data. There are multiple ways to do so. One simple method is called “K-Means”. It calculates the distance between each data point and the centroids (the center point of each group), and then assigns each data point to the closest centroid. Wikipedia has a detailed description of the method.

    Hence, as you can see, the key to “K-Means” is calculating distance. There are several ways to calculate it, and “Euclidean Distance” is one of them. Please refer to Wikipedia for a deep dive. Long story short, you plot the data on a 2D plane so that each data point has an x and a y value. The “Euclidean Distance” between two data points (x1, y1) and (x2, y2) is:

    d = sqrt((x1 - x2)^2 + (y1 - y2)^2)

    The formula is simple, but you have to calculate the distance between each data point and every centroid. Following is a super simple PowerShell script that calculates the Euclidean Distance for data with 3 clusters.

    # Centroids (x, y) of the three clusters
    $K1 = (3.67, 9)
    $K2 = (7, 4.33)
    $K3 = (1.5, 3.5)
    
    # Load the records. $records is used because $input is a PowerShell automatic variable.
    $records = Import-Csv "c:\temp\input.txt"
    
    $i = 1
    foreach ($seed in $records){
        # Euclidean distance from the current record to each centroid
        $K1Result= [math]::Sqrt([math]::pow(($K1[0]-$seed.x),2)+[math]::pow(($K1[1]-$seed.y),2))
        $K2Result= [math]::Sqrt([math]::pow(($K2[0]-$seed.x),2)+[math]::pow(($K2[1]-$seed.y),2))
        $K3Result= [math]::Sqrt([math]::pow(($K3[0]-$seed.x),2)+[math]::pow(($K3[1]-$seed.y),2))
        Write-Host "K1 to A$i distance is $K1Result"
        Write-Host "K2 to A$i distance is $K2Result"
        Write-Host "K3 to A$i distance is $K3Result"
        $i++
    }
    

    This script assumes you want to partition the records into 3 clusters (K1, K2, K3). $K1, $K2, and $K3 are the centroids of each cluster (group). You can adjust them to suit your purpose.

    The script loads the records from the “input.txt” file and then calculates the Euclidean Distance from each record to every centroid. Each record in “input.txt” has only an x and a y value. Following is a sample “input.txt”. You can copy it for testing.

    x,y
    2, 10
    2, 5
    8, 4
    5, 8
    7, 5
    6, 4
    1, 2
    4, 9

    Following is the result:

    K1 to A1 distance is 1.9465096968677
    K2 to A1 distance is 7.55968914704831
    K3 to A1 distance is 6.51920240520265
    K1 to A2 distance is 4.33461647669087
    K2 to A2 distance is 5.04469027790607
    K3 to A2 distance is 1.58113883008419
    K1 to A3 distance is 6.61429512495474
    K2 to A3 distance is 1.05304320899002
    K3 to A3 distance is 6.51920240520265
    K1 to A4 distance is 1.66400120192264
    K2 to A4 distance is 4.17958131874474
    K3 to A4 distance is 5.70087712549569
    K1 to A5 distance is 5.20469979921993
    K2 to A5 distance is 0.67
    K3 to A5 distance is 5.70087712549569
    K1 to A6 distance is 5.5162396612185
    K2 to A6 distance is 1.05304320899002
    K3 to A6 distance is 4.52769256906871
    K1 to A7 distance is 7.49192231673554
    K2 to A7 distance is 6.43652856748108
    K3 to A7 distance is 1.58113883008419
    K1 to A8 distance is 0.33
    K2 to A8 distance is 5.55057654663009
    K3 to A8 distance is 6.04152298679729
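
    The distances are only half of a K-Means step; the other half is assigning each data point to its closest centroid. Here is a short Python sketch of that assignment, using the same centroids and sample data as above. It is an illustration rather than a full K-Means implementation (it does not recompute the centroids), and the function name nearest_centroid is my own.

```python
import math

centroids = {"K1": (3.67, 9), "K2": (7, 4.33), "K3": (1.5, 3.5)}
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]


def euclidean(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_centroid(point, centroids):
    """Return the name of the centroid closest to the point."""
    return min(centroids, key=lambda name: euclidean(point, centroids[name]))


assignments = [nearest_centroid(p, centroids) for p in points]
for i, name in enumerate(assignments, start=1):
    print(f"A{i} -> {name}")
```

    With the sample data, A1, A4 and A8 fall into K1, A3, A5 and A6 into K2, and A2 and A7 into K3, which matches the smallest distance in each group of three lines in the result above.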

  • “The terminal process terminated with exit code: 1” in Visual Studio Code when open PowerShell file

    When you open a PowerShell file in Visual Studio Code, you may see the following error:

    The terminal process terminated with exit code: 1

    The issue usually occurs on newly provisioned systems or in enterprise environments with a restrictive security policy. The cause and solution are the same as in my other post: “Timed out waiting for the PowerShell extension to start” in Visual Studio Code.

  • Authentication failed when clone git repository on Windows for Bitbucket

    I wrote a post about how to install Git and integrate it with Visual Studio Code for a Bitbucket server. Today, I got the following message when I cloned a new repository. The cause was an incorrect password.

    fatal: Authentication failed for 'https://bb.zhengwu.org/vmware.git/'

    Time needed: 10 minutes

    Following is a quick solution for authentication failures when cloning a git repository.

    1. Open “Credential Manager” on Windows

      a. Click Start button
      b. Type “Credential Manager” and open it
      c. Click “Windows Credentials”.

    2. Change the password for the Git repository

      a. Click your Git repository in the list
      b. Click “Edit” to change the credential.