Balanced Memory Configuration for ESXi Servers

Background

ESXi servers are fundamental infrastructure. The underlying hardware's performance has a butterfly effect on the virtual machines and applications above it. Since the virtualization layer adds less than 10% performance overhead, it's valuable to get full performance out of the hardware layer. Memory is a big player in hardware performance tuning, so a balanced memory configuration is very important. I wrote an article, “LRDIMM or RDIMM on ESXi hosts?”, a long time ago; it reflected my understanding in 2016. I did some more research recently, and I hope the following study helps with your hardware decisions. This study is based on Intel Xeon 2nd Generation Scalable Processors.

1. Basic Concepts

Before we talk about balanced and optimized memory, let's take a look at some basic concepts that will be used in this article.

1.1 Memory Channel

Memory channels carry read and write operations between the CPU and the memory modules. Think of them like the lanes of a highway connecting the CPU and memory. There are 6 memory channels on an Intel Xeon 2nd Generation Scalable processor. This is different from the Intel Xeon E7, which has only 4 memory channels. If you are an E7 user moving to Scalable processors, note that a memory size that is balanced on E7 is not necessarily balanced on Scalable processors.

1.2 DIMM

The full name of DIMM is Dual In-line Memory Module. It's a small printed circuit board with multiple memory chips mounted on it; in everyday usage we call the whole module a DIMM. A DIMM is installed in a DIMM slot on the server's motherboard. Each memory channel has two DIMM slots on Intel Xeon 2nd Generation Scalable Processors.

1.3 DIMM Type

The two major types are RDIMM and LRDIMM. The main difference between them is that LRDIMM has a buffer on the DIMM, which makes LRDIMM slower than RDIMM. LRDIMM supports a larger capacity per DIMM, such as 128 GB or higher, while RDIMM is usually 16 GB, 32 GB, or 64 GB.

1.4 Rank

A rank is one set of memory chips that is accessed while writing to or reading from the memory. A small DIMM may need only a single rank; a larger DIMM needs more ranks. Refer to this article to learn more about ranks. According to hardware vendors, dual-rank DIMMs perform better than quad-rank DIMMs.

1.5 Interleave

Memory interleaving allows a CPU to spread memory accesses efficiently across multiple DIMMs. All DIMMs should be in one interleaved set, creating a single uniform memory region that is spread across as many DIMMs as possible. If there are different DIMM types or sizes, the memory subsystem has to build multiple interleave sets and interleave multiple times to access data.

2. Balanced Memory

Memory balance refers to the memory population rules on physical servers. There are 6 memory channels on an Intel Xeon 2nd Generation Scalable processor, and each channel handles two DIMM slots. Enterprise users usually run dual-processor ESXi servers, so there are 24 DIMM slots in total, with each processor handling 12 of them. DIMMs should be populated in pairs and even numbers. For example, if you have 12 DIMMs, 6 of them should go to the first processor and the other 6 to the second processor's DIMM slots, and the 6 DIMMs should be populated in the same slot positions on each processor. This is called a “balanced memory” population. Memory balance is also related to NUMA balance for ESXi: an imbalanced memory population creates imbalanced NUMA nodes, which matters a lot in virtualization performance tuning.
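As a rough illustration, here is a minimal Python sketch of that rule. The slot layout, plan format, and is_balanced helper are my own assumptions for illustration, not a vendor tool, and it only checks the simple case where every channel is populated evenly with identical DIMMs:

```python
from collections import Counter

# Hypothetical plan: one (socket, channel, size_gb, dimm_type) entry per installed DIMM.
# 12 identical 32 GB RDIMMs, one per channel on each of the two sockets.
dimm_plan = [
    (socket, channel, 32, "RDIMM")
    for socket in (0, 1)
    for channel in range(6)   # 6 memory channels per 2nd Gen Xeon Scalable processor
]

def is_balanced(plan):
    """Every channel on both sockets holds the same number of identical DIMMs."""
    per_channel = Counter((s, c) for s, c, _, _ in plan)
    counts = {per_channel.get((s, c), 0) for s in (0, 1) for c in range(6)}
    evenly_populated = len(counts) == 1 and 0 not in counts
    identical_parts = len({(size, kind) for _, _, size, kind in plan}) == 1
    return evenly_populated and identical_parts

print(is_balanced(dimm_plan))   # True; drop one DIMM or mix sizes and it turns False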

DIMM size and type in the population also impact memory balance: the DIMM type and size on both processors should be the same. Some customers may want to upgrade existing balanced memory to a higher capacity; the suggestion is to use the same type and size of DIMMs for the upgrade.

3. Memory Optimization

Balanced memory is easy; hardware vendors' pre-sales usually provide a balanced memory configuration. But the trick is that balanced doesn't mean optimized. According to Lenovo, a balanced but non-optimized server can deliver only 35% of the performance of a balanced and optimized server.

Fully populating the DIMM slots is the first rule. Each interleave operation pulls data from all populated DIMM slots. Think about the 24-lane highway again: you can move more goods if you dispatch 24 trucks at the same time. One benefit of virtualization is hardware consolidation, so we usually see high memory utilization on ESXi, which makes this an important point for optimizing ESXi performance.
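To put a rough number on the highway analogy, here is a back-of-the-envelope calculation. It assumes DDR4-2933 DIMMs (the fastest speed supported by the top 2nd Gen Xeon Scalable SKUs) and gives theoretical peaks, not measured numbers:

```python
# Theoretical peak memory bandwidth per socket, assuming DDR4-2933 in every channel.
transfer_rate_mt_s = 2933        # mega-transfers per second (assumed DIMM speed grade)
bytes_per_transfer = 8           # 64-bit data bus per channel
channels_per_socket = 6          # 2nd Gen Xeon Scalable

per_channel_gb_s = transfer_rate_mt_s * bytes_per_transfer / 1000
per_socket_gb_s = per_channel_gb_s * channels_per_socket

print(f"{per_channel_gb_s:.1f} GB/s per channel")   # ~23.5 GB/s
print(f"{per_socket_gb_s:.1f} GB/s per socket")     # ~140.8 GB/s
```

Every empty channel removes its share of that peak, which is why a server with only a few channels populated can fall far behind even when the installed capacity looks sufficient.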

Use the same type and size of DIMM, and do not mix different types or sizes in the server, since mixing them forces multiple interleave sets. Using the same analogy: the trucks can only carry one type of goods per trip, so if there are two types of goods, the trucks have to make two trips even if they are half loaded.

Rank is another factor that impacts performance. Dual-rank DIMMs deliver the best performance; quad-rank is usually found on large-capacity DIMMs, and its performance is lower than dual-rank.

DIMM type also impacts performance. RDIMM performs better than LRDIMM, because LRDIMM has a buffer on the DIMM to handle data I/O between the processor and memory, and that buffer slows down the I/O.

Other things that impact ESXi performance are BIOS settings and power settings. These are out of the scope of this post; basically, you need to set the BIOS and power profiles to high performance to achieve the best memory performance.

4. References

Lenovo: Balanced Memory Configurations with Second-Generation Intel Xeon Scalable Processors

Dell: How to Balance Memory on 2nd Generation Intel® Xeon™ Scalable Processors
Dell: Balanced Memory is Best: 2nd Generation AMD EPYC Processors for PowerEdge Servers

Cisco: Intel Xeon Scalable 2nd Generation Processor Recommendations for Cisco UCS M5 Servers (Login required)

HPE: Server Memory and Persistent Memory Population Rules for HPE Gen10 Servers With Intel Xeon Scalable Processors

VMware: A Performance Comparison of Hypervisors

Network Latency on Virtual Machine

Slight network latency may cause application problems on latency-sensitive virtual machines, even when the network response time is just 3 or 7 ms. There is a way to improve the stability of response latency: enable RSS (Receive Side Scaling) on the NIC.

Network traffic is handled by a single CPU core when RSS is disabled. Enabling it distributes the workload across 4 cores by default, and you can increase the number of CPUs used for RSS by changing the registry.

To summarize the solution: go to Device Manager -> NIC properties -> Advanced -> find the RSS option and enable it. Expect 2 or 3 dropped packets while the change is applied.
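If you prefer scripting to the Device Manager GUI, here is a minimal sketch using the Windows NetAdapter cmdlets, driven from Python; you can just as well paste the quoted commands into an elevated PowerShell window inside the guest. The adapter name "Ethernet0" is an assumption; check yours with Get-NetAdapter.

```python
import subprocess

ADAPTER = "Ethernet0"  # assumed adapter name; list real names with Get-NetAdapter

def powershell(command: str) -> str:
    """Run one PowerShell command inside the Windows guest and return its output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(powershell(f'Get-NetAdapterRss -Name "{ADAPTER}"'))            # show current RSS state
powershell(f'Enable-NetAdapterRss -Name "{ADAPTER}"')                # same as the GUI toggle
powershell(f'Set-NetAdapterRss -Name "{ADAPTER}" -MaxProcessors 8')  # let RSS use more cores
```

Set-NetAdapterRss with -MaxProcessors is the cmdlet-level counterpart of the registry change; the registry keywords themselves are covered in the “Setting the Number of RSS Processors” article below.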

You can refer to the following articles for details.

Poor network performance or high network latency on Windows virtual machines

Virtual Receive-side Scaling in Windows Server 2012 R2

Regarding increasing the CPU count for RSS, read the following article to learn how to modify it.

Setting the Number of RSS Processors