I used to see memory degrading on Cisco UCS blades. But less see on HPE blades. I thought it maybe quality control problem of Cisco manufacture. Today I read two articles in Cisco website, it explains why we see memory degrading and how it works. I attached the articles below.
Managing Correctable Memory Errors on Cisco UCS Servers
UCS Enhanced Memory Error Management
The conduction in the whitepaper is not only specific for Cisco UCS, but also for any modern servers. Following is summary of why memory errors rates is going high nowadays.
- Larger memory systems contain more bits
- Higher capacity DRAM chips require smaller bit cells which result in fewer stored charges per bit
- Lower operating voltages can lead to reduced noise margin
- Higher operating speeds can lead to reduced timing margin