Conventional wisdom regarding computer memory has for some time been that all RAM is created equal. Stated another way, it’s not really worth buying expensive ECC RAM because errors just don’t occur frequently enough to worry about. Even in server-grade products, designed to run 24/7 in mission-critical environments, ECC RAM is often optional. Mainboards and RAM sold for consumer home use almost never even offer the option of supporting ECC memory. A new study from Google indicates that this may be a problem.
A two-and-a-half year study of DRAM on 10s of thousands Google servers found DIMM error rates are hundreds to thousands of times higher than thought — a mean of 3,751 correctable errors per DIMM per year.
So begins a summary over at ZDNet. The study (PDF) is available to read. A large-scale analysis like this hadn’t been performed before (at least not publicly), so the findings are pretty shocking. A hearty “thank you” goes to Google for taking the time to analyze this situation, and for publicizing the results.
Basically, the majority of DRAM chips on the market are far more error-prone than previously thought. And consumer-grade mainboards are at least as culpable for hard memory errors as the DRAM chips themselves!
So what might a memory error look like? Darn near anything. Remember that inside your computer, everything is ones and zeros. If one of those ones becomes a zero, who knows what might happen? Maybe nothing, maybe a little stutter in your game, maybe a corrupted file saved to your hard disk, or maybe a complete system lockup.
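To make the "ones and zeros" point concrete, here is a minimal Python sketch that simulates a single-bit error in software. The helper name `flip_bit` is made up for this illustration; real memory errors happen in hardware, not through a function call.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted.

    Hypothetical helper for illustration: bit 0 is the least
    significant bit of the first byte.
    """
    byte_index, offset = divmod(bit_index, 8)
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << offset  # XOR inverts just that one bit
    return bytes(corrupted)


# One flipped bit changes the case of a letter...
print(flip_bit(b"A", 5))  # b'a'

# ...or turns a small stored number into a huge one.
stored = (1000).to_bytes(4, "little")
print(int.from_bytes(flip_bit(stored, 30), "little"))  # 1073742824
```

Whether a flip like this is harmless or catastrophic depends entirely on what the affected bit was being used for, which is why the symptoms are so varied.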
It’s not all gloom-and-doom, though:
- Temperature plays little role in errors – just as Google found with disk drives – so heroic cooling isn’t necessary.
- The problem isn’t getting worse. The latest, most dense generations of DRAM perform as well, error wise, as previous generations.
- Heavily used systems have more errors – meaning casual users have less to worry about.
- No significant differences between vendors or DIMM types (DDR1, DDR2 or FB-DIMM). You can buy on price – at least for the ECC-type DIMMS they investigated.
- Only 8% of DIMMs had errors per year on average. Fewer DIMMs = fewer error problems – good news for users of smaller systems.
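The "fewer DIMMs = fewer error problems" point can be sketched with a little arithmetic. The Python snippet below takes the 8% per-DIMM figure from the summary and assumes, purely for illustration, that each DIMM fails independently of the others; that independence is my simplifying assumption, not a finding of the study, and the function name is made up.

```python
def p_any_dimm_error(n_dimms: int, p_per_dimm: float = 0.08) -> float:
    """Chance that at least one of n_dimms sees an error in a year.

    Assumes DIMMs fail independently (a simplification for this
    sketch). 0.08 is the per-DIMM yearly figure quoted above.
    """
    return 1.0 - (1.0 - p_per_dimm) ** n_dimms


for n in (1, 2, 4, 16):
    print(f"{n:2d} DIMMs: {p_any_dimm_error(n):.1%}")
```

Under that assumption, a two-DIMM desktop has roughly a 15% chance of seeing at least one affected DIMM in a year, while a machine with many more DIMMs is far more likely to be hit.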
ECC memory usually commands a hefty premium, so it’s no surprise that many people choosing to save money cut that cost first. But maybe it’s time to think long-term about the value of your next purchase.