Why DDR5 does NOT have ECC (by default)

  • Published: 5 Sep 2024

Comments • 651

  • @warren_r 3 years ago +312

    My minimum specification is that I signed up for your Patreon today because good-quality videos like this are what we need more of.

    • @TechTechPotato 3 years ago +23

      I saw! Thank you for your support, it's so much appreciated 🙏

    • @turbolenza35 3 years ago

      I will never buy DDR5!!!

    • @kristjen86A 3 years ago +1

      Ditto! It's a fantastic channel, and will help me broaden my already sizable knowledge on different technologies and whatnot! So signed up today thank you @TechTechPotato

    • @lordofthecats6397 3 years ago +4

      @@turbolenza35 wat?

  • @happygimp0 3 years ago +55

    Completely disagree on 8:47. We absolutely need ECC for consumers.
    We need ECC everywhere.

    • @Monkwrestler 6 months ago

      Ah yeah, odds are it's not the end of the world. I'm sure I'm not the only one who loves doing stuff on my personal computer for it to suddenly wig out for apparently no reason.

    • @happygimp0 5 months ago

      @@Monkwrestler We need ECC everywhere that work gets done. But that is the large majority of all computers. So in the end it is simpler to use ECC everywhere.
      OK, game consoles can go without ECC, nobody cares there. But if a dimension suddenly comes out as 12.5 cm instead of 12 cm, that could get expensive.

  • @JaWz6 3 years ago +333

    i really thought they had ECC, I am glad you made this video to clarify

    • @deilusi 3 years ago +4

      Thing is, it should lower error rates but, as explained, not eliminate the "in-transit" ones. Going from completely unprotected to having your data in a safe vault that is protected does help a bit, but it's not a fix-all.

    • @TokyoQuaSaR 3 years ago +4

      It IS ECC. There is ECC in many parts, in many sub components or IPs. As an FPGA engineer, I use IPs that can have ECC enabled, internal interconnects, SRAMs, memory controllers, PCIe controllers, etc etc. Sure the DDR5 on-die ECC is different and as Ian says it's not the whole "CPU to memory cell ECC" that people have been used to for RAM, but it's still ECC, that's not a lie to say so.

    • @Aereto 3 years ago

      @@TokyoQuaSaR
      I can't call it ECC when my virtual data tables, read from the HDD and stored in RAM, end up with a corrupted entry without even once getting a write/overwrite command.

    • @TokyoQuaSaR 3 years ago +3

      @@Aereto I don't get what you're trying to say. And I'm not sure you're getting what I am trying to say.

    • @jakobe_bryantgaming5580 3 years ago

      At least it is more integral when it comes to system errors. On the bright side, the speeds are a massive improvement from DDR4 (ranging from 2133-5000 MHz, where DDR5 is ranging from 4600 MHz-12,600 MHz)

  • @tech6294 3 years ago +218

    Wow, I thought DDR5 had ECC from everything I've read. Great job! This needs more attention. ;)

    • @eiliannoyes5212 3 years ago

      Reading... reading... reading...

    • @photonboy999 3 years ago +10

      DDR5 does have ECC.
      It's "on-die" ECC. What's not clear is whether DDR5 will use ECC to talk to the CPU for all modules or whether that's optional. It's also not clear whether all DDR5 will have on-die ECC for local (in module only) corrections... I hope DDR5 is ECC at the system level ONLY or that's going to create a lot of confusion. Both AMD and Intel (and whoever controls ARM and even Apple) will have a say in this.

    • @TokyoQuaSaR 3 years ago +1

      On-die ECC is still ECC. So yes it has ECC. It's not because it's not the whole chain that it's not ECC.

    • @MrNelahem 3 years ago +2

      This is the problem with just reading stuff on the internet and not fact checking: you just believe everything to be true. 90% of the tech enthusiasts I know all think DDR5 has full ECC because they read it in an article on the internet...

    • @TokyoQuaSaR 3 years ago +1

      @@MrNelahem From an engineer point of view I don't think it's such a big deal for consumers to know that the ECC on DDR5 isn't the exact same thing as the full CPU to memory die ECC.
      I mean sure there are chances of getting errors in the transfer from CPU but I would say it's not as likely as for the memory cells if your system is correctly set up (eg not too much overclocking on the memory bus and the CPU memory controller etc). People working on servers or workstations used for critical applications should be aware of it though. But I assume they will be since they have already been used to buying "full path" ECC modules.

  • @johngreen4610 3 years ago +58

    I worked on IBM mainframes for a very long time. They did error checking at every stage. I was astounded when personal computers came along without even parity checking data.

    • @charlese2833 3 years ago +3

      I remember when you could use ECC on Intel consumer systems

  • @Allyouknow5820 3 years ago +209

    "Certain CPU manufacturer wanted better benchmarks"
    *COUGH Intel COUGH*

    • @FakeGordonMahUng 3 years ago +10

      Better benchmarks also tend to yield better performance too, to be fair.

    • @zenith251 3 years ago +6

      Anyone here remember RDRAM and Rambus's love affair with Intel?

    • @samiraperi467 3 years ago

      @@zenith251 Yeah. I also remember a chipset having a memory handling bug that caused a huge performance hit.

    • @aitorbleda8267 3 years ago +7

      @@FakeGordonMahUng Crashes and data corruption are worse than a bit faster speeds.

    • @TokyoQuaSaR 3 years ago +3

      @@aitorbleda8267 For consumers it's better to have the faster version instead. The JEDEC speeds are REALLY slow.

  • @imadecoy. 3 years ago +55

    Nice dub-over at 6:03, almost didn't notice!
    Edit: Again at 7:25 !

    • @etmasikewo 3 years ago +1

      Something seemed off

    • @GraphicdesignforFree 3 years ago

      dub-overs are quite difficult, but this one was indeed very well done.

    • @Metalcastr 3 years ago +6

      If our brains had ECC we would've noticed

    • @KiinaSu 3 years ago +1

      You missed the one at 7:11

    • @etmasikewo 3 years ago

      @@KiinaSu oo thats the one i saw haha. Very smooth

  • @johnmothtech 3 years ago +114

    So, basically more dodgy marketing from manufacturers, surprise surprise. Thanks for the explanation Ian.

    • @rdoursenaud 3 years ago +5

      To be fair, it’s OEMs that pull these kinds of stunts, not manufacturers. Memory manufacturers know very well how their chips behave. These are very well specified. Yet OEMs sell them and market them outside of those specs (think XMP) on a daily basis…

    • @rdoursenaud 3 years ago +1

      @@Sidowse Yeah I meant OEM for the modules vs memory (chip) manufacturers. That’s what you get for commenting out late at night :p

  • @richardyao9012 3 years ago +80

    Thank you for explaining this to people. People’s lack of understanding of this has been a headache. :/

    • @rorychivers8769 3 years ago +1

      I don't understand the bloke's catchphrase and that's enough to give me a headache.

  • @kwinzman 3 years ago +38

    Thank you! I was getting tired of explaining that over and over. Now I have a video that I can link!

    • @turbolenza35 3 years ago

      Say no to DDR5

    • @kwinzman 3 years ago +1

      @@turbolenza35 No, that's not what I was going for. DDR5 is not bad. DDR5 actually supports full ECC DIMMs just like DDR4. Plus actually some more features like read CRC (not mentioned in the video).

  • @NaokiWatanabe 3 years ago +37

    "If you do need a proper end to end ECC system where your data is fully protected.."
    I'd argue that describes everybody. It is unconscionable that in 2021 random data corruption leading to crashes, or worse, would be accepted in a core computing component.

    • @lordofthecats6397 3 years ago +3

      000% agree. The problem it seems is that CPU manufacturers like to fuse off ECC in lower-end processors (including ALL consumer grade ones) 'cause market segmentation. The circuitry almost certainly still exists on the die though. Consumers should be sold CPUs with ECC enabled and have the option to choose whatever memory they want

    • @NaokiWatanabe 3 years ago +6

      ​@@lordofthecats6397 You're talking only about intel desktop CPUs of course.
      AMD supports ECC on all CPUs at every segment.
      ARM mobile CPUs have supported ECC since at least Cortex-4.
      SiFive's RISC-V CPUs support ECC on DDR and throughout their entire cache hierarchy.
      As does IBM Power.
      It looks like Apple's M1 might also (from log messages like 'AppleFireStormErrorHandler AppleARM64ErrorHandler: will not panic on correctible ECC errors').
      I'd argue people should perhaps not buy crippled CPUs lacking in important data integrity features but practically nobody understands the importance of this.

    • @lordofthecats6397 3 years ago +2

      @@NaokiWatanabe Well Ok, I had no idea that AMD didn't lock out ECC. It'd be nice if they'd advertise that more. That's an actual selling point to me, unlike "CACHE". Anyway, f-ck Intel and only Intel for locking that out then. Also I meant to say 100% agree in my last comment, but it appears there was a little bit of data corruption ;) If only my processor came with ECC support! Unfortunately I'm stuck with Intel as they were pretty much the only option when I bought this computer.

    • @charlese2833 3 years ago +6

      It is worse than that! With rowhammer, it is a security defect! Google now claims to have achieved a 2nd row flip. No excuse for turning off ecc. No motherboard or bios should prevent it.

  • @tommihommi1 3 years ago +72

    ECC is great for overclocking, since the memory directly reports when you've gone too far and get errors, and even if you do get errors, they will less often crash your system.
    Consumer ECC with non-JEDEC timings should be a thing.

    • @namibjDerEchte 3 years ago +1

      I have the parts here, and building the PC is scheduled to start in about 26h.
      I'll put at least something on my channel, so feel free to sub now and unsubscribe once I posted the ECC OC content.

    • @benjaminfacouchere2395 3 years ago

      Interesting. So in your experience, is it more likely that the RAM will cause an overclocked system to crash or the CPU?

    • @tommihommi1 3 years ago +7

      @@benjaminfacouchere2395 An unstable CPU OC is more predictable and tends to crash sooner, just run prime95 with the hottest FFT setting for a few minutes.
      Meanwhile bad RAM OC (and even unstable XMP presets) can be a lot more difficult to troubleshoot. Then there's the weird interactions.
      For example getting a new GPU can suddenly make your RAM that worked fine for years unstable, due to the case temperature being higher than before, or the GPU blowing its hot exhaust straight onto the RAM sticks, or simply the GPU driver being more sensitive to memory errors.
      Then people blame the instability on the GPU instead of the memory throwing errors all over the place, which wouldn't happen with ECC.
      Diagnostics is difficult, since you might only get the errors when the GPU is under high load, but the CPU isn't, since the CPU cooler usually is what provides the memory with good airflow.
      ECC should be the standard.

    • @benjaminfacouchere2395 3 years ago +1

      @@tommihommi1 Thanks for the reply. I don't OC myself.
      I was just assuming, from watching e.g. LTT, that modern CPUs have thermal throttling, so crashing the CPU due to overheating wouldn't be that easy, and it would have more to do with the frequency itself, but I guess I'm wrong.

    • @tommihommi1 3 years ago +2

      @@benjaminfacouchere2395 no, OC instability has nothing to do with the thermal throttling limit at all

  • @trjozsef 3 years ago +47

    Petition: put a heat pipe and a concrete hull on the RAM module instead of RGB.

    • @hardcorehardware361 3 years ago +2

      Agreed

    • @temporoyale6251 3 years ago +3

      Maybe a heat pipe or heatsinks cost more than simply putting an ECC module on the memory die.

    • @hardcorehardware361 3 years ago +6

      @@temporoyale6251 I would pay extra for a heatpipe just like I would pay the extra to get an LGA socket on AM5. Not everyone would pay the extra but I would. I would prefer actual heatpipes something useful over RGB but that's me. I'm kind of sick of RGB tbh.

    • @BRUXXUS 3 years ago +1

      Lead lined heat spreader

    • @temporoyale6251 3 years ago +1

      @@hardcorehardware361 I agree with you, I would love to see some b-die with a bulky heatspreader or something of the sort

  • @alb9229 3 years ago +16

    Great video Ian! Thank you for explaining what the difference is between on-die ECC and proper module ECC. Many people will be disappointed because of the deceitful marketing.

  • @PebblesChan 3 years ago +10

    Even after nearly 50 years we’re still plagued by DRAM refresh interruptions & consequences! In 1973 we had 2ms refresh times and triple voltages (-5, +5 & +12V) with Mostek MK4027 4096 bit x 1 DRAM.

    • @sporegnosis 3 years ago +1

      You do realise the technology is the same? Let's get to photonic RAM crystals first and we'll talk again...

    • @PebblesChan 3 years ago +2

      Back in the 1970s we de-lidded 4116 DRAMs to make crude image sensors.

    • @angelg3986 2 years ago

      I think they implemented self-refresh modes around the i486 era.

  • @not12listen 3 years ago +26

    And a huge 'thanks, jerks' to Intel for artificially keeping ECC off of desktops.
    All Ryzen based CPUs do support (though, not officially) ECC. The motherboard maker has to also support ECC - and plenty do.
    As per FPS differences from non-ECC to ECC, it is typically less than 10FPS difference in most cases.

    • @perforongo9078 3 years ago +5

      Frames Per Second? Nowhere near that. The difference is expressed as a percentage, as each computer performs differently. ECC at worst will decrease performance by one half of one percent. So not much. Or it won't affect performance at all. Sometimes it improves performance for some reason. But the effect is never large. It used to inhibit performance more in the past, but these days it's trivial. For it to decrease performance by 10 FPS, your computer would be outputting 2,000 FPS. One helluva fast computer you'd have there.

    • @kepstin 3 years ago +4

      Annoyingly, ECC is not enabled on the existing models of non-PRO Ryzen APUs (no idea about the upcoming 5000g series, but it'll probably be the same). It is enabled on the Ryzen PRO APUs that are available on the grey market, tho.

    • @not12listen 3 years ago +1

      @@perforongo9078 Absolutely fair point. :) I was using a generalization, just to give a 'rough' frame of reference, not an exact.
      Your clarification is welcomed. :)

    • @not12listen 3 years ago

      @@kepstin That is true.
      That does not take into account the CPUs, which arguably sell more units than APUs.
      I am not saying that APUs are less important than CPUs, just that they are different and need to be compared separately as their target market is different. That is especially true for the Pro series APUs.

    • @charlese2833 3 years ago

      @@kepstin So AMD copies another Intel mess. Management (backdoor) core and now this.
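The percentage argument from @perforongo9078 in this thread can be checked with a couple of lines of arithmetic. This is only a sketch: the 0.5% worst-case slowdown is the figure quoted in that comment, not a measurement, and the function names are made up for illustration.

```python
def baseline_fps_for_loss(fps_lost: float, slowdown_fraction: float) -> float:
    """Frame rate a system would need for a given slowdown to cost `fps_lost` frames."""
    return fps_lost / slowdown_fraction


def fps_lost_at(baseline_fps: float, slowdown_fraction: float) -> float:
    """Frames per second lost at a given baseline frame rate."""
    return baseline_fps * slowdown_fraction


# Assumed worst case from the comment: ECC costs half of one percent.
SLOWDOWN = 0.005

# Losing 10 FPS at a 0.5% slowdown implies a baseline of about 2000 FPS.
print(round(baseline_fps_for_loss(10, SLOWDOWN)))

# At a more typical 144 FPS the same slowdown costs well under one frame.
print(round(fps_lost_at(144, SLOWDOWN), 2))
```

Under that assumption, a fixed "10 FPS" figure only makes sense at frame rates far beyond what consumer systems reach, which is why the difference is better stated as a percentage.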

  • @adymode 3 years ago +4

    I don't know. We are saying the on-die ECC is significantly less effective because the buses are external, but we don't have any info on what fraction of errors will practically occur on the buses. Don't modern CPU caches all have ECC now? On-die ECC still sounds good to me without more info on how frequently errors occur off the die.

  • @blackdeatroi 3 years ago +12

    3:10 5->6 needs 2 bit flips : )
    101 -> 110
    5 would turn to 1, 4, 7 or (5 + 2^x)

    • @volodumurkalunyak4651 3 years ago

      5 could also turn into 13, 21, 37, 69 or 133 (assuming single byte is used for storing that value)

    • @blackdeatroi 3 years ago +1

      @@volodumurkalunyak4651 which is what i included via (5 + 2^x)
      13 = 5 + 2^3
      21 = 5 + 2^4
      ...

    • @karlmehltretter2677 2 years ago

      you assumed his numbers are stored in ordinary binary representation. maybe he used a different one

    • @mytech6779 a month ago

      You assume the flip was to the final output of a calculation, it could have flipped any number of inputs or intermediate values. Keep in mind that this is RAM, not persistent storage.
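The bit-flip arithmetic discussed in this thread can be reproduced with a short sketch. It assumes, as @blackdeatroi and @volodumurkalunyak4651 do, that the value is stored as a plain unsigned 8-bit integer; the function names are illustrative only.

```python
def single_bit_flips(value: int, width: int = 8) -> list[int]:
    """Every value reachable from `value` by flipping exactly one of `width` bits."""
    return [value ^ (1 << bit) for bit in range(width)]


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# 5 is 0b00000101: one flip gives 4, 7, 1, or 5 + 2**x for the five unused high bits.
print(single_bit_flips(5))      # [4, 7, 1, 13, 21, 37, 69, 133]

# 5 (0b101) and 6 (0b110) differ in two positions, so one flip cannot turn 5 into 6.
print(hamming_distance(5, 6))   # 2
```

As @karlmehltretter2677 and @mytech6779 point out, this only holds for that representation and for a flip in the stored result itself; a flip in an input or intermediate value could still produce a 6.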

  • @TechLevelUpOfficial 3 years ago +18

    okay now give us our RGB ECC memory

  • @t.m.grokas6832 3 years ago +2

    Nice ECC edit at 6:03

  • @andrezunido 3 years ago +13

    I'm finding myself paying extra for high performance hardware without the gamer motif. It has its place, but I prefer my hardware to be clean, functional, performant and minimalist. Not necessarily in that order, but in pc hardware, lights and glitter are not for me. It's good that we still have some choice in the high performance market.

    • @MikkoRantalainen 2 years ago +2

      I agree. I'm still looking for a high airflow case without a window...

  • @gholland5840 3 years ago +25

    I would definitely love to see RGB ECC memory

    • @abhineetsingh12 3 years ago

      GOD NO

    • @jpjude68 3 years ago +1

      oh man, what about RGB hard-drives? or even RGB CPU under the heatsink?

    • @abhineetsingh12 3 years ago +1

      @@jpjude68 please no make it stop

    • @backupplan6058 3 years ago

      @@jpjude68 RGB SSD’s are already on the market.

    • @MarwanMohamed588 3 years ago

      @@backupplan6058 and rgb m.2 lol I have one but I personally hate rgb so I have it just light white

  • @PatricPuola 3 years ago +5

    This video's intro really needed a "What's your JEDEC Specification?"

  • @Vatharian 3 years ago +2

    I'm all for RGB ECC memory! Red LED indicating that module thermally throttled since last power down, blue for encountered and corrected error since last power down and green LED for OK status.

  • @Angmar3 3 years ago +3

    Thanks for the video, I had thought it was going to be the full ecc and not this partial version

  • @orbitalalpha 3 years ago +5

    Has anyone evaluated if on-die ecc would protect/mitigate against row hammer attacks?

  • @lucysluckyday 3 years ago +1

    Loved the 3rd section, where you jumped to basics and explained the fundamental DRAM concepts. It was definitely worth hearing how the errors can occur!

  • @scottylans 3 years ago +6

    I just want ECC on all memory, even if it's slower. In time, they'll develop faster memory, with ECC.

  • @Bobbias 3 years ago +3

    One thing I think you could have talked about is that 1: DDR memory cells are capacitors. They charge up to a certain voltage, then that voltage leaks away. 2: you could remind people that in addition to density, the operating voltage of memory is getting lower and lower, so we have less and less difference between a 1 or a 0, making them more susceptible to bit flips as well.

  • @arshiasoleimany4509 3 years ago +7

    There is one thing. In JESD79-5, on page 155, the new Refresh interval is defined as 32ms. This benefit is linked to less leakage yes, however I believe that the On-Die ECC is a contributing factor in this decrease to refresh interval. Especially considering VPP has decreased.

    • @charlese2833 3 years ago +3

      AFAIK Lower due to more leakage, and lower voltage, being worse, so 2x more refresh is needed. On-die ECC would allow poor chips to achieve 32ms, rather than needing 20ms and being out of spec.

  • @keyboard_g 3 years ago +4

    Linus Torvalds blames Intel for non-ecc memory.

  • @goldnoob6191 3 years ago +2

    Once you go ECC, you never go back. Easy OC, error reporting and quality.
    They also are faster than consumer memory since they can address 2/4 memory chips at a time per channel

    • @bneymanov 6 months ago

      I think when you say they are faster, you are talking about R-DIMM vs U-DIMM? There's ECC U-DIMMs too, fwiw.

  • @mourikogoro9709 3 years ago +24

    You saved me from believing that DDR5 inherently has ECC.

  • @fastflame200 3 years ago +67

    RGB Makes things faster, so the ECC memory will be faster with RGB ;)

  • @jannegrey593 3 years ago +6

    This needs subtitles for both people with hearing problems and foreigners like me who are better at reading English than Hearing it.
    Thanks for bringing the problem to our attention though!

    • @TechTechPotato 3 years ago +7

      Sorry, this video was one of my 'short record and publish', two hours start to finish. No real time in there to get a subtitle track done and added. YouTube's automated ones are slow these days unfortunately.

    • @jannegrey593 3 years ago

      @@TechTechPotato Understandable in this circumstance. I do wish that if you publish an article on anandtech about this or if you will make video of what happened this week, you might include this topic and captions. Because it is really, really important issue. For over a year now - I've been waiting for DDR5 for the sole purpose of it having "native" ECC. Seems like I won't be buying it, because it makes more sense to stick with DDR4 (just buy ECC version) for at least a good year if not more.
      As an early adopter of DDR, DDR2 and DDR3 - I know how prices will be looking in first year.

  • @PanzerfaustParty 3 years ago +1

    Great video Ian! I just signed up for your patreon since I've always found your articles on Anandtech to be really in-depth and I'm really enjoying hearing your insights/analysis into the semi industry with this channel.

  • @ApocDevTeam 3 years ago +1

    Cosmic Bitflip, nice name for a YT channel. Would be nice to hear more about those chips they use in space that have protection against radiation.

  • @807800 3 years ago +1

    It reminds me of that one big YouTuber who said Linus Torvalds's rant regarding Intel's lack of ECC support is mistimed because DDR5 would support ECC by default. Oh, boy....
    thankfully we got people like you here correcting this misinformation.

  • @tinfever 3 years ago +10

    I want ECC RGB memory that lights up red around the dram package that had the error. Then I can overclock my RAM to look at the pretty lights!

    • @domm6812 3 years ago +1

      To infinity, and beyond! 😂

    • @OlliC1981 2 years ago

      That is the best idea i read in a while.

  • @organichand-pickedfree-ran1463 3 years ago +1

    I'm just here for the editing. Jokes aside, I think it adds at least 10% more fun to video. Love it.

  • @Ting3624 3 years ago +1

    On-die ECC could actually provide tighter timings or lower voltage. Think of it like this: tighter timing => errors occur, lower voltage => errors occur, but as long as the number of errors stays below the number of correctable (or detectable, depending on the policy) errors, it could lead to a possible gain.

  • @MartinCHorowitz 3 years ago +1

    The need for ECC is also a function of altitude. On aircraft I used to see data corruption both for data in motion and for RAM data at rest. Flash uses a more robust ECC algorithm for data at rest, so that did much better.

    • @volodumurkalunyak4651 3 years ago

      Flash has to have a much more sophisticated ECC scheme as it wears down from writing data to itself. Flash memory writes are not perfect either (especially with TLC and QLC). Flash-based storage also has to retain data if stored unpowered, with no way to refresh the charge within memory cells. That is why such a complex solution is deployed within flash storage.

  • @JohnLeidegren 3 years ago +1

    I wish this kind of marketing was abandoned. I've been considering ECC just to eliminate a class of memory corruption issues but now you have to wade through terms and definitions that have nothing to do with it. I'm glad I have an Ian to help me out!

  • @Silverhks 3 years ago +1

    Dr. Cutress stepping up his editing game!
    I like it. Especially the Grand Prix effect
    Oh, and thanks for the concise explanation.

  • @geoffstrickler 3 years ago +15

    Nevertheless, on-die ECC is a major benefit to most non-server systems as it provides more reliable memory than traditional non-ECC systems.
    That said, I’ve generally purchased systems that support ECC, however, it’s been nearly impossible to buy laptop systems with ECC support.

    • @rdoursenaud 3 years ago

      The workstation segment has plenty of models supporting ECC (Thinkpad P series comes to mind). Unfortunately, the price tag is a bit overwhelming for most people.

    • @geoffstrickler 3 years ago

      @@rdoursenaud yes, the cheapest Dell laptop with ECC starts at over $1800, with the base CPU, 8GB ECC RAM, and 250GB SSD. It’s over $2k if you bump it to 16GB ECC.

  • @alcatorc 3 years ago +3

    Loved the video! Thanks for the great content as always.
    One nitpicky thing: probably don't need to worry about alpha particles from cosmic rays, those have a penetration depth of microns (in materials) to mms (in air). Main issue would be x-rays, some of which could come from alphas radiating their energy via bremsstrahlung, but it's super unlikely that the alphas could pass through the atmosphere without stopping.

  • @teknoman117 3 years ago +3

    The other way to say this is that DDR5 modules protect from bit flips that occur inside the DRAM chip but NOT in transit. If the error occurs inside the controller or the bus on the board, there's no way to detect it other than a sideband ECC approach (using additional bits of storage for parity).
    I don't think anyone other than maybe the RAM makers really have any data on what percentage of errors are transmission errors versus flips in the DRAM. Considering that most studies show that the number of bit flips from a given module per unit time increases with the age of the ram module and that replacing it (in the same system) has the errors return to baseline would suggest most errors occur within the module.
    Servers will still use extra DRAM chips but it will still drastically lower the amount of errors overall.
    This is still a big deal, and it isn't marketing garbage.

    • @benjiro8793 11 months ago

      Another thing that irks me in this reporting is that memory in transit does have protections built in. The issue is that this is limited to MEM > Controller > Controller > CPU, whereas ECC adds extra protection on top of that layer (at the cost of reduced bandwidth). Memory in Transit "ECC" was a feature of freaking DDR3 to DDR4! Yes, it does not extend down the complete pipeline, unlike "true" ECC, but over time there has been more and more protection added on every part.
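The distinction @teknoman117 draws, between flips inside the DRAM chip and flips in transit, can be illustrated with a toy sketch. It uses a textbook Hamming(7,4) single-error-correcting code purely as a stand-in: real DDR5 dies use a much wider code whose exact scheme is not described in this thread, and the function names are hypothetical.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]                     # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]                     # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]                     # parity over positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code: list[int]) -> tuple[int, bool]:
    """Return (nibble, corrected) after repairing at most one flipped bit."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4             # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, bool(syndrome)


# Case 1: a cell flips while the data sits in the DRAM array.
stored = hamming74_encode(0b0101)
stored[4] ^= 1
value, corrected = hamming74_decode(stored)     # the on-die check repairs it on read
print(value == 0b0101, corrected)               # True True

# Case 2: a bit flips on the bus after the check bits have been stripped.
on_bus = value ^ 0b0010
print(on_bus == 0b0101)                         # False, and nothing downstream can tell
```

In this model, the check bits never leave the chip, so the second flip is invisible to the memory controller. Carrying extra check bits across the bus, as sideband ECC does, is what closes that gap.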

  • @mdesm2005 3 years ago +2

    Sounds like on-die ECC is just for yield. Which means that some memory locations will be on the ragged edge of malfunction, permanently due to manufacturing defects, not from radiation or thermal effects. And this will be "covered up" by on-die ECC.

  • @zzco 3 years ago +10

    It's kinda funny how you keep wanting to say some variant of "secure" or "secured" but actually mean "protected" :p

  • @NickolasGupton 3 years ago +1

    For bit flips, you say that there are only 2 ways, but there is another way that is becoming more common: row hammer attacks. By flipping specific bits in the memory module you can inadvertently flip an unintended bit through voltage leakage.
    This isn't a huge deal for consumer systems, but it's a huge deal for cloud data centers and other multi-tenant systems.

  • @neur303 3 years ago +3

    Another effect for bitflips is capacitive coupling to the other cells.
    Would be nice if you could put out a bit of info on how this affects the different rowhammer exploits.

  • @neferiusnexus 2 years ago +1

    LPDDR5 (LP meaning low-power) has link-ECC, meaning the whole signal chain is error-corrected. LP memory is mostly used in phones, and thank goodness the standard finally made ECC mandatory. Thermal bit-flips are that much more common in phones that might overheat because they're charging while gaming.

  • @ukaszszczesniak7593 2 months ago

    Absolutely great science-based video! We need more transparent technology information like your video, not marketing slogans in technical specifications

  • @PlanetFrosty 3 years ago +1

    I agree Minimum Specification should be to sign up to support this channel!

  • @vibonacci 3 years ago +4

    So On-Die ECC has nothing to do with ECC: it's a completely different solution that vaguely has the same goal.

    • @jonathanmitchell9779 3 years ago +1

      No, that is incorrect. On-die ECC is performing the exact same error correction as on an ECC DDR3/4 DIMM. If a bit gets flipped on both a DDR4 ECC DIMM, and a standard DDR5 DIMM, they will both correct the error using virtually the same correction scheme(parity bits stored on-die/per-die for DDR5; parity bits stored on an additional 'parity data only' die on DDR4 ECC). The distinction between the two is that an ECC DDR4 module requires platform support for ECC over the data bus, whereas the DDR5 module uses ECC *ALWAYS* and it works just fine on a consumer platform out of the box. Is full end to end ECC better? Well sure, of course, but the topic of the video here is 'why ddr5 does not have ECC' which is objectively false. Does 'default' DDR5 correct single bit-flips from cosmic rays using ECC, even without a 'full' ECC platform? Yes, it does. How anyone can perform the logical leap to decide that error correction code in memory is not equal to error correction code in memory is beyond me.

    • @jonathanmitchell9779
      @jonathanmitchell9779 3 years ago

      Not to mention, the 'end to end' portion of full platform ECC exists primarily to correct *hard errors*, which has absolutely nothing to do with the memory cell bit flips referenced in this video. I challenge anyone to find real documentation showing that the data traversing/in-flight on the data bus is affected by radiation/cosmic rays in the same manner or to a similar degree as memory cells. Things like ESD or power surges can cause both, but the point here is that there is a good reason for why we have the two terms 'hard errors' and 'soft errors' as they are two different things.
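
The single-bit correction this thread argues about can be illustrated with a toy Hamming(7,4) code. This is only a sketch: real DRAM ECC uses much wider codewords, and the function names here are made up for illustration. It shows how stored parity bits let the reader locate and undo one flipped bit, wherever the parity bits happen to live.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (list of bits, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused, positions 1..7
    code[3], code[5], code[6], code[7] = d     # data bits go in non-power-of-two slots
    # Parity bit at position p (1, 2, 4) covers every position whose index has bit p set.
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]


def hamming74_decode(bits):
    """Return (data nibble, corrected position); position 0 means no error seen."""
    code = [0] + list(bits)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p                      # failed parity checks spell out the position
    if syndrome:
        code[syndrome] ^= 1                    # flip the faulty bit back
    d = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(d)), syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                                   # simulate a bit flip at position 5
data, position = hamming74_decode(word)
print(bin(data), position)
```

The same decode step works whether the flip happened in a memory cell or on the wire, which is why where the check happens (on the die or at the memory controller) decides which errors are covered.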

  • @tomdchi12
    @tomdchi12 3 years ago +2

    Ah well... It sounded too good to be true that we'd all be getting ECC everywhere with DDR5. I appreciate the explanation and particularly that you clarified that 1TB of system RAM was the order of magnitude where we'll need to start worrying about it for "everyday" use. That said, 64GB is only 16x smaller than that, so we aren't too far away from that day. (Yes, I have a lot of tabs open in Chrome.)

    • @ttb1513
      @ttb1513 1 year ago

      More precisely, if 1 or more bits in 64 being read wrong is rare, like 1% of the time, then on-die ECC is in fact offering some (99%) protection against bit flips that "normal" ECC modules protect against. The thing it does NOT do is protect against errors on the high speed bus to the CPU. Those errors would go undetected and uncorrected if you only have on-die ECC.

  • @Trick-Framed
    @Trick-Framed 3 years ago

    It is so nice to see this channel exploding in growth (compared to other tech channels). I know it's early, but congratulations, Ian. Looks like this is going to be a big W for you and for us, the viewers.

  • @namibjDerEchte
    @namibjDerEchte 3 years ago +3

    Now I wonder what he said instead of the overdubbed "protected"...

    • @TechTechPotato
      @TechTechPotato  3 years ago +1

      I said secure. I kept saying secure. It's not the right word in this context.

  • @sbrubak
    @sbrubak 3 years ago +3

    To my knowledge, alpha radiation from radionuclides in the molding material is a significantly higher source of soft errors than cosmic rays.

    • @charlese2833
      @charlese2833 3 years ago

      That used to be the case. They claim to have solved that.

    • @sbrubak
      @sbrubak 3 years ago

      @@charlese2833 A chipmaker can specify low-activity mould compounds that reduce the alpha emissions by about one order of magnitude, but that is still more than the activity caused by cosmic rays.

  • @besssam
    @besssam 3 years ago +1

    Finally. I honestly lost count of how many times I tried to explain this to people; at least now I can just say "here, watch this".

  • @ChrisDupres
    @ChrisDupres 3 years ago +1

    Thank you so much for clearing this up. People are expecting 72-bit channels and getting 64.

    • @ppokorny99
      @ppokorny99 3 years ago

      DDR5 could be 2x40, double pumped for 160 bits wide equivalent to DDR4
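
Where the 72 and 40 in this thread come from can be sketched with the standard Hamming bound for a single-error-correcting, double-error-detecting (SEC-DED) code. The helper name is hypothetical, and the assumption is a plain extended Hamming code; actual module layouts may round up to fit chip widths.

```python
def secded_check_bits(data_bits):
    """Check bits needed by an extended Hamming (SEC-DED) code for data_bits of data."""
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1 distinct syndromes.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit adds double-error detection


for width in (16, 32, 64):
    extra = secded_check_bits(width)
    print(f"{width} data bits + {extra} check bits = {width + extra} bits")
```

Under that assumption a 64-bit channel needs 8 check bits (72 total), while a 32-bit subchannel needs 7, so a 40-bit subchannel leaves one bit spare.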

  • @porina_pew
    @porina_pew 3 years ago +3

    What proportion of memory errors are on die vs in transit?
    Also looking forward to counting memory channels with DDR5

    • @arshiasoleimany4509
      @arshiasoleimany4509 3 years ago +1

      Completely depends on a few factors. Namely the ODT, memory topology, memory subsystem and motherboard all are contributing factors for the transit. Most errors aren’t too large an issue on die.

  • @solidreactor
    @solidreactor 3 years ago +4

    Would be nice to have a techtechpotato RAM video including registered memory. One thing I have "heard" (not sure if it's true) is that reg (ECC reg?) can be faster than non-reg memory IF you are using several DIMMs per channel. Is this true perhaps? Would be amazing if someone like you could take the time to explain, confirm or debunk these "rumors".

    • @namibjDerEchte
      @namibjDerEchte 3 years ago

      Yes, RDIMMs can run faster, but have a penalty of iirc 2 in most important latencies.
      This only matters when you get above 32GB per channel.

  • @GodSEndOMG
    @GodSEndOMG 3 years ago +1

    Great knowledge. On-die ECC on RAM will increase RAM comfortabilitet, which will help a lot.

  • @ColinTimmins
    @ColinTimmins 3 years ago +3

    Thanks for properly explaining "On-Die ECC", the more you know... =]

  • @cornchipzzzz
    @cornchipzzzz 3 years ago +3

    I guess the new DDR5 ECC will still be a lot better at handling cosmic rays, assuming a ray only flips one correctable bit.

    • @sporegnosis
      @sporegnosis 3 years ago

      To be fair, the bit flips happen 99% on the DDR side and almost never on the bus pathways or CPU, but even the 0.0001% will get you eventually.

  • @uncleelias
    @uncleelias 3 years ago +1

    I would like RGB that lit up every time a cosmic particle hit the module or a bit needed correcting

  • @robertsneddon731
    @robertsneddon731 3 years ago +1

    The Threadripper Pro CPUs from AMD support ECC DDR4 RAM since they're intended for desktop workstation systems where 128GB of RAM is a minimum specification, a "good start" at best. A lot of these sorts of workstations are doing production work for things where a data bitflip wouldn't matter, like video or audio or image processing but there are other tasks, for example mathematical modelling or engineering where a single data bitflip early in the chain of number-crunching can cause a noticeable error after a few million iterations.
    I remember the Old Days when early minicomputers could be specced with SEC-DED memory (single-error correction, double-error detection): they had 22 bits of RAM to store 16 bits of data, with the other 6 bits used as check bits. This was due more to the manufacturing processes of the time and high failure rates of components than to cosmic rays and the like causing bit-flips.

  • @Astronomine
    @Astronomine 3 years ago +1

    7:21 - 7:29 did you just voiceover "safe" with "protected" 🤣 That felt like a bit flip in my brain

    • @TechTechPotato
      @TechTechPotato  3 years ago +1

      It was 'secured', but that's the wrong word as it implies 'security'.

  • @whismerhillgaming
    @whismerhillgaming 3 years ago +1

    If you read this, a few questions:
    - Is it impossible to overclock ECC RAM? Could you get tighter timings while keeping ECC working (even if out of spec)? (Sure, there would be a perf penalty from ECC anyway.)
    - Are ECC error corrections reported to the OS or logged? This could be insanely powerful for overclocking and stability, with less need for stability testing tools.
    => If your logs fill up with errors, you probably went too far in pushing your RAM.
    - Or alternatively, if the RAM is only "factory overclocked", it could indicate that this memory isn't able to maintain XMP settings, giving a reason for an RMA.
    - Additionally, I'm on the "ECC would be great for diagnosing" bandwagon.
    I had a failing AMD Ryzen 5 3600 for months.
    It produced random errors at idle, but never under stress load, which made it very difficult to diagnose,
    and for a long time I thought it was a memory problem or a motherboard problem.
    On average 3 errors per day... this made it even more difficult to diagnose.
    I finally found the proof: disabling CPB would stop all crashes altogether, and then AMD made no difficulties for a very fast RMA.
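
On the logging question: on Linux, when the platform's EDAC driver is loaded, corrected and uncorrected error counters are exposed per memory controller under sysfs. A minimal sketch of reading them follows; the function name is made up for illustration, and whether the counters exist at all depends on CPU, board and kernel support.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": n}} from EDAC sysfs files."""
    root = Path(root)
    counts = {}
    if not root.is_dir():          # no EDAC driver loaded, or not a Linux system
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            counts[mc.name] = {
                "corrected": int((mc / "ce_count").read_text()),
                "uncorrected": int((mc / "ue_count").read_text()),
            }
        except (OSError, ValueError):
            continue               # skip controllers with missing or unreadable counters
    return counts


for name, c in read_edac_counts().items():
    print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

A corrected count that climbs after tightening timings is the kind of signal the comment is asking for: the overclock is marginal even though nothing has crashed yet.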

  • @he1go2lik3it
    @he1go2lik3it 20 days ago

    Well described. They should have given on-die ECC a completely different name; that would make it much easier to understand and differentiate.

  • @TheDoomerBlox
    @TheDoomerBlox 3 years ago +1

    RGB ECC memory would be good if the colour would change after an error has been corrected, so that we can see how (little) it impacts the memory operation. :^ )

  • @hightechsystem_
    @hightechsystem_ 1 month ago

    I see the concern you are raising. If they added on-die ECC but didn't reduce their manufacturing quality, then yes, you're getting an uptick in reliability. The problem is that the ECC 'may already' have been used just to ensure normal operation, resulting in no additional redundancy for that row/col/byte which needed ECC to pass minimum spec levels. I know some DDR SDRAM memory vendors added additional bits to provide redundancy (turn off an entire column and use the spare column), but that's more expensive than using ECC in this way. Thanks for clarifying the behind-the-scenes design decisions.

  • @thestrykernet
    @thestrykernet 3 years ago

    I've said it before and will probably say it again, but thank you so much for putting these information videos out. Explanations for things which the industry treats as murky at best are something I feel has been sorely missing in the tech YT space.
    I know you've gone through it before, and explained it on Anandtech, but I still think individual videos are so important due to how many people choose to get their information this way.

  • @dcviper985
    @dcviper985 3 years ago +1

    One of my physics profs at uni was working on magnetic spin based memory.

  • @0LoneTech
    @0LoneTech 3 years ago +1

    On-die ECC should deal with the majority of flips that ECC covers today. It's much less common to have memory bus errors than storage errors. As a secondary point, there's no true requirement for ECC in general to worsen read latency; you could do speculative execution on the values read in parallel with the error detection, and invalidate and reissue those operations from the pipeline only if an error actually occurs. Write latency would still be affected, though to a much smaller extent than it already is by memory bus designs that can't operate efficiently in smaller blocks than a cache line. One advantage of on-die ECC would be that it could integrate scrubbing in the refresh cycles, fixing errors long before the CPU actually reads the data, and therefore vastly reducing the risk that a second error causes corruption without spending any bus time on the task.
    Of course, whether manufacturers actually do these things is a different story. For instance, real ECC memory is exactly the same memory, just wider; there's absolutely no reason for it to affect timings including the maximum frequency. That's basically marketing wank. The CPU manufacturer just doesn't want to sell you the overclocking and error correction functions together. On the other hand, you can have memory modules that *fake* ECC, protecting *only* the transfer and not the storage. Those would be slower and more complicated.
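
The scrubbing argument above can be sketched as a small simulation. The model is an assumption, not a description of any real DRAM: every word can correct one flipped bit, a word is lost once it holds two uncorrected flips, every flip hits a different bit, and a scrub pass repairs all single-bit errors at once.

```python
import random


def count_lost_words(flips, n_words, scrub_every=None):
    """Count words that collect two flips before a scrub pass can repair them.

    flips is a time-ordered list of (time, word_index) single-bit upsets.
    scrub_every is the scrub interval, or None for no scrubbing at all.
    """
    pending = [0] * n_words            # uncorrected flips currently sitting in each word
    lost = set()
    next_scrub = scrub_every
    for t, word in flips:
        while scrub_every is not None and t >= next_scrub:
            pending = [0] * n_words    # scrub: correct and write back every word
            next_scrub += scrub_every
        pending[word] += 1
        if pending[word] >= 2:
            lost.add(word)             # second flip arrived before the first was fixed
    return len(lost)


rng = random.Random(0)
flips = sorted((rng.uniform(0, 1000), rng.randrange(256)) for _ in range(40))
print("no scrubbing:", count_lost_words(flips, 256))
print("scrub every 10 time units:", count_lost_words(flips, 256, scrub_every=10))
```

For the same sequence of upsets, scrubbing can only shrink the set of lost words, because any two flips that land inside one scrub interval would also have accumulated without scrubbing.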

  • @albertob.4638
    @albertob.4638 2 years ago

    This was hands down the best explanation I've found on this topic. Thank you!

  • @creed5248
    @creed5248 3 years ago

    I kind of had an idea of how everything worked but was in error about the underlying reasons. Thanks for the explanation!

  • @unvergebeneid
    @unvergebeneid 3 years ago +2

    But won't the majority of bit flips actually happen on the die and not during the short transfer window?

  • @andrekz9138
    @andrekz9138 3 years ago +1

    Just a heads-up for everyone watching full screen from their computer: the BSOD @3:01 is just a part of the video. Cheers!

  • @doryiii
    @doryiii 3 years ago +1

    So my takeaway from this is: DDR5 on-die ECC is not as good as the ECC DDR4 modules we have right now, but still better than current non-ECC DDR4 consumer modules, right? Since current non-ECC DDR4 doesn't even offer protection against memory cell bit flips.

  • @Trumanlol86
    @Trumanlol86 3 years ago

    Thanks Ian! I have not heard anyone talk about this before now. Great explanation!

  • @swenic
    @swenic 3 years ago +3

    Using regular binary you'd need two bitflips to change a five to a six
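
The point about five and six can be checked by counting differing bits (the Hamming distance) between the two values. A short sketch, with an illustrative function name:

```python
def hamming_distance(a, b):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


print(hamming_distance(5, 6))  # 0b101 vs 0b110 differ in two bits → 2
print(hamming_distance(5, 4))  # 0b101 vs 0b100 differ in one bit → 1
```

So a single flip can turn a 5 into a 4 or a 7, but not into a 6.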

  • @kepstin
    @kepstin 3 years ago +1

    One annoying thing about DDR5 modules is that since they have two 32-bit channels instead of one 64-bit channel like DDR4, you need twice as many extra dram chips to enable ECC :/ (ECC modules will have multiples of 10 chips instead of multiples of 9 chips). This is annoyingly gonna raise the price premium of ECC memory even more :(
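
The chip counts in this comment follow from simple bus-width arithmetic. The sketch below assumes x8 DRAM chips and 8 ECC bits per channel or subchannel, which matches the multiples of 9 and 10 quoted above; other chip widths give different counts, and the helper name is hypothetical.

```python
def chips_per_rank(data_bits, ecc_bits, subchannels, chip_width=8):
    """DRAM chips needed for one rank, given per-subchannel data and ECC widths."""
    total_bits = subchannels * (data_bits + ecc_bits)
    chips, remainder = divmod(total_bits, chip_width)
    if remainder:
        raise ValueError("bus width is not a whole number of chips")
    return chips


ddr4_ecc = chips_per_rank(64, 8, subchannels=1)   # one 72-bit channel
ddr5_ecc = chips_per_rank(32, 8, subchannels=2)   # two 40-bit subchannels
non_ecc = chips_per_rank(64, 0, subchannels=1)    # plain 64-bit module
print(ddr4_ecc - non_ecc, "extra chip for DDR4 ECC,", ddr5_ecc - non_ecc, "for DDR5 ECC")
```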

  • @domm6812
    @domm6812 3 years ago

    Thank you for clarifying this Ian! ...and for the duck at the end 😊

  • @AB-fp8xo
    @AB-fp8xo 3 years ago +2

    You forgot "row hammer" attack that can cause bit flips.

  • @happygimp0
    @happygimp0 3 years ago +2

    NOOOOOO. I really want to have ECC everywhere, I want data integrity and stability.
    I have a workstation with ECC, and I never want to have a PC without it anymore.

  • @MrHaggyy
    @MrHaggyy 3 years ago +1

    RGB for ECC could in fact be useful. If you count your bitflips in a module and colour code them, a guy in a datacenter could swap bad modules without any further knowledge about the whole system. But I doubt the added cost per RAM module will be lower than letting the tech guy work longer or exchanging the entire rack once you can't live with a RAM module anymore.

  • @J0k3r399
    @J0k3r399 3 years ago

    DDR5 does actually support transport security with its (optional) read & write CRC8 functionality without using full ECC, but your point about the end-to-end security still stands.

  • @dod-do-or-dont
    @dod-do-or-dont 3 years ago +1

    Thx, I was actually thinking that all DDR5 would have ECC out of the box.

  • @shariarrahman7562
    @shariarrahman7562 3 years ago

    Would love a much deeper dive into mainline ECC memory, especially about how the module ECC is different. I don't know if you're ever going to see this comment, Dr. Ian, but a video dealing with that subject matter would be very much appreciated.

  • @SytheZN
    @SytheZN 3 years ago +1

    What I want is RGB where it's useful. Think ethernet port link light - would be nice to see at a glance the negotiated link speed & duplex with a bit more granularity than on/off.

  • @woopsserg
    @woopsserg 1 year ago +1

    ECC (unbuffered) does not reduce performance at all. It's just that such memory does not get factory overclocked. Most consumer memory is simply marketed with, and has SPD profiles for, a higher speed than its RAM chips are rated for (speed grade marking). You can overclock ECC memory just as well as usual RAM, given it has chips which overclock well. Actually, I have overclocked ECC RAM in my PC; the difference is that I can monitor RAM stability long term. With a certain overclock I got like 1-2 corrected errors in a month, which you most likely wouldn't be able to figure out by running tests on usual RAM.

  • @windkit0124
    @windkit0124 3 years ago

    We do need real journalists like you to keep the quality!

  • @mr.potato9449
    @mr.potato9449 3 years ago +2

    Could you have something implemented so that the memory runs at JEDEC spec giving you ECC, but when you launch a game it changes to XMP or whatever you have the OC set to, so it runs faster just with no ECC verified in game?

  • @varno
    @varno 3 years ago +2

    Um, since DDR4 all memory has had in-transit error detection on write, and retransmission. It is called write CRC. DDR5 extends this to read as well, and so along with on-chip ECC and retransmission this gives you everything in ECC.

    • @J0k3r399
      @J0k3r399 3 years ago

      Hey, do you work in the industry? I looked into this a while back out of curiosity, but I couldn't find if the CRC support is actually mandatory. I doubt it is, since it apparently incurs a 25% bandwidth penalty.

    • @varno
      @varno 3 years ago

      @@J0k3r399 I don't work in industry. The CRC support is an option that can be used or not. I don't think it is mandatory in the sense that it is always used, but I do think it will always be there. It costs too much to make different chips. It does have a bandwidth penalty.
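
How a CRC catches in-transit errors can be sketched with a generic bitwise CRC-8. The polynomial 0x07 (x^8 + x^2 + x + 1) and the byte-oriented framing are assumptions for illustration, not details taken from the JEDEC specifications. Note that a CRC only detects an error, so the transfer has to be retried, whereas ECC can correct in place.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift left; if the bit shifted out was set, fold in the polynomial.
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


burst = bytes([0xDE, 0xAD, 0xBE, 0xEF])
sent_crc = crc8(burst)                           # sender appends this to the burst

corrupted = bytes([0xDE, 0xAD, 0xBE, 0xEE])      # one bit flipped on the bus
if crc8(corrupted) != sent_crc:
    print("CRC mismatch: request retransmission")
```

The extra CRC bits ride along with every burst, which is one way to see where a bandwidth penalty like the one discussed above would come from.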

  • @andarvidavohits4962
    @andarvidavohits4962 3 years ago +2

    This was great. Subscribed.

  • @TechyySean
    @TechyySean 3 years ago +2

    my minimum specification is that you should have 10x as many subs

  • @thebosss435
    @thebosss435 3 years ago +3

    so it's a manufacturing extra bit so you can have a faulty one

  • @curb_shifter
    @curb_shifter 5 months ago

    thanks for explaining this quick n easy. learned something today