Zen 5 Deep Dive: The Tech Poutine

  • Published: 6 Sep 2024

Comments • 52

  • @JOAOPENICHE
    @JOAOPENICHE 1 month ago +64

    3 hours needs timestamps

  • @couldntfindafreename
    @couldntfindafreename 1 month ago +6

    2:04:00 What they don't advertise here is that models need to be adjusted to work well with those shared exponents, which is non-trivial. Also, quantization of weights (what consumes most of the RAM) is not the same as quantization of activations (calculations, KV-cache). So the effect of this on inference quality is non-trivial.
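
To make the shared-exponent point above concrete, here is a minimal Python sketch of block quantization where every element in a block shares one power-of-two scale. It is an illustration only, not the exact format discussed in the video: the function names `quantize_block` and `dequantize_block`, the 8-bit signed integer mantissa, and the sample values are all assumptions made for this example.

```python
import math

def quantize_block(values, mantissa_bits=8):
    """Quantize a block of floats to signed integers that share one power-of-two exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Shared exponent e chosen so that max_abs / 2**e lies in [0.5, 1).
    shared_exp = math.frexp(max_abs)[1]
    limit = 2 ** (mantissa_bits - 1) - 1
    scale = 2.0 ** shared_exp / 2 ** (mantissa_bits - 1)
    # Each element keeps only a small signed integer, clamped to the mantissa range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared_exp, mantissas

def dequantize_block(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from the shared exponent and integer mantissas."""
    scale = 2.0 ** shared_exp / 2 ** (mantissa_bits - 1)
    return [m * scale for m in mantissas]

# A well-behaved block: the small value 0.01 is already rounded up to 0.015625.
exp_a, mant_a = quantize_block([1.0, 0.5, 0.01, -0.75])
print(exp_a, mant_a, dequantize_block(exp_a, mant_a))

# One outlier (100.0) raises the shared exponent and wipes out the small values.
exp_b, mant_b = quantize_block([100.0, 0.5, 0.01, -0.75])
print(exp_b, mant_b, dequantize_block(exp_b, mant_b))
```

The second block shows why a model may need adjusting: a single large value sets the scale for its neighbours, so the smaller elements in the same block lose most of their precision.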

  • @amibaabi
    @amibaabi 1 month ago +12

    1:41:08 A major design principle of caches ($) in the multicore era not mentioned by Cheese and Ian here is that when a core demands data living inside another core's private cache, we want to keep the interruption to a minimum. An L2 inclusive of L1 shields the core from such disruption. Intel used to make L3 inclusive of L1, which achieves a similar effect, but a large multicore L3 is so expensive that they had to cut the size since Skylake-X, and that led to abysmal performance. To compensate they went the same route as AMD, making L2 inclusive and L3 exclusive. Then they went bonkers, decided it's not important, and left L1 exposed. Now their cores are trash for parallelized code with competitively shared data structures.

    • @couldntfindafreename
      @couldntfindafreename 1 month ago +2

      I have a feeling CPU designers are forced to solve issues at the hardware (or microcode) level which should rather be handled at the OS, compiler or even programming language level these days. Software is lagging behind, so hardware has to be over-complicated and cannot be optimized to its full potential. This is one of the things which developers need to fix in the future.

    • @amibaabi
      @amibaabi 1 month ago +1

      While I agree many issues could've been solved efficiently in SW, this is not one of those. Even GPUs care about data sharing latency...
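
The shielding effect described in this thread can be sketched with a toy Python model. It is a deliberate simplification under stated assumptions: caches are plain sets of line addresses, there is no separate snoop filter or duplicate L1 tag array, and the function name `l1_probes` and all addresses are made up for illustration.

```python
def l1_probes(snoops, l1_lines, l2_lines, l2_inclusive):
    """Count how many external snoops have to disturb the core's L1."""
    if l2_inclusive:
        # Inclusion property: every line held in L1 is also held in L2.
        assert l1_lines <= l2_lines
    probes = 0
    for addr in snoops:
        if l2_inclusive:
            # An L2 tag miss proves the line is not in L1, so the snoop stops at L2.
            # (Conservative: an L2 hit is always forwarded to L1 in this toy model.)
            if addr in l2_lines:
                probes += 1
        else:
            # Without inclusion the L2 tags say nothing about L1 contents,
            # so every snoop must also check L1.
            probes += 1
    return probes

l1 = {0x40, 0x80}
l2 = {0x40, 0x80, 0xC0, 0x100}
snoops = [0x40, 0x200, 0x240, 0x280, 0xC0]

print(l1_probes(snoops, l1, l2, l2_inclusive=True))   # 2
print(l1_probes(snoops, l1, l2, l2_inclusive=False))  # 5
```

In this model the inclusive L2 absorbs three of the five snoops, which is the kind of interruption the comment wants kept away from the core.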

  • @JunaidHasan23
    @JunaidHasan23 1 month ago +9

    Wow Potato and Cheese. Great combo 👍

  • @BustaaNut431
    @BustaaNut431 1 month ago +3

    Any plan to distribute this in an actual podcast form?

  • @dirg3music
    @dirg3music 1 month ago +10

    Couldn't agree more with your Cinebench comment, I feel the same way about Geekbench given the way it overwhelmingly weighs memory speed. Look at the scores of a 7800X3D vs a 5950X, no other multithreaded workload will reflect that score. lol. I've always felt that individually, benchmarks are useless, but can give you a ballpark idea of what to expect when you aggregate a lot of them and see where the scores fall.

    • @chengong388
      @chengong388 1 month ago

      I’m sorry are we comparing multithreaded performance between an 8 core and a 16 core part?
      What am I supposed to get out of this?

    • @dirg3music
      @dirg3music 1 month ago +8

      @chengong388 the point was that Geekbench has the 7800X3D outperforming the 5950X in multicore, which is pretty obviously false. The takeaway is that Geekbench is very flawed in its scoring, despite its prevalence. All other benchmarks have their own wonkiness too of course, which is why I agreed with Ian's comment about Cinebench. Really don't see what the confusion here is. lol
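
One common way to aggregate many benchmarks, as suggested in this thread, is the geometric mean of per-benchmark score ratios. The sketch below is a minimal illustration; the function name `geometric_mean` is chosen for this example and the ratios are placeholder numbers, not measurements of any real CPU.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive ratios (e.g. chip A score / chip B score per benchmark)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Placeholder ratios: A is 2x faster in one test, 2x slower in another, equal in a third.
ratios = [2.0, 0.5, 1.0]

print(sum(ratios) / len(ratios))  # arithmetic mean, which suggests A is ahead overall
print(geometric_mean(ratios))     # geometric mean, which treats 2x and 0.5x symmetrically
```

The geometric mean is a reasonable choice here because a single benchmark that heavily favours one part (memory speed, for example) cannot dominate the aggregate the way it does in a plain average.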

  • @mathieubelloir1098
    @mathieubelloir1098 1 month ago +2

    Could you add timestamps for these very long interviews?

  • @coa8109
    @coa8109 1 month ago +3

    I just want to comment on 54:43. I feel like we are reaching diminishing returns of branch prediction. I might be misremembering but papers in the 80s were saying branch prediction could only bring at most a 10x IPC increase. Perhaps we're way past the halfway point on the logistic curve. At some point we need to rethink the way we're writing code to incorporate OoO execution

    • @davidgunther8428
      @davidgunther8428 1 month ago +1

      Branch prediction is about reducing memory access delays.

    • @kazedcat
      @kazedcat 1 month ago +2

      The point of Branch Prediction is to reduce bubbles. Yes they improve IPC by filling in the Reorder Buffers. But improving the reorder window is a more direct way of increasing IPC.

    • @coa8109
      @coa8109 1 month ago +1

      @kazedcat thanks for the clarification! I was just thinking that the TAGE predictors could already reach pretty high accuracy (I remember something like over 90% of branches in some benchmark) and probably runahead prediction will bring the prediction accuracy up even further. At some point I'm wondering if we should still focus on branch prediction rather than, say, more instruction hinting or exposing more hardware details to the compiler?

    • @kazedcat
      @kazedcat 1 month ago

      @coa8109 Yes, branch prediction is now pretty damn accurate, but its importance is now somewhat reduced because of the op cache. If the correct instructions are stored in the op cache, it bypasses the long fetch/decode pipeline, so an incorrect prediction is not as impactful as before when there was no op cache. Now improving IPC for a modern CPU is down to how many transistors you can allocate for the op cache and reorder buffer. Predication is not a solution because branch predictors have runtime statistical data that is not available to compilers. Zen 5 has dual fetch/decode so it can even fetch and decode both taken and not-taken paths at the same time. You don't need a hint on which path when you can just decode both paths.
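
For readers following the accuracy discussion in this thread, here is a minimal Python sketch of a textbook 2-bit saturating counter predictor. It is far simpler than the TAGE-style predictors mentioned above and is not a model of Zen 5; the function name `predict_accuracy` and the branch patterns are assumptions chosen for illustration.

```python
def predict_accuracy(outcomes):
    """Fraction of branches a single 2-bit saturating counter predicts correctly."""
    counter = 2  # 0-1 predict not taken, 2-3 predict taken; start at weakly taken
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)

# A loop branch: taken 9 times, then falls through once, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(predict_accuracy(loop_branch))  # 0.9

# A strictly alternating branch defeats this simple scheme.
alternating = [True, False] * 50
print(predict_accuracy(alternating))  # 0.5
```

Even this tiny predictor reaches 90% on a regular loop, which hints at why the remaining gains come from the hard, history-dependent branches that need much larger structures.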

  • @predabot__6778
    @predabot__6778 1 month ago

    Interesting! I had no idea that Zen5's branch-predictor is actually revolutionary in some sense! Will be interesting to see what this means for vulnerabilities though - could it increase the target-area for Spectre-like attacks?

  • @favalon79
    @favalon79 1 month ago +2

    Is it a cooking video or a tech talk? :)

  • @jorry1992
    @jorry1992 1 month ago +4

    Would be lovely if you could improve audio. Keep up the good work!

    • @ArdgalAlkeides
      @ArdgalAlkeides 1 month ago

      Yeah it's literally hurting my ears trying to listen to this

  • @lugaidster
    @lugaidster 1 month ago +3

    Zen 5 starts at 42:45

  • @ChrisJackson-js8rd
    @ChrisJackson-js8rd 1 month ago +1

    what happened was 4 nodes in 3 years or whatever lol
    they kept the schedule
    rather than disappointing shareholders over something that was possibly or maybe even probably not going to be an issue - despite not fully understanding the nature of the challenge
    sometimes when you roll the dice you lose. even where it's a sure thing
    addendum: Intel does selective destructive testing of sample dies in order to map the quality across the wafer, combined with basic validation of all other dies from the wafer, in a well understood and reliable process that has been developed over decades and gives a very good measure of the relative quality, performance, and lifespan of the part. when you're on a new node and there have been acknowledged issues with contamination.... the issue is the baseline that all this data is related to. and this is where things become subjective and based on the experience and intuition of the engineers. at the end of the day there will be a calculation made based on the confidence interval of the testing as related by the engineers and the acceptable level of risk as determined by the financial and accounting officers of the company. at Intel the acceptable level of risk has increased greatly in recent years. i can't say they were wrong to release these parts, but i am quite confident that 5 or 7 years ago they would have baulked at this level of risk.
    and also yea it's like the news is whatever's bad in the world. if the economy is doing well every headline is inflation and prices, then as it starts to slow every headline is about jobs and wages. all of a sudden the consumer of the news is required to infer the unspoken and ignored side of the coin. and most people either can't or would rather not because nihilism is easy+cool.

  • @arteeFartee-e9
    @arteeFartee-e9 1 month ago

    George's setup 💯 looks great, and the audio too. This is a good idea for YT Livestream. Wendell should be GRAVY yes!

  • @csollermoller
    @csollermoller 1 month ago

    very nice guys, understood half of it, but still enjoyed it a lot.

  • @henrikoldcorn
    @henrikoldcorn 1 month ago +1

    Make sure your computer gets at least 4 litres of PCIe to drink per day; 8 in hot weather.

  • @brunosalezze
    @brunosalezze 1 month ago

    About the greater gains with future software: do you think just compiling existing code with an updated compiler (with the correct Zen 5 cost table) will give most of the benefit of the architecture? Or are these gains supposed to be for those programming at the lowest level?

  • @couldntfindafreename
    @couldntfindafreename 1 month ago

    2:49:00 I think it is better to wait until the X3D variants become available. By that time the X ones and the new motherboards/BIOSes will have their kinks worked out.

  • @florianb.1382
    @florianb.1382 1 month ago +2

    "All this for 5-10%"

  • @ChrisJackson-js8rd
    @ChrisJackson-js8rd 1 month ago +2

    what you guys gonna ask the qualcomm presenters when they present snapdragon at hotchips

  • @csollermoller
    @csollermoller 1 month ago +1

    timestamps pls!!!

  • @linuxgeex
    @linuxgeex 1 month ago

    Best comparison of Int vs FP is Arithmetic vs Math. Integer is used for Arithmetic. That's counting tasks. It's good for program logic and accounting tasks. FP is used for Math. That's mostly trigonometry and Fourier transforms - physics simulations and multimedia.

    • @peterfireflylund
      @peterfireflylund 1 month ago

      “Integer” is very much address generation, too.

    • @linuxgeex
      @linuxgeex 1 month ago

      @peterfireflylund Exactly. It's foundational program logic based on arithmetic. FP is based on Int as well.. the mantissa ops are done via hard-coded address generation into tables and addition, and the exponent is straight integer. If you boil it down it's ultimately all digital logic ops, and the digital logic ops are ultimately based on analog tank circuits lol... and there's a dozen analog ops backing every digital one, and those ops are much more like FP than they are like Int... and then below that there's the quantum physical interactions that are even more like FP in many dimensions. And who knows what's below that lol.
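
The idea that a floating-point value is built from integer fields can be shown with a short Python sketch that splits an IEEE 754 single-precision number into its sign, exponent, and fraction bits. The function name `decompose_float32` is made up for this example, and it says nothing about how any particular FPU implements its mantissa operations.

```python
import struct

def decompose_float32(x):
    """Split an IEEE 754 single-precision value into (sign, biased exponent, fraction) integers."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction

print(decompose_float32(1.0))      # (0, 127, 0)
print(decompose_float32(-2.5))     # (1, 128, 2097152)
print(decompose_float32(0.15625))  # (0, 124, 2097152)
```

Note that -2.5 and 0.15625 share the same fraction field (1.25 in both cases) and differ only in sign and exponent, both of which are plain integers.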

  • @lenmetallica
    @lenmetallica 1 month ago +4

    Where's the gravy?

    • @TechTechPotato
      @TechTechPotato 1 month ago

      Any guest that comes on will be the gravy

    • @lenmetallica
      @lenmetallica 1 month ago

      @TechTechPotato mmmmm guest gravy

  • @cannesahs
    @cannesahs 1 month ago

    Please, someone help George fix that horrible audio. Is he eating the mic? I have no idea what is wrong there, but it has been really horrible to listen to in (almost) all videos.

  • @CjqNslXUcM
    @CjqNslXUcM 1 month ago

    can you make this a podcast I can subscribe to on a podcast app?

  • @pistonsjem
    @pistonsjem 1 month ago

    How dare you talk about the leading authority in computer hardware reviews like that

  • @egalanos
    @egalanos 1 month ago

    Ian: your audio was out of sync again @16:49 (at least during the moment zoomed in just on you whilst George sorted out microphone issues. Not sure if before/after that as I wasn't watching)

    • @TechTechPotato
      @TechTechPotato 1 month ago +1

      Nothing wrong here? It's pulling the AV1 for me

    • @egalanos
      @egalanos 1 month ago

      @TechTechPotato hmmm, I definitely still see it out of sync on my phone, which is playing the VP9 & Opus.
      Debug info:
      {"cplatform":"mobile","cff":"SMALL_FORM_FACTOR","c":"android","cplayer":"PLATYPUS","cmodel":"Pixel 8 Pro","cos":"Android","soc":"Google:Tensor G3","cver":"19.29.37","cbrand":"Google","cbr":"com.google.android.youtube","csdk":"34","cbrver":"19.29.37","cosver":"14.AP2A.240705.005.11942872","videoid":"Wl3O2eXy-cg","cpn":"FWHq--_dz3LawA5u","fmt":"247 vp9 1280x720@30","afmt":"251:CggKA2RyYxIBMQ opus","bh":49003,"conn":3,"volume":52,"loudness":"-3.590","bat":"0.690:0","df":"26\/122160","time":"2024-07-25T09:23:46.503Z","glmode":"RECTANGULAR_2D","drm":"","mtext":"G","error":"No errors","logged_in":"1"

  • @TheDaysOfAi
    @TheDaysOfAi 1 month ago

    I think intel should own the "intel economy core" by branding and releasing the "Intel i Economy 7 CPU" cheaper electric from 12am to 7pm from my energy supplier lol

  • @theexplosionist2019
    @theexplosionist2019 1 month ago

    vpintersect is useless. Where's cldemote?
    It's disappointing there are no new instructions. These instructions would be useful:
    e.g. vpdivq vector integer division, vpleaq vector lea, vpsortd vector dword sort, vpmadd64l/huq 64-bit iFMA ?

  • @ArdgalAlkeides
    @ArdgalAlkeides 1 month ago +1

    The audio just keeps going from bad to worse throughout this, why didn't you do a basic test of how the sound setup is, record 30s of you talking shit and check how it sounds like before going live? It's literally straining for me to try and hear what is being said with the volume differences, the distortion from the gaining, the ridiculous compression, and the potato mic quality, my ears *hurt*

    • @TechTechPotato
      @TechTechPotato 1 month ago

      We did a test prior. We adjusted. Everything was fine our end. But you do you

    • @peterfireflylund
      @peterfireflylund 1 month ago

      @TechTechPotato Ian, your audio was fine but too low. George's was downright awful. Did he use the external microphone or the internal laptop microphone? Was there some clipping somewhere in his audio pipeline?

    • @TechTechPotato
      @TechTechPotato 1 month ago

      George has a mic on his shirt. But it's also all new equipment for him. The auto gain on his camera might have been on as we kept dialling the receiver back. I also turned him down in streamyard quite a bit

  • @doodledum2119
    @doodledum2119 1 month ago

    this kind of thing never happens on apple machines
    .

  • @niknikmoore
    @niknikmoore 1 month ago

    Don't forget to LIKE the video