Intel's Plan to 1000x Performance with Raja Koduri

  • Published: Oct 5, 2024

Comments • 89

  • @technologyanimals 2 years ago +77

    Getting Raja drunk to gain information. Excellent plan, Patrick.

    • @ServeTheHomeVideo 2 years ago +15

      Do not forget tired too!

    • @aacasd 2 years ago +1

      😂 always trust a drunk fellow

  • @metallurgico 2 years ago +39

    Yep! Exactly like the Pentium CPU at 10GHz they predicted in the year 2000 for 2005.

    • @squelchedotter 2 years ago +6

      Thinking of the Itanium sales forecast graph.

    • @ServeTheHomeVideo 2 years ago +18

      There is one very important difference. Raja is not talking about 1000x in a single piece of silicon, or even in the same number of packages.

    • @jannegrey593 2 years ago +3

      @@ServeTheHomeVideo This is a very big difference. Even though Raja is known to hype his products a bit much - I'm certain that if they let him do his job, they will at least come close to the goal.

    • @metallurgico 2 years ago +4

      @@ServeTheHomeVideo I am always skeptical about predictions in general; it's not about Intel. It seems like more marketing than anything else.

    • @krayzieka 2 years ago +1

      😆 right

  • @StenIsaksson 2 years ago +15

    "Hi. Let's have a beer or five"
    "Sponsored by Budweiser"

    • @ServeTheHomeVideo 2 years ago +3

      I did go on the Anheuser Busch brewery tour earlier in the week. It was very cool to see the process.

  • @jfkastner 2 years ago +5

    The answer of that Zetta machine:
    42

  • @jannegrey593 2 years ago +22

    If we remember something that most people didn't notice - that we're talking about how to build more powerful supercomputers, not a 1000x performance increase on a single chip - then this is somewhat feasible. Frankly, it depends on the amount of money and R&D.

    • @ServeTheHomeVideo 2 years ago +11

      You are completely correct. This is 1000x in a ~50MW power budget, not on a single chip.

    • @jannegrey593 2 years ago +3

      @@ServeTheHomeVideo Yeah. People are making emotional decisions based on prior screw-ups, or without watching the video. Yes, they might not achieve this goal. It happens. But a lot will come out of this bundle of projects nonetheless.
      Also, I'm an AMD fanboy, but that doesn't mean I don't know that Intel can do great things if they want to. They have tons of R&D and money. If that outweighs their greed and they get themselves together when it comes to treating their workers well, they will do wonders.

    • @ServeTheHomeVideo 2 years ago +10

      Yea. I mean, Raja was also telling me how, when he was at AMD, he named Infinity Fabric. It is a bit funny that people who are fans of Raja's AMD products think he cannot make something good at Intel.

    • @jannegrey593 2 years ago +5

      @@ServeTheHomeVideo The only "fault" of Raja that is more widely known, at least, is that he over-hypes his products sometimes. People forget that the "sometimes" is important. It happens to many people, but somehow Raja is known as the "hype guy". He is one of the best chip engineers in the world and can do all sorts of fantastic stuff - the question is rather: will he have all the necessary components and technologies to do it? That mostly depends on Intel and other engineers.
      It's a shame that less-than-great management pushed out Jim Keller - together they could have done fantastic things. Though Jim did get the chance to put a lot into Intel despite being there for such a short time; the CPU part of this "zetta-scale" plan is at least partially based on an architecture he was thinking about and preparing. Damn corporations - never knowing when to recognize talent, and having no priority other than making as much money as possible in the short term. The long term is even more important.
      I certainly believe that Raja has a plan and that it is feasible - whether it works out is a different matter - but each part of this plan will give us tons more computing power and new technologies.

  • @nekomakhea9440 2 years ago +10

    Phase 1: Xeon-next
    Phase 2: pre-zetta
    Phase 3: ????
    Phase 4: *_PROFIT_*

  • @BurnsRubber 2 years ago +3

    Hitting fab nodes on schedule will be one of the largest hurdles. The laws of physics are becoming ever more the enemy.

  • @ramakrishna5480 1 year ago +1

    Just to clarify, the human brain is around 10 to the power of 16, while zettascale is 10 to the power of 21.
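
    Taking the commenter's order-of-magnitude figures above at face value (both are rough estimates, not measurements), the gap works out to a factor of 100,000. A quick Python sketch:

        # Rough comparison using the figures from the comment above; both are ballpark estimates.
        brain_ops_per_s = 1e16       # commonly cited estimate for the human brain
        zettascale_flops = 1e21      # 1 zettaFLOP

        print(f"zettascale / brain estimate: {zettascale_flops / brain_ops_per_s:,.0f}x")
        # -> 100,000x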

  • @briand01 2 years ago +2

    Folding@home has been in the exaFLOP range since 2020-2021.

  • @p3chv0gel22 2 years ago +2

    Drinking six or so beers with Raja Koduri. Now I'm jealous.

  • @LeonardTavast 2 years ago +7

    I'll believe it when I see it. Intel has to make some serious advancements in accelerators, fabrics, and FLOPS/W to boost performance 1000x in 5 years. Without improved FLOPS/W the energy demand will be too great for many potential customers. Not many people have a spare nuclear reactor lying around to power a zettascale cluster, so to speak. Fugaku uses 30 MW for 450 PFLOPS; with linear scaling that would be a bit more than 60 GW for 1 ZFLOPS. That's starting to reach levels comparable to steel smelters.
    1000x means that Intel will have to maintain 10x per generation with a cadence of 20 months. It's optimistic, and I hope they manage to reach their target, even if I doubt it. Checking the Top500 shows that we have seen a 5x increase in performance with a 2.5x improvement in FLOPS/W over the past 5 years for the number 1 spot.
    These are just my napkin calculations on Intel's claims. I'm excited to follow the fierce competition in the coming years even if it only ends up at 10x or 100x.
    EDIT: On a brighter note, accelerators do show a lot of promise. AMD, Intel, and Nvidia will probably all reach 100 TFLOPS per accelerator in 2022.

    • @ServeTheHomeVideo 2 years ago

      Totally. Raja walked me through a lot more than is here, but optimizing on FLOPS/W is a big driver of some of the drastic changes coming.

    • @solomonmatthews7921 2 years ago

      Exactly, it's wildly implausible for any apples-to-apples comparison. Fully expecting Raja to end up comparing multiplying reduced precision sparse matrices to today's general FP64.
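
    A quick Python version of the napkin math in @LeonardTavast's comment above, using the same rough inputs (about 450 PFLOPS and 30 MW for Fugaku, a 5-year window, a 20-month cadence). These are the commenter's assumptions, not figures from the interview:

        # Napkin math from the comment above; all inputs are rough, round figures.
        fugaku_flops = 450e15     # ~450 PFLOPS
        fugaku_power_w = 30e6     # ~30 MW
        zetta_flops = 1e21        # 1 ZFLOPS

        # Linear scaling with no efficiency gains: power needed for 1 ZFLOPS.
        naive_power_w = zetta_flops / (fugaku_flops / fugaku_power_w)
        print(f"1 ZFLOPS at Fugaku-level efficiency: ~{naive_power_w / 1e9:.0f} GW")   # ~67 GW

        # 1000x in 5 years at a 20-month cadence -> 3 generations -> 10x per generation.
        generations = 5 * 12 / 20
        per_gen = 1000 ** (1 / generations)
        print(f"generations in 5 years: {generations:.0f}, gain needed per generation: ~{per_gen:.0f}x")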

  • @stuartlunsford7556 2 years ago +3

    So happy I clicked on this - you should have titled it "Beers with Raja."

  • 2 years ago +1

    Another amazing bit of work by the P man. I always feel like I've learned a module's worth after watching these vids.

  • @ThineHolyBacon 2 years ago +2

    My god, imagine using silicon photonics to have DIMM-style CPU caches. Water-cooled, massive-bandwidth memory WITH the capacity to match.

  • @PlanetFrosty 2 years ago +1

    Fascinating stuff on the photonics.

  • @rem9882 2 years ago +4

    That was a great video. More of these and water cooling videos, please.

  • @christopherjackson2157 2 years ago +2

    There's definitely a lot of room to improve the rate of data flow to and between processors before some workloads are able to saturate the chips that are coming out. I suppose it really depends on the workload.

  • @APU-iGPU 2 years ago +1

    Heavy integration of several technologies in a package will be one of the key factors for the success of Zettascale computing.

  • @shapelessed 2 years ago +5

    I wonder why more datacenters don't reuse all the energy they capture while cooling chips...

    • @ServeTheHomeVideo 2 years ago +4

      That is going to be a bigger thing in the future.

    • @shapelessed 2 years ago +3

      @@ServeTheHomeVideo I mean, they could heat neighborhoods in winter, or generate electricity back from the steam if they use liquids with lower boiling points, or just water in depressurized environments.

    • @creker1 2 years ago +5

      @@shapelessed some already do

    • @MarkRose1337 2 years ago +2

      You're starting to see crypto mines colocate in buildings. Data centers are trickier due to bandwidth and security needs. The heat produced is low density, so it's not good for much other than heating to keep humans comfortable.

  • @EyesOfByes 2 years ago +1

    I love this kind of ambition

  • @irinakolcheva5212 2 years ago +1

    And yottascale computers when? :)

  • @zunriya 2 years ago +1

    Xeon Phi and Knights Landing were great HPC products for learning how to optimize and code. I hope Intel makes a new one.

  • @visvamba 2 years ago +1

    I miss the blue doors set.

  • @farhanaf832 6 months ago

    We can help scientists by processing data with the BOINC distributed computing software.

  • @motozest7856 11 months ago

    2:42 - 3:00
    Precisely...
    1 zettaFLOP of Linpack FP64 peak performance won't happen before 2033.
    You read it here first.

  • @excitedbox5705 2 years ago +7

    oneAPI would be great if it weren't Intel. Any time a standard has a company name attached to it (especially an anti-competitive one), it is already a no-go.

    • @ServeTheHomeVideo 2 years ago +6

      Well... it is competing against CUDA here :-)

    • @Waitwhat469 2 years ago +2

      @@ServeTheHomeVideo It's too bad it's not more of a collaboration, like the Mesa drivers are for Intel and AMD.
      From an outsider looking in, it looks like AMD already did the same thing with ROCm, but here we are still using CUDA (though I have seen work trying to get ROCm and Vulkan support into TensorFlow and PyTorch!).

    • @creker1 2 years ago +1

      @@Waitwhat469 Last time I checked, TensorFlow and PyTorch do support ROCm. I don't think you even need to resort to forks or anything; the source code does seem to contain the necessary bits for it.

  • @kehoste 2 years ago +1

    Next time you have beers with Raja while discussing the future of Intel in HPC, make it strong Belgian beers.
    I guarantee you that you'll learn a ton more stuff that you're not supposed to know about yet.
    You'll also remember nothing at all about the whole discussion the next morning though...

    • @ServeTheHomeVideo 2 years ago +1

      I certainly learned a lot more, but this is what I was able to disclose.

  • @System51 2 years ago +2

    Ah nice, it's back.

    • @ServeTheHomeVideo 2 years ago +1

      Yea, rough. The re-upload is never as good, but this one actually processed.

    • @System51 2 years ago

      @@ServeTheHomeVideo Nice. Also awesome video. :P

    • @Adrian-jj4xk 2 years ago +1

      @@ServeTheHomeVideo It cut off halfway through last night. I was worried maybe Intel tried to redact the drunk chip notes.

    • @ServeTheHomeVideo 2 years ago

      @@Adrian-jj4xk just rough. That video got stuck but went live on a schedule. So had to re-upload. Hurts the views on the re-upload a lot.

  • @aacasd 2 years ago +1

    This is a good reason to delay Aurora for another few years and deliver a new supercomputer.

  • @andytroo 2 years ago +1

    Fastest supercomputer in the world -> 50 kW... 7 million cores / 30,000 kW, so a factor of 600 in power there... so a 500 W system (1/100th of that power) would run 7,000 cores' worth of processing?

    • @ServeTheHomeVideo 2 years ago

      50 kW seems low. It is more like 50 MW.

    • @andytroo 2 years ago

      @@ServeTheHomeVideo The fastest computer in the world runs at about 30 MW at the moment, so a factor of 1000 improvement in performance for a factor of ~600 improvement in power/performance could make sense.
      The fastest computer in the world is about 1 million times faster than a decent PC; a factor of 1000 performance improvement could lead to a PC capable of 1000x the capabilities as well (the 7,000-core number above is only ~200x the best desktop workstations at 32 cores; that feels like a reasonable outcome for home use given a 1000x top-end improvement).
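
    A short sketch of the scaling arithmetic in this thread, using round numbers: a Fugaku-class machine at roughly 7 million cores and 30 MW, and the ~50 MW budget mentioned in the video. These are back-of-envelope assumptions, not Intel figures:

        # Back-of-envelope check of the thread above; all inputs are rough, round numbers.
        top_cores = 7e6        # ~7 million cores in a Fugaku-class machine
        top_power_w = 30e6     # ~30 MW
        print(f"today's top end: ~{top_power_w / top_cores:.1f} W per core")       # ~4.3 W/core

        # 1000x the performance inside a ~50 MW envelope implies roughly 600x better perf/W.
        perf_gain = 1000
        power_growth = 50e6 / top_power_w                                          # ~1.67x more power
        print(f"required perf/W improvement: ~{perf_gain / power_growth:.0f}x")    # ~600x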

  • @youtubecommenter4069 2 years ago

    AMD should get you drunk next, Patrick, to download your whole convo with Raja.

  • @guy_autordie 2 years ago

    I don't get how moving data uses that much power.
    Also, won't PCIe Gen5/6/7 (or a similar protocol) help with that? I mean, twice the data transferred per unit of time should help with the power budget, right?
    Edit: Maybe using big.LITTLE could kind of help? Off-loading system routines onto the E-cores, even "transforming" some E-cores into a DPU of sorts, like Nvidia will do with Mellanox soon?

    • @ServeTheHomeVideo 2 years ago +4

      Yea, it is a bit wild. People talk about cores a lot. In a modern accelerated server 10%+ of system power is just going toward spinning fans. I actually asked Raja if he could have Intel put together something around where power is used. It is a big deal in modern system design and I think it is one that is not well understood. You often hear Arm talk about core efficiency, as an example, but in the context of a system that impact is actually much smaller.

    • @creker1 2 years ago +5

      PCIe is dead slow compared to the speeds data moves at inside the CPUs. To understand why it takes so much power, you have to understand how CPUs work and what it takes for them to be as fast as they are. It all comes down to the huge discrepancy between the growth of compute and the growth of memory bandwidth/latency. We're very good at building fast cores - it's actually relatively easy. What we're extremely bad at is building fast memory and interconnects. I say PCIe is slow, but even DRAM is dead slow to CPUs. That's why they all have levels upon levels of caches, internal buffers, queues, and huge register files; CPUs can't run properly if they have to touch DRAM all the time. It means data has to constantly move between all of those things I mentioned, and that takes huge amounts of energy - much more than the actual computation takes. That's why you see AMD going for V-Cache and Intel for on-chip HBM. That's the only way they can properly feed compute with data. But that's only a temporary solution.
      Right now research is being done on how we can properly solve that problem. One interesting solution is to move compute closer to the RAM, called processing in memory, or PIM. Samsung and others already have some working examples that show huge performance gains. It's a fundamental problem, so no amount of E-cores and DPUs is going to solve it - they're all part of the same problem. Actually, DPUs do have something closer to what we want: usually they contain a bunch of special logic for solving specific tasks. Generic compute can no longer sustain the performance and power requirements, and people have to build specialized hardware that's an order of magnitude more efficient.
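
    To put rough numbers on the data-movement point above, here is an illustrative Python sketch for a memory-bound kernel. The per-event energies are assumed, order-of-magnitude values (tens of picojoules per double-precision op, on the order of a nanojoule per 64-bit DRAM access); real figures vary a lot by process node and memory technology:

        # Illustrative energy accounting for a streaming dot product; all energy values are assumptions.
        E_FLOP_J = 20e-12     # ~20 pJ per double-precision floating-point op (assumed)
        E_DRAM_J = 2e-9       # ~2 nJ per 64-bit word fetched from DRAM (assumed)

        n = 100_000_000       # vector length
        flops = 2 * n         # one multiply + one add per element
        dram_words = 2 * n    # two 64-bit operands streamed from DRAM per element

        compute_j = flops * E_FLOP_J
        movement_j = dram_words * E_DRAM_J
        print(f"compute: {compute_j:.3f} J, data movement: {movement_j:.3f} J "
              f"({movement_j / (compute_j + movement_j):.0%} of total)")
        # With these assumptions the data movement costs ~100x more than the math itself,
        # which is the gap that caches, HBM, V-Cache, and processing-in-memory try to close.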

  • @fffforever 2 years ago +1

    This is "low energy"??

    • @ServeTheHomeVideo 2 years ago +1

      15 minutes of sleep, two airplane rides, and a booster shot kicked my buttox. Sorry for the low energy on this one.

    • @fffforever 2 years ago

      @@ServeTheHomeVideo i have never had as much energy as you have at "low energy"🤣

  • @Blackenese2plz 2 years ago +5

    Sounds like drunk hype talk that isn't logically plausible. 10x might be believable, unless their plan is to make a 2U chassis that is just a CPU, or a bank of 6-8 future CPUs drawing 1.5-2 kW for the chassis - which at that point is just shifting the problem to storage/accelerators. It's like saying, hey, my server is 5x as powerful as yours, but by the way, I need two racks of networking and storage to make it all work.

    • @ServeTheHomeVideo 2 years ago +6

      You are correct, and this is discussed a bit in the video, but the plan is to use more pieces of silicon, each at lower power. The constraint was 1000x in 50 MW, not making each piece of silicon 1000x faster.
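
    For a sense of what "1000x in a 50 MW power budget" implies, a back-of-envelope efficiency comparison. The zettascale target and 50 MW figure come from the discussion above; the ~50 GFLOPS/W baseline is an assumed round number for a leading current system:

        # Efficiency implied by "1 zettaFLOP in ~50 MW"; the baseline below is an assumption.
        zetta_flops = 1e21
        power_budget_w = 50e6
        required_flops_per_w = zetta_flops / power_budget_w
        print(f"required efficiency: {required_flops_per_w / 1e12:.0f} TFLOPS/W")   # 20 TFLOPS/W

        baseline_flops_per_w = 50e9   # assumed ~50 GFLOPS/W for a leading current machine
        print(f"improvement over a ~50 GFLOPS/W system: ~{required_flops_per_w / baseline_flops_per_w:.0f}x")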

  • @karim.mmmmmmm 2 years ago +1

    Not in 20 years

  • @aquatechie 2 years ago +1

    Patrick? Low on energy? Nah....

  • @EyesOfByes 2 years ago

    Zeta? Should be called Olive Lake 😉

  • @JohnWilliams-gy5yc 2 years ago

    You can now see the reason he dropped his launch card on the LTT channel a while back and then indirectly got himself fired. GPUs are all about scaling. Maybe he needed a fab-owning company to pull this off, and AMD wouldn't give him that.
    I just wonder if Dr. Su will buy him back to lead AMD eventually.
    I hope Raja will terminate the crypto-induced shortage frenzy for good eventually.

  • @level80888 2 years ago +4

    Hello guys, I'm Patrick from... Intel? (c)

    • @dnmr 2 years ago +3

      Beers were paid by Intel so this is a sponsored video /s

  • @chadweirick 2 years ago +2

    Intel needs to spend less time daydreaming and more time working on manufacturing process shrinks; then they can talk.

  • @calvint3419 2 years ago

    Oh, we can call these tips "chip leaks".

  • @alokcom 1 year ago

    Oh, with 6 beer bottles you will get millions from AMD :)

  • @PeterMarszalkowski 2 years ago

    Couldn't they be clever and build CPUs like Lego bricks, so chips can be linked 2x, 3x, up to 6x, with RAM fast enough, and the system built up over time? You could connect from any side, pulling units together with adapters, so people buy more chips - more RAM, more to tinker with - up to an optional 256 cores over time if you want, or fewer. Power supplies, coolers, and boards could treat CPUs like Lego bricks, with optimized boost voltages, and GPUs in dual/SLI setups. They shouldn't keep building exclusive stuff that nobody wants to afford; it should be combined, and combined sensibly, without slowing down the whole package.
    For me, whoever starts with small chips and scales them up, and supports the hardware and drivers with adjustments for years, wins - boards with 4x CPUs, building blocks and RAM adapters on capable cabling for routing and power, fast enough for the ordinary user. Why would you want to spend 36,000 euros for 56 cores or more? People get bored of buying a whole new system; you should be able to build up step by step, sandwich-style, approaching the limits, and at the limit set a new stop point and combine again. That way people buy more wisely in the future, not overpriced - paying for a mission, not just a product.
    I would advise them to come up with something for the consumer space instead of tormenting people with overpriced micro-steps; if the planned steps are enough and can simply be clicked together, in the end the same silicon delivers more cores, more RAM, and better benchmarks/FPS. Think the adapter and board principle through cleverly, built for a generation, not on a one-off basis.

  • @rodjos5463 1 year ago

    Scripted

  • @PWingert1966 2 years ago

    more like 2077!

  • @Ray_2097 2 years ago

    What the hell am I listening to... Get to the point!

  • @shvrdavid 2 years ago +2

    The two biggest issues with getting to zettascale are distance and idle clock cycles. Until the entire server fits into a very small space - as in, on the same die - you run out of time, at one foot per nanosecond, to get anywhere near a zettaFLOP. One foot per nanosecond corresponds to 10 to the 9th; zettascale is 10 to the 21st. As soon as they figure out how to go faster than the speed of light, it will be easy to get to zettascale using massive parallelism in a really small space. Changing the laws of physics is only one hurdle, though - we have not even gotten to idle states yet.
    The best hardware we have now already has gobs of wasted clock cycles, literally waiting for something to do. That is another issue that needs to be addressed, and it has gotten worse with every generation, not better. There is no magic wand to increase IPC enough to make a system 1000x faster in a handful of generations. That is a long way off, but it sounds really good to investors who own stock that has dropped off again... The best present answers to this issue are the Tesla Dojo and the Cerebras Wafer Scale Engine, but even those systems cannot break the laws of physics, which puts unlimited scaling into question from distance and size alone...

    • @AndyGraceMedia 2 years ago

      Excellent comment. Absolutely correct.

    • @motozest7856 2 years ago

      You should send your resume to Intel, buddy - they'd be lucky to have you.
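
    The "one foot per nanosecond" point in the parent comment is easy to put numbers on. A small sketch of how far a signal can travel per clock cycle, using the speed of light in vacuum (real signals in copper or fiber propagate at roughly 0.6-0.7c, so actual budgets are tighter):

        # Distance budget per clock cycle at a few plausible clock rates.
        C_M_PER_S = 299_792_458   # speed of light in vacuum

        for ghz in (1, 3, 5):
            cycle_s = 1 / (ghz * 1e9)
            reach_cm = C_M_PER_S * cycle_s * 100
            print(f"{ghz} GHz: one cycle = {cycle_s * 1e9:.2f} ns, light travels ~{reach_cm:.0f} cm")
        # At 3 GHz that is roughly 10 cm per cycle, one way, which is why keeping a machine
        # physically small (dense packaging, co-packaged optics) matters at this scale.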

  • @AndyGraceMedia 2 years ago +5

    Really interesting, and thanks for the detective work! Of course it's all contingent on Intel, and we know from recent form they've been a long way short of targets. Silicon photonics is the best chance of making some really serious gains, but these "sub-3nm equivalent" process nodes are going to be extremely tough.
    The biggest risk for me is the one virtually nobody considers at the technology level: the economics side of it all. This is a brave and optimistic plan, but it's contingent on the market not imploding in the next five years - another debt crisis, or a currency crisis with consequential big cuts to the top-tier customers who rely on government funding for these types of supercomputing systems. I hope that's wrong, but it's far from zero risk considering the economics right now...