What's Wrong with the New Nvidia GPU

  • Published: Nov 5, 2024

Comments • 394

  • @AnastasiInTech 1 month ago +44

    Let me know what you think and share this video with your friends!

    • @lb5928 1 month ago +5

      AMD has been making the world's most powerful GPUs and CPUs with many tiles and chiplets.
      Their latest GPU has 12 tiles, and Nvidia struggles to figure out just a 2-tile design.
      AMD has far superior engineering.

    • @hdcomputerkeith 1 month ago +1

      xoxoxooxoxo

    • @CautiosulyOptimistic1440 1 month ago +3

      I'm going into chip design. You were my inspiration. I'm also considering monolithic designs, though I'm focused more on the gaming side of technology.

    • @fullstackcrackerjack 1 month ago +5

      You need to watch out for and remove these scammer comment threads talking about “stocks” and “financial advisors”. They are posted by bots run by investment scammers. You have one below right now.
      Don’t allow your fans to be preyed on.

    • @musicbro8225 1 month ago +2

      @fullstackcrackerjack I agree wholeheartedly! But on top of those easy-to-spot threads, there are so many other comments that are suspicious. It's all engagement as far as the channel is concerned, so I doubt they will spend much time weeding out these BS comments. Who knows these days who is a real human and what is a bot? With so much orientation toward marketing mindsets, it all drives the system of algorithms, so no one does anything. It's frustrating and worrying, but who cares, right? Just the tip of the 'extinction event' rising into view?

  • @HDfoodie 1 month ago +142

    THIS is why I liked Intel’s idea of replacing organic substrates with glass. Its coefficient of thermal expansion is closer to that of pure silicon, and the manufacturing process gets easier for TSVs.

    • @kashyapchodankar7568 1 month ago +24

      Maybe because glass is literally silicon dioxide

    • @myne00 1 month ago +7

      Kinda back to the future.
      Ceramic is so 90s.

    • @mehow357 1 month ago +4

      The question is when servers and power-hungry solutions will move away from silicon... there are a few prospective solutions on the horizon 🤔 15 years?

    • @thereddog223 1 month ago +1

      It will happen at some point

    • @mehow357 1 month ago +2

      @grxwpr20725 Not really; at the beginning it will be just an advancement, and only with time will it become more refined, etc. It will be like it was with silicon... You probably don't remember the times when gates were measured in micrometers (with a poor microscope or even the naked eye you could see transistors), then hundreds, then tens of nm, etc. I do remember those times 🤣

  • @ProperScreenname 1 month ago +37

    Best videos, informative and in detail for non technical people!

    • @knofi7052 1 month ago +3

      ...not only for non technical people!😉

  • @CarinaAdele5CA 1 month ago +224

    The market trend can turn around very quickly. In fact, the indexes often switch from a bear market to a bull market when the news is at its worst and the mood of investors is at its lowest point. I read an article of people that grossed profits up to $150k during this crash, what are the best stocks to buy now or put on a watchlist?

    • @AmaliaGiselae8g 1 month ago

      In particular, amid inflation, investors should exercise caution when it comes to their exposure and new purchases. It is only feasible to get such high yields during a recession with the guidance of a qualified specialist or reliable counsel.

    • @ConradFriedrichj1z 1 month ago

      True, initially I wasn't quite impressed with my gains; compared to my previous performances, I was doing so badly that I figured I needed to diversify into better assets. I touched base with a portfolio advisor and that same year I pulled a net gain of 550k... that's like 7 times more than I average on my own.

    • @AlbrechtChristoph016 1 month ago

      This aligns perfectly with my desire to organize my finances prior to retirement. Could you provide me with access to your advisor?

    • @ConradFriedrichj1z 1 month ago

      NICOLE ANASTASIA PLUMLEE’ is the licensed fiduciary I use. Just research the name. You’d find necessary details to work with a correspondence to set up an appointment.

    • @AlbrechtChristoph016 1 month ago

      She appears to be well-educated and well-read. I ran an online search on her name and came across her website; thank you for sharing.

  • @skinthekat0530 1 month ago +53

    I was part of a startup that built a multi-chip package with a silicon interposer containing pS transmission line interconnect. We had working prototypes but ran out of money before we could convince a packaging partner it could scale - in 1999.

    • @rattlehead999 1 month ago

      Yeah there are even books from the 90s about it. It's nothing new as a concept and design, just manufacturing.

    • @skinthekat0530 1 month ago

      @rattlehead999 Yes, the devil is in the details of thermal mismatch with increasing power and shrinking dimensions.

  • @ahmedp8009 1 month ago +3

    Wow!
    Explained better than many so-called tech channels.
    Thank you.

  • @jarjarcheng 1 month ago +8

    Glass or glass-ceramic substrates are expensive but can come close to the TCE (thermal coefficient of expansion) of silicon while providing good electrical interconnect performance. We investigated that 25 years ago when designing the Itanium MCM substrate at Intel.

  • @dchdch8290 1 month ago +12

    On point, technically accurate and informative. Thank you for your quality work.

  • @pouryaahmadi615 1 month ago +13

    I wanted to say that there are many people on YouTube who talk about the big processor manufacturing companies, but few go into as much detail as you do or have such deep technical knowledge. Thank you very much for your channel 👍

  • @robertnatiello3814 1 month ago +8

    Wow very clearly presented - I understood this complex process with your very well done presentation.

  • @bdykes7316 1 month ago +17

    There is a saying in precision machining:
    On a small enough scale, everything becomes a thermal problem.

  • @JohnDontFollowMe 1 month ago +4

    This explanation is superb! Keep it up and with love from the Netherlands!

  • @smartduck904 1 month ago +34

    These connections between both sides of the GPU remind me of the corpus callosum, which holds the two hemispheres of the brain together.

  • @bitegoatie 1 month ago +2

    Congratulations on approaching the 200-level milestone for subscribers. With your growly voice and sharp insight into the tech world (especially chip development), you deserve the attention. Thanks for your efforts to keep us informed and thoughtful about the direction of this field.

  • @clintonelliott340 1 month ago +18

    What is so interesting about this is that when inventing the light bulb they had the same issues around different expansion rates of the glass and metal…. Some things never change.

    • @cosmicraysshotsintothelight 1 month ago +1

      In making HV power supplies for some years, we found that "Stycast" potting material had very good electrical and thermal characteristics for the applications we were considering, but we soon found out that the stuff has a much higher thermal expansion rate than, say, circuitry. So it was snapping components right off the board during thermal cycling. On the other end of the spectrum, RTV was what we ended up using. But it is soft and can detach from a surface, and that means failure in an HV supply. So we had to prime those surfaces to ensure adhesion. We did use the Stycast on some things, but we enhanced its thermal properties by mixing fiberglass fragments into it.

    • @aaronb8698 1 month ago +1

      In this case it's not argon, it's graphene production cost. Graphene's high thermal conductivity can help electronics cool more efficiently, with less temperature rise during operation, but it's still too bloody expensive.

    • @kellymoses8566 1 month ago +1

      The fact that concrete and steel have very similar rates of thermal expansion is why reinforced concrete is possible.

  • @Billwzw 1 month ago +7

    I'm sure FEA can model heat flows and thermal expansion very well - but everything has a tolerance. Maybe the micro connects are just too small. It seems like a solvable problem if the chips are slightly less ambitious in the sizing of the various elements. Thanks for explaining what's going on.

  • @fridaycaliforniaa236 1 month ago +11

    This girl is hypnotic. And on top of that her videos are very well made =)

    • @lilblackduc7312 1 month ago +3

      In my 66yrs, I've noticed that smart, attractive women can be very 'enchanting'...especially if they have something in common like Computer Science.

    • @VndNvwYvvSvv 1 month ago +1

      Simp

  • @TickerSymbolYOU 1 month ago +4

    Great breakdown of what makes the 10 TB/s link between Blackwell dies so challenging. I wonder if there'll be a better packaging method for this link in the future or if the Rubin GPUs will go back to a 1-die design.

  • @mechamicro 1 month ago +1

    Thank you again, Anastasi, for delivering great tech news! You never skip a beat.

  • @ianbaxter9668 1 month ago +1

    I like the way you simplify complex topics. Also you are very easy on the eyes.

  • @Judeschwein88 1 month ago +2

    WOW. Anastasi is so good at explaining!

  • @JoeBurnett 1 month ago +8

    Thank you for this explanation!

  • @MarkSeve 1 month ago +3

    How was I not subscribed..... I am now, Anastasi.

  • @brn2bwild2001 1 month ago

    Love your videos...I was an ASIC Engineer through the 80s and 90s. It's absolutely fascinating to witness the evolution of semiconductor technology.

  • @jbinmd 1 month ago +13

    Maybe the single photomask changed the pads for attaching the silicon bridges to improve packaging yield?

  • @tonyitalia7798 1 month ago +1

    It's a great channel about technology.
    I subscribed and I appreciate the Portuguese subtitles.
    I'm from Brazil.

  • @chikuvyas7917 1 month ago +4

    Wonderful!
    You make it so easy to understand
    Keep going👍👍

  • @416dl 1 month ago +1

    Very interesting. Chip design up until now has always seemed to proceed without much concern for geography. Distance seemed to relate only to speed but now we see that it has inherent qualities that cannot be ignored. I ran across similar problems years ago working in design for fused glass. Compatibility took on many forms. Cheers.

  • @dennissdigitaldump8619 1 month ago +1

    I was a noise & thermal tech at Intel. I've always thought these bigger chips might just split themselves. I actually got a chip to split under a heatspreader with clever code. These were older smaller chips too.

  • @smartduck904 1 month ago +6

    Thank you for these videos, by the way. I always enjoy them.

  • @SinisterSpatula 1 month ago +5

    I absolutely love your videos. Thank you so much for continuing to make them. I find them fascinating and love the way you explain it to us 🥰

  • @R6ex 1 month ago +2

    Nice, easy-to-understand video! 👍

  • @GULSHAN540 1 month ago +1

    Interesting in-depth analysis of the GPU. Dissipating the heat generated by the processor is quite challenging given the size of the GPU and the use of different materials. This also raises the question of reliability and this product's fault-free performance (durability, useful life, maintenance, etc.).

  • @JohnSmall314 1 month ago +6

    Very interesting and well researched

  • @nusu5331 1 month ago +2

    great explanation, thanks for your work!

  • @DaveEtchells 1 month ago +15

    I’m no packaging engineer, but as soon as I heard the word “organic” for the interposer I started wondering about problems with differing thermal expansion coefficients.
    What I’m curious about is why would Nvidia and TSMC think they could make it work in the first place?
    Differences in thermal expansion rates are so fundamental that they must have thought they had some way of coping with them, either by coming up with a material for the interposer that magically has the same thermal expansion coefficient as silicon, or by somehow limiting the thermal excursion with amazing heat sinking capability. - But 1,700 watts/chip TDP is going to get pretty warm almost no matter what you do. Even if you had some kind of active phase-change cooling, just the thermal resistance to get the heat out of the package is going to result in a good bit of temperature rise.
    Does anyone in the comments have any ideas about or knowledge of advanced techniques or materials that would lead Nvidia and TSMC to think they could actually do this? It seems like a fool’s errand to me, to go away from a silicon interposer, but IANAPE (I am not a packaging engineer), so there may very well be things I’m not aware of.
    (Great vid as usual Anastasi, you did a great job of tracing the evolution and explaining the likely cause of the problems. Great thumbnail too 😂)

    • @paulsawyer9127 1 month ago +2

      My reaction is the same. What were they thinking? It's not just the coefficient of thermal expansion; the different materials must also have different thermal conductivity.

    • @kazedcat 1 month ago +3

      It works on a smaller scale but with a larger chip the expansion is larger so the misalignment becomes a larger problem. The chip designer failed to factor expansion in their design and the fabricator failed to inform them that it will be an issue. These separate engineering teams are working in different companies so miscommunication is also an issue.

    • @DaveEtchells 1 month ago +3

      @kazedcat That may be true, but TSMC has whole teams of engineers just working on packaging; thermal expansion is fundamental to everything they do.
      I guess it’s possible TSMC wasn’t involved in the multi chip packaging using the interposer, maybe it was just a PC board guy that designed it. Still, thermal expansion is such a _basic_ fact of engineering life, it’s hard to understand how they could have overlooked it.

    • @kazedcat 1 month ago +2

      @DaveEtchells TSMC provides design rules, but these design rules are based on some assumptions, like the size of the package. If this size limitation is not communicated properly, then the layout engineers at Nvidia could have followed the design rules without knowing that the rules are not valid for the package size they are designing.

    • @imaniwillis18 1 month ago

      Other than altering the materials to react the same to heat, the only idea I have is to encase the chips in a rigid structure to prevent expansion and or have them under some amount of compressive stress to counteract deformation. But I'm not sure to what degree the expansion and contraction happens under max thermal stress so it most likely will just make it fail faster. Imagine it was that simple...
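
The scaling argument in this thread (the same mismatch matters more on a larger package) can be sketched with the linear expansion relation, where the relative shift is the CTE difference times the distance times the temperature change. This is only a rough illustration: the CTE figures below are assumed, order-of-magnitude values rather than Nvidia or TSMC data, and the function name `mismatch_um` is made up for this example. It also treats both materials as expanding freely, which ignores the warping and stress of a real bonded package.

```python
# Illustrative sketch: differential thermal expansion between a silicon die
# and the substrate it sits on. The CTE values are assumed, typical
# order-of-magnitude figures, not vendor data.

SILICON_CTE_PPM = 2.6    # ppm/K, approximate value for silicon
ORGANIC_CTE_PPM = 15.0   # ppm/K, assumed value for an organic substrate


def mismatch_um(distance_mm, cte_a_ppm, cte_b_ppm, delta_t_k):
    """Relative displacement (micrometers) between two freely expanding
    materials at a point `distance_mm` from the stress-free center, after
    a uniform temperature change of `delta_t_k` kelvin."""
    delta_cte_per_k = abs(cte_a_ppm - cte_b_ppm) * 1e-6
    displacement_mm = delta_cte_per_k * distance_mm * delta_t_k
    return displacement_mm * 1000.0  # mm -> um


if __name__ == "__main__":
    # Same materials and temperature swing, growing package size.
    for distance_mm in (5, 10, 20, 40):
        shift = mismatch_um(distance_mm, SILICON_CTE_PPM, ORGANIC_CTE_PPM, 60)
        print(f"{distance_mm:>3} mm from center: {shift:.2f} um relative shift")
```

Under these assumed numbers the shift grows linearly with distance from the center, so a connection four times farther out sees four times the displacement for the same temperature swing.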

  • @JohnSmith762A11B 1 month ago +8

    Explained this way, I'm surprised they ever built a working Blackwell GPU.😓

  • @garycard1826 1 month ago +1

    Good video. Very well explained and understood. Thanks, Anastasi!

  • @melbar 1 month ago +1

    My idea would be to design the assembly to work at a specific temperature and to make sure that this temperature is held constant during operation.

  • @ariesmarsexpress 1 month ago +4

    They need to preheat the entire thing to a set temperature slightly above what they expect the normal operating temperature will be and keep it there instead of allowing it to heat up on its own. This will most likely require immersing it in a liquid of some sort that can maintain higher temperatures. They may need to design it at those temperatures.

    • @melbar 1 month ago +1

      I just had the same idea ;-)

    • @hcfornwalt 1 month ago +1

      Like pretensioning concrete bridge sections. They might be able to get away with building it at some intermediate temperature, so it can tolerate shipping and the occasional cooldowns, but really do well if left running constantly.

  • @calvingrondahl1011 1 month ago +4

    Thank you Anastasi for your professionalism on this AI technology. 🤖🖖🤖🇮🇹🇺🇸❤️

  • @dinarwali386 1 month ago

    Superb. I was researching this and had a fair understanding of the issue by the time this video popped up.

  • @PhilfreezeCH 1 month ago +20

    Cerebras: first time?
    I mean, that's literally the big thing Cerebras solved with their wafer-scale approach.

    • @MonsterSound.Bradley 1 month ago +1

      You're late.

    • @rabiatorthegreat6163 1 month ago +3

      Having no defects at all on a wafer is quite unlikely. The larger any single chip gets, the more likely it is that it contains a defect. Hence, large chips have a worse yield and become more expensive per piece. A solution is dividing the design into smaller chips and mounting them to a common interposer.
      Cerebras did things differently: Their Wafer Scale Engine consists of many small processors and can tolerate the failure of a few processors. The WSE sort of routes around the damage.
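
The point above about large dies yielding worse can be illustrated with the simple Poisson yield model, in which the chance that a die has zero defects is exp(-area × defect density). This is a hedged sketch only: the defect density is an assumed round number, not foundry data, the function name `poisson_yield` is made up for this example, and the model says nothing about how a design that tolerates failed cores would behave.

```python
import math

# Assumed defect density, for illustration only (not foundry data).
DEFECTS_PER_CM2 = 0.1


def poisson_yield(die_area_cm2, defects_per_cm2=DEFECTS_PER_CM2):
    """Fraction of dies expected to have zero defects under the
    Poisson yield model: Y = exp(-A * D)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    # Yield falls off exponentially as a single die gets larger.
    for area_cm2 in (1.0, 2.0, 4.0, 8.0):
        print(f"{area_cm2:.0f} cm^2 die: {poisson_yield(area_cm2):.1%} defect-free")
```

Splitting one large die into several smaller ones raises the yield of each piece, which is the motivation for chiplets described above. A wafer-scale design that routes around bad cores sidesteps this model, because a defect disables one small processor instead of the whole part.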

  • @MediaCreators 1 month ago +2

    Excellent explanation, Anastasia! Thank you. I am following the developments in this space closely. Silicon-based chip technology seems to be rapidly reaching its limits. I know that SMIC, in close cooperation with Huawei and several universities, is working feverishly on the development of photonic chips for AI training and inferencing. Size is not a limiting factor here. My assumption is that the world will be presented with a fully functional system out of China within the next 24 months that allows for the development and operation of LLMs at a fraction of the cost and power consumption of current Nvidia products like the H100 or B200. Jensen Huang is certainly aware of this fact, and so are many investors.

    • @clint_254 1 month ago +1

      True. IBM has been leading research on all-optical chips made of transistors which only use photons to switch on/off (not electric current). Promising nearly 1000x performance improvement and significant reduction in power consumption. IBM contributed significantly to the growth of the Chinese tech space.

  • @vladyslavkorenyak872 1 month ago +1

    Next step is to use microfluidics based heat dissipation. Impregnate the substrate with thousands of capillaries and pump a steady current of some refrigerant through them.

  • @CrisIsBored 1 month ago

    Very good. Not a lot of people can break down technology and explain it like this.

  • @rolandanderson1577 1 month ago

    Wow! I understood everything you said. And it's on substrates of computer chip manufacturing. Never thought I'd listen.

  • @thomaspahl9927 1 month ago +10

    1 kW for a single chip? Our poor planet!!!

    • @clint_254 1 month ago +1

      😂 They are making nuclear reactors

    • @robinhoodhimself 1 month ago +1

      Current AI by Sam Altman is mostly brute force. Bigger and bigger models. It's a beta. The science is not ready. Ilya knows this. It's difficult to size the load. The current AI race to the cliff is a bonanza for Nvidia and others. Nvidia is a company that specializes in seizing future market opportunities.

    • @anuardalhar6762 1 month ago

      GPU cum water kettle. Produce boiling water and steam as you play video games. Make tea and dinner as you play.

    • @jrwilliams4029 1 month ago +1

      We cannot sustain this flippant pursuit of this ASI boondoggle and these proposals for super clusters. It will end badly in a water, food, or energy crisis, or perhaps all 3 simultaneously, i.e. a polycrisis, if humans don’t come to their senses.

    • @imconsequetau5275 1 month ago

      @jrwilliams4029 It could easily lead to higher prices for electrical generation and distribution.

  • @jackcoats4146 1 month ago

    Thermal issues are a huge problem, especially when moving to multiple types of materials that have to work together. They have done well, but close doesn't count in mass production.

  • @SirMo 1 month ago +9

    AMD is years ahead of Nvidia when it comes to chiplets. Nvidia is just now starting to use chiplets, while AMD has been using them for years.

    • @broose5240 1 month ago +5

      AMD has many patents on doing this. Nvidia might need to buy from AMD.

    • @rasmusnorberg13 1 month ago

      So you're saying that Nvidia, who's "just starting" with chiplets are beating AMD at their own game? Sounds very good for Nvidia's future.

    • @SirMo 1 month ago

      @rasmusnorberg13 Nvidia is the incumbent; they started on third base. But their hardware efforts are being stunted by the lack of a solid chiplet strategy. Blackwell has already hit one delay because of this. And next year the MI350X will be on a 3nm node, a year before Nvidia will have their 3nm solution.

  • @gator1984atcomcast 1 month ago +2

    Go with superconducting materials for connectors. Cryogenic cooling will eliminate heat.

  • @ianuragaggarwal 1 month ago

    Interesting! I had watched the launch event for Blackwell. Hopefully this manufacturing problem gets resolved.

  • @apollo-r5z 1 month ago

    Computer chips stacked and housed inside of pressurized metal, heat conductive gas cylinders with external fan cooled fins and external data bus connections through the cylinders may help cooling efficiency e.g. similar to heat pipes.

  • @DougPeters 1 month ago

    Gosh, I love your young voice. Thanks for all your coverage, but especially this one because I am invested in NVIDIA.

  • @micy9714 1 month ago

    To get around the thermal issues, they need to determine what operating temperature range avoids any permanent damage, then design the water cooling technology to support it.

  • @dualokfonseca18 1 month ago

    Your channel is a gem. Thank you

  • @springwoodcottage4248 1 month ago

    Clear, useful, interesting & all presented by someone practically skilled & passionate in these exciting technologies. Thank you! The issue I struggle to understand is whether there is enough sellable product being produced by the buyers of Nvidia chips to support ongoing purchases from Nvidia at the current rate. There may be some new breakthrough like Transformers that suddenly makes AI so useful that everyone must buy it, but as of now AI has become commodity-like, with much of the difference between the various offerings being the alignment with the philosophy of the designers rather than technical competence. A somewhat more extreme diversification than with web browsers at the beginning of the web & we know that many, like Netscape, did not survive. If we see a consolidation, the intense pressure that has driven Nvidia sales may wane. Thank you for sharing!

  • @EverSpaceTime 1 month ago +3

    Man I just got a 4060 and it pushes everything extremely well at like 115W max. The card is tiny. It just amazes me.

    • @pozytywniezakrecony151 1 month ago

      People simply undervolt and underclock 3090 and such to get 30% or more lower energy use

  • @HeavenGuy 20 days ago

    I always feel smarter after listening to you.

  • @thepom88 1 month ago

    Hi Anastasi, thanks for another great video. A quick question, do they anneal the wafers post-fab? I do understand the stresses between the different materials. Deformation, delamination, etc.... Surely, annealing could solve these problems, whether done post-fab or during each stage of fabrication. It doesn't matter whether it's a hammer or a photon hitting the material, it's going to bend.
    Also, where do you live? I want to steal your Cerebras Chip!😉 I want one, just to hang on the wall! It looks gorgeous!!!
    Love ya work! Take care! ❤

  • @Alexsandr-l8k 1 month ago +1

    Anastasia, what do you think about Sohu? How realistic is this project from the technological standpoint?

  • @strictnonconformist7369 1 month ago

    I hadn’t thought about all the other types of elements used in a die, but I figured it was likely a thermal mechanical expansion issue.
    But now they’ve got materials with different coefficients of expansion stacked on each other, with critical tolerances.
    Congratulations, nVidia, you’ve designed the world’s most complex and expensive bimetallic thermostat! Heats up, it likely opens, until it cools back down. Hopefully it starts working again.
    Their expected reach exceeded their actual grasp, it sounds like.

  • @imconsequetau5275 1 month ago

    Assembling packages at an elevated temperature midway between "room temperature" and peak operating temperature might both improve yield and reduce failure rates.
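
The arithmetic behind this suggestion can be sketched as follows, under a strong simplifying assumption: that the package is stress-free at the temperature where it was assembled, so the stress on the connections scales with how far the operating temperature drifts from that point. The temperatures and the function name `worst_case_excursion` are made up for illustration; real bonding and reflow steps happen at their own, typically much higher, temperatures.

```python
def worst_case_excursion(assembly_temp_c, min_temp_c, max_temp_c):
    """Largest temperature distance (in kelvin) between the assumed
    stress-free assembly temperature and either end of the operating range."""
    return max(abs(max_temp_c - assembly_temp_c),
               abs(assembly_temp_c - min_temp_c))


if __name__ == "__main__":
    t_min, t_max = 25.0, 85.0      # assumed idle and peak temperatures, deg C
    midpoint = (t_min + t_max) / 2
    for t_assembly in (t_min, midpoint, t_max):
        excursion = worst_case_excursion(t_assembly, t_min, t_max)
        print(f"assembled at {t_assembly:.0f} C: "
              f"worst-case excursion {excursion:.0f} K")
```

With these assumed numbers, choosing the midpoint halves the worst-case excursion compared with assembling at either end of the range, which is the intuition behind the comment.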

  • @theneverwas2835 1 month ago

    You explain it very well for the layman to understand.

  • @Quickened1 9 days ago

    The most incredible thing about all of this to me is, the level of precision they are able to achieve! It boggles the mind that anything could be soldered in place to within a few microns on all planes, on this planet... Even with robotics! The earth constantly vibrating, along with the expansion and contraction of things, it's seemingly the impossible... Then to do it consistently, what a feat! Fascinating...
    The technical data this incredible woman serves you, you can take it to the bank! What a beautiful brain!!! 👍🏻

    • @Quickened1 9 days ago

      Just wait until robotics come online full scale. This will be the technology that powers the first interconnected robotic hive mind... The processing power will continue to progress until some new barrier is broken by ai. Which should lead to what it takes a building of GPUs to process now, to be handled by a processor that will fit in the palm of your hand...
      Fun stuff...
      Who knows beyond that?

  • @AdvantestInc 1 month ago

    The double-die architecture of the Blackwell GPU really shows how far we’ve come in chip design, but it also raises new challenges like thermal management. Exciting to think about where this will take AI workloads!

  • @juancarlospizarromendez3954 1 month ago

    To solve thermal troubles: more copper and more silver, and less silicon. No gold, because it is very expensive now. I believe that interconnecting substrates may be unreliable when there is a micro-earthquake, such as vibration from external sources.

  • @matthewzimmers1097 1 month ago

    Subscribed. Love these videos.

  • @chrysalicechristopheranderson 1 month ago

    Need to develop a solid-state converter of excess heat/dissipation into electricity to offset most of the chip power load... leading to solution to chip overheating problem...

  • @nicksanta 1 month ago

    There will be always those trying to push the envelope to get more and using existing tech. Generally. I like the trend towards lower temperature computers. There seems to be a lot of slop in large scale integration. This leaves much to be desired if accuracy is needed. Regards

  • @Emphasis213 1 month ago

    This reminds me of the packaging issues they had with Nvidia chips in the Xbox and PS3 that caused YLOD and a whole host of Nvidia GPU issues in other devices back in the day.
    There's a documentary on the Nvidia chips in the PS3 on YouTube that discusses it at great length.
    Manufacturing chips is a multi-country effort.
    I wonder how much it has to do with the current chip war and the havoc it's bringing.

  • @sharonb.9128 1 month ago +1

    Apple was rightfully praised for their efficient and innovative Ultra chip design. They figured out how to fuse multiple M chips together to be more powerful while requiring less energy.

  • @abdulshabazz8597 1 month ago

    Attempting to put more logic on the same wafer to package components closer together to increase communication speed also increases thermals and reduces yield because a single glitch ruins the entire wafer. The only solution may be slower modular inter-chip interconnects, which nvidia has used in the past, and perhaps on separate wafers. ...Or we can jettison the electrical pathway design methodology altogether and switch to optical adoption for use in the deep inner core logic which does not suffer from these kinds of issues.

  • @SemiPolymath 1 month ago +1

    @AnastasiInTech's video left me wondering two things. Anyone have answers? (1) Even though different coefficients of heating and expansion are almost certainly present among the components on these huge chips, is there any evidence that they are a primary (or even significant) contributor to the problems with NVIDIA's GPU? (2) Even if NVIDIA increases its yield with a new die, won't the damage from heat-induced flexing take time to build up, beyond the problems observed initially due to misalignment (if that is the problem, see question 1)? What do you think?

  • @solidreactor 1 month ago

    I guess that you either want to go wafer scale, as you mentioned, or make much smaller chiplets to minimise the effect of the temperature-related stress.
    If going with the chiplet design, maybe cooling the substrate better could help, either by adding dummy copper lanes for cooling purposes only or by changing the substrate material and its thermal properties.
    These are just some guesses; it would be interesting to get some insights from this world into what the engineering solutions might look like.

    • @GameCookerUSRocks 1 month ago

      Wouldn't that create more latency though?

    • @kazedcat 1 month ago +2

      They are developing a glass substrate to reduce the thermal expansion issue. You can also mitigate the problem by designing fewer, larger vias.

  • @LAKEVILLEKONICA 1 month ago +2

    Keep it cool. Greatly Enjoy the vids.

  • @BlankBrain
    @BlankBrain 1 month ago

    They probably need to use carbon nanotubes to connect chips to each other. But that would take a lot of development. When working with wood, you have to plan for seasonal expansion and contraction. I'm surprised chip engineers thought they could just slap some chips on a substrate without considering heat expansion and contraction. (I'm sure I must have misunderstood something.)

  • @Psychx_
    @Psychx_ 1 month ago

    The only thing that seems to be somewhat working at Intel is EMIB. Maybe Nvidia should package Blackwell there lol.

  • @lordcustard-smythe-smith9153
    @lordcustard-smythe-smith9153 1 month ago

    If they do push this out, it will be interesting to see how robust these products are against thermal damage. With Intel having problems with some of their CPUs, are we getting to the point where the longevity of a chip will become as important as raw speed?

  • @erictayet
    @erictayet 1 month ago

    So is this why AMD's MCM design for RDNA3 had performance issues?
    It seems like AMD was projecting a much larger performance uplift from RDNA2 but the Radeon 7000 series at the high end only had half of the promised performance.
    It is very hard to get information regarding the interconnect technology used in RDNA3 MCM design.

  • @darelvanderhoof6176
    @darelvanderhoof6176 1 month ago

    They need to add a heater to maintain minimum temperature, and dynamically move the workload around to lower the temperature on hot spots. Or not.

  • @ronrouyer2069
    @ronrouyer2069 1 month ago +5

    I think you're spot on, Ms. A. The infamous coefficient of thermal expansion (CTE) mismatch is a pain in the a--. Fine analysis as usual. Concurrent-engineer your process, guys.

  • @leeloodog
    @leeloodog 24 days ago

    I'm optimistic about the future of Nvidia too :) Understatement.

  • @arielINXS
    @arielINXS 1 month ago

    This CTE mismatch is well known and should be addressed with uniform cooling, but they will have tremendous difficulty overcoming it.

  • @ElectroOverlord
    @ElectroOverlord 1 month ago +2

    Been subscribed, love the content and have a crush.

  • @tiagotiagot
    @tiagotiagot 1 month ago

    Why don't they move to optical interconnects, hollow channels for lasers to go thru, between anything that's big/far enough that thermal expansion starts getting into play? Does the transducing process add too much lag? Can't make tiny enough laser emitters/sensors? Too costly?

  • @stefankoopmans2200
    @stefankoopmans2200 1 month ago

    Oh, so now it makes sense to me why the 5080 is essentially half the performance of the 5090: it's just a single chip, while the 5090 is more comparable to a dual-GPU card, but now with both chips on one package and without the use of an internal SLI bridge. That is kinda funny, since this is what the 90 series once stood for; they were often cards with dual GPUs, like the 690. Back in the day it was quite common for Nvidia or AMD/ATI to offer a dual-GPU card as their halo product. They still required SLI/CrossFire methods to work, so performance was hit or miss. That 5090 is going to be mighty expensive... like 3000+, I wouldn't be surprised.

  • @gerrycrisostomo6571
    @gerrycrisostomo6571 1 month ago

    Excess thermal buildup is indeed a challenge but that can be resolved. Do you remember the topic that you discussed earlier, the in-chip liquid cooling?

  • @darwinboor1300
    @darwinboor1300 1 month ago

    Thanks Anastasi. Great NVidia engineering, as is usually the case. They just failed to give Mother Nature enough credit and she threw them a curve. I have confidence that they will find a way around her. It may be painful and could be suboptimal.

  • @cosmicraysshotsintothelight
    @cosmicraysshotsintothelight 1 month ago

    They should hook them together with "zebra strip" Yeah... that's the ticket! No, really... carbon nanotubes on a flexible film might remain attached above and below despite thermal shifts. They could mask and etch the nanotubes to be only where they want them to be. But they would act more like flexible wires than any firm mount would. The top and bottom remain connected while the thermals flex the film in the gap. So, maybe it only does 7 or 8 Tb/s instead of ten. What do you want, good grammar or good taste?

  • @paulharrison8379
    @paulharrison8379 1 month ago +1

    NVidia should change their chip design to manufacture both GPUs and the inter-GPU interconnect together on a single die. This will greatly reduce yield, but at least then the die would work. This is the approach with the M2 Ultra chip from Apple.

  • @themarksmith
    @themarksmith 1 month ago

    Just subbed - great video!

  • @whisperingsquid5630
    @whisperingsquid5630 1 month ago +1

    New chip manufacturing machine that is around the size of a shipping crate. Can build a warehouse and spam them in the available space. Then copy and paste the factory a few times and voilà, chips at scale.

  • @rakon8496
    @rakon8496 1 month ago

    That means that only datacenter gpus are affected? Based on that "A" variant Chip and lacking that chiplet bridge... consumer chips could be realised on time?

  • @melchiorhof6557
    @melchiorhof6557 1 month ago +1

    Can you make a video about the Intel microcode 0x129 problem of the 13th and 14th generation processors?

  • @moneypressoverdrive2020
    @moneypressoverdrive2020 1 month ago +1

    Yup, this is why all the finance bros were like "why is the stock down?" They had no idea what was going on. Nvidia has basically canceled H100 already.

  • @sagetmaster4
    @sagetmaster4 1 month ago +1

    If an acronym doesn't actually save any syllables it's not real

  • @henrycarlson7514
    @henrycarlson7514 1 month ago

    So Wise , Thank You

  • @anata.one.1967
    @anata.one.1967 1 month ago

    Well, what happens if this was used in a submerged system? Since the stress from heat sink mounts can accelerate this issue, a submerged system solution can help a lot, no?

  • @douglasengle2704
    @douglasengle2704 1 month ago +1

    Thank you for your dedication to reporting on and analyzing advancing electronics technology. Gigawatt-scale electrical power consumption is predicted for super-large-scale data centers. Since almost all of that consumption goes to generating heat as an undesirable resistive byproduct, plus the cooling systems needed to abate it, what is the possibility of new technology being developed a decade or more from now that does not generate this heat byproduct, or reduces it to a millionth of what it is today, largely eliminating large-scale data center power consumption?