Why Buying GPUs Is a Disaster

  • Published: Feb 2, 2025

Comments • 639

  • @kaizer-777
    @kaizer-777 23 часа назад +197

    We're not 32X faster than a 980ti today, so expecting us to advance that fast 10 years from now is more than optimistic. The only way this would happen is if we had a radical breakthrough and shifted to an entirely different way of manufacturing chips.

    • @no_mnom
      @no_mnom 21 час назад +36

      We can generate 32x the frames in 10 years :)

    • @hammerheadcorvette4
      @hammerheadcorvette4 20 часов назад +12

      Most of the articles and people kicking around those rumors are going off of the supposed help AI will give in architecture and fabrication. Having the AI find the most optimal path for the process and the tape-out is what is supposed to incrementally help . . . *Supposedly*

    • @TheKingOfSpain
      @TheKingOfSpain 18 часов назад +3

      For real, absolute moronic take.

    • @monad_tcp
      @monad_tcp 16 часов назад +7

      A 3060 would be perfectly fine for ALL LLM inference if it only had 40GB of RAM. No need for stupid H100s. But Nvidia themselves are scalping by not allowing your RAM controller to address more RAM, and I bet it's locked and the hardware can in fact address more, just like it's locked for overclocking. (You think that wouldn't be the first thing I'd do when buying a graphics card, grab my hot-air soldering station and replace the memory chips with bigger ones?)

    • @Infiniti.151
      @Infiniti.151 15 часов назад +5

      32x fake frames maybe😂

  • @adissentingopinion848
    @adissentingopinion848 День назад +255

    You HAVE to talk to asianometry. He's the man to talk to about chip developments.

    • @nickreffner4574
      @nickreffner4574 День назад +23

      THIS! If you want to deep dive into the geopolitics and the future of CPU/GPU architecture, Asianometry is your guy.

    • @woofcaptain8212
      @woofcaptain8212 23 часа назад +3

      A Collab between these two would be wild

    • @petrkinkal1509
      @petrkinkal1509 23 часа назад +3

      This would be awesome.

    • @Neuroszima
      @Neuroszima 22 часа назад

      But did he do a face reveal?

    • @spicybaguette7706
      @spicybaguette7706 17 часов назад +3

      Or Ian Cutress

  • @RedSntDK
    @RedSntDK День назад +123

    crypto, covid and now llm.. us normal consumers just can't catch a break.

    • @__Brandon__
      @__Brandon__ 23 часа назад +6

      But they are producing so many chips. The second hand market goes crazy

    • @Blackmamba-ce3nb
      @Blackmamba-ce3nb 21 час назад +10

      Don’t forget about tariffs!

    • @yeetdeets
      @yeetdeets 10 часов назад

      Investment and progress is going nuts though. The newer cards are literally hitting the limits of currently known physics.

    • @Akio-fy7ep
      @Akio-fy7ep 9 часов назад

      Don't worry, civilization will have collapsed by 2035. It might happen this year; if not, it won't be for lack of certain people trying their hardest.

    • @shining_cross
      @shining_cross 6 часов назад +1

      ​@__Brandon__ most of the US chip factories are in Taiwan. So if China invades Taiwan, it will belong to China 🤣🤣

  • @JonLikesStats
    @JonLikesStats 17 часов назад +42

    I bought a 4090 RTX FE about a year ago not realizing I just bought an appreciating asset.

    • @addraemyr
      @addraemyr 2 часа назад +1

      I know, I was so hoping I’d be able to score a used one for cheap to replace my 3080 since all the whales would go for a 5090 but even the whales are backing out of this gen more than I thought

  • @ZeroUm_
    @ZeroUm_ День назад +154

    Whoever said 16-32x is out of their minds. We got only 6.31x since the Titan X in 10 years (released in 2015), and the pace has slowed way down.

    • @Swiftriverrunning
      @Swiftriverrunning День назад +6

      But the level of investment in manufacturing and development has increased by almost unimaginable levels, especially in the last two years. Ten years ago, NVIDIA's market cap was 11B and GPUs were a rounding error in global semiconductors. NVIDIA's market cap is now 3000B. The amount of money pouring into this space is wild. R&D at NVIDIA has increased over 10x in that period, and that doesn't even take into account TSMC and every startup in the world working on AI hardware. It's getting harder to shrink transistors, but the effort going into improving the process is increasing at ever faster rates. I don't know if we'll get 16x, but progress is coming.

    • @Fiercesoulking
      @Fiercesoulking День назад +1

      The only way to do 16x is by linking 4 cards and going from the current N4 to a N1 process

    • @llothar68
      @llothar68 23 часа назад +7

      @@Swiftriverrunning Market cap has nothing to do with the money a company has on hand for R&D. Often it does not even matter to the company at all. You don't understand what stocks are. How many shares has NVIDIA sold in the last 5 years? And I don't mean shares sold by NVIDIA employees.

    • @kjell744
      @kjell744 23 часа назад +8

      @@Swiftriverrunning More investment has strongly diminishing returns on the speed of improvement.

    • @Veptis
      @Veptis 23 часа назад

      2x every two generations. Which isn't yearly. More like 2.3 years per gen.

  • @nict2
    @nict2 День назад +154

    I see Casey, I click. Always an awesome conversation.
    Thanks for this video, this was exactly what I have been thinking about right now.

    • @notionSlave
      @notionSlave 17 часов назад

      why doesnt he include his name in the description? anyways these two are worthless retards.

  • @SakshamG7
    @SakshamG7 День назад +248

    As a GPU owner, I approve this message

    • @EnterpriseKnight
      @EnterpriseKnight День назад +5

      which kidney did you sell?

    • @呀咧呀咧
      @呀咧呀咧 22 часа назад +1

      @@EnterpriseKnightboth ☠️

    • @SakshamG7
      @SakshamG7 22 часа назад +1

      @@EnterpriseKnight why not both?

    • @br4252
      @br4252 20 часов назад

      But u have one…

  • @HylianEvil
    @HylianEvil День назад +135

    Holding onto my 2070 till I see how this shakes out

    • @Cahnisama
      @Cahnisama День назад +11

      Bought a 7800xt last month, from a 1060. It is pretty good value/benefit imo

    • @faiz697
      @faiz697 День назад

      What are you doing with that ?

    • @DaveSheeks
      @DaveSheeks День назад +5

      @@Cahnisama I went from a 1060 to a 3060.

    • @ralnivar
      @ralnivar День назад +1

      Was running 1080ti until winter 2023 :)

    • @steelpanther88
      @steelpanther88 День назад +1

      This was me. I had 2070s and skipped entire 3000 gen. Got 4090 at discount last year at spring time. Cant believe cards are scalped again like prev gen cards

  • @Endelin
    @Endelin День назад +90

    Gamers Nexus put out a video about how they couldn't find a 5090 on day one. Truly a wild market.

    • @sokrates297
      @sokrates297 3 часа назад

      There's no market if the market never existed to begin with

  • @SirSomnolent
    @SirSomnolent День назад +133

    "I just want to pay a $300-$500 more and have 48GB vram" nope. impossible.

    • @rkan2
      @rkan2 23 часа назад +9

      I can understand the GPU processing gates limitation relative to price, but more memory lanes and chips should be cheaper in comparison...

    • @RobBCactive
      @RobBCactive 22 часа назад +3

      GDDR is 32 bits wide, moving from 2GB to 3GB modules which ARE coming in the next year or two.
      Nvidia may have problems because they moved to GDDR7, which only Samsung supplies currently.
      The lower-end cards are going to need more VRAM, while bandwidth is improved by large caches.

    • @RobBCactive
      @RobBCactive 22 часа назад +4

      @@rkan2 They aren't; the 5090 die is so big because the entire outside edge is taken up driving the 512-bit memory I/O, so there's no room left.
      VRAM is organised differently from the DDR modules used with a CPU, to maximise bandwidth.

    • @rkan2
      @rkan2 21 час назад +1

      @@RobBCactive I can understand the 5090 being a bin of the datacenter SKUs that have more RAM or less defects, but surely you could still at least double the amount of RAM by limiting the processing performance..

    • @xwizardx007
      @xwizardx007 20 часов назад +1

      @@rkan2 Have you seen the 5090 PCB?
      It's fucking full, no more space.
      They can make it 48GB if they REALLY want by using 16 3GB GDDR7 chips instead of 16 2GB chips.
      But the 5080 chip could have been larger with 2-3 more chips; they just didn't want to give you 20-24 GB of VRAM for $1000 this gen.

  • @wsippel
    @wsippel День назад +126

    If you mostly care about LLM inference, and especially if you're on Linux, AMD is perfectly fine. Ollama works, llama.cpp works, VLLM works. Performance is pretty good, and you get a lot of VRAM for cheap. Things only really get hairy (sometimes) when Pytorch enters the picture. Also, current AMD cards don't support FP8 and FP4, which is a bit of a problem for image generation, but doesn't really matter for LLMs. I believe the 9070 will introduce FP8 support at least, but only has 16GB VRAM. That said, the upcoming Ryzen AI Max 395 might be a very interesting option for LLM inference, with 128GB unified RAM and a much wider memory bus than previous APUs.
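
    A minimal sketch of what that workflow looks like from Python, assuming a local ollama server on its default port and an already-pulled model (the "llama3" tag here is just an example):

        import requests  # talk to the local ollama HTTP API

        resp = requests.post(
            "http://localhost:11434/api/generate",          # default ollama endpoint
            json={"model": "llama3",                        # example tag; use whatever you pulled
                  "prompt": "Summarize GDDR7 in one sentence.",
                  "stream": False},                         # one JSON blob instead of a stream
            timeout=300,
        )
        print(resp.json()["response"])                      # the generated text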

    • @alc5440
      @alc5440 День назад +10

      I'm by no means a power user and I've only ever wanted to do inference but AMD ROCm has always worked fine for me.

    • @zenko4187
      @zenko4187 День назад +7

      This is solid advice (for LLM) its a pain and a half to work with AMD GPUs for image gen (Flux, Stable Diffusion, etc)

    • @Arcidi225
      @Arcidi225 День назад +7

      Didn't somebody run 405B Llama on 8 mining AMD GPUs at 50 tokens/s?
      To be honest, the more I look into this stuff, the more AMD cards make sense for inference: cheap and high VRAM.

    • @comradepeter87
      @comradepeter87 День назад +8

      I have an AMD GPU and I'm on Linux. I tried to apt install AMD ROCm, and it asked for 50GB worth of library downloads 💀. Tried to push ahead anyway, and ended up bottlenecked on space in my root partition :(

    • @leucome
      @leucome День назад

      @@comradepeter87 Find where it put the files, then make a symlink to another drive. It might also be possible to only install the ROCm runtime. It is way smaller, but some software may need the full dev package if it has to compile stuff. Anyway, I use symlinks a lot to keep my most-used models/checkpoints on the NVMe drive while offloading everything else to the SATA drive. The temporary download files are also sent to an HDD with a symlink, to avoid filling my home SSD with temporary trash.

  • @meppeorga
    @meppeorga День назад +73

    I see chat spamming "Naive" / "Just you wait" at Casey's comment about how we're at a point where we can barely push these new GPUs further. How dumb can people be?
    People are hearing this from a veteran game developer who has some of the greatest insight into these things and don't believe him.
    We are living in a time where you can have a 1080 Ti, a goddamned 8-year-old GPU, and it can still compete with the lower tier of the current generation of graphics cards (and it was also not that expensive at release, btw, at least before the crypto boom).
    There's a reason Nvidia is pushing AI and software so hard: they know the current rate of hardware improvement is ass. Moore's Law died a while ago; it's not the early 2000s anymore.

    • @gggggggggghhhhoost
      @gggggggggghhhhoost День назад +8

      Moore's law is "dead" because of nvidia and/or tsmc monopoly. You can only innovate so far with a single brain. The world needs more fab from other countries

    • @richardnpaul_mob
      @richardnpaul_mob День назад +4

      The reticle limit is a thing, but then so are chiplet designs, both from AMD and Nvidia, even though only AMD have sold such cards as gaming GPUs.
      Nvidia didn't go for N3 this gen and stuck with N5 derivative node N4. So Nvidia are holding back, they could have gone further and didn't. N2 is just about to release and there are pathways to 18, 16 and 14 Angstrom nodes so whilst some aspects of chips are not really scaling anymore there's more than enough room for logic to keep shrinking and so GPUs to get more powerful in the next 10 years

    • @adissentingopinion848
      @adissentingopinion848 День назад

      @@gggggggggghhhhoost my dawg, FinFET and gate-all-around are literally scraping the bottom of the barrel. Silicon wafers don't have the atomic radii for your precious electrons not to escape your increasingly delicate gates. We're relying on ASML here, not even TSMC! You don't even know what EUV is! There aren't any real nanometers below 9nm, it's all marketing!!!

    • @DustinShort
      @DustinShort День назад +15

      @@gggggggggghhhhoost the monopoly for sure doesn't help the situation, but Casey is right about physics. A silicon atom has a diameter of about 0.2 nanometers and our best process nodes are right around 2nm. We only have about 10 atoms to play with between features at that level. At that scale everything from simple optics (diffraction) to quantum mechanics like tunneling becomes a limiting factor. At 4nm, a single atom out of place is within a 10% (+-5%) manufacturing tolerance, while at 2nm you need a 20% margin of error.
      Until we have tech that individually places atoms, lithography process improvements will keep slowing down dramatically the closer we get. I also didn't even talk about die size growth and how that affects yields. AMD does chiplet designs, which helps mitigate yield defects, but they are currently not that competitive at the top end, and the stranglehold of CUDA adoption hurts them as well.

    • @adammontgomery7980
      @adammontgomery7980 День назад +4

      I don't know enough to claim that Moore's law is dead, but we are at some physical limits with chip production. Most people claiming /naive probably don't understand any of the manufacturing challenges. I mean, they already can't use optical lenses because the EUV light won't pass through glass.

  • @windwalkerrangerdm
    @windwalkerrangerdm День назад +14

    if they can't produce enough of those chips, all they have to do is to activate mfgx4 to interpolate between two existing chips and it'll all be fine...

  • @hamzagoesgym
    @hamzagoesgym День назад +14

    Pulled the trigger on a 7900 XTX, cause I can't keep waiting forever. Nvidia only leaving crumbs to consumers.

  • @davibelo
    @davibelo День назад +24

    Prime, if you feel bad about asking for a 5090... ask for an H100 to use at home. It would be the first YT content about one being used at home 😂 and I want to see that content

  • @hb-hr1nh
    @hb-hr1nh День назад +7

    I like the people saying the 4090 will be the 5th fastest video card after all the 50 series are out and thinking it's an own. It didn't quite work out that way.

  • @Fan_of_Ado
    @Fan_of_Ado День назад +64

    I just bought a RX 570 (2017). Maybe I'll get a 3090 in 10 years time...

    • @javierflores09
      @javierflores09 День назад +4

      should just get a rx 6600, it is $190 at the cheapest right now (or could splurge 50 bucks more for the XT, but the Arc B570 is the better choice at that price range), ~50% more performance than the card you have right now at a relatively good price

    • @deefeeeeefeeeeeeeeee
      @deefeeeeefeeeeeeeeee День назад +6

      Just stay on AMD; a 7700 XT is 400 USD and is way more than enough for most users at 1080p and 1440p

    • @javierflores09
      @javierflores09 23 часа назад +1

      @ that is still a very decent card, can play GTA V with very high settings at 1080p with 60+ fps and Cyberpunk with low settings at 1080p with 50-60 fps (can squeeze some more with FSR). People tend to forget about the older cards since everyone wants to have the latest shiny thing, however these cards still got a lot of potential, especially if you don't plan on playing the latest, most demanding AAA games.
      Though I don't know how well that one is going to perform if you used it for mining haha

    • @Fan_of_Ado
      @Fan_of_Ado 23 часа назад +1

      @ I got the RX570 for $40 and the games I play aren't really that intensive

    • @thesenamesaretaken
      @thesenamesaretaken 19 часов назад

      @@Definesleepalt yeah, you can use them to play videogames, who knew

  • @soulextracter
    @soulextracter День назад +12

    I'm so happy that I don't need a beefy GPU. Mine is like a decade old or something.

  • @Kartman-w6q
    @Kartman-w6q 23 часа назад +5

    26:38 bear in mind that combining architectures (Ampere and Ada) might give unexpected edge cases. most often it will result in either disabled Ada features (best case) or, depending on what you're doing, simply refuse to combine the vram

  • @davidding8814
    @davidding8814 20 часов назад +9

    There's a one word explanation for this phenomenon: MONOPOLY. ASML is a monopoly, which has little incentive to boost production and reduce sales prices. The high price/scarcity in turn raises the barrier of entry for chip manufacturers, resulting in TSMC being almost a monopoly with just a little more competition and a little more incentive to reduce scarcity/prices. That in turn makes Nvidia just a little less of a monopoly for the same reasons. The AI companies and their investors have been hoping that the same concept will make them monopoly/oligopolies, which is why the Deepseek advancements tanked stock prices.

    • @SPeeSimon
      @SPeeSimon 15 часов назад +1

      ASML does have competitors, except not for the high-end machines that are being used to create those chips. They paid a high price to create a machine that makes chips using EUV, which now gives them a competitive edge.
      Just like nVidia also has competitors in AMD and Intel. Except there too, AMD has given up on the high-end chip and Intel is just beginning (again).
      So for now, we must wait until the AI hype is over. Just like around the 4090 release, when we had to deal with GPU usage for blockchain.

  • @leoSaunders
    @leoSaunders 2 часа назад +1

    The 5090 costs ~€3,800 - €5,000 in Europe,
    the 5080 €1,300 - €2,300.
    European listings only show prices with taxes included.

  • @trietang2304
    @trietang2304 День назад +13

    Remember the crypto bubble that made me unable to buy the dream PC I saved up for

  • @NytronX
    @NytronX День назад +21

    Prime, I'll sell you my RTX 4090 for like $2.3k. Would ship from MN. Excellent condition, never used for mining or AI.

    • @krunkey
      @krunkey 23 часа назад +3

      Get this to the top

    • @flipwonderland
      @flipwonderland 10 часов назад +1

      I just bought one new for 1.8k :0

  • @HamsterHearthstone
    @HamsterHearthstone 23 часа назад +6

    Prime, just give it 1-3 months, you'll be able to get a 5090 by then if you're faster than a snail checking out at an online retailer.

  • @jasontang6725
    @jasontang6725 19 часов назад +4

    I bought my RTX 4090's two years ago when they came out, and now they are somehow worth 30% - 50% more than I paid for them. Wild times. I feel your disillusionment, prime.

  • @yuu-kun3461
    @yuu-kun3461 День назад +227

    "Why Buying GPUs Is a Disaster". Sorry.

    • @_plaha_
      @_plaha_ День назад +6

      Good job, A.

    • @Shywizz
      @Shywizz День назад +20

      Good job 47, Fall back to base.

    • @kirabee4134
      @kirabee4134 День назад +7

      I thought for a sec that "buying gpus" was a new category, as opposed to "gaming gpus" 😂

    • @GerardoScript
      @GerardoScript День назад +6

      As a non-native English speaker, I could understand that he did it on purpose, why couldn't some others?

    • @theondono
      @theondono День назад +3

      Prime is biting the forbidden apple of rage bait

  • @Dani_el_Duck
    @Dani_el_Duck День назад +37

    Just wait for the Chinese GPUs to prosper and the US will suddenly have a lot of chips

    • @Psikeadelic
      @Psikeadelic 20 часов назад

      so another 5 years + then

    • @nakashimakatsuyuki4077
      @nakashimakatsuyuki4077 18 часов назад +1

      ​@@Psikeadelic More like 2 years, if not 1. I have solid sources.

    • @Kwazzaaap
      @Kwazzaaap 18 часов назад +2

      There are GPUs in China you can buy that are ~1080 performance, for more than a year now. They struggle with driver support and aren't really viable commercially, but supporting AI applications with software is a lot easier than supporting all of gaming. China's bottleneck remains whether TSMC is allowed to take orders from China on the latest node or not.

    • @pr0newbie
      @pr0newbie 14 часов назад

      @@Kwazzaaap Harbin uni had an EUV lithography breakthrough, so more like 3 years. We don't need Chinese gaming GPUs; all we need are AI ones to slash Nvidia's margins and make gaming attractive again.

    • @suminshizzles6951
      @suminshizzles6951 5 часов назад

      I agree, but the tariffs will hurt. The 5090 shortage is a manufactured event. The Chinese are at least 10 years behind. With economic espionage they could shorten that gap. And they are trying the espionage route.

  • @peterprokop
    @peterprokop День назад +17

    deepseek-r1:70b runs fine on a 64 GB M3 MacBook, at around 30 characters/second output, using ollama.
    To run the full DeepSeek R1 model, you will need 800GB of memory; to train it, 1.5TB. You can use a few big CPUs with 128-256 cores. It will be slow, but it will work. Otherwise you need something like 10 GPUs with 80GB of memory, or 20 with 48GB, to run your model. The first setup might draw up to 5kW of power, the second up to 10kW. That's $1-2 per hour in power alone, $24-50 per day, $1,500-3,000 per month. Double that if you want to train your model.
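
    The power-cost part is easy to sanity-check yourself; a rough sketch in Python (the electricity rate here is an assumed figure, not something from the comment above):

        gpus, watts_per_gpu = 10, 500        # ~5 kW total, the 10x 80GB-GPU case above
        rate = 0.20                          # assumed $/kWh; plug in your local rate

        kw = gpus * watts_per_gpu / 1000
        per_hour = kw * rate
        print(f"{kw:.1f} kW -> ${per_hour:.2f}/hour, "
              f"${per_hour * 24:.0f}/day, ${per_hour * 24 * 30:.0f}/month")
        # ~5.0 kW -> $1.00/hour, $24/day, $720/month; double it for the 10 kW setup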

    • @llothar68
      @llothar68 23 часа назад +3

      Well i can get a 1.5TB RAM server with 44 cores for around $3000 at the moment.

    • @mryellow6918
      @mryellow6918 17 часов назад

      ​@@llothar68bro how?

  • @mattbillenstein
    @mattbillenstein 23 часа назад +4

    Database Mart - you can rent a cloud machine there for like $450/mo with an A6000 - I think you can even get an A100 for under $1k/mo. Or LambdaLabs - rent by the hour.

  • @ErazerPT
    @ErazerPT 23 часа назад +4

    Welcome to budgeting hell, have a nice stay. And be REALLY conscious about the VRAM aspect. The moment your model can't fit, you have different parts moving at different speeds, and the fast side will always be waiting on the slow side. Rough comparison: memory cache vs. on disk. Thus, you have to weigh what the speed-up on the GPU side is worth against the slow-down on the "non-GPU" side.
    Ah well, I'm still waiting on the 5050, assuming it comes out, or the 5060 if not. It's a blessing when your models can fit on "small stuff" because they're not trying to be everything for everyone.

    • @monkemode8128
      @monkemode8128 12 часов назад

      CAN'T U JUST PUT IT IN MAIN SYSTEM MEMORY AND IF U CAN LOAD IN THE PARAMETERS YOU NEED FASTER THAN THE GPU CAN PROCESS THEM YOU'RE GOOD RIGHT? (I'm being fr)

  • @markdatton1348
    @markdatton1348 День назад +20

    Unfortunately, the most cost effective option for someone not running a continuous service is to rent space on the cloud...

    • @christianferrario
      @christianferrario 22 часа назад +2

      That's not really unfortunate, it's exactly why the cloud was born

    • @markdatton1348
      @markdatton1348 19 часов назад +9

      @@christianferrario It IS unfortunate if the whole goal was to run these models offline

    • @Sub0x-x40
      @Sub0x-x40 16 часов назад

      @@markdatton1348 well it is offline but just on someone else system lol

    • @christianferrario
      @christianferrario 7 часов назад

      @@markdatton1348 Yeah, but it depends why. If the goal is to run it offline to avoid their chatbot and prevent data leaks, you can still do so by using your own cloud space; if it was to use it without a network connection, then yes, you have to pay for your own machine, unlucky.

  • @hastyscorpion
    @hastyscorpion 20 часов назад +7

    Dude, all the people in the chats going "IT'S THE SCALPERS" are so clueless. If you magically snapped your fingers and made scalping impossible, that wouldn't magically make more cards available. They just aren't making enough cards. You wouldn't get one either way.

    • @Psikeadelic
      @Psikeadelic 20 часов назад

      apparently

    • @GregoryShtevensh
      @GregoryShtevensh 15 часов назад

      Not to mention, scalping thrives on supply and demand. This wouldn't be an issue if you were always able to go to the store and get one at msrp. Scalpers will always exist, but they're only successful when supply falls short of demand.

    • @monkemode8128
      @monkemode8128 11 часов назад

      I'd much rather have a lottery system at MSRP than high prices. Although I don't think fighting supply/demand like that will work (at least not without high costs and being intrusive - like suing everyone, implementing hardware limitations, background checks, and monitoring individual customers - that's not gonna happen, and even if it did it might just add barriers and raise the prices more).

  • @frentsamuel7533
    @frentsamuel7533 18 часов назад +3

    The GTX 980 (165W) was released in 2014, and it scores 11110 benchmark points on PassMark.
    The RTX 5080 (360W) was released in 2025, and it scores 37287 benchmark points on PassMark.
    Basically you have a 3.34x improvement on this specific software benchmark.
    I believe that's not even fair, because the RTX 5080 consumes 2.19x more power than the GTX 980.
    It was a bigger jump between the 8800 GTX released in 2006 and the GTX 1080 released in 2016: basically 26.9x better for the newer one. I tend to agree that, unless something special is discovered, this way of building GPUs will not bring much improvement.
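
    Running the same numbers (PassMark scores and TDPs as quoted above; the exact ratios differ slightly from the rounding in the comment):

        gtx_980  = {"score": 11110, "watts": 165}   # 2014
        rtx_5080 = {"score": 37287, "watts": 360}   # 2025

        speedup = rtx_5080["score"] / gtx_980["score"]
        power   = rtx_5080["watts"] / gtx_980["watts"]
        print(f"raw: {speedup:.2f}x  power: {power:.2f}x  perf/W: {speedup / power:.2f}x")
        # raw: 3.36x  power: 2.18x  perf/W: 1.54x over 11 years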

    • @Kwazzaaap
      @Kwazzaaap 17 часов назад +3

      Comparing models of that age difference has caveats as they will be 3x in some aspects and 10x in others but your point stands that it is nowhere near 16x or 32x. AI hypers and Nvidia fanboys will just keep lying for free until the end of time though.

    • @sh_chef92
      @sh_chef92 2 часа назад

      In raster gaming it's more like 6x-plus more performance. Not even talking about RT performance, which can leverage something like OptiX for path-traced 3D renders etc. with a much better boost than 6x. And lastly the tensor cores' AI performance, which is another world and incomparable. Also games can leverage tensor cores now, so the performance difference is multiplied by double-digit numbers.

  • @DeltaV64
    @DeltaV64 День назад +7

    To add to the AMD side, AMD on windows also works pretty good. Never really ran into torch directml issues, and ollama itself runs nicely. XTX is such an underrated card.

    • @sh_chef92
      @sh_chef92 2 часа назад

      Now considering enormous HW accelerated AI power, transformer model DLSS features, RT performance, CUDA and OptiX and many more that Nvidia cards have, 7900xtx compared to rtx 4080 is much worse value. Basically its outdated already.

  • @itmecube
    @itmecube 17 часов назад +15

    Tariffs are only going to inflate the costs of GPUs. It's about to get a lot worse.

    • @britneyfreek
      @britneyfreek 14 часов назад +5

      which will just crash this ridiculous hype train. ppl need to get grounded.

  • @Grubse
    @Grubse 18 часов назад +2

    Hey @prime, also, I hear you saying you want to not only run the models (inference) but train models. Training models requires more VRAM than running them for inference. If a 16B model takes 24GB, then for training you'd need about 100 GB of VRAM. This is because in training you also need to store the gradients for backpropagation.
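
    A very rough sketch of where figures like that come from (activations are ignored, and the bytes-per-parameter values are common rules of thumb, not exact):

        def vram_gb(params_billion, bytes_per_param):
            # 1e9 params * bytes, divided by 1e9 bytes per GB == params_billion * bytes_per_param
            return params_billion * bytes_per_param

        n = 16  # the 16B example above
        print(f"inference, fp16 weights only    : ~{vram_gb(n, 2):.0f} GB")
        print(f"training, fp16 + 8-bit optimizer: ~{vram_gb(n, 6):.0f} GB")
        print(f"training, mixed-precision Adam  : ~{vram_gb(n, 16):.0f} GB")
        # weights alone ~32 GB; add gradients and optimizer state and it balloons fast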

    • @Grubse
      @Grubse 18 часов назад

      Mostly an FYI in case you didnt know

  • @Tenmar.
    @Tenmar. 14 часов назад +1

    If I recall, the reviewers get a review unit which they have to send back afterwards. While it is true that some of the big youtubers do get GPUs like that for free (see the video game industry), it's mainly due to network connections, getting into the big club, and years of toeing the line.

  • @HandsOC
    @HandsOC 20 часов назад +5

    The market needs a 48gb $5k Titan to relieve some of the datacenter market pressure off the 5090

    • @4m470
      @4m470 19 часов назад +6

      That doesn't benefit Nvidia. They can overprice both Datacenters and Prosumers with their current strategy.

  • @wwllsswtrustong
    @wwllsswtrustong 6 часов назад +1

    The 3090 used Samsung's custom 8nm (8N) process for its GA102 die, packing 28 billion transistors. While powerful, this node was less efficient than TSMC's alternatives, leading to higher power draw and thermal output. What are you even talking about, bro?

  • @NegraLi34
    @NegraLi34 21 час назад +2

    The root of the problem seems to be TSMC. It's not like this shortage problem just appeared now; this has been going on for years, and TSMC seems either unable or unwilling to scale up production. At these premium prices you would expect competition to prosper, but we are going in the opposite direction. I still don't understand how billions of USD can't reproduce whatever they are doing there.

    • @LubosMudrak
      @LubosMudrak 20 часов назад +1

      Because money is not enough and you need EXTREMELY competent people to do it right.

  • @dataolle
    @dataolle День назад +10

    For hardware config maybe a collab with Wendell from L1techs would be cool.

    • @LtdJorge
      @LtdJorge 23 часа назад +2

      Yes, fully agree

    • @102728
      @102728 17 часов назад +1

      I kinda wanna see wendell and casey nerding out on a call

  • @hanes2
    @hanes2 21 час назад +3

    Sad they never could get SLI to work properly, since on the professional cards with NVLink you're just stacking more: 6 GPUs working together as one big GPU.

    • @taylor-worthington
      @taylor-worthington 21 час назад +1

      Yeah, honestly you would think this would be built into the OS (or standard drivers I guess) by now, and SLI would have been a temporary bandaid.

    • @mryellow6918
      @mryellow6918 17 часов назад +1

      They could, it worked but the method they used was to put it on the developer to integrate. They have more than enough staff and money to make a functional version now but they didn't like how you could get 2 cheaper cards and beat the flagship for cheaper. They are 100% gonna move to chiplets making any form of newer sli kinda pointless now

    • @mryellow6918
      @mryellow6918 17 часов назад +1

      @@taylor-worthington that's what DX12 tried to do, but nobody cared.

  • @AI-xi4jk
    @AI-xi4jk 57 минут назад +1

    I’ve tried running DeepSeek with ollama on RTX 6000 Ada. 32B param model takes about 20+GB vram on the GPU, so should fit on 3090/4090. 70B model takes like 43GB and although fits on my GPU it’s quite slow - really depending on a question. Don’t ask “loaded” questions. I haven’t tried to optimise the models so those are out of the box as is. I’d say 5090 will be much more future proof, however still might be limited by its memory. Obviously if you use unified memory to let model spill over to RAM performance will suffer like in swapping scenario. Hope that helps someone. TBH I was impressed by DeepSeek at first but now kind of disillusioned. I’ve got some better answers from ChatGPT and Claude on some C++ libav programming. But maybe the model is not trained much on that.

  • @victorcadillogutierrez7282
    @victorcadillogutierrez7282 13 часов назад

    Well, it's called 1.58-bit quantization because the model is rounded to ternary weights instead of FP32, FP16, or whatever, and the new weights have only {-1,0,1} elements. This reduces the matrix multiplications in LLMs to near binary-operation complexity. The 1.58 bits comes from log2(3) ≈ 1.58 (i.e. 2^1.58 ≈ 3), and 3 is the number of values in the ternary-weight quantization. Prime, you are considering having many Apples; you can also parallelize many NVIDIA 3090s. It's really hard to get a 4090, or just wait and try to buy them over time. You can also parallelize different NVIDIA cards as long as it runs on CUDA with auxiliary PyTorch libraries.
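
    A toy illustration of the ternary idea (this is just the arithmetic; real 1.58-bit models like BitNet are trained under these constraints rather than rounded after the fact, so naively rounding a pretrained model like this would wreck it):

        import numpy as np

        print(np.log2(3))                       # ~1.585 bits of information per ternary weight

        w = np.random.randn(4, 4).astype(np.float32)
        scale = np.abs(w).mean()                # one scale for the whole tensor
        w_ternary = np.clip(np.round(w / scale), -1, 1)
        print(np.unique(w_ternary))             # values drawn only from {-1, 0, 1}
        w_dequant = w_ternary * scale           # what the matmul effectively sees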

  • @timisa58
    @timisa58 21 час назад +1

    The 'racket' is the pricing for less performance, not necessarily the current limitations of the tech. I learned about the idea that lower-tier chips are actually the more defective chips. It is nuts.

    • @thesenamesaretaken
      @thesenamesaretaken 9 часов назад

      What's nuts about it? You have a factory that spits out products. Some of them have more defects and some have fewer. Are you suggesting only keeping the perfect few and throwing the rest into landfill?

  • @mairex3803
    @mairex3803 День назад +3

    The only things you should look at are:
    VRAM (nothing matters if you can not fit the model)
    Tensor core precision support. You really want BF16 since you keep the exponent size of fp32 with half the cost. Ampere and newer support this. Working with lower precision is annoying if you want to do it yourself. You have to do a lot of work to maintain stability and accuracy.
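
    A quick way to see what "keeps the exponent size of fp32" buys you, assuming a PyTorch install (no GPU needed for this part):

        import torch

        # BF16 keeps FP32's 8 exponent bits (range) and gives up mantissa bits (precision);
        # FP16 keeps more mantissa but overflows at 65504, which is what forces loss scaling.
        for dt in (torch.float32, torch.bfloat16, torch.float16):
            info = torch.finfo(dt)
            print(f"{str(dt):16s} max={info.max:.3e}  eps={info.eps:.3e}")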

    • @Sl15555
      @Sl15555 7 часов назад

      VRAM is most important, and the motherboard and CPU need to have enough PCIe lane support. A lot of mainboards only run one slot at full PCIe bandwidth while the remaining ports are nerfed.

  • @Ray-gs7dd
    @Ray-gs7dd День назад +19

    I literally started programming Fortran because of you guys' rant lol. I love whenever you two start cookin' on stream

    • @Henrik_Holst
      @Henrik_Holst День назад +4

      I hope you limit every string to 100 chars ;)

    • @dgo4490
      @dgo4490 День назад +5

      With an ever increasing amount of transformers, fortran might be back in business!

    • @victorgabr
      @victorgabr 23 часа назад +1

      ​@@Henrik_Holst yep, the plot twist is that "benchmark" was done using LLM slop and the Fortran code seemed faster because it limited the char size to 100. Lol😂

  • @methane1027
    @methane1027 17 часов назад +1

    Yep. It isn't that they're selling all their Lemonade to one customer because it's easier for them no. It's actually WAY WAY WORSE.
    They're selling us what they call "Lemonade" and maybe at some point in the past (GTX1080) It was still mostly Lemonade, but now they've been slipping in so much of this synthetic oil to the product because they want to mainly sell it as lubricant to their giant Corporate Oligarchs at a lower cost and get all the gamers to pay for it still. That's what's essentially going on here.
    Poisoning us, while charging larger premiums, so giant Corporations can profit even MORE from our labour.
    Welcome to late stage Capitalism, baby. It's only going to get worse from here.
    They haven't been making graphics chips in quite some time. They're only pretending to.

  • @TitelSinistrel
    @TitelSinistrel 5 часов назад

    Until 2 years ago I worked in HPC and I can tell you that there are 2 classes of cards in the "enterprisy" category. There is the RTX A (Ampere/3000) series, which replaced the Quadro cards: built around being put into workstations, with their own fans/coolers, and more consumer friendly. And there is stuff like the A10/A40/A400 class of passively cooled cards that go into server chassis. At the base, they are pretty much the same thing as the consumer cards of a similar class but with double the VRAM, same or better TDPs, and higher VRAM bandwidth. They perform almost identically to the consumer card. The A40 or RTX 6000 is within margin of error of the 3090 for this use case, with the difference that the 3090 uses a lot more power.

  • @charlesscholton5252
    @charlesscholton5252 17 часов назад +1

    I am wanting to explore the use of Project Digits unit over setting up old server hardware loaded with GPUs.

  • @real_krissetto
    @real_krissetto 22 часа назад +1

    @ThePrimeTime Going for the beefiest single gpu you can get is probably the most satisfying setup right now, especially compared to using just 2 or 3 gpus in total (and not many more).
    Data transfer rate between the cards during inference puts a pretty hard cap on tokens/sec when the model is spread on multiple gpus, with the main benefit being you can run bigger models without parts of the model going into ram.
    If you can fit a model entirely in one gpus vram, then you can really see them fly on modern gpus.

    • @defeqel6537
      @defeqel6537 10 часов назад

      Strix Halo could also be good setup, seeing how it can address quite a lot of memory

  • @complexity5545
    @complexity5545 15 часов назад +1

    I made 2 A.I. builds in 2022. All server parts and welding and 3d printing and fans.
    This is disheartening: I want a 5090, but the situation is not a smart investment.

  • @dataolle
    @dataolle День назад +6

    Public cloud gpu instances perhaps?

  • @Unordinary-lg4yt
    @Unordinary-lg4yt 22 часа назад +1

    So not only are people punching air over AI, they’re punching air over pricing and availability - despite the fact this overlap doesn’t even care (so they claim anyway).

  • @jonton6981
    @jonton6981 День назад +2

    Mac mini route should be fine for your use cases. You probably only need RAG for the doc search / coding anyway. Finetuning without a sufficient dataset often only hurts performance.

  • @cristian91re
    @cristian91re 7 часов назад

    We need competition, from producing chips to GPU/cpu

  • @ZDoherty
    @ZDoherty 23 часа назад +1

    1:30 NVIDIA calls this Speed of Light, it’s a company value

  • @sakchais
    @sakchais 15 часов назад +1

    Prime if you read this what I recommend is get a cheap x99 motherboard + cpu and minimum 128 GB ram and 3090 card.
    Setup linux + CUDA run podman / lxd and setup Ollama + open webui
    You’ll be able to do pretty much exactly what you want without finetuning. Or if you want to experiment with finetuning you can do that too.
    I’d be happy to walk you through my setup and help you get up and running.
    I spent about 3800 USD on my ai rig.

    • @dalmighty5568
      @dalmighty5568 3 часа назад

      This is what I did and at my current level of learning, it's more than enough. Training on small models isn't bad at all, Inference is very usable on FP4 and FP8, Training on billions of parameters will be painful if I were to guess.

  • @kalasmournrex1470
    @kalasmournrex1470 День назад +1

    I'd expect the GPUs to go to 3D layouts. Since they are embarrassingly parallel, it's the perfect case for 3D layouts.

    • @LtdJorge
      @LtdJorge 23 часа назад +6

      Not exactly, because that reduces thermal transfer so much. What AMD is doing is putting their 3D V-cache below the CPUs, and that could be done for the GPUs too. But right now, if you stacked cores vertically, they would cook themselves.

  • @Talic29
    @Talic29 15 часов назад

    I am at least 500% more likely to click a prime video when I see Casey. Prime you're great, but Casey is a GOD.

  • @BaronCAD
    @BaronCAD 5 часов назад

    There is a product called SCALE (a compiler toolkit) that is library compatible with CUDA. It creates ROCm (AMD GPU) linked binaries with almost no source code changes to a project that would normally use CUDA directly. So instead of requiring these incredibly scarce Blackwell chips, you can buy twice as many Navi 31 cards (with 24GB VRAM) and end up in basically the same ballpark. The GDDR6 vs. 7 will be a slight performance downgrade, but the price per unit of compute is WAY lower. As for PCIe lanes, go with an EPYC server board with hundreds of lanes (vs. 24 usable on a regular AM5 CPU), so you can put a bunch of GPUs in one box.

  • @lLvupKitchen
    @lLvupKitchen 3 часа назад

    Putting a tariff on TSMC is absurd when the US doesn't have a competing product. Basically the quantity of imported GPUs will remain the same, since big tech can't get enough of them, and the price will rise because of the tariff, but TSMC is not paying for that, the US companies will.

  • @Gerry484
    @Gerry484 6 часов назад +1

    You know what we are not pushing enough? GAME OPTIMIZATION! Its a joke...

  • @aufkeinsten7883
    @aufkeinsten7883 5 часов назад

    Bro use your status, nobody here will be mad at you. You earned it. You're not doing anything nefarious with it and neither are you part of the GPU shortage problem because you get a single one.

  • @SPeeSimon
    @SPeeSimon 15 часов назад

    I would suggest to use a cloud solution where you rent a GPU (cluster), do your work and end it (to save costs). And for local development only use your current GPU (or update that to the best available). You don't have to process a big AI/LLM while streaming. Just use a small one.
    Or use a full spec macbook, where the RAM is also usable by the GPU because of their chip design.

  • @ZeroInDaHouse
    @ZeroInDaHouse 20 часов назад +1

    It is time chip manufacturing went open source as well. Chips have become ubiquitous, and putting a brake on technology because of a 'mine' mentality did have a nice run but is no longer going to cut it for the future. ASML can lead the way, or go bust when others go fully open source on their chip-making tech, so much so that we end up in an era of at-home chip fabrication akin to a 3D printer anyone can have at home.

    • @SPeeSimon
      @SPeeSimon 15 часов назад +1

      I have seen a YT channel that created a chip in his garage. The basic principle is not that difficult. Only the small scale of a commercial chip makes it so difficult. You cannot be an atom off or you have a failing chip. That's not possible for DIY.

    • @JonJon69420
      @JonJon69420 6 часов назад

      do it then, the knowledge is out there, get a degree if needed, and opensource your findings/process, you can be the pioneer

  • @thekingofallblogs
    @thekingofallblogs 23 часа назад +1

    I bought a 3090 some time ago for $1500. Seemed like it was way over priced at the time, but i wanted to do AI dev. Works well enough I don't feel need to upgrade.

  • @adammontgomery7980
    @adammontgomery7980 День назад +1

    Somebody commented "here comes a 10 minute answer" when Casey started his explanation of CPU sockets. Must have been a zoomer. Where's the respect?

  • @danielhoover5169
    @danielhoover5169 19 часов назад +1

    You may be very interested in the tests done on AMD GPUs with the DeepSeek models. The 7900 XTX outperforms the 4090 on the 14G distilled R1 and all smaller distills, and barely loses for 32G.

  • @Telopead
    @Telopead День назад +4

    I'd honestly go for an AMD 7900 XTX if I'm just trying to run smaller models.
    If I'm going for DeepSeek R1 671B, the cheapest way is somewhere between a Mac Studio or some retired server parts with huge amounts of RAM.
    GPUs are too expensive and hard to get rn

  • @TheStanglehold
    @TheStanglehold 23 часа назад

    There is a market open for anyone that just figures out how to min max this decision making. Instead of selling via specs of the gpu, motherboard, etc. just sell based on the model size you can run on a rig and the tokens per second.

    • @krunkey
      @krunkey 23 часа назад

      This, good catch

  • @andrewevanyshyn1709
    @andrewevanyshyn1709 15 часов назад

    Hey Editor, is it possible to add the date the clip was recorded in one of the corners at the beginning of the video?

  • @TheEconomicElder
    @TheEconomicElder 7 часов назад

    De Beers withheld diamonds in the 1800s. It's no longer the case that there is manufactured scarcity, FYI

  • @slebetman
    @slebetman 16 часов назад

    They won’t manufacture the 3090 because of the Sinclair lesson: don’t compete with your own product or you will end up with massive inventory you cannot move

  • @zhe2en171
    @zhe2en171 21 час назад

    My hot take / understanding (please correct if I'm wrong): the fact that you can do 4-bit and below (there are even 2-bit quantizations!!!) suggests that the current LLM architectures are oversized in terms of parameters for their compute ability. I think that if the neurons were "saturated", nearly any further quantization should significantly degrade the model's output.

  • @lokisinary_play
    @lokisinary_play 23 часа назад +1

    Things are expensive when we don't have options, and Intel, Nvidia and AMD are all taking advantage of that. If we had multiple options for CPUs and GPUs, these tin cans would not be overpriced. And we would also see major innovation each year.

  • @lilpepe545
    @lilpepe545 23 часа назад +7

    The definition of a monopoly. Government has to step in and divide NVIDIA up.

    • @LtdJorge
      @LtdJorge 23 часа назад +6

      no

    • @Neuroszima
      @Neuroszima 22 часа назад +2

      @@LtdJorge yes. But the correct answer is they will never do that

    • @warasilawombat
      @warasilawombat 22 часа назад

      Not really. There are alternatives but they aren’t as good. You’re absolutely allowed to do that.
      What they step in for would be anti-competitive behavior. Hard to say if they meet that bar.

    • @MrKlarthums
      @MrKlarthums 22 часа назад +1

      The problem is partially manufacturing capacity. There's no way TSMC can accommodate demand at this point, let alone allow for a competitive market. The other problem is that companies are using traditional gpu compute rather than ASICs. nVidia GPU prices will drop like a rock once some company figures out how to build a competitive AI-focused chip, cut costs from not needing 3D graphics support, cut costs by keeping traditional compute hardware external, and transpile CUDA (at least in some capacity) for adoption. It must be a very hard problem as this has been a needed area for about 15 years when scientific computing needed cheaper, more scalable alternatives to supercomputing clusters with thousands of traditional Intel/AMD CPU cores.

  • @cricketbatman
    @cricketbatman 14 часов назад +1

    Never been happier I forked out MSRP for a 4090 over a year+ ago when I found one in stock. Big OOF

    • @Sl15555
      @Sl15555 6 часов назад +1

      same, i thought the supply issue was over. guess not.

  • @piotrkmiec6590
    @piotrkmiec6590 20 часов назад

    Who's the other guy though, no mention in the description. Ouch!

  • @DavidGM94
    @DavidGM94 21 час назад

    Which stream is this one? Which date? I would like to watch the whole vod

  • @Quaquaquaqua
    @Quaquaquaqua 5 часов назад

    Why is it so hard to create a graphics card competitor? If one factory/process was built, why can't we build 2 or 100?

  • @krilektahn8861
    @krilektahn8861 17 часов назад +5

    And that's why deepseek was such a shake up. Proved that good AI models don't need cuda. And if you don't need cuda, it's *much cheaper* to run AI.

  • @CytreenSpiegel
    @CytreenSpiegel 11 часов назад

    It’s going to get worse, they recently had a 6.4 magnitude earthquake. It will take them a bit to recalibrate the machines in the fab.

  • @mt1104uk
    @mt1104uk 21 час назад +1

    I'd just wait for the digits platform personally.

    • @Sl15555
      @Sl15555 6 часов назад

      good point, but i bet they are gonna be hard to get as well.

  • @c2454
    @c2454 21 час назад +1

    It's possible to run PyTorch code on Apple Metal API and I believe AMD ROCm as well. You just need to set PyTorch device to 'mps' for Apple instead of 'cuda'.
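
    A minimal device-selection sketch along those lines (ROCm builds of PyTorch also report themselves as "cuda", so the same branch covers AMD):

        import torch

        if torch.cuda.is_available():               # NVIDIA, or AMD via a ROCm build
            device = torch.device("cuda")
        elif torch.backends.mps.is_available():     # Apple Silicon
            device = torch.device("mps")
        else:
            device = torch.device("cpu")

        x = torch.randn(1024, 1024, device=device)
        print(device, (x @ x).shape)                # quick smoke test on whatever was picked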

  • @paulwary
    @paulwary 16 часов назад

    So where is the market for used data centre chips?

  • @CaptTerrific
    @CaptTerrific День назад +1

    3090s are going for $1600!? Geez, almost makes me want to sell my dual 4090s

    • @jwr6796
      @jwr6796 День назад +1

      Or half of them...

  • @MasamuneX
    @MasamuneX 16 часов назад

    asianometry would absolutely blow your mind with the depth of the process

  • @gapspt
    @gapspt День назад +3

    @ThePrimeTime the 4090 is available right now in Amazon Spain for example, for just under 3000€.

  • @petedoyle
    @petedoyle 17 часов назад

    Could probably use lambda labs (or similar) and figure out a way to easily spin an instance up/down (Terraform / OpenTofu?). Might be more interesting for watchers, too, since it's hard to drop $2-3k on a machine when just starting to experiment.

  • @mattmmilli8287
    @mattmmilli8287 10 часов назад

    oml prime, I am doing that right now with docs on my MacBook with 64 gigs of RAM, using (for now) distilled models. You can just start exploring creating agents that add to a RAG for that. It's still fun if you want to build out a crawler/agent to ingest, summarize and then add to a RAG that goes recursively through the site/docs.
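
    A stripped-down sketch of just the retrieval step, using a stand-in hashed bag-of-words "embedding" so it stays self-contained (in a real build you'd swap in an actual embedding model):

        import numpy as np

        def embed(text, dim=256):
            # toy embedding: hashed bag of words, normalized; replace with a real model
            v = np.zeros(dim)
            for tok in text.lower().split():
                v[hash(tok) % dim] += 1.0
            n = np.linalg.norm(v)
            return v / n if n else v

        docs = ["ollama exposes a local REST API",
                "RAG retrieves relevant chunks before prompting the model",
                "GDDR7 is currently single-sourced"]
        index = np.stack([embed(d) for d in docs])          # one vector per ingested chunk

        query = "how does RAG pick context"
        scores = index @ embed(query)                       # cosine similarity (unit vectors)
        print(docs[int(np.argmax(scores))])                 # best chunk goes into the prompt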

  • @jouniosmala9921
    @jouniosmala9921 10 часов назад

    The best thing you can hope for, if you don't go for a Threadripper in terms of socket, is 8 lanes to each of two GPUs on a high-end motherboard. The screw-up would be going with a motherboard where the second slot runs through the chipset and has 4 slow lanes. Another issue is whether the second GPU physically fits in the case/motherboard combo, and whether the support bracket for the GPU's weight gets blocked. I decided to stop using my desktop with two users at the same time because my primary GPU couldn't handle its own weight too well without a support bracket, and the secondary GPU slot blocked using the bracket. Instead of a tower, a horizontal case could have solved that issue.
    But the trouble I had physically fitting a 1650 next to a 2070 Super model that uses the 2080 Ti cooler makes me think you would have serious problems if you don't think through how to fit two of your 3090s or 4090s or 5090s in your system.
    Ideal would probably be a Threadripper with a horizontal case. But it would still be a tight fit to get multiple cards inside one PC.

  • @rydmerlin
    @rydmerlin День назад +2

    Did you watch Digital Spaceports videos? Ask the twitter guy you interviewed how to obtain the GPUs ... He had one in that video. Also, how many GPUs is Nvidia going to sell due to DeepSeek R1?

    • @Kwazzaaap
      @Kwazzaaap 17 часов назад

      It's gonna sell more, because now every company that can do a 30-200k investment into a local AI assistant will. You no longer have to worry about your trade secrets leaking.

  • @vicaya
    @vicaya 16 часов назад

    You can rent an H200 (140GB VRAM, 256GB RAM) for ~$3/hr, if you know where to look. Unless you're gonna run them 24/7, there is no good reason to buy any of the RTX cards mentioned here.

  • @Unordinary-lg4yt
    @Unordinary-lg4yt 22 часа назад +2

    People don’t understand economies of scale and continue to yap about “1080ti” price to performance or older gen as if production, engineering, and R&D cost scales linearly. Nearly everything is described by a hyperbolic function, not a linear function.

    • @mryellow6918
      @mryellow6918 17 часов назад +1

      Doesn't take much R&D to add more VRAM for almost no extra price.

  • @TimmyBlumberg
    @TimmyBlumberg 20 часов назад

    If you are trying to train models, renting 8xA100 or 8xA6000 is pretty cheap. Then you can just turn them off when you aren’t training anymore. You will end up spending less money almost guaranteed.

  • @velorama-x
    @velorama-x 18 часов назад

    "What happens if you start selling GPUs in capitalism? For a long time: nothing. Then GPUs are getting scarce." SCNR.

  • @lokisigmatron6536
    @lokisigmatron6536 3 часа назад

    We are slamming into the limits of Moore's Law with CPUs and GPUs, is my opinion.

  • @BanditZA
    @BanditZA День назад

    You need to look into problems with “specialisation” of a model (fine tuning) for a domain.
    Catastrophic forgetting hasn’t been solved.

  • @edwardallenthree
    @edwardallenthree 18 часов назад +1

    It used to be crypto. Now it's AI. What will we spin GPU cycles on next that is worthless?