6 Best Consumer GPUs For Local LLMs and AI Software in Late 2024

  • Published: Jan 12, 2025

Comments • 191

  • @sorinalexandrucirstea1994 (3 months ago, +84)

    "hello, let's cut right to the chase..."
    * immediately thumbs up *

    • @tyunpeters3170 (2 months ago, +6)

      Still takes half the video to talk about the best GPUs for running LLMs

    • @AbdoZaInsert (1 month ago)

      loooool I just did the same

  • @thelaughingmanofficial (2 months ago, +38)

    I'm running a 7900 XTX with ROCm on Windows 11 and not having many issues running local AIs. Currently the only thing holding me back is the AI devs not paying any attention to ROCm, but ZLUDA makes that a small issue.

    • @InstantNameOfficial (2 months ago, +5)

      I use ZLUDA for all my AI stuff as well: image gen, text gen, audio gen. I love ZLUDA.

    • @shahabemam5264 (1 month ago, +2)

      Thanks, man

    • @mkchong2824 (25 days ago, +1)

      Hi there, do you mind sharing which CPU you're pairing with that GPU, and how much RAM you have? I am planning to upgrade my 8GB RX 6600 to a 7900 XT / XTX, just not sure which CPU to go with. Thanks

    • @ESGamingCentral (18 days ago)

      @InstantNameOfficial slow

    • @MrDoBo95 (15 days ago)

      @mkchong2824 My 2600X is fast enough even for that.
      I'm gonna upgrade to a 5900X or 5950X some day as I'm running a full-scale server on the little guy, but it works perfectly fine. You really don't need more CPU power.

  • @patriot0971 (2 months ago, +16)

    I bought an Intel Arc A770 new with 16GB VRAM for $280. I am running local LLMs on my computer and support for the hardware is increasing. Most of the new AI tools support it, or are in the process of adding support. AMD cards are also a good choice in this situation.

  • @CrypticEnigma23 (4 months ago, +35)

    3090 all day, every day, and cost-effective: 24GB VRAM, a 384-bit bus, and if you're adventurous, connect multiple 3090s with NVLink and you have a super-juiced clustered AI beast.

    • @snickle1980 (3 months ago, +2)

      I'm planning on grabbing a 24GB juicer of some kind this year. Is there anything I need to be worried about heat- or power-wise on the 3090?
      I remember there was some issue at the time related to a cord/connector or heat.

    • @ph8808 (1 month ago, +1)

      @snickle1980 That was with the 4090. Out of the box, 3090s consume high wattage, but with a nice underclock you can still get up to 1900 MHz at up to 100W less power.

    • @ESGamingCentral (18 days ago)

      It's Russian roulette finding those old cards in good condition.

  • @luisff7030 (5 months ago, +44)

    The RTX 4070 Ti Super is close to the performance of the 4080, and costs 200€ less.

    • @jnx4803 (3 months ago, +6)

      I was also eyeballing the 4070 Ti Super with 16GB VRAM.
      Pretty nice for AI-related tasks, and you don't go fully bankrupt.

    • @fabriziop8044 (29 days ago)

      @jnx4803 I chose to buy this GPU

  • @patrickflanigan3903 (9 days ago, +2)

    Currently using the AMD 7900 XTX for meta-llama/Meta-Llama-3.1-8B-Instruct.

  • @GraveUypo (3 months ago, +13)

    If you do it on Linux, there's a huge speed bonus compared to Windows. LLM inference runs 50 to 100% faster on Linux than on Windows for me, for some reason.

    • @cfbmoo1 (3 months ago, +3)

      I've noticed that for games a lot of times. Even Windows games running under Wine/Proton sometimes run better on Linux than on the native platform. Mileage will vary, obviously.

  • @simonhill6267 (2 months ago, +25)

    AMD cards also run Ollama and LM Studio just fine. You get way more VRAM for your dollar.

    • @dakshbhardwaj7481 (2 months ago, +3)

      😂😂

    • @dakshbhardwaj7481 (2 months ago, +2)

      What about CUDA, Tensor cores and RT cores 😂

    • @bewaterbewater (1 month ago)

      @dakshbhardwaj7481 ZLUDA and ROCm

    • @brunojr9 (1 month ago)

      @dakshbhardwaj7481 It has support for ROCm and uses the AMD cores…

  • @realityos (4 months ago, +18)

    Apple Silicon may not be as fast, but it has ~120GB of usable VRAM. Works very well with Ollama.

    • @practeff (2 months ago, +4)

      Also significantly lower power draw. For people who live where electricity is more expensive, this is a major consideration given how power-hungry the PC options are.

    • @tahreezzmurdifin52 (2 months ago, +1)

      Which one?

    • @MuhammadFahreza (1 month ago, +2)

      What do you mean by 120GB of VRAM? Are they using the SSD?

    • @realityos (1 month ago, +6)

      @MuhammadFahreza You can spec out Macs with up to 192GB of unified RAM, and it's usable for AI models.

    • @johnne86sd (1 month ago, +1)

      That's why I'm considering a Mac for the first time ever. The M-chip unified memory lets the GPU share system memory, which makes it an easier option to just buy a Mac with enough memory to load and run an LLM of around 7B-14B parameters. I don't necessarily need my GPU to be tailored for gaming, so that's another reason I'm leaning Mac. Also, M chips draw a lot less power compared to a beefy Windows gaming desktop with an RTX card that has 16+GB of VRAM. I just want to be able to run at least a 14B model, so I'm leaning towards a 24 or 32GB Mac. The earlier M1 Mac Studios can also be decked out with lots of memory for a reasonable price. Besides memory, I also don't know the proper number of GPU cores to go with on a Mac. I'm curious what inference speeds are like on a Mac vs a comparable RTX card. It would be nice to see videos comparing them.
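
A rough way to sanity-check the unified-memory claims above: the memory a model needs is roughly its parameter count times bits per weight, plus some runtime overhead. The sketch below is a back-of-the-envelope estimate only; the ~4.8 bits/weight figure for Q4_K_M-style quantization and the 1.2x overhead factor are assumptions, not measurements.

# Rough sizing helper for "will this model fit?" questions (unified memory or VRAM).
def est_model_gb(params_billion: float, bits_per_weight: float = 4.8, overhead: float = 1.2) -> float:
    """Estimate memory needed to load a quantized model, in GB."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * overhead

if __name__ == "__main__":
    for params in (8, 14, 70):
        print(f"{params}B @ ~Q4: ~{est_model_gb(params):.1f} GB")
    # ~5.8 GB, ~10.1 GB and ~50.4 GB respectively, which is why 7B-14B models fit a
    # 24-32GB Mac comfortably while a 70B quant wants 64GB+ of unified memory.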

  • @joyflowmonger248 (1 month ago, +1)

    Thank you for summing it all up so quickly and succinctly! Great video! Exactly what I needed!

  • @dhrumil5977 (4 months ago, +41)

    What about the 4060 Ti 16GB?

    • @boulate (3 months ago, +4)

      Same question here 😊

    • @jubinha (3 months ago)

      I got a normal 4060. Besides being a bit slower than what would be "decent ChatGPT-like speeds" and the issue with the context window filling up that he mentioned, it works fine, so I think a 4060 Ti 16GB will be good enough.
      Not GREAT, but it'll work well enough and without headaches for 90% of people; just don't expect 128k context windows at the same speed, 70B models, or insane output speeds.

    • @Marisueksu (3 months ago, +3)

      Running an RTX 4060 Ti 16GB here, Ollama's fine. Definitely much more headroom for Stable Diffusion and LLM stuff from the VRAM alone (was previously on an RTX 2070 8GB and an RTX 3060 8GB).

    • @maglat (2 months ago, +2)

      @Marisueksu You are talking about 8B models in Ollama I guess, right?

    • @Marisueksu (2 months ago, +2)

      @maglat It can do a little more than 8B unquantized. 13B quantized is doable too.

  • @kickheavy8982 (3 months ago, +8)

    Thank you for being straight and to the point. Very concise video. Most videos regarding AI usually have long intros and try to get you to buy stuff before providing any value. So you have earned your like, sir.

  • @konstantinlozev2272 (3 months ago, +12)

    No mention of the RTX 4060 Ti 16GB?
    Also, even a GTX 1070 8GB has some capability and goes for around USD 100 second-hand.
    Also, setting the partial layer offloading on your own is really preferable and gives the best results if your model cannot fit into VRAM.
    Another option if you want to play with APIs for free is Google AI Studio and the Gemini 1.5 Flash API.

    • @alanslicegarcia5066 (2 months ago, +5)

      I would recommend the 4060 Ti 16GB, for:
      A: price-to-VRAM capacity.
      B: lower power draw.
      C: if you are going to run two of them, they already run at PCIe 4.0 x8, and you won't see any performance loss when gaming, if you game too. Since most are dual-slot, you can fit them in a normal case.
      The only downside is that they don't have a fast, wide memory bus, but for normal folks it's more than fine.
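
The manual partial layer offloading mentioned a few comments up is a per-model knob in llama.cpp-based tools. A minimal sketch with llama-cpp-python, assuming a build with GPU support and a hypothetical local GGUF path; the layer count of 24 is just an example to tune against your VRAM:

# pip install llama-cpp-python (built with CUDA/ROCm/Metal support)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-instruct-Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=24,  # layers kept in VRAM; the remainder run on the CPU from system RAM
    n_ctx=8192,       # larger context windows also cost VRAM for the KV cache
)

out = llm("Q: Why offload only some layers?\nA:", max_tokens=64)
print(out["choices"][0]["text"])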

  • @sridhartn83 (2 months ago, +4)

    I'm a casual LLM user, using Ollama, Open WebUI and Flowise RAG chatbots with smaller LLM models. The 4070 12GB performs decently; I didn't want to spend too much on a GPU for my hobby.

    • @nosuchthing8 (1 month ago, +1)

      Good to know, thanks!

    • @sridhartn83 (1 month ago, +1)

      @nosuchthing8 If a model doesn't fit in GPU VRAM, the spill-over goes straight to system RAM and is processed by the CPU. So for large models, larger GPU VRAM is crucial.
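
For Ollama setups like the ones in this thread, that spill-over behaviour is easy to observe from a short script. A minimal sketch, assuming a local Ollama server on its default port 11434 and a pulled model tagged "llama3.1:8b" (adjust to whatever you actually have):

import requests  # pip install requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "In one sentence: what happens when a model is larger than VRAM?",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
# If the model does not fully fit in VRAM, Ollama offloads part of it to system
# RAM/CPU, which is exactly the slowdown described in the comment above.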

  • @MrDoBo95 (15 days ago, +1)

    I don't know... I'm running big Llama 3.1 8B Q8, Qwen2.5, Mistral Nemo 12B Q4_K_M, and even tried Gemma 27B in full, which didn't run at a usable speed but worked fine with my 3200 MT/s DDR4 RAM in addition to the 12GB of VRAM,
    plus Llama vision LLMs and Homey for my Hass (Home Assistant) detections, and Flux Schnell with text encoding in ComfyUI.
    And imagine: it all runs on my 6700 XT, which to this day has not made a single error or failed. The only little thing to do with the 6000 generation is to add the HSA override 10.3.0, as the card isn't recognised automatically. It's not that hard to set HSA_OVERRIDE_GFX_VERSION=10.3.0 on the command line.
    I upgraded from a 6600 as 8GB just wasn't quite enough, especially for image creation, but the 6700 XT is a dream.
    I'm either getting a second one or additionally going with a 7900 GRE with 20GB of VRAM.
    Stop creating dumb shit with bullshit claims about AMD.
    The MI300X flagship is a lot better than the NVIDIA H100; NVIDIA just became the standard too early.
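
For context, the override above is an environment variable that has to be in place before the ROCm runtime starts. A minimal sketch of doing that from Python rather than the shell, assuming Ollama is installed locally; the value 10.3.0 matches RDNA2 cards such as the 6700 XT and will differ for other GPUs:

import os
import subprocess

env = os.environ.copy()
env["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"  # make ROCm treat the card as gfx1030

# Start the inference server with the override applied; clients connect as usual.
server = subprocess.Popen(["ollama", "serve"], env=env)
print("ollama serve started with PID", server.pid)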

  • @timmygilbert4102 (1 month ago, +3)

    Wouldn't an integrated GPU have access to RAM directly, making the VRAM size and bandwidth issues moot?

  • @nortonwedge (12 days ago)

    I snagged an open-box 3090 from Amazon for $1300 CAD (usually 2500-3k), paired it with a 7900X3D for $500 (usually $700), with 64GB of 6400 MT/s RAM. Been running various image and video models in ComfyUI; works pretty well running large batches.

  • @alx8439 (3 months ago, +32)

    AMD is now supported quite well. Whatever posts you were highlighting in your video to back your point are 1-2 years old.

    • @Matlockization (3 months ago, +4)

      Yes, people still go around thinking AMD is lightly supported when it is well-supported.

    • @mug786 (3 months ago, +1)

      Proof?

    • @Matlockization (3 months ago, +2)

      @mug786 ruclips.net/video/VXHryjPu52k/видео.html

    • @凯哥思游记 (3 months ago, +3)

      Working, but not well.

  • @clajmate69 (2 months ago, +2)

    I instantly hit subscribe because you don't waste my time with the messy intros others have; you go straight to the point. I'm thinking of a 3060 too since it's the only GPU in my price range, so thanks for this vid.

  • @HaraldEngels (3 months ago, +3)

    For large language models, Ryzen AI Max will be the answer. Up to 96GB of RAM (out of a 128GB maximum) can be assigned to the GPU.

    • @alanslicegarcia5066 (2 months ago, +2)

      RAM speed is the issue, really. Normal PC RAM is slow compared to a GPU's memory and even Apple's unified memory.

    • @gwramim4807 (5 days ago)

      @alanslicegarcia5066 It has more bandwidth than the M3 Pro (by using a 256-bit bus to connect the RAM, and it's also an SoC), and there is also Infinity Cache to help more.
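
The RAM-speed point in this thread can be made concrete: each generated token has to stream roughly the whole set of weights through memory once, so peak tokens/second is bounded by bandwidth divided by model size. The bandwidth figures below are approximate, and the 4.8GB model size (an 8B model at ~Q4) is an assumption:

def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    # Crude upper bound: one full pass over the weights per generated token.
    return bandwidth_gb_s / model_gb

configs = {
    "Dual-channel DDR5-5600 (~90 GB/s)": 90,
    "256-bit LPDDR5X APU (~256 GB/s)": 256,
    "Apple M-series Max (~400 GB/s)": 400,
    "RTX 3090 GDDR6X (~936 GB/s)": 936,
}
model_gb = 4.8
for name, bw in configs.items():
    print(f"{name}: up to ~{max_tokens_per_sec(model_gb, bw):.0f} tok/s")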

  • @isbestlizard (5 months ago, +43)

    All I want is a wAIfu with an infinite context window that listens and responds 24/7 and will be my forever friend :>

    • @kalak4083 (5 months ago, +11)

      this is the ultimate dream

    • @DIYKolka (4 months ago, +8)

      If your wAIfu doesn't need to be smart, a GTX 1080 Ti is enough :D

    • @carvierdotdev (2 months ago, +1)

      @DIYKolka I have a GTX 1080 Ti and so far so good, I don't think I'll change it anytime soon.

  • @doesthingswithcomputers (1 day ago)

    You can set the number of layers to offload from the GPU to other devices…
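
In Ollama that knob is exposed as the num_gpu option (the number of layers to place on the GPU). A minimal sketch against the REST API, assuming a local server on the default port and a placeholder model tag; the value 28 is only an example:

import requests  # pip install requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral-nemo:12b",  # placeholder tag
        "prompt": "Hello",
        "stream": False,
        "options": {"num_gpu": 28},   # layers to keep on the GPU; the rest go to CPU/RAM
    },
    timeout=600,
)
print(resp.json()["response"])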

  • @Shpinog (1 month ago, +1)

    Lol. The budget option is 4x P102-100. Only $200 for all 4 cards, which is 40GB of VRAM. If you don't do image generation (i.e. Stable Diffusion, since multi-card support there is pretty crude) and only run LLMs, a rack with 4x P102-100 is the cheapest option. You also save on heating the house; you could say you make money from it.

  • @darkman237 (3 months ago, +5)

    I would have thought the 3090 would be cheaper, but OH NO! The prices are INSANE!!!! Maybe after the 5000 series comes out?

    • @mytechantics (3 months ago)

      I'm still waiting to grab one used, but yeah, the prices are still less than ideal. Hopefully that changes a bit after the 5th gen drops on the market.

    • @FuZZbaLLbee (1 month ago, +1)

      Ended up getting a second-hand 3090 for €600

    • @darkman237 (1 month ago)

      @FuZZbaLLbee eBay, or where?

    • @FuZZbaLLbee (1 month ago)

      A Dutch eBay called Marktplaats. It is actually owned by eBay.

    • @FuZZbaLLbee (12 days ago)

      @darkman237 A Dutch second-hand site

  • @Larimuss (2 months ago, +2)

    Anyone ever tried 2x 2080 Ti in SLI for diffusion models? It would be like $200 for 22GB of VRAM.

  • @thedarkglovemusic (26 days ago)

    Great information

  • @RandomPickles (3 months ago, +2)

    You don't need SLI for LLMs. You can still pool VRAM across the cards with no SLI cable, at least from what I have been doing and everything I have read so far. SLI is only worth it in gaming, the last time I checked.

    • @GraveUypo (3 months ago, +2)

      SLI is dead and isn't worth it anywhere.

    • @RandomPickles (3 months ago, +1)

      @GraveUypo Kind of my point. "Last time I checked" was like a decade ago.
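
To make the "no SLI cable needed" point concrete: llama.cpp-based tools shard the layers across cards over plain PCIe. A minimal sketch with llama-cpp-python, assuming a multi-GPU CUDA build and a hypothetical local GGUF file; the even 50/50 split is just an example:

from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

llm = Llama(
    model_path="./models/llama-3.1-70b-instruct-Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,          # offload every layer...
    tensor_split=[0.5, 0.5],  # ...splitting the weights roughly evenly across GPU 0 and GPU 1
)
print(llm("Say hi.", max_tokens=16)["choices"][0]["text"])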

  • @JarppaGuru (7 days ago)

    Whatever you run, next year's new model will be better and take your money, because even a second is a long time in this space.

  • @TazzSmk (4 months ago, +3)

    What about the 4070 Ti Super (16GB) and the 4060 Ti (16GB)??

    • @mytechantics (4 months ago, +10)

      As a rule of thumb, the Ti/Super versions are most of the time much more powerful than the base GPU versions, and it's often best to compare their benchmark performance with the card that is one model number higher than the Ti/Super card you want to choose (e.g. comparing a 4070 Ti Super to a 4080). If you can get your hands on the 4070 Ti Super and you can work with 16GB of VRAM, in my opinion this is a pretty good pick.

  • @Graybeard_ (10 days ago)

    4:54 You said the RTX 4070 Ti was 12GB VRAM, but it is 16GB. What about AI accelerator cards? ASUS makes one that goes in a PCIe slot. Will these pair/work together with the GPU cards to increase productivity/workflow?

    • @Olds79Starfire (10 days ago)

      No, it's not. The Ti Super has 16GB.

    • @Graybeard_ (10 days ago)

      @Olds79Starfire He was mousing over the Ti Super when he said 12GB. You can see 16GB in the listing he is mousing over, which is why I put a timestamp in my comment. Try it.

  • @tsizzle (2 months ago, +1)

    What can be done on a mobile RTX 4090 with 16GB of VRAM?

  • @SocialNetwooky (1 month ago)

    There is another option you missed: the non-Ti RTX 3090 with 24GB. It costs between 650 and 1200 euros on eBay.

  • @JoeVSvolcano (3 months ago, +3)

    Excited for the future. 32GB RTX 5090!

    • @berkuth (2 months ago)

      I think the 5090 will have less VRAM than the 4090, and then they'll sell other hardware for AI.

    • @JoeVSvolcano (2 months ago, +2)

      @berkuth Nvidia's RTX 5090 will reportedly include 32GB of VRAM.

    • @alanslicegarcia5066 (2 months ago, +1)

      @JoeVSvolcano The only downside: what will it cost?

  • @MrWhiskey1 (1 month ago)

    But... why are we not discussing "SLI"? With zero background on it, it would make a LOT of sense to me to just pick up like four 3060s, which would be double the VRAM for half the cost of a 4090?

  • @radu1006 (3 months ago, +4)

    Llama supports AMD.

  • @thebrainfan (17 days ago)

    Great video; unfortunately I've already bought a 3080 with only 10GB of VRAM.

    • @glenswada (17 days ago, +1)

      Doesn't really matter. I have a slower 8GB card (a 4060) and it runs 8B great (responses in 1-2 seconds). The larger 70B will not fit into VRAM, so it will spill into system RAM anyway. I had to upgrade from 32 to 64GB of RAM to get 70B to work, but response times for 70B are something like 2 to 8 minutes. Not even a 4090 will run these larger models at a speed that will make people happy. So it's best to use 8B mostly and only use 70B when you want a more detailed answer. It would be nice to run 405B, but it will not perform on consumer GPUs.

  • @Manicmick3069 (2 months ago)

    I am running the 7800 XT without issue on ROCm, on both Windows and Linux, and getting insane inference speeds. Only paid about $550 for it, and it has 16GB of RAM. I'm getting dual 7900 XTXs with the 7950X3D for my next build.

  • @liora2k (19 days ago)

    Good video, thanks for putting this together.
    I have some comments around AMD GPUs: everything can run on them, you just need to know how to…

  • @CaimAstraea (9 days ago)

    What about the Titan RTX?

  • @aratoval1 (3 months ago, +1)

    How about the Nvidia Tesla P40, P100, M80, etc.?

    • @uss-dh7909 (2 months ago)

      I got a Tesla M40 with 24GB of RAM and it's... okay-ish for Stable Diffusion: 720x1280 in about 1 minute.
      If you don't know already, it's a passive server-class card, no active cooling here. I have a front 140mm fan that I ducted to it with cardboard and sealed with electrical tape, plus a 60mm exhaust zip-tied to the back. Running both at about 60% cools the card back down from 80°C to about 40°C (which is where you want to start each run) in about two minutes.
      Hoping to get a 2080 Ti next year to replace my 5700 XT, that is unless AMD's situation massively improves, then I might upgrade to... I don't know which AMD card...

  • @pleb.0_0 (3 months ago)

    Flux is a 12GB model, so having more VRAM helps with latent diffusion as well. I can't load any Flux LoRAs with an RTX 2080 w/ 11GB of VRAM. Really trying to wait for the RTX 50 series to land before upgrading. Dual 3090s would be a big investment right now that likely turns obsolete next year.

  • @Fish4Joe (1 month ago)

    Is there a reason you skip over the 4070 Ti Super, which has 16GB of VRAM?

    • @mytechantics (1 month ago)

      The 4070 Ti Super, although not mentioned in the video, is also a valid choice if you can find a good deal.

  • @DedicatedLeon (4 months ago, +1)

    Hi, great video. It's one of the few videos that talk about GPUs for AI and not for gaming. I have a question that I hope you can help me answer: what do you think about using two RX 7600 XTs (16GB each, 32GB of VRAM total)? (I want to use Ollama or LM Studio locally with Llama 3.1 70B.)

    • @mytechantics (4 months ago)

      I don't have much direct experience with AMD cards and AI software, but if Ollama/LM Studio support AMD graphics cards, there is a way to connect two RX 7600 XTs together, and your motherboard supports it, then both the performance of such a setup and the amount of VRAM you'd get would probably be more than enough. Unfortunately that's everything I can say, as I haven't dabbled with any multi-GPU AMD setups yet.

    • @souravroy8834 (4 months ago, +4)

      Don't buy AMD right now; the lack of ROCm support on Windows is horrible for any AI tools currently. At the same price you can get a 3060 12GB card right now that will serve you in the long run, and you will be able to use any AI tools, no matter what they are, for the coming years. And yes, I am an AMD fanboy.

  • @CryptoRealAlpha-dp8jy (4 months ago, +2)

    Is there a big difference running LLMs on 24GB vs 16GB VRAM?

    • @mytechantics (4 months ago, +3)

      With more VRAM you can load higher-quality models with more parameters and have access to larger context windows during inference. Depends on your use case really, but for fewer constraints I would go with 24GB.

    • @GraveUypo (3 months ago, +4)

      No.
      The next step up in model size requires 48GB, so even with 24GB you're limited in the same way, just with a marginally longer context window.
      My RTX 4080 with 16GB gets a 10,750-token context length with Llama 3.1 uncensored Q8 without glitching, which is quite a lot for almost any use you can give to a local LLM (it's about 43,000 characters, or 23 pages of a single-spaced book).
      With an RTX 4090 you'll get around a 19k context length, which is a lot more, 41 pages of a book, but not "a lot more useful" if you get what I mean. It's still not a whole book. It does help run more AI in parallel though, I guess, but I don't do that often, so I don't know and don't care.
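
The context-length numbers in this thread line up with simple KV-cache arithmetic: the cache grows linearly with context length, on top of the fixed cost of the weights. The architecture constants below are assumptions matching Llama-3.1-8B (32 layers, 8 KV heads, head dim 128) with an fp16 cache:

def kv_cache_gb(ctx_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_el: int = 2) -> float:
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_el  # K and V
    return ctx_len * per_token / 1e9

for ctx in (8_192, 10_750, 19_000, 128_000):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gb(ctx):.1f} GB of KV cache")
# ~1.1, ~1.4, ~2.5 and ~16.8 GB on top of the weights: going from 16GB to 24GB buys
# a longer window, but not the jump to the next model-size tier.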

  • @Tarantella.Serpentine (5 months ago)

    Would it be worth getting two RTX 3060 12GBs and SLI, or.... what are your thoughts on RunPod?

    • @mytechantics (5 months ago)

      Hi, I'm not sure that the 3060 supports SLI at all, but if it does, you already have a compatible motherboard, and you're able to get two of these cheap, it might be worth trying.
      When it comes to RunPod, I personally haven't used it, but I see many people advertising it, and looking at their prices and what they offer, they seem legit, especially if you need access to more than 24GB of VRAM without breaking the bank.

  • @ifnotnowwhen2811 (2 months ago)

    OK, I think it's time to build a local LLM machine. Are you saying I can get away with a single 3090?

  • @nosuchthing8 (3 months ago, +1)

    I saw someone run an LLM on a Raspberry Pi. No GPU, 4GB of RAM.

    • @derekmusial1318 (1 month ago)

      I'm doing that right now while I build a dedicated home server. Takes FOREVER to produce basic responses. Would not recommend.

  • @bougiri (14 days ago)

    4060 Ti 16GB VRAM?

  • @capturedbyfabian (2 months ago)

    I am considering adding another 3090 Ti so I can use NVLink for Stable Diffusion in ComfyUI, but I'm not even sure if it is compatible or not with an NVLink/SLI setup. Any feedback?

    • @nortonwedge (12 days ago)

      I asked ChatGPT for you: "ComfyUI, a user interface for Stable Diffusion, does not natively support VRAM pooling through NVIDIA NVLink. This means that while multiple GPUs can be utilized to distribute workloads, their memory capacities are not combined into a single, larger pool within ComfyUI."

  • @Techonsapevole (3 months ago)

    Ollama has native AMD ROCm support with ollama:rocm.

  • @denirodarkqwerty (3 months ago, +2)

    4070 Ti 16GB?

    • @MineSum10 (3 months ago)

      I was also wondering why he didn't mention that or the 4060 Ti 16GB.

    • @AliComputering (3 months ago)

      4070 Ti Super 16GB

  • @burncloud-com (3 months ago)

    Thanks for sharing this with us, I am here for the ad.

  • @منانمنان-ظ7د (5 months ago)

    The RTX 4060 Ti, the 16GB one? Is it better, or do I need to go with something at the same price but older, like an RTX 3080?

    • @Johan-rm6ec (5 months ago)

      Better: 2x 4060 Ti 16GB. You need to find a trade-off. Is one RTX 4070 Ti 16GB better than 2x 4060 Ti 16GB? Those are the questions you should ask.

    • @منانمنان-ظ7د (5 months ago)

      @Johan-rm6ec And what do you think? I just want to buy a build to learn about PyTorch and AI/ML and also play with it a little. Also, as I see it, the GPU has better Tensor cores than an RTX 3080.

    • @luisff7030 (4 months ago)

      Both the 4060 Ti 16GB and the 3080 are good for learning, because we can learn with small AI models.
      For general use the 4060 Ti 16GB is better, because it has more VRAM.
      For smaller models that work within 10GB of VRAM, the 3080 is faster, e.g. for video enhancement. But some programs cannot use multiple GPUs, like Nvidia Neuralangelo, and if the user doesn't need the work done quickly, then one slower 4060 Ti can do it.
      We are getting more programs that demand more VRAM, like Flux text2image and image-to-3D NeRF. The LLM models that use 10GB of VRAM are fast with both cards.
      Expect things to be 10 times slower or more when the GPU has to use many GB of system RAM.

  • @kingcuda-y7b (3 months ago)

    I have a question: you mentioned buying two 3090 Tis, 24GB each, and combining them to reach the level of an RTX 4090 at a better price. However, you didn't explain that SLI is not supported, so how can you combine the VRAM?

    • @AliComputering (3 months ago, +3)

      SLI is needed for gaming, not for AI or other things like 3D rendering.
      You can install even more, like 3x 4090s, on some motherboards that support it (I mean mainstream boards, not server or workstation ones),
      even though it will run at x8/x8/x4 on AMD and x8/x4/x4 on Intel systems that have three x16-length slots, because the second x16 slot only has x8 lanes,
      and if you look closely at the slot you can see only 8 of the lanes have contacts. On some boards the second slot is even x4 (my MSI Z790 Tomahawk DDR5 is x4 on the second slot, and I had an ASUS Z790 TUF),
      while on my MSI Z790 Carbon the second slot was x8, and on the MSI Z390 Carbon the slots were x16/x16/x16 by length and x16/x8/x4 by lanes;
      when I installed three RTX 3090 Turbo dual-slot cards on it, they worked at x8/x4/x4,
      without an SLI bridge of course,
      and in dual mode they worked at x8/x8 whether or not an SLI bridge was installed.
      Even if you install another card, like a PCIe USB card, LAN card, sound card, or M.2 RAID card (there are many multi-M.2 cards that take 4 or 8 SSDs and are x8, or even x16, by both length and lanes),
      even an x1 USB card in the second x16 slot will drop the first x16 slot to x8, so it doesn't have to be a graphics card to cause that.
      SLI is just for games, or for software that can't handle multiple GPUs: with SLI they will detect the pair as one card (but most non-gaming software can't detect SLI cards either; in my tests they didn't detect any GPU, just the Microsoft basic display adapter or a single card).
      For rendering, mining, and AI software, SLI doesn't matter at all.
      We tried up to six 3090s, and later 4090s, on a workstation board with six PCIe slots (dual Threadripper with enough PCIe lanes from the CPUs and chipset);
      of course we had to use several x16 Gen 4 risers and three 1600W power supplies,
      and they worked perfectly.
      If someone has access to Nvidia Quadro or, even better, Nvidia Tesla cards, that's far better still,
      because there are 48GB, 96GB, even 192GB variants out there,
      with far lower power draw that are far easier to cool; they come with just one silent blower fan and many even use passive cooling.

  • @worgle123 (3 months ago, +1)

    I am watching this with integrated AMD graphics, 15GB of RAM, and a Ryzen 7 4000U 💀

  • @luisdavid4109 (4 months ago)

    Hi! Thank you so much for the video!
    There is a version of the 4060 Ti which has 16GB of VRAM instead of 8GB. I'm hesitating a lot: I'd like to get the 4070 Super, but since you mentioned memory matters more in AI, it's 12GB vs the 16GB of the 4060 Ti... puff.
    What do you think?

    • @mytechantics (3 months ago)

      Tough choice here. For most purposes the 4070 Super would be the best pick as it is faster and more recent than the 4060 Ti, but the 4060 Ti has a bit more VRAM and you can find them for a better price. If you're going to be locally hosting any larger LLMs, you need to estimate how much VRAM you will need for your purposes, and whether the extra 4GB will make a difference for you. If you're just interested in local image generation, voice changing and such, and you're not going to be locally training or fine-tuning larger models, 12GB of VRAM is usually plenty.

    • @Nik.leonard (3 months ago, +4)

      @mytechantics 4060 Ti 16GB = Mistral Small / Codestral at Q4_0 (the Ollama default) with a context window of 8192. 22B models just don't fit in 12GB.

  • @drinerqc7434 (8 days ago)

    The RTX 6000 Ada is really good too.

  • @DIYKolka (4 months ago, +1)

    It's not that hard, guys, just wait for a good price. You can buy a used RTX 3090 Ti for 400-500€, just wait for the deal.

  • @miroslavmajer5155 (14 days ago)

    Or buy a MacBook - it's actually cheaper for LLMs compared to building a PC for LLMs. Macs have shareable RAM between the CPU and GPU, so in the case of a MacBook Pro with 128GB of RAM, you can effectively have a GPU with 128GB of RAM.

  • @raghuls1908 (5 months ago)

    Bro, I have an Intel Arc A770; can it run LLMs better than AMD, or is it not supported?

    • @MisterKyle93 (5 months ago, +1)

      AMD is way better than Intel.

    • @MisterKyle93 (5 months ago)

      I used to use an Arc A770 because it's the cheapest GPU with 16GB of VRAM in my country, but there were too many crashes. Now I use an RX 6800 XT and it works very well for LLMs.

    • @mytechantics (5 months ago, +1)

      @raghuls1908 Depends on the software you're going to use. The Oobabooga text generation WebUI, for instance, doesn't have official support for Intel Arc as far as I know.
      Check out this thread from a few months back: www.reddit.com/r/LocalLLaMA/comments/1bffh19/intel_arc_for_llms/

  • @marverickbin (3 months ago)

    Seems you can buy almost six 3060s with the money for a 4090.
    So why not just make a GPU grid with at least four of them? Seems like the cheapest way to achieve 48GB.

    • @mytechantics (3 months ago, +1)

      I see where you're coming from, but as far as I know the 3060 does not support NVLink, so you cannot really connect them together. And besides that, taking other cards into account, in most cases you would need to count in the additional cost of the gear needed to put together a multi-GPU system (assuming most people don't have compatible mobos just laying around).
      I think for now, if you have appropriate hardware on hand, 2x RTX 3090/Ti is still the way to go if you want to get your GPUs used and get a functional 48GB of VRAM.

  • @jorgennorberg7113 (1 month ago)

    Man, that WAS fast...

  • @noodles02076 (27 days ago)

    22GB-modded 2080 Ti SLI enters the chat.

  • @ALEXHANS1383 (8 days ago)

    Another video which could have been a blog post.

    • @mytechantics (7 days ago)

      It actually is, and the post features some more recent, updated info, here: techtactician.com/best-gpu-for-local-llm-ai-this-year/

  • @PaulRoneClarke (3 months ago)

    The 4070 Ti Super has 16GB and is a thousand dollars less expensive than the 4090.
    3090s are now almost impossible to buy, even second-hand.

    • @snickle1980 (3 months ago)

      Why have they become impossible to buy?

  • @ESGamingCentral (4 months ago, +3)

    Ridiculous that he doesn't talk about the 4060 Ti.

    • @AliComputering (3 months ago, +2)

      It's a pretty weak card with a very narrow memory bus, and expensive for what it offers;
      you can buy two used RTX 3060 12GBs for its price.

    • @Jaggukajaadu (3 months ago)

      @AliComputering The RTX 3060 doesn't support multi-GPU, but the RTX 4060 Ti does.

    • @ESGamingCentral (18 days ago)

      @AliComputering Your comment aged like milk.

  • @BogdanTestsSoftware (23 days ago)

    All the content on YT presents only Nvidia this and Nvidia that. I would first like a _feature availability / capability_ comparison between AMD Radeon vs Intel Arc vs Nvidia. I only include Nvidia to be able to relate to all the other comparisons I've seen. I wouldn't buy even a used Nvidia card unless it really came up as a steal (I mean a cheap deal, not actually stolen goods!!). I find Nvidia's licensing & software practices toxic *for myself and for the market*. Now, once capabilities are clear and set, I would like to know how performance stacks up for each, i.e.: is Radeon 2x (twice) as slow, or 10x?? If performance is around 2-3x slower for Radeon or Intel, I would find the tradeoff quite acceptable. 5-10x would be a bit harder tradeoff, but I would still avoid Nvidia the best I can. Nvidia is at "kill it with fire, before it spreads" levels of toxicity and infectiousness, just my $0.02.

  • @philoxoper (1 month ago)

    4060 or 3060?

  • @theskyspire (2 months ago)

    The 4070 Ti Super has 16GB, not 12GB, and the 4070 Ti Super outperforms the 3090 in the benchmarks I've seen.

  • @luisff7030 (5 months ago, +1)

    I purchased a new RTX 4060 Ti 16GB instead of the RTX 3090 because I don't trust sellers with 0 reviews, or with 1 bad review for not sending the device, who don't accept returns. The sellers that I trust more sell 3090 cards for about 900€, close to the 1000€ for two 4060 Ti 16GB GPUs.
    I can also add another 4060 Ti later and get 32GB of VRAM in total.

    • @GraveUypo (3 months ago)

      That's probably the best route for this, but beware of PCIe bandwidth limitations. You're fine if you're on PCIe Gen 4 or better, but at Gen 3 you might start bumping into bandwidth bottlenecks with more than one GPU.

    • @luisff7030 (3 months ago)

      @GraveUypo The PCIe performance will be similar for one- or two-card setups, since the RTX 4060 Ti is designed to operate with 8 lanes, regardless of whether it's in an x16 slot or an x8 slot created via bifurcation.

    • @luisff7030 (3 months ago)

      @GraveUypo You are right that Gen 3 is slow.

  • @kyrkbymannen (3 months ago)

    Way too little memory at 12GB.

  • @eatfrenchtoast (1 month ago)

    Let's cut to the chase... 3:16 later

  • @Ritcheyyy (1 month ago)

    RTX A6000…

  • @Manicmick3069 (2 months ago, +3)

    It's obvious this dude doesn't pay attention to the advances in ROCm. I'm running a 7800 XT dual-boot without issues. PyTorch and TensorFlow work flawlessly. $550 for 16GB of RAM. Keep using your mortgage payments to support Nvidia. I'm about to build a dual 7900 XTX system when the 7950X3D drops. That's 48GB of RAM for the price of a 4090. 😅😅😅😅 You can keep Nvidia's high-ass prices 😅😅😅😅😅
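
A quick way to verify that a ROCm PyTorch install like the one described above actually sees the AMD card: ROCm builds expose the GPU through the regular torch.cuda API, so no vendor-specific calls are needed. Sketch only, assuming a ROCm build of PyTorch is installed:

import torch

assert torch.cuda.is_available(), "No ROCm/HIP device visible to PyTorch"
print("Device:", torch.cuda.get_device_name(0))

x = torch.randn(2048, 2048, device="cuda")  # lands on the AMD GPU via HIP
y = x @ x
print("Matmul OK:", y.shape, y.dtype)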

  • @animation-nation-1 (2 months ago)

    Best card for running LLMs locally... the cheapest NVIDIA with 24GB of VRAM that you can get your hands on :P
    For Nvidia to keep their cards' VRAM so low is so scummy: a $1000 4070 Ti has 12GB of VRAM. They're pushing people to 4090s and selling 4090s at twice the price,
    whereas an AMD card at the same price is 24GB of VRAM vs 12GB. I just wish AMD had more support; it would literally be half the price. It's just sad that Nvidia is screwing their customers just for their tier scale. If the RTX 50 series doesn't go 24GB minimum on $1k+ cards, I'm done running locally lol.

  • @Manicmick3069 (5 months ago, +3)

    Bro, I have 3 PCs and a mini PC running LLMs on AMD graphics cards. I stopped the video after you made that asinine statement. 😅😅😅😅😅😅

    • @mytechantics (5 months ago, +2)

      I see. It's just that my own experience with AMD cards and local AI software, alongside what I've read online, personally does not really make me optimistic when thinking of experimenting more with LLMs on AMD cards.
      I'm very glad it works well for you, and I know that it does for many others with some tinkering. I just think that in general, AMD is still a little bit behind NVIDIA when it comes to all things AI. Of course my opinion may be biased; glad you're pointing this out, and thanks for the comment.
      Out of curiosity, what mini PC are you using?

    • @GraveUypo (3 months ago)

      LLMs are fine on AMD, but there are other types of AI with poorer compatibility; most voice ones are kind of annoying, expecting CUDA, for instance. He does still have a bit of a point (less and less as time goes by).

  • @Viewable11 (5 months ago, +4)

    "For hosting large language models locally you mainly need ... and fast GPU clock speed."
    That is false. GPU clock speed is irrelevant.
    "Currently, the two cards with 24GB VRAM are the RTX 4090 and RTX 3090."
    That is also false. There are four consumer graphics cards with 24GB VRAM: the RTX 3090, RTX 3090 Ti, RTX 4090, and RX 7900 XTX. In addition, there are many professional cards with 24GB or more VRAM, but those come with insufficient coolers and need additional cooling.
    Downvoted for publishing disinformation.

    • @mytechantics (5 months ago, +1)

      I partially agree, but I don't think that this is disinformation. I would argue that clock speed is always important, but as has been said and emphasized in the video, VRAM is what you really need to be able to run many larger models. I'm not including AMD cards in this video, and in the sentence you're referring to I am kind of putting the 3090 and the 3090 Ti in one bag, so to speak, which I feel is clarified later.
      Thanks for the comment!

    • @souravroy8834 (4 months ago)

      @mytechantics Yes, VRAM is necessary for the large models, but in the end, if you get something like the 4060 Ti 16GB version, it is way faster than the 3060 12GB due to higher clock speed, some new AI optimizations, and simply more CUDA cores.

    • @edrahimovic (3 months ago)

      Fuck me, you must be fun at parties?

  • @gr8b8m85 (2 months ago)

    Current consumer GPUs aren't suited for generative AI at all. They're barely keeping up now and they won't be able to run the models of one or two years from now, and closed source will be so far ahead of anything else that there's no point anyway. You'd be wasting money getting dedicated hardware for this today when a whole new class of hardware, TPUs that massively outperform even specialized GPU farms, is coming soon.

    • @SocialNetwooky (1 month ago)

      That's... well... it's an opinion with a lot of speculation. Meanwhile, you can run local LLMs that perform as well as (and sometimes outperform) commercial offerings in *specific* areas, and as you can have as many models stored locally as you have storage space, the only problem is switching models depending on what you need.

  • @Johan-rm6ec (5 months ago, +3)

    Video not quite helpful.