LocalAI LLM Testing: i9 CPU vs Tesla M40 vs 4060Ti vs A4500

  • Published: 21 Aug 2024
  • Sitting down to run some tests with an i9-9820X, Tesla M40 (24GB), 4060 Ti (16GB), and an A4500 (20GB)
    Rough edit of a lab session
    Recorded and best viewed in 4K

Comments • 34

  • @andrewowens5653
    @andrewowens5653 2 months ago +3

    Thank you. It would be interesting to see some evaluation of multiple consumer GPUs working on the same LLM.

    • @RoboTFAI
      @RoboTFAI  2 months ago

      I have another video testing 1, 2, 3, 4, and 6 4060s (which I consider consumer level) together on the same LLM here - ruclips.net/video/Zu29LHKXEjs/видео.html - but if you have more specific ideas, please let me know.

  • @nithinbhandari3075
    @nithinbhandari3075 1 month ago +1

    Thanks for comparing the different GPU hardware.
    Can you run a test with, say, 6k input tokens and 1k output tokens?
    That way we can see how large LLMs perform with a 6k input and 1k output.

    • @RoboTFAI
      @RoboTFAI  1 month ago

      Yea we can absolutely run some tests with much larger prompts/etc!
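      For anyone who wants to reproduce that kind of run, below is a minimal sketch (not the channel's actual harness) that sends a roughly 6k-token prompt to a LocalAI server through its OpenAI-compatible chat endpoint and asks for ~1k output tokens; the URL, model name, and filler prompt are placeholder assumptions.

      ```python
      # Sketch of a long-prompt throughput test against an OpenAI-compatible
      # LocalAI endpoint. URL, model name, and prompt content are placeholders.
      import time
      import requests

      BASE_URL = "http://localhost:8080/v1/chat/completions"  # assumed LocalAI address
      MODEL = "llama-3.1-8b-instruct"                         # placeholder model name

      # Build a roughly 6k-token prompt from filler text (~4 chars per token heuristic).
      long_prompt = ("Summarize the following notes. " + "lorem ipsum " * 2500)[: 6000 * 4]

      payload = {
          "model": MODEL,
          "messages": [{"role": "user", "content": long_prompt}],
          "max_tokens": 1024,   # ~1k output tokens
          "temperature": 0.1,
      }

      start = time.time()
      resp = requests.post(BASE_URL, json=payload, timeout=600)
      elapsed = time.time() - start
      usage = resp.json().get("usage", {})

      print(f"prompt tokens:     {usage.get('prompt_tokens')}")
      print(f"completion tokens: {usage.get('completion_tokens')}")
      print(f"wall time:         {elapsed:.1f}s")
      if usage.get("completion_tokens"):
          rate = usage["completion_tokens"] / elapsed
          print(f"overall rate:      {rate:.1f} t/s (includes prompt processing time)")
      ```

      The usage block in the response gives the real prompt/completion token counts, which is more reliable than estimating them from text length.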

  • @fooboomoo
    @fooboomoo 1 month ago

    Great content, and relevant to me since I recently bought a 4060 Ti 16GB for AI.

    • @RoboTFAI
      @RoboTFAI  1 month ago

      thanks for watching!

  • @jeroenadamdevenijn4067
    @jeroenadamdevenijn4067 2 months ago +2

    If I run Codestral 22B Q4_K_M on my P5000 (Pascal architecture), I get 11 t/s evaluation, which means the P5000 performs at around 75% of a 4060 Ti. But when I open Nvidia power management I can see it only draws 140W under load, while it should be able to go up to 180W. BTW, both of these cards have 288 GB/s memory bandwidth. I must have a bottleneck in my system, which is an Intel 11th-gen i7 laptop (4-core CPU) with the eGPU over Thunderbolt 3.

    • @RoboTFAI
      @RoboTFAI  2 months ago +1

      That's pretty decent speed in that setup

    • @jeroenadamdevenijn4067
      @jeroenadamdevenijn4067 2 months ago +1

      @RoboTFAI It does slow down with larger context though, to say 8~9 t/s, and when I go for Q5_K_S that becomes 7~8 t/s - still doable.

    • @stevenwhiting2452
      @stevenwhiting2452 1 month ago +1

      Play with your data chunk sizes; it's usually unoptimised memory movement that limits throughput. Nvidia has a tutorial that explains CUDA much better than I can. The P40 and P100 do the same thing on some models too.
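      To put a rough number on the throughput ceiling being discussed in this thread: single-stream token generation is largely memory-bandwidth bound, so an upper bound on t/s is roughly memory bandwidth divided by the bytes streamed per token (about the size of the quantized weights). A back-of-the-envelope sketch, where the ~13 GB figure for Codestral 22B Q4_K_M is an assumption rather than a measured value:

      ```python
      # Rough upper bound for memory-bandwidth-bound token generation.
      # Assumes ~13 GB of Q4_K_M weights for a 22B model (approximation).
      def tps_upper_bound(mem_bandwidth_gb_s: float, model_size_gb: float) -> float:
          # Each generated token streams (roughly) all weights through memory once.
          return mem_bandwidth_gb_s / model_size_gb

      p5000_bw = 288.0   # GB/s, quoted above for both the P5000 and the 4060 Ti
      model_gb = 13.0    # assumed Q4_K_M file size for a 22B model

      ceiling = tps_upper_bound(p5000_bw, model_gb)
      print(f"theoretical ceiling: ~{ceiling:.0f} t/s")
      print(f"observed 11 t/s is ~{11 / ceiling:.0%} of that ceiling, so the power cap,")
      print("eGPU link, and kernel efficiency together plausibly account for the gap")
      ```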

  • @jackflash6377
    @jackflash6377 1 month ago +2

    A4500 vs RTX 3090??

    • @RoboTFAI
      @RoboTFAI  1 month ago +1

      Attempting to acquire a 3090 for the channel, stand by!

  • @fulldivemedia
    @fulldivemedia 1 month ago

    thanks

  • @georgepongracz3282
    @georgepongracz3282 26 days ago

    It would be interesting to compare a 4070 Ti Super to the 4060 Ti, to see if the scaling is proportional to cost.

    • @RoboTFAI
      @RoboTFAI  25 days ago

      Don't have one to test with, but if you want to send me one I am happy to throw it through the gauntlet hahaha

  • @tsclly2377
    @tsclly2377 1 month ago

    P40 vs 3090 Ti... just because there is such a price difference. And what loading speeds can you get if your files are on a P900 Optane (280GB)? [assuming one is setting up batch processing]

    • @RoboTFAI
      @RoboTFAI  1 month ago

      I don't have either card to do testing with, will ask around friends/etc. Or might try to trade for a 3090 since everyone goes after them for their rigs...power hungry though

  • @marsrocket
    @marsrocket 26 days ago

    Llama 3 7B runs in near real-time on an Apple M1 processor, and presumably faster on an M2 or M3.

    • @RoboTFAI
      @RoboTFAI  25 days ago

      It does. I haven't brought Apple Silicon into the mix on the channel just yet - but I have a few M1 and M1 Max machines as my daily drivers.

  • @fulldivemedia
    @fulldivemedia 1 month ago

    Great content. My problem is choosing an AM5 motherboard; I have three I've got my eye on, but I don't know which one is more future-proof:
    MSI MEG X670E ACE
    ASUS ProArt X670E
    ASUS ROG Strix X670E-E Gaming
    Can you help?
    I want it mostly for AI art and such. The MSI costs more, and the ROG and ProArt are the same price (but I still don't know which of those two is better - the ProArt runs two PCIe slots at x8/x8 while the ROG is x8/x4). Is the MSI better than the ProArt?

    • @noth606
      @noth606 18 hours ago

      Old question, but commenting in case anyone else wonders: this sort of question has no proper answer, since you provided no information about your planned configuration. The primary differentiator is most likely price; if all you do is run one GPU on them, the cheapest is probably the best bang for your buck. The PCIe point doesn't make much sense and doesn't seem likely to be true either. My 2ct: I've had issues with both ASUS and MSI at times, but the difference is that ASUS "fixed it" by issuing a refund and MSI did not. So I personally would not pay money for an MSI board - well, maybe $20 for one at most. ASUS I have continued to use for years since, and never had issues with again.
      I run ROG boards myself, several of them. My main box is an X299 ROG Rampage VI EE right now. My "BS tolerance" is very low; I'm an ex IT pro and run mostly professional gear, Dell and HPE, but I do run ASUS custom stuff next to that.

  • @donaldrudquist
    @donaldrudquist 1 month ago

    What application are you using to run this?

    • @RoboTFAI
      @RoboTFAI  1 month ago

      It's custom-built by me - a combo of Streamlit, Python, LangChain, etc.
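      Not the actual app from the videos, but a minimal sketch of the same idea - a Streamlit front end using LangChain's ChatOpenAI pointed at a LocalAI/OpenAI-compatible server, with a crude tokens-per-second readout. The endpoint URL and model name are placeholder assumptions.

      ```python
      # Minimal Streamlit + LangChain throughput tester (sketch, not the video's app).
      # Assumes an OpenAI-compatible server (e.g. LocalAI) at the URL below.
      import time
      import streamlit as st
      from langchain_openai import ChatOpenAI

      llm = ChatOpenAI(
          base_url="http://localhost:8080/v1",  # placeholder endpoint
          api_key="not-needed",                 # LocalAI typically ignores the key
          model="llama-3.1-8b-instruct",        # placeholder model name
          temperature=0.1,
      )

      st.title("LLM throughput tester")
      prompt = st.text_area("Prompt", "Explain tensor splitting in one paragraph.")

      if st.button("Run"):
          box, chunks, start = st.empty(), [], time.time()
          for chunk in llm.stream(prompt):      # stream tokens as they arrive
              chunks.append(chunk.content or "")
              box.markdown("".join(chunks))
          elapsed = time.time() - start
          # Crude word-based estimate; a real harness would use the API's usage data.
          approx_tokens = len("".join(chunks).split())
          st.write(f"~{approx_tokens} tokens in {elapsed:.1f}s ≈ {approx_tokens / elapsed:.1f} t/s")
      ```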

  • @six1free
    @six1free 1 month ago

    So I swung a 4060 laptop and a 4070 Ti Super and have spent the last couple of days migrating my PC into an AI server. I haven't gotten to the AI yet, but in the meantime I'm putting the warranties to the test with some hardcore mining - almost nostalgic for when Bitcoin was $10/BTC.
    I'm realizing the 16GB of VRAM is a bit of a bottleneck though. Do you think adding an M40 or two would help? Will the GPUs be able to share each other's VRAM?

    • @RoboTFAI
      @RoboTFAI  1 month ago +1

      Yes, and I will answer some of this in the next video! Mixing GPUs / tensor splitting.
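      As a preview of the idea, here is a minimal sketch of splitting one model across two mismatched GPUs at the llama.cpp level via llama-cpp-python (llama.cpp-based runners generally expose the same tensor_split knob). The model path and split ratios are placeholders:

      ```python
      # Sketch: spreading one model's layers across two mismatched GPUs with
      # llama-cpp-python. Path and split ratios are placeholder assumptions.
      from llama_cpp import Llama

      llm = Llama(
          model_path="/models/llama-3.1-8b-instruct-Q6_K.gguf",  # placeholder path
          n_gpu_layers=-1,              # offload every layer to GPU
          # Fraction of the model per device, e.g. a 24GB card and a 16GB card:
          tensor_split=[0.6, 0.4],
          n_ctx=8192,
      )

      out = llm.create_chat_completion(
          messages=[{"role": "user", "content": "Say hello in five words."}],
          max_tokens=32,
      )
      print(out["choices"][0]["message"]["content"])
      ```

      Splitting like this buys VRAM capacity rather than speed; each token still passes through every layer, wherever that layer lives.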

    • @six1free
      @six1free 1 month ago

      @RoboTFAI Sweet, sounds like a good video.

  • @mohammdmodan5038
    @mohammdmodan5038 27 days ago

    I'm planning to buy a GPU and I have 2 choices, a P100 or an M40 24GB. I want to run an 8B model - is either enough for that? Currently I have a Ryzen 5 3600, 16GB DDR4, and a 1TB NVMe.

    • @mohammdmodan5038
      @mohammdmodan5038 27 days ago

      You have an M40, right? Can you share its tokens/s?

    • @RoboTFAI
      @RoboTFAI  27 days ago

      The P100 is Pascal architecture and newer than the M40, which is Maxwell architecture - so I would always recommend the newer card, depending of course on your budget and needs. Both will be power hungry.
      Llama 3.1 8B? Depends on context size... it defaults to 128k, which is going to be heavy on your VRAM depending on quant/etc.
      To give an idea, Meta publishes this as a guide (taken from huggingface.co/blog/llama31) on context size vs KV cache size alone. You still have to load the model weights, other layers, etc.
      Model size | 1k tokens | 16k tokens | 128k tokens
      8B         | 0.125 GB  | 1.95 GB    | 15.62 GB
      70B        | 0.313 GB  | 4.88 GB    | 39.06 GB
      405B       | 0.984 GB  | 15.38 GB   | 123.05 GB
      I actually have 3 old M40s sitting around in the lab, as that is where I started my AI journey over a year ago! So yes, I can do testing with them.
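      Those figures follow from the usual KV-cache formula: 2 (K and V) x layers x KV heads x head dim x bytes per element, per token. A small sketch that reproduces the 8B and 70B rows, assuming an fp16 cache and the published Llama 3.1 layer/head counts (the values below are those public configs, not something measured here):

      ```python
      # KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes, per token.
      # Layer/head counts are the published Llama 3.1 configs; fp16 cache assumed.
      def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                       n_tokens: int, bytes_per_elem: int = 2) -> float:
          per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
          return per_token * n_tokens / 1024**3

      configs = {
          "8B":  dict(n_layers=32, n_kv_heads=8, head_dim=128),
          "70B": dict(n_layers=80, n_kv_heads=8, head_dim=128),
      }

      # Token counts chosen to line up with the table above (1k = 1024; 16k/128k = 16,000/128,000).
      for name, cfg in configs.items():
          sizes = [kv_cache_gib(**cfg, n_tokens=t) for t in (1024, 16_000, 128_000)]
          print(name, " ".join(f"{s:.3f} GiB" for s in sizes))
      # 8B  -> ~0.125, ~1.95, ~15.6 GiB; 70B -> ~0.31, ~4.88, ~39.1 GiB (matches the table)
      ```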

  • @Johan-rm6ec
    @Johan-rm6ec 13 days ago

    With these kinds of tests, 2x 4060 Ti 16GB must be included, and how it performs. 24GB is not enough, and 32GB on a Quadro-type card is around 2700 euros, so 2x 16GB seems like a sweet spot that you should cover. Know your audience, know the sweet spots - those are the videos people want to see.

    • @RoboTFAI
      @RoboTFAI  11 days ago

      Adding 2x 4060s won't really increase speed over one of them, at least not noticeably - there are some other videos on the channel addressing this topic a bit. Scaling out on the number of video cards is really just meant to gain you extra VRAM. So it's always a balance of your budget, costs, power usage, and your expectations (that last one being the most important).
      Lower, lower your expectations until your goals are met! haha

  • @STEELFOX2000
    @STEELFOX2000 1 month ago

    Is it possible to use an RX 6800 for this task?

    • @RoboTFAI
      @RoboTFAI  1 month ago

      I don't have any AMD cards to test with, but there is ROCm for AMD, and llama.cpp/LocalAI/etc. do support it these days.