Llama 3.3 70B Tested LOCALLY! (First Look & Python Game Test)

  • Published: 1 Feb 2025

Comments • 26

  • @MikeTheBard
    @MikeTheBard A month ago +1

    Love your videos. Keep them coming!

    • @Bijanbowen
      @Bijanbowen  A month ago +1

      Thanks very much! I will for sure.

  • @youroldmangaming8150
    @youroldmangaming8150 A month ago

    Good work mate, keep it up!

    • @Bijanbowen
      @Bijanbowen  A month ago

      Thanks very much! Been meaning to send you a note!

  • @marcomerola4271
    @marcomerola4271 A month ago

    It was a very fun and instructive video. It would be interesting to see a comparison of the same game coded by the free or paid versions of the other providers.

    • @Bijanbowen
      @Bijanbowen  A month ago +1

      Thanks very much! I have used the same prompt in some separate testing videos for paid providers, though I will next time perhaps throw in a quick comparison in the video as well!

    • @marcomerola4271
      @marcomerola4271 A month ago

      @OminousIndustries Yes, you're right. Apologies for not checking before asking. But a 1-to-1 comparison would still be cool. Maybe in the future you could paste links to the related videos in the description down below 😊

    • @Bijanbowen
      @Bijanbowen  A month ago

      @@marcomerola4271 I agree direct comparisons are a good idea. Good thought on the additional links. I will keep note of this for future videos with similar testing conditions!

  • @KayWessel
    @KayWessel A month ago

    Nice test! I was planning to build a dual 3090 system, but I guess I need to reconsider. This was slower than expected. Would a dual 5090 perform somewhat better, or just 2x?

    • @Bijanbowen
      @Bijanbowen  A month ago

      I can't speak to this aside from what I have experienced, but fwiw I have had exl2 70B models running in the text-gen-webui that were much faster than this. I have seen some discussion on the speed differential between gguf and exl2, but I am not knowledgeable enough to make any definitive statements on this - just personal anecdotes.
      Not sure how much faster, but a dual 4090, let alone 5090, should be a nice speed increase based on some of the user benchmarks I have seen on r/LocalLLaMA.

  • @jeffwads
    @jeffwads A month ago +1

    Better late than never. Sweet model. Been using the 8bit and the 128K context really smokes.

    • @Bijanbowen
      @Bijanbowen  A month ago

      You are giving me a quantization inferiority complex mentioning the 8-bit LOL! It is a rather impressive model indeed.

  • @MontyMcRib
    @MontyMcRib A month ago

    Would be interesting to know the difference between Llama 3.0, 3.1, 3.2, and 3.3 in 4-bit quant. I got hardware running 70B in 8 bits, but I still can't make the jump from 3.0 to 3.1 or 3.2. From my own testing it seems like 3.0 with 8K context is still superior to the 128K models (although I didn't test 3.3 yet). I'm testing on real-world use cases.

    • @Bijanbowen
      @Bijanbowen  A month ago

      I would assume they benchmark better between .0/.1/.2/.3 etc, but like you say real world use cases are often more important than benchmarks for folks like us.

  • @sevilnatas
    @sevilnatas A month ago

    On these locally hosted models, it would be interesting to know how many tokens per second you're getting back.

    • @Bijanbowen
      @Bijanbowen  A month ago +1

      Good thought, I will try to get speed results for local testing in the future.

    • @AmrAbdeen
      @AmrAbdeen A month ago

      @OminousIndustries Just run: ollama run modelname --verbose
      It will give you full statistics after each response, including tokens per second (see the sketch below).
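
      A minimal sketch of how you could pull similar statistics programmatically, assuming Ollama's default local endpoint (port 11434) and a hypothetical model tag such as "llama3.3:70b"; the non-streaming /api/generate response includes eval_count and eval_duration fields, from which tokens per second can be derived:

      ```python
      # Rough sketch: get generation stats from Ollama's local HTTP API instead of the CLI.
      # Assumes Ollama is running on its default port and that a model tagged
      # "llama3.3:70b" (hypothetical name) has already been pulled.
      import requests

      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={
              "model": "llama3.3:70b",  # substitute whatever `ollama list` shows locally
              "prompt": "Write a short Python snake game.",
              "stream": False,
          },
          timeout=600,
      )
      stats = resp.json()

      # eval_count = generated tokens, eval_duration = generation time in nanoseconds
      tokens_per_second = stats["eval_count"] / (stats["eval_duration"] / 1e9)
      print(f"{tokens_per_second:.1f} tokens/sec")
      ```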

  • @proterotype
    @proterotype A month ago

    Come on man, where’s that mike you’ve been talking about? I know you can afford it. /s
    When you add it, your vids will level up. Thanks for the walk throughs!!!

    • @Bijanbowen
      @Bijanbowen  A month ago

      I spent the mic budget on a ChatGPT Pro subscription LOL. Thanks for the kind words. I actually have a nice AKG mic I used to use for music-related tasks, so perhaps I will hook that up to the system and use it for screen-recording audio.

  • @sevilnatas
    @sevilnatas A month ago

    Is Anything LLM managing the hosting of the local model? I am interested in running multiple GPUs and am trying to figure out the best way forward, as far as performance.

    • @Bijanbowen
      @Bijanbowen  A month ago +1

      No, Ollama is handling the hosting of the local model here; Anything LLM is just providing a user interface to interact with the model (see the sketch below this thread).

    • @sevilnatas
      @sevilnatas A month ago

      @@Bijanbowen Ah, good to know, thanks.
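
      As a quick illustration of that split, here is a minimal sketch, assuming Ollama is serving on its default local port (11434) and a hypothetical model tag "llama3.3:70b": Ollama hosts the model behind an HTTP API, and Anything LLM, a curl command, or a short script like this are all interchangeable clients talking to the same server.

      ```python
      # Minimal sketch: talk to the Ollama server directly, bypassing any UI.
      # Assumes the default endpoint and a hypothetical model tag "llama3.3:70b".
      import requests

      reply = requests.post(
          "http://localhost:11434/api/chat",
          json={
              "model": "llama3.3:70b",  # substitute your local model name
              "messages": [{"role": "user", "content": "Hello from a plain script."}],
              "stream": False,
          },
          timeout=600,
      ).json()

      # The assistant's text comes back under message.content in the non-streaming response.
      print(reply["message"]["content"])
      ```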

  • @mostwanted2000
    @mostwanted2000 18 days ago

    What do you recommend? Linux (and which version), or Windows?

    • @Bijanbowen
      @Bijanbowen  17 days ago +1

      I personally prefer Ubuntu, but if someone is used to Windows and does not want to have to troubleshoot a lot, it might not be a bad idea to stick with Windows haha

  • @KonstantinsQ
    @KonstantinsQ A month ago

    How much RAM did it take?

    • @Bijanbowen
      @Bijanbowen  A month ago

      It was using about 19 GB on one card and 22 GB on the other, so a total of about 41 GB or so for this Q4_K_M quant.
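
      That figure lines up with a back-of-the-envelope estimate; a quick sketch, assuming roughly 4.8 bits per weight for a Q4_K_M quant and about 70.6 billion parameters for Llama 3.3 70B (both approximate), which puts the weights alone in the low 40s of GB before KV cache and runtime overhead:

      ```python
      # Back-of-the-envelope VRAM estimate for a Q4_K_M 70B quant (illustrative numbers).
      params = 70.6e9        # approximate parameter count of Llama 3.3 70B
      bits_per_weight = 4.8  # rough effective average for Q4_K_M (assumption)

      weight_gb = params * bits_per_weight / 8 / 1e9
      print(f"~{weight_gb:.0f} GB of weights")  # ~42 GB, close to the 19 GB + 22 GB observed
      ```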