RoboTF AI
  • Videos 24
  • Views 58,623
Halloween Stories via Streamlit, Langchain, Python, and LocalAI (or OpenAI) with Text to Speech!
This week in the RoboTF lab:
I want to encourage everyone to go build something, anything, learn, and have some fun along the way.
We introduce the RoboTF Halloween Stories application, walk through it, listen to a few stories, and play with different setups. We even do a quick demo of LocalAI running in Docker.
We use Streamlit, Python, Langchain, Requests, LocalAI, Docker, and more to create a chatbot that not only writes stories but also speaks them (a minimal sketch of the idea follows below).
A quick (and admittedly rough) codebase to get you started on your own project:
Codebase: github.com/kkacsh321/robotf_halloween_stories
Interact with the hosted ...
Views: 133
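For anyone who wants the gist before opening the repo, here is a minimal sketch of the idea, not the actual codebase: a Langchain chat model pointed at a LocalAI (or OpenAI) endpoint writes the story, and a request to LocalAI's TTS endpoint speaks it. The host, port, model names, and payload fields below are assumptions you would adjust to your own setup.

```python
# Minimal sketch (not the repo's actual code): generate a story with Langchain
# against a LocalAI/OpenAI-compatible endpoint, then ask LocalAI's TTS endpoint
# to speak it. Endpoint address, model names, and payload fields are assumptions.
import requests
from langchain_openai import ChatOpenAI

BASE_URL = "http://localhost:8080/v1"   # LocalAI's OpenAI-compatible API (assumed port)

llm = ChatOpenAI(base_url=BASE_URL, api_key="not-needed-for-localai",
                 model="mistral-7b", temperature=0.9)

story = llm.invoke("Tell me a short, spooky Halloween story set in a robotics lab.").content
print(story)

# LocalAI exposes a /tts endpoint; the field names here are illustrative.
audio = requests.post("http://localhost:8080/tts",
                      json={"model": "en-us-voice", "input": story}, timeout=300)
with open("story.wav", "wb") as f:
    f.write(audio.content)
```

Swapping the base URL for OpenAI's API (with a real key) is all it would take to run the same sketch against OpenAI instead of LocalAI.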

Videos

Mistral 7B LLM AI Leaderboard: Nvidia RTX A4500 GPU 20GB Where does prosumer/enterprise land?
524 views, 14 days ago
Mistral 7B LLM AI Leaderboard: Nvidia RTX A4500 GPU Contender Where does prosumer/enterprise land? This week in the RoboTF lab: The lab is a disaster, but we take an Nvidia A4500 20GB out of my main AI machine, throw it in the bench testing machine, and take it through the Mistral 7B Leaderboard tests (a sketch of how tokens-per-second can be measured follows the video list). Is it worth it to dump 💰? Final results at 16 Min Mark. Leaderboard is live: robotf.ai/Mistral_7B_L...
LocalAI LLM Tuning: WTH is Flash Attention? What are the effects on memory and performance? Llama3.2
296 views, 14 days ago
LocalAI LLM Tuning: WTH is Flash Attention? What are its effects on memory and performance? Lab Session with Llama 3.2 3B Q8. This week in the RoboTF lab: It's fall, and I spent a ☕️ lab session addressing the topic of Flash Attention and why you should care about it. Let's do some quick testing and compare the results. Final results at 11 Min Mark. RoboTF Website: robotf.ai/Mistral_7B_Leaderboa...
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4090 Windforce 24GB Can it break 100 TPS?
471 views, 21 days ago
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4090 Windforce 24GB Can it break 100 TPS? This week in the RoboTF lab: While allergies are taking me down it won't stop the tests! We unbox a brand new RTX 4090 and take it through the Mistral 7B Leaderboard tests. Is it worth it to dump 💰? Can it break 100 TPS? Will it suck enough power to cook a pizza? Final results at 12 Min Mark. GPU Lin...
Mistral 7B LLM AI Leaderboard: The King of the Leaderboard? Nvidia RTX 3090 Vision 24GB throw down!
555 views, 21 days ago
Mistral 7B LLM AI Leaderboard: The King of the Leaderboard? Nvidia RTX 3090 Vision 24GB throw down! This week in the RoboTF lab: We put the current king in the lab through the Mistral 7B Leaderboard tests as the 3090 said it wanted to de-throne the 4070Ti Super from the last video in the series. Can the 3090 do it? Will it suck enough power to cook a pizza? Final results at 11 Min Mark. GPU Lin...
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4070Ti Super 16GB and giving it a run!
820 views, 1 month ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia RTX 4070Ti Super 16GB This week in the RoboTF lab: A friend of the channel loans us an Nvidia RTX 4070Ti Super 16GB to run through some tests. We unbox it (or try to)...and run it through the Mistral Leaderboard tests! Final results at 15 Min Mark. GPU Link: 4070TI Super amzn.to/4eyOPsj Leaderboard is live: robotf.ai/Mistral_7B_Leaderboard Lea...
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia RTX 4060Ti 16GB
556 views, 1 month ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia RTX 4060Ti 16GB This week in the RoboTF lab: The standard card on the channel an Nvidia RTX 4060Ti 16GB gets put through the Mistral 7B leaderboard gauntlet. Final results at 12 Min Mark. GPU Link: 4060ti 16GB amzn.to/3NeSEGT Leaderboard is live: robotf.ai/Mistral_7B_Leaderboard Leaderboard reports (from these videos if you want a hands on loo...
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia Tesla M40 24GB
526 views, 1 month ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia Tesla M40 24GB This week in the RoboTF lab: I pull an Nvidia Tesla M40 (amzn.to/3Yf4yXC) out and put it through the Mistral 7B leaderboard gauntlet. How does a GPU from 2015 compare? And this will be the first GPU contender to make it to FP16. Final results at 22 Min Mark. Leaderboard is live: robotf.ai/Mistral_7B_Leaderboard Leaderboard reports ...
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia GTX 1660
372 views, 1 month ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia GTX 1660 This week in the RoboTF lab: I pull out a GTX 1660 (amzn.to/4eQiJYE) from the cabinet, and put it through the Mistral 7B leaderboard gauntlet. Can we even get to Q8 on 6GB of VRAM? Final results at 22 Min Mark. Leaderboard is live: robotf.ai/Mistral_7B_Leaderboard Leaderboard reports (from these videos if you want a hands on look): ro...
Mistral 7B LLM AI Leaderboard: Rules of Engagement and first GPU contender Nvidia Quadro P2000
315 views, 1 month ago
Mistral 7B LLM AI Leaderboard: Rules of Engagement and first GPU contender Nvidia Quadro P2000 This week in the RoboTF lab: We go over the goals, and rules of engagement along with launching the Mistral 7B leaderboard! We then bring in an older Quadro P2000 (amzn.to/3NeWWht) and put it to the tests. Leaderboard is live: robotf.ai/Mistral_7B_Leaderboard Leaderboard reports (from these videos if ...
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3,Q4,Q5,Q6,Q8, and FP16 CPU Inference i9-9820X
277 views, 1 month ago
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3,Q4,Q5,Q6,Q8, and FP16 CPU Inference i9-9820X This week in the RoboTF lab: Setting the rest of the baselines for a Mistral 7B Leaderboard with CPU inference. We will test Q4, Q5, Q6, Q8, and FP16 quants and then bring them together with Q3 from the last video for our full set of baselines. Stay tuned for the leaderboard. Leaderboard is live: robotf....
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3 CPU Inference i9-9820X
314 views, 1 month ago
Mistral 7B LLM AI Leaderboard: Baseline Testing CPU Inference i9-9820X This week in the RoboTF lab: Setting a baseline for a Mistral 7B Leaderboard with CPU inference....more to come! Stay tuned for the leaderboard. Leaderboard is live: robotf.ai/Mistral_7B_Leaderboard Leaderboard reports (from these videos if you want a hands on look): robotf.ai/Mistral_7B_Leaderboard_Reports Model in testing:...
LocalAI LLM Testing: Part 2 Network Distributed Inference Llama 3.1 405B Q2 in the Lab!
1.4K views, 2 months ago
Part 2 on the topic of Distributed Inference! This week we are taking Llama 3.1 405B at a Q2 quant running 8k of context through the gauntlet with several GPUs and across Nodes in a distributed swarm of llama.cpp workers! The whole lab is getting involved in this one to run a single giant model. Both GPU Kubernetes nodes 3x 4060Ti 16GB amzn.to/3NeSEGT 6x A4500 20GB amzn.to/3TXtAYR 1x 3090 24GB ...
LocalAI LLM Testing: Distributed Inference on a network? Llama 3.1 70B on Multi GPUs/Multiple Nodes
5K views, 2 months ago
LocalAI LLM Testing: Distributed Inference on a network? Llama 3.1 70B on Multi GPUs/Multiple Nodes
LocalAI LLM Testing: Llama 3.1 8B Q8 Showdown - M40 24GB vs 4060Ti 16GB vs A4500 20GB vs 3090 24GB
10K views, 2 months ago
LocalAI LLM Testing: Llama 3.1 8B Q8 Showdown - M40 24GB vs 4060Ti 16GB vs A4500 20GB vs 3090 24GB
LocalAI LLM Testing: How many 16GB 4060TI's does it take to run Llama 3 70B Q4
10K views, 3 months ago
LocalAI LLM Testing: How many 16GB 4060TI's does it take to run Llama 3 70B Q4
LocalAI LLM Testing: Can 6 Nvidia A4500's Take on the WizardLM 2 8x22b?
1.2K views, 3 months ago
LocalAI LLM Testing: Can 6 Nvidia A4500's Take on the WizardLM 2 8x22b?
LocalAI LLM Testing: Viewer Questions using mixed GPUs, and what is Tensor Splitting AI lab session
2.4K views, 3 months ago
LocalAI LLM Testing: Viewer Questions using mixed GPUs, and what is Tensor Splitting AI lab session
What's on the Robotf-AI Workbench Today?
580 views, 3 months ago
What's on the Robotf-AI Workbench Today?
LocalAI LLM Testing: i9 CPU vs Tesla M40 vs 4060Ti vs A4500
9K views, 4 months ago
LocalAI LLM Testing: i9 CPU vs Tesla M40 vs 4060Ti vs A4500
LocalAI Testing: Viewer Question LLM context size, & quant testing with 2x 4060 Ti's 16GB VRAM
1.6K views, 4 months ago
LocalAI Testing: Viewer Question LLM context size, & quant testing with 2x 4060 Ti's 16GB VRAM
LocalAI LLM Single vs Multi GPU Testing scaling to 6x 4060TI 16GB GPUS
12K views, 6 months ago
LocalAI LLM Single vs Multi GPU Testing scaling to 6x 4060TI 16GB GPUS
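All of the leaderboard videos above report tokens per second. As referenced in the A4500 entry, here is a rough sketch of how a TPS number can be measured against any OpenAI-compatible endpoint such as LocalAI; it is not the actual leaderboard harness, and the address and model name are assumptions.

```python
# Rough TPS sketch against an OpenAI-compatible endpoint (e.g. LocalAI);
# not the leaderboard's real harness. Address and model name are assumed.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

start = time.time()
resp = client.chat.completions.create(
    model="mistral-7b-instruct",   # whatever model name your server exposes
    messages=[{"role": "user", "content": "Write a 500 word story about a robot."}],
)
elapsed = time.time() - start

completion_tokens = resp.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} TPS")
```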

Comments

  • @RaadClub
    @RaadClub 18 hours ago

    Hi man, should I get a used 3090 or a new 4070 Ti? Sadly the 4070 Ti has half the VRAM, so I'm leaning towards the 3090, but what do you think?

    • @RoboTFAI
      @RoboTFAI 8 hours ago

      I would look for a good deal on a used 3090; it's a bit faster (you should watch the leaderboard series) and can be had for cheaper than a brand-new 4070 Ti. And of course more VRAM!

    • @RaadClub
      @RaadClub 8 hours ago

      @@RoboTFAI Thanks for the knowledge, man. I'm looking to do some reinforcement learning; that's why I'm curious about these GPUs.

  • @____________________________.x
    @____________________________.x 1 day ago

    Just a note: sitting here with a 32" 1440P screen, and I can barely read the text you are showing

    • @RoboTFAI
      @RoboTFAI 8 hours ago

      Thanks for the feedback; it was recorded in, and is best viewed at, 4K, but I have tried to do much better in the newer videos. This one is a bit older.

  • @animationgaming8539
    @animationgaming8539 1 day ago

    I liked every comment on this video!

  • @pubswork2063
    @pubswork2063 1 day ago

    Work on parallelizing everything: multiple LAN and optical-fibre links (they're not that expensive, though I feel AI over optics is a better fit), multiple NVMe drives on different PCIe slots. You need to map every PCIe lane to the CPU, loaded up to the point that OEMs have to use fans to cool the motherboards, plus multiple RAM channels. You could even train the main model to use small swarmed agent models in smaller VRAM. Something like a Threadripper, even the cheapest and oldest, has 4 times the lanes, like four 16-lane PCIe slots for 4090s!

  • @pubswork2063
    @pubswork2063 1 day ago

    I'm working hard to run it with AMDs. I feel you've been a miner; a miner can run GPUs no matter how many. If GPU miners could, they'd sell AI tokens! Also, a multi-agent approach can use small devices.

  • @alx8439
    @alx8439 2 days ago

    Two questions, sir. Have you tried other inference backends? There's a project called Cortex cpp, a kind of alternative to Ollama made by the guys who developed Jan AI. The main pro is that it supports TensorRT, Nvidia's inference engine (and a specific model format, different from GGML). These guys are claiming it's 40-60% faster than llama cpp. They made a big comparison post about it, running different GPUs, and everywhere TensorRT was a huge win. Second question: did you try to enable tensor parallelism?

    • @RoboTFAI
      @RoboTFAI 7 hours ago

      I have played around a bit, but not a ton, as I don't have much time. I do run LM Studio/Ollama/etc locally, and LocalAI supports many types of backends, not just llama.cpp (which I mainly use): vLLM, etc. I don't think llama.cpp (under LocalAI) supports tensor parallelism yet; I believe there is an open PR/feature request. vLLM might, so the answer is no, I haven't.
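For readers curious what tensor parallelism looks like in practice, here is a hedged sketch using vLLM (which, as noted above, has not been tested on the channel); the model id and GPU count are placeholders.

```python
# Hedged sketch of tensor parallelism in vLLM: tensor_parallel_size splits each
# layer's weights across that many GPUs. Model id and GPU count are examples.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3",
          tensor_parallel_size=4)   # e.g. 4 GPUs in one box

outputs = llm.generate(["Explain tensor parallelism in one paragraph."],
                       SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```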

  • @DeepThinker193
    @DeepThinker193 3 days ago

    Looks like it'd be better to buy RAM and offload the models there, since there's little to no difference in speed between CPU-only and 4x 4060 Ti's. It would also save a lot of money on expensive GPUs. Perhaps buying one or two 4060 Ti's for smaller 8B models makes more sense, but for larger 70B models it's useless. If you're desperate and already have one 4060 Ti and want to load a 70B model, it makes more sense to simply use the one 4060 with the rest of the model offloaded to RAM (if you have enough RAM), since you won't miss out on anything in terms of speed compared to three or four 4060 Ti's.

  • @animation-nation-1
    @animation-nation-1 4 days ago

    Nice. But then there is price too, if it's just a test lab. Considering a 3090 is USD $1500-$2000 in Australia, and a 4090 $2500 USD, I'm tempted to get an old Tesla. But the 3090 just works in a simple motherboard.

  • @hablalabiblia
    @hablalabiblia 6 days ago

    Superb! Could you make a tutorial on how to setup and implement everything needed (SOFTWARE WISE) to achieve what you did here?

  • @Matlockization
    @Matlockization 7 days ago

    Very interesting.

    • @RoboTFAI
      @RoboTFAI 5 days ago

      Glad you think so!

  • @krisiluttinen
    @krisiluttinen 8 days ago

    Can someone explain in a nutshell what this is? Is it an AI language model like ChatGPT that runs entirely offline on my own computer?

    • @RoboTFAI
      @RoboTFAI 8 days ago

      That's exactly what it is, if you're talking about LocalAI (localai.io): an open-source API that mimics OpenAI (ChatGPT) to run open-source models.
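Because LocalAI mimics the OpenAI API, the standard openai Python client works against it by just swapping the base URL. A minimal sketch, assuming LocalAI is listening on localhost:8080 and a model has already been configured:

```python
# LocalAI is a drop-in replacement for the OpenAI API; only the base URL changes.
# Host, port, and model name below are assumptions for this sketch.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="anything")

reply = client.chat.completions.create(
    model="llama-3.1-8b-instruct",   # a model you've configured in LocalAI
    messages=[{"role": "user", "content": "What can you do offline?"}],
)
print(reply.choices[0].message.content)
```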

  • @xtvst
    @xtvst 9 days ago

    Noobie question: can you use the same 6x 4060Ti 16GB for model training (as in, would it make a total of 96GB of available memory) in order to overcome memory limitations on a single GPU?

    • @RoboTFAI
      @RoboTFAI 8 days ago

      Yep, depending on the software you are using and its interaction with CUDA.
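As a small illustration of that caveat: frameworks like PyTorch see each card separately, so 6x 16GB only behaves like 96GB if your training code shards the model or data across devices (FSDP, DeepSpeed, etc.), not as one contiguous memory space. A quick sketch for checking what CUDA exposes:

```python
# Enumerate the CUDA devices PyTorch can see and sum their VRAM.
# The total is only usable as a pool if your training framework shards across devices.
import torch

total = 0.0
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    gib = props.total_memory / 1024**3
    total += gib
    print(f"GPU {i}: {props.name}, {gib:.1f} GiB")
print(f"Combined VRAM across devices: {total:.1f} GiB")
```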

  • @perrymitchell7118
    @perrymitchell7118 11 days ago

    405b

  • @Rewe4life
    @Rewe4life 11 days ago

    I have two Tesla P40s here, but I've been unsuccessful in my attempts at making use of both for my AI workloads. My Stable Diffusion trainings especially are taking very long. Do you know how I could make them appear as one large GPU?

  • @HellTriX
    @HellTriX 11 days ago

    Good stuff, deserves more subs.

    • @RoboTFAI
      @RoboTFAI 11 days ago

      Much appreciated!

  • @pedroantonioibarrafacio5641
    @pedroantonioibarrafacio5641 13 days ago

    Could you try the AMD Radeon Instinct series? Those are cheap and have a lot of VRAM.

    • @RoboTFAI
      @RoboTFAI 12 days ago

      If I get my hands on one I will

  • @jcdenton7914
    @jcdenton7914 13 days ago

    How many shrouds and fan sizes have you tried on Tesla GPUs? I want a quieter run, which a larger fan could theoretically provide, but the shroud funneling might be a source of noise, so I don't know what to get for the best silence.

    • @RoboTFAI
      @RoboTFAI 12 days ago

      Hmm, a few. I originally had them in a server that got retired, so just some 3D-printed shrouds. For bench testing I use high-speed fans (very loud). They require a good amount of air through them to keep them cool.

  •  14 days ago

    Running an OptiPlex 7040 SFF with 24 GB DDR4, an i5-6700 (3.4 GHz, 4 cores), no GPU. I get 5 tokens per second with "ollama run llama3.1 8b --verbose", and 9 TPS on the new 3.2 3B, on the single test "write a 4000 word lesson on the basics of python". It's usable. "ollama run codestral": the 22B pulled a 12 GB file; same test: results used 99% CPU, 0% GPU, 13 GB RAM. It crawled for 7 minutes at 1.8 TPS, but it ran.
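For anyone wanting to reproduce numbers like these programmatically rather than reading them off --verbose, here is a hedged sketch against Ollama's HTTP API; the fields used are the eval_count and eval_duration values Ollama returns, and the host and model tag are assumptions.

```python
# Compute tokens/sec from Ollama's /api/generate response; eval_duration is in
# nanoseconds. Host and model tag are assumptions for this sketch.
import requests

resp = requests.post("http://localhost:11434/api/generate",
                     json={"model": "llama3.1:8b",
                           "prompt": "Write a 4000 word lesson on the basics of python.",
                           "stream": False},
                     timeout=3600).json()

tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tps:.1f} tokens/sec")
```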

  • @tedguy2743
    @tedguy2743 14 days ago

    Your content is what I define as Gold, much appreciate the work

    • @RoboTFAI
      @RoboTFAI 14 days ago

      Glad to hear it! Hope you check out more of the videos.

  • @stuffinfinland
    @stuffinfinland 14 days ago

    I was waiting for big models to be run on these :) How about 6x A770 16GB GPUs?

  • @kingkong3233
    @kingkong3233 14 days ago

    thx

  • @Zebhie
    @Zebhie 14 days ago

    Are you going to test Intel A770 performance?

  • @novantha1
    @novantha1 15 days ago

    I think where this gets really interesting is not necessarily as a chatbot, but as an agent. Agents capable of running in the background over a large quantity of time without user intervention don't really depend on latency, but depend a lot on bandwidth. In that case, it's not uncommon to have a Raspberry Pi running in the background somewhere, and it sounds really stupid to try and run a huge model on it at first, but in a distributed setting it gets really interesting, because you could imagine a swarm of other people with a similar device, and you could queue up requests such that everyone "gets their turn" on the model. It's also potentially useful for large batch requests (i.e. synthetic data generation, database labeling, dataset labeling, potentially coding agents, etc.) where you kind of just want it done, and as cheaply as possible, but don't really care exactly when it gets done. I could imagine some player in the space (Kobold swarm, etc.) offering some measure of contribution (credits), and you could trade in slowly generated passive credits (i.e. lending your SBC to the swarm) for a smaller number of fast tokens run on a GPU cluster, etc.

  • @miltostria
    @miltostria 17 days ago

    I was watching the GPU utilization percentage and it seems the average is around 25% for each GPU, or am I wrong? Is it expected to be so, or is there any configuration to utilize a higher GPU % during inference?

    • @RoboTFAI
      @RoboTFAI 12 days ago

      It's more about spreading the load and using VRAM across the board rather than getting more speed or processing power. Also, llama.cpp (under the hood here) isn't as great at multi-GPU setups as, say, vLLM or some others, depending on your use cases/etc.
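For context, here is a small sketch of what that load-spreading ("tensor splitting") looks like with llama-cpp-python, the same mechanism LocalAI drives through llama.cpp; the split ratios and model path are made up for illustration.

```python
# Spread a model's tensors across several GPUs with llama-cpp-python.
# Model path, context size, and split ratios are illustrative only.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q8_0.gguf",   # example path
    n_gpu_layers=-1,                        # offload all layers
    tensor_split=[0.25, 0.25, 0.25, 0.25],  # spread tensors evenly over 4 GPUs
    n_ctx=8192,
)
out = llm("Q: Why is each GPU only ~25% busy?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```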

  • @shawnvines2514
    @shawnvines2514 18 days ago

    Excellent explanation of this feature.

  • @critical-shopper
    @critical-shopper 19 days ago

    PNY 3090 Revel response tps: Q8: 88.3 tps; FP16: 52.4 tps.

  • @critical-shopper
    @critical-shopper 19 days ago

    Asus Strix 4090 response tps: Q8: 99.2 tps; FP16: 60.4 tps.

  • @C650101
    @C650101 20 days ago

    I want to run big models cheaply. I use a 1080 Ti now on 8B Llama, which is fast enough, but I would like a reliable code assistant with a bigger model. Suggestions? Can you test multiple 3060s in parallel on a big model?

  • @SamJoshua-p9r
    @SamJoshua-p9r 22 days ago

    Just to be clear, can vLLM host LLMs like LLaMA or Mistral and distribute queries across multiple server racks, such as three racks, each with four GPUs?

    • @RoboTFAI
      @RoboTFAI 22 days ago

      Sure! I have a few videos on distributed inference over a network and have helped a few viewers get it going in their labs also. LocalAI (llama.cpp under the hood) supports swarms/federation for load balancing requests or adding in pools of workers with GPUs/CPUs.

    • @SamJoshua-p9r
      @SamJoshua-p9r 22 days ago

      @@RoboTFAI That's amazing bro. I'll go check them out. Doing the world a favor by publishing these vids.

  • @shawnvines2514
    @shawnvines2514 22 days ago

    Excellent video. Definitely feel better now about not getting a 4090. Now it's between a 3090 and a 7900 XTX

    • @RoboTFAI
      @RoboTFAI 22 days ago

      Thanks! Exactly what the leaderboard was meant for: letting people decide based on feel (perception) and as close a comparison as we can get with data from different hardware. Hope I am doing that justice with AI robots 🤖 🍕

    • @critical-shopper
      @critical-shopper 19 days ago

      Don't worry too much! I’m actually getting 30-40% better performance on both my 3090 and 4090 compared to his results, so there might be some bottleneck in his setup. Both cards are amazing, but the 4090 does offer a noticeable boost if you're looking for top-tier performance!

  • @iniyan19
    @iniyan19 22 days ago

    The 5090 is going to be amazing, with those GDDR7 memory modules.

  • @rhadiem
    @rhadiem 22 days ago

    If you don't need speed, the fact that the M40 24GB can run the same models as the 4090 is impressive for the value.

    • @RoboTFAI
      @RoboTFAI 22 days ago

      Like I say a lot, it's so much about your perception of speed, and then how much your wallet can handle 💸. Even CPU inference can be tolerable if you set your expectations. However, the 4090 will eat everything for lunch in the consumer realm at the moment... until the 5090s hit, of course.

  • @AkhilBehl
    @AkhilBehl 22 days ago

    The price to performance ratio is not convincing at all. Why would I want this over say a dual 3090 setup? Or am I reading the data wrong?

    • @RoboTFAI
      @RoboTFAI 22 days ago

      That's why we put this leaderboard together, with data for everyone to decide on their own for dollars vs wants!

  • @rhadiem
    @rhadiem 22 days ago

    Love my 4090, bought it at launch. The watercooled version was $2k, now selling for $2400 💀 Can't wait for the 5090. 32GB 👍. Wish it was 64.

    • @RoboTFAI
      @RoboTFAI 22 days ago

      Don't we all wish it was going to be 64GB! 🤖💰

  • @neponel
    @neponel 22 days ago

    Keep cranking these out. Love it.

    • @RoboTFAI
      @RoboTFAI 22 days ago

      Much appreciated, and we hope everyone is getting value out of the leaderboard data! 🍕

  • @FrankHouston-v5e
    @FrankHouston-v5e 22 days ago

    The 4090 is the Godzilla of LLM and gaming GPUs; it's 30% faster than my 24GB AMD Radeon 7950 XTX 🧐.

    • @coololplay3196
      @coololplay3196 22 days ago

      Wait 7950xtx ???

    • @RoboTFAI
      @RoboTFAI 22 days ago

      In so many ways! Price, size, weight, speed, etc 🦾

  • @iniyan19
    @iniyan19 22 days ago

    If you get the chance, could you run tests on the Intel A770 and 7900 XTX? I want to know how far behind they are compared to Nvidia.

  • @AndroidFerret
    @AndroidFerret 23 days ago

    My new phone runs the Rocket 3B LLM (~3GB) and gives answers in under 2 seconds. I have a 3.3GHz chip on 4nm with AI hardware support + 16GB DDR5 RAM. I can use an offline picture-generation AI which finishes a 512x512 picture with 20 steps in around 2 minutes. That's absolutely INSANE IMO.

    • @RoboTFAI
      @RoboTFAI 23 days ago

      It's crazy, mobile is where I always predicted small models would reign. The technology and the software are advancing at a pace I haven't seen in my career.

  • @RaadClub
    @RaadClub 23 days ago

    Mates, I have a question. Currently I have a 3060 Ti 8GB and I want to pair it with a 4060 Ti 16GB for deep learning. Do you think it is a good choice to go with a 3060 Ti 8GB and a 4060 Ti 16GB for deep learning?

    • @RoboTFAI
      @RoboTFAI 23 days ago

      Sure, you can absolutely use mixed multiple GPUs. If that's what you have available, it's a great choice; otherwise it's a matter of your budget and needs (or wants). I hope these types of leaderboard tests help put some information out there for people to decide those things for themselves. 4060s are great for price/power usage/form factor, but the more expensive cards will also smack them down in TPS. But how fast do you need responses? Or again, how fast do you want responses?

    • @RaadClub
      @RaadClub 22 days ago

      @@RoboTFAI Honestly I do not want it to be super fast, and based on your leaderboard I am OK with the 4060 Ti 16GB result. The key note is I am from Iran, and due to heavy sanctions I cannot buy a new 3090 (or else I would), and the used market is filled with heavily mined 3090s with ruined cards; that is why I asked about the 4060. I just wanted to know if I can keep my 3060 Ti 8GB and add a 4060 Ti 16GB, since my motherboard (TUF Gaming B660M-Plus) supports two GPUs. Another note: I have a 650W PSU and I think I can squeeze the 4060 into the system without changing the PSU, but I was thinking not to take any risk and buy a 750W PSU. What do you recommend for PSU wattage? And the other thing: with these dual GPUs, if I use parallelism, can I run 30B models or do I need to upgrade?

    • @RaadClub
      @RaadClub 22 days ago

      @@RoboTFAI Sorry for the late response. Man, I do not want it to be super fast, and based on what I saw on your leaderboard I think the 4060 Ti alongside my current 3060 Ti 8GB would be a good starting combo. I'm Persian, and due to the heavy sanctions by the US I am unable to buy a 3090, and the used market is very untrustworthy due to the vast mining; I thought the new 4060 Ti alongside my 3060 Ti would be better than a used 3090 that has been heavily mined. The other thing: do you think with parallelism I can run 30B models? What about some fine-tuning? The last thing I want to ask is whether my 650W PSU would be enough to support the 3060 Ti and 4060 Ti, or do I need to get a 750W PSU? Based on my research the two GPUs and the rest consume about 600W.

  • @GraveUypo
    @GraveUypo 23 days ago

    A reminder: you could get a GPU with lots of memory banks and upgrade the memory chips to have huge-capacity cards. You could, for instance, mod a 12GB RTX 3080 to have 24GB.

  • @GraveUypo
    @GraveUypo 23 days ago

    This is probably due to extreme PCIe bandwidth bottlenecking. Putting a ton of GPUs together without the bandwidth to push the data to them and back yields no improvement.

  • @Matlockization
    @Matlockization 24 days ago

    Mistral is the only AI I like and trust. These AIs eat a lot of power, yet they expect us to go green??? What do you mean by fully offloaded?

    • @RoboTFAI
      @RoboTFAI 23 days ago

      Fully offloaded means we are offloading all of the KV cache and the tensors to the GPU. For really low VRAM cards we do keep the KV cache on the CPU/RAM if the card can load all the tensors.

    • @Matlockization
      @Matlockization 23 days ago

      @@RoboTFAI Thanks.
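A hedged illustration of that distinction using llama-cpp-python (LocalAI drives the equivalent llama.cpp options through its model config): n_gpu_layers controls how many tensors go to the GPU, and offload_kqv controls whether the KV cache lives there too. Paths are examples only; in practice you would create one instance or the other.

```python
# "Fully offloaded" vs. keeping the KV cache in system RAM, sketched with
# llama-cpp-python. Model paths are placeholders.
from llama_cpp import Llama

fully_offloaded = Llama(model_path="./models/mistral-7b.Q4_K_M.gguf",
                        n_gpu_layers=-1,      # all tensors on the GPU
                        offload_kqv=True)     # KV cache on the GPU too

low_vram_card = Llama(model_path="./models/mistral-7b.Q4_K_M.gguf",
                      n_gpu_layers=-1,        # tensors still fit on the card
                      offload_kqv=False)      # but keep the KV cache in CPU RAM
```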

  • @ArtificialLife-GameOfficialAcc
    @ArtificialLife-GameOfficialAcc 24 days ago

    You can undervolt the 3090 and it will use around 280 watts with basically the same speed :) (only about 3% slower). (And if you use tensor cores, the core can run at 1500MHz while the tensor cores run at the same speed, around 280 watts total.)

    • @RoboTFAI
      @RoboTFAI 24 days ago

      Yep you absolutely can! We don't power limit for these tests however - just let it eat! 🔌

  • @shawnvines2514
    @shawnvines2514 24 days ago

    Excellent video. I've been thinking about getting an AMD 7900 XTX since it has 24GB and is less than $900 new, and ROCm supports inference in LM Studio. I checked out the website and didn't find any AMD GPU testing. Any thoughts?

    • @RoboTFAI
      @RoboTFAI 24 days ago

      Not yet! I don't have any AMD cards in the lab as of right now... I do foresee a few entering the battle on the leaderboard soon. We are just ripping through what I have in the lab right now, which is too many... but hopefully people are getting good information from them.

    • @peppernickelly
      @peppernickelly 24 days ago

      @@RoboTFAI Excited to see how AMD's 12GB and 16GB cards can share memory like the Nvidia card.

  • @Ulvens
    @Ulvens 25 days ago

    I'm trying to do this locally. I have two separate systems with 3090s connected via NVLink. It's on a 2.5Gbps network, not connected to the internet. Do I need to get 10Gbps to make this work better, or should I run them all docked in the same system and drop the NVLinks, even though the NVLinks give me 48GB of VRAM? My hope was to get them to network to push 96. Sorry for the technical errors in my question, but I'm pretty noob. It's the first time I have set up a network, and the first time I've used SLI/NVLink. So this might not make any sense, but I have to try. Thank you for the great video.

    • @RoboTFAI
      @RoboTFAI 24 days ago

      You don't need a 10-gig network, but it would be faster to "load" the model the first time. Network usage during inference is relatively low in the swarm. NVLink isn't something I have tested, as it is not necessary for multi-GPU setups; however, I am sure it has its benefits, more likely during training, which is a different beast than just inference.

  • @sondrax
    @sondrax 26 days ago

    I know it'd be SLOW… but I'm very curious what 80B FP16 (Q16) would do? As that's what we need, even if we have to wait all night for an answer (but couldn't wait 4 nights!). 🙃

  • @bechti44
    @bechti44 27 days ago

    Around 4 times faster than CPU only... but around 100x more expensive...

    • @RoboTFAI
      @RoboTFAI 24 days ago

      It's just money and power! like always.... 😁

  • @moozoo2589
    @moozoo2589 29 days ago

    I'm interested in more tests on how two or four 4060 Tis compare against a single more expensive consumer card like a 3090 or 4090: the benefits of larger total VRAM (2x16GB or 4x16GB) against 1x24GB. Also, the graphs aren't showing CPU usage; is there any bottleneck from the CPU? Are there benefits from more cores, or is single-threaded performance crucial?

  • @moozoo2589
    @moozoo2589 29 days ago

    The graphs are showing ~20-25% GPU usage on each of the four GPUs. That also explains the 180W power draw in total, not something like 400W (100W per GPU). Could you please explain why it is not consuming all the GPU power?

    • @spookym0f0
      @spookym0f0 20 days ago

      It implies it's limited by memory bandwidth. In theory you can squeeze more tokens/sec out of it by running multiple requests in parallel.

    • @moozoo2589
      @moozoo2589 20 days ago

      @@spookym0f0 I'm more inclined to think it is limited by single-threaded CPU usage, resulting in tasks running sequentially on the GPUs. Indeed, for some reason we don't see parallelism on the CPU side, just one process at 100% usage.
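Building on the parallel-requests point above, a tiny sketch of firing several chat requests concurrently at a LocalAI/llama.cpp-style endpoint and comparing aggregate tokens per second; whether this actually helps depends on the server's batching behavior, and the address and model name are assumptions.

```python
# Fire several requests in parallel and report aggregate tokens/sec.
# Address and model name are placeholders; server-side batching determines the gain.
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def one_request(i: int) -> int:
    resp = client.chat.completions.create(
        model="mistral-7b-instruct",
        messages=[{"role": "user", "content": f"Write short poem #{i} about GPUs."}],
    )
    return resp.usage.completion_tokens

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    token_counts = list(pool.map(one_request, range(4)))
elapsed = time.time() - start
print(f"Aggregate: {sum(token_counts) / elapsed:.1f} tokens/sec across 4 parallel requests")
```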