Using Clusters to Boost LLMs 🚀

  • Published: 28 Dec 2024

Comments • 197

  • @AlmorTech
    @AlmorTech 3 months ago +9

    Oh my, it's great that someone is making content of this depth ❤️

  • @woolfel
    @woolfel 3 months ago +144

    Even though I could afford a 4080 or 4090, I refuse to pay extortion prices. Nvidia has gotten too greedy. So glad I have my M2 Max with 96GB to do fun ML research projects.

    • @AZisk
      @AZisk  3 months ago +24

      Will you finally be upgrading to the M4 this year?

    • @djayjp
      @djayjp 3 months ago +7

      x86 will be joining the party once Strix Halo launches.

    • @seeibe
      @seeibe 3 months ago +4

      I'm riding this out on my 4090 until we get some clarity where local models are going. The general trend seems to be fitting the same performance into smaller models over time.

    • @TheHardcard
      @TheHardcard 3 months ago +3

      @@djayjp Certain tasks - a major current task is LLM token generation - are memory bandwidth limited. Assuming the next Max Macs keep a 512-bit bus with faster memory, they will have the highest bandwidth. Strix Halo will be for the admittedly sizeable market of people who hate Macs, hate Apple, or both.
      Outside of that, the upcoming Max will have the technical advantage. Can AMD undercut on price? Maybe, but not guaranteed.

    • @annraoi
      @annraoi 3 months ago +2

      With AMD exiting the high-end market, the cost may increase. The new 5090s will require a 16-pin connector and 600W from what I have read.

  • @GVDub
    @GVDub 18 days ago +2

    Just starting to mess around with the idea of using clusters for my home AI server, and hoping that the 48GB M4 Mac mini I've got coming will play nicely with my existing 64GB Ryzen 9-based mini-PC system with the 12GB RTX 3060 (hey, I'm on a budget) on an OCuLink dock. If I can get 70B models running okay with those two under Exo, it will be useful for my particular application (writing and research assistant).

  • @dave_kimura
    @dave_kimura 3 months ago +9

    I've tested exo before and ran into a lot of the same issues that you were experiencing, and this was on a 10GbE network. I haven't tried it again after the failed attempts, but I do think that this kind of clustering could be very powerful with even smaller models. If it supports handling multiple concurrent requests and exo acts as a "load balancer" for the requests, then you could have one entry point into a much larger capacity network of machines running inference (see the round-robin sketch after this thread). This is opposed to finding your own load balancing mechanism (maybe HAProxy) to balance the load, but then you still have the issue of orchestrating each machine to download and run the requested model.

    • @kevin.malone
      @kevin.malone 19 days ago

      You can cluster Mac minis using Thunderbolt 5, which gives you 80Gb/s. That's supposed to give ~30 tokens per second on a 4-bit quantized 70B-param model.
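
A quick back-of-the-envelope check on that ~30 tokens/s figure (a rough sketch; the 4-bit and 70B numbers are round approximations, not measurements): generating each token has to stream essentially all of the quantized weights from memory, so the cluster's aggregate memory bandwidth is the limit, while only small activations cross the Thunderbolt link.

```python
# Rough arithmetic for a 4-bit 70B model at ~30 tokens/s (illustrative only).
params = 70e9
bytes_per_weight = 0.5                      # 4-bit quantization
weights_gb = params * bytes_per_weight / 1e9
tokens_per_s = 30
# Each generated token reads (roughly) every weight once:
needed_memory_bw_gb_s = weights_gb * tokens_per_s
tb5_payload_gb_s = 80 / 8                   # 80 Gb/s link -> ~10 GB/s
print(f"weights: ~{weights_gb:.0f} GB")
print(f"aggregate memory bandwidth needed: ~{needed_memory_bw_gb_s:.0f} GB/s")
print(f"Thunderbolt 5 carries only the small activations, ~{tb5_payload_gb_s:.0f} GB/s available")
```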
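
Picking up @dave_kimura's load-balancer idea above: if the cluster software doesn't balance requests itself, a thin round-robin dispatcher in front of several OpenAI-compatible endpoints is a minimal sketch of the "one entry point" setup. The node addresses, port, model name and /v1/chat/completions path below are placeholders for illustration, not exo's documented interface.

```python
# Minimal round-robin dispatcher over several OpenAI-compatible nodes.
# Node URLs, port, model name and endpoint path are placeholders.
import itertools
import json
import urllib.request

NODES = itertools.cycle([
    "http://192.168.1.10:8000",   # hypothetical node addresses
    "http://192.168.1.11:8000",
    "http://192.168.1.12:8000",
])

def chat(prompt: str) -> str:
    node = next(NODES)                      # pick the next node in rotation
    body = json.dumps({
        "model": "llama-3.1-70b",           # whatever model the nodes serve
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{node}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(chat("Say hello from the cluster."))
```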

  • @Adriatic
    @Adriatic 3 months ago +13

    Good day to you :) thanks for the content.

  • @danwaterloo3549
    @danwaterloo3549 3 months ago +1

    The idea is super cool; I'd love to be able to use multiple computers to accomplish more than what is possible with just one. It seems to me that it addresses the issue of expensive graphics cards… the next best alternative is probably a 'modeling host' with powerful graphics cards, available over the network to smaller 'terminals'.

  • @Z-add
    @Z-add 2 months ago +2

    You should investigate further and do more videos with these cluster LLMs.

  • @WireHedd
    @WireHedd 1 month ago

    Callback to the good old days of Beowulf clusters for Unix. I picked up 5 old HP mini PCs with an Intel 6-core CPU, 1TB NVMe and 64GB of RAM in each. These are all on my 10Gb in-house Ethernet, so I'll give it a go and let you know. Great video, thanks.

  • @danielserranotorres4230
    @danielserranotorres4230 2 months ago

    You could try running the tool in Docker containers with shared network storage for the model. That would help with the disk space issues.

  • @stefanodicecco3948
    @stefanodicecco3948 3 months ago +1

    A great project and idea; maybe the next step could be the addition of shared memory from the cloud.

  • @WujoFefer
    @WujoFefer 3 months ago +6

    That's why I'm waiting for a Mac Studio with M4 Max/Ultra. 256GB for big models with a good SoC will soon be essential... or already is...
    Anyway, as an iOS dev I'm using 20-40B models; they are heavy but not too much, they can respond in reasonable time and they don't use 50GB+.

  • @RunForPeace-hk1cu
    @RunForPeace-hk1cu 3 months ago +11

    Use a NAS... you wouldn't have to download multiple times... just point to a shared directory.

    • @AZisk
      @AZisk  3 months ago +5

      🤔

    • @AtlasBit
      @AtlasBit 3 months ago +3

      Good idea, and also RAID on SSDs to boost performance, and then compute as a cluster.

    • @RunForPeace-hk1cu
      @RunForPeace-hk1cu 3 months ago

      @@AZisk Create a shared directory for your Hugging Face hub directory that points to the shared NAS directory (a sketch follows this thread).

    • @Artificial.Unintelligence
      @Artificial.Unintelligence 2 months ago

      @@AZisk Can you try a comparison of hardwired vs WiFi? Like the above comment, what about involving a NAS?
      What about all the exact same computers vs 1 PC of the same power vs multiple PCs with varied power?.. so we can see where the benefits actually come from?
      I suspect a significant portion of your delays comes from networking and waiting on the slower, maxed-out PCs to catch up and assist the powerful one. BUT the benefit here is offloading a giant model that wouldn't fit on a single machine.. there's no way networking is going to be faster in tokens per second than a CPU/GPU/RAM all in the same system.
      So where do you gain performance, and where are the diminishing returns? Can you use 2-3 low-power mini PCs, like the new Intel and AMD mobile chips about to hit, and actually do better at scale than 1 bigger PC that can just barely handle a big request on its own? Because each of the small PCs can also still do smaller things on their own, running tasks in parallel, but pair up for big tasks, whereas a single PC will only be able to do one task at a time regardless?
      Lots of questions that can be tested here, and cheap-but-many vs expensive single devices.
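
One concrete way to do what this thread suggests (a sketch, not exo-specific; the NAS mount path and model repo are placeholders): Hugging Face tooling honors the HF_HOME environment variable, so every node can point its hub cache at the same NAS mount and only the first node actually downloads.

```python
# Point the Hugging Face hub cache at a shared NAS mount before any
# download happens. Mount path and model repo id are placeholders.
import os

os.environ["HF_HOME"] = "/Volumes/nas/hf-cache"             # hypothetical NAS mount

from huggingface_hub import snapshot_download

# The first machine downloads; the others find the files already on the NAS.
path = snapshot_download("some-org/some-70b-model-4bit")    # placeholder repo id
print("model files at:", path)
```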

  • @Rushil69420
    @Rushil69420 3 months ago +11

    Would Thunderbolt networking speed up the cluster at all? Are they just communicating over WiFi?

    • @AZisk
      @AZisk  3 months ago +4

      I only tried WiFi. He might be using TB.

    • @alexcheema6270
      @alexcheema6270 3 months ago +4

      @@AZisk You can use TB too!

    • @Zaf9670
      @Zaf9670 3 months ago +3

      WiFi definitely would be a latency and throughput bottleneck. Thunderbolt may take some extra CPU cycles, but the throughput increases certainly won't hurt. Not sure how well TB does on latency, but I'm sure it is better than WiFi unless there is heavy protocol-inherited latency.

    • @acasualviewer5861
      @acasualviewer5861 3 months ago +1

      @@Zaf9670 Really? If the model is only running certain layers, then the only communication you're getting is that of the context size. So if it's 1024, then it's 1024 x 768 numbers (rough figures after this thread).
      I think a much bigger factor is the immense number of matrix multiplications. That's what is slowest.
      Unfortunately, distributing the model this way, you're only as fast as your slowest node.

    • @zhanglance557
      @zhanglance557 1 month ago

      @@acasualviewer5861 As fast as your slowest node; that's bad.
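
Rough numbers for the point above (a sketch using the commenter's example figures; fp16 activations are an assumption): what crosses the network per hop is tiny compared with the weight reads that stay local on each node.

```python
# Size of the activations that cross the network per pipeline hop,
# using the example numbers from the comment above.
seq_len, hidden_dim = 1024, 768          # commenter's example figures
bytes_per_value = 2                      # fp16 activations (assumption)
hop_mb = seq_len * hidden_dim * bytes_per_value / 1e6
print(f"~{hop_mb:.1f} MB per hop")       # ~1.6 MB: small even over Wi-Fi
# The heavy part never leaves the node: every layer still multiplies those
# activations against billions of locally stored weights.
```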

  • @cyberdeth8427
    @cyberdeth8427 2 months ago +3

    This is a good start, but the problem is still that it's trying to load the entire model on each machine. A better solution would be to share the model across machines and access it BitTorrent-style. Not sure how that would work though.

    • @AZisk
      @AZisk  2 months ago +2

      might have to try

  • @ItsBullyMaguire
    @ItsBullyMaguire 3 months ago +32

    Try 10 mini PCs, all with 96GB.

    • @digital321
      @digital321 3 months ago +4

      Anyone with a Raspberry Pi cluster could have some fun, although mini PCs with the extra RAM would be more cost effective.

    • @aatef.tasneem
      @aatef.tasneem 3 months ago +3

      I tried it with an Nvidia Jetson Nano cluster; the results are amazing.
      I tried other similar options, i.e. the Raspberry Pi AI Kit and Google Coral, and in comparison to the Nvidia Jetson Nano they don't even stand a chance.

    • @christianweyer74
      @christianweyer74 3 months ago

      @@aatef.tasneem Very interesting. Do you happen to have the exact specs of your setup to share with us?

  • @litengut
    @litengut 3 months ago +31

    It's for people who have two MacBook Pros with 256GB of RAM on a plane.

  • @Manuel-o7g
    @Manuel-o7g 3 months ago +18

    Cool! Everyone who makes AI models more convenient and accessible is a hero in my book (that includes you, Alex). Currently I'm running the smaller Mistral model on my M2 MacBook Air base model. I am considering buying a Mac mini or Mac Studio when the new ones come out, and this might be what I need to run the larger models. Mistral is great, but I want to use it in combination with fabric, and for that it just does not cut it. Keep it up Alex, you make me look smarter at work with every video ;)

  • @dougall1687
    @dougall1687 2 months ago +1

    I realize this may be a major leap in complexity, but would you consider a couple of videos on customizing LLM models to introduce local content?

  • @thesecristan5905
    @thesecristan5905 2 months ago +1

    Hi Alex,
    Very nice video, but I had to smile a bit because of the test setup.
    I have a cluster running at a customer, though for a different application, and this technology can really bring a lot in the areas of performance and failover. I get enthusiastic when cluster computing becomes more generally available and usable.
    It is very important to build a high-performance, dedicated network for cluster communication. With Macs, this is quite easily possible via a Thunderbolt bridge. I recommend assigning the network addresses manually and separating the subnet from the normal network.
    With 40 Gbit/s you have something at hand that otherwise causes a lot of work and cost (apart from the expensive cables).
    Of course, it is better if all cluster nodes work with comparable hardware, which simplifies the load distribution, but in general different machines are possible.
    In your case, unfortunately, a base Air, which on its own can hardly handle the application, is more of a brake pad than an accelerator, as you impressively showed.
    A test with two powerful Macs would be interesting.

  • @thomasmitchell2514
    @thomasmitchell2514 2 months ago +1

    Lol, when he was looking for the safetensors I was thinking "please be in HF cache, please be in HF cache", and of course, this Alex fellow is wonderful. Means this will be simple to drop into current workflows. 405B should fit well across 4 Mac Studios with 192GB 👌 The next question will be whether it can distribute fine-tuning.
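
Back-of-the-envelope on that 405B claim (a sketch; 4-bit quantization at a flat 0.5 bytes/weight is a simplifying assumption, and KV cache and runtime overhead are ignored):

```python
# Does a 4-bit 405B model plausibly fit across four 192 GB Mac Studios?
params = 405e9
weights_gb = params * 0.5 / 1e9          # 4-bit ~= 0.5 bytes per weight
cluster_gb = 4 * 192
print(f"weights: ~{weights_gb:.0f} GB of {cluster_gb} GB total")
print(f"headroom: ~{cluster_gb - weights_gb:.0f} GB for KV cache and overhead")
```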

  • @houssemouerghie6036
    @houssemouerghie6036 3 months ago +1

    This is so cool, but I think if it were possible to get multiple VPSs, connect them, and then run the model on them, it would be even cooler.

  • @DaveEtchells
    @DaveEtchells 3 months ago

    Naively, I’m surprised this would work without an even bigger hit on net performance than you found: I’d think that partitioning the model across machines would be tricky: You somehow split the weights, then calculate the two halves (or multiple shards) of the matrix math separately?

  • @Peter-rm7io
    @Peter-rm7io 1 month ago

    I think you need a Thunderbolt bridge between the different machines to ensure low latency and speed.

  • @The_Collective_I
    @The_Collective_I 2 months ago

    12:40 - I can tell you easily: we have three MacBooks at home, each with 128GB, three iPad Pros with M4, each with 16GB, and two beefy Windows machines with 4090s.
    That's altogether 500GB of VRAM that's going to power my AGI for free.

  • @LewisCowles
    @LewisCowles 2 months ago

    If you could simplify and extract some of the connections, you might be able to make a grid. But you'd wind up powering a lot of hardware.

  • @seneschal6526
    @seneschal6526 2 months ago

    So I think, like with RAM, it may be running at the slowest common denominator. Just because you put two sticks of RAM together, one rated at 5600 and one at 4800, doesn't mean you get both speeds; even though they're the same 16 gig, they will opt to work together at the slowest speed available to both. Kind of like your motherboard communicating with the CPU and RAM: everything slows itself down to the lowest common denominator to operate simultaneously.

  • @brennan123
    @brennan123 1 month ago

    Thank you. I was struggling to find where the models were located as well. Really annoying that it is not documented and they make it so hard to find. Yeah, don't mind me, just dumping 100+ GB, don't worry about it, you don't need to know where it's at... lol
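
If you're hunting for that disk usage: the Hugging Face hub cache defaults to ~/.cache/huggingface, and a few lines of Python will list what's eating the space (a sketch for the standard cache only; exo or other tools may keep additional copies elsewhere).

```python
# List the largest model snapshots under the default Hugging Face cache.
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "hub"
sizes = []
for model_dir in cache.glob("models--*"):
    total = sum(f.stat().st_size for f in model_dir.rglob("*") if f.is_file())
    sizes.append((total, model_dir.name))

for total, name in sorted(sizes, reverse=True):
    print(f"{total / 1e9:7.1f} GB  {name}")
```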

  • @psychurch
    @psychurch 1 month ago +1

    I wonder if you could share the models and network via Thunderbolt.

  • @allansh828
    @allansh828 1 month ago +1

    Imagine what a cluster of new Mac Studio M4 Ultra 512GB machines could do. They would beat Blackwell compute cards.

  • @samre3006
    @samre3006 24 days ago

    It's high time that models are stored on network storage so they are shared by all the machines.

    • @AZisk
      @AZisk  24 days ago

      yep: example: Running LLM Clusters on ALL THIS 🚀
      ruclips.net/video/uuRkRmM9XMc/видео.html

  • @StraussBR
    @StraussBR 1 month ago

    Just the thought of comparing the bandwidth of your RAM with the network overhead of fitting a model across 2 machines is depressing.

  • @SnowDrift-bh7wb
    @SnowDrift-bh7wb 3 months ago

    Kinda reminds me of swarm intelligence. A bunch of devs sitting together, all sharing some of the compute power of their PCs, forming a clustered AI that serves all and, as a whole, has more performance and is smarter than simply the sum of each individual PC.

  • @gunnarfernqvist4896
    @gunnarfernqvist4896 2 months ago

    Interesting project. It would be interesting to see a video where you explain these models fairly simply. Now you mention numbers like memory and tokens/s and X number of parameters. Can you please explain for those of us not so into LLMs?

  • @billlodhia5640
    @billlodhia5640 1 month ago

    Reverse proxy caching, proxy caching, and rsync will easily solve the downloading issues; download once and distribute locally at high speed

  • @5pm_Hazyblue
    @5pm_Hazyblue 9 days ago +1

    I know a use case. College kids gather their MacBooks together and forge essays.

  • @the_other_ones1904
    @the_other_ones1904 2 months ago

    This would be a great idea to try with all my Raspberry Pis which are collecting dust on my shelf. I wonder how old Pis could be used.

  • @matteolulli2654
    @matteolulli2654 2 months ago

    Very nice video. However, I think that connecting the computers in a wired local network using Thunderbolt cables should provide some improvement.

  • @gool54
    @gool54 3 months ago +2

    Maybe try Meta's Llama 3.2 light model.

  • @_hmh
    @_hmh 1 month ago

    I think this would scale out better with equally sized computers and a fast network connection (10GbE).

  • @piratestreasure2009
    @piratestreasure2009 2 months ago

    You can find out where the model files are saved using dtrace.

  • @StevenAkinyemi
    @StevenAkinyemi 3 months ago

    This setup helps if you want your own little farm without selling your soul.

  • @calmsimon
    @calmsimon 1 month ago

    thanks for this video... was about to go spin some shi up myself lol

  • @dessatel
    @dessatel 3 months ago +1

    Supposedly you can add Nvidia or Linux etc., with something like tinygrad as a backend for exo.

  • @ClementOngera
    @ClementOngera 3 months ago

    The best use case would be a small-to-medium corporation, a retail chain, or a learning institution that is looking to have its data trained on. Heck, if I had our farm's data, I would gladly run that model.

    • @nmstoker
      @nmstoker 2 months ago

      Except that exo as currently set up is for inference only, not for training. For training you'd need a big server (on-prem or cloud).

  • @Rinat-p7f
    @Rinat-p7f 2 months ago +1

    Is it possible to run it with a Mac plus a Windows or Linux machine in one cluster?

  • @RocktCityTim
    @RocktCityTim 3 months ago +6

    It's a great solution for a small business getting into the ML/AI realm while keeping their research in-house. Scrub the Macs and go for some lower-cost gaming PCs. Install a base Linux and kick off 3-4 nodes. Under $5K and an amazing solution.

  • @garynagle3093
    @garynagle3093 3 months ago

    Pretty cool to see this

  • @allanmaclean
    @allanmaclean 3 months ago +3

    I design air-gapped AI inference systems; I do my initial tests on 30x Raspberry Pis to focus on efficiency. Obviously dedicated GPU memory is not possible. Maybe this, teamed with the about-to-be-announced M4 Mac mini, will be the next evolution. It also de-risks accidentally running up a bill of thousands of pounds on a cloud-based test lab.

  • @ScottLahteine
    @ScottLahteine 2 months ago

    It’s also right to wonder “who is distcc for?” and more importantly, can we get a generalized cluster architecture for modern computers so that every large application can take advantage of spare hardware? This could lead to some very large clusters organized by any group that needs it. Of course, it would undercut AWS and no one wants that! Meanwhile, watch those huggingface cache folders. They do get very large and should be cleaned and purged frequently.

  • @RandyAugustus
    @RandyAugustus 2 months ago

    Use case is simple.
    1. 2+ Docker containers with models installed, e.g. Mistral
    2. Put containers on the same Docker network.
    3. ????
    4. Profit.

  • @quadcom
    @quadcom 2 months ago

    Were all the laptops connected via WiFi or hardwired?
    Are firewalls blocking comms between the systems?

  • @yoSunshineyo
    @yoSunshineyo 2 months ago

    Your construct is a classic example of a bottleneck! The request enters a pool of resources, where its parts are divided across three instances, each waiting for the others to complete. Imagine three people are meeting up: one takes a rocket, another a speedboat, and the last one rides a bicycle. Sure, two of them will arrive quickly, but for the meeting to happen, all three need to be there. So, everyone ends up waiting for the one on the bicycle.

  • @burgerbee5169
    @burgerbee5169 2 months ago

    Would be very interesting if you could try a 70B model on a new laptop/mini PC with 128GB RAM and the new Intel Core Ultra 7 (2nd gen) 256V/266V processor running Linux and llama.cpp (compiled with AVX512 and SYCL). I don't know if there are any 128GB laptops out in the wild with the Core Ultra 7 (2nd gen) yet.

  • @MihiroParmar
    @MihiroParmar 1 month ago

    Maybe it's limited by the networking switch and ports; maybe go with 10 gig.

  • @monkeyfish227
    @monkeyfish227 3 months ago

    OMG you got patience. Cool? Yes cool.

  • @aatef.tasneem
    @aatef.tasneem 3 months ago +1

    I am an old follower, since your sub numbers were in 4 digits.
    A comparison of the Nvidia Jetson Nano and the likes of it would open up a lot more possibilities.

  • @TheBadFred
    @TheBadFred 3 months ago

    What about 10 maxed out Raspberry Pis with an NPU card and SSD in a cluster?

  • @fsfaysalcse
    @fsfaysalcse 3 months ago

    Alex, I like your T-shirt. Where did you get it?

  • @abrahamsimonramirez2933
    @abrahamsimonramirez2933 3 months ago +1

    Interesting, I guess faster networking cables/ports and faster hard drives could help.

  • @KCM25NJL
    @KCM25NJL 3 months ago +1

    Hmmmm, wonder if inference would be stable enough with 10 of my "gamer pals" on a VPN, running Exo across zee Interweb?

    • @alexcheema6270
      @alexcheema6270 3 months ago +1

      I'm adding support for invite links where you can invite friends to join your exo network

  • @ranjitmandal1612
    @ranjitmandal1612 3 months ago

    Very cool 👏

  • @geforce5591
    @geforce5591 3 months ago +5

    AMD Strix Halo APUs with 256GB RAM to the rescue in 2025. Won't have to pay the Apple tax, and you can upgrade SSDs for a fraction of the price without having to resort to de-soldering NAND chips like on MacBook Pros or spending $4000 for an 8TB SSD.

  • @rafaeldomenikos5978
    @rafaeldomenikos5978 3 months ago +1

    I actually have an M3 Max 64GB and an M2 Air 8GB. I am so intrigued by this! If it works I can set it up with the studio in the office with an M2 Ultra and 192GB! Now that'll be a lot of RAM. Maybe 405B quantized? 😂

  • @HaydonRyan
    @HaydonRyan 3 months ago

    Does this work on Linux with CPU only? People with beefy home labs might REALLY enjoy it. :)

  • @EcomGraduates
    @EcomGraduates 3 months ago

    I was literally just researching to see if anyone had done this yet!

  • @MrKim-pt2vm
    @MrKim-pt2vm 3 months ago +1

    Try using Llama 3.2 90B on a Mac Studio M2 Ultra.

  • @kakaaika3302
    @kakaaika3302 2 months ago

    This project could be run over the Thunderbolt bridge; I think that should be more reliable.

  • @Nathan15038
    @Nathan15038 19 days ago

    Oh man, I only have one of those MacBook Pros with the highest M3 Max processor and RAM configuration 😅

  • @ottoneff
    @ottoneff 3 months ago

    Have you seen the Qualcomm Snapdragon Dev Kit for Windows Teardown (2024) from Jeff Geerling? Hopefully LLMs are going to work with NPUs in LM Studio soon.

    • @AZisk
      @AZisk  3 months ago

      still waiting for mine

  • @JanBadertscher
    @JanBadertscher 3 months ago +1

    Petals did P2P LLM inference 3 years ago and it led nowhere. Memory bandwidth constraints make this inefficient. You trade off too much speed for this.
    If you take the effort to put together 4 machines, you get way more VRAM per dollar and more speed buying 2x A16 with 64GB VRAM each for 3.3k, so you get 128GB VRAM for 6.6k. Or you could do it with some RTX cards.
    Also, I was laughing hearing "production ready". Such projects are barely ever production ready; even Ollama isn't :) We got tons of problems when trying to use inference solutions like that with our clients.

  • @geofftsjy
    @geofftsjy 3 months ago

    They should upgrade the cluster join automation to peer-to-peer transfer the model to the new nodes if they don't have it. No reason to go to the WAN over and over.

  • @anshulpathak01
    @anshulpathak01 12 days ago

    You should've connected the Macs with Thunderbolt 4 cables instead of the wireless network...

  • @tudoriustin22
    @tudoriustin22 2 months ago

    Love this experiment. Been following your channel for the past 2 years since I got into ML, and I own an M2 Max Mac Studio with 32GB unified memory that I've used so far for ML. Happy with it, but also waiting for the M4 Max MacBook Pro so I can finally get a portable powerhouse. Saved up for a whole year for the upgrade and I'm planning on getting the 8TB SSD and 128GB unified memory version maxed out for the 16" model, or maybe more unified memory if they add it on the M4 Max. Benchmarks so far for the leaked models from the Russian YouTubers seem like a good estimate for performance, but I can't wait to see the new ones coming out soon.

  • @hermanthotan
    @hermanthotan 3 months ago

    This will be powerful when running a cluster of Intel mini PCs with 96GB RAM.

  • @tutran-b4i
    @tutran-b4i 2 months ago

    Hi man, can you compare AMD and Nvidia cards when running Ollama, something like the AMD 7800 XT vs the 4060 Ti? Thanks.

  • @vipuljain5683
    @vipuljain5683 2 months ago

    Anything similar for Windows that doesn't rely on GPU VRAM??

  • @MrSparc
    @MrSparc 3 months ago +1

    So many AI hype people talk about or show models running with llama.cpp etc., but with just a few prompt questions or simple toy code. Nobody shows a real implementation of these LLMs integrated into a real project. Alex, I would like to see videos of examples from your projects where you integrated AI models and the added value that brings to your software.

  • @DJWESG1
    @DJWESG1 3 months ago

    Use case: individualised, personalised, aligned assistants.

  • @fatherfoxstrongpaw8968
    @fatherfoxstrongpaw8968 3 months ago +1

    I did a project like this back in 2004 using a Beowulf cluster with 9 Apple IIs, an AMD PC, an Intel/Nvidia PC and an Acer/Intel laptop. The 2 biggest bottlenecks were the Macs and the 10Mbit networking, but it was a good proof of concept. In my experience, any time you cluster, you're bottlenecked by your slowest component. Yeah, you can do it, but it's better for things like VMs and lots of small individual programs. Not to mention whatever software you're running has to be written or modified to take advantage of the distributed hardware. Just because you could, doesn't mean you should. The R.O.I. just isn't there.

  • @TazzSmk
    @TazzSmk 3 months ago

    A pair of 3090s (or three 16GB 4060 Tis) can run 70B models; a reasonable compromise imo.

  • @zhouyangbo4498
    @zhouyangbo4498 2 months ago

    A group of M1, M2, and M3 owners could build a cluster and study LLM training together.

  • @JonCaraveo
    @JonCaraveo 3 months ago

    😅😋 this sounds fun 😊

  • @Ukuraina-cs6su
    @Ukuraina-cs6su 3 months ago

    I believe the one who has fast and reliable cable networking )))
    I don't know if they did, but it would be logical not to download models from the internet every time; once one machine has downloaded the model, it can serve it to the others. Even better, you don't need multiple copies of the model on the same network; it can be a single fast network drive.

  • @giridharpavan1592
    @giridharpavan1592 1 month ago

    This was instructional.

  • @Kitsune_Dev
    @Kitsune_Dev 3 months ago

    Can you review mini PCs? I want to know if I can run LLMs on my SER 5 Max 😂

  • @nickvangeel
    @nickvangeel 3 months ago

    What internet do you have to download nearly 20 MB a second?? (fiber?)

    • @AZisk
      @AZisk  3 months ago +1

      yes

    • @nickvangeel
      @nickvangeel 3 months ago

      @@AZisk 1, 5 or 10 Gbit down?

    • @fevad1246
      @fevad1246 3 months ago

      @Garrus-w2h Bro please stop flexing internet speed, I can't even get more than 2 megabits (yes, not even megabytes) per second 😭😭

  • @spacedavid
    @spacedavid 2 months ago

    Something tells me that 128GB of RAM will be the minimum in my next build.

  • @one_step_sideways
    @one_step_sideways 1 month ago

    If only there were any Strix Point laptops with 4 RAM slots... That would be 192GB of memory with 4x48GB sticks. Then running things on a budget would be achievable.

  • @user-cw7jy9zr3z
    @user-cw7jy9zr3z 3 months ago +1

    Use case is a dev team.

  • @swastikgorai2332
    @swastikgorai2332 2 months ago

    Hey, wanna give UV a try?

  • @jetman-x4e
    @jetman-x4e 3 months ago

    So cool

  • @nyambe
    @nyambe 3 months ago

    Is that the FX3 or FX30?

    • @AZisk
      @AZisk  3 months ago +1

      FX30

    • @nyambe
      @nyambe 3 months ago

      @@AZisk I have one also, fantastic little camera.

    • @AZisk
      @AZisk  3 months ago +1

      @@nyambe Yeah I like the instant access to ISO and Aperture controls. But the battery drains so fast!

  • @matej_hajek
    @matej_hajek 1 month ago

    Nice vid. Would recommend not mounting your camera to the table; when you touch the table the camera moves.

    • @AZisk
      @AZisk  1 month ago

      yeah, space is limited otherwise i would love to have a nice tripod

  • @СергейПластунов
    @СергейПластунов 3 months ago

    Maybe it would be better to use Ethernet (wires) to deploy the cluster? WiFi has huge latency compared to Ethernet. This could be a bottleneck in your test.

    • @vasilioshatciliamis2067
      @vasilioshatciliamis2067 1 month ago

      That is probably the problem with the very low speed. I am surprised he didn't think of it himself.

  • @weeee733
    @weeee733 3 months ago +3

    Wow

  • @keepasskeep5322
    @keepasskeep5322 1 month ago

    The idea is that:
    - make it P2P and block-based.
    - to use it, you allocate resources during the idle time of Android, iOS, Mac, PC, and server devices.
    - the exo AI cluster is a starting line, I think.
    To rescue AI from the big corps, people in the anarchist community must train and run AI inference on people's consumer-grade devices.
    This is the only way to salvation.

    • @keepasskeep5322
      @keepasskeep5322 1 month ago

      And I'm badly wrong. AI needs more computation than storage, and current BitTorrent and blockchain technology is about space, not speed.

  • @aerotheory
    @aerotheory 3 months ago

    The bottleneck is the bandwidth.

  • @kamurashev
    @kamurashev 1 month ago

    Let's think about clustering the machines. What are the solutions out there, Kubernetes? Any ideas?

  • @AureliusRosetti
    @AureliusRosetti 3 months ago

    Hi there. First of all, great video, as always; many thanks for the effort, appreciate it, really 😊. Now to EXO: I can imagine a huge on-prem data centre of… for instance… a certain automaker's R&D department running this across, let's say, 5 servers with 256GB RAM + 2 high-end GPUs each, interconnected with a high-throughput LAN and connected to another internal vector DB cluster to enrich generated answers. This way you can easily utilize all the advantages that modern LLM models provide without sharing even a tiny bit of data with vendors like OpenAI. Another case would be classified environments, where you aren't connected to the internet at all. And don't forget, it is not only about chatting. You can, right out of the box, also integrate them into LangChain-powered applications (see the sketch below). Such cluster projects should, in my opinion, also be very good at distributing multiple requests across themselves. I'm keeping my fingers crossed this project makes it through to being stable.
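
On the integration point: anything that exposes an OpenAI-compatible endpoint can be dropped into existing tooling (LangChain included) just by overriding the base URL. A minimal sketch; the address, port and model name are placeholders for whatever an on-prem cluster actually serves.

```python
# Point the standard OpenAI client at an on-prem, OpenAI-compatible endpoint.
# Base URL, port and model name are placeholders for your own cluster.
from openai import OpenAI

client = OpenAI(base_url="http://exo-cluster.internal:52415/v1",
                api_key="not-needed")     # local endpoints usually ignore the key

resp = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Summarize this quarter's R&D report."}],
)
print(resp.choices[0].message.content)
```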

  • @dev15652
    @dev15652 2 months ago

    What if we could do this with public IP addresses? We could set up a community to share the load... like seeding a torrent... free unlimited AI for anyone...