1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

  • Published: Feb 5, 2025
  • To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/... . You’ll also get 20% off an annual premium subscription.
    Mixture of Experts explained, well, re-explained. We are in the fine-grained era of Mixture of Experts, and it's about to get even more interesting as we scale it up further.
    This video was sponsored by Brilliant
    Check out my newsletter:
    mail.bycloud.ai
    Special thanks to LDJ for helping me with this video
    Mixtral 8x7B Paper
    [Paper] arxiv.org/abs/...
    Sparse MoE (2017)
    [Paper] arxiv.org/abs/...
    Adaptive Mixtures of Local Experts (1991)
    [Paper] direct.mit.edu...
    Gshard
    [Paper] arxiv.org/pdf/...
    Branch-Train Mix
    [Paper] arxiv.org/pdf/...
    DeepSeek-MoE
    [Paper] arxiv.org/abs/...
    MoWE (from the meme at 7:51)
    [Paper] arxiv.org/abs/...
    Mixture of A Million Experts
    [Paper] web3.arxiv.org...
    This video is supported by the kind Patrons & YouTube Members:
    🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
    [Discord] / discord
    [Twitter] / bycloudai
    [Patreon] / bycloud
    [Music] massobeats - daydream
    [Profile & Banner Art] / pygm7
    [Video Editor] @Askejm

Comments • 167

  • @bycloudAI
    @bycloudAI  6 months ago +10

    To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/bycloud/ . You’ll also get 20% off an annual premium subscription!
    Like this comment if you want to see more MoE-related content, I have quite a good list for a video ;)

    • @erobusblack4856
      @erobusblack4856 6 months ago +1

      You should do a video on virtual humans and cognitive AI. Look at all the non-player character technology we have in Red Dead Redemption and The Sims. Throw a chatbot into one of those and we'd have a great virtual human.

    • @PankajDoharey
      @PankajDoharey 6 months ago

      Thanks for linking to all papers in the description.

  • @pro100gameryt8
    @pro100gameryt8 6 months ago +379

    Imagine assembling 1 million PhD students together to discuss someone's request like "write a poem about cooking eggs with C++". That's MoE IRL
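
    As a rough illustration of the idea being joked about here: in a sparse MoE layer, a small router scores a pool of expert feed-forward networks for each token and only the top-k of them actually run. The sketch below is a minimal, hypothetical PyTorch version; all sizes, class names, and the routing loop are made up for illustration, not the video's or any paper's exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoE(nn.Module):
        def __init__(self, d_model=64, d_hidden=128, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts)      # gating network
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x):                                # x: [tokens, d_model]
            scores = self.router(x)                          # [tokens, n_experts]
            top_scores, top_ids = scores.topk(self.k, dim=-1)
            weights = F.softmax(top_scores, dim=-1)          # mix only the chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e_id, expert in enumerate(self.experts):
                    mask = top_ids[:, slot] == e_id          # tokens routed to expert e_id
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    x = torch.randn(16, 64)                                  # 16 tokens
    print(TinyMoE()(x).shape)                                # torch.Size([16, 64]); only 2 of 8 experts run per token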

    • @MrPicklesAndTea
      @MrPicklesAndTea 6 months ago +21

      I'm telling ChatGPT this now.

    • @tommasocanova4547
      @tommasocanova4547 6 months ago +1

      Enjoy:
      In a kitchen lit by screens,
      Where code and cuisine intertwine,
      A programmer dreams of breakfast scenes,
      With a syntax so divine.
      Int main() begins the day,
      With ingredients lined up neat.
      Eggs and spices on display,
      Ready for a code-gourmet feat.
      int eggs = 2; // Declare the count,
      Double click on the pan.
      Heat it up, and don’t discount,
      Precision’s the plan.
      std::cout

    • @ElaraArale
      @ElaraArale 6 months ago +3

      hahahahaha LMAO

    • @igorsawicki4905
      @igorsawicki4905 6 months ago +7

      AI: Reasonable request, sir

    • @zipengli4243
      @zipengli4243 6 months ago +25

      And MoME is getting 1 million 5th graders to teach a baby to PhD level, only on how to write a poem about cooking eggs with C++

  • @GeoMeridium
    @GeoMeridium 6 months ago +35

    It's crazy how Meta's 8B-parameter Llama 3 model has nearly the same performance as the original GPT-4 with 1.8T parameters.
    That's a 225x reduction in parameter count in just 2 years.

    • @photoniccannon2117
      @photoniccannon2117 19 days ago +1

      Llama is fantastic. Not as good as GPT for code, but absolutely excellent in everything else.

  • @gemstone7818
    @gemstone7818 6 months ago +154

    to some extent this seems closer to how brains work

    • @tushargupta9428
      @tushargupta9428 6 months ago +10

      neurons

    • @redthunder6183
      @redthunder6183 6 months ago +22

      Yeah, kind of like how spiking networks work, but more discrete/blocky and less efficient.
      I think this concept should be applied to the fundamental MLP, so you can increase model performance without sacrificing speed or RAM usage. The only sacrifice is storage, which is easily scalable. IMO this is the future
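
      A back-of-the-envelope sketch of that capacity-vs-compute trade-off (all numbers here are hypothetical, chosen only to show the arithmetic): splitting the FFN into many small experts multiplies the parameters you have to store, while the per-token compute stays tied to the few experts that actually fire.

      d_model, d_hidden = 4096, 14336              # assumed dense-FFN sizes
      dense_ffn_params = 2 * d_model * d_hidden    # ~117M

      n_experts, k_active = 64, 8                  # fine-grained: many small experts, few active
      d_expert = d_hidden // 8                     # each expert is 1/8 of the dense hidden size
      expert_params = 2 * d_model * d_expert

      total_params  = n_experts * expert_params    # what you must store: ~939M (8x the capacity)
      active_params = k_active * expert_params     # what each token computes with: ~117M (same as dense)
      print(total_params, active_params)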

    • @reinerzer5870
      @reinerzer5870 6 months ago +1

      Jeff Hawkins approves this message

    • @johndoe-j7z
      @johndoe-j7z 6 months ago +10

      I think this is how almost any informational system works. From molecules to galaxies, there are specialized units that use and process information individually in the system. An agentic expert approach was long forthcoming and is certainly the future of AI. Even individual ants have specialized jobs in the colony.

    • @tempname8263
      @tempname8263 6 months ago +2

      @@johndoe-j7z That's how perceptrons worked right from the start

  • @sorakagodess
    @sorakagodess 6 months ago +48

    The only thing in my mind is "MoE moe kyuuuuun!!!"

    • @JuuzouRCS
      @JuuzouRCS 6 months ago +1

      Intentional naming fr.

  • @Quantum_Nebula
    @Quantum_Nebula 6 months ago +12

    Now I really am excited for an 800B model with fine-grained MoE to surface that I can run on basically any device.

    • @FuZZbaLLbee
      @FuZZbaLLbee 6 months ago +6

      You would still need a lot of storage though, but that is easier than downloading VRAM 😋

  • @AkysChannel
    @AkysChannel 6 months ago +7

    This video format is GOLD 🏆 such specific and nerdy topics produced as memes 😄

  • @randomlettersqzkebkw
    @randomlettersqzkebkw 6 months ago +54

    I see what you did there with "catastrophic forgetting" lmao 🤣

    • @Askejm
      @Askejm 6 months ago

      troll emoji

  • @cdkw2
    @cdkw2 6 months ago +2

    I watch you so that I feel smart, it really works!

  • @shApYT
    @shApYT 6 months ago +36

    Yo dog, I heard you liked AI so we put an AI inside your AI which has an AI in the AI which can AI another AI so that you can AI while you AI.

  • @Saphir__
    @Saphir__ 6 months ago +33

    I watch your videos yet I have no idea what you are explaining 99% of the time. 🙃

    • @bycloudAI
      @bycloudAI  6 months ago +14

      I will try better next time 😭

    • @BeauAD
      @BeauAD 6 months ago +17

      @@bycloudAI Personally I watch your content because you elaborate on academic papers and their relevancy very well. Do hope you continue with content like this. But I can see something like a Fireship-style code report for LLMs being digestible.

    • @johndoe-j7z
      @johndoe-j7z 6 months ago +1

      @@bycloudAI I liked the video, but to their point, it might help to give a brief overview of what things are. I.e. parameters, feed forward, etc., the exact same way you briefly explained what hybridity and redundancy are. This is a good video if you're already familiar with LLMs and how they work but can probably be pretty confusing if you aren't.

  • @mykhailyna1
    @mykhailyna1 4 months ago +1

    Idk if this was intended just as entertainment, but I used it as education
    Like I needed to understand MoE/MMoE on a high level for my research and this video totally helped me. It will be easier to dive deeper into one of the papers now

  • @KCM25NJL
    @KCM25NJL 6 months ago +19

    In a very real sense, the MoME concept is similar to diffusion networks. On their own, the tiny expert units are but grains of noise in an ocean of noise... and the routing itself is the thing being trained. Whether or not it's more efficient than having a monolithic neural net with simpler computation units (neurons)... I dunno. I suspect that, like most things in ML, there is probably a point of diminishing returns.
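
    For reference, the "million tiny experts" retrieval idea (PEER, from the Mixture of A Million Experts paper linked in the description) can be sketched roughly like this: each expert is just one down-vector and one up-vector, and the router finds the top experts by combining two small sub-key tables instead of scoring every expert directly. This is a simplified, hypothetical version with made-up sizes, not the paper's exact algorithm.

    import torch
    import torch.nn.functional as F

    d_model, n_sub, k = 64, 256, 16            # 256 x 256 = 65,536 tiny experts (scale n_sub up toward ~1M)
    keys_a = torch.randn(n_sub, d_model // 2)  # sub-keys scored against the first half of the query
    keys_b = torch.randn(n_sub, d_model // 2)  # sub-keys scored against the second half
    down = torch.randn(n_sub * n_sub, d_model) # per-expert input vectors
    up   = torch.randn(n_sub * n_sub, d_model) # per-expert output vectors

    def peer_layer(x):                         # x: [d_model]
        qa, qb = x[: d_model // 2], x[d_model // 2:]
        va, ia = (keys_a @ qa).topk(k)         # best sub-keys for each half of the query
        vb, ib = (keys_b @ qb).topk(k)
        scores     = (va[:, None] + vb[None, :]).reshape(-1)         # k*k candidate pairs
        expert_ids = (ia[:, None] * n_sub + ib[None, :]).reshape(-1)
        best, pos = scores.topk(k)             # final top-k experts out of 65,536
        ids, gate = expert_ids[pos], F.softmax(best, dim=-1)
        acts = torch.relu(down[ids] @ x)       # each expert is a rank-1 "MLP": a scalar activation...
        return (gate * acts) @ up[ids]         # ...times its up-vector, mixed by the router weights

    print(peer_layer(torch.randn(d_model)).shape)   # torch.Size([64])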

  • @simeonnnnn
    @simeonnnnn 6 months ago +4

    Damn... You blew my mind with the 1 million experts and lifelong learning thing

  • @farechildd
    @farechildd 6 months ago

    Thank u for linking the papers in the description ❤

  • @Limofeus
    @Limofeus 6 months ago +1

    I'd imagine in a month someone will come up with an MoE responsible for choosing the best MoE to choose the best MoE out of billions of experts

  • @-mwolf
    @-mwolf 6 months ago +2

    I'm telling you: Just do it like the brain. Have every expert/node be a router, choosing who to send to.

    • @-mwolf
      @-mwolf 6 months ago

      And, have every node be a RL agent.

  • @soraygoularssm8669
    @soraygoularssm8669 6 months ago

    Actually a really cool idea, I liked the DeepSeek MoE version too, it's so clever

  • @lazyalpaca7
    @lazyalpaca7 6 months ago +8

    3:37 wasn't it just yesterday that they released their model 😭

  • @akkilla5166
    @akkilla5166 5 months ago

    Thank you. I think I understand the impact of MoE.

  • @j.d.4697
    @j.d.4697 6 months ago

    I have no idea what you just said but I'm glad they didn't just stubbornly stick to increasing training data and nothing else, like everyone seemed to assume they would. 🙂

  • @tiagotiagot
    @tiagotiagot 6 months ago +1

    How far are we from just having a virtual block of solid computronium, with the inference result simply being the exit points of virtual Lichtenberg figures forming through it, while most of the volume of the block remains dark?

    • @electroflame6188
      @electroflame6188 5 months ago +2

      it's about the distance between you and the inside of your skull

  • @npc4416
    @npc4416 6 months ago +1

    My go-to channel to understand AI

  • @Words-.
    @Words-. 6 months ago +1

    4:13 nice editing here🤣

  • @sabarinathyt
    @sabarinathyt 6 months ago

    00:01 Exploring the concept of fine-grained MoE in AI expertise.
    01:35 Mixtral has unique Feed-Forward Network blocks in its architecture.
    03:11 Sparse MoE method has historical roots and popularity due to successful models.
    04:46 Introducing Fine-Grained MoE method for AI model training
    06:16 Increasing experts can enhance accuracy and knowledge acquisition
    07:52 Efficient expert retrieval mechanism using the PEER layer technique
    09:29 Large number of experts enables lifelong learning and addresses catastrophic forgetting
    11:01 Brilliant offers interactive lessons for personal and professional growth
    Crafted by Merlin AI.

  • @RevanthMatha
    @RevanthMatha 6 months ago +3

    I lost track at 8:24

  • @keri_gg
    @keri_gg 6 months ago +1

    What resource is this at 2:01? Seems useful for teaching

  • @NIkolla13
    @NIkolla13 6 months ago +12

    Mixture of a million experts just sounds like a sarcastic description of Reddit

  • @redthunder6183
    @redthunder6183 6 months ago +4

    I would love a model with the performance of an 8B model, with practical performance like GPT-3.5, but with far fewer active parameters so it can run on anything super lightweight.

    • @4.0.4
      @4.0.4 6 months ago +6

      Current 8B beats GPT 3.5 on most metrics, we've come a long way.

    • @redthunder6183
      @redthunder6183 6 months ago

      @@4.0.4 yeah, but metrics are not everything, and from my experience, GPT-3.5 still beats out Llama 3 8B (or at least 8B quantized) in terms of interpolation/generalization/flexibility, meaning while it can mess up in difficult, specific, or confusing tasks, it doesn't get overly lost/confused.
      Metrics are good at simple, well-defined one-shot questions, which I'd agree it is better at

    • @4.0.4
      @4.0.4 6 months ago

      @@redthunder6183 remember not to run 8B at q4 (the default in ollama, for example, but BAD; use q8)

    • @4.0.4
      @4.0.4 6 months ago

      @@redthunder6183 true but make sure you're using 8-bit quant, not 4-bit - it matters for those small LLMs
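
      Rough memory math behind the q4-vs-q8 advice above (weights only; quantization scales, KV cache and runtime overhead come on top, so treat these as lower bounds):

      params = 8e9                                  # an 8B-parameter model
      for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4)]:
          print(f"{name}: ~{params * bits / 8 / 1e9:.0f} GB of weights")
      # fp16: ~16 GB, q8: ~8 GB, q4: ~4 GB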

    • @mirek190
      @mirek190 6 months ago

      @@redthunder6183 Llama 3 8B? That model is so outdated already... who is even using that ancient model...

  • @Alice_Fumo
    @Alice_Fumo 6 months ago +1

    Today I saw a video about the paper "Exponentially Faster Language Modelling" and I feel like the approach is just better than MoE, and I wonder why more work hasn't been done on top of it... (although I think it's possible that's how GPT-4o mini was made, but who knows)

  • @larssy68
    @larssy68 6 months ago +2

    I Like Your Funny Words, Magic Man

  • @pathaleyguitar9763
    @pathaleyguitar9763 6 months ago +1

    Was hoping someone would make a video on this! Thank you! Would love to see you cover Google's new Diffusion Augmented Agents paper.

  • @Ryu-ix8qs
    @Ryu-ix8qs 6 months ago

    Great Video once again

  • @hodlgap348
    @hodlgap348 6 months ago +2

    What is the source of your 3D transformer layer demonstration? Please tell me

  • @PotatoKaboom
    @PotatoKaboom 6 months ago +2

    Hey, where are the 3D visualisations of the transformer blocks from?

  • @anren7445
    @anren7445 6 months ago

    Where did you get the clips of attention mechanism visualization from?

  • @SweetHyunho
    @SweetHyunho 6 months ago +1

    1:05 Brilliant pays YouTubers $20,000-50,000 per sponsored video!?

  • @norlesh
    @norlesh 6 months ago

    What tool was used for the Transformer visualization starting at 2:01?

  • @vinniepeterss
    @vinniepeterss 6 months ago

    great video!

  • @ChristophBackhaus
    @ChristophBackhaus 6 months ago +9

    1991... We are standing on the shoulders of giants.

  • @hightidesed
    @hightidesed 6 months ago

    Can you maybe make a video explaining how Llama 3.1 8B is able to have a 128k context window while still fitting in an average computer's RAM?

  • @marcombo01
    @marcombo01 6 months ago +2

    What is the 3D animation around 1:45?

  • @CristianGarcia
    @CristianGarcia 6 months ago

    Thanks! Incredibly useful to keep up.

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 6 months ago

    Damn, you're finally catching up. You should try NeMo and Megatron-LM; they have the best MoE framework

  • @Originalimoc
    @Originalimoc 2 months ago

    My feeling is that after all these methods we'll eventually end up back at essentially a single monolithic model 😂😂😂

  • @jorjiang1
    @jorjiang1 5 months ago

    How is the visualization at 2:01 made?

  • @warsin8641
    @warsin8641 6 months ago +1

    I love these rabbit holes!

  • @noiJadisCailleach
    @noiJadisCailleach 6 months ago +2

    So if these Millions of Experts are cute...
    Should we call them...
    Moe MoE?

  • @Zonca2
    @Zonca2 6 months ago

    Ngl, I wish we got more videos about video generators making anime waifus like in the old days, but it seems like development on that front is slowing down at the moment, hopefully you'll cover any new breakthroughs in the future.

  • @Napert
    @Napert 6 months ago

    If 13B is ~8 GB (q4), then why does ollama load the entire 47B (26 GB) model into memory?
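
    The short version of why that happens: the router picks a different set of experts at every layer and for every token, so any expert can be needed at any moment and all of them have to stay resident; only the compute is sparse. Approximate figures below are for a Mixtral-8x7B-style model (4-bit weights, overheads ignored), just to show the arithmetic:

    total_params  = 46.7e9    # all experts across all layers -- must be loaded
    active_params = 12.9e9    # ~2 of 8 experts per layer actually used per token
    bits = 4
    print(f"resident weights: ~{total_params * bits / 8 / 1e9:.0f} GB")    # ~23 GB, plus overhead
    print(f"per-token compute ~ proportional to {active_params / 1e9:.0f}B params")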

  • @Filup
    @Filup 6 months ago

    I did a semester of ML the first half of this year, and I don't understand half of what you post lmao. Do you have any recommended resources to learn from? It is very hard to learn.

  • @kendingel8941
    @kendingel8941 6 months ago

    YES!!! NEW BYCLOUD VIDEO!!!

  • @MrJul12
    @MrJul12 6 months ago +1

    Can you cover DeepMind's recent breakthrough on winning the Math Olympiad? Does that mean RL is the way forward when it comes to reasoning? Because as of right now, as far as I know, LLMs can't actually 'reason', they are just guessing the next token, but reasoning does not work like that.

  • @marshallodom1388
    @marshallodom1388 6 months ago

    You lost me when that guy pointed at the gravesite of his brother

  • @RedOneM
    @RedOneM 6 months ago

    It seems the greatest optimisations for practical AI tech are dynamic mechanisms.
    Lifelong memory plus continuous learning would become game changers in the space.
    At this rate humanity will be able to leave behind machines that can recall our biological era. At least something will be able to carry on our legacy for hundreds of thousands of years.

  • @zergosssss
    @zergosssss 6 months ago +3

    5k views after 3h is a shame, you deserve much more, go go go algorithm

  • @ricardocosta9336
    @ricardocosta9336 6 months ago

    Dude! Ty❤

  • @revimfadli4666
    @revimfadli4666 5 months ago

    Shared expert isolation seems to be doing something similar to the value output in dueling networks: collecting the gradients for shared information so other subnets only need to account for the small tweaks. This means the shared information is learned faster, which in turn speeds up the learning of the tweaks
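
    A minimal sketch of the shared-expert-isolation idea described above (in the spirit of DeepSeek-MoE; expert counts and sizes here are made up for illustration): a couple of experts are always active and receive every token's gradient, while the routed experts only learn the specialized residual tweaks on top.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SharedPlusRoutedMoE(nn.Module):
        def __init__(self, d=64, h=128, n_shared=2, n_routed=16, k=4):
            super().__init__()
            ffn = lambda: nn.Sequential(nn.Linear(d, h), nn.GELU(), nn.Linear(h, d))
            self.shared = nn.ModuleList([ffn() for _ in range(n_shared)])   # always on
            self.routed = nn.ModuleList([ffn() for _ in range(n_routed)])   # top-k per token
            self.router = nn.Linear(d, n_routed)
            self.k = k

        def forward(self, x):                        # x: [tokens, d]
            out = sum(e(x) for e in self.shared)     # shared experts see every token
            probs = F.softmax(self.router(x), dim=-1)
            w, idx = probs.topk(self.k, dim=-1)
            for slot in range(self.k):
                for e_id, expert in enumerate(self.routed):
                    mask = idx[:, slot] == e_id
                    if mask.any():
                        out[mask] += w[mask, slot, None] * expert(x[mask])  # specialized tweaks
            return out

    print(SharedPlusRoutedMoE()(torch.randn(8, 64)).shape)   # torch.Size([8, 64])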

  • @cvs2fan
    @cvs2fan 6 months ago

    0:42
    Undrinkable water my favorite :v

  • @npc4416
    @npc4416 6 months ago +1

    Meanwhile Meta having no MoE

  • @setop123
    @setop123 6 months ago

    We might be onto something here... 👀

  • @pedrogames443
    @pedrogames443 6 months ago +2

    Bro is good

  • @maxpiau4004
    @maxpiau4004 6 months ago

    wow, top quality video

  • @rishipandey6219
    @rishipandey6219 6 months ago

    I didn't understand anything but it sounded cool

  • @cwhyber5631
    @cwhyber5631 5 months ago

    Yes we need more MOM-eis 💀💀

  • @大支爺
    @大支爺 6 months ago

    The best base language models (multilingual) + LoRAs are enough.

  • @thedevo01
    @thedevo01 5 months ago

    Your thumbnails are a bit too similar to Fireship

    • @Waffle_6
      @Waffle_6 5 months ago

      also the entire composition of his videos, a little more than just taking inspiration lol

  • @ickorling7328
    @ickorling7328 6 months ago +2

    Bro, but did you read about Lory? It merges models with soft merging, building on several papers. Lory is a new coat of paint on a method developed for vision AI, making soft merging possible for LLMs. ❤

    • @ickorling7328
      @ickorling7328 6 months ago

      What's really key about Lory is backpropagation to update its own weights; it's fine-tuning itself at inference. It's also compatible with Transformers, Mamba, or Mamba-2. In addition, it looks like Test-Time Training could be used with all these methods for even more context awareness.

    • @myta1837
      @myta1837 3 months ago

      Bot

  • @realtimestatic
    @realtimestatic 6 months ago +1

    The thing about lifelong learning really reminds me of our human brains. Basically, for every different thought or key combination it sounds like it's building a separate new model with all the required experts for said task. So basically like all the relevant neurons we trained working on one thought to solve it, with the possibility of changing and adding new neurons. I can't see it going well if we keep increasing the number of experts forever though, as the expert picking will become more and more fragmented. I think being able to forget certain things would probably be useful too.
    I'm no scientist but I really do wonder how close this comes to the actual way our brain works.

  • @nanow1990
    @nanow1990 6 months ago

    PEER doesn't scale, I've tried multiple times

  • @just..someone
    @just..someone 6 months ago

    it should be MMoE ... massive mixture of experts XD

  • @picklenickil
    @picklenickil 6 months ago

    😂😂😂 As a behavioral scientist... I think this one is going straight to the crapper... mark my words. 😂😂😂

  • @MasamuneX
    @MasamuneX 6 months ago

    Wait till they use genetic programming with Monte Carlo tree search and UTP and other stuff on the router

  • @narpwa
    @narpwa 6 months ago

    what about 1T experts

  • @Summanis
    @Summanis 6 months ago

    Eat your heart out Limitless, we're making AI smarter by having them use less of their "brain" at a time

  • @koktszfung
    @koktszfung 6 months ago

    more like a mixture of a million toddlers

  • @imerence6290
    @imerence6290 6 months ago +1

    MoME ? Nah. MOMMY ✅🤤

  • @borb5353
    @borb5353 6 months ago

    I was like
    schizophrenic AI
    but then they went further...
    Anyway, finally they are optimizing instead of making them bigger

  • @jondo7680
    @jondo7680 6 months ago

    I'm a big fan of

  • @VincentKun
    @VincentKun 6 months ago

    1 millions beer

  • @Words-.
    @Words-. 6 months ago

    That million expert strategy sounds super cool. I'm not too knowledgeable, though it does seem to sound like it literally allows for a more liquid neural network by using the attention mechanism to literally pick neurons to be used. I feel like this will be the future of NNs.

  • @blockshift758
    @blockshift758 6 months ago

    I'll call it Moe (as in "moe") instead of em-oh-ee

  • @driedpotatoes
    @driedpotatoes 6 months ago

    too many cooks 🎶

  • @hglbrg
    @hglbrg 6 months ago

    Oh yeah, "accidentally" added something to a graph they intended to show. Not just building hype to inflate the bubble of nothing that is this whole business?

  • @crisolivares7847
    @crisolivares7847 6 months ago +1

    fireship clone

  • @OwenIngraham
    @OwenIngraham 6 months ago

    always bet on owens

  • @veyselbatmaz2123
    @veyselbatmaz2123 5 months ago

    Good news: Digitalism is killing capitalism. A novel perspective, first in the world! Where is capitalism going? Digitalism vs. Capitalism: The New Ecumenical World Order: The Dimensions of State in Digitalism by Veysel Batmaz is available for sale on Internet.

  • @raunakchhatwal5350
    @raunakchhatwal5350 6 months ago +1

    Honestly I think your old MoE video was better.

    • @OnTheThirdDay
      @OnTheThirdDay 6 months ago

      I agree. Definitely more understandable and this one would be harder to follow without seeing that first.

  • @x-mishl
    @x-mishl 6 months ago +1

    Why are all the comments before this one bots???

    • @OnTheThirdDay
      @OnTheThirdDay 6 months ago +1

      It's possible that it's because YouTube shadow-banned all the real comments.

    • @cesarsantos854
      @cesarsantos854 6 months ago

      @@OnTheThirdDay But they can't ban bots...

    • @OnTheThirdDay
      @OnTheThirdDay 6 months ago

      @@cesarsantos854 I don't know why bots (and I mean, obvious bots) do not always get banned but half of my comments that I write out myself do.

    • @IllD.
      @IllD. 6 months ago

      Don't see any bots 3 hours after this comment. Gj YouTube 👍

  • @UnemployMan396-xd7ov
    @UnemployMan396-xd7ov 6 months ago

    I knew it, your content so mid bro has to redemp it

  • @unimposings
    @unimposings 6 months ago

    Dude Wake Up, AI is just a Stupid Buzzword! There is no AI.

    • @Waffle_6
      @Waffle_6 5 months ago

      I've made my own transformer model before, and as shitty as it was, it sorta worked. I agree that the term "AI" is misleading, as it's not sentient or anything like that. It's just a really fancy autocomplete generator that understands surprisingly abstract and complex connections, relations, and context. But these models are real and aren't just a million Indians typing your essay for you; you can download models like Llama to try it out locally

  • @saltyBANDIT
    @saltyBANDIT 6 months ago

    Temu Fireship… oh, I'll watch it though.

    • @OnTheThirdDay
      @OnTheThirdDay 6 months ago +2

      This channel seems to go into more detail and is more AI focused.

  • @大支爺
    @大支爺 6 months ago

    It's useless and wastes a lot of resources.

  • @reaperzreaperz2412
    @reaperzreaperz2412 5 months ago

    Tf are you talking about

  • @bluehorizon9547
    @bluehorizon9547 6 months ago +3

    Using MoE is an admission of failure. It means that they are unable to make a "smarter" model and have to rely on arbitrary gimmicks.

    • @zrakonthekrakon494
      @zrakonthekrakon494 6 months ago +13

      Not really, they are testing if it makes models smarter without having to do much more work

    • @a_soulspark
      @a_soulspark 6 months ago +4

      I don't see it as a problem. If you think about it, all things in machine learning are just arbitrary gimmicks that happen to work out

    • @bluehorizon9547
      @bluehorizon9547 6 months ago +2

      @@a_soulspark As a human, if you understand N new disciplines you become N^2 more powerful, because you can apply ideas from one field to any other. This is why you want a monolith, not MoE. They chose MoE because they ran into a wall; they can't improve the fundamentals, so they have to use ad-hoc measures just to boost the numbers.

    • @francisco444
      @francisco444 6 months ago +6

      RLHF seems gimmicky but it worked. MoE might seem gimmicky, but it works. Multimodality might seem gimmicky, but it works.

    • @bluehorizon9547
      @bluehorizon9547 6 months ago

      @@zrakonthekrakon494 Nobody would even bother with MoE if they hadn't run into the wall. They did.

  • @jvf890
    @jvf890 6 months ago

    MOE onichan

    • @BackTiVi
      @BackTiVi 6 months ago +5

      We needed someone to say this, so thank you for sacrificing your dignity for us.

    • @a_soulspark
      @a_soulspark 6 months ago +1

      get ready to call your MOME as well now

  • @x-mishl
    @x-mishl 6 months ago +1

    bro fell off

    • @635574
      @635574 6 months ago +7

      Because there are no bot comments after 24 min?

  • @DhrubaPatra1611
    @DhrubaPatra1611 6 months ago +1

    This channel is a nice copy of Fireship
