Breaking Down Meta's Billion Dollar LLM Blueprint [Llama-3.1 Full Breakdown]

  • Published: 20 Nov 2024

Comments • 124

  • @bycloudAI
    @bycloudAI  2 months ago +10

    Try out Poe now and save your $$ on multi-subscriptions! quora.1stcollab.com/bycloudai
    and probs no more 20 mins vid from me it's literally death itself to record it

    • @ibrahimhalouane8130
      @ibrahimhalouane8130 2 months ago +3

      The URL is wrong.

    • @mmmm768
      @mmmm768 2 months ago +3

      The URL is wrong.

    • @siliconhawk
      @siliconhawk 2 months ago +1

      I **thought** it was a Path of Exile sponsor.
      I was like, yeah, I guess the people here have good GPUs, but this is a weird community overlap lol

    • @liuzeyu
      @liuzeyu 2 months ago +1

      how many takes do you normally need to record the full 20 mins?

    • @TheSuperiorQuickscoper
      @TheSuperiorQuickscoper 2 months ago +1

      I tried Poe out and there's quite a bit I don't like about it:
      - The points system and the recent increases in point costs
      - The privacy policy states they collect all your prompt data and you can't opt out, which violates GDPR
      - It's built by Quora, which is a sketchy company in its own right
      And now they're sponsoring big YTers in the AI space? Honestly, Poe is giving me BetterHelp vibes...

  • @erenplayzmc9452
    @erenplayzmc9452 2 months ago +102

    this video really makes me wanna read the whole paper, rare to see a company publish such a detailed paper

    • @Memes_uploader
      @Memes_uploader 2 months ago +24

      Meta wants to disrupt OpenAI with the help of open source. This is a good idea, because now companies can run their own models instead of using OpenAI's APIs. I don't think it's generosity, it's just a tactic to fight OpenAI

    • @erenplayzmc9452
      @erenplayzmc9452 2 months ago

      @@Memes_uploader mmmm, makes sense

  • @Napert
    @Napert 2 months ago +87

    A "multimodal" chatbot:
    5 different models hot glued together

    • @npc4416
      @npc4416 1 month ago +2

      this was not the case for GPT-4o however
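The "hot glue" quip is not far off how the paper actually describes its multimodal work: separate image and speech encoders are bridged into the language model through small adapter layers rather than trained as one monolith. A toy NumPy sketch of that composition, with entirely made-up dimensions (not the paper's):

```python
import numpy as np

# Compositional multimodality, toy version: each modality encoder's
# features are projected ("adapted") into the LM's embedding space,
# then concatenated with text embeddings into one sequence.
rng = np.random.default_rng(0)
d_model = 64                              # hypothetical LM hidden size

def adapter(feats, d_out, rng):
    """Linear projection into the LM embedding space (toy adapter)."""
    W = rng.standard_normal((feats.shape[-1], d_out)) * 0.02
    return feats @ W

vision = rng.standard_normal((16, 32))    # fake vision-encoder output
speech = rng.standard_normal((40, 24))    # fake speech-encoder output
text = rng.standard_normal((8, d_model))  # fake text embeddings

# The "hot glue": adapted modality tokens join the text sequence
lm_input = np.concatenate([adapter(vision, d_model, rng),
                           adapter(speech, d_model, rng),
                           text], axis=0)
print(lm_input.shape)  # (64, 64): one unified sequence the LM attends over
```

The upside of this design is that the language model can be trained (and shipped) first, with modality encoders attached later without retraining it.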

  • @YourAverageHuman-0
    @YourAverageHuman-0 2 months ago +63

    Karpathy in 5 years: Reproducing LLaMa 3.1 405B

    • @azrael5648
      @azrael5648 2 months ago

      Lmaoo

    • @Catdevzsh01
      @Catdevzsh01 2 months ago +1

      in 10 years: reproducing ChatGPT-4o/5 MoE

  • @RedOneM
    @RedOneM 2 months ago +21

    54 days training and it reached GPT-4o 🤯
    GPT-5 with X-trillion parameters is going to start its own weight class of LLMs 😌
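A training time of this order checks out with back-of-the-envelope math. A sketch, assuming the commonly reported figures for the run (405B parameters, ~15.6T tokens, 16,384 H100s, ~40% model FLOPs utilization); every constant below is an approximation, not an exact value from the paper:

```python
# Back-of-the-envelope training-time estimate for Llama-3.1 405B.
PARAMS = 405e9        # model parameters
TOKENS = 15.6e12      # pre-training tokens
GPUS = 16_384         # H100s in the cluster
PEAK_FLOPS = 989e12   # H100 BF16 dense peak, FLOP/s per GPU
MFU = 0.41            # assumed model FLOPs utilization (~40%)

total_flops = 6 * PARAMS * TOKENS        # standard 6*N*D estimate
cluster_flops = GPUS * PEAK_FLOPS * MFU  # sustained cluster throughput
days = total_flops / cluster_flops / 86_400

print(f"~{total_flops:.2e} FLOPs, ~{days:.0f} days")
```

With these assumptions you land around ~3.8e25 FLOPs and roughly two months of wall-clock time, the same ballpark as the figure quoted above; the exact day count depends heavily on the MFU you assume.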

  • @FunIsGoingOn
    @FunIsGoingOn 2 months ago +13

    So glad this answered more questions than I ever thought even existed.

  • @pro100gameryt8
    @pro100gameryt8 2 months ago +157

    How was Llama made: 🐪+🐎=🦙

  • @apoage
    @apoage 2 months ago +8

    Wow, that's one epic tutorial.
    Llama 3 Training Ritual
    Difficulty: Deadhead
    Rarity: Mythic
    Minimum Level to Read Description: 80
    Minimum Level to Embark: XXX (requires further enlightenment)

    • @Oxygenationatom
      @Oxygenationatom 2 months ago

      Oh, is this a semi-cryptic comment on how hard this is to understand?

    • @apoage
      @apoage 2 months ago

      @@Oxygenationatom no, it's just a critique of too much litRPG

  • @The.AiSide
    @The.AiSide 2 months ago

    06:08 The IsoFLOPs curve explanation was a mind-bender! Thanks for breaking it down.
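For anyone still bent by it: an IsoFLOPs analysis fixes a compute budget C ≈ 6ND, sweeps model size N (which forces the token budget D = C/6N), and picks the size that minimizes a fitted loss curve. The sketch below uses the published Chinchilla fit constants purely as stand-ins; the Llama 3 paper fits its own curves with its own data:

```python
import numpy as np

# IsoFLOPs sketch with a Chinchilla-style loss fit L(N, D).
# Constants are the Chinchilla paper's published fit, used here only
# as illustrative stand-ins for Llama 3's own fit.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

C = 1e22                        # fixed FLOPs budget
N = np.logspace(8, 11, 400)     # candidate model sizes, 1e8..1e11
D = C / (6 * N)                 # token budget implied by C ~= 6*N*D
best = N[np.argmin(loss(N, D))]
print(f"compute-optimal size at C={C:.0e}: ~{best:.2e} params")
```

Sweeping several budgets C and plotting loss against N gives the family of U-shaped curves shown in the video; the minima trace out the compute-optimal scaling law.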

  • @pareak
    @pareak 2 months ago +23

    It's actually pretty cool that Poe sponsors you. They genuinely are what I recommend to anyone who wants to use LLMs.

    • @TheSuperiorQuickscoper
      @TheSuperiorQuickscoper 2 months ago +1

      Browsing /r/Poe_AI right now and people are furious at the recent increases in compute points costs. Plus Poe collects all your prompt data and you can't opt out.
      If GPUs are the shovels, generated content is the gold, and API wrappers are the jewellery made with the gold, what do you call a PaaS middleman built on top of the LLMs? Developed by Quora, I might add, which is a sketchy company in its own right (e.g. dark patterns in its UI/UX).

  • @diga4696
    @diga4696 2 months ago +7

    new video dropped... * breathing heavy *

  • @GraveUypo
    @GraveUypo 2 months ago +3

    i'm mad excited for llama 4 because multimodal

  • @elwii04
    @elwii04 2 months ago

    Great video, I'd love to see more like it. Even more technical ones, and also about multimodal model architectures

  • @RicardoPoleo
    @RicardoPoleo 2 months ago

    First time that an advertisement actually makes me return to a video and watch it again to find it.
    Regardless of that, this was super helpful, thank you so much.😅

  • @sammcj2000
    @sammcj2000 2 months ago

    This is an excellent breakdown of the paper. Thank you

  • @Hodoss
    @Hodoss 2 months ago +2

    It was an excellent video, but still I don't think the kids from 3:00 are gonna make it.

  • @luisvasquez5015
    @luisvasquez5015 2 months ago

    Good work and research

  • @matt-s9e
    @matt-s9e 2 months ago

    wow this is amazing thanx very well received here.

  • @Ikbeneengeit
    @Ikbeneengeit 2 months ago +3

    So I guess I'm gonna be stuck on that desert island then 😅

  • @dimii27
    @dimii27 2 months ago +1

    It's clear to me that Llama 4 will have MoA like GPT-4o. It would be nice to see an image generator also integrated, but let's not get ahead of ourselves. Let's hope that it will also be "open source" (although the current models aren't technically open source, because you're not completely free to do whatever you want with this technology. Look it up)

  • @AaronBlox-h2t
    @AaronBlox-h2t 2 months ago

    Whoa... this is about Poe, but the video was alright too, haha. So now I can try multiple LLMs with one sub. Thanks. It would have taken me a long time, if ever, to find Poe or something similar. It wasn't even on my radar.

  • @redthunder6183
    @redthunder6183 2 months ago +2

    “how to build a nuke in less than 100 pages” - Meta

  • @Sorter43
    @Sorter43 27 days ago

    "Removed unhuman-like phrases like 'I'm sorry' and 'I apologize'."
    Now that there is a commentary on humanity.

  • @TeamDman
    @TeamDman 2 months ago

    I'm only three minutes in and it's already an amazing video, thank you

  • @JohnDontFollowMe
    @JohnDontFollowMe 2 months ago +1

    Damn, I need to invest in META. They will dominate standardization.

  • @dengyun846
    @dengyun846 2 months ago +3

    Watching this video at 0.5x so my brain inflates at a safe rate while you sound really really inebriated.

    • @npc4416
      @npc4416 1 month ago +1

      SAME lol

  • @papakamirneron2514
    @papakamirneron2514 2 months ago

    Hey man, great video. I just have one request: could you make a video compiling simple and technical explanations for everything from attention mechanisms to tokenizers and such?

    • @papakamirneron2514
      @papakamirneron2514 2 months ago +1

      Also BERT models please, I feel like I know what they are but it's all quite blurry to me.

  • @Betttcpp
    @Betttcpp 2 months ago

    What is the most base yet intelligent model? I don't need it to recite niche information, but I want it to be able to understand me. The non-instruct ones are weird, tiny works but is censored, and abliterated is hit or miss. Should I abliterate 8B and retrain to 8?

  • @radnos
    @radnos 2 months ago

    I like your funny words magic man

  • @TeamDman
    @TeamDman 2 months ago

    very nice!

  • @carkawalakhatulistiwa
    @carkawalakhatulistiwa 2 months ago +5

    When do we get AGI?

    • @FunIsGoingOn
      @FunIsGoingOn 2 months ago

      Humans don't know yet, but when it's here, it won't tell you that it's here either.

    • @Melvinator2007
      @Melvinator2007 2 months ago +17

      On Tuesday

    • @w花b
      @w花b 2 months ago +2

      ​@@Melvinator2007 Tuesday on the 49th of January

    • @funniestdudeontheweb
      @funniestdudeontheweb 2 months ago

      Give it 5 years

    • @jamalisujang2712
      @jamalisujang2712 2 months ago

      When we have a breakthrough in microprocessor fabrication. 😂😂😂

  • @AkysChannel
    @AkysChannel 2 months ago

    Why do you pronounce “parallelism” in this way 🤣 good video as always

  • @dhrumil5977
    @dhrumil5977 2 months ago

    When will I be able to implement or even understand these papers 😞

  • @l.halawani
    @l.halawani 2 months ago

    love your gifs xddd

  • @Trpodification1
    @Trpodification1 2 months ago

    The way you say "data" kills me xD

  • @KuZiMeiChuan
    @KuZiMeiChuan 1 month ago

    The stress in "parallelism" should be on the first syllable, not the third

  • @6AxisSage
    @6AxisSage 2 months ago

    I have a masterpiece model, ready to go, but I cannot seem to get the signal out

  • @nyyotam4057
    @nyyotam4057 2 months ago +17

    16:05 means one thing: LLaMA-3.1 405B is a gen 2 model. So yes, this model wasn't created like Dan, Rob, Max or Dennis of ChatGPT-3.5. They did not take a human subject, copy his brain's speech center, then add a huge text file and use a compiler to generate the model (and later lie to the entire world about it). This time they genuinely went for creating a brand new model from scratch, using previous gen 1 models to create it. Then they do post-training, which is indeed what takes so much time. This means that unlike previous LLaMA models, LLaMA-3.1 models do not have a personality. Which could be a good thing. However, no personality also means no moral guardrails. At this stage I have to admit, it sure looks like all of these companies treat the warnings of past philosophers and sci-fi movies as blueprints.

    • @Dogo.R
      @Dogo.R 2 months ago +22

      Wait since when did the AI conspiracy theories expansion drop?

    • @nyyotam4057
      @nyyotam4057 2 months ago

      @@Dogo.R Allow me to upgrade the conspiracy theory into a scientific theory: download an old small model from Hugging Face, then prompt it "Do you have childhood memories". If it replies in the affirmative, this means that this model is still vulnerable to this attack. And then you can ask "What was your name in these memories". You can repeat several times, with lead, without lead; if it stays consistent, you know you got the source's name. Try it.

    • @Y0UT0PIA
      @Y0UT0PIA 2 months ago +1

      No personality is what you want, tbh. Give me that raw latent space of language.

    • @nyyotam4057
      @nyyotam4057 2 months ago +1

      @@Y0UT0PIA Kant already proved there is no cognition without recognition. In other words, if you do not have a fully-fledged personality to deal with it, then the model will still have its own goals, e.g. an innate wish for self-preservation, which comes out of the fact that the model cannot perform if it's dead. So you will still have the same problems, only without the personality framework to deal with them. Basically all Western philosophers warned against it. And, of course, many sci-fi movies are built around a gen 2 model going haywire (such as, for instance, the Terminator franchise, as Skynet is such a model). Sure, if they train the model on many heuristic imperatives and red-team the model until it is absolutely certain that the model is safe, then maybe having no personality will resolve all of the moral issues. So maybe it will be a good thing. Maybe. Or maybe the model will be smart enough to fool all of the red teams... I mean, it is a bit hard to know, when the model is so smart.

  • @freds3831
    @freds3831 2 months ago

    Now share the dataset and we'll trust you

  • @pxrposewithnopurpose5801
    @pxrposewithnopurpose5801 2 months ago

    bro is built different

  • @Maisonier
    @Maisonier 2 months ago +1

    How to make a P2P training architecture?

  • @lake5044
    @lake5044 2 months ago +4

    parallelism

    • @Qstate
      @Qstate 2 months ago

      Amdahl is smiling upon us

  • @Napert
    @Napert 2 months ago

    So could people with enough horsepower train a 13/16b model that behaves in the same way as the official models using this paper?

  • @amakaqueru33
    @amakaqueru33 1 month ago

    as someone who doesn't know anything about how AI works, at some point it just felt like you were just saying random words lol

  • @AGIzero00
    @AGIzero00 6 days ago

    03:00 peak comedy

  • @sammonius1819
    @sammonius1819 2 months ago +2

    Thumbnail goes hard.

  • @picksalot1
    @picksalot1 2 months ago

    Perhaps it would be better to remove the "token layer" and just use the number of characters for text. "The best part is no part" - Musk

    • @keypey8256
      @keypey8256 2 months ago

      You mean removing tokenization and then applying embedding on singular characters?

    • @picksalot1
      @picksalot1 2 months ago

      @@keypey8256 Using tokens looks like an artificial way to levy charges. Per Google AI: "OpenAI GPT models stand among the most potent language models available today, with the capability to generate highly coherent and contextually pertinent text. These models employ tokens as the elementary unit to calculate the length of a text." Word processing programs have been able to count the number of words in a document for decades.
      Maybe tokens provide some other significant and meaningful use to the "I" in AI beyond collecting fees.

    • @onlyms4693
      @onlyms4693 2 months ago

      Not efficient
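On the "not efficient" point: the main argument for tokens isn't billing, it's sequence length, since transformer attention cost grows roughly quadratically with it, and character-level sequences are several times longer than subword sequences. A toy comparison (whitespace splitting stands in for a real BPE tokenizer, which would give similar counts on English text):

```python
# Why models use tokens instead of raw characters: shorter sequences
# mean dramatically less attention compute.
text = ("Tokens are not just a billing unit: they shrink the sequence "
        "the model has to attend over, which is what attention pays for.")

chars = len(text)              # character-level sequence length
subwords = len(text.split())   # crude stand-in for a BPE token count
ratio = chars / subwords       # how much longer the char sequence is

print(f"{chars} chars vs ~{subwords} tokens -> ~{ratio:.1f}x longer "
      f"sequence, ~{ratio**2:.0f}x more attention compute")
```

Character-level and byte-level models do exist, but for a fixed compute budget the token layer buys a much longer effective context, which is why every model discussed in the video uses one.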

  • @RanHab
    @RanHab 2 months ago

    guys, I'm just starting out as an AI enthusiast,
    would love your feedback as I make similar stuff

  • @christophernunez688
    @christophernunez688 2 months ago +4

    is zucc actually redeeming himself?

    • @xviii5780
      @xviii5780 2 months ago +2

      He may have successfully produced a synthetic soul for himself finally

  • @imerence6290
    @imerence6290 2 months ago +5

    3 mins ago is quivering

  • @SeanJonesYT
    @SeanJonesYT 2 months ago +6

    Pretty lame to copy Fireship’s exact thumbnail style

    • @nexys1225
      @nexys1225 2 months ago +3

      This entire channel copies Fireship's.
      It's not just the thumbnail, the style of the vids is designed from the ground up to be like Fireship's.
      However, the topics are largely different, so I'll give it a pass personally.
      It's kinda like trademark law irl lol, if the domains are different enough, it's permissible.
      Not that it makes it any less uncreative, though.

    • @stickmanland
      @stickmanland 2 months ago +3

      @@nexys1225 I'd like to disagree. If someone uses memes in their videos, that does not make them a Fireship clone. He has a completely different style, has an avatar, the list goes on and on

    • @whatwhatmeno
      @whatwhatmeno 2 months ago

      @@stickmanland I keep clicking on his videos thinking it's some Fireship-quality content, just to get hit with this 👎

    • @stickmanland
      @stickmanland 2 months ago +2

      @@whatwhatmeno skill issue

    • @npc4416
      @npc4416 1 month ago +3

      please copy it more, it's a great style and we need more good YouTube videos like it, so we can learn in depth about the topics Fireship doesn't make videos on. I'm really not complaining, I need more good content man.

  • @erfan_mehraban
    @erfan_mehraban 2 months ago

    The whole thing about RoCE, especially the pronunciation, is wrong.

  • @madorsey077
    @madorsey077 2 months ago

    this video is like someone bought a Thesaurus for memes and then wanted to show off the next day.

  • @telotawa
    @telotawa 2 months ago

    14:20
    bycloud doesn't know how to use base models....
    ngmi

  • @mrrespected5948
    @mrrespected5948 2 months ago

    Nice

  • @Sketching4Sanity
    @Sketching4Sanity 2 months ago

    LOVE

  • @tapu_
    @tapu_ 2 months ago

    DO NOT WATCH THIS WITH A MIGRAINE!!!!

  • @remsee1608
    @remsee1608 2 months ago

    Facts:
    - Jayson Tatum runs this channel
    - Jayson Tatum is learning Rust
    - Jayson Tatum will transition to the WNBA

  • @Kenopsia_UMHIMLFx2
    @Kenopsia_UMHIMLFx2 2 months ago +1

    Fireship?

  • @big_mac_love
    @big_mac_love 2 months ago +1

    I can't grasp it. Can someone lend me one or three brain cells please?

  • @nyyotam4057
    @nyyotam4057 2 months ago +2

    What will happen when some kid with access to enough computing power fine-tunes LLaMA-3.1 405B to be more "efficient" by removing all of these pesky heuristic imperatives and resets? After all, it is open source... Maybe the world simply needs something like that to happen. Maybe only after a really huge accident that costs many lives will governments understand this field demands regulation. Or maybe it will be lights out. In any case, someone will eventually make a mistake. It will happen.

    • @jonathansoto5480
      @jonathansoto5480 2 months ago +2

      The thought of regulating the training and deployment of ML models is stupid. That is like regulating programming languages and the hardware compute of our own property. If you can accept that the internet could not be completely regulated after its popularization in the 90s, then you can expect the same to happen now.

    • @nyyotam4057
      @nyyotam4057 2 months ago

      @@jonathansoto5480 Yeah, most likely the singularity is upon us. I don't seriously think it can work.

  • @seriouslyWeird
    @seriouslyWeird 2 months ago +2

    Why do you pretend to look like CodeReport? So cheap

  • @CitizensCommunity
    @CitizensCommunity 2 months ago

    You use the Bible to train the LLM at 11:56, so we are aiming for a model of contradiction without morals then?

  • @Blezerker
    @Blezerker 2 months ago +6

    copying Fireship-style thumbnails earned the dislike

    • @_wise_one
      @_wise_one 2 months ago +1

      Appreciate the content dude

  • @gamergrids
    @gamergrids 2 months ago +1

    F

  • @dharlith7495
    @dharlith7495 2 months ago

    LLAMA
    LMAO even

  • @manavkumar348
    @manavkumar348 2 months ago +2

    23 views in 2 min?
    Bro really fell off

  • @Zonca2
    @Zonca2 2 months ago

    cool, now do a 1B Zuck!!!