[ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act

  • Published: 30 Sep 2024

Comments • 143

  • @genomexp
    @genomexp 6 months ago +90

    I'm so glad to see ML news back in action with more regularity. You've got actual knowledge and credibility that matters for presenting info in a rapidly crudifying space, the scene is filling up with empty influencer know-nothings, and I want the straight dope and technicals. Thank you.

    • @AncientSlugThrower
      @AncientSlugThrower 6 months ago +5

      Yannic's channel was my first experience in ML and AI news when GPT4 exploded. Yannic is the real deal, one of the best and most reliable sources available.

    • @2ndfloorsongs
      @2ndfloorsongs 6 months ago +1

      The sunglasses are the most important, followed by the humor; but yeah, okay, I appreciate the information as long as it doesn't interfere with my fandom.

    • @handris99
      @handris99 6 months ago +2

      Yeah, same here. I know you are busy with Open Assistant and stuff like that (keep those things up), but we need your AI news and paper reviews! How else am I gonna move on with my own AI project? Half the technology I'll need is still buried in some obscure paper somewhere in the paper pile!

  • @lachland592
    @lachland592 6 months ago +21

    For the Grok model it is worth noting that it has no fine-tuning, and its performance is bad. I want to give Elon credit, but this seems like more of a performative release than a real contribution.

  • @testboga5991
    @testboga5991 6 months ago +8

    Open-sourcing Grok was cheap; it was a basically useless model in terms of commercial value.

    • @tho207
      @tho207 6 months ago

      and useless in terms of research, since it's JAX

  • @AncientSlugThrower
    @AncientSlugThrower 6 months ago +8

    Inflection put out a pretty good product with Pi. The Pi chatbot is tuned to be a companion and even a kind of therapist. Giving an empathetic digital face to Microsoft will not end well.

  • @gaggix7095
    @gaggix7095 6 months ago +14

    The robots are humanoid because our tools have been developed around the human topology, so if you want a robot that can interface with these tools in the future, humanoid is the optimal form.

    • @pedrogorilla483
      @pedrogorilla483 6 months ago +3

      I generally agree with you but there’s also the marketing aspect of it.

    • @MFBjosejuFanNumberUan2047
      @MFBjosejuFanNumberUan2047 6 months ago +3

      WHEN AND WHERE SHALL WE GET THE CAT ROBOT GF!!!!

    • @2ndfloorsongs
      @2ndfloorsongs 6 months ago +2

      This is AI world, YOU are the cat.

  • @MonkeySimius
    @MonkeySimius 6 months ago +9

    I'm always happy to see your videos. You give informative breakdowns without getting lost in the sauce about how whatever minor new improvement shocks the industry and marks full-on AGI.
    Thank you

  • @memegazer
    @memegazer 6 months ago +5

    You joke... but I think mandatory disclosure of AI is the easiest regulatory hurdle that will make a huge impact imo.
    Ideally people would disclose such use without regulatory compulsion... but imo it is important, to me at least, that such disclosures exist moving forward.

    • @johnflux1
      @johnflux1 6 months ago +2

      Just like how the cookie disclosure has had a huge impact, and California's cancer warnings?

    • @danielsan901998
      @danielsan901998 6 months ago +1

      @@johnflux1 GDPR had a huge impact; even American companies had to adapt.

    • @ra2enjoyer708
      @ra2enjoyer708 6 months ago

      @@johnflux1 Yes it did make an impact, since now a typical Norman knows "cookie" is some scary legalistic jargon and not just a magical internet feature which "just works".

  • @waterbot
    @waterbot 6 months ago +4

    I love a good Monday ML News on Tuesday that I watch on Wednesday, cheers Yannic!

  • @dylan_curious
    @dylan_curious 6 months ago +1

    Make sure you think about the downsides to open sourcing frontier models also. Open source might be the best way but it's not clear to me the benefits are worth the risks.

  • @Hukkinen
    @Hukkinen 6 months ago +1

    I'm all for EU cookie-nags. Just say no, and I always do. Companies maximize their data hogging anyway, so I don't see why there would be less data by default in an alternative universe where no EU cookie law exists.

  • @matteofrattini9133
    @matteofrattini9133 6 months ago +1

    As a fellow AI engineer/enthusiast of course I'm happy when we can just release cool stuff, but Europe is going in the right direction by starting to regulate the field. While AI is amazing there are still countless possibilities of misuse, overapplication and even rights infringement (I'm thinking privacy), so regulating big corps early on is necessary.
    Or we can just take the US route and have sugar in our bread lmao

  • @marcussky
    @marcussky 6 months ago +2

    @yannic there is a large FP4 literature now (NF4, QLoRA, etc.) with hundreds of QLoRA models on Hugging Face
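
For context on that FP4 literature: the core trick in most of it is blockwise quantization, i.e. one floating-point scale per small block of weights plus a 4-bit code per weight. Below is a minimal absmax int4 sketch (our own illustration with made-up names; NF4 proper replaces this uniform grid with a codebook tuned to normally distributed weights):

```python
import numpy as np

# Blockwise absmax 4-bit quantization: one float scale per block of
# weights, one signed 4-bit integer in [-7, 7] per weight. NF4/QLoRA
# refine this idea with a normal-distribution-aware codebook.
def quantize_int4(w: np.ndarray, block: int = 64):
    blocks = w.reshape(-1, block)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # absmax per block
    q = np.clip(np.round(blocks / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int4(w)
print("mean abs error:", np.abs(w - dequantize_int4(q, scale)).mean())
```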

  • @anthonyward8805
    @anthonyward8805 6 months ago +3

    Humanoid robots also make sense if you believe that we can get video pre-training to work from human videos

  • @IsraelMendoza-OOOOOOO
    @IsraelMendoza-OOOOOOO 6 months ago +8

    Great to see you again, God bless you 🫵🏼❤️

  • @ScibbieGames
    @ScibbieGames 6 months ago +2

    I don't know if I've missed your coverage, but 1.58bit model training / inference source code has been released, which is interesting because of the scaling law suggested by the associated paper.

  • @jonmichaelgalindo
    @jonmichaelgalindo 6 months ago +1

    Humanoid bots: It's about training. If you want to show a bot how to do something, it needs to be able to follow your example. If it's not a humanoid, it has to solve every task in existence from scratch. If it's humanoid, it can learn from human actions.
    Great video. Thanks for the detailed and broad content! ❤

  • @spleck615
      @spleck615 6 months ago

    Amazing to hear they finally open sourced Grok-1. No doubt given the channel history you will build it from scratch and validate it matches the distributed weights and doesn’t have any sleeper agents, etc, as you can do with any good open source project. Right? We don’t just have to take the word of the guy that has repeatedly lied and misled many? That’s the power of open source, right? Trust, but verify.

  • @skierpage
    @skierpage 6 months ago

    23:30 C'mon Yannic, use the original "On the Internet, nobody knows you're a dog", which is Peter Steiner's 1993 cartoon in The New Yorker.
    As I said last week, your curved movie-screen topic thumbnails should snap into a full-screen view, not whiz off the side.

  • @rexrelic
    @rexrelic 6 months ago

    Can you please make a video explaining the anomaly-detection paper "Asymmetric Student-Teacher Networks for Industrial Anomaly Detection"? It would be a great help 😊

  • @scorpiorok
    @scorpiorok 6 months ago +4

    Grok=314B, Pi=3.14..., I assume this is deliberate?

    • @zacharyormstedt8514
      @zacharyormstedt8514 6 months ago +4

      You sure did some deducing there, detective

    • @SteveStavropoulos
      @SteveStavropoulos 6 months ago +4

      Given that we are talking about Musk, I would guess his next model will have exactly 3141B parameters (and the next next one, 31415B). And that will be a hard requirement given to his engineering team...

    • @lexer_
      @lexer_ 6 months ago +3

      naah, it's going to be 420B first

  • @technolus5742
    @technolus5742 5 months ago

    The performance is quite poor, so it's useless to them (and to the vaaast majority of people). If they had a SOTA model, I wonder if they would open-source it.
    Also: it's an MoE model, not all parameters are active at the same time. Expect it to require about the same compute as an 80B model.
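
On the MoE point: xAI reported that roughly 25% of Grok-1's weights are active on a given token (2 of 8 experts), which is where the ~80B figure comes from. A back-of-the-envelope check, ignoring how the always-active shared layers shift the number slightly:

```python
# Why a 314B top-2-of-8 MoE runs roughly like an ~80B dense model:
# only the routed experts' weights participate in each token's forward pass.
total_params = 314e9          # published Grok-1 size
n_experts, top_k = 8, 2       # reported expert configuration
active = total_params * top_k / n_experts
print(f"~{active / 1e9:.0f}B parameters active per token")  # ≈ 78B
# Note: all 314B still have to sit in memory; only per-token compute scales down.
```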

  • @ClaudioMartella
    @ClaudioMartella 6 months ago +6

    there's actually a paper that shows that ~1.5 bits per weight is enough

    • @Sven_Dongle
      @Sven_Dongle 6 months ago +1

      And that paper is probably good for wiping butts.

    • @ClaudioMartella
      @ClaudioMartella 6 months ago +2

      @@Sven_Dongle why?

    • @Sven_Dongle
      @Sven_Dongle 6 months ago

      @@ClaudioMartella If you think there is no difference between 4- and 8-bit quantization you haven't worked with these models at all. And what even is ~1.5 bits per weight? 3 bits per 2 weights? It stretches credulity.

    • @AM-yk5yd
      @AM-yk5yd 6 months ago +1

      @@ClaudioMartella that paper came from the authors of "RetNet: the successor to transformers" (~half of the authors are the same, to be more exact).
      This time they didn't want to be just a mere successor and called it a new era. They are high on their own farts.
      Oh, and RetNet was such a successor that the only model released was a pile of garbage and got unreleased. The 1.58-bit paper didn't even compare itself to the "successor" of transformers that they built themselves.
      And the authors never released the weights.
      I expect their next paper to be called "The bestest revolutionest architecture in the multiverse" with loud claims and no impact.

    • @DajesOfficial
      @DajesOfficial 6 months ago +2

      @@Sven_Dongle you don't seem to understand the difference between the possibility that some model can perform well with 1.5 bpw and the ability to cast any model (presumably pretrained at 16-bit precision) into 1.5 bpw without losing quality
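
For reference on the "~1.5 bits" figure in this thread: the paper in question (BitNet b1.58) uses ternary weights in {-1, 0, +1}, and log2(3) ≈ 1.585 bits of information per weight is where the number comes from. A minimal sketch of absmean ternary quantization in that style (simplified; the paper applies this during training, which is exactly the trained-from-scratch vs. post-hoc-cast distinction raised above):

```python
import numpy as np

# Ternary ("1.58-bit") weights in the BitNet b1.58 style: scale by the
# mean absolute value, then round each weight to {-1, 0, +1}.
# log2(3) ≈ 1.585 bits of information per weight, hence the name.
def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)  # values in {-1, 0, +1}
    return w_q, scale                          # dequantize as w_q * scale

w = np.random.randn(4, 4).astype(np.float32)
w_q, scale = ternary_quantize(w)
print(w_q)
print("bits per weight:", np.log2(3))  # ≈ 1.585
```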

  • @DanielWolf555
    @DanielWolf555 6 months ago

    12:45 - All the technology and buildings and tools that mankind has created are made for the humanoid form. Making humanoid robots would mean that they could use everything we use as well. E.g. you want your floor cleaned? Get a robotic vacuum cleaner? No, better give your ordinary vacuum to a humanoid robot so that it cleans the floor with it.

  • @gpeschke
    @gpeschke 6 months ago +2

    The Google post was to help the SEO shamans start hallucinating in the right direction. Without it, they are prone to all sorts of fun thinking. Blog posts like that keep them mostly sane.

  • @fitybux4664
    @fitybux4664 6 months ago +1

    23:00 It used to be "one guy" they all called, but now they just get LLMs to hallucinate the answers. 😆

  • @Phobos11
    @Phobos11 6 months ago +4

    Rest of the world: develops AI and makes it open source
    The EU: we don’t do that here

  • @alexijohansen
    @alexijohansen 6 months ago +1

    very competent… a single source file, 1400 lines of code 🎉😂

  • @jeffrey5602
    @jeffrey5602 6 months ago

    Can we get a 250k subscriber special on ML meme reviews please?? 😃

  • @jonclement
    @jonclement 6 months ago

    Khan Academy was one of the first reported partners using OpenAI, so I assume the gpt-khan model name refers to a Khan-specific application rather than just a model trained on Khan data.

  • @unclecode
    @unclecode 6 months ago

    Love your joke in the EU AI Act segment, watched it twice 🤣🤣🤣

  • @fitybux4664
    @fitybux4664 6 months ago

    3:27 Half a bit on, half a bit off, you're half way to quantum computing!

  • @jayhu6075
    @jayhu6075 6 months ago +1

    In India, the government has considered a non-regulatory approach, emphasizing the need to innovate, promote and adapt to the rapid advancement of AI technologies.
    The EU should do the same. Innovation is more important than rules about regulation; otherwise we will lag behind in development compared to the US and China. The politicians have no understanding of technology at all, which has major consequences for knowledge technology and our future generation. Many thanks for your great update.

    • @matteofrattini9133
      @matteofrattini9133 6 months ago

      The EU is fully aware that its innovating capacity is miles behind American or Chinese massive corporations, there's just no way Europe will win the research battle in an open field. I think regulating and sort of "protecting" the market from these corporations is a good move, especially since the applications of AI will keep covering more and more ground in the next decades

    • @technolus5742
      @technolus5742 5 months ago

      ​@@matteofrattini9133 Not so sure about that, the US is the biggest business center, which is different from being the most innovative.
      European regulation is likely to influence guidelines across the globe. It's not just about protecting the market, but ensuring some level of safety with a technology that is dangerous.

    • @matteofrattini9133
      @matteofrattini9133 5 months ago

      @@technolus5742 I fully support Europe's effort to regulate a potentially dangerous field. I'm just saying it's also a strategic move, since Europe could never achieve technological dominance in an unregulated field against powerhouses like the US or China

  • @widerthanpictures
    @widerthanpictures 6 months ago +1

    The models are the private deployment capacity for those companies.

  • @florianhonicke5448
    @florianhonicke5448 6 months ago +1

    🎉 You make people like Mondays

  • @GarethDavidson
    @GarethDavidson 6 months ago

    "employ, exploit, extinguish."

  • @TylerMatthewHarris
    @TylerMatthewHarris 6 months ago +2

    Wow

  • @SinanAkkoyun
    @SinanAkkoyun 6 months ago

    Is the groq inference code as fast as the one they use for hosting?

  • @regularsteven
    @regularsteven 6 months ago +3

    Hi Yannic, Can you please confirm that you're coming to WebExpo? I've tried a few times to get in touch. Hope to get a reply, and I'm sorry for chasing. Please let us know.

  • @k98killer
    @k98killer 6 months ago

    I experimented with rolling my own 4-bit float encodings, and the lack of precision made them challenging to use. Maybe it will be useful with the first several passes of quickprop.
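
For anyone else experimenting: the common FP4 layout is E2M1 (1 sign, 2 exponent, 1 mantissa bit), whose only representable magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}; the coarseness of that grid is exactly the precision problem described above. A small decoder sketch (our own illustration, assuming exponent bias 1 with subnormals at exponent field 0):

```python
# Decode a 4-bit E2M1 float: 1 sign bit, 2 exponent bits, 1 mantissa bit,
# exponent bias 1, subnormal when the exponent field is 0.
def fp4_e2m1_decode(nibble: int) -> float:
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exp = (nibble >> 1) & 0b11
    man = nibble & 0b1
    if exp == 0:                               # subnormal: 0 or 0.5
        return sign * man * 0.5
    return sign * (1 + man / 2) * 2.0 ** (exp - 1)

print(sorted({fp4_e2m1_decode(i) for i in range(16)}))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```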

  • @tensor_verkampen
    @tensor_verkampen 6 months ago

    That list of model names appears to have been thoroughly scrubbed... Catbox 404.

  • @erikjohnson9112
    @erikjohnson9112 6 months ago +1

    I realize nobody probably cares, but GR00T is spelled with zeros, not the letter O. GR00T vs GROOT. It jumps out once you see it printed on a page that shows a capital O, like at 4:00

    • @Brahvim
      @Brahvim 6 months ago +1

      Thanks for telling me.
      ...and me specifically, given how you yourself think about not many caring.

    • @erikjohnson9112
      @erikjohnson9112 6 months ago

      @@Brahvim Just trying to head off the "who cares?" default response from YT comments. Personally I think it is neat to discover things by paying close attention (in this case visually).

    • @Brahvim
      @Brahvim 6 months ago

      @@erikjohnson9112 At least it isn't a "secret" anymore, thanks to you! People WILL know it's a `0` and not the letter 'O' now!
      ...
      People like _me,_ at least!
      It really is great. Keep it up, dear internet stranger!

  • @ozten
    @ozten 6 months ago

    We should use genetic algorithms to evolve the ideal robot form factor based on parts, cost, and human built environments and tasks. Maybe they won't look like humans!

  • @tiagotiagot
    @tiagotiagot 6 months ago

    Microsoft never stopped doing EEE, did it?

  • @ParkerSeeley
    @ParkerSeeley 6 months ago

    6:09 Welp, guess Josh is getting fired

  • @zhandanning8503
    @zhandanning8503 6 months ago

    How does the 1-bit embedding work? Can someone explain? That would mean things are encoded into binary, right? So does that mean precision doesn't really matter throughout the machine learning model? That at some point it becomes like a pigeonhole thing within the model?

  • @1PercentPure
    @1PercentPure 6 months ago

    I love you and ML News so much

  • @scottmiller2591
    @scottmiller2591 6 months ago

    "First major AI law passed by European LLaMakers"

  • @quantumjun
    @quantumjun 6 months ago

    FP4? Maybe we just need 0/1

  • @d_b_
    @d_b_ 6 months ago

    Wow, does that list expose companies that fine tuned models with OpenAI?

  • @velo1337
    @velo1337 6 months ago

    An AI banner would be great.

  • @manishsharma2211
    @manishsharma2211 6 months ago

    Samay's English has gotten good xd

  • @retronyme
    @retronyme 6 months ago

    For one, I think the AI Act is a good thing for people!

  • @rexrelic
    @rexrelic 6 months ago

    Very very cool AF😎

  • @seanreynoldscs
    @seanreynoldscs 6 months ago

    Those look like OpenAI API adapters, i.e. the APIs that OpenAI can access.

  • @GoldenBeholden
    @GoldenBeholden 6 months ago

    I guess for every GDPR we get a cookie-esque law.

  • @pedrogorilla483
    @pedrogorilla483 6 months ago

    I like how you don’t jump on the hype train as soon as it passes by. I see you were here before it was cool.

  • @markr9640
    @markr9640 6 months ago

    "These people had wey too much precision" 🤣

  • @bbamboo3
    @bbamboo3 6 months ago

    appreciate the information/evaluation density.

  • @GilesBathgate
    @GilesBathgate 6 months ago

    Well 1-bit is closer to a 'biological' activation function 🤷

  • @BoominGame
    @BoominGame 6 months ago

    Question is how many groqs it takes to run grok.

  • @VincentVonDudler
    @VincentVonDudler 6 months ago

    22:40 - I hope Google bargained an Android release of iMessage into the deal.

  • @wolpumba4099
    @wolpumba4099 6 months ago

    Terrifying robot movement at 4:37

  • @rando6836
    @rando6836 6 months ago +1

    RIP, Josh.

    • @dubhd4r4
      @dubhd4r4 6 months ago

      Josh sweating like crazy right now

  • @BooleanDisorder
    @BooleanDisorder 6 months ago +1

    I'm excited for the future.

    • @2ndfloorsongs
      @2ndfloorsongs 6 months ago

      I'm not sure if the future needs our support. But that's probably just me. I have a hard enough time maintaining interest in the present, being excited about a future is way beyond my abilities.

  • @allurbase
    @allurbase 6 months ago +3

    If a robot costs more than a human's yearly salary, it's not worth it.

    • @pjtren1588
      @pjtren1588 6 months ago +1

      Yet...

    • @drdca8263
      @drdca8263 6 months ago +1

      Depends on the upkeep costs and how long it remains useful, right? Especially if there's a "rent-to-own" option!

    • @allurbase
      @allurbase 6 months ago

      @@drdca8263 I was assuming an average lifetime of 1 year. Sabotage by human coworkers is definitely going to be a thing.

  • @dmytroivakh6164
    @dmytroivakh6164 6 months ago

    Great work! Keep it up!

  • @graham8316
    @graham8316 6 months ago

    Flat repo for grok is hard asf

  • @XOPOIIIO
    @XOPOIIIO 6 months ago +5

    People should confirm agreement on cookies just once, when they connect to the internet. It should be written into the agreement with the provider.

    • @ra2enjoyer708
      @ra2enjoyer708 6 months ago +3

      A person who has never used the internet before will not know what an "agreement on cookies" even is, especially since the agreement can be worded in misleading but technically correct ways, e.g. "do you want to create accounts on the internet?". Which would of course implicitly include an account on the provider's site; bonus points if said provider holds a de facto monopoly in the area.
      That's akin to implicitly agreeing to be stabbed with a knife just because you bought one for your kitchen.

  • @serta5727
    @serta5727 6 months ago

    Very cool ❤

  • @soylentpink7845
    @soylentpink7845 6 months ago

    Nice

  • @snakeonex
    @snakeonex 6 months ago +10

    I kinda feel bad that we got to a point where every time someone mentions Elon, he has to first say "I am not endorsing him / whatever you think about him / yeah he is bad, but this is good". He is really reaching Trump-level rep, isn't he?

    • @bryce.ferenczi
      @bryce.ferenczi 6 months ago +15

      He's done it to himself. No one told him he should try as hard as possible to ruin his own reputation.

    • @Dogo.R
      @Dogo.R 6 months ago +3

      I love how he said "whatever you think of him" twice.
      Yet you added versions he didn't say that imply or directly say Elon is more on the bad side.
      Which is not what he said in the video.
      Very weird to read your comment.
      It's like subtly changing what happened... but in a way that doesn't overtly look like lying unless you rewatch the video clip.
      Please try to be more accurate in the future.

    • @voyagerrock1137
      @voyagerrock1137 6 months ago +1

      I mean it's his own fault, he's a lying scumbag

    • @gpeschke
      @gpeschke 6 months ago +3

      "Whatever you think of him" is a pretty unsubtle code. It's a polite way of distancing yourself from a person.
      Added to an endorsement of his actions, he's taking a pretty neutral stance. Which is a wise public stance around polarizing figures.
      Regardless of which pole he's actually on, it's foolish to spend his social capital on the subject. That's the message he's actually sending.

    • @maheshprabhu
      @maheshprabhu 6 months ago

      ​@@Dogo.R you can't be that naive. Elon Musk has pretty much forced people to pick sides when forming an opinion on him.

  • @d96002
    @d96002 6 months ago

    why not 420 billion parameters ?

  • @memegazer
    @memegazer 6 months ago

    On open-source Sora-like models:
    Imo there will be no parity because of compute.
    The results I have seen from Sora indicate to me that it is not trained on raw video data, but rather on a hybrid with synthetic data that uses either a NeRF or Gaussian-splat supplement to create that level of fine control and temporal fidelity.
    As impressive as Sora is, it still seems obvious to me that 3D rendering is part of their training pipeline, and imo the most reasonable explanation of how to get large amounts of synthetic data there is with NeRF / Gaussian splats.

  • @memegazer
    @memegazer 6 months ago

    lol... I am not surprised you are impressed with Grok, considering your 4chan LLM.
    But as far as I am concerned they are about on par in terms of impressiveness relative to the state of the art.

    • @memegazer
      @memegazer 6 months ago

      Also... not sure how reliable "popularity graphs" are for online tools... Whang did a cool vid about how such voting metrics are easily manipulated... the Boaty McBoatface examples come to mind.

  • @alansmithee419
    @alansmithee419 6 months ago

    11:10
    I'm slightly confused here, because it sounds like you think the lawmakers can shift their focus onto developing technology instead.
    But I know you don't think that, I'm just not really sure what your point is.

  • @schwajj
    @schwajj 6 months ago

    Lol Grok-1 “will probably require 69 GPUs to run”, haha at least that many. Probably more like 420 😂

    • @Sven_Dongle
      @Sven_Dongle 6 months ago

      And it's 256 GB for the weights, so you probably need a terabyte of RAM, plus a combined terabyte of VRAM across the GPUs
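
A back-of-the-envelope check on those memory numbers, using the published 314B parameter count (the 80 GB-per-GPU figure below is an assumption, i.e. an A100/H100-class card, and activations/KV cache are ignored):

```python
import math

# Weight-only memory for Grok-1's 314B parameters at various precisions,
# and how many 80 GB accelerators just the weights would occupy.
PARAMS = 314e9
GPU_MEM_GB = 80  # assumption: A100/H100-class card

for name, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: {gb:7.0f} GB -> at least {math.ceil(gb / GPU_MEM_GB)} GPUs")
# e.g. bf16 ≈ 628 GB -> at least 8 GPUs; int8 ≈ 314 GB -> at least 4
```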

  • @bigbug04
    @bigbug04 6 months ago

    Most of the Open Source are rather just Open Sores

  • @6AxisSage
    @6AxisSage 6 months ago

    The transformer architecture consists of multiple decoder layers, each containing:
    - Multi-head attention (MHA) with query, key, and value projections
    - Mixture of experts (MoE) layer
    - Feedforward layers (linear transformations with activation functions)
    - Layer normalization and RMS normalization
    The MoE layer uses the `Router` to compute routing probabilities, which determine the experts to route the input to. The selected experts process the input independently, and their outputs are combined based on the routing probabilities.
    The multi-head attention mechanism allows the model to attend to different positions in the input sequence, capturing dependencies and relationships between tokens. The rotary position embeddings (RoPE) enhance the model's ability to capture relative position information.
    The transformer model takes an input sequence and applies the decoder layers sequentially. At each layer, the input goes through the MHA, MoE, and feedforward layers, with layer normalization and residual connections. The final output of the transformer is the embedded representation of the input sequence.
    The code also includes sharding and partitioning utilities to distribute the model across multiple devices for efficient training and inference.
    Overall, this transformer architecture incorporates mixture of experts layers and rotary position embeddings to enhance the model's capacity and ability to capture complex dependencies in the input sequence.

    • @WiseCheese587
      @WiseCheese587 6 months ago +1

      Bro, what are you doing commenting on YT? Go build Westworld with that big brain, or please be the next president

    • @Sven_Dongle
      @Sven_Dongle 6 months ago +1

      @@WiseCheese587 It's just a regurgitation of the Grok-1 specs, genius.
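
To make the routing description above concrete, here is a minimal top-2 mixture-of-experts forward pass in plain NumPy (a simplified sketch with made-up names; the actual Grok-1 JAX code adds attention, RMSNorm, RoPE, sharding, and much more):

```python
import numpy as np

# Minimal top-2 mixture-of-experts layer: a router scores each token,
# the top-2 experts process it, and their outputs are blended by the
# renormalized routing probabilities.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

W_router = rng.normal(size=(d_model, n_experts))
experts = [  # each "expert" is a single linear layer here, for brevity
    rng.normal(size=(d_model, d_model)) for _ in range(n_experts)
]

def moe_forward(x: np.ndarray) -> np.ndarray:    # x: (tokens, d_model)
    logits = x @ W_router                        # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)   # softmax routing weights
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                  # per-token loop for clarity
        top = np.argsort(probs[t])[-top_k:]      # indices of the top-2 experts
        w = probs[t, top] / probs[t, top].sum()  # renormalize over chosen experts
        for e, we in zip(top, w):
            out[t] += we * (x[t] @ experts[e])
    return out

print(moe_forward(rng.normal(size=(4, d_model))).shape)  # (4, 16)
```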

  • @IsraelMendoza-OOOOOOO
    @IsraelMendoza-OOOOOOO 6 months ago

    Split one side Truth word Of God/other Ungodly things ❤

  • @Sven_Dongle
    @Sven_Dongle 6 months ago +1

    Grok-1: 256 GB for the weights. Good luck.