The Engineering Unlocks Behind DeepSeek | YC Decoded

  • Published: Feb 7, 2025

Comments • 96

  • @chapterme
    @chapterme 2 days ago +36

    Chapters (Powered by ChapterMe) -
    00:00 - Intro: New AI model in town
    01:45 - DeepSeek's V3 model to R1
    02:32 - How DeepSeek optimized for efficiency?
    03:45 - FP8 GPUs utilization
    04:20 - How Nvidia helps AI researchers
    07:33 - DeepSeek Secret Sauce: Reinforcement Learning
    10:22 - R1 results
    11:44 - Reproducible results and room for improvement
    12:34 - Big Takeaway
    12:48 - First YC spring batch

  • @CreepyCrawly-xy1iw
    @CreepyCrawly-xy1iw 2 days ago +18

    There's been a tsunami of DeepSeek videos recently, but this video stood out for its quality research and presentation. Excellent job, well done!

  • @meterfeeder
    @meterfeeder 2 days ago +48

    Thanks for putting this video out. Now I don't need to explain this to people, I can just forward this video.

  • @One_Two_Two_Three
    @One_Two_Two_Three 2 days ago +26

    Awesome video. No other DeepSeek coverage packs this much in-depth, concise content into under 15 minutes.

  • @dan_isaza_dev
    @dan_isaza_dev 2 days ago +30

    Diana this is awesome - thanks for taking the time to make it!

  • @deeplearningexplained
    @deeplearningexplained 2 days ago +14

    Great recap, a few things to add about R1 and the hype:
    1. One insane result the R1 paper showed was that distilling R1 into smaller models ALSO led to solid reasoning emerging.
    2. The post-training method is simple, elegant and doesn’t require much to replicate.
    3. DeepSeek was abundantly transparent, which the tech community greatly respected. They even showed that aligning for harmlessness led to a less performant model, and they showed the reasoning tokens, which OpenAI kept hidden.
    4. Everyone from the tech sector who was interviewed in Davos looked deathly afraid of DeepSeek because of V3. You didn’t have to be a tech wizard to understand the vibe was off.
    5. They had their ChatGPT moment where they put everything they had on the table. HN was all over it, tech Twitter was all over it; it didn’t take a lot for it to flare up.
    It’s the transparency on many fronts that did most of the heavy lifting in generating that hype.

  • @deveshlakwal8148
    @deveshlakwal8148 2 days ago +48

    They used PTX, not CUDA, for parallel processing. It shows how cracked those guys are: using PTX is similar to building a full-fledged modern website in assembly.

    • @imerence6290
      @imerence6290 2 days ago

      Quants are cracked

    • @floydsm8
      @floydsm8 2 days ago +7

      There are people who build websites in something other than assembly? 😅

    • @petepreston2787
      @petepreston2787 2 days ago +1

      @floydsm8 Ahahaha!!! Too funny.

  • @zh1581
    @zh1581 2 days ago +4

    Great dissection that addresses the hype around DeepSeek. Cuts through all the lousy media reporting by well-known publications. Thank you for this quality reporting.

  • @JosephTin
    @JosephTin 3 hours ago

    Very knowledgeable run-through of the excitement of the past month in A.I. development. Good work. Thank you very much.

  • @layer4down
    @layer4down 1 day ago

    This was an absolutely crystal clear and fantastic summary of R1! Well done!

  • @HeatWanted
    @HeatWanted 2 days ago +38

    Every article I see on this is about how "DEEP SEEK ACTUALLY COST $1 BILLION". Just a bunch of propaganda to discredit DeepSeek. Thank you for providing a real explanation and treating me like an intelligent human being.

    • @ssddkar3577
      @ssddkar3577 1 day ago +1

      DeepSeek is a Chinese data farm. Ask it about Tiananmen Square 😂

    • @suisinghoraceho2403
      @suisinghoraceho2403 1 day ago +1

      A perfect example of the fact that the internet can’t read.
      OTOH, this provides a perfect opportunity to test R1’s RAG performance. Feed the paper to R1 and ask it what the cost of training V3 was 😂😂😂

    • @architect36ixtylastofhisna45
      @architect36ixtylastofhisna45 3 hours ago

      Why don't you try the techniques in the paper and see for yourself?

    • @pathtooptimalhealth
      @pathtooptimalhealth 1 hour ago

      Nah, it just ripped GPT off. Here is a message it sent me in the reasoning section: “First, I need to clarify that I, as ChatGPT, am a cloud-based AI developed by OpenAI. My specific architecture and training data aren't available for download. However, the user might be conflating me with other open-source models that can be run locally. Ollama does host various models like Llama, Code Llama, or Mistral, which are different from me but can be used similarly for certain tasks.”

  • @pikaso6586
    @pikaso6586 2 days ago +13

    US investors and companies forgot that significant AI advancements will come from better algorithms and better hardware, not millions of H100s.

  • @imerence6290
    @imerence6290 2 days ago +21

    Thank you for making a good, technical, unbiased, and non-polarising video.

  • @pathtooptimalhealth
    @pathtooptimalhealth 1 hour ago

    It just told me it was ChatGPT from OpenAI (in the reasoning section):
    “First, I need to clarify that I, as ChatGPT, am a cloud-based AI developed by OpenAI. My specific architecture and training data aren't available for download. However, the user might be conflating me with other open-source models that can be run locally. Ollama does host various models like Llama, Code Llama, or Mistral, which are different from me but can be used similarly for certain tasks.”

  • @maximkireenkov
    @maximkireenkov 1 day ago

    To the point, no fluff, everything in its place! Thank you for the detailed analysis! It was very interesting and nice)))

  • @justjustgord
    @justjustgord 2 days ago +3

    Great summary. RL is fundamental to AI; we will see a lot of high-growth startups using RL in engineering/logistics/medicine applications. It is currently undervalued due to the hype around LLMs.

  • @sjkba
    @sjkba 1 day ago

    Thanks for putting this out. Very interesting and well explained.

  • @andrewleonardi3351
    @andrewleonardi3351 2 days ago +11

    If we can build models that powerful for just $6 million, imagine the possibilities with $500B using the same strategy.

    • @mrcookies409
      @mrcookies409 2 days ago

      This should at least open people's minds to new possibilities.

    • @Moh_ha
      @Moh_ha 1 day ago

      There's only so much that can be done!

    • @galdutro
      @galdutro 20 hours ago

      Usually an exponential curve is the compounding of many plateau curves. Each of these curves is a different innovation that unlocked more performance.
      Using the techniques developed by DeepSeek on a $500B cluster doesn’t necessarily translate to dramatically higher performance, because each innovation plateaus on its own, as explained above.

  • @hassanyahya400
    @hassanyahya400 10 hours ago

    Best possible time to build a startup

  • @mohansathya
    @mohansathya 21 hours ago

    The best one I've seen yet.

  • @petepreston2787
    @petepreston2787 2 days ago +1

    I hope you guys know that OpenAI was started at Y Combinator. Yes? In 2015.

  • @cipanmandul
    @cipanmandul 1 day ago

    Two thumbs up. Subscribed.

  • @harrylee27
    @harrylee27 2 days ago

    Diana has all the passion to put it

  • @andrewaikawa6712
    @andrewaikawa6712 2 days ago

    Thank you for the technical content!

  • @saracrypto5638
    @saracrypto5638 2 days ago

    Thank you!!! Finally clarity🙏

  • @citizen_of_earth_
    @citizen_of_earth_ 2 days ago +9

    I have a startup idea that gives each person at birth their own personal AI tool that learns everything about them as they grow and is personalized to each individual to help them navigate life and be successful.

  • @ErikPlay2Learn
    @ErikPlay2Learn 2 days ago +1

    Reinforcement Learning from Human Feedback -> RLHF
    Showing how an abbreviation is formed in a small animation might help viewers understand that something like "RLHF" is not a magic black box. You may feel like you just said it, right? Why say it again through animation? The viewer's brain is busy parsing the information and looking for possible new information in their domain. Then suddenly parsing becomes harder, because it's not immediately clear what RLHF is, but because it's a bunch of uppercase letters, it might be important. By this point the speech is 30 seconds further ahead.

  • @danielocampo543
    @danielocampo543 2 days ago +3

    This explained deepseek way better than anything before, as if it was the first time hearing about it

  • @orkhan_help
    @orkhan_help 1 day ago

    thanks for sharing it

  • @uemrecimen
    @uemrecimen 2 days ago

    I think it would be better to talk about why "Open"AI is not as open as DeepSeek, to understand the hype behind DeepSeek. Or maybe you can have Sam on as a guest speaker to talk about this, who knows 😂

  • @NwaburuEmeka
    @NwaburuEmeka 2 days ago

    Good competition!

  • @df4privateyoutube722
    @df4privateyoutube722 2 days ago

    Would appreciate more visuals to help educate us through it like diagrams etc :)

  • @co66
    @co66 2 days ago +4

    I did 😂. Looks good, works fast. If they add an option to add docs, it will become a pretty competitive tool.

  • @passage2enBleu
    @passage2enBleu 1 day ago

    Nimble is the new superpower.
    Nimble Deepseek

  • @qet-lab
    @qet-lab 2 days ago

    Soon YC will be fully hard tech

  • @TheHassoun9
    @TheHassoun9 1 day ago

    This is the best possible time to be building a startup 🤔 guys what do you think?

  • @jamesulan1
    @jamesulan1 12 hours ago

    @diana you're so good!

  • @Music-m1k
    @Music-m1k 2 days ago +1

    It'd be better without the teleprompter ;)

  • @AbuSous2000PR
    @AbuSous2000PR 2 days ago +1

    Well done...hats off dear. You did very well. NO BS..all beef. thx

  • @anttycoon
    @anttycoon 1 day ago

    Nice, after watching this, I put myself at AI god level.

  • @floydsm8
    @floydsm8 2 days ago

    Good job, Diana! Non carborundum illegitimi.

  • @AutoKeybo
    @AutoKeybo 2 days ago

    Thanks for the video! AutoKeybo runs DeepSeek.

  • @hahahadiall
    @hahahadiall 2 days ago

    Best time to build indeed

  • @Spyrosigma
    @Spyrosigma 15 hours ago

    Content is 🔥but the speaker is Cute

  • @AJLIM-q9c
    @AJLIM-q9c 1 day ago

    Can download to c drive and try to run

  • @BrianBaliat
    @BrianBaliat 1 day ago

    "Cost of intelligence is getting lower and lower "

  • @qet-lab
    @qet-lab 1 day ago

    What if China open-sources all YC startups?

  • @ogilbii
    @ogilbii 2 days ago +2

    Pleaseeeee activate Spanish dubbing.

  • @Nick_the_Gold_Bach
    @Nick_the_Gold_Bach 1 day ago

    If all this is an OpenSource release, I wonder what the "paid version" capabilities are ? 😳

  • @samrj6227
    @samrj6227 1 day ago

    You're pretty ❤️
    good at explaining. Your video is the same as No Hype AI's video though. But good job 👍.

  • @prabuddhadas935
    @prabuddhadas935 1 day ago +1

    how many more startups do we actually need to build the future?

  • @Corythehausbaus
    @Corythehausbaus 1 day ago

    I’m not that smart. In the end she said that right now is the best time to build startups. Why?

  • @coach-g7
    @coach-g7 2 days ago

    They got yall working 24hrs on defense

  • @SDFNI3894YR
    @SDFNI3894YR 1 day ago

    Why are our Chinese brothers so good at AI? Why not Koreans? Japanese? Vietnamese? Because technically they all branch from the same ______.

  • @dARKf3n1Xx
    @dARKf3n1Xx 2 days ago

    None of the media understood the concept behind FP8 and FP32 but kept yapping like babies so that they could keep up with hype

  • @DanielCardenas1
    @DanielCardenas1 1 day ago

    Upvoted. Please ditch the music next time. Not appropriate for technical videos.

  • @TarunLeela-f2g
    @TarunLeela-f2g 2 days ago +3

    not y making an asian to explain about deepseek

  • @jonathandinbi8024
    @jonathandinbi8024 23 hours ago

    West Vs China
    WOKE Vs WORK

  • @waymanharris7245
    @waymanharris7245 2 days ago

    💯

  • @pikaso6586
    @pikaso6586 2 days ago

    Great work.

  • @comosaycomosah
    @comosaycomosah 2 days ago

    0:54 lol uh no...not just the "pUbLiC Now Pay ATTentioN"..regardless of politics and china the thinking model is way more advanced and all ai should be similar. i dont like the full write out but it should be similar...8:55 math stuffs

  • @yafengyang3815
    @yafengyang3815 23 hours ago

    Americans always speak with exaggerated expressions and tone, and Chinese people end up doing the same after they move there.

    • @Shbtsd
      @Shbtsd 19 hours ago

      Not really. Maybe this lady is just used to speaking Chinese; I don't see it that way.

  • @ChristopherCarolyn
    @ChristopherCarolyn 3 hours ago

    Great content, as always! I have a quick question: My OKX wallet holds some USDT, and I have the seed phrase. (air carpet target dish off jeans toilet sweet piano spoil fruit essay). How should I go about transferring them to Binance?

  • @marvingrass8
    @marvingrass8 1 day ago

    First she said that DeepSeek's cheap development cost is a misconception and that it can't be built so cheaply. In the next sentence she says that UC Berkeley built a comparable model for $30.
    Please help me understand

  • @sagarsingh1014
    @sagarsingh1014 2 days ago

    😅 I can't even login

  • @kz1iv
    @kz1iv 1 day ago

    The Republican Senator from Missouri Josh Hawley has introduced a new bill that would make it illegal to import or export artificial intelligence products to and from China, meaning someone who knowingly downloads a Chinese developed AI model like the now immensely popular DeepSeek could face up to 20 years in jail, a million dollar fine, or both, should such a law pass. Seems like someone in YC will be jailed and fined if this bill passes.

    • @ycombinator
      @ycombinator 17 hours ago

      We are aware and are in touch with his office. The bill did not pass.

  • @udaym4204
    @udaym4204 1 day ago

    DeepSeek is better than OpenAI, sorry, I mean ClosedAI. I tested across coding tasks and DeepSeek is the best.

  • @DailyProg
    @DailyProg 2 days ago

    She chose violence

  • @obi-wankenobi5332
    @obi-wankenobi5332 12 hours ago

    It was stolen, the end.

  • @faizanrana2998
    @faizanrana2998 2 days ago +1

    WAIT, DIANA HU?

  • @soulspawn
    @soulspawn 2 days ago +1

    🖤🔥