Chapters (Powered by ChapterMe) -
00:00 - Intro: New AI model in town
01:45 - DeepSeek's V3 model to R1
02:32 - How DeepSeek optimized for efficiency?
03:45 - FP8 GPUs utilization
04:20 - How Nvidia helps AI researchers
07:33 - DeepSeek Secret Sauce: Reinforcement Learning
10:22 - R1 results
11:44 - Reproducible results and room for improvement
12:34 - Big Takeaway
12:48 - First YC spring batch
thank you kind sir
@RolandoLopezNieto happy to help 😊
There's been a tsunami of DeepSeek videos recently, but this video stood out for its quality research and presentation. Excellent job, well done!
Thanks for putting this video out. Now I don't need to explain this to people, I can just forward this video.
Awesome video. No other DeepSeek coverage packs this much in-depth, concise content into under 15 mins.
I couldn't agree more!!
Diana this is awesome - thanks for taking the time to make it!
Great recap, a few things to add about R1 and the hype:
1. One insane result the R1 paper showed was that distillation of R1 onto smaller models ALSO led to solid reasoning emerging.
2. The post-training method is simple, elegant and doesn’t require much to replicate.
3. DeepSeek were abundantly transparent, which the tech community greatly respected. They even showed that aligning for harmlessness led to a less performant model, and they showed the reasoning tokens, which OpenAI kept hidden.
4. Everyone in Davos from the tech sector who was interviewed looked deadly afraid of DeepSeek because of V3. You didn’t have to be a tech wizard to understand the vibe was off.
5. They had their ChatGPT moment where they put everything they had on the table. HN was all over it, tech Twitter was all over it, it didn’t take a lot for it to flare up.
It’s the transparency on many fronts that did most of the heavy lifting in generating that hype.
They used PTX, not CUDA, for parallel processing. It shows how crackhead those guys are: using PTX is similar to building a full-fledged modern-day website in assembly.
Quants are cracked
There are people who build websites in something other than assembly? 😅
@floydsm8 Ahahaha!!! Too funny.
Great dissection that addresses the hype around DeepSeek. Cuts through all the lousy media reporting by well-known publications. Thank you for this quality reporting.
Very knowledgeable run through of the excitement of the past one month in A.I. development. Good work. Thank you very much.
This was an absolutely crystal clear and fantastic summarization of R1! Well done!
Every article I see on this is about how "DEEP SEEK ACTUALLY COST $1 BILLION". Just a bunch of propaganda to discredit DeepSeek. Thank you for providing a real explanation and treating me like an intelligent human being.
DeepSeek is a Chinese data farm. Ask it about Tiananmen Square 😂
Perfect example of the fact that the internet can’t read.
OTOH, this provides a perfect opportunity to test R1’s RAG performance. Feed the paper to R1, and ask R1 what the cost of training V3 was 😂😂😂
Why don't you try the techniques in the paper and see for yourself?
Nah it just ripped GPT off - message it sent me through reasoning section “First, I need to clarify that I, as ChatGPT, am a cloud-based AI developed by OpenAI. My specific architecture and training data aren't available for download. However, the user might be conflating me with other open-source models that can be run locally. Ollama does host various models like Llama, Code Llama, or Mistral, which are different from me but can be used similarly for certain tasks.”
US investors and companies forgot that significant AI advancements will come from better algorithms and better hardware, not millions of H100.
H100*
Thank you for making a good technical and unbiased non-polarising video.
It just told me it was Chat GPT from Open AI (in the reasoning section)
“First, I need to clarify that I, as ChatGPT, am a cloud-based AI developed by OpenAI. My specific architecture and training data aren't available for download. However, the user might be conflating me with other open-source models that can be run locally. Ollama does host various models like Llama, Code Llama, or Mistral, which are different from me but can be used similarly for certain tasks.”
Straight to business, no filler, everything in its place! Thank you for the detailed analysis! It was very interesting and nice)))
great summary. RL is fundamental to AI, we will see a lot of high growth startups using RL in engineering/logistics/medicine applications. .. currently undervalued due to hype around LLMs.
Thanks for putting this out. Very interesting and well explained.
If we can build models that powerful for just $6 million, imagine the possibilities with $500B using the same strategy.
This should at least open people's minds to new possibilities.
There is only so much that can be done!
Usually an exponential curve is the compounding of many plateau curves. Each of these curves is a different innovation that unlocked more performance.
Using this technique developed by DeepSeek on a $500B cluster doesn’t necessarily translate to dramatically higher performance, because of the concept I explained above.
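A rough way to picture the "compounding of plateau curves" idea is to add up several S-shaped (logistic) curves. This is only a sketch under made-up assumptions: the function names, the arrival times and the ceilings below are all hypothetical, chosen so that each innovation has twice the ceiling of the previous one.

```python
import math

def logistic(t, midpoint, ceiling):
    """One 'plateau curve': rises around `midpoint`, then flattens at `ceiling`."""
    return ceiling / (1.0 + math.exp(-(t - midpoint)))

def stacked_performance(t, innovations):
    """Total performance at time t as the sum of every innovation's plateau curve."""
    return sum(logistic(t, midpoint, ceiling) for midpoint, ceiling in innovations)

# Hypothetical numbers: a new innovation every 4 time units,
# each with double the ceiling of the one before it.
innovations = [(4.0 * k, 2.0 ** k) for k in range(5)]

for t in range(0, 25, 4):
    print(t, round(stacked_performance(t, innovations), 2))
```

While new innovations keep arriving, the total keeps climbing roughly exponentially. Once they stop, the total flattens at the sum of the ceilings, which is the point being made: scaling up a single technique only moves you along its own plateau curve.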
Best possible time to build a start up
The best one I've seen yet.
I hope you guys know that Open AI was started at Y Combinator. Yes? In 2012.
Two thumbs up. Subscribed.
Diana has all the passion to put it
Thank you for the technical content!
Thank you!!! Finally clarity🙏
I have a startup idea that gives each person at birth their own personal AI tool that learns everything about them as they grow and is personalized to each individual to help them navigate life and be successful.
Reinforcement Learning from Human Feedback -> RLHF
Putting the creation of abbreviations in a small animation might help viewers understand that something like "RLHF" is not a magic black box. You might feel like you just said it, right? Why say it again through animation? The viewer's brain is busy parsing the information and looking for possible new information in their domain. Then suddenly parsing becomes harder, because it's not immediately clear what RLHF is, but because it's a bunch of uppercase letters, it might be important. By that point the speech is already 30 seconds further ahead.
This explained deepseek way better than anything before, as if it was the first time hearing about it
Pfft. You wish.
thanks for sharing it
I think it would be better to talk about why "Open"AI is not as open as DeepSeek, to understand the hype behind DeepSeek. Or maybe you can bring Sam on as a guest speaker to talk about this, who knows 😂
Good competition!
Would appreciate more visuals to help educate us through it like diagrams etc :)
I did 😂. Looks good, works fast. If they add an option to add docs, it will become a pretty competitive tool.
Nimble is the new superpower.
Nimble Deepseek
Soon YC will be fully hard tech
This is the best possible time to be building a startup 🤔 guys what do you think?
@diana you're so good!
It'd be better without the teleprompter ;)
Well done...hats off dear. You did very well. NO BS..all beef. thx
Nice, after watching this, I put myself at AI god level.
Good job, Diana! Non carborundum illegitimi.
Thanks for the video! AutoKeybo runs DeepSeek.
Best time to build indeed
Content is 🔥but the speaker is Cute
Can download to c drive and try to run
"Cost of intelligence is getting lower and lower "
What if China open source all yc startups?
Pleaseeeee activate Spanish dubbing.
If all this is an OpenSource release, I wonder what the "paid version" capabilities are ? 😳
You're pretty ❤️
good at explaining. Your video is the same as No Hype AI's video though. But good job 👍.
how many more startups do we actually need to build the future?
I’m not that smart. In the end she said right now is the best time to build startups? Why?
They got yall working 24hrs on defense
Why are our Chinese brothers so good at AI? Why not Koreans? Japanese? Vietnamese? Because technically they all branch from the same ______.
None of the media understood the concept behind FP8 and FP32 but kept yapping like babies so that they could keep up with hype
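For anyone wondering what the FP8 vs FP32 distinction means in practice, the simplest part is storage: an FP32 value takes 4 bytes and an FP8 value takes 1 byte. The sketch below only does that arithmetic. The 7B parameter count and the function name are hypothetical, and it says nothing about how DeepSeek actually mixed precisions during training.

```python
# Bytes needed to store one value in each floating-point format.
BYTES_PER_VALUE = {"FP32": 4, "FP16": 2, "FP8": 1}

def weight_memory_gb(num_params, fmt):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_VALUE[fmt] / 1e9

# Hypothetical model size, picked only for illustration.
num_params = 7_000_000_000

for fmt in ("FP32", "FP16", "FP8"):
    print(fmt, weight_memory_gb(num_params, fmt), "GB")
```

Fewer bytes per value means less memory and less data to move between GPUs, at the price of lower numerical precision.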
Up voted. Please ditch the music next time. Not appropriate for technical videos.
not y making an asian to explain about deepseek
West Vs China
WOKE Vs WORK
💯
Great work.
0:54 lol uh no...not just the "pUbLiC Now Pay ATTentioN"..regardless of politics and china the thinking model is way more advanced and all ai should be similar. i dont like the full write out but it should be similar...8:55 math stuffs
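Americans always speak with exaggerated expressions and tone, and Chinese people end up the same way after moving there.
Not really. Maybe this lady is just used to speaking Chinese. I don't see it that way.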
First she said that deepseek's cheap development cost is a misconception and that it can't be built so cheaply. In the next sentence she says that UCLA built a comparable model for $30.
Please help me understand
😅 I can't even login
The Republican Senator from Missouri Josh Hawley has introduced a new bill that would make it illegal to import or export artificial intelligence products to and from China, meaning someone who knowingly downloads a Chinese developed AI model like the now immensely popular DeepSeek could face up to 20 years in jail, a million dollar fine, or both, should such a law pass. Seems like someone in YC will be jailed and fined if this bill passes.
We are aware and are in touch with his office. The bill did not pass.
DeepSeek is better than OpenAI, sorry, I mean ClosedAI. I tested both on coding and DeepSeek is best.
She chose violence
It was stolen, the end.
WAIT, DIANA HU?
🖤🔥