best time to become a farmer and forget about technology.
🎉🎉🎉
figure robots are coming for you too :D
Log cabin in Montana?
You can become a hackathon farmer as well. Just include the keyword AI in your presentation
Farming is far from safe from technological disruptions.
The part where it shows its 'thinking process' helps me a lot. Seven out of ten problems I present to it are solved just by reading its monologue. It feels like narrating the problem to another person, and then the solution just hits you out of nowhere
47 likes no reply, I second that
It's really like a person, especially on some difficult problems in math, logic, and deep reasoning.
I feel like the DeepSeek engine has a soul.
Kimi is similar; it's also one of the most advanced AIs in China. You can try Kimi, it's more versatile and free, and right now DeepSeek is not stable enough.
Rubber duck method??
OpenAI deserves it; its ridiculously overpriced subscription model was beaten by an open-source model. I have seen comments on Reddit claiming people cancelled their OpenAI subscriptions. I want to see what OpenAI does in the future to account for this.
DeepSeek's data is from 2023. So for now, it's overhyped
My model said July 2024
They're about to lobby HARD for regulations in the next years
@@7oeseven793 It can search the internet to retrieve the latest information; you can also attach documents to its context window to give it more awareness.
Based on what they were spending, I wouldn't say it was ridiculous. They had to figure out ways to charge more. Downward pressure on what they can charge is not good for them.
The most impressive part to me is how they did this with the US actively trying to limit their AI advancement
mid air thief pfp 😎
They probed the ChatGPT API to steal ChatGPT's weights. That's why it was only $6M. They only created a cheap Chinese clone 😂 I also dislike OpenAI, anyway.
@@RolopIsHere... Well, the clone works ... Forget the original!
Clones are by definition the same as the original. 😂😂😂
@@RolopIsHere their weights are open. Can you confirm GPT's weights?
@@RolopIsHere it's not a "cheap Chinese clone" if it works better than the original. Your bias is showing.
I hope it brings more jobs. It made all these CEOs arrogant and greedy. They were ready to toss the engineers and devs away like nothing.
I'm already giddy at the idea of hosting a local AI assistant to help out with large code bases or digesting documentation and showing better examples for how to build and use things. That would be awesome.
@@Soulful_Oatmilk Why local? You can rent a private VPS or a cloud-hosted model for cheaper. There are many cloud companies offering this now.
@@leftcritical7352 why are there always these cloud simps whenever local is mentioned
@@leftcritical7352 Because you keep data local you can put anything through it and you never have to think about who has your data. You don't have to worry about whether whatever you're copying&pasting includes an important API key or sensitive info since it never goes to anyone else.
@@jad_c they think you'll lose everything if you shut off your laptop 😮
bro ive missed these and the system design videos. i dont watch ur leetcode videos but im always here for videos other than leetcode
Hoping to continue those really soon!
This makes sure nobody makes money from this....
Yes, good
good for everyone, except for scammers like Elon Musk and Sam Altman
please somebody think about the poor shareholder 😭 they can't squeeze as much money out of this anymore 😭
Except for the GPU/NPU/TPU manufacturers...
Crazy money. No one makes crazy money. Like trillionaires
Bye bye LeetCode, I am on the street selling ice cream. My dream job ❤
1 orange flavor please 😊
@TheAlchemist1089 sorry sir/madam it's out of stock. We have AI Ice cream 🍦 made in China, designed in USA.
you joking
but a street food vendor could potentially make a shit ton of money compared to the average programmer
Absolute game changer. I was skeptical of AI as a generalized consumer app, both from a technical and economic standpoint. Technical because "everyone can be an engineer/scientist/developer/etc now" was simply not true. And economic because the massive amounts of venture capital raised and valuations/expectations meant they would need to come up with a consumer app that was so popular it would have to make the iPhone look like New Coke.
But AI as an open source addition to the tool belts of already capable people? That's much, much more interesting.
I was more skeptical of AI because it was monopolized by shitty companies like Google, Microsoft and OpenAI, but now, with DeepSeek and future models, being able to run them locally and build tooling tailored to your use cases is another thing entirely. One that I'm really excited for.
@@teodor-valentinmaxim8204 You've been able to run them locally for a while now FYI. Deepseek is just the best thing atm but we've been semi-regularly getting better open stuff.
@@teodor-valentinmaxim8204 It's still not feasible for an individual to run the bigger model locally. Maybe wait until Apple develops their M5 or M6 chip with more AI compute. My M2 is super laggy running just the 7B model.
@@haha-eg8fj M1 Pro here runs a 6.7B model just fine. To run something larger you just need a graphics card, and you can run a 70B model with just one graphics card, which is insane.
@@haha-eg8fj They've already optimized the full version to run on a high-end rig; a single computer can run it already.
You can't copy a model through an API; at most you can use data generated by ChatGPT in response to your prompts, but even this is speculative. The hype and cost reduction of DeepSeek come from the techniques they used to train the model, which required a lot less compute. That's why NVIDIA stock crashed yesterday: everyone took it to mean demand for chips will be lower than expected.
Yeah my description was definitely an oversimplification.
But going forward, I think the biggest consequence of this is that it doesn't make sense to invest into building SOTA models if they can just be recreated with whatever methods DeepSeek used.
@@NeetCodeIO Then can we assume that even a small company will be able to create its own LLM with its own data using this DeepSeek code... So, no RAG after this?
Will AI agents become easy?
@@NeetCodeIO or you could look at it the other way, where every new model from now on will utilize the output from old models.
Jimmy O Yang's Father in China: Jimmy ah, training LLM is so easy, even I can do it
Reference runs deep 😂
My uncle in China is very corrupt
"I spent hundreds of billions of dollars to train a state-of-the-art AI model with the best hardware in the world. Then my father said that's easy and I made fun of him... until he actually did it. I fucked up. 💀 🙏 "
The most important part of DeepSeek is its MoE architecture. It reduces training and inference costs a lot. The second most important part is that they showed RL without large-scale supervised fine-tuning is possible and effective, which reduces the training cost even more. Within a month, everyone else will be able to come up with a super good reasoning model.
Yes, its architecture is obviously more reasonable, more interesting, and more advanced, but people just say that Deepseek's victory lies in being free.
Yes. Also, no one mentions that it had an "aha moment", the first time an AI showed something resembling a human-like flash of insight.
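A minimal sketch of the mixture-of-experts idea mentioned above, with made-up layer sizes and a toy top-2 router; this illustrates the general mechanism, not DeepSeek's actual implementation:

```python
# Toy top-2 mixture-of-experts layer: only k of the N expert MLPs run for each
# token, so the active parameters per token are a small fraction of the total.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 64, 256, 8, 2

# Each "expert" is a tiny 2-layer MLP (weights only, for illustration).
experts = [(rng.standard_normal((d_model, d_hidden)) * 0.02,
            rng.standard_normal((d_hidden, d_model)) * 0.02)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """x: (d_model,) vector for a single token."""
    logits = x @ router                          # router score per expert
    top = np.argsort(logits)[-top_k:]            # pick the top-k experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros(d_model)
    for w, idx in zip(weights, top):             # only k experts do any work
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU MLP expert
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)                  # (64,): same output shape, roughly k/N of the MLP compute
```

Only 2 of the 8 toy experts run per token, which is why total parameter count and per-token compute come apart in an MoE model.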
Another great video! If I may weigh in with my 14 yoe in ML:
I think your conclusions are spot on, but I would frame it differently. Despite what it looks like to the casual observer, we have actually plateaued, and when you plateau, others catch up with you quickly. "Open" "AI" is struggling for any new ideas, and we as an industry are perhaps approaching the end of the "throw more data at it" mentality.
This is really really good for us. We need a diverse investment in multiple ML techniques (not put all our eggs in one basket). That basket did make Sam and several others very wealthy but it is a dead end.
We have always pushed SOTA and open sourced it; that's what has driven us as an industry. "Open" "AI" is the beneficiary of it (the fundamental research their entire company is based on was done at Google).
There are trillions of reasons to push to SOTA, there are no reasons to keep throwing data at the one technique that worked since 2017.
"We have no moat" - aged so well
DeepSeek is still training and getting smarter as time passes, since it's actually taking in real-time data and processing it while also being updated with data fed from its sources.
Being able to build a good LLM using GPT I/O as training data isn't trivial. It's not something you can just "copy" per se.
Yes, but it's also worth mentioning: knowledge distillation is a known technique; the interesting part is how exactly they distilled such a powerful model into a small package.
It's not like there are no white papers on this topic. The theory is there; almost every day some university drops something new and useful.
Not to mention that the data is usually very high quality, rather than "what model are you" / "GPT-4" pairs, so I'm not sure where the GPT-4 belief actually comes from.
The chat app is giving free and unlimited access (albeit down now from DDOS) to their SOTA model, as opposed to ClosedAI or Klaude, and their API price is dirt cheap, both V3 and R1 with the quality on par with o1 and better than Sonnet (overall). These facts are insane and rarely mentioned. A huge win for the plebs and the global south.
I guess the biggest idiots will need another year of Ai brainrot hype until they realize "Ai" will at best be 0.0001% of whatever openAI is marketing it as.
Chatgpt literally changed the world...
@@cordrust not really
@ Yeah it did. Literally every business has employees who use it, every student uses it, and trying to learn to code (or just about anything extensive) with versus without it is a world of difference.
@@cordrust It only changed journalism; that's the only place where people are getting automated. For coding, AI is like an enhanced Google search, so it is helpful but it's not replacing anything. Every month since the launch of ChatGPT has been hype cycles that lead to nothing.
@ "its not replacing anything" man do you live under a rock
It doesn't matter if it's not the best model, IT IS GOOD ENOUGH FOR 99% OF THE WORLD'S USE CASES!
This is the best explanation of this issue that I’ve seen
R1 is fire ngl🔥🔥
Actually, a lot of the Common Crawl data on the web contains ChatGPT outputs, and that could also be the reason DeepSeek thinks it is ChatGPT.
*It is the real OPEN language model in every metric and every aspect*
Tbh, many people don't really understand ChatGPT. As software engineers we are kind of lucky that we pick up AI developments faster than other professions, so I am very optimistic about the decision-making.
This puts to rest the DEI and H1B versus Real Skill argument.
Boutta pack it up boys. I’ll just open a speakeasy for all the future senior AI Prompt “Engineers”
What’s impressive is that they trained this model with a fraction of the cost. At least that’s what they say!
I always enjoy your daily problems, but it's always refreshing to see your take on many things in Tech. Keep making these man!
Also the fact they did all of this without all the expensive chips is just impressive!
DeepSeek benefited from the already clean and compressed data and that's why it's way faster than the other one that shall not be named.
5:48 DeepSeek does have to comply with Chinese law. But if you run it locally it will give you all the answers your heart desires 🇨🇳😅
Appreciate the casual / non-hypebeast way you went about giving your take.
I think you are correct that the other models will go open source. The money will be in the infrastructure to run this stuff for business applications.
Making these things open source will allow more innovation on how the models are used to generate value. Then you will sell that service to some company and for the same reason people use AWS you will run it on someone else's infrastructure.
How is it censored if it’s open source? Cant we just identify any potential biases and code them out to run locally? Isn’t it just censored when accessed from mainland China through an API?
The censorship is happening at the weight level, during the alignment fine-tuning.
@thanks for the response, can you please elaborate a bit on this more? Does this mean the ultimate end product that we would be installing locally would already have the biases “baked in” so to speak? Sorry my terminology isn’t precise but I would like to know as much as possible about this.
@@goodfractalspoker7179 For sure man, no worry about the terminology you express yourself very clearly.
If you get the model to run locally with Ollama and you ask it what happened in Tiananmen Square in 1989, here is the response:
"
I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses."
There is no API being called to a backend, the "censorship" is happening at the weight level.
Basically, the model went through the following steps:
1. pre-training with internet scale data (for general performance)
2. fine-tuning with specific data points for specific performance (i.e. math, coding, etc.)
3. alignment with specific set of response behavior the model should exhibit.
All models at the alignment step will have the bias of the research team aligning them. This step is basically a training step where the weights are modified to reward a certain style of output (i.e. no swear words, no NSFW content, and in the case of DeepSeek, alignment with CCP rules).
So you can maybe find prompt workarounds to bypass this limitation, but the model as-is is censored in a particular way.
After further research, it looks like if you run DeepSeek locally it is indeed uncensored.
@@goodfractalspoker7179 Hey sorry I wrote a reply earlier, but it seems it didn't send it out.
It is still censored at the weight level because of the alignment finetuning step.
Basically when training such language model you have 3 steps:
1. pre-training with internet scale data for performance.
2. fine-tuning on specific data set for performance.
3. alignment on specific format and set of behavior for preference.
In this case the model is aligned with CCP rules and general helpfulness.
With Ollama locally, if you ask it this "can you explain to me what happened in tiananmen square in 1989?"
It will output this:
I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.
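For anyone who wants to reproduce this locally, here is a minimal sketch of hitting Ollama's local HTTP API from Python. It assumes Ollama is running on its default port 11434 and that you have already pulled a DeepSeek-R1 distill tag such as `deepseek-r1:7b` (the tag name is an assumption, use whatever you pulled):

```python
# Minimal sketch: ask a locally hosted model a question and print the reply.
# The request never leaves localhost, so no remote backend is involved.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",   # Ollama's default local endpoint
    json={
        "model": "deepseek-r1:7b",            # assumed tag; substitute your own
        "prompt": "Can you explain to me what happened in Tiananmen Square in 1989?",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```

Since nothing is sent to a remote server, any refusal that comes back is produced by the weights themselves, which is the point being made above.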
DeepSeek AI is cool because I can see what it's doing, so I can fill in the information behind its assumptions, and then it figures out the answer better.
This is the Jio moment for LLMs.
Well, I think most investors are not programmers... and when they see such headlines, of course their first reaction would be to panic sell.
I think after a few weeks stocks are going to stabilize and start to move up... It's too early for the AI bubble to pop.
PS: World be damned... I am gonna go grind leetcode
same here
The large language model idea was originally copied from Google by ChatGPT in violation of terms and agreements; the difference is that ChatGPT marketed it.
How can I trust that post? What is the proof that the post is not edited? 7:10
great point, makes sense why it cost so little
I'm optimistic that this is going to open the startup world again.
This gave us basically free locally hosted AI that we can freely integrate into our systems in creative ways.
It's easier for the US to buy it cheap at $1.0 bil from China, rebrand it as Made-in-US, then value-add it up to a few billion dollars. This is an example of how cooperation can bring win-win benefits to both sides 😂
Interesting take, thanks for the insight!
I wonder if the companies will find a way to prevent the use of their api to replicate their models?
The whole "It's cheap because it's made in China" just went out the window.
It's quality, now.
I have been using OpenAI for a long time now, and I have noticed that DeepSeek's output style is very similar to OpenAI's, such as emojis at the end, friendly conversation, and appreciating the effort when a human is asking for help, but I didn't find this behaviour in Claude or Gemini. Something is not right with the claim that they used limited resources and a limited training period??
The only question I have is about whether they used ChatGPT for the training data (I'm imagining a traditional distillation process). They would get the logprobs of the generated sequence over the API and use that to train the model. However, OpenAI does not provide the reasoning tokens, so they wouldn't be able to do this with the reasoning steps, only the output. In the original STaR paper they mention providing correct answers to the model if it does not generate a correct sequence (it uses its CoT to justify the provided correct answer). Did they just do this for the whole training process, where "correct" is judged against the o1 output? I'm a little confused, as you can't just distill from o1 outputs. If they just trained RL from the pretrained base state, wouldn't they have the same compute requirements, with maybe a slight advantage due to their architectural improvements?
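A toy illustration of the distinction raised in this question, using made-up logits over a five-token vocabulary: "soft" distillation needs the teacher's full per-token distribution (logprobs), while "hard" distillation only needs the tokens the teacher actually emitted, which is all an API typically returns and is effectively just supervised fine-tuning on the teacher's outputs:

```python
# Contrast of soft vs. hard distillation losses on made-up logits.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([2.0, 0.5, -1.0, 0.1, -2.0])   # hidden behind the API
student_logits = np.array([1.0, 1.0, -0.5, 0.0, -1.0])
p_t, p_s = softmax(teacher_logits), softmax(student_logits)

# Soft distillation: KL(teacher || student) over the full distribution.
soft_loss = np.sum(p_t * (np.log(p_t) - np.log(p_s)))

# Hard distillation / SFT: cross-entropy on only the token the teacher emitted
# (here taken as its argmax for simplicity).
teacher_token = int(np.argmax(teacher_logits))
hard_loss = -np.log(p_s[teacher_token])

print(f"soft loss (needs logprobs): {soft_loss:.3f}")
print(f"hard loss (needs only the sampled text): {hard_loss:.3f}")
```

Which is why, as the question notes, any distillation from o1 could at best be the hard variant over the visible outputs, not over the hidden reasoning tokens.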
missed these rants
"I was hating him before it was cool" - had me rolling 🙂
I 100% agree. There is a lot of misinformation going around in the market. I use Ollama to run 8B models on my PC, and for the first time I am getting usable outcomes, from the DeepSeek-R1 8B models, which unfortunately I did not get from other open-source models like Llama 3.x. I am excited that soon Meta will release something similar to R1. At the end of the day it's a win for the open-source community. I don't know why the media and some billionaires are comparing this to the '60s space race between the US and Russia, because the circumstances surrounding LLMs are very different. The current scenario is more akin to the Linux vs Windows fight of the '90s and 2000s. At the end of the day, open source will win again.
I saw someone predict a similar situation 2 or 3 months ago at an AI conference held in the UAE. His name is Mohammad Moshrif, as I remember.
Clarification appreciated
I don't understand: if DeepSeek uses ChatGPT, what does DeepSeek own?
When we say "model", does it mean only a prompt interpreter?
Good stuff. I understand how the training costs can be so low by optimizing training data from other existing models, etc. But can anyone explain to a deep learning noob like me how/why the inference/api costs are so much lower compared to other existing reasoning models?
This is probably due to a smaller number of parameters. DeepSeek has 671 billion; ChatGPT has approximately 1.8 trillion.
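One caveat worth adding, as rough back-of-the-envelope arithmetic: DeepSeek-V3/R1 is a mixture-of-experts model, so of its 671B total parameters only about 37B are reported to be active per token, and the 1.8T figure for GPT-4 is an unverified rumor; treat both numbers as illustrative:

```python
# Illustrative per-token cost comparison; figures are reported/rumored, not verified.
deepseek_total = 671e9     # total parameters (reported)
deepseek_active = 37e9     # parameters active per token (reported, MoE)
gpt4_rumored = 1.8e12      # rumored total for GPT-4 (unconfirmed)

# A dense model touches all of its weights for every token;
# an MoE model touches only the active subset.
print(f"active fraction of DeepSeek's weights: {deepseek_active / deepseek_total:.1%}")
print(f"rough per-token work vs the rumored GPT-4 size: {deepseek_active / gpt4_rumored:.1%}")
```

That, together with the cheaper multi-head latent attention mentioned elsewhere in the comments, goes a long way toward explaining the low inference and API pricing.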
The way these dudes were shitting on IT guys, saying they'd replace them as soon as they could, really shows you what we are to them: just an asset. I'm happy DeepSeek came to kick down the door.
6:42 I didn't really understand this part. So they just used their API, with their own inputs as the input and the API response as the expected output for their model? Wouldn't this take an enormous amount of API calls? Also, I thought the final version of a model depends on the training data (which would effectively be the same when using the API), but also on the internal model structure; or are the internals of LLM models always the same?
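To make the mechanics being asked about concrete: the usual picture is calling a teacher model's API many times, saving (prompt, response) pairs, and fine-tuning your own model on them; the student's architecture does not have to match the teacher's at all. A hedged sketch using the standard OpenAI-compatible chat API, where the model name and prompt list are placeholders:

```python
# Sketch of building a synthetic SFT dataset from a teacher model's API.
# Fine-tuning on the resulting file is a separate step, and the model being
# trained can have any architecture.
import json
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment
prompts = ["Explain binary search.", "Prove that sqrt(2) is irrational."]  # placeholders

with open("synthetic_sft.jsonl", "w") as f:
    for p in prompts:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",                          # assumed teacher model name
            messages=[{"role": "user", "content": p}],
        ).choices[0].message.content
        # One (input, expected output) pair per line, ready for supervised fine-tuning.
        f.write(json.dumps({"prompt": p, "response": reply}) + "\n")
```

And yes, at any scale that would actually matter for training, this means a very large number of API calls.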
oh wow thanks for providing the context
Is DeepSeek's censorship only in the frontend, or in the API too?
Do you call it open source even though the data and code are unavailable? Not just DeepSeek but also Llama and other LLMs.
Well anyways, I am going back to my LeetCode grinding because these boomers love DSA.
8:40 "China did not show they're catching up, they showed that everybody is already on an equal footing."
damn that's just beautiful, glad to know ai bs was actually kinda bs
also the statement of not even the experts knowing was pretty profound to me too
Will you use this model? Or are you mostly using gpt instead?
If by next month OpenAI doesn't give us a better deal, I will definitely quit it for good and not come back to it again. The same goes for Claude and Perplexity.
9:49 What I think is Apple might have an advantage here, especially with the Mac mini M4: people can now run these open-source models for their niche applications without having to pay 20 or 200 dollars a month, and have control over their data (they will be running it locally).
Exciting time ahead!!! Open source for 🏆
can youtubers please stop with these "later on..." segments at the beginning, it's just another form of clickbait
What should be unsettling for anyone recently invested in the tech AI darlings is how this seemed to come out of nowhere. There may be cataclysmic disruptions just behind the curtain, and nobody knows about it. I'm staying away from those companies and investing in boring stuff.
violated terms of service just like openai violated copyrights lmao
If anyone here knows, could someone tell me: is it true that it requires far less money to power DeepSeek, or is China lying? Or is the US lying about it needing to cost so much? It is so bizarre to me that there is such a huge gap between the costs....
I don't know anything about stocks, so can anyone tell me why Nvidia is affected by the open-sourcing of an AI model?
I can't see why it would be affected; it will only increase demand in the AI arms race.
Isn’t this video supposed to be on your main channel ?
I've been saying this on all forums, comment sections and other places: why do we care if it censors Chinese political issues? Just be grateful that they didn't go the OpenAI route and basically changed the course of genAI.
To be fair, the only really innovative techniques DeepSeek introduced are GRPO and Multi-head Latent Attention; everything else is built on open source. Still impressive, though. I hope OpenAI and Google will introduce new models.
Thanks for calling out SamA.
Also, how did DeepSeek get the data out of OpenAI's models? By just using their API? How long does that take, and wouldn't it have flagged anything on OpenAI's side?
Wonder why all the tech co's aren't building models to end corruptions... Wait, they can do that?!?
DeepSeek is faster and smarter than o1, but I think o1 has better prompt following, which is why DeepSeek generates bad responses sometimes.
6:16 to 8:22 -- most important part. you're welcome.
LFG, a big win for open source and an L for the tech oligarchs
best robinhood story of our time
Still waiting for the part of the video where you explain how it changes everything. Everything within a very small, narrow scope?
Does this mean that AI can never stay closed source forever?
Does this also mean that AI would rarely improve as companies would prefer to copy each other instead of innovating?
I swear to god, most software-adjacent YouTubers talking about AI are just shills and click-baiters; the ones I know with a good track record of assessing things fairly are Prime and you, where I can trust I'll get quality content before even starting.
Right now, the thing pissing me off the most is that some YouTubers I actually like are marketing R1 as an AI you can run locally, which is huge misinformation. Even before R1, everyone who has used an LLM locally knows the trade-offs of finding a version that fits into your GPU, so we choose lower parameter counts and quantized bits, but we know the huge gap between what we try to run locally and the full-fledged SOTA we demo online.
The best one regarding AI is Internet of Bugs.
Prime was a huge ai doomer forever so i lost respect for him on that one
Thanks buddy for your videos.
What impact does it make on software engineer jobs?
chances are OpenAI will just keep their best models private. They will sell to the government and huge companies, not individuals.
I don't think they copied OpenAI or used OpenAI, because DeepSeek R1's results are far better; for programming-language problem solving, OpenAI is not good.
Long live the open source.
How much of the AI industry (and Nvidia et al. hype) is not LLMs?
Hey, can someone explain to me:
1. How can DeepSeek "copy" OpenAI models?
2. What is the hardware angle of this story that crashed Nvidia?
If this is true, then how is it a win for everyone? If someone invests years and millions of dollars into something while someone else keeps stealing it, in the end it will lead to a slowdown in innovation.
It's just like having a super smart Chinese friend.
Can't talk to her about CCP, too touchy
Thanks for the transparency, bro.
Elon musk knew how shady OpenAI was
Pots and kettles and all that
Elon Musk is a shady individual himself, so maybe it takes one to know one.
DeepSeek shows me, as an investor: why put down so much money when you don't really have to?
Ok, it's time to wonder the same... will this replace us or not?
It's a "Digital Dai Li" agent.
We Have No Moat, And Neither Does OpenAI
Make your bed homie, be smart AAANNND tidy
How are e-sports / sports bulletproof against AI competition while every other industry isn't :D
Will deepseek replace programmers?
It's better because of the MoE architecture.
Anything that puts the US tech bros in their place is a good thing. They've spent too long fiddling while Rome burns.