Open source Deepseek is truly a gift for mankind. You can run the full model with about a $6,000 hardware budget, completely free and with 100% privacy… This is a huge help for scientific research that needs to protect data security. We never thought this could happen back in 2024, when OpenAI was the only option for a reasoning model.
$6,000 to run the current R1 model on your own hardware is probably not too bad for a midsize company. But in six months or so, can or will they be able to update it? And what does that mean for the $6,000 machine?
I have read many papers, but to sum up, R1 has five main advantages: *1) it gives you the reasoning behind its thoughts, so if you find a mistake you can tell it to correct it 2) it is much more DEPLORABLE, it's like when they first invented the Personal Computer (PC)!! You don't have to have a huge data center or a large number of GPUs to run it; in fact, you can even run it on your phone without internet 3) it is cheaper and faster, of course 4) most of all, it is free 5) open source, so you can edit it and update it any way you like*
Any one of the reasons above would be a game changer by itself, but with the combination of all five you get a stock crash like yesterday's.
Do you mean "Deployable" and not "deplorable"?
Locally executed deepseek-r1:32b (only an 18.48GB file!!!) on an RTX 4090 ran mostly faster than in your video and was still very verbose and understandable.
question 1: pass
question 2: pass
question 3: pass
question 4: pass
question 5: pass - note: this one took quite a bit longer but answer didn't have any formatting issues
question 6: pass - note: slightly different emphasis on last point - 'she' but explained correctly
question 7: pass
question 8: pass - note: also pass for misspelled word
I didn't bother with comparing privacy policies.
Overall I am very impressed that an 18.48GB model can actually reason better than the average human being and has more specialized knowledge than the average human being... really, it is totally nuts, especially if you consider that this thing can read and write in most languages known to man and I don't need an internet connection to access it. It runs on the same GPU as your average path-traced game... yeah, it does need some processing power to be snappy and nice to use. In theory, however, any computer with 32GB of RAM should be able to mull through all the tokens and generate similar responses.
Now I really feel like I am living in XXI century.
Only flying cars are missing now, and then we'll actually be flying and not only talking!
How much does your computer cost? Which specific R1 model is it? R1 14B?
@@taijistar9052 It's the 32B model; it runs on a Mac (M series) with 32GB of RAM.
I ran the Distill Llama 8B version locally and got these results:
1. Pass
2. Pass, though it's the same single answer as o1
3. Pass, note: it got u≈0.8879c in the answer section, though the final boxed answer is rounded to 0.89c
4. Fail, it said the chicken came before the egg.
5. Fail, it answered "The missing $3 was taken by the waiter as an unintended profit."
6. Pass
7. Pass
8. Pass
9. Fail, it said there are 3 "r"s; it didn't see the misspelling and counted using "strawberry" instead.
Not the best result, but I also thought it did very well for an 8B model that fits on my 9-year-old RX 470 8GB, way better than any other similarly sized model I've tried.
What quantization did you use? And what top-p and temperature?
lol @ ''Now I really feel like I am living in XXI century.''
Okay, looks like ChatGPT was already updated after your video :) because it gave the correct answer:
Version 4o quickly answered:
"9.9 is bigger than 9.11. This is because 9.9 is equivalent to 9.90, and when comparing 9.90 vs. 9.11, 9.90 is clearly larger."
And model o1 took 39 seconds to respond but also gave the correct answer:
"Mathematically, 9.9 (which you can think of as 9.90) is larger than 9.11 (9.110). The integer part (9) is the same, but .90 is greater than .11, so 9.9 is the bigger number."
Revisit this in six months. I’m betting that the whole landscape looks different.
Deepseek is currently experiencing a massive DDoS attack; that's why it sometimes won't load
Nice test. I've been using deepseek a lot, and it was much, much faster before the malicious attacks started.
With that being said, it feels like o1 has either been dumbed down or they lowered the thinking time, as that would explain getting questions like the 9.9 vs 9.11 one wrong, which it used to get right. Even 4o would get that right. It seems like they always dumb down their models before releasing a new one so you can see the "vast improvement", in this case before o3-mini releases within the next few days.
Interesting note: deepseek had an error in the misspelled strawberry prompt. It had the right number of letters, but then it went on to say: "Note that the correct spelling of the fruit is 'strawberry' (with 2 'r's)"
It means 2 consecutive "r"s at the end, not 3; it doesn't mean 2 in total.
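For what it's worth, both letter counts the thread is debating are trivial to check in a couple of lines of Python:

```python
# The deliberately misspelled prompt word vs. the correct spelling.
misspelled = "strawberrry"
correct = "strawberry"

print(misspelled.count("r"))  # 4 total "r"s (one early, then three in a row)
print(correct.count("r"))     # 3 total "r"s (one early, then two in a row)
```

So the model's "(with 2 'r's)" remark only makes sense as a count of the consecutive run at the end, not the total.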
Great comparison. In my case o1 answered correctly that 9.9 is more than 9.11. Very weird. I tested Alibaba's new Qwen 2.5 and I really like it for generating free images.
Yeah, I don't know why it sometimes gets this wrong. I've tried the Qwen website; I keep getting errors
Is 9.11 bigger than 9.9?
It depends on context. If it's a version number, 9.11 could be regarded as bigger than 9.9
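The two readings are easy to show in a few lines of Python (a quick sketch; `version_tuple` is just an illustrative helper, not a library function):

```python
# As decimal numbers: 9.9 is 9.90, which is larger than 9.11.
as_numbers = 9.9 > 9.11
print(as_numbers)  # True

def version_tuple(v):
    """Parse a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))

# As version numbers: "9.11" means (major 9, minor 11), which sorts after (9, 9).
as_versions = version_tuple("9.11") > version_tuple("9.9")
print(as_versions)  # True
```

Both comparisons are "correct" in their own context, which is exactly why the question trips up models that don't settle on one interpretation.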
Also, at the end you mention the privacy comparison, which is great. Quick question: don't LLMs keep evolving and improving? So if you download one locally, would it stay as effective, or would you have to keep downloading newer versions? And what is the minimum version you think is worth downloading?
You have to keep downloading newer versions to get improvements. Once you download one, it is frozen at whatever "intelligence" it has and will never improve. Models do not evolve on their own, at least not yet. Maybe in the future, someone will come out with one that does, but for now, no.
I love these AI comparisons. Gotham Chess just did a ChatGPT vs. Deepseek chess game, with um, interesting results.
You can install them locally with ollama and add a simple web frontend to use them. Will it be as fast? Well, I am running pretty beefy gear, and I am not disappointed.
It seems R1 is similar to o1 Pro. Do you agree?
Great video as usual. I saw your video on the privacy settings on DeepSeek and it worried me a little. What are your thoughts in that regard? Not sure if I should just accept that or stick with the free ChatGPT. Any thoughts?
I think the fact that you can't opt out of training data is a big problem with deepseek, mainly if you use it for work. The other privacy issues are a matter of personal preference
Can you build a PC and run the actual R1 model? I mean, if we can do that locally, it will completely wipe out the need for paid ChatGPT,
as there will be zero privacy risk in running the whole R1 model locally.
DS is being cyber attacked
I am using a MacBook M1 with 8GB RAM, and I want to use DeepSeek R1 for my studies. I think seeing the thinking process behind each answer is great for learning faster, but I've run into many performance issues: the 1.5B model can't answer my questions as well as the bigger models do, and even with the 7B I'm having too many performance problems. How can I use a quantized version for a faster and smoother experience?
Hi, thanks for the awesome video! I just wonder if PROJECT DIGITS is good enough to run DeepSeek locally (I believe it is). It seems like perfect timing.
I'm a big fan of ChatGPT because, at this point, it's so personalized that it almost feels like I'm in the ChatGPT ecosystem, which is very practical. But I believe that at some point, having something like DeepSeek, the highest/top model, running locally might be the answer.
What do you think?
Love your videos. Just started with AI stuff, running 8B locally thanks to you 😊 Is there any way to modify it and connect it to the internet, like a Telegram bot API?
Thanks. I don't know a way to add web access to the local installs
These models pass:
Question 1: GPT 3.5, 4o, Sonnet 3.5, Deepseek V3, Qwen R1 32B
Question 2: GPT 3.5 (1/2), Sonnet 3.5 (1/2), Deepseek V3, Qwen R1 32B
Question 3: 4o, Sonnet 3.5, Deepseek V3, Qwen R1 32B
Question 4: 4o, Deepseek V3, Qwen R1 32B
Question 5: GPT 3.5, 4o, Sonnet 3.5, Deepseek V3, Qwen R1 32B
Question 6: GPT 3.5, 4o, Sonnet 3.5, Deepseek V3, Qwen R1 32B
Question 7: 4o, Deepseek V3, Qwen R1 32B
Question 8: 4o, Deepseek V3, Qwen R1 32B
Deepseek V3 didn't get a single question wrong, so the reasoning wasn't even necessary for those questions.
Qwen R1 is the distilled version of Deepseek R1 at 32B
Interesting - thanks for sharing
Try Gemini flash thinking
Yea it's pretty good. On my list of videos
This is where it proves that you don't need to finish first if you think right, just as my math teacher taught us a long time ago.
The speed measurements don't mean anything because the models are hosted on different servers with different numbers of users!
If you turn on ChatGPT o1's "think" mode, it takes longer to think. Also, I tried deepseek counting the r's in "strawberrry" without R1, and it was actually very fast, but with R1 it took a lot longer.
oh interesting. Thank you for that
I tried DeepSeek to summarize a book and provide the main purpose of each chapter. It got it wrong even when informed of the error. Wrong again and again.
worked fine for me
Deepseek is free
It just needs text-to-speech like OpenAI has, and it will be perfect
GOD says that HE made chickens first according to Genesis in the Bible.
File uploads & internet search make DS far more useful/powerful
Uhhh ChatGPT has both of those. And ChatGPT lets you upload videos and images for analysis whereas deepseek doesn't.
o1 doesn't have search and has more limited file upload. Like, you can't upload a CSV, even though you can do both with 4o
@@henrythegreatamerican8136 o1 has no search and only accepts image uploads, so it's basically useless
@@damien2198 Then I must have used a different version of ChatGPT. I was uploading numerous images yesterday of homes for analysis to see if the rooftops were metal or some other material.
No version of Deepseek let me do that.
DeepSeek's Chinese website is more of a basic prototype... Not something with billions in funding, or much need to care a ton about how international users respond.
I imagine they don't even really want all the international users that are increasing their compute costs while not paying anything.
If the site ends up becoming a serious product, they'll probably add in more data controls and such.
How did o1 miss one (early on)? Your question was "Which animals (and how many of each) did you buy?" You never asked how many variations fit this answer. o1 provided a correct answer.🙄
It is from a test. The answer key had two answers. Every other model gave me two answers as well. Pretty straightforward. Omitting another correct answer makes it incorrect, or at least half correct.
But why are you using DeepSeek - it has NO OPT-OUT OPTION!!!!!11 So scary.
Yea I pointed that out as the number one issue in the video
Thanks, you should try o1-preview, which I find is better than o1
I really enjoy Deepseek and how they show their reasoning. But I still find the answers they provide to be misleading about 40% of the time.
I found that when I ask it to reason through subject areas where I know the answers, it will still make up answers at an alarming rate.
While other models will do the same, if you call them out they will admit they were making things up, and if they don't know how to answer they will tell me.
Deepseek tends to admit it as well, but keeps providing incorrect information.
The steps are amazing, but I'm not convinced overall.
I tend to use its reasoning to work through problems and then ask other models.
Anyone else have similar experiences?
Check the new OpenAI deep thinking option. This "amazing" update makes o1 Pro dumber; maybe it is o3 :)
Oh you are ahead of me. How do I turn that on?
@@SkillLeapAI Choose "o1 Pro" and click the lightbulb next to the paperclip; the model works 10× faster but is 2× dumber.
❤❤❤❤❤DeepSeek ❤❤❤❤❤❤
save all the animals stop wearing them next to your dixxx
agreed save family nucleus m and f is only true marriage by divine providence