Learn how to build with AI in my new courses here: dair-ai.thinkific.com/
Use code BLACKFRIDAY to get 35% off. The offer expires 11/29.
How much of this is the underlying model being really good, and how much is a really well-done chain-of-thought prompt?
I believe this is really the challenge here. The chain-of-thought prompt optimization and the model capabilities are both influencing the results. To what extent? We really don't know, and I think that is an interesting area of research. In my experience, CoT is useful for many classes of problems that require logic, planning, and self-reflection, but it's not even clear whether we are getting all of these capabilities in the CoT steps. It's something to analyze more closely, which is why I am excited about the possibility of getting access to the raw thinking steps and not just a summary of them, which is unfortunately what OpenAI o1 provides.
I have been using DeepSeek 2.5 for a while. In many tasks this LLM beats ChatGPT, Google Gemini, and Claude Sonnet. It is slower, but I like the usefulness of the responses. I assume that smart people at DeepSeek are developing useful models that work well with less advanced compute. Banning modern NPU/GPU chips from China creates a clear incentive to develop LLMs that run with lower NPU/GPU requirements. That will pay off soon for Chinese AI developers, while US providers like OpenAI and Microsoft will be drowning in their compute costs.
You are right. These models are really good. I think it’s good competition and their approach to the thinking steps is an interesting direction. But we definitely need alternatives and models that do more unique and efficient inference.
Can you make a video comparing it with another model in an environment?
Yes, great suggestion. I am working on that along with a few more tests included.
@elvissaravia same project, same prompts, different model b2b. Let the games begin
Okay, let's see. I have this problem about Peter and his candles. He has five candles, all the same length, and he lights them all at the same time. After some time, he blows them out one after the other, and then there's a figure showing the lengths of the candles after they've been blown out. The lengths are represented by the number of equal signs. So, candle 1 has four equals, candle 2 has seven, candle 3 has eight, candle 4 has one, and candle 5 has two.
First, I need to understand what's going on here. All candles are the same length initially, and they're all lit at the same time. That means they should be burning at the same rate, assuming they're made of the same material and are in the same environment. But then Peter blows them out one after the other, and we see their lengths after that.
I think the key here is to figure out which candle was blown out first based on how much it has burned. Since they all started at the same length and were burning at the same rate, the one that was blown out first should have burned the least, right? Because it had less time to burn before being extinguished.
Looking at the lengths after being blown out: candle 3 is the longest with eight equals, then candle 2 with seven, candle 5 with two, candle 1 with four, and candle 4 with one equal sign. Wait, actually, candle 4 is the shortest with just one equal sign, and candle 3 is the longest.
So, if the one that was blown out first has the least time to burn, it should be the one with the least length burned off. But I need to think carefully here. The number of equals represents the length remaining, not the length burned.
Wait a minute, if the equals represent the length remaining, then the candle with more equals means it has burned less, and the one with fewer equals has burned more.
So, the candle that was blown out first would have had the least time to burn, meaning it has the most length remaining, which is candle 3 with eight equals.
But, let's double-check. If all candles were lit at the same time and then blown out one after the other, the first one blown out would have burned for the shortest time, so it should have the most remaining length. Conversely, the last one blown out would have burned the longest, having the least remaining length.
Given that, candle 3 has the most remaining length, so it should be the first one blown out.
But, there's a twist. The problem says "Which of the five candles was the first one he has blown out?" and provides the lengths after being blown out.
Wait, perhaps I should consider the burning rates differently. Maybe the candles have different burning rates even if they are the same length? But the problem says they are all the same length and presumably made of the same material, so they should burn at the same rate.
Alternatively, maybe the number of equals represents the length burned, not the remaining length. But that doesn't make sense because it says "the number of = represents the length of the candle."
Hmm, this is confusing. Let me read the problem again.
"Here is a figure of the five candles after they have been blown out. The number of = represents the length of the candle."
So, the equals represent the length remaining after being blown out.
Therefore, the candle with the most equals has the most length remaining, meaning it burned the least, which would be the first one blown out.
Therefore, candle 3 with eight equals is the first one blown out.
But let's think about this again. If all candles are burning at the same rate, and the first one is blown out, it has less time to burn, so more length remains. The next one is blown out a bit later, so it has burned a little more, and so on.
So, the order from most remaining length to least remaining length should correspond to the order in which they were blown out, from first to last.
So, candle 3: eight equals (most remaining, first blown out)
Then candle 2: seven equals
Then candle 1: four equals
Then candle 5: two equals
Then candle 4: one equal (least remaining, last blown out)
Yes, that seems logical.
But perhaps there is another way to look at it. Maybe considering the time each candle was burning before being blown out.
Assuming they all started at the same time, and were blown out one after the other, the first blown out has the shortest burning time, hence most remaining length.
Yes, that aligns with what I thought earlier.
So, the answer should be candle 3.
But the problem is presented in a way that might be trying to trick me, so I need to be sure.
Another way to approach this is to consider the total burn time.
Let’s say each candle burns at a rate of 1 unit per minute.
They all start burning at time t=0.
Suppose Peter blows out the first candle at time t1, the second at t2, and so on, with t1 < t2 < t3 < t4 < t5.
Then, the remaining length of each candle would be original length minus burn rate times the time it was burning before being blown out.
Since original lengths are the same and burn rate is the same, remaining length is original length minus t1, original length minus t2, etc.
So, the candle with the most remaining length has the smallest t, meaning it was blown out first.
Yes, this confirms the earlier conclusion.
Therefore, candle 3, with eight equals, is the first one blown out.
Peter lights all five candles at the same time. Since they are the same length and burn at the same rate, the candle that was blown out first would have burned for the shortest time and thus has the most remaining length. The figure shows that candle 3 has the most remaining length with eight "=" symbols. Therefore, candle 3 was the first one Peter blew out.
Answer: Candle 3
yes, i'm also thinking candle 3
❤
Not sure about the API version, but the chat version sucks.
Could you elaborate on what you mean by "it sucks"? I am curious to know, as I think the dev experience is definitely lacking.
@elvissaravia It very quickly loses track of its task, and begins looping quite easily.
It can definitely tackle difficult problems, but once you go into real-world problems as opposed to benchmark type prompts, it quickly starts repeating the previous response instead of factoring in a reiteration of the problem statement/parameters that were already given to it before. And then it just gets stuck or loses track entirely.
For comparison, o1-mini tackles the same tasks flawlessly without losing track.
I do think it might be a token issue, though, since up to the point where it starts looping or losing track it gives very good responses, and when you home in on subsections it can elaborate quite rigorously and accurately.
@ I see. Yeah, I also observed that it repeats stuff unnecessarily, even when it got the right response for some tasks. That’s not great! They need to do a lot more work on the quality of those thinking steps. Thanks for sharing, good stuff!
Same DeepSeek, same problem, giving a different result
The first candle blown out should have been the longest one. The LLM was incorrect saying it would be the shortest.
That’s a great catch. You are absolutely correct on this. That’s my mistake. I have tested this question so many times and for some reason I completely overlooked the error the model made.
@elvissaravia I thought I was going crazy! Otherwise, great video as always.
@ to be honest, prompts are getting trickier to assess but that’s where things are headed
@elvissaravia I think you should have actually compared it side by side with o1. That way you are more likely to catch mistakes.
Peter has 5 candles that are all the same length. He lights them all at the same time. After a while, he blows out the candles one after the other. Which of the five candles was the first one he has blown out?
Here is a figure of the five candles after they have been blown out. The number of = represents the length of the candle. Respond with the label of the candle that has been blown out first by Peter.
1. ====
2. =======
3. ========
4. =
5. ==
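For anyone who wants to check a model's answer to this prompt mechanically, here is a minimal Python sketch. It assumes all candles burn at the same rate (the puzzle only states that they start at the same length), so the candle with the most length remaining burned for the shortest time and was blown out first. The names `parse_figure` and `blow_out_order` are hypothetical helpers made up for this illustration.

```python
# Minimal sketch: rank candles by remaining length.
# Assumption (not stated in the puzzle): every candle burns at the same rate.
FIGURE = """\
1. ====
2. =======
3. ========
4. =
5. ==
"""


def parse_figure(figure: str) -> dict[str, int]:
    """Map each candle label to its remaining length (number of '=')."""
    lengths = {}
    for line in figure.strip().splitlines():
        label, _, bar = line.partition(".")
        lengths[label.strip()] = bar.count("=")
    return lengths


def blow_out_order(lengths: dict[str, int]) -> list[str]:
    """Return labels from first blown out to last.

    More remaining length means less burn time, so sort descending.
    """
    return sorted(lengths, key=lambda label: lengths[label], reverse=True)


lengths = parse_figure(FIGURE)
order = blow_out_order(lengths)
print(order)     # ['3', '2', '1', '5', '4']
print(order[0])  # 3
```

The first entry is the candle blown out first, which matches the correction in the thread: the longest candle, not the shortest.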