Sonnet is a force to be reckoned with when it comes to coding.
I gave DeepThink a system prompt beforehand and rewrote your confetti prompt, because when you prompt Cline you phrase it as "write a website in HTML, CSS, and JS", whereas in your test you say "you can use CSS and JS". I got the confetti in one shot; just press "Run HTML" in the window... Let's add a 2nd test, update the questions, and A/B test the tests to really test these models.
Oh lord! Just tested it, it's wild! Loving it! I'm hoping the API cost stays the same as now; if that's the case I will forget Heroku quickly!
What do you mean by "you will forget Heroku"?
Do they also have some kind of Bedrock alternative?
@@ANSHU61936 Claude Haiku, sorry! Wrote too quickly 🙏
I'm curious about how to drop Heroku and why you could do that once DeepSeek's API costs little.
@@vdbv0 Same question again: why are you dropping Heroku? Do they also provide some kind of AI model?
The AI's response to Test #3 was correct but it would have been interesting if it had been able to further speculate that C possibly was the other person playing table tennis with E unless E was playing solo with the table against the wall.
Now this is interesting to see. Finally a new model showing highly promising results. Well, let's see what I think of it. Also, forgive me if it is a bit chaotically structured, I am writing this as I watch the video. With that out of the way, let us get started!
As weird as it is, I would consider test one neither a fail nor pass, as what the model went through eerily resembled a human being stunned by a question, and not seeing the logical answer immediately. Hard to say how this can be improved, but I theorise the problem may solve itself once the model is given more time to think without rushing. Maybe even having it change perspectives at some point?
Moving on to test #4, we can tell that it did objectively fail, but the reasoning chain was obviously halted prematurely, presumably by the system itself to limit the number of tokens spent on thinking. Smart choice by DeepSeek, yet obviously a performance limiter in cases like these. Would love to see what the answer would be if it were given as much time as it wants. Then again, if they release the model as open source down the line, these compute problems can be solved easily by the users themselves (assuming, of course, that it is not an absurdly big model, which unfortunately does seem to be the case here. Would love to be wrong about the size though.)
Then again, more compute time will not solve every problem, as we can see from test #9, where it was unable to create a proper website for the confetti. Unfortunately, we did not see the code run outside of DeepSeek's own environment, which may itself be the limiting factor in this case rather than the model. For more rigour, the code should also have been run through more conventional means just in case, much like how the Python code was executed externally. (Also, do pardon me if my assumptions about DeepSeek's environment are incorrect; I am not that familiar with web frameworks or their execution.) I would say something similar for test #12, but I did not catch whether the DeepSeek environment was used, so I am forced into mere speculation here.
Sorry for the long paragraph, but moving on to Test #11, I would consider it a fail from an artistic perspective, but the model itself was most likely not trained on SVG creation, so the expected potential is rather low. However, it is still impressive that it created a general shape of a butterfly.
All in all, a very, very exciting model. Especially if it is able to be used on most systems.
Well, this is a great result for a model named "Lite", which can almost beat o1, not just o1-mini. I'm very sure the "Large" one might beat Sonnet as well, so we could have the greatest open-source model, and one much, much cheaper than Sonnet. Can't wait for another brilliant move from this company.
Interesting comparison. I would love to see the API coming out so we can implement it in our own apps
Have you tested AYA? Great for structured outputs.
WOW. Finally, deepseek!
While the "thinking outside the box" is impressive I think the AI failed Test #1 for 2 reasons. First, the AI said >>there doesn't appear to be any country with an official English name that ends in "lia."
With regard to question 2: shouldn't C be playing table tennis with E? If no one else is in the house and C is not playing, who is E playing with?
Claude Sonnet 3.5 is the best choice for coding.
good one, this will really be a game changer
First to view, first to comment.
This is quite impressive. I have used it and the results are amazing.
Nice, open models are getting close. Can't wait until we can run Cline locally and build full-stack applications with no limits.
Great job. I disagree with the opinion about CoT and coding. In the case of complicated architectures, thinking step by step should provide better results.
I very much like the art style of the images at the beginning of your videos. Would you mind sharing the prompt for this art style? It would be greatly appreciated.
It's very basic.. Something like "A panda in a forest, in front of a campfire, cinematic, anime style".
I wonder if this is usable for writing a thesis? Anyway, I don't think its monologue is necessary.
logically, your 3rd question has a better answer than "unknown"
Statement 1 says there are 5 people in a house, naming them. 4 people are given activities with the 5th (C) not being mentioned.
E, however, is playing table tennis, a 2 player game. Logically, E is playing with C, because there are 5 people in the house and table tennis cannot be played alone.
I wonder if the answer to question 3 should be "playing table tennis". It's hard to imagine that E is playing tennis solo, right..?
I'm having trouble creating a markdown file of the PixiJS API. Something about the URL syntax prevents it from being scraped. Any advice?
Can you test the new Mistral Large 2411?
Open source model this good is crazy, just hope it's not like 500 GB
It's okay if it's 500 GB; if we can't run it locally, we can use theirs, which is crazy cheap compared to Sonnet and GPT.
E
Edit: I just now noticed on the top commenter with the most hearted comments, that is a pog
How do you use this open-source model in something like Aider or VS Code extensions/apps?
you need to pay to use their api
Suppose we combine it with the coder model☠️
I think you need a new benchmark 😂
first and bro answer how to make aider detect my local project?
It should probably detect it automatically.
Are you running it in the project folder/repository?
@@AICodeKing I am using Linux and it does not. I think I should do /save then /load, no?
@@Gorops yep
😊
Yoooooo😂🎉
openai has no moat
Haha, 8th one to comment. It would be cool if you could show how we can access these LLMs for free without any limits.
Well, just install it in Ollama or LM Studio and use it locally. But be sure you have a powerful GPU, or it will run very slowly.