AI Coding o3-mini vs DeepSeek R1 (in Cursor vs Windsurf)

Marvijo AI Software

3K views

Published: 8 Feb 2025

Comments

@hammeedabdo.82 3 days ago ⁺¹
Thanks! Please make more videos like this.
@MarvijoSoftware 3 days ago
@@hammeedabdo.82 Will definitely do! Check out the other comparison videos on the channel, e.g., Cursor vs Cline: ruclips.net/video/AtuB7p-JU8Y/видео.html
@gabrielsandstedt 1 day ago ⁺²
Tip: give the entire codebase and the assignment to a reasoning model, then ask Claude Sonnet, acting as the agent, to implement the generated plan. This works REALLY well.
@MarvijoSoftware 1 day ago ⁺¹
@@gabrielsandstedt I use Cline's Memory Bank technique: docs.cline.bot/improving-your-prompting-skills/custom-instructions-library/cline-memory-bank
@gabrielsandstedt 1 day ago
@@MarvijoSoftware Oh cool, I hadn't heard of it. I will look into it.
@gabrielsandstedt 1 day ago
I wonder if it also works with Roo Code (a popular Cline fork that is more agentic).
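The plan-then-implement tip above can be sketched as a two-stage call. This is a minimal illustration, not the API of Cursor, Windsurf, Cline or any model provider: `architect` and `editor` are hypothetical functions that each send a prompt to a model and return its text reply (for example, a reasoning model such as R1 or o3-mini for planning and Claude 3.5 Sonnet for implementation), and the prompt wording is an assumption.

```python
from typing import Callable, Tuple

# Hypothetical wrapper type: any function that sends a prompt to a model
# and returns the model's text reply.
ModelFn = Callable[[str], str]


def plan_then_implement(codebase: str, task: str,
                        architect: ModelFn, editor: ModelFn) -> Tuple[str, str]:
    """Two-stage workflow: a reasoning model writes a plan, a coding model carries it out."""
    # Stage 1: the reasoning model sees the whole codebase and the task, and only plans.
    plan_prompt = (
        "Read the code and the task below, then write a step-by-step "
        "implementation plan. Do not write the code yourself.\n\n"
        f"TASK:\n{task}\n\nCODE:\n{codebase}"
    )
    plan = architect(plan_prompt)

    # Stage 2: the coding agent receives the plan plus the code and implements it.
    implement_prompt = (
        "Implement the following plan against the code below.\n\n"
        f"PLAN:\n{plan}\n\nCODE:\n{codebase}"
    )
    result = editor(implement_prompt)
    return plan, result


if __name__ == "__main__":
    # Stub models so the sketch runs offline; real API clients would go here.
    def fake_architect(prompt: str) -> str:
        return "1. Add an auth client\n2. Add a login page"

    def fake_editor(prompt: str) -> str:
        return "<diff produced by the coding model>"

    plan, result = plan_then_implement("<repo contents>", "Add authentication",
                                       fake_architect, fake_editor)
    print(plan)
    print(result)
```

Keeping the stages as separate calls makes it easy to swap the planning model without touching the implementing one.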
@jonathan-k7z4y 1 day ago ⁺¹
Love this content! I'd appreciate you testing RooCline :)
@MarvijoSoftware 1 day ago
@@jonathan-k7z4y Will definitely test it for you!
@SurvivalKompass 4 days ago ⁺³
Thanks! Great test! I love Sonnet! 🙏
@m.f.mfazrin8720 3 days ago ⁺³
With every new LLM, agents struggle to understand the model's output. Claude Sonnet does well here: it produces responses in the same format across versions, and each version just improves accuracy, which is great for agents.
1 day ago ⁺¹
I run Cline and RooCline with Gemini's free API inside Windsurf. Best of all worlds
@MarvijoSoftware 1 day ago ⁺¹
Niiice! How's the performance?
1 day ago
@MarvijoSoftware Claude still writes better code, but Gemini is fast and its massive context window is amazing. It fits my entire codebase with no issue. Best of all, it's free for about 30 calls per minute; after that, wait a minute and go again.
But whenever it struggles, I just switch back to Cascade with Claude to fix bugs.
@HudsonAtwell 4 days ago ⁺²
Ty
@renierdelacruz4652 3 days ago ⁺¹
Great video, I love it, but you should make a video comparing Cline running an open-source model against Cline running a closed model.
@MarvijoSoftware 3 days ago ⁺¹
@@renierdelacruz4652 Will do. Which models do you want to compare?
@renierdelacruz4652 3 days ago
@MarvijoSoftware Could be DeepSeek R1 vs Claude Sonnet or o1.
@saklıdosya 23 minutes ago
I've got about 5 forks of VS Code. Windsurf seems to be the best, but I really want to use my own API keys, since I have credits on OpenRouter and don't want to pay for a subscription. We can do that in Cursor, but it defeats the whole purpose of the agentic features. What would you recommend?
@aculz 3 days ago ⁺¹
I agree. Sonnet has been around a bit longer than the other two, so Windsurf and Cursor have tuned their tools around the Sonnet model. That's why it performs well.
I think we just need to wait a bit until both IDEs are tuned for both of the newer models too.
@MarvijoSoftware 3 days ago
@@aculz I agree!
@MarvijoSoftware 4 days ago
Hi. Let me know which AI tools you'd like compared, and please consider supporting the channel using the Thanks button.
@MarvijoSoftware 3 days ago
TL;DR / text takeaways:
Here are the findings from the review of o3-mini and R1 in Cursor vs Windsurf, on a 240k+ token codebase. The task was to integrate Supabase Authentication into the app:
**TL;DR: When using Cursor or Windsurf on a relatively large codebase, Claude 3.5 Sonnet still seems to be the best option**
- o3-mini isn't practical yet, in either Cursor or Windsurf. It's buggy, error-prone and doesn't produce the expected results
- Claude 3.5 Sonnet is still the best coder compared with the three reasoning models tested so far: o3-mini, R1 and Gemini 2.0 Flash Thinking
- We might be approaching this wrong by coding with reasoning models; they're supposed to do the planning/architecting. For example, R1 + 3.5 Sonnet are the best AI coding duo in the Aider Polyglot benchmark (ref: aider.chat/docs/leaderboards/)
- I'll see how R1 and o3-mini compare as software architects, paired with DeepSeek V3 and Claude 3.5 Sonnet. This should be the ultimate SOTA test
- We shouldn't miss the point: using AI coders shouldn't take as much time as a human developer would. If it takes more than 60% of a human developer's estimated time, it's probably not a good model... or the prompt needs to be refined
- If prompt engineering + AI coding takes as long as the human developer's estimate, we're missing the point
- Either Cursor and Windsurf are optimized for Claude 3.5 Sonnet, or Claude 3.5 Sonnet is just extremely optimized for coding and would be better named Claude 3.5 Sonnet Coder. We know it's a good coder, but in theory it shouldn't be competing with R1, since it's not a reasoning model
- It would be great to see how o3-mini-high performs in both Cursor and Windsurf
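The 60% rule of thumb in the takeaways above can be written as a quick check. This is a minimal sketch: the function names are hypothetical, "AI hours" is assumed to include prompt engineering plus AI coding time, and the 0.6 threshold is the rule of thumb from the comment, not a measured constant.

```python
def ai_time_ratio(ai_hours: float, human_estimate_hours: float) -> float:
    """Fraction of the human developer's estimate spent on prompting plus AI coding."""
    if human_estimate_hours <= 0:
        raise ValueError("human_estimate_hours must be positive")
    return ai_hours / human_estimate_hours


def ai_coding_paid_off(ai_hours: float, human_estimate_hours: float,
                       threshold: float = 0.6) -> bool:
    """True if the AI-assisted session stayed at or under the threshold (60% by default)."""
    return ai_time_ratio(ai_hours, human_estimate_hours) <= threshold


# Example: a task a developer estimates at 10 hours.
print(ai_coding_paid_off(5, 10))  # 5 h is 50% of the estimate -> True
print(ai_coding_paid_off(7, 10))  # 7 h is 70% of the estimate -> False
```

Expressing the result as a ratio keeps the check independent of task size, so a 2-hour and a 2-day task are judged on the same scale.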
@anon1999-h5j 2 days ago ⁺¹
Do a video testing Augment Code. I've had more success with it in a large project than with Cursor or Windsurf.
@MarvijoSoftware 2 days ago
@@anon1999-h5j I tested it; I'll see if I can share some feedback.
@JozefVodicka 3 days ago ⁺¹
Please always show your .windsurfrules so we know what context you give the models. Models are more accurate when given project rules.
@doublebucketz4661 3 days ago ⁺¹
Cursor or Windsurf for a game developer? I can never tell.
@MarvijoSoftware 3 days ago
@@doublebucketz4661 What do you use, Unity or something else? The best way is to TRY BOTH with your codebase, if it's your own product or your company permits it. Cursor is just better in general.
@MarvijoSoftware 4 days ago
Cursor vs Windsurf: Round 1: ruclips.net/video/duLRNDa-CR0/видео.html
@MarvijoSoftware 4 days ago
DeepSeek R1 vs OpenAI O1 & Claude 3.5 Sonnet - Hard Code Round 1: ruclips.net/video/EkFt9Bk_wmg/видео.html
@caseyhoward8261 4 days ago ⁺¹
Why am I getting AICodeKing vibes? 😉
@MarvijoSoftware 4 days ago ⁺¹
@@caseyhoward8261 😅 Yes, cool dude! We cover similar topics, but he focuses on free tools and general tool testing, while I focus on tools that can be used in the workplace on larger codebases. So he'd have more videos, for example 🙂
@MarvijoSoftware 4 days ago
Cursor vs Cline | 240k Tokens Codebase Side-by-Side: ruclips.net/video/AtuB7p-JU8Y/видео.html
@MarvijoSoftware 4 days ago
Aider vs Cline Using DeepSeek 3: Codebase 20k Lines: ruclips.net/video/e1oDWeYvPbY/видео.html
