Bravo... it seems you always have something really useful
Thanks! Will keep them coming
This Open-source model is truly revolutionary!🔆🔆
@@ragibhasan2.0 it's not Open Source
@@MarvijoSoftware I mean DeepSeek R1
🤗
It’s suuuuuch a relief to see your videos in feed, regarding something I’m interested in, and know instantly I WILL NOT be disappointed. RUclips is a long game and the cream always rises my good man. 💪
@@JayBallentine 🙏🏾 I appreciate you my good man!
It's so satisfying to see the spreadsheet of performance at the end 👍
😃
Appreciate your videos. Thanks for the comparisons and insights
@@TheBuzzati I appreciate your viewing 🙏🏾
#1 my ass. I tested it with 10+ prompts and it hallucinated a lot. It would suddenly stop generating responses in AI Studio. As of today, it's quite buggy and unreliable. I wouldn't recommend using it.
@@MrParad0x yep, I don't trust LM Arena in the slightest, especially after it ranked the weak o1-mini so high for coding
Can you compare OpenHands with the agents you've already tested? I’d like to see how it stacks up against them using DeepSeek V3 or R1.
@@moidrugag okay
Thank you
Please use this voice in every video, and also please do a Cursor (DeepSeek R1) vs Windsurf (Sonnet 3.5) video.
Alright, I'll queue it up. The problem is that Cursor + R1 don't support Composer. Cursor vs Windsurf: ruclips.net/video/duLRNDa-CR0/видео.html
I don't trust Chatbot Arena; they put Claude 3.5 Sonnet in 11th place
I also don't trust it, but I get it. It's not devs who vote on coding tasks, and people just cast random votes. That's why I believe actual benchmarks are needed, like the ones we run on the channel and the ones Aider runs. The problem with the Aider benchmarks is that LLMs can train on them because they're public
❤ Gemini 2.0
Thanks for the nice video. Please make a comparison of Roo Cline vs Aider vs Cursor
@@andrewandreas5795 A Roo-Cline video is incoming soon, after the R1 Architect video in a larger codebase
How the f is it number one on the arena? Like one or two days after release? I had to wait a lot longer to see Phi-4 on that list.
I asked myself the same question when it had just been released! So quick, who voted? Bots? Something might be off
No, DeepSeek R1 is number one; it's better than the new Gemini 2.0 Flash Thinking
I agree. That's the LMArena leaderboard which is based on random people voting
lol... those aren't three Rs, Gemini 😂😂
@@abdusalamolamide 😂
I think the new Gemini models have been tuned pretty well in terms of human preference (at least I like the newer models more than their older ones).
Claude imo is usually #1, then everyone else is about the same. However, from my usage of the model, it seems like Gemini 2.0 exp 1206's responses get pretty bad/mediocre after 60k tokens of context.
All models get dumber as tokens increase. Also, they start to output random characters past a certain context length