InternLM-2.5 (7b) : This NEW Model BEATS Qwen-2 & Llama-3 in Benchmarks! (Fully Tested)

  • Published: 8 Aug 2024
  • In this video, I'll be telling you about the newly released InternLM-2.5 7B model. It comes with a 1M-token context limit, which is really amazing. The model claims to beat Qwen-2, Llama-3, Claude, DeepSeek, and other open-source LLMs, and it also claims to beat Qwen-2, DeepSeek Coder, and Codestral in all kinds of coding tasks. I'll be testing it out in this video — watch to find out more about this new model.
    ------
    Key Takeaways:
    🌟 InternLM 2.5 Launch: Just launched, InternLM 2.5 is the latest AI model, outperforming Llama 3 and Gemma 2 9B in practical scenarios.
    🚀 7 Billion Parameters: With 7 billion parameters, InternLM 2.5 offers outstanding reasoning capabilities and a long context window, perfect for complex AI tasks.
    🏆 Benchmark Dominance: InternLM 2.5 excels in MMLU, CMMLU, BBH, and MATH benchmarks, showcasing superior performance against larger models.
    🔧 Tool Usage: InternLM 2.5 excels at tool usage, making it ideal for applications that involve web search and other integrated tools.
    📊 Real-World Performance: Despite benchmark success, real-world performance is where InternLM 2.5 shines, particularly in coding tasks with its 1M-long context window.
    💻 Available on Major Platforms: Now accessible on Ollama, HuggingFace, and more, making it easy to test and integrate InternLM 2.5 into your projects.
    🤖 Hands-On Testing: Watch as we put InternLM 2.5 through various language and coding tasks, highlighting its strengths and weaknesses.
    ------
    Timestamps:
    00:00 - Introduction
    00:07 - About InternLM-2.5 (7B with 1M Context)
    01:16 - Benchmarks
    03:03 - Testing
    07:53 - Conclusion
  • Science

Comments • 30

  • @user-no4nv7io3r
    @user-no4nv7io3r 1 month ago +11

    They train their models on benchmarks, claim to beat everyone else, and then turn out to be trash in most cases. What a crazy world we are living in.

    • @superakaike
      @superakaike 1 month ago

      They also train their model on ChatGPT answers ...

  • @wolraikoc
    @wolraikoc 1 month ago +8

    A copilot video with this model and Neovim would be awesome!

    • @Link-channel
      @Link-channel 1 month ago

      I wonder how to integrate autocompletion in Vim... no wait, I wonder how to use Vim

  • @nahuelpiguillem2949
    @nahuelpiguillem2949 1 month ago +1

    Thank you for doing an honest review; it's rare to find someone saying "I tested it and it's not worth it". Sometimes the latest thing isn't the best.

  • @BadreddineMoon
    @BadreddineMoon 1 month ago +2

    I'm addicted to your videos, keep up the good work ❤

    • @user-no4nv7io3r
      @user-no4nv7io3r 1 month ago +1

      @@BadreddineMoon Me too, especially his voice, tone, and critiques — that's magical

  • @waveboardoli2
    @waveboardoli2 1 month ago +4

    Can you show how to use claude-engineer with opensource models?

  • @sammcj2000
    @sammcj2000 1 month ago +2

    I'd be interested in you trying it for coding with a number of different sampling parameters (top-p/k, temperature, repetition penalty, etc.)
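For anyone who wants to try this themselves: since the video runs the model through Ollama, those sampling knobs can be passed under the `options` key of Ollama's `/api/generate` endpoint. A minimal sketch follows — the option names match Ollama's documented API, but the model tag `internlm2:7b` and the specific values are assumptions, not tuned recommendations.

```python
import json

def build_generate_request(model, prompt, temperature=0.2, top_p=0.9,
                           top_k=40, repeat_penalty=1.1):
    """Build the JSON body for a POST to http://localhost:11434/api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,        # lower -> more deterministic code
            "top_p": top_p,                    # nucleus-sampling cutoff
            "top_k": top_k,                    # cap on candidate tokens
            "repeat_penalty": repeat_penalty,  # discourages repetitive output
        },
    }

# Hypothetical model tag; substitute whatever tag you pulled with `ollama pull`.
body = build_generate_request("internlm2:7b",
                              "Write a Python function that reverses a string.")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent with any HTTP client, e.g. `curl http://localhost:11434/api/generate -d @body.json`, and the experiment repeated while varying one option at a time.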

  • @Revontur
    @Revontur 1 month ago +1

    As always, a great video... thanks for your effort. Is there any site where you publish your tests? It would be really great to compare new models with previously tested models.

  • @RedOkamiDev
    @RedOkamiDev 1 month ago

    Thanks Mr. AiKing, you are my daily source of AI news :)

  • @pudochu
    @pudochu 1 month ago

    6:47 Where can I find these tests? It would also be great if they had answers.

  • @jaysonp9426
    @jaysonp9426 1 month ago

    You didn't test needle-in-a-haystack or what it does with 1M tokens?

  • @tianjin8208
    @tianjin8208 1 month ago

    The Intern series always trains their models on eval datasets; it's their style. They need to surpass others quickly, so this is the fast way.

  • @paulyflynn
    @paulyflynn 1 month ago

    What size codebase will a 1M-token context support? Is there a LOC-to-token formula?
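There is no exact formula — it depends on the language and the model's tokenizer — but a rough back-of-envelope is possible. Assuming code tokenizes at somewhere around 7–10 tokens per line (an assumption; measure with the model's own tokenizer for real numbers), a sketch like this estimates the fit:

```python
# Rough estimate of how many lines of code fit in a context window.
# TOKENS_PER_LOC is an assumed average (~7-10 for typical code).
TOKENS_PER_LOC = 8

def loc_budget(context_tokens, tokens_per_loc=TOKENS_PER_LOC):
    """Approximate lines of code that fit in `context_tokens`."""
    return context_tokens // tokens_per_loc

print(loc_budget(1_000_000))  # 125000 LOC under the 8-tokens/line assumption
```

Under that assumption, a 1M-token window holds on the order of 100k+ lines of code — though in practice you also need to leave token budget for the prompt and the model's response.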

  • @SpikyRoss
    @SpikyRoss 1 month ago

    Hey, it would be great if you could add links to the model in the description. 👍

  • @LucasMiranda2711
    @LucasMiranda2711 1 month ago

    Which one was the best tested so far? Is anyone keeping score anywhere?

    • @AICodeKing
      @AICodeKing  1 month ago +1

      Currently, Qwen-2 is topping my list for general tasks, and DeepSeek-Coder-V2 for coding.

  • @elchippe
    @elchippe 1 month ago

    Draw a butterfly in SVG? That task would be hard even for a large LLM like Claude, and much more so for a 7B LLM.
    The transformer architecture's biggest drawback is the inability to rethink backwards; that is why these models mostly fail at such puzzles.

    • @AICodeKing
      @AICodeKing  1 month ago +1

      I generally do that test to check whether the LLM can create something similar. Claude & GPT can do this. Also, I don't do different tests for smaller models; the tests are similar whether it's 7B or 300B.

  • @EladBarness
    @EladBarness 1 month ago

    Hype for nothing; I wouldn't count on it for anything... thank you for the video!

  • @john_blues
    @john_blues 1 month ago

    If it can't build a basic Python script, why would I want it chatting with my codebase? Anyway, thanks for the video and the actual testing on this.

  • @MeinDeutschkurs
    @MeinDeutschkurs 1 month ago

    The model seems horrendous! Thanks for saving my time.

  • @Richi-8
    @Richi-8 1 month ago

    Which one do you consider to be the best model for general tasks nowadays?

  • @Lemure_Noah
    @Lemure_Noah 1 month ago

    This model is good on benchmarks, but it doesn't seem to be better than other modern models like Llama-3, Phi-3, or even Mistral 7B, at least in my internal review, dealing with summarization and other language tasks.
    If someone could give a real-world example where it performs better than other models in the same class, please share it ;)

  • @aryindra2931
    @aryindra2931 1 month ago

    Please make 2 a day ❤❤, I like the videos

  • @LazarMateev
    @LazarMateev 29 days ago

    Merge Maestro with Claude Engineer and Aider into one. Make it an open-source model orchestration that recalls the initial prompt with access to RAG, and you would be the king of kings 😊 Locally hosted web apps look like a very cool niche

  • @hollidaycursive
    @hollidaycursive 1 month ago

    Pre-watch Comment