InternLM-2.5 (7b) : This NEW Model BEATS Qwen-2 & Llama-3 in Benchmarks! (Fully Tested)

  • Published: 8 Aug 2024
  • In this video, I'll be telling you about the newly released InternLM-2.5 7B model. It comes with a 1M-token context limit, which is really amazing. The model claims to beat Qwen-2, Llama-3, Claude, DeepSeek, and other open-source LLMs, and it also claims to beat Qwen-2, DeepSeek Coder, and Codestral in all kinds of coding tasks. I'll be testing it out in this video — watch to find out more about this new model.
    ------
    Key Takeaways:
    🌟 InternLM 2.5 Launch: Just launched, InternLM 2.5 is the latest AI model, outperforming Llama 3 and Gemma 2 9B in practical scenarios.
    🚀 7 Billion Parameters: With 7 billion parameters, InternLM 2.5 offers outstanding reasoning capabilities and a long context window, perfect for complex AI tasks.
    🏆 Benchmark Dominance: InternLM 2.5 excels in MMLU, CMMLU, BBH, and MATH benchmarks, showcasing superior performance against larger models.
    🔧 Tool Usage: InternLM 2.5 excels at tool usage, making it ideal for applications that involve web search and other integrated tools.
    📊 Real-World Performance: Despite benchmark success, real-world performance is where InternLM 2.5 shines, particularly in coding tasks with its 1M-long context window.
    💻 Available on Major Platforms: Now accessible on Ollama, HuggingFace, and more, making it easy to test and integrate InternLM 2.5 into your projects.
    🤖 Hands-On Testing: Watch as we put InternLM 2.5 through various language and coding tasks, highlighting its strengths and weaknesses.
    ------
    Timestamps:
    00:00 - Introduction
    00:07 - About InternLM-2.5 (7B with 1M Context)
    01:16 - Benchmarks
    03:03 - Testing
    07:53 - Conclusion
  • Science

Comments • 30

  • @user-no4nv7io3r
    @user-no4nv7io3r 1 month ago +11

    They train their models on benchmarks, claim to beat everyone else, and then turn out to be trash in most cases. What a crazy world we are living in.

    • @superakaike
      @superakaike 1 month ago

      They also train their model on ChatGPT answers ...

  • @wolraikoc
    @wolraikoc 1 month ago +8

    A copilot video with this model and Neovim would be awesome!

    • @Link-channel
      @Link-channel 1 month ago

      I wonder how to integrate autocompletion in Vim... no wait, I wonder how to use Vim

  • @nahuelpiguillem2949
    @nahuelpiguillem2949 1 month ago +1

    Thank you for doing an honest review; it's rare to find someone saying "I tested it and it's not worth it". Sometimes the latest thing isn't the best.

  • @BadreddineMoon
    @BadreddineMoon 1 month ago +2

    I'm addicted to your videos, keep up the good work ❤

    • @user-no4nv7io3r
      @user-no4nv7io3r 1 month ago +1

      @@BadreddineMoon Me too, especially his voice, tone, and critiques — that's magical

  • @waveboardoli2
    @waveboardoli2 1 month ago +4

    Can you show how to use claude-engineer with opensource models?

  • @sammcj2000
    @sammcj2000 1 month ago +2

    I'd be interested in you trying it for coding with a number of different sampling parameters (top-p/k, temperature, repetition penalty, etc.)
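For anyone who wants to try this themselves: since the video runs the model through Ollama, those sampling knobs can be passed under the `options` key of Ollama's `/api/generate` endpoint. A minimal sketch follows — the option names match Ollama's documented API, but the model tag `internlm2:7b` and the specific values are assumptions, not tuned recommendations.

```python
import json

def build_generate_request(model, prompt, temperature=0.2, top_p=0.9,
                           top_k=40, repeat_penalty=1.1):
    """Build the JSON body for a POST to http://localhost:11434/api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,        # lower -> more deterministic code
            "top_p": top_p,                    # nucleus-sampling cutoff
            "top_k": top_k,                    # cap on candidate tokens
            "repeat_penalty": repeat_penalty,  # discourages repetitive output
        },
    }

# Hypothetical model tag; substitute whatever tag you pulled with `ollama pull`.
body = build_generate_request("internlm2:7b",
                              "Write a Python function that reverses a string.")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent with any HTTP client, e.g. `curl http://localhost:11434/api/generate -d @body.json`, and the experiment repeated while varying one option at a time.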

  • @Revontur
    @Revontur 1 month ago +1

    As always, a great video... thanks for your effort. Is there any site where you publish your tests? It would be really great to compare new models with previously tested models.

  • @RedOkamiDev
    @RedOkamiDev 1 month ago

    Thanks Mr. AiKing, you are my daily source of AI news :)

  • @pudochu
    @pudochu 1 month ago

    6:47 Where can I find these tests? It would also be great if they had answers.

  • @jaysonp9426
    @jaysonp9426 1 month ago

    You didn't test needle-in-a-haystack or what it does with 1M tokens?

  • @tianjin8208
    @tianjin8208 1 month ago

    The Intern series always trains their models on eval datasets; it's their style. They need to surpass others quickly, so this is the fast way.

  • @paulyflynn
    @paulyflynn 1 month ago

    What size codebase will a 1M-token context support? Is there a LOC-to-token formula?
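There is no exact formula — it depends on the language and the model's tokenizer — but a rough back-of-envelope is possible. Assuming code tokenizes at somewhere around 7–10 tokens per line (an assumption; measure with the model's own tokenizer for real numbers), a sketch like this estimates the fit:

```python
# Rough estimate of how many lines of code fit in a context window.
# TOKENS_PER_LOC is an assumed average (~7-10 for typical code).
TOKENS_PER_LOC = 8

def loc_budget(context_tokens, tokens_per_loc=TOKENS_PER_LOC):
    """Approximate lines of code that fit in `context_tokens`."""
    return context_tokens // tokens_per_loc

print(loc_budget(1_000_000))  # 125000 LOC under the 8-tokens/line assumption
```

Under that assumption, a 1M-token window holds on the order of 100k+ lines of code — though in practice you also need to leave token budget for the prompt and the model's response.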

  • @SpikyRoss
    @SpikyRoss 1 month ago

    Hey, it would be great if you could add links to the model in the description. 👍

  • @LucasMiranda2711
    @LucasMiranda2711 1 month ago

    Which one was the best tested so far? Is anyone keeping score anywhere?

    • @AICodeKing
      @AICodeKing  1 month ago +1

      Currently, Qwen-2 is topping my list for general tasks, and DeepSeek-Coder-V2 for coding.

  • @elchippe
    @elchippe 1 month ago

    Draw a butterfly in SVG? That task would be hard even for a large LLM like Claude, and much more so for a 7B LLM.
    The transformer architecture's biggest drawback is the inability to rethink backwards; that is why these models mostly fail at such puzzles.

    • @AICodeKing
      @AICodeKing  1 month ago +1

      I generally do that test to check whether the LLM can create something similar. Claude & GPT can do this. Also, I don't do different tests for smaller models; the tests are similar whether it's 7B or 300B.

  • @EladBarness
    @EladBarness 1 month ago

    Hype for nothing; I wouldn't count on it for anything... thank you for the video!

  • @john_blues
    @john_blues 1 month ago

    If it can't build a basic Python script, why would I want it chatting with my codebase? Anyway, thanks for the video and the actual testing on this.

  • @MeinDeutschkurs
    @MeinDeutschkurs 1 month ago

    The model seems horrendous! Thanks for saving my time.

  • @Richi-8
    @Richi-8 1 month ago

    Which one do you consider to be the best model for general tasks nowadays?

  • @Lemure_Noah
    @Lemure_Noah 1 month ago

    This model is good on benchmarks, but it doesn't seem to be better than other modern models like Llama-3, Phi-3, or even Mistral 7B, at least in my internal review, dealing with summarization and other language tasks.
    If someone could give a real-world example where it performs better than other models in the same class, please share it ;)

  • @aryindra2931
    @aryindra2931 1 month ago

    Please make 2 a day ❤❤, I like the videos

  • @LazarMateev
    @LazarMateev 29 days ago

    Merge Maestro with Claude Engineer and Aider into one. Make it an open-source model orchestration that recalls the initial prompt with access to RAG, and you would be the king of kings 😊 Locally hosted web apps look like a very cool niche

  • @hollidaycursive
    @hollidaycursive 1 month ago

    Pre-watch Comment