MiniCPM-V 2.6 Training Guide

  • Published: Feb 9, 2025
  • MiniCPM-V is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image, video and text as inputs and provide high-quality text outputs.
    MiniCPM-V 2.6: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model surpasses GPT-4V in single image, multi-image and video understanding. It outperforms GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet in single image understanding, and advances MiniCPM-Llama3-V 2.5's features such as strong OCR capability, trustworthy behavior, multilingual support, and end-side deployment. Due to its superior token density, MiniCPM-V 2.6 can for the first time support real-time video understanding on end-side devices such as iPad.
    📚:modelbest.feis...

Comments • 4

  • @alexanderholthoer8972 10 days ago

    Fantastic!
    Could this model be trained to analyse sports techniques? This is a game changer!!!

  • @Widrohi 5 months ago

    Watching it 13th time .. Love you Guys Love the Girl which explains to me even 13th time lol. will be here for 130000th time too..

  • @Widrohi 5 months ago

    Thanks for the explanations.
    Your model really performs very well. Kudos to all the hard work done.
    Guys please hire me XD.

  • @aipreacher9378 5 months ago

    Good work