How LLaVA works 🌋 A Multimodal Open Source LLM for image recognition and chat.

  • Published: 5 Jan 2024
  • Arxiv Dives is a group from Oxen.ai of engineers, researchers, and practitioners that gets together every Friday to dig into state-of-the-art research related to Machine Learning and Artificial Intelligence. If you would like to join the live discussion, we would love to have you!
    Join here:
    lu.ma/oxenbookclub
    Each week we dive deep into a topic in ML/AI. Whether it is a research paper, a blog post, a book, or a YouTube video, we break down the content into a digestible format and have an open discussion with the Oxen.ai team and anyone else who wants to join. We try to cover the content at a high enough level that anyone can understand it, and then dive into deeper technical details to get a clearer understanding.
    This week we cover the LLaVA paper, a multimodal model that combines image recognition with an LLM through a chat-like interface, lowering the barrier to entry for many computer vision tasks.
    All the notes and previous dives can be found on the Oxen.ai blog:
    blog.oxen.ai/tag/arxiv-dives/
  • Science
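The mechanism the paper describes can be sketched in a few lines: LLaVA keeps a frozen CLIP vision encoder and trains a projection (a single linear layer in the original paper) that maps image patch features into the language model's token-embedding space, so the LLM consumes them like ordinary tokens. Below is a minimal sketch; the dimensions are illustrative and `LlavaProjectorSketch` is a hypothetical name, not a class from the repo.

```python
import torch
import torch.nn as nn

class LlavaProjectorSketch(nn.Module):
    """Sketch of LLaVA's vision-to-language bridge.

    Illustrative dimensions: CLIP ViT-L/14 patch features are 1024-d;
    Vicuna-13B token embeddings are 5120-d.
    """
    def __init__(self, vision_dim=1024, llm_dim=5120):
        super().__init__()
        # A single trainable linear layer maps image patch features
        # into the LLM's token-embedding space.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features):
        # patch_features: (batch, num_patches, vision_dim), taken from
        # the frozen CLIP vision encoder's grid features.
        return self.proj(patch_features)

# The projected "visual tokens" are concatenated with the text token
# embeddings and fed to the LLM as one sequence.
bridge = LlavaProjectorSketch()
fake_patches = torch.randn(1, 256, 1024)   # e.g. a 16x16 patch grid
visual_tokens = bridge(fake_patches)
print(visual_tokens.shape)                 # torch.Size([1, 256, 5120])
```

Training then happens in two stages: first only this projection is tuned to align the two modalities, then the projection and the LLM are fine-tuned together on instruction-following data.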

Comments • 6

  • @albertmashy8590 7 months ago +1

    Good video

  • @Pingu_astrocat21 5 months ago +2

    How can we fine-tune LLaVA on a custom image-caption dataset?
    Thank you for uploading this video :)

    • @oxen-ai 5 months ago

      It looks like they have some instructions in their GitHub repo! github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md
      Also, if you end up trying to fine-tune, or need some people to collaborate with, let us know in our Discord: discord.com/invite/s3tBEn7Ptg
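For reference, the custom-data doc linked above describes a conversation-style JSON format. A minimal sketch of building one training record in that format (the id, filename, and caption here are made up for illustration):

```python
import json

# One record in the format Finetune_Custom_Data.md describes: an id,
# an image filename, and a human/gpt conversation. The "<image>" token
# marks where the image features are spliced into the prompt.
record = {
    "id": "0001",
    "image": "0001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe this image."},
        {"from": "gpt", "value": "A volcano erupting at dusk."},
    ],
}

# The training script expects a JSON list of such records.
with open("custom_caption_data.json", "w") as f:
    json.dump([record], f, indent=2)
```

An image-caption dataset maps onto this by treating each caption as the "gpt" turn of a one-round conversation.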

  • @Akshatgiri 5 months ago +1

    That man is stressed. Give him a vacation

  • @bennguyen1313 5 months ago

    I have PDF files of handwritten data that I'd like to OCR, perform calculations on, and finally edit or append the PDF with the results.
    I like the idea of using a Custom GPT, but only GPT-4 Plus subscribers can use those. So I'd prefer a standalone browser or desktop solution that anyone can drag and drop a file into. However, I'm not sure if the ChatGPT-4 Assistants API has all the Vision / AI PDF plugin support.
    If using LLaVA + Ollama, would anyone who wants to use my application also need to install the 20 GB Ollama?

    • @oxen-ai 5 months ago

      This is a good question - I haven't tried Ollama yet, but it would be a cool integration to try. If you end up getting it working, let us know in our Discord! I'm sure people there would be interested.
      discord.com/invite/s3tBEn7Ptg
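On the deployment question: Ollama runs as a local server, so in a LLaVA + Ollama setup each user's machine would need Ollama and the pulled model weights, with the application talking to that local server. A minimal sketch of building a request for Ollama's documented /api/generate endpoint (the prompt and image bytes here are placeholders):

```python
import base64

# Sketch of a client-side request body for a locally running Ollama
# server that has pulled the "llava" model. Images are sent as base64
# strings in the "images" field.
def build_llava_request(prompt, image_bytes):
    return {
        "model": "llava",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

payload = build_llava_request("What is in this picture?", b"\x89PNG...")
# A real client would POST this as JSON to
# http://localhost:11434/api/generate and read the "response" field.
```

An alternative that avoids the per-user install is hosting Ollama (or another LLaVA server) centrally and having the app call it over the network.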