Imp-V1-3B: How a Tiny Model is Beating Giants in Multimodal LLM Space

  • Published: 11 Sep 2024
  • Join me on an exciting journey as we dive deep into the world of multimodal small language models (MSLMs) with a special focus on the ground-breaking "Imp project". In this tutorial, I unveil the capabilities of imp-v1-3b, a potent MSLM with a mere 3 billion parameters, crafted by integrating a compact yet formidable small language model, Phi-2, and an advanced visual encoder, SigLIP.
    Discover how imp-v1-3b stands tall among its peers, not only outshining models of similar size but even surpassing the performance of the much larger LLaVA-7B model across a variety of multimodal benchmarks. This video is your ultimate guide to understanding and utilizing this powerful model, which is trained on the comprehensive LLaVA-v1.5 dataset.
    I'll walk you through practical examples demonstrating the model's prowess in generating test cases for application screenshots, analysing stock charts, and providing insights into medical images, among other use cases. Whether you're a developer, researcher, or enthusiast in the fields of AI and machine learning, you'll find valuable insights and inspiration on how to leverage the power of imp-v1-3b for your projects.
    Stay tuned as I also share a sneak peek into the model's architecture, the secret sauce behind its efficiency, and how you can get started with using the model weights in your own applications.
    Don't forget to like, comment, and subscribe to my channel for more updates on this and other exciting developments in the world of Gen AI and machine learning. Your support helps me create more content like this. If you have questions or would like to see more use cases, feel free to drop a comment below. Let's embark on this learning adventure together and unlock the full potential of multimodal small language models!
    Join this channel to get access to perks:
    / @aianytime
    GitHub Code: github.com/AIA...
    HF Repo: huggingface.co...
    #multimodal #ai #llm
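To get started with the model weights mentioned above, here is a minimal sketch based on the usual Hugging Face `transformers` remote-code loading pattern. The repo id `MILVLG/imp-v1-3b` is the public HF repo; the exact chat template, the `-200` image-token id, and the `model.image_preprocess` helper follow my reading of the model card and may differ from the current repo code, so treat this as an assumption-laden outline rather than the official recipe:

```python
def build_prompt(question: str) -> str:
    """LLaVA-style conversation template used by imp-v1-3b; the <image>
    placeholder marks where the image embeddings are spliced in."""
    return (
        "A chat between a curious user and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite answers "
        f"to the user's questions. USER: <image>\n{question} ASSISTANT:"
    )


def run_demo(image_path: str, question: str) -> str:
    """Load the model and answer one question about one image.

    Heavy imports live here so the prompt helper above stays dependency-free.
    """
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "MILVLG/imp-v1-3b",
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,  # the modelling code ships inside the HF repo
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "MILVLG/imp-v1-3b", trust_remote_code=True
    )

    prompt = build_prompt(question)
    # Tokenize around the placeholder and splice in the image token id (-200,
    # per the model card) where the visual features will be injected.
    chunks = [tokenizer(c).input_ids for c in prompt.split("<image>")]
    input_ids = torch.tensor([chunks[0] + [-200] + chunks[1]], dtype=torch.long)

    image_tensor = model.image_preprocess(Image.open(image_path))
    output_ids = model.generate(
        input_ids, images=image_tensor, max_new_tokens=150, use_cache=True
    )[0]
    return tokenizer.decode(
        output_ids[input_ids.shape[1]:], skip_special_tokens=True
    ).strip()
```

A call like `run_demo("screenshot.png", "Write test cases for this screen.")` matches the screenshot use case from the video; note the first run downloads several GB of weights.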

Comments • 26

  • @souvickdas5564
    @souvickdas5564 7 months ago +2

    How can I build an LLM for native languages like Hindi, Bengali, and Tamil on low-resource infrastructure? Is there any framework that supports this?

  • @xgodo-com
    @xgodo-com 3 months ago

    Is it possible to train this model for grounding tasks, such as object localization? Similar to what CogAgent does, but on my custom dataset.

  • @Ayushsingh019
    @Ayushsingh019 7 months ago

    Great effort! Could you please do a video on InstructBLIP with a custom dataset or a medical dataset?

  • @souvickdas5564
    @souvickdas5564 6 months ago

    I am having a problem with input context length. For example, given a research paper, I am trying to find relevant papers in a vector DB containing 2,000 papers. How can I fit the entire research paper into the input? Is there any way to solve this? Also, the vector DB is huge. Is there a way to manage it efficiently?

  • @krishnagupta-ti8ch
    @krishnagupta-ti8ch 7 months ago

    Ultimate bro, thanks for sharing ❤

  • @user-iu4id3eh1x
    @user-iu4id3eh1x 7 months ago

    Fantastic thanks for sharing

  • @user-me9gf5js8i
    @user-me9gf5js8i 7 months ago

    Hi bro, your videos are very helpful. Could you please make a video on implementing multimodal capture and multimodal rendering using Dialogflow CX?

  • @muhammedajmalg6426
    @muhammedajmalg6426 7 months ago

    great work, thanks for sharing!

    • @AIAnytime
      @AIAnytime  7 months ago

      Thanks for watching!

    • @MukeshSharma-xq9nm
      @MukeshSharma-xq9nm 7 months ago

      @AIAnytime Hey bro, a request: RAG for Excel data insights, using a good open-source LLM for data summarization and understanding.

  • @user-do4oi4do1v
    @user-do4oi4do1v 7 months ago

    Hello sir, how do you recognize overfitting while fine-tuning, and how do you then keep improving the fine-tuned model? Can you make a video?

  • @kollaindrakotiekshith8896
    @kollaindrakotiekshith8896 7 months ago +1

    Make more videos on medical models.

  • @sam5598
    @sam5598 5 months ago

    Informative! Can you do a video on attacks on tiny models?

    • @xspydazx
      @xspydazx 5 months ago

      You can create a model from its config file, i.e. "model from config" (it will not download the base model, but will generate a new base model with random weights). So if you know the size of the embeddings, the size of the context window, etc., you can write a config file and instantiate a new model with, say, 16 layers to make a 3B model... but it will need training. (They train quite fast.)
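The technique this reply describes can be sketched with Hugging Face `transformers`: build a config by hand, then construct the model class directly from it, which yields a randomly initialized network without downloading any weights. The tiny GPT-2 config below is just an illustration; a real 3B-parameter model would need far larger dimensions, as the comment notes:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hand-written config instead of a pretrained checkpoint. Instantiating
# the model class from this config creates a network of exactly this
# shape with random weights -- it still needs training to be useful.
config = GPT2Config(
    n_layer=2,        # scale this up (e.g. 16+ layers, larger n_embd) toward ~3B params
    n_embd=64,
    n_head=4,
    n_positions=128,  # context window
    vocab_size=1000,
)
model = GPT2LMHeadModel(config)  # random init, no download

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params:,}")
```

The same pattern works for any architecture with a config class in `transformers`; only the config fields change.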

  • @July_Nov-r5o
    @July_Nov-r5o 7 months ago

    Can you create a powerful RAG-based search and summarization pipeline for Excel sheets? I have found many videos on RAG, but it only works best for PDFs. Thanks, any idea will be appreciated.

  • @SonGoku-pc7jl
    @SonGoku-pc7jl 7 months ago

    Thanks, it's fantastic :)

    • @AIAnytime
      @AIAnytime  7 months ago

      Glad you like it!

  • @MrKB_SSJ2
    @MrKB_SSJ2 7 months ago

    How can I fine-tune an LLM such that it only outputs JSON?

  • @saumyajaiswal6585
    @saumyajaiswal6585 7 months ago

    Please make a video with LLaVA where the chatbot returns images along with the text in its answers, pulled from a PDF. Will it also work better for tables in PDFs than PandasAI and Llama 2? 🙏

  • @AngelWhite007
    @AngelWhite007 7 months ago

    Amazing

    • @AIAnytime
      @AIAnytime  7 months ago

      Thank you! Cheers!

  • @ARkhan-xw8ud
    @ARkhan-xw8ud 7 months ago

    Any open-source multilingual model?
