Local Real Time AI Speech to Image | Stable Diffusion, Faster-whisper, Python, ComfyUI ++

All About AI

17K views

Published: Dec 10, 2024

Comments • 42

@-Evil-Genius- 10 months ago ⁺⁵
🎯 Key Takeaways for quick navigation:
00:00 🎙️ *Introduction to Speech to Image App*
- Demonstration of the speech to image app.
- Initial test with voice commands to generate images.
- Introduction to combining speech with YouTube audio.
02:15 🔄 *Components of Low Latency Speech to Image*
- Overview of the components involved in low-latency speech to image.
- Flowchart showing the microphone, Faster Whisper, the ComfyUI Python extension, and the Stable Diffusion model.
- Mention of the need for a separate tutorial for detailed setup.
03:41 🖱️ *ComfyUI and Python Extension*
- Introduction to ComfyUI for Stable Diffusion model workflows.
- The role of the ComfyUI Python extension in converting the workflow into Python code.
- The simplicity of setting up ComfyUI for desired workflows.
05:49 🎛️ *Setting Up Faster Whisper for Audio*
- Explanation of setting up Faster Whisper for audio transcription.
- Reference to a previous tutorial on configuring Faster Whisper.
- Availability of Faster Whisper on the community GitHub.
07:12 🐍 *Python Code Overview for Speech to Image App*
- Walkthrough of the Python code implementing the speech to image app.
- Explanation of functions and nodes in the code.
- Customization options for parameters like prompt length and image size.
09:22 🌐 *Selecting Stable Diffusion Model and Flask App*
- Choosing the Stable Diffusion model using Civitai.
- Creating a Flask app to display the generated images in real-time.
- Brief overview of the back-end and front-end functionalities.
11:54 🎬 *Testing Different Use Cases*
- Testing the app with a YouTube video from The Joe Rogan podcast.
- Additional tests with a bedtime story, Taylor Swift music video, and a MrBeast video.
- Impressions and reactions to the results of each test.
13:05 🚀 *Conclusion and Future Development*
- Expressing enjoyment in building and testing the app.
- Plans for future development and improvements.
- Encouragement to become a member for access to the GitHub and further content.
Made with HARPA AI
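The flow summarized above (audio in, Faster Whisper transcription, a trimmed prompt, then an image workflow) could be sketched roughly as follows. This is a minimal sketch and not the code from the video: `build_prompt`, `speech_to_image`, and the `generate_image` callback are hypothetical names, and `generate_image` merely stands in for whatever Python the ComfyUI extension exports. Only the `WhisperModel` calls come from the faster-whisper package, and keeping the most recent words is an assumed way to implement the "prompt length" setting.

```python
from typing import Callable, Iterable, List


def transcribe_file(path: str, model_size: str = "base") -> List[str]:
    """Transcribe an audio file with faster-whisper and return the segment texts."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=5)
    return [segment.text for segment in segments]


def build_prompt(segments: Iterable[str], max_words: int = 20) -> str:
    """Join transcript segments and keep only the most recent max_words words."""
    if max_words <= 0:
        return ""
    words = " ".join(segments).split()
    return " ".join(words[-max_words:])


def speech_to_image(
    path: str,
    generate_image: Callable[[str], object],
    transcribe: Callable[[str], List[str]] = transcribe_file,
    max_words: int = 20,
) -> object:
    """Transcribe audio, build a prompt, and pass it to an image generator.

    generate_image is a hypothetical callback (assumption): in the setup the
    summary describes, it would wrap the exported ComfyUI workflow.
    """
    prompt = build_prompt(transcribe(path), max_words)
    return generate_image(prompt)
```

Passing the transcriber and the image generator in as callables keeps the prompt logic testable without a GPU or model download.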
@adventurelens001 10 months ago ⁺⁸
well done! You're one of the few channels actually moving this forward with real examples and use cases.
@AllAboutAI 10 months ago ⁺²
thnx mate :)
@tensiondriven 10 months ago ⁺³
Totally love it; I've been hacking together a realtime STT -> LLM + RAG system, pretty amazing that we can do so much with off-the-shelf stuff. The image generation is an interesting sort of curiosity, but I think we could get some real value if all the text was saved with timestamps to a database, then when certain phrases are detected, we could trigger an LLM to answer a question or even perform a task with something like CrewAI. So cool!! please keep making!
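The idea in this comment, saving transcribed text with timestamps and reacting to certain phrases, could look roughly like the sketch below. It is only an illustration under assumptions: the table layout, the function names `open_log` and `log_segment`, and the simple case-insensitive substring match are all hypothetical, and the follow-up action (an LLM call or a CrewAI task) is left to the caller.

```python
import sqlite3
import time
from typing import Iterable, List, Optional


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with a timestamped transcript table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcript (ts REAL NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def log_segment(
    conn: sqlite3.Connection,
    text: str,
    triggers: Iterable[str],
    ts: Optional[float] = None,
) -> List[str]:
    """Store one transcribed segment and return the trigger phrases it contains."""
    ts = time.time() if ts is None else ts
    conn.execute("INSERT INTO transcript (ts, text) VALUES (?, ?)", (ts, text))
    conn.commit()
    lowered = text.lower()
    # Case-insensitive substring match; a real system might want fuzzier matching.
    return [phrase for phrase in triggers if phrase.lower() in lowered]
```

A caller would check the returned list after each segment and dispatch whatever action belongs to the matched phrase.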
@brando2818 10 months ago ⁺²
Omg. This is great. Could easily take this and add some logic where a person could create blog articles simply by talking.
@12around1 10 months ago
these are golden guides. appreciating your content and considering becoming a member if i can afford it after the paycheck is smashed to survive.
keep em coming!
@kawsarahmad 10 months ago ⁺⁴
Amazing as always man! Wonder what ideas will come to reality next...
@samuelsamuel5505 10 months ago ⁺³
Application: This can replace sign language. This could be refined and used to communicate with the deaf
@RyanSmith-rb1ch 10 months ago ⁺¹
I love you exploring with this kind of stuff.
@ryanjames3907 10 months ago ⁺²
You are at the tip of the spear, thank you for sharing this.
@AllAboutAI 10 months ago
np =)
@FBHearty 10 months ago
Subscribed!
All subjects are amazing!
Unfortunately not a member for some obvious reasons,
please share some stuff for non members you are the best user of AI I saw on the net
in the mind I love, offline and open source tools.
My English is not so good, I have to watch again and again to catch the spirit of your videos,
Some of your experiences with transcription provide an approach to breaking down the language barrier,
and more generally, to universal communication.
Thank you very much for your fascinating demonstrations!
@music_anarchy 10 months ago
That's awesome! So much you could do with this!!
@UnleashWukong 10 months ago
People have been so terrified of AI taking over the world. For me, this is the most exciting and fun time in development history, since the dawn of the internet! AI has made everything so much more streamlined, time efficient and productive. What a great time!
@gregas3068 6 months ago
Really great stuff. Hats off, mister...
@dannyquiroz5777 10 months ago ⁺¹
Great job Kris
@rapidreplay360 10 months ago ⁺³
please make full tutorial and instructions on github members
@AllAboutAI 10 months ago ⁺¹
will do
@sirrobinofloxley7156 10 months ago ⁺²
In the cheap seats here, ie not a member, but I would love to see the full version of this, and I think it would go crazy viral and do your channel a great, great service by getting you tons of views... But, that's just my thought if you are to release the full version. : )
@AllAboutAI 10 months ago ⁺¹
thnx :) yeah might do that
@sirrobinofloxley7156 10 months ago
@AllAboutAI Great, good luck : )
@patrickctaylor 10 months ago
Very cool! I would like to see a full tutorial, and review the code too.. How large were the model and sensor downloads?
@avgplayer 10 months ago ⁺¹
Full tutorial appreciated
@AllAboutAI 10 months ago ⁺¹
noted :)
@tonywhite4476 10 months ago ⁺¹
I have access to github but I don't see this repo
@AllAboutAI 10 months ago
uploading soon :)
@raphaelmercadobinario 10 months ago ⁺³
This is sooo cool, ehehhe
@casper6532 9 months ago
Is there a full tutorial?
@KeithdNeves 9 months ago
Super cool!
@musumo1908 10 months ago
This rocks! Yes tutorial please. What level of membership to get access?? What spec HW to run this….Linux server?? Windows thx
@VaibhavShewale 10 months ago ⁺¹
i tried to run comfy ui and it gave me blue screen of death to my laptop
@nrixxking2123 10 months ago
this is amazing.. if this goes really well i would love to try this and even willing to pay for it.
@FSK1138 10 months ago ⁺¹
this would be great for converting audio books into comics or movies
persistent characters would also be good
this is amazing please develop this more !!!
@sirrobinofloxley7156 10 months ago
Yes, haha... Could literally put on an audio and watch the brand new movie every time. Would be good to have different slants, aka themes... Outer space version, underwater version, Ancient Rome version etc... The world is an oyster... oops, careful though : )
@build.aiagents 10 months ago
Phenomenal
@Boemie_Gayatri 10 months ago
Maybe the next development could be making the images move, that is, video
@hardcorebyjoshely 10 months ago ⁺¹
Fire
@MinnaGoiken 10 months ago
How can I become member to access your github ? The link shows nothing. ruclips.net/user/AllAboutAIjoin
@MinnaGoiken 10 months ago
I found the reason. I am living in Georgia. I can not become member on YouTube channel from this country. It is so sad. Please give me the other way to see your github. Thank you.
@Nerf_Jeez 8 months ago ⁺¹
AHAHAHHAHAHA, love this!
