🎯 Key Takeaways for quick navigation:
00:00 🎙️ *Introduction to Speech to Image App*
- Demonstration of the speech to image app.
- Initial test with voice commands to generate images.
- Introduction to combining speech with YouTube audio.
02:15 🔄 *Components of Low Latency Speech to Image*
- Overview of the components involved in low-latency speech to image.
- Flowchart showing the microphone, Faster Whisper, ComfyUI Python extension, and Stable Diffusion model.
- Mention of the need for a separate tutorial for detailed setup.
03:41 🖱️ *ComfyUI and Python Extension*
- Introduction to ComfyUI for the Stable Diffusion model workflow.
- The role of the ComfyUI Python extension in converting the workflow into Python code.
- The simplicity of setting up ComfyUI for desired workflows.
05:49 🎛️ *Setting Up Faster Whisper for Audio*
- Explanation of setting up Faster Whisper for audio transcription.
- Reference to a previous tutorial on configuring Faster Whisper.
- Availability of Faster Whisper on the community GitHub.
07:12 🐍 *Python Code Overview for Speech to Image App*
- Walkthrough of the Python code implementing the speech to image app.
- Explanation of functions and nodes in the code.
- Customization options for parameters like prompt length and image size.
09:22 🌐 *Selecting Stable Diffusion Model and Flask App*
- Choosing the Stable Diffusion model from Civitai.
- Creating a Flask app to display the generated images in real-time.
- Brief overview of the back-end and front-end functionalities.
11:54 🎬 *Testing Different Use Cases*
- Testing the app with a YouTube video from The Joe Rogan podcast.
- Additional tests with a bedtime story, Taylor Swift music video, and a MrBeast video.
- Impressions and reactions to the results of each test.
13:05 🚀 *Conclusion and Future Development*
- Expressing enjoyment in building and testing the app.
- Plans for future development and improvements.
- Encouragement to become a member for access to the GitHub and further content.
Made with HARPA AI
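For anyone who wants to picture the flow summarized above (microphone, Faster Whisper, ComfyUI/Stable Diffusion, with a configurable prompt length), here is a minimal Python sketch. It is not the code from the video: `build_prompt`, `speech_to_image_loop`, and `make_whisper_transcriber` are hypothetical names, the image step is just a callable you supply, and the rolling prompt window is an assumption about how "prompt length" might be handled.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple


def build_prompt(words: Iterable[str], max_words: int, style: str = "") -> str:
    """Keep only the most recent `max_words` words and append an optional style suffix."""
    recent = list(deque(words, maxlen=max_words))
    prompt = " ".join(recent)
    return f"{prompt}, {style}" if style and prompt else prompt


def speech_to_image_loop(
    audio_chunks: Iterable[object],
    transcribe: Callable[[object], str],
    generate_image: Callable[[str], object],
    max_words: int = 12,
    style: str = "",
) -> Iterator[Tuple[str, object]]:
    """Yield (prompt, image) pairs, one per audio chunk that contains speech."""
    history = deque(maxlen=max_words)  # rolling window of recent words
    for chunk in audio_chunks:
        text = transcribe(chunk).strip()
        if not text:
            continue  # silence or noise: nothing to draw
        history.extend(text.split())
        prompt = build_prompt(history, max_words, style)
        # generate_image stands in for the ComfyUI / Stable Diffusion step.
        yield prompt, generate_image(prompt)


def make_whisper_transcriber(model_size: str = "base") -> Callable[[str], str]:
    """Optional helper wrapping faster-whisper (needs `pip install faster-whisper`)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    def transcribe(path: str) -> str:
        segments, _info = model.transcribe(path)
        return " ".join(segment.text.strip() for segment in segments)

    return transcribe
```

With stub functions in place of the real models, `list(speech_to_image_loop(chunks, my_transcribe, my_generate, max_words=8))` returns the prompts paired with whatever `my_generate` produced, which makes it easy to try the prompt logic before wiring in a GPU.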
well done! You're one of the few channels actually moving this forward with real examples and use cases.
thnx mate :)
Totally love it; I've been hacking together a realtime STT -> LLM + RAG system, pretty amazing that we can do so much with off-the-shelf stuff. The image generation is an interesting sort of curiosity, but I think we could get some real value if all the text was saved with timestamps to a database, then when certain phrases are detected, we could trigger an LLM to answer a question or even perform a task with something like CrewAI. So cool!! please keep making!
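The idea in the comment above (save every transcribed line with a timestamp, then fire an action when a phrase is detected) could be sketched roughly like this. Everything here is an assumption for illustration: the `TranscriptLog` class, the table layout, and the handler hook are hypothetical, and the handler is only a placeholder where an LLM call or a CrewAI task might go.

```python
import sqlite3
import time
from typing import Callable, Dict, List, Optional


class TranscriptLog:
    """Store transcribed text with timestamps and run handlers on trigger phrases."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcript ("
            "id INTEGER PRIMARY KEY, ts REAL NOT NULL, text TEXT NOT NULL)"
        )
        self.triggers: Dict[str, Callable[[str], None]] = {}

    def on_phrase(self, phrase: str, handler: Callable[[str], None]) -> None:
        """Register a handler that runs when `phrase` appears (case-insensitive)."""
        self.triggers[phrase.lower()] = handler

    def add(self, text: str, ts: Optional[float] = None) -> List[str]:
        """Save one transcribed line and return the trigger phrases that fired."""
        ts = time.time() if ts is None else ts
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT INTO transcript (ts, text) VALUES (?, ?)", (ts, text)
            )
        fired = []
        lowered = text.lower()
        for phrase, handler in self.triggers.items():
            if phrase in lowered:
                handler(text)  # placeholder: call an LLM or agent framework here
                fired.append(phrase)
        return fired

    def between(self, start: float, end: float) -> List[str]:
        """Return the text logged between two timestamps, oldest first."""
        rows = self.conn.execute(
            "SELECT text FROM transcript WHERE ts BETWEEN ? AND ? ORDER BY ts",
            (start, end),
        )
        return [row[0] for row in rows]
```

Usage would look like `log = TranscriptLog("talk.db")`, then `log.on_phrase("hey assistant", my_handler)`, then calling `log.add(text)` for each line the speech-to-text step produces. Simple substring matching is the cheapest possible detector; a real system would probably want something more robust.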
Omg. This is great. Could easily take this and add some logic where a person could create blog articles simply by talking.
these are golden guides. appreciating your content and considering becoming a member if I can afford it after the paycheck is smashed to survive.
keep em coming!
Amazing as always man! Wonder what ideas will come to reality next...
Application: This can replace sign language. This could be refined and used to communicate with the deaf
I love that you're exploring this kind of stuff.
You are at the tip of the spear, thank you for sharing this.
np =)
Subscribed!
All subjects are amazing!
Unfortunately I'm not a member, for some obvious reasons,
so please share some stuff for non-members. You are the best user of AI I have seen on the net,
in the spirit I love: offline and open-source tools.
My English is not so good, so I have to watch again and again to catch the spirit of your videos.
Some of your experiments with transcription provide an approach to breaking down the language barrier
and, more generally, to universal communication.
Thank you very much for your fascinating demonstrations!
That's awesome! So much you could do with this!!
People have been so terrified of AI taking over the world. For me, this is the most exciting and fun time in development history, since the dawn of the internet! AI has made everything so much more streamlined, time efficient and productive. What a great time!
Really great stuff. Hats off, mister...
Great job Kris
please make a full tutorial and put instructions on the members' GitHub
will do
In the cheap seats here, ie not a member, but I would love to see the full version of this, and I think it would go crazy viral and do your channel a great, great service by getting you tons of views... But, that's just my thought if you are to release the full version. : )
thnx :) yeah might do that
@AllAboutAI Great, good luck : )
Very cool! I would like to see a full tutorial, and review the code too.. How large were the model and sensor downloads?
Full tutorial appreciated
noted :)
I have access to github but I don't see this repo
uploading soon :)
This is sooo cool, ehehhe
Is there a full tutorial?
Super cool!
This rocks! Yes, tutorial please. What level of membership is needed to get access? What hardware specs does it take to run this: a Linux server? Windows? Thanks.
I tried to run ComfyUI and it gave my laptop a blue screen of death.
this is amazing.. if this goes really well I would love to try this and am even willing to pay for it.
This would be great for converting audiobooks into comics or movies.
Persistent characters would also be good.
This is amazing, please develop this more!!!
Yes, haha... Could literally put on an audio and watch a brand new movie every time. Would be good to have different slants, aka themes... Outer space version, underwater version, Ancient Rome version etc... The world is an oyster... oops, careful though : )
Phenomenal
Maybe the next development could be making the images move, i.e. video.
Fire
How can I become a member to access your GitHub? The link shows nothing. ruclips.net/user/AllAboutAIjoin
I found the reason. I live in Georgia, and I cannot become a member of a YouTube channel from this country. It is so sad. Please give me another way to see your GitHub. Thank you.
AHAHAHHAHAHA, love this!