Amazing Jalal, best material I've seen so far describing this matter.
Keep up the great work old friend😊
Glad to hear it, bro! Hope you're doing excellent!
You have a very unique way of explaining deep learning concepts. The illustrations are very concise and to the point which really helps focus on the core concepts and not get distracted by technical details. Thanks for making this great video!
Good for you, I understood nothing. Some concrete technical detail would have helped me.
As a visual thinker, I find SD can be quite overwhelming under the hood. I have been using the graphical interface ComfyUI, and it has taken me quite a distance in understanding the dynamics of SD. Your video and page helped me a lot in taking the next step to the more advanced features and expanding my options. Thanks Jay!
Thank you, sir, for sharing. Your explanations are always different. I have been following you since the Transformer architecture explanation. Great!
Thanks Jay! Just like your NLP Transformer series, which still stands the test of time, this is one more added to my list of go-to references. You are indeed a master in the art of teaching!
Very practical and useful information. Thanks!
Excellent video, thank you!
Thank you! I finally understand Stable Diffusion!
Thanks Jay for the video, the concept of converting noised image to a clear image is understood.
How does it create an image that doesn't exist in its training data?
It is understood that the model doesn't understand the concepts of the image and only focuses on the patterns.
But how are the operations below performed?
1. Creating a cartoon image of a cat based on a caption, e.g. "Place a hat on top of a cat"
How does it create a cartoon image of a cat?
How does it know the exact location of the cat's head?
How does it know to place the hat exactly on the head?
2. A closeup shot of a dog facing the sun
How does it know to create a closeup shot of a dog?
How does it know to place the sun in the background?
How does it make the object turn towards the sun?
No videos exist to explain this concept. It would be of great help if you could make a video on this.
Nice explanation, thanks!
Great explanation, loved it!
Good video, very inspiring😁
More than useful. Thanks
Thank you for this great explanation!
great Jay!
It renders the image from text instead of a 3D model. It's like Maya, but with words, using 1B+ pre-trained models (images with their text descriptions) from the Internet wired up with plain English. You don't have to build the models in 3D; you can just type what you want to create in plain English, and the AI renders out the image.
Thanks Jay - I had been looking for something that does more than describe the denoising process, and the attention bit related to prompts is what I was missing. That said, I still can't quite understand how you get a completely new image. I can understand that you should be able to get back to an original image (say a dog, or a flower) via the noisification and reverse process, but how can it, say, create an image with a flower and the dog such that they are integrated in some way? Where does that data come from? A visual example of the earlier stages which shows this would be helpful. The examples you had jumped from basically noise to an image (albeit unrefined) in 3 steps - I'd like to see this broken down so I can "see" what is happening. It still requires a level of acceptance without evidence that I am not happy with....
Thanks Jay for all your efforts to share a bit of your knowledge in AI.
I am not an expert, by far, but I have come to the conclusion that AI is mainly a construction of hundreds of Lego bricks, assembled into specific architectures and trained with the same gradient backpropagation algorithm. Some of them perform well, others don't.
Therefore, the only genuine piece of AI theory is the mathematical background of the training algorithm. The rest is pure heuristics, more or less well explained, a kind of AI cookbook with ad hoc recipes.
The training algorithm itself seems very limited (even if highly powerful), since it is applied in a centralized way to a predefined architecture and does not participate in defining the architecture's topology. In other words, the topology is defined before the training while, intuitively, the training should probably define the topology.
Therefore incremental learning remains a big issue in most AI architectures, if not all.
This lack of a consistent and unified AI theory (there are, to my limited knowledge, no AI theorems or demonstrations that some sort of optimum is reached using a given architecture) makes me believe that we are at the very beginning of a new science still to come.
Could you react to the above humble considerations and share your thoughts?
Kind regards,
Dear Sir, I am a subscriber of yours.
I want to create a tool that finds text errors in an image.
For example:
if I forgot to write CONTACT US, BUY NOW, or a CONTACT NUMBER, or made a SPELLING MISTAKE, etc. in my social media post,
the tool finds the error and suggests what is missing or what is incorrect in the post.
🙏 Please guide me and suggest what course I need to buy or what I need to learn to create this tool.
Thank you!
Great Work, Good luck
Thanks for the explanation. Can you please make a 1 hr or 2 hr video with a deeper dive into the internals? Maybe you already have it recorded, I guess. Thanks.
Excellent!!!
Is this simplified explanation of the process of noise in Stable Diffusion true?
It's like teaching an artist about our visual world -- object definitions, shapes, dimensions, etc., and how they correspond to the person who commissioned the art (text prompts).
The artist then watches a mosaic - say of an ice cream - being inserted by hundreds of tesserae (rectangular slabs used to create a mosaic) and then removed to restore the original mosaic. During this, the artist learns how to understand, recreate, and reinterpret the ‘ice cream’ image in other mosaics. The artist goes through this with millions of other depictions in mosaics (objects, locations, etc.) so they can create entirely new mosaics based on the requests (or text prompts) of the person commissioning them.
Sampling steps are like commissioning an artist to interpret and construct a mosaic quickly or carefully. The more detail or accuracy you want, the more work and time have to go into it.
Thanks, sir!
excellent
thank you
Simple question: does that mean it can't create a prompt (or specific word) that it hasn't been trained on? Thank you for your video!
Oppose to the end!
Love from Saudi Arabia!
Would you kindly tell me whether it is possible to sell the artwork that I made with Stable Diffusion, whether the administration allows this, how I can communicate with them (I mean the management or support for this program), and where the pictures can be sold as pieces of art? I do not speak English, help me.
👆just like the transformers series, excellent
*AI Pictures. Art means craftsmanship and personal expression
Not all photographs are art, but photography can be an art. The nonsense I draw in a game of Pictionary has no craftsmanship or personal expression. However, illustration is an important form of art.
Not only would the AI never produce art on its own, it would never produce anything. The amount of craftsmanship and personal expression being put into the image is dependent on the person using it. A low effort random prompt to the AI is arguably not art, but that's not really the point.
Writing the prompts is personal expression
@@youtuberaphaell nope. It aint shit. Even with a shortcut. You will still have zero talent or expression. Anyone can say those words.
So you have the same skill and expressive power as a toddler. Enjoy.
Pretend with your orgy of robots all you like. But you are not special.
@@youtuberaphaell so is ordering food at a restaurant but that does not make you a chef.😉
@@youtuberaphaell No, it isn't. Writing a full text by yourself is.
Thanks, great video again, but your voice has a lot of sibilants, making the listening experience atrocious. If you make enough money making these videos, I suggest hiring a professional audio producer/mixing guy to clean up the audio. Email me, I'll suggest someone.
This is not art, don't be silly.
No, this is revolution
Thank you, really nicely explained!