I always wanted a teacher who could explain the hard concepts simply and in detail. I have found that kind of teacher in you. Thank you, sir, for this awesome and detailed video.
I wish I could like 👍 his videos unlimited times!!!!! This is a masterpiece series on DL and Language Models... Still the likes are less than 1000... You deserve a lot Nitish... #CampusX
It's truly amazing to experience how beautifully you make us understand the concepts and give us a crystal-clear intuition. Hats off to your level of patience and calmness. Those soothing explanations...🤌😲
My friend got an interview question on how the vanishing and exploding gradient problems (VGP/EGP) are solved in transformers... This is the 3rd time I'm watching this playlist. It is the one-stop solution for DL. Love you, Nitish Bhai. Love from Maharashtra
Great explanation. Many coaching-institute teachers are copying your content and teaching it to students. They don't have their own content, and their concepts aren't clear. I don't think anyone else can explain these concepts in such an easy manner. You are a great teacher!
I have seen many tutorials and explanations of transformers and their architecture. I have never seen a detailed explanation like this in such a crisp and precise way. Thanks, Nitish
In our NIT, everyone is following your channel for Data Science content, even the professors.
Thanks, sir🖤
which nit are you from btw
In future This playlist will be the most viewed playlist for deep learning
Yes
INDEED
yes
Easily, waiting for that time
it is criminal to have this level of explanation for free. 😍
CRIME*
This was the approach to solving the high-variance problem, not some timepass explanation, so it's good that you taught it; something like this will come in handy somewhere else too . . . 😃
Thank you, Sir. You are a great teacher, my favourite and the best.
Sir, you're an absolute gem. Incredible!! For this kind of intuition behind scaling, hats off to you. Because of this topic, most of my variance- and softmax-related concepts got cleared up. Thank you so much, sir. I always recommend that everyone watch your playlists for Data Science/ML concepts.
No, Sir, this is the greatest explanation ever. The content you are providing — I think no one else is providing this type of content with the same effort and the same energy. And the best part is that your explanation makes the topic simple and easy.
Amazing explanation, Bhaiya. Hats off to you.
Please tell us the source of this knowledge too. Where did you get all of it? Which books cover it in this much detail? Please, Bhaiya, it would help us a lot.
Nice. I knew that we divide by root d to handle the softmax, but I didn't know the reasoning behind why it's root d.
brilliant!
Over these 20 days, I would refresh at least 5-6 times a day to check whether Sir had uploaded the video.
Hi Smriti, are you learning DL?
Sir, I want to buy the DSMP 2.0 course and I'm from Bangladesh. PayPal isn't available in our country. Can I pay by card? Or is there any other method? Please reply; I am very interested in buying the course, but I don't know how to make the payment from Bangladesh.
No one else can explain this concept this way. And if anyone does... he/she follows you. Please don't shorten content. We need this level.
Nitish!!! Truly awesome, outstanding, remarkable! It's rare to find a gem like you who not only illuminates the intricate world of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) but also makes it accessible to novices and professionals alike. Your explanation of transformers with self-attention mechanisms is a standout; it's a concept that lies at the heart of many modern AI breakthroughs. This video is an out-of-this-world explanation of why to divide QKᵀ by the square root of d_k; no one else explains at this level of detail. Truly, truly hats off, outstanding beyond expectation. Keep going, Nitish, and keep this momentum!!
This guy... is going to put the coaching institutes out of business....... for giving away this heavenly content.....💯
Sir, one small request: if possible, please increase the frequency of video uploads. I have been waiting for your lecture video for the last 20 days.
Hi sir, first of all, thank you for all your videos and teaching. Sir, I request you to please upload one detailed video on YOLOv8: its file structure and how to create our own files in this architecture.
By far the best explanation anywhere. I can't believe how great you are as a teacher, teaching things from such a fundamental level with this astonishing clarity is god gifted. And Nitish sir, you are a god's gift to people like us. I am utterly in awe of your acumen and more so your teaching skills. Because not every great mind is a good teacher, you are a great mind AND a great teacher. Thank you for everything 🙏
Thank you, Bhaiya, you explained the concept really well.
You are like a research book, covering every single detail with a lot of patience.
I always end up finishing your videos.
I feel like each of your videos is a combination of many books.
Thank you so much for sharing your knowledge, sir.
If you can, please upload the transformer lectures this week; it will be helpful.
Sir, I was working on the DETR transformer and was unable to visualize how attention works. Your playlist was like a light in this darkness. Thanks a lot for such awesome teaching, sir. Please share your approach to reading research papers.
Absolutely brilliant, sir. Your lectures are the only ones I enjoy studying in detail along with the maths.... Please keep making detailed videos like this on every topic you feel should be taught; the future's millions of AI engineers now depend on you.
Love from 🇵🇰
At 30 min, he said gradients would be small for smaller values and larger for large values. Here's where I'm confused: the softmax gradient formula is p_i*(1 - p_i). I'm writing the short form, but this holds for the general class. Now, if something reaches 0.99, then (1 - 0.99) is small; and when something reaches 1%, then (1 - 0.01) is big, but p_i = 0.01 itself is small. So in a nutshell, both gradients would be small, right?
Can someone explain clearly why he said that?
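The arithmetic in the question checks out: p_i*(1 - p_i) is tiny both when p_i ≈ 1 and when p_i ≈ 0 (in the second case p_i itself is the small factor), which is exactly the softmax saturation the video describes. A quick numerical check with plain NumPy; the logit values here are made up purely for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# High-variance scores: after softmax, one entry dominates.
z = np.array([10.0, 1.0, 0.5])
p = softmax(z)

# Diagonal of the softmax Jacobian: d p_i / d z_i = p_i * (1 - p_i).
grad_diag = p * (1 - p)

print(p)          # one entry near 1, the rest near 0
print(grad_diag)  # every entry is tiny: saturation at BOTH ends
```

So the commenter is right that each local gradient shrinks; the video's point is that high-variance scores push the whole softmax output into this saturated regime, where every p_i*(1 - p_i) term collapses and very little gradient flows back.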
As I watch your videos, blessings just keep coming from the heart. Sir, I never comment, but I can't hold back on your videos.
this is hardwork
Respected sir, I can't express how much I appreciate you and your hard work. You explain each and every thing very thoroughly. You are a gem; accept love and respect from Pakistan. Sir, I'm a below-average student and I learn a lot from you; it's as if you put everything straight into my mind.
🎯 Key points for quick navigation:
00:00 *🎥 Introduction and Overview*
- Introduction to the video series and continuation of the self-attention concept.
- Emphasis on explaining the scaling concept in self-attention.
- Mention of the conceptual depth and importance of understanding this concept.
01:05 *🧠 Recap of Previous Video*
- Summary of the previous video on creating self-attention from first principles.
- Explanation of generating embeddings for words and creating query, key, and value matrices.
- Description of the dot product operations to obtain query, key, and value vectors.
03:18 *🔍 Applying Self-Attention*
- Steps to apply self-attention using query, key, and value matrices.
- Detailed process of dot product operations and applying softmax.
- Final calculation of the contextual embedding.
04:46 *📊 Mathematical Formulation*
- Compact mathematical representation of the self-attention process.
- Explanation of transforming key matrices and applying softmax.
- Final formula summarizing the attention calculation.
05:01 *📝 Comparison with Original Paper*
- Discussion of the formulation developed versus the original "Attention is All You Need" paper.
- Highlighting the difference in scaling the operation with the square root of the key dimension (Dk).
- Introduction to the concept of scaled dot product attention and its importance.
06:49 *🔄 Need for Scaling in Attention*
- Explanation of the need to scale in attention to avoid unstable gradients.
- Introduction to the concept of Dk (dimension of the key vector).
09:55 *📐 Dimension Calculation*
- Detailed explanation of calculating the dimension of key vectors.
- Example scenarios to simplify understanding: dimensions could be 3, 10, or 512.
- How embedding dimensions and matrix shapes affect the resulting dimensions.
11:06 *🧮 Dot Product Nature*
- Explanation of why scaling by the square root of Dk is necessary, linked to the nature of the dot product.
- Discussion on how the dot product operates between multiple vectors behind the scenes.
- How matrix dot products consist of multiple vector dot products.
13:45 *📊 Variance in Dot Product*
- Explanation of the variance in dot products and its dependence on vector dimensions.
- Calculation of mean and variance with multiple dot products.
- Low-dimensional vectors produce low variance; high-dimensional vectors produce high variance.
16:06 *🧮 Practical Examples*
- Comparison of variance in low-dimensional vs. high-dimensional vectors.
- Example of 2D and 3D vectors demonstrating variance differences.
- High-dimensional vectors show greater variance, leading to potential issues.
18:23 *🔬 Experimental Proof*
- Experiment demonstrating variance in dot products with varying vector dimensions.
- Histogram plots showing variance spread for different dimensional vectors.
- Higher dimension vectors result in larger variance, illustrating the scaling necessity.
22:00 *📈 High Variance Problem*
- Explaining why high variance in dot product calculations is problematic.
- High variance leads to significant differences in softmax outputs, creating large probability gaps.
- Larger numbers get much higher probabilities, while smaller ones get very low probabilities, affecting training focus.
25:04 *🧮 Training Issues*
- High variance affects backpropagation in neural networks.
- Training focuses on correcting larger numbers, ignoring smaller numbers, leading to vanishing gradient problems.
- Small gradients mean parameters do not update, hampering the training process.
26:15 *🏫 Classroom Analogy*
- Analogy of a classroom with students of varying heights to explain training issues.
- Taller students get more attention from the teacher, similar to larger numbers in training.
- A class with similar height students leads to better overall learning, just like balanced variance leads to better training.
28:18 *🔢 Reducing Variance*
- Discussing the importance of reducing variance in high-dimensional vectors for better training.
- High variance in vectors leads to extreme probabilities in softmax, causing focus on large values and ignoring small ones.
- The goal is to reduce variance so the training process distributes focus evenly.
30:21 *📏 Scaling for Variance Reduction*
- Describing the technique of scaling to reduce variance in matrices.
- Scaling the numbers in a matrix by a factor can reduce variance effectively.
- The key challenge is determining the appropriate scaling factor for optimal variance reduction.
32:35 *🔍 Understanding Scaling Factor*
- Introducing the concept of a scaling factor to control variance.
- Explaining that the scaling factor needs careful consideration and mathematical understanding.
- Focusing on the first row of the matrix to simplify the problem and then applying the solution to the entire matrix.
35:52 *📊 Calculating Population Variance*
- Explanation on the need to calculate population variance instead of sample variance for accuracy.
- Describing expected variance for potential new vector values.
- Emphasizing the importance of considering all possible values in variance calculation.
38:02 *🧮 Variance with Increased Dimensions*
- Exploring the effects of increasing vector dimensions on variance.
- Demonstrating how adding dimensions increases variance.
- Establishing that variance increases linearly with dimensions.
42:02 *📈 Linear Relationship of Variance and Dimensions*
- Summarizing the linear relationship between dimension increase and variance.
- Showing that as dimensions increase, variance also increases proportionally.
- Confirming the mathematical quantification of variance growth with dimension expansion.
43:09 *📉 Maintaining Constant Variance*
- Explanation on maintaining variance constant across dimensions.
- Use of division by a specific factor to achieve consistent variance.
- Introduction of a mathematical rule to support the variance adjustment.
44:43 *🔢 Mathematical Rule Application*
- Detailed explanation of using a constant to scale and adjust variance.
- Calculations showing how dividing by the square root of the dimension maintains variance.
- Examples of applying the rule to different dimensions.
48:03 *🤔 Summary and Practical Application*
- Summary of the scaling process to maintain variance.
- Integration of the scaling step into the self-attention model.
- Final formula for calculating attention in transformers using the scaling factor.
Made with HARPA AI
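The 13:45-48:03 chapters above (variance of the dot product grows linearly with dimension, and dividing the scores by √d_k restores it) can be reproduced in a few lines. A minimal sketch assuming zero-mean, unit-variance components, as in the video's experiment; the exact setup there may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def dot_variance(d, n=10_000):
    """Sample variance of q.k over n random pairs of d-dimensional
    vectors with zero-mean, unit-variance components."""
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    return np.var((q * k).sum(axis=1))

for d in (3, 64, 512):
    raw = dot_variance(d)   # grows roughly linearly with d
    scaled = raw / d        # dividing scores by sqrt(d) divides variance by d
    print(f"d={d:4d}  var(q.k) ~ {raw:7.1f}  var(q.k/sqrt(d)) ~ {scaled:.2f}")
```

The unscaled variance comes out close to d itself, while the scaled column stays near 1 for every dimension, which is the "Maintaining Constant Variance" argument from 43:09 in code form.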
Nitish Sir.... your knowledge is MIT-equivalent. Really outstanding!!
Sir, it's just a suggestion, but there are some viewers on your channel who have the mathematical background to understand things but are missing the basic conceptual idea (common among university students). So if you need to go into more mathematical detail, please do, because we can relate things better that way.
The mathematical grounding is sufficient, relevant, and well explained in this tutorial. You are advised to watch separate videos on those mathematical concepts that you could not understand.
Your background knowledge for explanation is so amazing. Where did you study? Please complete the whole playlist.
If softmax is so sensitive to the size of the dimensions, then why isn't dividing by (Dk)^1/2 before applying softmax (to normalise the values) a standard process?
This was good and easy. It was also explained so well that we don't have to memorise it again; it will flow automatically.... Thank you so much!!!
I haven't found an explanation like this anywhere else. Never stop explaining like this, Nitish Sir. ❤❤
Very nice explanation. Really enjoyed the Transformer lecture series. The teaching style surely improves the thinking approach, which is helpful when reading a research paper.
How do we handle a multiclass classification problem? Please make a video; it's very important.
This is the pinnacle of teaching. Lots of respect from the other side of the border.
Dear viewers, please, please do hit like, share, and subscribe to this channel, so that Nitish sir never gets upset for any reason and stops making videos. This channel is a Granth, it's a Bible, it's a Quran for all students. I wish you a billion subscribers and a trillion likes on each video.
Thank you!! Sir, can you try to make a video on an LLM project? Because I understand all the concepts, but I don't know how to implement them or how to write code using LLMs.
Sir, we appreciate your efforts, and they are truly helpful for us. However, we have one issue to address. In your videos, you thoroughly explain all types of questions that arise in our minds about concepts, but there is a lack of focus on the actual code. For instance, I personally encountered this issue when learning from your videos on LSTM, GRU, Bidirectional LSTM, and their concepts. While I grasped the concepts well, I faced difficulties when it came to implementing them in coding. Therefore, it would be immensely beneficial if you could create some videos specifically focusing on coding, using different datasets. This would greatly assist us in applying the concepts practically. Thank you
6:40 DK = Dinesh Kartik XD
Thank you so much for the explanation. I feel satisfied after understanding your explanation.
Thank you, Nitish sir, you have given an amazing explanation of these concepts. I appreciate the time and effort you put into this. Will clear interviews soon.
Sir, very well explained. We want this kind of detail; this is what makes you different from the others. Please keep it up. I get a solution from your channel every time, sir.
"NITISH_SIR" Is All You Need ......#Master_Blaster of the Data Domain!!!
Sir, you are a good teacher. The long explanation is so good. I have understood how things work in self-attention.
Very well taught. Thank you for putting in so much effort.
Why root Dk?? In our case Dq, Dk and Dv are the same, but as you said they may be different, so why are we using Dk to adjust the variance? Is there any particular reason, unless Dq, Dk and Dv are always the same? Please shed some light.
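A small sketch of the variance argument from the video (my own illustration, not the instructor's code). Note that d_q must always equal d_k, since each score is a dot product q·k; only d_v can differ, and it just sets the output width, which is why d_k is the right quantity for scaling. If q and k have i.i.d. components with mean 0 and variance 1, the dot product q·k has variance d_k, so raw scores spread out as the key dimension grows; dividing by √d_k brings the variance back to about 1 for any d_k, keeping softmax out of its saturated region.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_variance(d_k, n_samples=100_000, scale=True):
    """Empirical variance of attention scores q.k for random unit-variance q, k."""
    q = rng.standard_normal((n_samples, d_k))
    k = rng.standard_normal((n_samples, d_k))
    scores = np.einsum("ij,ij->i", q, k)  # one dot product per (q, k) pair
    if scale:
        scores = scores / np.sqrt(d_k)    # the 1/sqrt(d_k) scaling from the paper
    return scores.var()

for d_k in (16, 64, 256):
    raw = score_variance(d_k, scale=False)   # grows roughly like d_k
    scaled = score_variance(d_k)             # stays near 1
    print(f"d_k={d_k:4d}  raw var ≈ {raw:6.1f}  scaled var ≈ {scaled:.2f}")
```

Running this shows the unscaled variance tracking d_k (≈16, ≈64, ≈256) while the scaled variance stays near 1, which is exactly the effect the √d_k division is there for.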
We really do appreciate the details. Explanation of the intuition behind every little step is very useful in understanding the concept
Amazing!!! You are truly unbelievable Nitish Sir ❤❤
I think it is a wonderful explanation, and whatever time you took to explain it, it was amazing.
Bro, these are the best videos I have seen to understand Attention mechanism. Thank you.
What amazing things you ended up explaining along the way, sir......
If you want to truly feel a subject, go to Nitish sir.
Best explanation
thank you very much for this video..
But we need a project video using transformers...
Too good sir, i loved the detailed explanation. Absolute banger video again.
Can see the passion of the teacher in teaching, particularly for a concept which others ignore.
Sir, this is a really nice approach; I would love to learn the topics in this much detail. One request: please increase the frequency of videos and complete this playlist.
Sir, I want the full course up to deep learning... what should I do?
Sir, please please increase the frequency of video uploads. At least 1-2 in a week.
lot of love and respect from pakistan sir
Hi @Nitish, all the videos on Transformers are truly remarkable. There are so many videos on this topic on RUclips, but yours are GEMs. I came here just for the explanation, and deep down I am satisfied. Keep making these kinds of videos with proper explanations; otherwise, nowadays everyone just makes short videos for views. Truly appreciating your hard work.
Thank you so much for restarting this playlist
It was an amazing explanation, so thorough and still so simple to understand for such a complex topic. I have followed courses from NPTEL, Stanford, and Deep Learning, but yet this was the smoothest explanation! Your content is highly underrated. I wish I had found your channel sooner! Thanks 🙂
There couldn't be a better playlist for Deep Learning.
I always wanted a teacher who could explain hard concepts simply and in detail. I have found that kind of teacher in you. Thank you sir for this awesome and detailed video.
Sir, do you have any plan of making tutorials on reinforcement learning? I want to learn it from you, sir.
Truly, a marvelous effort and a superb way of explaining as you said.
The explanation is detailed. kindly keep the explanation always this detailed. thanks :)
What will I need to study in SQL for data science? Please make a video?? 🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏
WOW WOW WOW, BEST EXPLAINED VIDEO I HAVE EVER WATCHED ON RUclips!
Honestly, Your way of explanation is D best Sir!
♾🌟
In favor of this detailed explanation! You get to know the very minute details!
Really want to watch this. We'll study the entire transformer in one go. I used to forget the previous video by the time the next one arrived.
I wish I could like 👍 his videos unlimited times!!!!! This is a masterpiece series on DL and Language Models... Still the likes are less than 1000... You deserve a lot Nitish... #CampusX
Perfect way of explanation, thanks, keep it up
Its Truly Amazing to experience how beautifully you make us understand the concepts and give us a crystal clear intuition..
Hats-off to your level of patience/calmness. That soothing explanations...🤌😲
Your explanation was so awesome. Can you upload videos within 2-3 days?
Truly awesome! Eagerly waiting for more videos!
Great Video sir, After this video I understood that you are a very curious teacher sir!
Sir, please also cover the part where we write the code for self-attention.
When you explain in detail, we understand it well, sir.
This is a very good method. You are teaching very well. This is exactly how we want to be taught.
😄😄😄 Your videos will one day turn students into researchers.
Can anyone tell which topics are left now in 100 days of DL?
My friend got an interview question on how the VGP and EGP problems are solved in transformers..... This is the 3rd time I'm watching this playlist... it is the one-stop solution for DL.... Love you Nitish Bhai..... Love from Maharashtra
Hi! I am going through this masterpiece video by Nitish sir (all my love to him). Wanted to ask: what do you mean by VGP and EGP here?
no words sir for your explanation, love you sir...
Great explanation. Many coaching institute teachers are copying your content and teaching it to students. They don't have their own content, nor are their concepts clear.
I don't think anyone can explain these concepts in such an easy manner. You are a great teacher!
Extraordinary…. no one else can do it like this… or has done it like this.
A detailed approach like this develops the creators of the next AI model.
I have seen many tutorials and explanations of transformers and their architecture. I have never seen a detailed explanation like this, in such a crisp and precise way.
Thanks Nitish
Superb Explanation .... keep it detailed :)
Thanks a lot Sir for this. You are amazing.
Good .. this detailed way of Explanation is good
Sir, can you do a separate course for MLOps and deep learning?
Superb explanation. Loved the video.
Great video sir, best explanation. This is what we wanted.
Great explanation . Thank you Sir .
other (AI,ML,DL) teacher ❌ nitish sir ✅
This type of explanation is what we need to make our concepts strong.. thank you sir...
Good explanation; I need the same kind of explanations for other topics too.