Tutorial 26 - Linear Regression In-Depth Maths Intuition - Data Science

  • Published: 22 Dec 2024

Comments • 325

  • @mohitpatel7876
    @mohitpatel7876 4 years ago +33

    Best explanation of the cost function. We learned it as master's students and the course couldn't explain it as well.. simply brilliant

  • @nandinibalyapally3388
    @nandinibalyapally3388 4 years ago +94

    I never understood what a gradient descent and a cost function were until I watched this video 🙏🙏

  • @navjotsingh8372
    @navjotsingh8372 2 years ago +3

    I have seen many teachers explain the same concept, but your explanations are next level. Best teacher.

  • @anuragmukherjee1878
    @anuragmukherjee1878 2 years ago +31

    For those who are confused:
    the derivative in the convergence step will be dJ/dm. (A runnable sketch follows this thread.)

    • @tusharikajoshi8410
      @tusharikajoshi8410 1 year ago

      What's J in this? The Y values? I'm super confused about this d/dm of m, since it would just be 1. And I think m is just the total number of values. Shouldn't the slope be d/dx of y?

    • @mdmynuddin1888
      @mdmynuddin1888 1 year ago

      @@tusharikajoshi8410 It will be the cost or loss (J).

    • @mdmynuddin1888
      @mdmynuddin1888 1 year ago +2

      m(new) = m - d(loss or cost)/dm * alpha (the learning rate).

    • @suhasiyer7317
      @suhasiyer7317 1 year ago

      Super helpful

    • @threads25
      @threads25 1 year ago

      I don't think so, because that's actually Newton's method.
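
  A minimal runnable sketch of the update rule discussed in this thread, assuming the cost J(m) = 1/(2n) * sum((m*x_i - y_i)^2) with the intercept fixed at 0 (as in the video); the data and variable names are illustrative, not taken from the video:

      import numpy as np

      def dJ_dm(m, x, y):
          # derivative of J(m) = 1/(2n) * sum((m*x - y)^2) with respect to m
          n = len(x)
          return np.sum((m * x - y) * x) / n

      # one convergence step: m_new = m - alpha * dJ/dm
      x = np.array([1.0, 2.0, 3.0])
      y = np.array([2.0, 4.0, 6.0])   # data generated with true slope 2
      m, alpha = 0.0, 0.1
      m = m - alpha * dJ_dm(m, x, y)  # m moves from 0.0 toward 2.0
      print(m)                        # ~0.93 after the first step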

  • @soumikdutta77
    @soumikdutta77 2 years ago +4

    Why am I not surprised by such a lucid and amazing explanation of the cost function, gradient descent, global minima, learning rate... maybe because watching you make complex things seem easy and normal has become a habit of mine. Thank you SIR

  • @manikaransingh3234
    @manikaransingh3234 4 years ago +34

    I don't see a link in the top right corner for the implementation, as you mentioned at the end.

  • @pjanjanam
    @pjanjanam 3 years ago +1

    A small comment at 17:35. I guess it is the derivative of J(m) with respect to m, in other words the rate of change of J(m) over a minute change in m. That gives us the slope at instantaneous points, especially for non-linear curves where the slope is not constant. At each point (m, J(m)), gradient descent travels in the direction opposite to the slope to find the global minima, with a small learning rate. Please correct me if I am missing something.
    Thanks for a wonderful video on this concept @Krish, your videos are very helpful for understanding the math intuition behind the concepts. I am a huge beneficiary of your videos. Huge respect!!

  • @ayurdubey4818
    @ayurdubey4818 2 years ago +9

    The video was really great. But I would like to point out that in the derivative you took for the convergence theorem, instead of (dm/dm) it should be the derivative of the cost function with respect to m. Also, a little suggestion: at the end it would have been helpful if you had mentioned what m was, the total number of points or the slope of the best fit line. Apart from this the video helped me a lot; hope you add a text note somewhere in this video to help others.

  • @shubhamkohli2535
    @shubhamkohli2535 4 years ago

    Really awesome video, so much better than many famous online portals charging huge amounts of money to teach these things.

  • @akrsrivastava
    @akrsrivastava 4 years ago +29

    Hi Krish, Thanks for the video. Some queries/clarifications required:
    1. We do not take the gradient of m wrt m. That will always be 1. We take the gradient of J wrt m.
    2. If we have already calculated the cost function J at multiple values of m, then why do we need gradient descent? We already know the m where J is minimum.
    3. So we start with an m, calculate grad(J) at that point, update m with m' = m - grad(J) * learn_rate, and repeat till we reach some convergence criterion.
    Please let me know if my understanding is correct.

    • @slowhanduchiha
      @slowhanduchiha 4 years ago

      Yes this is correct

    • @vamsikrishna4107
      @vamsikrishna4107 4 years ago

      I think we have to train the model to reach that minimum-loss point while performing gradient descent in real-life problems.

    • @shreyasbs2861
      @shreyasbs2861 4 years ago

      How do you find the best Y intercept?
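
  Since the intercept question comes up repeatedly in these comments, here is a minimal sketch (an illustration, not the video's code) of how the same convergence step extends to both parameters: the intercept c gets its own partial derivative dJ/dc and its own update, exactly like m:

      import numpy as np

      def gradient_descent(x, y, alpha=0.05, n_iters=1000):
          # fit y ≈ m*x + c by minimizing J = 1/(2n) * sum((m*x + c - y)^2)
          m, c = 0.0, 0.0
          n = len(x)
          for _ in range(n_iters):
              err = m * x + c - y
              dJ_dm = np.sum(err * x) / n   # partial derivative w.r.t. the slope
              dJ_dc = np.sum(err) / n       # partial derivative w.r.t. the intercept
              m -= alpha * dJ_dm
              c -= alpha * dJ_dc
          return m, c

      x = np.array([1.0, 2.0, 3.0, 4.0])
      y = np.array([3.1, 4.9, 7.2, 8.8])   # roughly y = 2x + 1 with noise
      print(gradient_descent(x, y))        # ~(1.94, 1.15), near the true (2, 1)

  With more features, each coefficient gets the same treatment, which is why the convergence rule is usually written with partial derivatives.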

  • @PritishMishra
    @PritishMishra 4 years ago +1

    I knew there would be an Indian who could make all this stuff easy!! Thanks Krish

  • @tarunsingh-yj9lz
    @tarunsingh-yj9lz 1 year ago

    Best video on YouTube for understanding the intuition and math (at surface level) behind linear regression.
    Thank you for such great content

  • @dhainik.suthar
    @dhainik.suthar 3 years ago +4

    This math is the same as in the Coursera machine learning courses.
    Thank you sir for this great content..

  • @mohitpatel7876
    @mohitpatel7876 4 years ago +6

    At 14:56, how do we decide how many slope values to try? And how about selecting intercepts in a certain range? (See the sketch after this thread.)

    • @ruchit9697
      @ruchit9697 4 years ago

      The slope updates continue until the cost function reaches the minimum point... and for the intercept there are random initialization techniques through which a starting value is set for the intercept....
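
  On the question of how many slope values to try: in practice there is no preset list of slopes; the update is simply repeated until the gradient is (near) zero or an iteration cap is reached. A minimal sketch of such a stopping rule, with illustrative names and tolerances:

      # stop when the cost curve is nearly flat or after max_iters updates
      def minimize_slope(grad_fn, m0=0.0, alpha=0.1, tol=1e-6, max_iters=10_000):
          m = m0
          for _ in range(max_iters):
              g = grad_fn(m)
              if abs(g) < tol:      # slope of the cost curve ~ 0 -> (near) minimum
                  break
              m -= alpha * g        # convergence step: m = m - alpha * dJ/dm
          return m

      # example with J(m) = (m - 2)^2, whose gradient is 2*(m - 2)
      print(minimize_slope(lambda m: 2 * (m - 2)))   # ~2.0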

  • @RJ-dz6ie
    @RJ-dz6ie 4 years ago +4

    How can I not say that you are amazing!! I was struggling to understand the importance of gradient descent and you cleared it up for me in the simplest way possible.. Thank you so much sir :)

  • @nanditagautam6310
    @nanditagautam6310 3 years ago +1

    This is the best material I ever came across on this topic!

  • @mayureshgawai5951
    @mayureshgawai5951 4 years ago

    It's hard to find an easy explanation of gradient descent on YouTube. This video is the exception.

  • @animeshkoley6478
    @animeshkoley6478 3 years ago +2

    Best explanation of linear regression 🙏🙏🙏. Simply wow 🔥🔥

  • @padduchennamsetti6516
    @padduchennamsetti6516 4 months ago

    You just made the whole concept clear with this video. You are a great teacher.

  • @priyanshusharma2516
    @priyanshusharma2516 3 years ago +4

    Watched this video 3 times back to back. Now it's embedded in my mind forever. Thanks Krish, great explanation!!

  • @varshadevgankar8242
    @varshadevgankar8242 4 years ago +3

    Sir, I can't find the simple regression and multiple regression videos you mentioned, and some videos are a little jumbled, so it's getting difficult to follow them. Please also explain the functionality of each keyword or built-in function when you're explaining the code. Of course you explain in a very good way, but I faced a little problem while following the practical implementation of univariate, multivariate, and bivariate analysis (where you used the FacetGrid function). So will you please explain the exact use of FacetGrid?

  • @annapurnaparida7655
    @annapurnaparida7655 3 years ago

    So beautifully explained... did not find this kind of clarity anywhere else... keep up the good work....

  • @varungupta2727
    @varungupta2727 5 years ago +43

    Similar to the Andrew Ng course on Coursera; kind of a revision for me 😊😊

    • @Gayathri-jo4ho
      @Gayathri-jo4ho 4 years ago

      Can you please suggest how to begin in order to learn machine learning?

    • @Gayathri-jo4ho
      @Gayathri-jo4ho 4 years ago

      @@ArpitDhamija Do you have knowledge of machine learning?? If so, please advise me. I saw so many, but I wasn't able to.

    • @shhivram929
      @shhivram929 4 years ago +2

      @@Gayathri-jo4ho This playlist itself is a fantastic place to start. Or you can enroll in the course "Machine Learning A-Z" by Kirill Eremenko on Udemy. The course will give you an intuitive understanding of the ML algorithms. Then it's up to you to research and study the math behind each concept. References: KDnuggets, Medium, MachineLearningPlus, and lots more.

    • @Gayathri-jo4ho
      @Gayathri-jo4ho 4 years ago

      @@shhivram929 thank you

    • @sarithajaligama9548
      @sarithajaligama9548 4 years ago

      Exactly. This is the equivalent of Andrew Ng's description.

  • @Cricketnews-ek5fy
    @Cricketnews-ek5fy 3 years ago

    At 22:50 sir said that when it reaches the global minima the slope value will be 0, and the value of m will be considered for the best fit line, but the value of the slope and m are the same. Please clear this doubt @Krish Naik sir

  • @azizahmad1344
    @azizahmad1344 3 years ago +2

    Such a great explanation of gradient descent and convergence theorem.

  • @rezafarrokhi9871
    @rezafarrokhi9871 3 years ago +5

    Thanks for all the well-prepared videos. I think you meant (deriv J(m) / deriv m) at 17:45, is that correct?

  • @moulisiramdasu6753
    @moulisiramdasu6753 4 years ago

    Really, thank you Krish.
    You just cleared my doubts on the cost function and gradient descent. First I saw Andrew Ng's class but had a few doubts; after seeing your video, now it's crystal clear.
    Thank You...

  • @chimadivine7715
    @chimadivine7715 3 months ago

    Now I understand what GD means. Thanks always, Krish

  • @ahmedbouchou6893
    @ahmedbouchou6893 5 years ago +3

    Hi. Can you please do a video about the architecture of machine learning systems in the real world? How does it really work in real life? For example, how Hadoop (Pig, Hive), Spark, Flask, Cassandra, and Tableau are all integrated to create a machine learning architecture, like an end-to-end system.

  • @ankitchauhan6629
    @ankitchauhan6629 4 years ago +1

    What about the C (intercept) value? How does the algorithm select the C value?

  • @skviknesh
    @skviknesh 4 years ago +1

    Great! Fantastic! Fantabulous! Tasting the satisfaction of learning completely - only in your videos!!!!

  • @python_by_abhishek
    @python_by_abhishek 3 years ago +7

    Before watching this video I was struggling with the concepts exactly like you were struggling while plotting the gradient descent curve. ☺️ Thanks for explaining this beautifully.

  • @pradeepmallampalli6510
    @pradeepmallampalli6510 3 years ago

    Thank you so much Krish. Nowhere else could I find such a detailed explanation.
    You made my day!

  • @anuragbhatt6178
    @anuragbhatt6178 4 years ago +1

    The best I've come across on gradient descent and the convergence theorem

  • @PankajMishra-ey3yh
    @PankajMishra-ey3yh 3 years ago +2

    I think in the convergence theorem part the derivative should be d(J(m))/d(m): as in a y-x graph we take the derivative of y wrt x, and here our Y is J(m) and X is m.

  • @tezzbhandari3725
    @tezzbhandari3725 2 years ago

    The graph of the cost function is not gradient descent. Differentiating the cost function with respect to m gives the gradient, which gradient descent uses to update m.

  • @arunsundar489
    @arunsundar489 5 years ago +6

    Please add the in-depth math intuition of other algorithms like logistic regression, random forest, support vector machines, and ANN. Many thanks for the clearly explained linear regression.

  • @shhivram929
    @shhivram929 4 years ago +3

    Hi Krish, that was an awesome explanation of gradient descent with respect to finding the optimal slope.
    But in linear regression both the slope and the intercept are tweakable parameters; how do we achieve the optimal intercept value in linear regression?

  • @SaroashRahil
    @SaroashRahil 10 months ago

    the only video that made gradient descent so simple that even 2nd grade students would understand

  • @koushikkumar4938
    @koushikkumar4938 3 years ago +6

    Implementation part:
    Multiple linear Regression - ruclips.net/video/5rvnlZWzox8/видео.html
    Simple linear Regression - ruclips.net/video/E-xp-SjfOSY/видео.html

  • @shailesh1981able
    @shailesh1981able 2 years ago

    Awesome!! Cleared all my doubts by watching this video! Thanks a lot Mr. Krish for creating in-depth content on this subject!

  • @V2traveller
    @V2traveller 4 years ago

    every line you speak is so important for understanding this concept...... thank u

  • @madhavilathamandaleeka5953
    @madhavilathamandaleeka5953 3 years ago +1

    At 22:12, why will the slope be 0? At the global minimum isn't the slope 1 while the cost function (absolute squared error) is 0? Excuse me if I'm wrong. 🙏

    • @sravanram9453
      @sravanram9453 2 months ago

      I’m also having the same doubt.

  • @shan5612
    @shan5612 4 years ago +2

    Great, but I'm not able to find the link for how to implement this in Python. Awaiting your valuable reply.

  • @Crecky_21
    @Crecky_21 3 years ago

    Sir, what if our problem statement does not reach the global minima when C=0 is considered?
    And how will our algorithm come to know that the C=0 condition is not sufficient for the best fit line?

  • @rahuljindal3683
    @rahuljindal3683 4 years ago

    For different independent variables, we would have that many gradient descents. Individually, using the convergence theorem, we would get the global minimum for each, but how are we going to find the best fit combining them all???

  • @supervickeyy1521
    @supervickeyy1521 4 years ago

    I knew the concept of linear regression but didn't know the logic behind it, i.e. the way the line of regression is chosen. Thanks for this!

  • @pranitaumarji5224
    @pranitaumarji5224 5 years ago +4

    Thank you for this awesome explanation!

  • @FaizanKhan-fn6ew
    @FaizanKhan-fn6ew 5 years ago +1

    Thank you so much for all your efforts... Knowledge, rate of speech, and the ability to make things easy are the nicest skills that you hold...

  • @kannanparthipan7907
    @kannanparthipan7907 4 years ago +2

    Why 2m in place of m in the cost function calculation... Please explain.

    • @subhamnagar7794
      @subhamnagar7794 4 years ago

      You can write m also; authors prefer 2m because when you find the derivative, the 2 gets cancelled.
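
  To make the reply above concrete, here is the cancellation written out in LaTeX, using n for the number of data points and m for the slope (a sketch under the cost defined in the video):

      J(m) = \frac{1}{2n} \sum_{i=1}^{n} (m x_i - y_i)^2
      \qquad\Longrightarrow\qquad
      \frac{dJ}{dm} = \frac{1}{2n} \sum_{i=1}^{n} 2 (m x_i - y_i) x_i
                    = \frac{1}{n} \sum_{i=1}^{n} (m x_i - y_i) x_i

  The 1/2 is purely for convenience: it cancels the 2 from the power rule and, being a positive constant, does not change where the minimum is.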

  • @tanjibsiddique2334
    @tanjibsiddique2334 2 months ago

    Hi Krish, why do we also divide the cost function by 2? In the MSE formula we just divide by the number of data points, i.e. m.

  • @w3r161
    @w3r161 9 months ago

    Thank you my friend, you are a great teacher!

  • @durgeshmishra9449
    @durgeshmishra9449 3 years ago

    @ 17:34 shouldn't it have been d/dm (J(m))?

  • @salilmadhav9109
    @salilmadhav9109 2 years ago

    @Krish Naik
    What will happen if it is a local minima (for a different equation)?

  • @marwinsolomon9511
    @marwinsolomon9511 3 years ago

    What is the significance of (1/2n), where n is the number of data points, in the cost function?

  • @kevinsusan3345
    @kevinsusan3345 4 years ago +2

    I had so much difficulty understanding gradient descent, but after this video
    it's perfectly clear.

  • @FaizanKhan-fn6ew
    @FaizanKhan-fn6ew 5 years ago +7

    I am working at a company in the BPM domain... I have no idea about programming but somehow I managed to develop an interest in ML... The best part is I just want to learn it to enhance my knowledge and I'm ready to work for free... If you can suggest something that would help...

  • @meetbardoliya6645
    @meetbardoliya6645 2 years ago

    The value of this video is just undefinable! Thanks a lot :)

  • @nitishkeshri2378
    @nitishkeshri2378 3 years ago

    Do we need to consider the intercept value as zero initially? If not, then how do we proceed further?

  • @nagarjunakanneganti5953
    @nagarjunakanneganti5953 4 years ago

    Why does considering the intercept C require drawing 3D plots?

  • @9902152322
    @9902152322 2 years ago

    God bless you too sir, explained very well. Basics help to grow a high-level understanding.

  • @rakeshenjapuri3143
    @rakeshenjapuri3143 4 years ago

    Why are we using the cost function and gradient, sir? What is the conclusion? Can we apply them to multilinear and logistic regression as well?

  • @ishanthakur3315
    @ishanthakur3315 4 years ago

    You are taking the derivative of the cost function w.r.t. m in the convergence theorem? Please reply!

  • @nidhimehta9278
    @nidhimehta9278 3 years ago

    Best video on the theory of linear regression! Thank you so much Krish!

  • @jamesrobisnon9165
    @jamesrobisnon9165 3 years ago +1

    Dear Krish: At 14:42 you mention that the curve is called gradient descent. I believe this is not true. Gradient descent is not the name of that curve; gradient descent is an optimization algorithm.

  • @ngarwailau2665
    @ngarwailau2665 3 years ago

    Your explanations are the clearest!!!

  • @guptarohyt
    @guptarohyt 2 years ago

    Great explanation! How do we figure out which direction to move in?

  • @B.D.M1999
    @B.D.M1999 5 months ago

    How can the slope be minimized for a given dataset? Can you make a video doing all the calculations practically, showing things like making the slope smaller or bigger?

  • @karthiavenger4577
    @karthiavenger4577 4 years ago

    Yaar, you nailed it man! After watching so many videos I had some idea; by finishing your video now I'm completely clear 😍😍😍😍

  • @nagarjunakanneganti5953
    @nagarjunakanneganti5953 4 years ago

    And statistical regression analysis is different from machine learning (gradient descent) estimation, right?

  • @nurali2525
    @nurali2525 3 years ago

    This guy was born to teach

  • @vishnuppriya5263
    @vishnuppriya5263 1 year ago

    Really great sir. Thank you very much sir for this clear explanation.

  • @Karthik-s4y5f
    @Karthik-s4y5f 1 year ago

    Finally I understood gradient descent perfectly..

  • @walidhossain3655
    @walidhossain3655 2 years ago

    14:11 Gradient Descent

  • @arrooow9019
    @arrooow9019 3 years ago

    Oh my gosh, this is the most awesome tutorial I have ever seen. God bless you sir 🤩🤩

  • @YogeshKumar-ye8nd
    @YogeshKumar-ye8nd 3 years ago

    Initially, how would I decide the value of m???

  • @scifimoviesinparts3837
    @scifimoviesinparts3837 3 years ago

    Why multiply the cost function by 1/2? I mean, what's the need for the 1/2 in the cost function: 1/2m * sum( y_ - y )^2

  • @123man123man1
    @123man123man1 11 months ago

    Thank you for sharing this insightful video about linear regression. While I found it informative, I'm uncertain about how it addresses the challenge of avoiding local minima. I'd greatly appreciate it if you could provide some insights on this aspect as well.

  • @ravithakur0041
    @ravithakur0041 4 years ago

    In the convergence theorem equation, m = m - (dm/dm) * alpha. How is dm/dm the slope of the J(m) vs m curve?? The slope should be dJ(m)/dm rather than dm/dm.

    • @niladribiswas1211
      @niladribiswas1211 4 years ago

      It is that thing only; he should probably write m' in the convergence theorem to avoid confusion with m... :D

  • @kartikshrivastava1500
    @kartikshrivastava1500 4 years ago +3

    17:33 Shouldn't it be d(costFunc(m)) / d(m)?

  • @Neuraldata
    @Neuraldata 4 years ago +1

    We would also recommend your videos to our students!

  • @kunaltibrewal2881
    @kunaltibrewal2881 5 years ago +2

    It would be great if you could suggest some of the best books for Python programming.

  • @askshivansh
    @askshivansh 3 years ago

    Sir, in most cases C will not be zero, so how will we find the value of C?
    Will we find the value of C using gradient descent?

  • @kits1111
    @kits1111 1 year ago

    As c and m are both changing, shouldn't the convergence theorem have the rate of change of c also?

  • @sravanram9453
    @sravanram9453 2 months ago

    Small correction: @22:25 instead of the slope being 0, it should be the cost function that is 0. Correct me if I'm wrong…

  • @aayushsuman4592
    @aayushsuman4592 9 months ago

    Thank you so much, Krish!

  • @PRASHANTSHARMA-ev7rr
    @PRASHANTSHARMA-ev7rr 5 years ago +4

    Hi Sir, I am from a cloud & DevOps background. Does it make sense to go and learn ML/AI? What path can I follow to become a DataOps engineer or a DevOps ML/AI engineer?

  • @dhruv1324
    @dhruv1324 1 year ago

    never found a better explanation

  • @dsc40sundar18
    @dsc40sundar18 1 year ago +1

    Hi sir, great content, and I'm a big fan of your work. Let me ask a doubt: many books or blogs take the cost function as 1/N * SUM(Y - Y^), but you used 1/2N * SUM(Y - Y^), so I was a bit confused by that part. Thank you for the wonderful content, thank you so much sir.

  • @TheBala7123
    @TheBala7123 2 years ago

    Excellent explanation sir. I have started following your videos for all the ML related topics; it's very interesting.
    One doubt: in gradient descent, when the slope is zero, the m value will be considered as the slope of the best fit line. I do not understand this. Can you please explain? Thanks.

  • @palashmoon3808
    @palashmoon3808 5 years ago +1

    Hi Krish, what if we have many local minima and then a global minima? In that case how will the convergence theorem work? (See the note after this thread.)

    • @krishnaik06
      @krishnaik06  5 years ago +1

      Check my complete deep learning playlist
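
  A short note on the local-minima question above, under the cost used in the video: for linear regression with squared error, J(m) is quadratic in m, so the curve is a parabola with exactly one minimum, which is therefore global. Expanding:

      J(m) = \frac{1}{2n} \sum_{i=1}^{n} (m x_i - y_i)^2
           = \Big(\frac{1}{2n}\sum_i x_i^2\Big) m^2 - \Big(\frac{1}{n}\sum_i x_i y_i\Big) m + \frac{1}{2n}\sum_i y_i^2

  The coefficient of m^2 is non-negative, so there are no separate local minima to get stuck in; multiple minima become a real concern only for non-convex costs (e.g., in deep learning, which the reply above points to).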

  • @SanjeevKumar-dr6qj
    @SanjeevKumar-dr6qj 1 year ago

    Great sir. Love this video

  • @rambaldotra2221
    @rambaldotra2221 3 years ago

    Thank You Sir, You have explained everything about gradient descent in the best and easiest possible way!!

  • @subrahmanyamkv8168
    @subrahmanyamkv8168 5 years ago +6

    Sir, thanks for the explanation. Shouldn't the coefficient update use the derivative of the loss function w.r.t. m?

  • @aritra8820
    @aritra8820 2 years ago

    When you are writing the convergence theorem, it should be m - d(J(m))/dm * alpha.

  • @mahikhan5716
    @mahikhan5716 3 years ago

    Krish, you are saying here that you discussed simple linear regression in detail in your previous videos, but the previous one is actually about PDF and CDF. Is the playlist sorted??

  • @nayanjain3594
    @nayanjain3594 3 years ago

    Hi Krish, how do we calculate the intercept value? In this video we initialized it to 0 and never calculated it at the end; we calculated only the slope of the best fit line.

  • @shaktirajsinhzala4588
    @shaktirajsinhzala4588 4 years ago

    Sir, there is no playlist for this series. Where can I find it? About CDF, PDF...

  • @adiflorense1477
    @adiflorense1477 4 years ago

    17:04 Sir, why does every machine learning model look for global minima instead of local minima?

    • @abhishek_maity
      @abhishek_maity 4 years ago +3

      The global minimum is simply the lowest region. Suppose you are standing in a hilly area with many ups and downs (consider the small dips as local minima); at one point that hilly area will be at its lowest, much lower than all the other dips. That lowest region is known as the global minimum, and the main aim of your algorithm is to converge to the lowest point possible, hence we consider the global minimum.
      I hope that clears your doubt? 🙂

    • @adiflorense1477
      @adiflorense1477 4 years ago

      @@abhishek_maity Noted

  • @priyankachoubey4570
    @priyankachoubey4570 3 years ago

    As always, Krish, very well explained!!