The Beauty of Linear Regression (How to Fit a Line to your Data)

  • Published: 25 Nov 2022
  • In this video, we'll explore the concepts surrounding linear regression. Linear regression is very useful in math, science, and engineering, and is a gateway to other kinds of regression, and optimization problems in general.
    Download the Linear Regression Example Code here: pastebin.com/7cgh951s
    Thanks to fesliyanstudios.com for the background music! :)

Comments • 256

  • @RichBehiel
    @RichBehiel  a year ago +42

    Hi everyone, this video has been getting a lot of views lately so I just wanted to say thank you, and I really appreciate all the positive feedback. It’s great to see such a positive response, and I’m glad that so many people are enjoying linear regression! :)
    I also appreciate the constructive criticism! A few of you have pointed out that the music is distracting, the motion is too repetitive, and the pace is a bit slow. I didn’t see that when posting the video, but I can totally see where you’re coming from, so I’ll definitely take that into account when making future videos. This was one of my earlier videos and I was still figuring things out. So I really appreciate your feedback, and I hope these videos will get better over time.

    • @myetis1990
      @myetis1990 a year ago +2

      You are not only teaching math stuff but also teaching how to think, thank you very much for the great video.
      Really inspiring, glad I discovered this channel. Waiting for the videos about the Jacobian, translation, rotation, and quaternions

    • @ehfik
      @ehfik 9 months ago

      the constant animation loop gets a bit annoying. reversing, stopping and changing the animation from time to time would be a solution (and your newer videos are even better anyway!)

    • @RichBehiel
      @RichBehiel  9 months ago +1

      I agree. Honestly I look back on this video and cringe at a few of the details, like how the animation loop goes on and on and is a bit nauseating, and music is too loud. But you live and learn! 😅 When I first started making these videos I really had no idea what I was doing.

    • @phenixorbitall3917
      @phenixorbitall3917 7 months ago

      @RichBehiel 18:19 On the left-hand side you used the Laplacian symbol instead of the nabla symbol.
      But apart from that => great video! 👍

    • @atticmuse3749
      @atticmuse3749 3 months ago +1

      With regards to pacing, I want to say that I really enjoy your general presentation style. You're not simply reading a script and getting the perfect take, you're actually doing a "live" presentation and I really appreciate the way you ad lib or go off on little tangents. I burst out laughing in your buoyancy video when you read the integral "zndS" phonetically.

  • @patricktanoeyjaya4430
    @patricktanoeyjaya4430 a year ago +69

    I really love how calmly you speak and how the lines you say feel unscripted. Makes it feel very personal.
    You also speak so clearly and concisely. I was able to get the gist of this with only high school calculus!
    This is making me like math again.

    • @RichBehiel
      @RichBehiel  a year ago +2

      I’m very glad to hear that! :)

  • @user-pw5do6tu7i
    @user-pw5do6tu7i a year ago +20

    Unbelievably crisp explanation of gradient descent. It is remarkable to see it play out in those dimensions. Thank you

    • @whannabi
      @whannabi a year ago +2

      And he repeats the animation so we can assimilate what's going on instead of quickly switching to the next thing. Very relaxed explanation which is nice.

  • @matteokimura1449
    @matteokimura1449 a year ago +28

    Another beautiful way to get a linear regression formula is to take the vector space of all real-valued functions that are defined for the x values, choose the hypothetical ideal function that maps all of the x's to their y's, and orthogonally project that hypothetical function onto the subspace of linear functions. By defining the inner product as the cartesian dot product between the output of the functions due to the x values, you'll see that the distance the projection minimizes is the error between the linear function and ideal function.
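A minimal sketch of the projection idea in the comment above, using invented example data: treat the data as a function f with f(x_i) = y_i, define the inner product as the dot product of function values at the sample points, and orthogonally project f onto the subspace spanned by the basis functions {1, x}. The projection coefficients come from the Gram system, which is exactly the least-squares normal equations.

```python
import numpy as np

# Invented example data: a noisy line (assumed slope 2, intercept 1).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, x.size)

# Basis functions {1, x} evaluated at the sample points (columns of B).
B = np.column_stack([np.ones_like(x), x])

# Projection coefficients solve the Gram system (B^T B) c = B^T y,
# i.e. the least-squares normal equations.
c = np.linalg.solve(B.T @ B, B.T @ y)
b, a = c  # intercept, slope

# Same answer as an ordinary least-squares line fit:
a_ls, b_ls = np.polyfit(x, y, 1)
assert np.allclose([a, b], [a_ls, b_ls])
```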

  • @TheRiverNyle
    @TheRiverNyle a year ago +69

    As an Applied Math (Stats/Probability Theory focused) major, this really got me excited!

  • @andreiimbru6835
    @andreiimbru6835 a year ago +16

    As an Econ major, you have no idea how much this helped me understand the behind-the-scenes of regression lines and everything I've done in Statistics this semester. I've learned so many new techniques for equation manipulation, so thank you!

  • @zeyogoat
    @zeyogoat a year ago +9

    A rare video that's technically adept and, most importantly, not condescending or pedantic! Well done, from a chemist and educator =)

  • @tommyproductions891
    @tommyproductions891 a year ago +12

    great video! I love how at the start you explain the equation of a straight line and by the end it's multivariable vector calculus

  • @mroygl
    @mroygl a month ago

    This is a piece of art, a captivating blend of deep understanding of the matter, beauty of plain graphics, voice acting, matrices, and "simple" software.

  • @simonleonard5431
    @simonleonard5431 a year ago +4

    Thank you! I've been playing with a spherical geometry problem and there's so much I've forgotten from my school days. This video reminded me of so many things, including ways of expanding my approaches to problem solving. Brilliant 👌

  • @berndkopera7723
    @berndkopera7723 a year ago +7

    Absolutely beautiful visualization! Simple, smart and intuitive.

  • @MattHudsonAtx
    @MattHudsonAtx 2 months ago +1

    I saw the calculus approach coming a mile away but it's great to see the linear algebra done so clearly. I need to take that again.

  • @johnstuder847
    @johnstuder847 7 months ago +3

    Thank you! This is definitely one of YouTube's math gems! It ties so many ideas together. I would love for you to do a video on Fourier epicycles. For reference, GoldPlatedGoofs' 'Fourier for the rest of us' is a great starting point. I'm sure you could do a beautiful refined version showing how the inner product, Fourier, QM, function spaces, and art all come together in a beautiful way.
    Thank you so much for sharing your videos!

    • @RichBehiel
      @RichBehiel  7 months ago

      Thanks for the kind comment, John! :) I touch on Fourier analysis in my upcoming video on relativistic QM, the Klein-Gordon equation. Hoping to upload it within a week.

  • @M.KRISHNAKANTACHARY
    @M.KRISHNAKANTACHARY 2 months ago +1

    Thanks a lot for clearly explaining the concept of fitting a linear regression so beautifully.

  • @Liberty5_3000
    @Liberty5_3000 a year ago +24

    It's so beautiful! Thank you so much! I hope your channel grows fast soon

  • @ivopfaffen
    @ivopfaffen a year ago +2

    Sooo cool! As a CS major struggling with a numerical analysis class, this helped me understand linear regression so much better.
    Thanks man!

  • @atticmuse3749
    @atticmuse3749 3 months ago +1

    12:16 "it should keep you up at night"
    Very apropos considering it's almost 4:30 am right now and I've been watching your videos for hours 😅

  • @Ayesha_F
    @Ayesha_F a year ago +3

    Oh, this was so SATISFYING! I don't think I have ever seen regression explained this way. It's like parts of how I understand it are being so wonderfully articulated by someone who obviously knows the subject matter well. I have had to teach myself mathematics and statistics, and I've always been drawn to this intuitive and philosophical way of understanding it. Thank you for this!

    • @RichBehiel
      @RichBehiel  a year ago +2

      Thanks for the kind comment, and I’m glad you enjoyed the video! :)

  • @ehfik
    @ehfik 9 months ago +1

    this was SO satisfying! hope to see many more explanations, such a great execution!

  • @jiadong2246
    @jiadong2246 a year ago +1

    Great work! Thank you, and I'm looking forward to the linear regression and gradient descent videos you mentioned at the end of the video

  • @sujalgvs987
    @sujalgvs987 a year ago +4

    I absolutely loved this video. Please do more videos on regression and machine learning as a whole.

  • @TheScepticalChymist
    @TheScepticalChymist a year ago +1

    I cannot finish the video because your voice is SO charming and comforting and makes me feel so safe, I just cannot pay attention to the maths

  • @enricolucarelli816
    @enricolucarelli816 3 months ago +1

    Wow! This is perfection explaining/visualizing complexity and its beauty! ❤❤❤❤ 👏👏👏👏👏

  • @xxge
    @xxge a year ago +2

    Great video! Coming from a linear algebra-heavy background, I still think taking the singular value decomposition of X, inverting it, and multiplying by y to find b is a simpler and more elegant approach, especially for multiple linear regression. But I imagine if you have more experience with physics, this approach is more familiar and easier to digest. Keep these videos coming!
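A minimal sketch of the SVD route the comment above describes, on invented example data: factor the design matrix X = U S Vᵀ, invert the singular values, and apply the result to y. This is the pseudoinverse applied to y, and it is the numerically preferred way to solve least squares when X is ill-conditioned.

```python
import numpy as np

# Invented example data: a noisy line (assumed slope 2, intercept 1).
rng = np.random.default_rng(6)
x = np.linspace(0, 1, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, x.size)

X = np.column_stack([x, np.ones_like(x)])   # design matrix, columns [x, 1]
U, s, Vt = np.linalg.svd(X, full_matrices=False)
b = Vt.T @ ((U.T @ y) / s)   # V S^{-1} U^T y: the pseudoinverse applied to y

# Matches the ordinary least-squares line fit:
assert np.allclose(b, np.polyfit(x, y, 1))
```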

  • @anthonyrojas9989
    @anthonyrojas9989 a year ago +2

    This was amazing! So fun to watch and appreciate this concept.

    • @RichBehiel
      @RichBehiel  a year ago

      Thanks, glad you enjoyed the video! :)

  • @dadamczyk
    @dadamczyk a year ago +3

    Great video! With those animations, it would be wonderful to see an essay on Bayesian linear regression, since it is a quite different and powerful approach to a similar topic.

  • @benwinstanleymusic
    @benwinstanleymusic a year ago +1

    Really enjoyed this, you're great at explaining stuff

  • @tesstera
    @tesstera a year ago +1

    Amazing! Thanks for showing us how to solve a maths problem in a physics way. Even though this method is already used in today's AI, it is still very interesting to see it work outside AI. The conceptual journey you've taken reminds me of my attempts at machine proving (ATP), and it goes a long way toward eliminating the intimidation of numerical analysis. Thanks!

  • @davidandrewthomas
    @davidandrewthomas a year ago +1

    This is beautifully put together! What a great explanation!

  • @jwilliams8210
    @jwilliams8210 a year ago +2

    Fantastic presentation!

  • @Aziqfajar
    @Aziqfajar a year ago +1

    This is beautifully explained and visualized! I'm glad to be on the first wagon for the ride of this video.

    • @RichBehiel
      @RichBehiel  a year ago +1

      Thanks, I’m glad you liked the video! It’s one of my favorite mathematical concepts, so it’s great to see others enjoying it too :)

  • @alexkushnir8073
    @alexkushnir8073 a year ago +1

    Cool music, Richard, it opens my mind and makes me understand things better! It's like combining hypnosis and a class ;-) I wish my math teacher at school had explained it to us that way 🙂

  • @Cristi4n_Ariel
    @Cristi4n_Ariel a year ago +1

    This was interesting! Thanks for sharing.

  • @levimillerfandom
    @levimillerfandom a year ago +1

    I was really stuck on a practical. I had to make a graph of my readings; the book stated that I should get a straight line, but instead I got curves, which was really stressful. Thankfully I found your video.
    It really helped ❤
    Thanks again

  • @wishIKnewHowToLove
    @wishIKnewHowToLove a year ago +1

    he just dropped the most beautiful linear regression video and thought we wouldn't notice

  • @marktahu2932
    @marktahu2932 a year ago +1

    Really very helpful! I'm no professional in any of these fields, just an old technician who is being reminded of all those brain neurons that have lain dormant for decades.

  • @coreymonsta7505
    @coreymonsta7505 a year ago +1

    I love code and taught calc 3 a couple of times, which is my favorite class, but I never learned about this topic in school (I only hear its name a lot). That was really interesting

  • @bernard2735
    @bernard2735 a year ago +1

    Beautifully explained, thank you. Liked and subscribed and looking forward to more.

  • @user-hl8sv1if7j
    @user-hl8sv1if7j a year ago +1

    wow. So well explained. Thank you

  • @mskiptr
    @mskiptr a year ago +1

    The parameter space is a super powerful concept. Especially in computer vision, where you can take a bunch of pixels and quickly detect all the lines they approximately form
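The computer-vision trick the comment above alludes to is the Hough transform, sketched minimally here on invented points: each point (x, y) votes for every parameter-space cell (slope a, intercept b) consistent with y = ax + b, and collinear points' votes pile up in a single cell.

```python
import numpy as np

# Invented collinear points, all on y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5), (3, 7)]

# Discretized parameter space (assumed ranges and resolution).
a_vals = np.linspace(-5, 5, 101)
b_vals = np.linspace(-10, 10, 201)
votes = np.zeros((a_vals.size, b_vals.size), dtype=int)

for x, y in points:
    for i, a in enumerate(a_vals):
        b = y - a * x                        # this point's dual line in parameter space
        j = np.argmin(np.abs(b_vals - b))    # nearest intercept cell
        votes[i, j] += 1

i, j = np.unravel_index(votes.argmax(), votes.shape)
print(a_vals[i], b_vals[j])  # peak near a = 2, b = 1
```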

  • @8megabitz706
    @8megabitz706 a year ago +1

    I've been waiting for this for too long 10:17

  • @Lado916
    @Lado916 a year ago +1

    Great video! I absolutely love the visual and dynamical proofs in math.
    I just wanted to add that there is a beautiful point-line duality between the two spaces:
    While a dot in parameter space corresponds to a line in real space, a line in parameter space defines a family of curves in real space that intersect at the same point.
    Moreover, if you map your datapoints to their corresponding dual lines, the center of mass of these lines will be a dual point to the best fit line of the data!
    Hope you find this as cool as I do.

    • @RichBehiel
      @RichBehiel  a year ago +1

      That’s really cool! I’ve read about that kind of thing in an intro to differential geometry book, but hadn’t connected the dots in the context of this video. Thanks for a very interesting comment :)

  • @kalaiselvan6907
    @kalaiselvan6907 a year ago +1

    ❤️❤️❤️This is Gold ❤️❤️❤️ Thank you

  • @zeb4827
    @zeb4827 a year ago +1

    very cool video, this connected some dots that I've been struggling to reconcile

  • @CarlosHlavacek
    @CarlosHlavacek a year ago +1

    Really beautiful class.

  • @benjaminshropshire2900
    @benjaminshropshire2900 a year ago +2

    IIRC there *is* a way to leverage that outer product observation: if D is a matrix where each column is [xᵢ 1] and Y is another matrix where each row is [yᵢ], then the entire left Σ becomes DDᵀ and the entire right Σ becomes DY.
    Also (I think) this generalizes to linear equations with more terms by adding the data as more rows in D. And the data can also be functions of existing simpler terms (e.g. Nth powers of x to get polynomial fits, sin(nx)/cos(nx) to get discrete Fourier transforms, etc.).
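A minimal sketch of the matrix form above, on invented example data: with D's columns being [xᵢ, 1] and Y the column vector of yᵢ, the two sums of the normal equations collapse to DDᵀ and DY, and adding rows to D (powers of x, sinusoids, ...) extends the same machinery to richer fits.

```python
import numpy as np

# Invented example data: a noisy line (assumed slope 3, intercept -0.5).
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = 3.0 * x - 0.5 + rng.normal(0, 0.1, x.size)

D = np.vstack([x, np.ones_like(x)])   # shape (2, N): each column is [x_i, 1]
Y = y[:, None]                        # shape (N, 1): each row is [y_i]

# Normal equations in matrix form: (D D^T) p = D Y.
params = np.linalg.solve(D @ D.T, D @ Y).ravel()   # [slope, intercept]

# Same answer as the standard least-squares line fit:
assert np.allclose(params, np.polyfit(x, y, 1))
```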

  • @ABKW119
    @ABKW119 a year ago +4

    Why do your videos only get recommended to me at 1am, they send me straight down a rabbit hole 😂

  • @TranquilSeaOfMath
    @TranquilSeaOfMath a year ago

    I really like all you put into this video. It helps connect ideas in interesting ways. Thank you for including the Python code.

  • @micahwithabeard
    @micahwithabeard a year ago +1

    I just liked, subbed, and commented :D I don't think I can be any more "violently complimentary" than that. This was excellent, thanks!

  • @RocaSeba
    @RocaSeba a year ago +1

    This video is genius. Subscribed.

  • @michahejman6712
    @michahejman6712 a year ago +1

    Great video! 30 minutes felt like 5 :) Thanks!!!

    • @RichBehiel
      @RichBehiel  a year ago +1

      Thanks, glad you enjoyed the video! :)

  • @chrislau9835
    @chrislau9835 a year ago +1

    Very good explanation 👍🏻👍🏻

  • @AfroNyokki
    @AfroNyokki a year ago +1

    Great explanation, loving it so far. I'm majoring in applied math with a focus in numerical analysis, so this stuff is always fascinating haha. I noticed around 18:20, you started using delta instead of del. Thought it might be a typo but just wanted to check!

    • @RichBehiel
      @RichBehiel  a year ago

      Yeah that’s a typo, sorry! 😅 Thanks for pointing that out.

  • @andytroo
    @andytroo a year ago +4

    Introducing the Jacobian could be a nice extension. The shape of the best-fit error landscape is an ellipse, which can make converging toward the best solution hard, as many of the gradient directions in the top half of your example are not pointed toward the best solution, simply toward that valley of best fit. Reshaping the gradients to make that ellipse a circle allows much quicker convergence.

    • @RichBehiel
      @RichBehiel  a year ago

      Great idea! I’d love to do a video on that someday.

  • @IAmTheFuhrminator
    @IAmTheFuhrminator a year ago +1

    Such a great video! I had a lecture about this years ago in my engineering analysis class in undergrad, but I took such poor notes that I was never able to reproduce this function. Now as homework I'm going to take your process and solve for other functions like parabolas or cubics which will require me to use 3 and 4 dimensional parameter spaces. Thanks again for the great video!

    • @RichBehiel
      @RichBehiel  a year ago +1

      That’s awesome, I love to hear that! Challenge for you: can you solve it for a general N-degree polynomial? Like with some kind of recursive algorithm. I actually don’t know if this is possible but it seems like a fun puzzle!

    • @IAmTheFuhrminator
      @IAmTheFuhrminator a year ago +1

      @@RichBehiel that would be a fun problem to solve! And even if it can't be solved, I'm sure proving or disproving the possibility of a solution would make a great paper!

  • @nooks12
    @nooks12 a year ago +1

    Satisfying video. Took me back to University.

  • @user-pn1lm3pi6p
    @user-pn1lm3pi6p a year ago +1

    Very good!

  • @maxfitzkin9422
    @maxfitzkin9422 a year ago +1

    I really loved how you put this video together! What did you use to animate and edit everything? It was really clean!

    • @RichBehiel
      @RichBehiel  a year ago

      Thanks! :) I used matplotlib in Python.

  • @pickle.taesan
    @pickle.taesan a year ago +1

    Great video! I never thought of the parameter space as having an 'error force'.

  • @ydl6832
    @ydl6832 a year ago +1

    Yeah, this is a nice explanation. A neural network is just a more sophisticated version of line fitting with more parameters.

  • @scienceuser4014
    @scienceuser4014 a year ago +1

    Perfect video

  • @GradientAscent_
    @GradientAscent_ a year ago +1

    Very cool animations

  • @rouninph6349
    @rouninph6349 a year ago +1

    It looks like you are trying to hypnotize your listener. 😂 Great explanation btw. Using physical arguments to explain a mathematical concept, I like that.

  • @Osniel02
    @Osniel02 a year ago +1

    just gorgeous!!!

  • @torquencol
    @torquencol a year ago +1

    Lmao, thank you for this. This video came into my recommendations when I needed it most: I've been stressed these last few days doing laboratory reports, where I have to use the regression line a lot 🛌 It made me hate it less

  • @alexander_adnan
    @alexander_adnan a year ago +1

    Thank you 🙏 ❤❤❤

  • @account4345
    @account4345 a year ago +1

    Just gotta remind myself this is why I must master linear algebra.

    • @RichBehiel
      @RichBehiel  a year ago

      Mastering linear algebra is a great and enduring source of spiritual fulfillment 🙏

  • @kummer45
    @kummer45 a year ago

    Imagine you have a surface with a magnet. That's a game changer.
    Understanding the concepts of statistics by doing physics is the correct way of UNDERSTANDING mathematics and PHYSICS. However, physics is not mathematics, and mathematics is not physics.
    The magic of this is MODELING. Linear regression, the average, and the Gauss curve are concepts of fundamental use in statistical mechanics. Eventually, higher mathematical physics will launch the student into the field of MODEL MAKING.

  • @williamfurtado1555
    @williamfurtado1555 a year ago +4

    This video is wonderful. How did you create the interactive visualization with the "Parameter Space" and "Real Space" subplots? I'd love to be able to create one on my own.

    • @RichBehiel
      @RichBehiel  a year ago +8

      Thanks William! :) For this video I used Python, specifically matplotlib. You can get it by downloading Anaconda, which installs Python and some scientific modules, then call "from matplotlib import pyplot as plt". After that line, you can use things like plt.figure() and plt.plot() to make a figure and plot things. In this case the parameter space and real space are two subplots in a figure. They refresh at 60 frames per second in a loop which sets the dot's position in the parameter space while drawing the line in the real space, based on the current a and b values. To turn on the error landscape, I also added some code to evaluate the error metric (objective function) at all points in the parameter space for each a and b. Then for the error force I calculated and plotted the negative gradient of that. For the part where the dot descends down the gradient, I used F = ma with a friction-ish damping term -kv (mass parameter m, damping parameter k) to make the dot roll down the hill and then stop at the optimal point.
      I'll be more careful in future videos to post the source code of the animations too. Well, at least for videos after the one I'm going to post this week; for that one, and the previous videos, I was very sloppy with the code and it wouldn't be too helpful to see it. But there have been a few comments now about how these animations were made, so I figure the best answer is the code itself. In the future I'll be better about writing cleaner animation code and sharing it.
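A minimal sketch (not the author's actual code) of the rolling-dot dynamics described above, on invented data and with assumed m, k, and time-step values: the dot accelerates down the negative gradient of the error, with a friction-like -kv term so it settles at the optimum instead of oscillating forever.

```python
import numpy as np

# Invented example data: a noisy line (assumed slope 1.5, intercept 2).
rng = np.random.default_rng(2)
x = np.linspace(0, 5, 30)
y = 1.5 * x + 2.0 + rng.normal(0, 0.2, x.size)

def grad(a, b):
    """Gradient of the sum-of-squares error E(a, b) = sum (a*x + b - y)^2."""
    r = a * x + b - y
    return np.array([2 * np.sum(r * x), 2 * np.sum(r)])

m, k, dt = 1.0, 4.0, 0.001            # mass, damping, time step (assumed values)
pos = np.array([0.0, 0.0])            # start at a = 0, b = 0
vel = np.zeros(2)
for _ in range(20000):                # integrate m*acc = -grad(E) - k*vel
    acc = (-grad(*pos) - k * vel) / m
    vel += acc * dt
    pos += vel * dt

print(pos)  # settles at the least-squares optimum, same as np.polyfit(x, y, 1)
```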

    • @rocknroll909
      @rocknroll909 a year ago

      @@RichBehiel Wow, you're awesome for such an in-depth reply to this. Thank you, I might try this on my own

  • @denisbaranov1367
    @denisbaranov1367 9 months ago +1

    The beauty of: Linear Regression

  • @peterwolf8092
    @peterwolf8092 a year ago +1

    😂 I really love this and wish my high school students would understand it so I could share it with them.

  • @flexeos
    @flexeos a year ago +1

    There is always something that bothers me when linear regression is approached that way: from the start, you consider that x and y are of a different nature. The value of x is known perfectly and the error is on y. This is a pretty strong constraint. I am a metrology engineer, and I saw in the comments that you are a metrology engineer too, so you are well aware that in the real world there are errors on both x and y. In that case, the error could be, for example, the distance from the data point to the line.

    • @RichBehiel
      @RichBehiel  a year ago

      That’s true! And there are ways of doing regression with ds rather than dy. Although often x is more precise than y, for example if you have a sensor array or are sampling data at a fast and precise rate relative to the change in your signal.
      For example, if we’re looking at a trend in some signal that drifts linearly over an hour, and sampling one datapoint per second, with error on the order of microseconds, then x is very precise in that context.
      But you’re right that there are some cases where x and y might be similarly varying.

    • @flexeos
      @flexeos a year ago +1

      @@RichBehiel My world is more the relation between two voltages at different locations in an analog network, so the noise on both is of the same nature.

    • @angelmendez-rivera351
      @angelmendez-rivera351 6 months ago +1

      @@flexeos I think you are missing the big picture. In most of these data sets (in practice), x(i) is the data set corresponding to the independent variable, the one which you can actually control for much more easily, and y(i) is the data set corresponding to the dependent variable, and you want to understand y as a function of x, not the other way around, because the other way around is (in every scenario I have seen physicists, engineers, and other applied S.T.E.M. workers deal with) very impractical and not useful. Now, are there circumstances which are more complicated? Of course there are, but they are the exception, and in those circumstances, the complexities involved are of such a nature that dealing with residuals, as the video does, is not the practical approach anyway.

    • @flexeos
      @flexeos 6 months ago

      @@angelmendez-rivera351 That is not my experience in practice. Let's say that you want to measure a resistor. You inject a current I that you "control", usually using a digital-to-analog converter, you measure the voltage V across the resistor, and V/I is your resistance. Because the world is not perfect, if you want a better result you do the measurement with a bunch of Is, and the resistance is then the slope of the best line through the cloud of points (V, I). To have a better idea of the exact value of I, while you set it digitally, you have to measure its actual value, as the translation between the digital value and the actual current is anything but perfect. So in practice you have a cloud of points (V, I) with the same kinds of error (noise, offset, non-linearity...) on both V and I. If you assume that I is an independent variable, you will end up with a bias. There was a math paper on that bias effect almost 100 years ago that I read, but I cannot find the reference just now.
      If an electronic example seems too specific, consider a typical example given to students, like annual income vs. age in years. Age looks like an independent variable, but in reality there is by definition a one-year uncertainty on it, which is not great since the relative error bar is not even constant. Of course, in such an example the required precision is not a big problem, so you can forget about those subtleties. But in metrology you are tracking a few parts per million. Not taking that into account would be like trying to design GPS without taking general relativistic effects into account (accuracy on location becomes > 10 km). My 2 cents.

  • @brianli3493
    @brianli3493 a year ago +1

    electric potential actually helped me understand this omg

  • @guslackner9270
    @guslackner9270 a year ago +13

    This video is a wonderful explainer! You've listed in the description that linear regression is "very useful in math, science, and engineering" to which I would like to add economics, which is what I am studying. This video and Jazon Jiao's work (ruclips.net/video/3g-e2aiRfbU/видео.html) are the best explanations of the concept that I have seen in video, lecture, or textbook form. I look forward to seeing what else you share on this channel!

  • @jursamaj
    @jursamaj 9 months ago +1

    And you can fit to other curves with simple transforms of one or both axes, like log or exp.
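A minimal illustration of the transform trick above, with invented noiseless data for clarity: to fit y = A·exp(c·x), take the log of y so the model becomes linear, ln(y) = ln(A) + c·x, then run an ordinary straight-line fit on the transformed data.

```python
import numpy as np

# Invented exponential data (assumed A = 2.5, c = 0.8), noiseless for clarity.
x = np.linspace(0, 4, 20)
y = 2.5 * np.exp(0.8 * x)

# Linearize: ln(y) = ln(A) + c*x, then fit a straight line.
c, lnA = np.polyfit(x, np.log(y), 1)
A = np.exp(lnA)
assert np.allclose([A, c], [2.5, 0.8])
```

Note that with noisy data this log-fit weights the residuals differently than a direct nonlinear fit would, which is worth keeping in mind.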

  • @SD-ni9jh
    @SD-ni9jh a year ago +1

    beautiful vid

  • @einsteingonzalez4336
    @einsteingonzalez4336 a year ago

    That’s awesome! But what happens if we let N approach infinity where the data points are in a finite domain?

  • @sarthakjain1824
    @sarthakjain1824 a year ago +2

    That was on the level of 3Blue1Brown videos

    • @RichBehiel
      @RichBehiel  a year ago

      Thanks! :) Grant is a role model for sure. The aesthetics of his videos are much better than mine though 😅 But I’ll get better over time.

  • @JOHNSMITH-ve3rq
    @JOHNSMITH-ve3rq a year ago +1

    Wow. Seen so many videos, read so many papers and books - but this one takes the cake. Would love to see you doing this but for more complex models with fixed effects and all sorts of other bells and whistles. Impressive!!

  • @StudyEnggFocus
    @StudyEnggFocus 2 months ago

    Hello, Richard! Could you explain what you meant by error metric? Thanks

  • @DavyCDiamondback
    @DavyCDiamondback a year ago +1

    Nice video on OLS. I've often wondered, though, why lessons on regression focus on OLS rather than Deming regression, as OLS seems objectively inferior. With so many projections based on the inferior model, we are shooting our research methods in the foot from the start

    • @RichBehiel
      @RichBehiel  a year ago +1

      Good point. Frankly I think it’s because OLS is easier, and gets the job done in most situations. But I agree that there are times when Deming regression is better. Although someone who uses Deming would presumably have learned OLS first. OLS is also conceptually ideal for explaining how calculus can be used to minimize fit error, so it’s a good go-to image to have in mind when solving fancier optimization problems.

    • @DavyCDiamondback
      @DavyCDiamondback a year ago

      @@RichBehiel I completely understand. In fact, this subject is making me think about applied mathematics, because if we go deeper, it's not like linear regression in any form is the best way to actually model most data. So I'm thinking about dividing a function into splines to create a good fit; you can go too far and smoothly fit every point into a function, but then your function is skewed toward the data set, losing the ability to make good projections. It's an interesting puzzle (and I hated applied mathematics in college)

    • @turun_ambartanen
      @turun_ambartanen a year ago +2

      Well, there are quite a few advantages of OLS compared to a total least squares fit.
      For one, in every measurement where x is tightly controlled and y is the thing you want to learn about, OLS is the right tool. Because there are no, or only negligible, errors in x, the horizontal distance of data points to the prediction, dx, doesn't matter and must not be included in the fit.
      It also works much better with arbitrary functions than total least squares. For an arbitrary function I don't think there even is _any_ way to calculate the total least squares error. Only well-behaved functions work, and even then you have to compute the derivative to perform a total least squares fit.
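A minimal comparison of the two fits discussed in this thread, on invented data with noise in both variables: OLS minimizes vertical residuals dy, while Deming regression (here with equal error variances in x and y, i.e. total least squares) minimizes perpendicular distances; the closed-form Deming slope below is the standard equal-variance formula.

```python
import numpy as np

# Invented data: true line y = 2t + 1, with noise added to BOTH coordinates.
rng = np.random.default_rng(3)
t = np.linspace(0, 10, 200)
x = t + rng.normal(0, 0.5, t.size)
y = 2.0 * t + 1.0 + rng.normal(0, 0.5, t.size)

# Ordinary least squares (vertical residuals).
a_ols, b_ols = np.polyfit(x, y, 1)

# Deming regression with equal variances (perpendicular residuals),
# closed form from the sample (co)variances.
sxx = np.var(x)
syy = np.var(y)
sxy = np.cov(x, y, bias=True)[0, 1]
a_dem = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
b_dem = y.mean() - a_dem * x.mean()

# Noise in x biases the OLS slope toward zero (attenuation); Deming corrects it.
print(a_ols, a_dem)
```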

  • @tylerbakeman
    @tylerbakeman a year ago +1

    Instead of calculating Dy, it might be better to calculate the distance a point is from the line (especially for smaller data sets, where Dy could be large but the line could in fact be very close).

  • @PatrickDoolittle
    @PatrickDoolittle a year ago

    Like Sujal Gupta, I watched this video because I am studying machine learning. I have been studying simple linear regression for the past couple of weeks now! Just yesterday I started to think about how the Moore-Penrose pseudoinverse generalizes the idea of an inverse to situations where the matrix is not square. I call linear maps to a higher-dimensional space "embeddings" and linear maps to a lower-dimensional space "projections". For a square matrix, which is neither an embedding nor a projection but a linear operator within the same dimension, we can undo the linear mapping by finding the inverse X^-1. In the case of projections, there are many high-dimensional vectors that can be projected down to a given low-dimensional vector, so there is no unique inverse. However, we can solve the system Xb = y for b using the Moore-Penrose *pseudo*inverse: (X^T X)^-1 X^T. When we apply the Moore-Penrose pseudoinverse to the vector of response variables y, we get the coefficients b, and the fitted values Xb are the orthogonal projection of y onto the column space of X. That is the beauty of the Moore-Penrose pseudoinverse!
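A minimal sketch of the pseudoinverse view above, on invented example data: for a tall design matrix X, b = pinv(X) @ y solves the least-squares problem, it agrees with the (XᵀX)⁻¹Xᵀy formula, and the fitted values Xb leave a residual orthogonal to the column space of X.

```python
import numpy as np

# Invented example data: a noisy line (assumed slope 4, intercept 0.3).
rng = np.random.default_rng(4)
x = np.linspace(0, 1, 25)
y = 4.0 * x + 0.3 + rng.normal(0, 0.05, x.size)

X = np.column_stack([x, np.ones_like(x)])      # 25x2: tall, not square
b = np.linalg.pinv(X) @ y                      # Moore-Penrose pseudoinverse
b_normal = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X)^-1 X^T y, same thing
assert np.allclose(b, b_normal)

# The residual is orthogonal to the column space of X: X^T (y - Xb) = 0.
assert np.allclose(X.T @ (y - X @ b), 0)
```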

    • @davidmurphy563
      @davidmurphy563 a year ago

      I code DNNs too. Um, I understood your words but not your point. Genuinely curious here.
      So we can calculate the inverse matrix: take the reciprocal of the determinant and multiply it by the matrix with the diagonal swapped and the off-diagonal entries negated. This spits out a new matrix with the property that if you multiply it by the original you get the identity (assuming linear independence).
      Ok, fine, all very useful. But what's that got to do with the price of fish?

  • @benandrew9852
    @benandrew9852 a year ago +1

    holy shit
    I have genuinely never even come close to thinking about it like this
    top marks, no notes

  • @bronga645
    @bronga645 a year ago +1

    Sub, like, and comment for your effort. Even if you don't make much on YouTube, you are a great mathematician! And I am sure you will make it in life and be a help to humanity as a whole. Thank you

    • @RichBehiel
      @RichBehiel  1 year ago

      Thanks for the kind comment! :)

  • @zukofire6424
    @zukofire6424 1 year ago +1

    Beautiful, and I'm surprised I never knew some of what you explained. I want to add something irrelevant: you are so handsome!

  • @nofalldamage
    @nofalldamage 1 year ago +1

    Great video.
    Is the matrix at the end always invertible?

    • @RichBehiel
      @RichBehiel  1 year ago

      Great question! It’s invertible as long as its determinant isn’t zero. It has the form [A, B; B, N], where A and B are real numbers and N is a positive integer, so its determinant is AN - B^2. For this to be zero would require that AN = B^2, in other words that the sum of x_i^2 times N equals (the sum of x_i)^2. I’m not sure if this can happen; it feels like it can be proven one way or the other without a ton of work, but I’ve gotta go. So I leave that as an exercise to the reader! :)

    • @nofalldamage
      @nofalldamage 1 year ago

      @@RichBehiel I think one of the cases where the matrix is not invertible is if all the points are on a vertical line. Kind of makes sense since then the form y = ax + b doesn't really work.
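
Following up on the exercise: by the Cauchy-Schwarz inequality, N times the sum of x_i^2 is at least (the sum of x_i)^2, with equality exactly when all the x_i are equal, which is the vertical-line case. A small check with toy numbers:

```python
import numpy as np

def det_normal_matrix(x):
    """Determinant of [[sum(x^2), sum(x)], [sum(x), N]] from the normal equations."""
    x = np.asarray(x, dtype=float)
    return len(x) * np.sum(x**2) - np.sum(x)**2

print(det_normal_matrix([1.0, 2.0, 3.0]))  # → 6.0 (distinct x values: invertible)
print(det_normal_matrix([2.0, 2.0, 2.0]))  # → 0.0 (all points on a vertical line)
```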

  • @badermuteb1012
    @badermuteb1012 7 months ago

    How did you code these interactive plots? Thanks!

  • @kennethtrimble5144
    @kennethtrimble5144 1 month ago +1

    excellent

  • @trustnoone81
    @trustnoone81 1 year ago +1

    Do I understand correctly that the "valley" in the error landscape is the set of all lines that pass through the point (x-bar, y-bar)?

    • @RichBehiel
      @RichBehiel  1 year ago

      Great question, and I’m actually not sure. Anyone know the answer?
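
One way to probe this numerically (a sketch with synthetic data): for any fixed slope a, setting the derivative of the squared error with respect to b to zero gives b = mean(y) - a*mean(x), which is exactly the line through (x-bar, y-bar). So the valley floor does trace out those lines.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)
x_bar, y_bar = x.mean(), y.mean()

def sse(a, b):
    """Summed squared error of the line y = a*x + b on the data."""
    return np.sum((y - (a * x + b))**2)

# For each fixed slope a, find the error-minimizing intercept by brute force
# and compare it with y_bar - a * x_bar (the line through the centroid).
bs = np.linspace(-20, 20, 20001)
for a in [-1.0, 0.0, 2.0, 5.0]:
    errors = ((y - a * x)[None, :] - bs[:, None])**2  # shape (20001, 50)
    b_best = bs[np.argmin(errors.sum(axis=1))]
    assert abs(b_best - (y_bar - a * x_bar)) < 1e-2
```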

  • @PrismaticCatastrophism
    @PrismaticCatastrophism 1 year ago +1

    Could you make a similar video about parabolic graphs?

    • @RichBehiel
      @RichBehiel  1 year ago

      I’d like to someday! The procedure is very similar, but with ax^2 + bx + c instead of ax + b. It’s a 3D parameter space, but the same techniques work.
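
The 3D-parameter-space version, sketched in code (toy, noise-free data so the exact coefficients come back out; same normal-equations idea, just with a 3x3 system):

```python
import numpy as np

x = np.linspace(-2, 2, 9)
y = 3.0 * x**2 - 1.0 * x + 0.5  # noise-free parabola for clarity

# Design matrix with columns x^2, x, 1; setting the gradient of the
# squared error to zero gives the 3x3 normal equations solved below.
X = np.column_stack([x**2, x, np.ones_like(x)])
a, b, c = np.linalg.solve(X.T @ X, X.T @ y)
print(a, b, c)  # recovers roughly 3.0, -1.0, 0.5
```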

  • @code_explorations
    @code_explorations 1 year ago +1

    Thanks

  • @akidnag
    @akidnag 1 year ago +1

    Great vid, thank you!
    I'm struggling to figure out how you visualize the "parameter space" in Python.

    • @akidnag
      @akidnag 1 year ago +1

      I did a meshgrid for a and b from -5 to 5 with 100 points, X, Y. Then I calculated the modulus as Z = the sum of the square roots of the squares of each equation, and did a contourf(X, Y, Z), but no luck :/

    • @akidnag
      @akidnag 1 year ago

      I think the quiver plot is ok as quiver(X,Y,eq1,eq2)

    • @RichBehiel
      @RichBehiel  1 year ago +1

      I did a contourf and a quiver. If the contourf isn’t working, it’s possible the color limits are off? Oh, actually come to think of it, I might have taken the log or sqrt of the error, to flatten out the landscape so it’s easier to see. Basically applying a nonlinear colormap.

    • @akidnag
      @akidnag 1 year ago

      @@RichBehiel Thanks a lot! Keep up the great work!

    • @akidnag
      @akidnag 1 year ago

      Still no good, I'm sorry.
      So contourf gets V (or log(V) or sqrt(V)), and quiver gets Fa, Fb over the spanned a and b, right?
      Sorry to bother you; I feel like I understand, but not getting the same results makes me doubt what I'm doing wrong :/
      Is it too much to ask you to share the code for visualizing the parameter space?
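
For anyone following this thread, here is one possible version (an illustrative sketch, not the actual code from the video): contourf on the square root of the summed squared error, and quiver on the negative gradient with respect to a and b.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=30)
y = 1.5 * x - 0.5 + rng.normal(scale=0.4, size=30)

a, b = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-5, 5, 100))

# Residuals at every (a, b) grid point, shape (100, 100, 30).
r = y[None, None, :] - (a[..., None] * x + b[..., None])
E = (r**2).sum(axis=-1)  # summed squared error over the grid

# Negative gradient of E: the arrows point "downhill" toward the valley.
Fa = 2.0 * (r * x).sum(axis=-1)
Fb = 2.0 * r.sum(axis=-1)

# sqrt flattens the landscape (a nonlinear colormap, in effect).
plt.contourf(a, b, np.sqrt(E), levels=50)
s = slice(None, None, 8)  # thin out the arrows
plt.quiver(a[s, s], b[s, s], Fa[s, s], Fb[s, s])
plt.xlabel("a (slope)"); plt.ylabel("b (intercept)")
plt.savefig("parameter_space.png")
```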

  • @agentdarkboote
    @agentdarkboote 10 months ago

    I would love it if you could show why the pseudoinverse recovers this method!

  • @Truth4thetrue
    @Truth4thetrue 1 year ago +1

    Very nice video, and a quite interesting and actually useful topic.
    I'd just like to say that the line wiggling around for most of the video was (to me) irritating. Great work nonetheless!

    • @RichBehiel
      @RichBehiel  1 year ago

      Thanks for your feedback! :) That’s not something I noticed, but now that I see it I totally get where you’re coming from. I’ll try to avoid having large repetitive motions in future videos.

  • @potatochipbirdkite659
    @potatochipbirdkite659 1 year ago +1

    Do you have the blue dot following a Lissajous curve?

    • @RichBehiel
      @RichBehiel  1 year ago

      I forget what I did for that, I think I just had some sines and cosines of different frequency in x and y.
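
For the curious, sines and cosines with different frequencies in x and y trace a Lissajous-style figure; something along these lines (a guess at the general idea, not the actual animation code):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 500)
# Different frequencies (3 vs 2) and a phase offset give a closed
# Lissajous-style loop for the dot to follow.
x = 2.0 * np.cos(3 * t)
y = 1.5 * np.sin(2 * t + 0.5)
path = np.column_stack([x, y])  # one (x, y) point per animation frame
```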

  • @m9l0m6nmelkior7
    @m9l0m6nmelkior7 3 months ago

    But is that matrix invertible if there is more than one extremum?

  • @jamesmcfarlane3469
    @jamesmcfarlane3469 1 year ago +1

    Is this method, or something similar, applicable to nonlinear least squares? I did a project over Christmas using nonlinear least squares regression, and this would’ve been super helpful 😅

    • @RichBehiel
      @RichBehiel  1 year ago +1

      The same concept of minimizing a least squares objective function by setting the gradient to zero applies to nonlinear least squares, but there are also extra steps involved.
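
For reference, a common tool for the nonlinear case is scipy.optimize.curve_fit, which handles those extra steps (iteratively linearizing the model and solving a linear least-squares subproblem each time). A sketch with a made-up exponential model:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, k):
    """Example nonlinear model: a * exp(k * x)."""
    return a * np.exp(k * x)

rng = np.random.default_rng(2)
x = np.linspace(0, 2, 40)
y = model(x, 2.0, 1.3) + rng.normal(scale=0.05, size=40)

# p0 is the initial guess; the optimizer refines it iteratively.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0])
print(popt)  # roughly [2.0, 1.3]
```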

  • @peterwolf8092
    @peterwolf8092 1 year ago +1

    Is it possible to get a "second best" valley? A pseudo-best solution?

    • @RichBehiel
      @RichBehiel  1 year ago +1

      Not for linear regression, but for fits with more parameters yes. Gradient descent can sometimes get stuck in a local minimum, a valley other than the best one. If there’s an analytic solution, it might involve the roots of a polynomial or something, so you can have multiple values which are locally optimal. In that situation, the height of the objective function at each optimum can be quickly compared, since the list should be pretty short.

  • @bettercalldelta
    @bettercalldelta 1 year ago +1

    0:13 Am I the only one who, when watching this part, was like "god damn it, just place it in the correct place already, you idiot"?