This channel is one of the most important channels for me! MANY Thanks Steve
Along with visual aids you have explained the concept in a very understandable manner. Thanks for the video.
Thumbs up Steve!
Nice video. Your channel and book are amazing! Congratulations.
I was just talking in a meeting about this, get out of my head Brunton.
Prof. Brunton, thank you for the lecture! However, in some cases, such as maximum a posteriori (MAP) and maximum likelihood estimation, under the assumption that the noise is Gaussian distributed, minimizing the L2 norm provides the optimal solution. Usually heuristics such as M-estimation are applied to mitigate issues arising from outliers; in other words, the kernel is changed to a shape that can tolerate a certain amount of outliers in the system. It sounds like using the L1 norm here has very similar effects to those of robust kernels, where we are effectively changing the shape of the cost/error. Can you please elaborate on the differences between using (L1 norm) and (L2 norm + M-estimator), and how the L1 norm performs in applications where data uncertainty is considered? Thanks!
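A minimal sketch of the comparison you describe (my own code, not from the video): fitting a line with an L2 loss, an L1 loss, and a Huber M-estimator loss on data with injected outliers. The loss definitions and the delta parameter are my choices.

```python
# Sketch: L2 vs L1 vs Huber (M-estimator) losses on a 1-D line fit
# with gross outliers; Nelder-Mead is used since the L1 loss is non-smooth.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 0.05 * rng.standard_normal(50)   # true slope 2, Gaussian noise
y[::10] += 3.0                                  # inject gross outliers

def residuals(w):
    return y - (w[0] * x + w[1])

def l2_loss(w):
    return np.sum(residuals(w) ** 2)

def l1_loss(w):
    return np.sum(np.abs(residuals(w)))

def huber_loss(w, delta=0.1):
    r = np.abs(residuals(w))
    quad = np.minimum(r, delta)                 # quadratic zone near zero
    return np.sum(0.5 * quad**2 + delta * (r - quad))

for loss in (l2_loss, l1_loss, huber_loss):
    w = minimize(loss, x0=[0.0, 0.0], method="Nelder-Mead").x
    print(f"{loss.__name__}: slope {w[0]:.3f}, intercept {w[1]:.3f}")
```

The L2 fit gets dragged toward the outliers, while L1 and Huber both stay near the true slope; the practical difference is that Huber remains quadratic (smooth and statistically efficient) for small residuals, whereas L1 is non-smooth at zero.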
I think you are right
Thank you, Professor. It is pretty helpful for me.
Mr. Brunton, what materials and software did you use while shooting this video?
Is there any way we might generate a sampling matrix which is maximally incoherent? What if the samples are positioned randomly and maximally distant from each other? Can we add additional constraints on the sampling matrix?
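One way to compare sampling patterns numerically (a sketch under my own assumptions: a DCT sparsifying basis and random point sampling) is to score each candidate matrix by the mutual coherence of Theta = C @ Psi.

```python
# Sketch: mutual coherence of Theta = C @ Psi for random point sampling
# against an orthonormal DCT basis; smaller coherence = more incoherent.
import numpy as np
from scipy.fft import dct

n, p = 256, 32
Psi = dct(np.eye(n), norm="ortho", axis=0)     # orthonormal DCT basis

rng = np.random.default_rng(1)
rows = rng.choice(n, size=p, replace=False)    # random sample locations
C = np.eye(n)[rows]                            # point-sampling matrix

Theta = C @ Psi
Theta = Theta / np.linalg.norm(Theta, axis=0)  # unit-norm columns
G = np.abs(Theta.T @ Theta)                    # pairwise inner products
np.fill_diagonal(G, 0.0)
print("mutual coherence:", G.max())
```

Extra constraints (e.g., a minimum pairwise distance between samples) could be imposed by rejection: redraw `rows` until the constraint holds, and keep the pattern with the smallest coherence.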
Excellent!
Hi Steve
The L1 (i.e., regularized) solution's error surface is convex but non-smooth: it is not differentiable where coefficients hit zero. Are you planning to explain how we optimize such functions?
Mathematical derivations would be helpful :)
Thanks
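For what it's worth, because the objective is convex, a generic convex solver already handles it. A sketch using cvxpy, with a regularization weight lam of my own choosing:

```python
# Sketch: solving min ||Ax - b||^2 + lam*||x||_1 with a convex solver.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 0.5]          # sparse ground truth
b = A @ x_true

x = cp.Variable(100)
lam = 0.1
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x))).solve()
print("recovered support:", np.flatnonzero(np.abs(x.value) > 1e-3))
```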
Dear sir, I am from Sri Lanka and I really admire your video series. My question: the l1 norm is not differentiable at zero (the norm itself is continuous, but its slope jumps there). To impose sparsity, researchers use ISTA (the Iterative Soft-Thresholding Algorithm), which shrinks weights toward zero and sets them exactly to zero below a certain threshold. What are your thoughts on this?
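Agreed that this is the standard trick. A bare-bones ISTA sketch (my own, with the step size and lam chosen ad hoc): the soft-threshold step is exactly what handles the non-differentiability at zero, since it is the proximal operator of the L1 norm.

```python
# Sketch of ISTA for min 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1: shrink toward zero, set |v| <= t exactly to zero
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam=0.1, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 120))
x_true = np.zeros(120)
x_true[[7, 52]] = [1.5, -1.0]              # sparse ground truth
x_hat = ista(A, A @ x_true)
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-2))
```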
Does anyone know the historical reasons for the popularity of the L2 norm? Very entertaining videos! Namaste!
I think it's so popular because you need it so damn often. Basically everybody knows Pythagoras, i.e., the distance between two points in 2D, and this idea dominates mechanical engineering. The whole framework of complex numbers, with i = sqrt(-1) and |a + bi| = sqrt(a^2 + b^2), is designed around the L2 norm. So all the differential equations in mechanics and electronics need it. And basic optics needs it too.
There is a point in each video where you lose consciousness of time passing :D