Bayesian Curve Fitting - Your First Baby Steps!

  • Published: 9 Sep 2024
  • In this third part of the series, we start to treat our unknown variables, such as the weights, as random variables as well. I explain and show how to use Bayes' rule to get the distribution over the weights and how to find the most probable value using MAP (Maximum a Posteriori).
    This tutorial is based on the content from chapter 1 of Dr. Bishop's book.
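
    A minimal sketch of the idea in Python (assuming a polynomial basis, a Gaussian likelihood with precision beta, and a zero-mean Gaussian prior with precision alpha; the variable names and values are illustrative, not taken from the video):

        import numpy as np

        # Synthetic data: noisy samples of sin(2*pi*x), as in chapter 1 of Bishop.
        rng = np.random.default_rng(0)
        x = rng.uniform(0, 1, 10)
        t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 10)

        M = 9          # polynomial degree
        alpha = 5e-3   # prior precision on the weights
        beta = 11.1    # noise precision

        # Design matrix with polynomial features phi_j(x) = x^j.
        Phi = np.vander(x, M + 1, increasing=True)

        # MAP estimate: maximizing p(w|x,t) ∝ p(t|x,w,beta) p(w|alpha)
        # is equivalent to regularized least squares with lambda = alpha / beta.
        w_map = np.linalg.solve(Phi.T @ Phi + (alpha / beta) * np.eye(M + 1), Phi.T @ t)
        print(w_map)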

Comments • 40

  • @TallminJ
    @TallminJ 2 months ago

    Your videos are amazing, very clear and concise explanations!

  • @riadhbennessib3961
    @riadhbennessib3961 2 years ago +2

    Thank you so much for the video lessons; they encourage me to revisit Bishop's difficult book!

  • @sbarrios93
    @sbarrios93 3 years ago +2

    This video is pure gold. Thank you!!

  • @vi5hnupradeep
    @vi5hnupradeep 3 years ago +3

    Thank you so much for your videos! They are really good at explaining the concepts.

  • @yogeshdhingra4070
    @yogeshdhingra4070 1 year ago +1

    Your lectures are gems... there is so much to learn here! Thanks for such a great explanation.

  • @pythonerdhanabanshi4554
    @pythonerdhanabanshi4554 2 years ago +1

    I would push multiple likes if available...so satisfying...

  • @bodwiser100
    @bodwiser100 3 months ago

    Thank you! One request -- can you explain the reason behind the equivalence between assuming that the target variable is normally distributed and assuming that the errors are normally distributed? While I understand that the two are simply two sides of the same coin, the mathematical equivalence between them appeared to me like something implicitly assumed in moving from the part 2 video to the part 3 video.
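
    The equivalence follows directly from the additive-noise model used in the series; a short sketch in Bishop's notation:

        t = y(x, \mathbf{w}) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \beta^{-1})
        \quad\Longleftrightarrow\quad
        p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\big(t \mid y(x, \mathbf{w}), \beta^{-1}\big)

    Since y(x, w) is deterministic given x and w, adding it to Gaussian noise only shifts the mean, so "Gaussian errors" and "target normally distributed around the curve" are the same assumption.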

  • @wsylovezx
    @wsylovezx 9 months ago +1

    Greatly appreciate your super clear video. I have a question at 5:48: by the Bayesian formula, p(w|x,t) ∝ p(x,t|w)*p(w), which is different from the expression p(w|x,t) ∝ p(t,x|w)*p(w) on your slide.

    • @lakex24
      @lakex24 9 months ago +1

      Should be: it is different from the expression p(w|x,t) ∝ p(t|x, w)*p(w) on your slide.
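
      Written out with the normalizing constant that the ∝ sign hides (a sketch; the prior p(w) does not depend on the inputs x):

          p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{t} \mid \mathbf{x})}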

  • @goedel.
    @goedel. 2 years ago +1

    Thank you!

  • @YT-yt-yt-3
    @YT-yt-yt-3 28 days ago

    P(w|x) - what does x mean here exactly? The probability of the weights given different data within the training set, a different training set, or something else?

  • @adesiph.d.journal461
    @adesiph.d.journal461 3 years ago

    Sorry for spamming with questions. In terms of programming, when we say p(w) is a prior, is this equivalent to initializing the weights with a random Gaussian, like PyTorch's "torch.nn.init.xavier_uniform(m.weight)"?

    • @KapilSachdeva
      @KapilSachdeva  3 years ago +1

      Please do not hesitate to ask questions.
      A prior is “your belief” about the value of a random variable (w in this case).
      “Your belief” is your (the data analyst's/scientist's) domain knowledge about w that you express as a random variable.
      Let's take a concrete example. Say you were modeling the distribution of heights of adult males in India. Even before you go and collect the dataset, you would have a belief about the height of adult males in India. Based on your experience you might say that it could be anything between 5' and 6'.
      If you think that all the values between 5' and 6' are equally likely, then you would say that your prior is the uniform distribution with support from 5' to 6'.
      Now, coming to your PyTorch expression: it is creating a tensor whose values are uniformly distributed within a range determined by the layer's fan-in and fan-out (Xavier/Glorot initialization). In neural networks, you typically fill in “random” values to initialize your weights; you do not typically express your domain knowledge (aka the prior, as in Bayesian statistics).
      Based on the above, philosophically, the answer is no, a prior is not equivalent to your expression; however, implicitly it is your belief (albeit a completely random one) about the “initial” values of the weights.
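
      A small illustration of the difference (a sketch; the Gaussian prior with precision alpha is an illustrative choice, not something from the video):

          import numpy as np
          import torch

          # Bayesian prior: an explicit belief about w, e.g. w ~ N(0, alpha^{-1} I).
          alpha = 2.0
          w_prior_sample = np.random.default_rng(0).normal(0.0, alpha ** -0.5, size=10)

          # PyTorch initialization: just a starting point for the optimizer, drawn
          # uniformly from a range set by the layer's fan-in and fan-out.
          layer = torch.nn.Linear(10, 1)
          torch.nn.init.xavier_uniform_(layer.weight)  # in-place variant of the call in the comment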

    • @adesiph.d.journal461
      @adesiph.d.journal461 3 years ago +1

      @@KapilSachdeva Thank you so much! This makes total sense. I went on to watch the videos a few times to make sure the concepts sink in completely before I advance, and with every iteration things become clearer and I am able to connect things! Thanks!

    • @KapilSachdeva
      @KapilSachdeva  3 years ago

      @@adesiph.d.journal461 🙏

  • @yeo2octave27
    @yeo2octave27 2 years ago

    Thank you for the video! I am currently reading up on manifold regularization and I am curious about applying Bayesian methods with MR.
    12:11 For elastic net/manifold regularization we introduce a second regularization term into our analytical solution. Could we simply express the prior as being conditioned on the 2 hyperparameters, i.e. p(w | \alpha, \gamma), by applying Bayes' theorem? How then could we arrive at an expression for the distribution of w, i.e. w ~ N(0, \alpha^(-1) * I)?

  • @SANJUKUMARI-vr5nz
    @SANJUKUMARI-vr5nz 2 years ago +1

    Very nice video

  • @adesiph.d.journal461
    @adesiph.d.journal461 3 years ago

    I came to this part from the book and you nailed it! Thanks.
    A quick question: how are you differentiating, in terms of notation, between conditional probability and likelihood? I find it confusing in PRML. To my understanding, conditional probability is a scalar value that indicates the chance of an event (the one in the numerator (I understand this is not a numerator, but to convey my point)) given that the events in the (denominator) have occurred, while the likelihood is trying to find the best values of the mean and standard deviation to maximize the occurrence of a particular value. I might be wrong! Happy to be corrected :) The confusion mainly arises because in the previous part we had p(t|x,w,beta), where we wanted to find the optimal w, beta to "maximize the likelihood of t", while here p(w|alpha) becomes a conditional probability, and even p(w|x,t) is also a conditional probability. These may be naive questions! Sorry!

    • @KapilSachdeva
      @KapilSachdeva  3 years ago +2

      No, not a naive question. The difference between probability and likelihood has bothered many people. Your confusion stems from the overloaded and inconsistent usage of notation and terminology, which is one of the root causes of why learning maths & science is difficult.
      Unfortunately, the notation for likelihood is the same as that for conditional probability. The "given" is indicated using the "|" operator, so both likelihood and conditional probability carry a "given" operator. In some literature & contexts, the symbol "L" is used (with the parameters and data flipped).
      > While the likelihood is trying to find the best values of the mean and standard deviation to maximize the occurrence of a particular value.
      > While here p(w|alpha) becomes a conditional probability, and even p(w|x,t) is also a conditional probability.
      Here is one way to see all this and make sense of the terminology. In MLE, your objective is to find the values of the parameters (mu, etc.) keeping the data fixed. The quantity you maximize is what we call the likelihood. This likelihood is a kind of relative plausibility; it is proportional to a probability rather than being one.
      Now, when we treat the "parameters" as random variables, we seek their "probability distributions". A parameter (RV) could depend on another parameter (RV or scalar), and hence these probability distributions take the form of conditional probability distributions.
      Hope this makes sense.
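
      One way to see the distinction concretely (a minimal sketch, assuming a Gaussian with unknown mean and fixed variance; the numbers are illustrative):

          import numpy as np
          from scipy.stats import norm

          data = np.array([4.9, 5.1, 5.3])   # fixed observations

          # Probability: the parameter mu is fixed, we evaluate the density of the data.
          p_of_data = norm.pdf(data, loc=5.0, scale=1.0).prod()

          # Likelihood: the data are fixed, we evaluate the same expression as a
          # function of the parameter mu and pick the maximizer (the MLE).
          mus = np.linspace(3, 7, 401)
          likelihood = np.array([norm.pdf(data, loc=mu, scale=1.0).prod() for mu in mus])
          mu_mle = mus[likelihood.argmax()]   # close to data.mean()
          print(p_of_data, mu_mle)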

    • @adesiph.d.journal461
      @adesiph.d.journal461 3 years ago +1

      @@KapilSachdeva Thank you so much for such a detailed response. Yes, my confusion did come from the fact that my previous exposure to likelihood used L as the notation, with the notation for the distribution reversed. This makes sense, thank you!

    • @KapilSachdeva
      @KapilSachdeva  3 years ago +1

      🙏

  • @zgbjnnw9306
    @zgbjnnw9306 2 years ago

    At 12:38, if you set the two equations equal, lambda is not calculated as the ratio alpha/beta... the equation for lambda includes the sum of deviations and w^T w...

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      The value of lambda is not obtained by equating the 2 equations. Its purpose is to show that the hyperparameter (lambda) in ridge regression can be seen as a ratio of alpha and beta. In other words, the MAP equation is scaled by 1/beta.

    • @zgbjnnw9306
      @zgbjnnw9306 2 years ago

      @@KapilSachdeva Thanks! Where can I see the derivation of lambda written as alpha/beta? Could I find it in the book by Bishop?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      @@zgbjnnw9306 Section 1.2.5 of Bishop... the very last lines.
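
      The argument in those last lines, sketched in the book's notation: maximizing the posterior is equivalent to minimizing its negative log, which (up to an additive constant) is

          \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w}

      Dividing by beta does not change the minimizer, which leaves the regularized sum-of-squares error with lambda = alpha / beta.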

  • @zgbjnnw9306
    @zgbjnnw9306 2 years ago

    Why does the posterior p(w | x, t) use the likelihood p(t | x, w, B) instead of p(x, t | w)? And why is B in the likelihood?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      This is the inconsistency of notation that I talk about. Normally we would think that whatever goes after "|" (given) is a probability distribution, but the notation allows us to put scalars/hyperparameters/point estimates there as well.
      Logically it is okay: even though in this exercise we are not treating beta as a probability distribution, the likelihood still depends on it. Hence it is okay to include it in the notation.
      This is what makes me sad: the inconsistency of notation in the literature and books.

    • @zgbjnnw9306
      @zgbjnnw9306 2 years ago

      @@KapilSachdeva Thanks for your help! So beta and x are both considered 'constant', like alpha?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      @@zgbjnnw9306 You can see it like that. Nothing wrong with it; however, a better way of saying it would be:
      Beta is either a hyperparameter (something you guess or set based on your domain expertise) or a point estimate that you obtain using frequentist methods.
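
      For reference, the frequentist point estimate referred to here is the maximum-likelihood precision from Bishop's section 1.2.5 (a sketch in the book's notation):

          \frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n \}^2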

  • @stkyriakoulisdr
    @stkyriakoulisdr 3 years ago +1

    The only mistake in this video is that "a posteriori" is Latin and not French. Cheers!

    • @KapilSachdeva
      @KapilSachdeva  3 years ago +1

      You are absolutely correct. Many thanks for spotting it and informing me.

    • @stkyriakoulisdr
      @stkyriakoulisdr 3 years ago +1

      @@KapilSachdeva I meant it as a compliment, since the rest of the video was so well explained.

    • @KapilSachdeva
      @KapilSachdeva  3 years ago

      I understood it :) ... but I am genuinely thankful for this correction, because so far I had thought it was French. Your feedback will help me not make this mistake again.