Hi Woody, Thanks for this lesson! It is very useful. It's quite challenging to get rid of the frequentist mindset after spending my entire life thinking that way, though. I just have a question: in the simulation, why did you calculate the last probability by using the number of rounds won by either Alice or Bob and not the number of rounds (10000)? That's how simulations usually work. So I'll be more than grateful if you could help me out here with this doubt.
Hi Jessica. We are trying to work out the probability of someone winning from a position where they are losing (by 5 points to 3). So we simulate and find all the situations where someone was losing by this score, and then see (out of these!) how many times they go on to win. If we divided by 10000 we would be working out the probability that someone plays the game, falls 5-3 behind and then goes on to win, which is a different question. Our question was: IF someone is already 5-3 behind, what is the probability that they win. To use the lingo, we want the probability they win "given" that they are 5-3 behind. Hope that makes sense!
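A minimal Python sketch of that conditional simulation (it mirrors the Excel approach described in the video; the code and variable names are my own):

import numpy as np

rng = np.random.default_rng(42)
n_games = 100_000

# Hidden probability that Bob wins any round (set by the position of the first ball)
x = rng.random(n_games)
# Play 8 rounds per game and count Bob's points
bob_points = (rng.random((n_games, 8)) < x[:, None]).sum(axis=1)

# Keep ONLY the games where Bob trails 3-5 after 8 rounds...
behind = bob_points == 3
# ...and of those, count how often Bob wins the next 3 rounds (first to 6 wins)
bob_wins = (rng.random((behind.sum(), 3)) < x[behind][:, None]).all(axis=1)
print(bob_wins.mean())  # ~ 0.09

Note the final division is by the number of 5-3 games (via .mean()), not by n_games, which is exactly the point made in the reply above.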
Thank you for the video. I started to see the real power of Bayesian statistics only after watching this video. In the final example (56:45), the problem was first solved the frequentist way, yielding an incorrect probability of 0.05. Then the problem was solved the Bayesian way, yielding a correct probability of 0.09. I think the frequentist logic is wrong because it considers Bob's chance of winning a point to always be 3/8. In reality, Bob's chance of winning a point follows a distribution with a mean of 3/8. It is possible to reach the evidence from many values of Bob's chance of winning. It does not have to be 3/8. Is this a valid explanation?
I have some doubt about the simulation. You just assume the line and the ball are placed randomly with a normal distribution. What if Bob is really bad at throwing balls, so his balls are always on one side of the table, and that's how it gets to the score of 5-3?
Then you should use a different prior probability distribution - one in which the odds of Alice-wins-the-round vs Bob-wins-the-round are always greater than 1. And to be fair, he didn't assume that; it was given as random in the question itself.
That's a great video, bro! I am writing an essay right now comparing frequentist and Bayesian approaches. In the example with the genius and the red pill, what would a frequentist approach be? Is there one?
Thanks! Glad you liked it. Frequentists would come to exactly the same conclusion for the Genius/Pill example. They agree with Bayes' theorem in all theoretical settings such as this. The disagreement is in two places: 1) a philosophical disagreement about what probability means, and 2) In situations where the parameters are not known, like the billiards example at the end. In situations where the parameters are known, like when dealing with a known normal distribution, Bayesians and Frequentists agree. These examples are there to show how to work with conditional probability in a range of cases. Thanks for the question!
Correction @26:49 'we therefore know that there's a 95% chance that he will NOT lose his job ' instead of 'we therefore know that there's a 95% chance that he will lose his job '
I used a normal distribution calculator, setting the mean as 15.5 and the standard deviation as 1.2, and calculating the probability of a result less than 14. I did it either using Excel's NORM.DIST function or using my calculator. Can't remember! You can also use websites like this: onlinestatbook.com/2/calculators/normal_dist.html
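For anyone who wants the same calculation outside Excel, here is a one-liner in Python (a sketch using scipy, with the figures quoted above):

from scipy.stats import norm

p = norm.cdf(14, loc=15.5, scale=1.2)  # P(X < 14) for X ~ N(mean 15.5, sd 1.2)
print(p)  # ~ 0.106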
Yes, but those are different races. The strict frequentist approach is to consider the very same event occurring many times. Makes sense for a dice roll, which can essentially be the same every time. Makes no sense with a horse race, since races on previous days with different horses/weather/ground are too different. But good question. And yes, in reality this is what people do. But this is people being Bayesians!
Also, a question regarding your simulation: it's pretty evident, but WHY is the Bayesian answer more precise? Isn't it the case that the frequentist approach also argues from the law of large numbers?
Not just more precise - it's the only correct answer in this case! The frequentist approach goes wrong when the underlying parameters (e.g. the probability of Bob winning any given game) are not known. Frequentists assume that there is a fixed answer to what this is, and use the available data to determine what they think it is. In this case, they assume it is 3/8. Bayesians don't assume this is fixed, but think there is a distribution of different probabilities that could have explained the data. Depending on how technical you want to get with this, you could check out this article here: www.countbayesie.com/blog/2021/4/27/technically-wrong-when-bayesian-and-frequentist-methods-differ. There's also a good discussion here: stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english. These get quite technical though. Thanks for the question.
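To put numbers on that difference, here is a short Python check of both answers under a uniform prior (my own sketch, not from the video):

from scipy.special import beta

# Frequentist: fix p at the point estimate 3/8 and cube it
freq = (3/8)**3                  # ~ 0.053

# Bayesian: average p^3 over the posterior Beta(4, 6) for Bob's win probability p
# E[p^3] = B(7, 6) / B(4, 6)
bayes = beta(7, 6) / beta(4, 6)  # = 1/11, ~ 0.091

print(freq, bayes)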
I guess what you are referring to is the part where he derives the equation for P(a = 5 \land b = 3 | Bob wins)*P(Bob wins) at 1:07:49? This probability simplifies to P((a = 5 \land b = 3) \land Bob wins), which should be equivalent to P(a = 5 \land b = 6): this is the only final state in which Bob could win.
I am only 4:30 (min/sec) into the video - I will complete the session; however, for future reference, "die" is the singular of "dice". It's a little unsettling calling a die "dice". ☺️☺️☺️ I will return to this comment for my overall objective opinion - although, to date, I am a Frequentist.
I'm a modern man - "dice" can be both plural and singular (grammarist.com/usage/dice-die/) 😉 "Die" sounds unnatural and unsettling to me! Hope you enjoy the rest of the video. Let's see if I can convince you Bayesianism is the way to go!
They keep playing this game and recording who wins each time. Once a player has won 6 times they win the whole game. Think of it like tennis! The first player to win 6 games wins the set. It's a bit like that.
I have a slight gripe with the disease example: false negative rates are independent of false positive rates, and that should be made explicitly clear, even if it's just by stating the false positive probability rather than inferring it without comment.
Absolutely right. False positives and false negatives are almost never equally likely. And fair point, it would have been better to just mention this, even if I kept things as they are to keep the maths simple.
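For illustration, a quick Bayes calculation in Python with separate (entirely hypothetical) false positive and false negative rates, since the two need not be equal:

# Hypothetical figures: 1% prevalence, 2% false negative rate, 5% false positive rate
prevalence = 0.01
sensitivity = 0.98        # = 1 - false negative rate
false_positive = 0.05

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(p_disease_given_positive)  # ~ 0.17: still low, despite the test looking "98% accurate"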
This introduction to Bayesian Statistics is simply the best available on RUclips. I cannot imagine why it got 89,000 views but only 2400 likes. I think it is because many persons looked at the comments and thought it didn't need praising.
That is unfortunate The comments are a PRIOR, and every added comment adds evidence to create a better posterior.
I forgot to add one thing. Perhaps an unfortunate effect of the lack of commendation has been that Woody Lewenstein has not added to his Bayesian videos.
Tragic.
There is so much more that he could have put into the second video in this playlist.
This is the best description of Bayesian statistics I've seen
Amazing explanation. I've known and used Bayes' theorem for a long time, but never saw the whole picture until now. Thanks so much
Excellent. In general I leave comments 0.5% of the time. But when I think something is really superb I always leave a comment. What is the chance I thought Woody’s tutorial was really superb?
Thanks! Glad my video made the cut!
To answer your question: we can't know. You didn't give enough information. All we know is that you leave comments only 0.5% of the time, and that you leave a comment 100% of the time when you find the video is "really superb". But we don't know at which rate (r) you leave comments on video that you don't find "really superb". So all we can say is that among the videos you watch, you qualify only at maximum 0.5% (if the rate r would equal 0%) of them as "really superb", which is not much. Maybe the real conclusion is that you should stop watching so mediocre videos overall haha :p
great explanation, i like how you gradually introduced the different concepts
from probability -> conditional probability -> baby bayes -> bayes
Everything was explained perfectly. This video deserves more viewers and comments. Thank you so much for sharing with us.
As I wrote before, this is the best ever explanation of Bayesian statistics - THANK YOU VERY MUCH!!! and I am coming back to it every time to refresh the concepts and vocabulary :)! When we talk about the numerator ( P(a=5, b=3 | Bob wins)*P(Bob wins) ) of the probability equation (that Bob wins given A=5 and B=3) at 1:07:20: basically we can not really operate with these terms separately (P(a=5, b=3 | Bob wins) * 0.057) and have to merge them together into the distribution y=(8C3)x^6(1-x)^5. Just trying to catch the moment and the point where it pivots from the frequentist to the Bayesian :). In other words, the point is that if we look at this formula as a pure math expression, then we can cancel out the y=(8C3)x^3(1-x)^5 and be left only with (3/8)^3, which would then be a purely frequentist estimation of the probability of Bob winning (I do understand that in fact we have the areas under the graphs for the respective distributions in the numerator and denominator). If you were patient enough to read to this point :) - what is confusing is the P(a=5, b=3 | Bob wins) part of the numerator, which is hard to imagine...
The best tutorial on youtube that explains Bayesian Statistics so far! 🌷
Brilliantly taught! I thought the simulation was particularly interesting, especially as it shows that simulations can completely avoid having to perform the difficult integrals that crop up with more complex analyses. Thank you so much.
Brilliant explanation of Bayesian Basics. Best I have seen, easy to follow and understand.👍👍👍
What a greatly structured session. Learnt so much new stuff. Thanks.
The first lesson is everything! I finally understand the fundamental difference between Bayesian and frequentist statistics. 🎉 beautifully explained, thank you!
Thank you so much!
Wonderful teaching, earlier i found it difficult to understand the probability but now its seems easy. thanks prof.
This is a top shelf explanation of an intuitively difficult concept. Introducing Incremental complexity using examples is a superb teaching method. The visualization using Excel to break out the calculation stages was for me the icing on the cake. Thank you, Woody!
Thanks so much! Really glad you enjoyed it and thanks for the cmment.
I have been interested in Bayesian analysis for a few years and seen dozens of videos. This is the best video I have seen to learn the concepts. Thank you so much for producing and sharing this knowledge!
I'm thrilled to hear this! So glad you enjoyed it. Woody
@@woodyrow Most europeans have 100IQ, East asians 105, ashkenazis 112, african americans 85, sub saharan africans 70.
Thank you for this absolute gem of a lecture. I think it might tickle you to know how I approached the problem re Tanya's tennis and sunny days (23:00 ish).
As a healthcare professional, I'm much more familiar with sensitivity, specificity, positive/negative predictive values, prevalence etc, than I am with an equation for conditional probability. So for my first attempt at the problem, I basically framed the situation as "tennis" being a diagnostic test for "sunny", and drew a 2x2 table for "tennis" against "sunny"!
The sensitivity of the test is 80%, ie. 80% of "sunnies" had a positive "tennis" result. The specificity of the test is 65%, ie. 65% of "not sunnies" had a negative "tennis" result. The "prevalence" of sunny is 60% and not sunny is 40%. Therefore, I just had to solve for the positive predictive value of the "tennis" test for "sunny", by using the relative prevalence to "weight" the sunnies vs not-sunnies within the "tennis" group. Et voila, it yielded the exact same process of multiplication as using the cond probability equation, which I used for my second attempt.
I know this may sound like a much more complicated method, but seeing the probabilities in a table and applying concepts I already know truly helped me actually understand the multiplications within the conditional probability equation, rather than just solving for it blindly. This was a lightbulb moment for me. Thank you!
@@jayashrishobna really interesting! So nice to hear about how you went about solving this!
This is not just one of the best but THE best video on Bayesian theory... thank you so much for doing this...
Thank you so much!
Excellent video! I needed to refresh my Bayesian statistics knowledge and this was a perfect start.
Glad it helped!
Much better explanations than the famous StatQuest channel. Thanks a lot.
That's very kind of you to say, though I think in general StatQuest is really good!
Thank you for presenting an informative and concise lecture. Time well spent for me.
This is the best explanation I ever saw.
I just love the flow of the lecture.
Thank you so much! Lovely to hear that!
This is excellent! Clear, concise and systematic. Best explanation I have seen of Bayes thus far.
This is a really great class!!! Many thanks! You are a great teacher, as you can put yourself into the shoes of a student and highlight the least clear connections!!! The graphics are awesome as well - very clear, detailed but not redundant!!! A real pity, and weird, that this channel has not attracted more subscribers!
Thanks so much!
@woodyrow why are we using the conditional formulas instead of the Bayes formula that I know, which is P(B|A) = P(A|B) × P(B) / P(A)? I am new to statistics. Please explain
Indeed your video is FANTASTIC and IMMENSELY helpful. Thanks!!!
Thanks! Glad you liked it!
So clear, excelllent video! Was very helpful 👏🏼👏🏼 Thank you!
Thanks so much! Really glad you enjoyed it!
Excellent breakdown of the topic! The final parts about simulating Bayes theorem in the Excel really drove the whole idea home really well.
Thanks for educating us mr. Lewenstein!
🎯 Key Takeaways for quick navigation:
00:00 *🎓 Introduction to Bayesian Statistics*
- Exploring Bayesian statistics from scratch.
- Suitable for anyone interested in probability and statistics, from students to professionals.
- Starting with fundamental questions about probability and its applications.
01:10 *🎲 Objective vs. Subjective Views on Probability*
- Contrasting objective (frequentist) and subjective (Bayesian) views on probability.
- Highlighting limitations of frequentist approach, especially for one-off events like horse races.
- Illustrating subjective Bayesian model's flexibility and rationality in handling uncertainty.
09:33 *📊 Degrees of Belief in Bayesian Probability*
- Bayesian probability as degrees of belief or uncertainty measures.
- Illustrating subjective probabilities through scenarios involving pregnancy and gender prediction.
- Emphasizing rationality in adjusting beliefs based on available evidence.
10:01 *🧠 Conditional Probability Basics*
- Introduction to conditional probability using simple visual examples.
- Building intuition for conditional probability through visualizations.
- Setting the stage for understanding Bayes' theorem.
13:19 *📝 Formulating Baby Bayes Theorem*
- Deriving a simplified version of Bayes' theorem using visual probability representations.
- Demonstrating application of the theorem in simple probability problems.
- Introducing notation and terminology for hypothesis and evidence probabilities.
20:20 *🌳 Bayes Theorem Application with Tree Diagrams*
- Applying Bayes' theorem to complex scenarios using tree diagrams.
- Solving probability problems involving multiple events and conditional probabilities.
- Demonstrating how evidence updates prior probabilities to yield posterior probabilities.
23:34 *📈 Bayesian statistics application example: Updating probability with evidence*
- Bayes' theorem updates the probability of an event given new evidence.
- Example: Given the probability of sunny weather and playing tennis, Bayes' theorem helps update the probability of sunny weather given that tennis was played.
- Demonstrates how prior beliefs are adjusted based on new information.
24:45 *📊 Bayesian statistics application example: Probabilistic analysis in economics*
- Scenario: Analyzing the probability of a recession given job loss using Bayes' theorem.
- Demonstrates the use of prior probabilities and conditional probabilities in economic analysis.
- Shows how Bayesian statistics can be applied to decision-making in economic forecasting.
29:22 *🏃♀️ Bayesian statistics application example: Probability distributions in sports*
- Example: Analyzing the probability of a girl running 100 meters in a certain time frame using normal distribution.
- Shows how Bayesian statistics is used to update probabilities based on additional information (e.g., being in the school running team).
- Illustrates how conditional probability influences the assessment of outcomes in sports.
33:19 *🧠 Bayesian statistics application example: Counter-intuitive results*
- Examines counter-intuitive outcomes using conditional probability in IQ distribution scenarios.
- Demonstrates how small changes in distributions can lead to significant shifts in probabilities.
- Highlights the importance of understanding conditional probability in interpreting statistical results.
41:42 *🦠 Bayesian statistics application example: Medical diagnosis*
- Examines a medical diagnosis scenario using Bayes' theorem.
- Illustrates how prior beliefs are updated based on diagnostic test results.
- Emphasizes the significance of understanding conditional probability in medical decision-making.
48:11 *📊 Understanding Bayes Theorem through an Example*
- Explains the application of Bayes Theorem using an example involving Steve, a shy individual, to illustrate how prior probabilities and evidence combine.
- Demonstrates how intuition can be misleading when prior probabilities and evidence are not considered.
- Breaks down the calculation process step by step, showing the application of Bayes Theorem in determining the probability of Steve being a librarian given certain traits.
51:42 *📈 Formal Naming and Components of Bayes Theorem*
- Defines the formal components of Bayes Theorem: prior, posterior, likelihood, and evidence.
- Illustrates the terminology used in relation to each component, such as "prior" for the initial probability, "posterior" for the updated probability, "likelihood" for the probability of evidence given a hypothesis, and "evidence" for the total probability of the observed evidence.
- Provides insights into the significance of each component in Bayesian inference and decision-making processes.
56:36 *🔍 Exploring a Complex Example: Bayesian Approach vs. Frequentist Approach*
- Introduces a more complex example involving a game between Alice and Bob to compare Bayesian and frequentist approaches.
- Contrasts the frequentist method, which relies on straightforward calculations, with the Bayesian method, which involves applying Bayes Theorem to update probabilities based on evidence.
- Demonstrates how Bayesian inference can provide more accurate predictions by considering prior probabilities and updating them with observed evidence, even in complex scenarios.
Made with HARPA AI
Woah! Quality stuff, and your examples really helped in grasping the intuition underlying these Bayesian concepts. Regards from Pakistan
Thanks so much!
I am from Brazil. Excellent explanation, very good job. Thank you and congratulations.
Thanks so much! Glad you enjoyed it.
@@woodyrow Thank you for your answer. I just subscribed to your channel.
Well done! Among the best of the best that I have watched so far on Bayes' theorem. I suggest you delve further into Bayesian stats with these lessons in future videos.
Thanks so much! Yes I'd love to do a second part where I go deeper into this.
@@woodyrow great, I am looking forward to it and will share it with my colleagues and students as well
Outstanding explanations.
Thank you.
Excellent explanation!!!
This video deserves more likes.
32:30 minor nitpick but I would say it is higher than 45% because running faster would make someone more likely to be on the running team. Good lecture so far!
Thank you so much for this tutorial. Very clear and with very interesting examples, I am so glad i found this channel
Thank you for teaching us trash goblins! We are forever in your debt
This is my best class ever
Thanks Rohan! Glad you like it!
Very nice, thanks for the lecture
Bro that was a good tutorial. Learnt a lot
Excellent simulation with Excel functions for Bayesian estimates !!
Thank you! Glad you liked it.
This was an awesome video; really appreciate it!
Great! Really enjoyed learning this. Thank you
Thanks so much Justin!
Excellent lesson - thanks!
Thanks very much!
Great !!!
Great effort💗
Keep making more videos on this topic.
Thanks! I will try to do more!
Amazing video! Such clarity & presentation! Thank you! Learned a lot!
Thanks so much Abhishek! So nice to hear that.
Well done Woody!
Great post mate,keep it going.
This was a great intro and I enjoyed it! Thank you.
Excellent presentation. Wouldn't the Monty Hall Problem be an example of where using Bayes would be helpful? The updated info would be that the host, Monty Hall, will open a door with a goat. The setup is this: Monty Hall is the host of a TV show where a contestant must choose one of three doors, with goats behind two of the doors and a car behind the other. If the contestant chooses the door with the car, they get to keep it. If they choose a door with a goat behind it, they're out. The additional info is that once the contestant selects one of the doors, Monty will stop the show, open up one of the doors containing a goat, and proceed to ask the contestant if they'd prefer to switch to another door. The question, then, is whether it is a good idea for the contestant to switch. The answer is yes: by choosing at random, uniformly, the contestant will have initially chosen the car only 1/3 of the time, and one of the two goats 2/3 of the time. So, 2/3 of the time, the contestant will have made the wrong choice, and will improve the odds by switching. I hope this isn't too confusing.
Exactly! Yes, the Monty Hall problem is a very good example of Bayes' theorem in practice!
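For anyone who wants to verify the 1/3 vs 2/3 numerically, here is a tiny Python simulation (a sketch; the set-up follows the comment above):

import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a goat door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(play(switch=False), play(switch=True))  # ~ 1/3 staying, ~ 2/3 switching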
@@woodyrow Hi again, if I may ask a more general question in statistics: once a hypothesis test has been set up (difference of means, proportion, ANOVA, etc.), is there a "natural" way of defining the statistic used to determine whether we reject or don't reject the null hypothesis?
Great question @@fernandojackson7207. I suppose overall I'd say there is not a general natural way of doing this. In practice, you learn when and where to use each one through practice.
I think the best I could do is maybe to suggest thinking through the following:
1. Type of Data: The nature of the data (e.g., categorical vs. continuous, paired vs. independent samples) significantly influences the choice of the test statistic. For example, a t-test is appropriate for comparing the means of two independent samples of continuous data, whereas a chi-square test is used for categorical data.
2. Hypothesis Being Tested: The hypothesis itself (difference of means, proportion, variance, etc.) guides the choice of the statistic. For instance: For testing differences between means, you might use a t-statistic in a t-test.
For proportions, a z-statistic might be used in a z-test.
For comparing variances, an F-statistic is used in ANOVA (Analysis of Variance).
3. Assumptions Underlying the Statistical Test: Each statistical test comes with its own set of assumptions (e.g., normality, homogeneity of variances, independence). The choice of statistic is contingent upon whether these assumptions are met. For example: A t-test assumes normally distributed differences, but if this assumption is violated, a non-parametric test like the Mann-Whitney U test might be more appropriate.
ANOVA assumes homogeneity of variances among groups; if this is not met, you might use a Welch's ANOVA instead.
4. Design of the Study: The study design (e.g., matched pairs, blocked designs) also influences the choice. For matched pairs, a paired t-test uses the differences within each pair as the data for analysis.
But beyond these sorts of observations, it's often just the case that through experience you begin to recognise what is appropriate in which situations. Hope that helps a little!
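To make point 1 concrete, a two-sample comparison of means in Python might look like this (a sketch with made-up illustrative data, not from any real study):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, size=40)  # continuous, independent samples
group_b = rng.normal(11.0, 2.0, size=40)

# Welch's t-test: does not assume equal variances (see point 3)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_stat, p_value)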
@@woodyrow Thank you so much for the explanation, Woody.
Great lesson! Thank you. Keep up the good work.
41:28 similar to IQ distributions for men and women
Incredibly cool stuff. You're a great teacher, thank you so much for this
Thank you so much!! Glad you liked it!
@woodyrow great video 👍 the only thing not clear for me is how you derived the 1/9 denominator at 1:05, how did you calculate it? Can you please refer to a resource exploring the formula in detail 🙏
@@nassersaed4993 glad you like it! The 1/9 is calculated using integration around 1:05. Watch that section and see if it makes sense. You'll need to know about integration though, which is a pretty big topic.
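For the curious, the 1/9 can also be checked symbolically (a sketch using sympy; x is Bob's probability of winning a round):

from sympy import symbols, integrate, binomial

x = symbols('x')
# P(Alice 5, Bob 3) averaged over a uniform prior on x
denominator = integrate(binomial(8, 3) * x**3 * (1 - x)**5, (x, 0, 1))
print(denominator)  # 1/9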
This is so underrated
Please make some more videos like this on stats topics. Thank you, sir, for this wonderful explanation.
This is so good!
Woody,
Thanks for an outstanding high-quality video. I was quite happy until the very end when you did your simulation in Excel. That is not reproducible (even though it is pseudo-random) as it would have been in a program like R (or Python). There are indeed many packages in R that do Bayes, but the recent addition "bayesrules" takes the cake. It is a simple and user-friendly package that is worth taking a look at.
Thank you for this video. What I missed is how you calculated the probability in your first example for normal distribution. You just said look at the area under the curve and you said calculate and get the value.
Calculating probabilities with the normal distribution essentially involves using a computer or a calculator to find the area. While there is a (complicated) function for the normal distribution, there is no closed-form formula for computing the area under its curve. You have to just use a computer/calculator to find it. Before computers, it just had to be approximated by hand!
Great video, thank you. I feel like the pill example calculation is not correct. If there is a very tiny change in mean, like 100 -> 100.0000000001, the calc made (from the video) will yield almost 50% that the genius has taken the pill, which seems incorrect. I feel like it should be [P(102)-P(100)]/P(100) - I may be wrong, but the original calc seems wrong to me. Also, the number of zeros for the calculated probability should be 5 instead of 6. Thanks
Great tutorial. Thanks, man.
This is so clear. Thank you so much!
So glad you enjoyed it!
Thank You. So helpful!
Thanks! Glad you found it helpful!
thanks for the lecture! well explained!
excellent lecture
Best explanation! Finally I get it😂
Thanks so much! So pleased to hear this.
34:16 "How much more likely is it that your child will be a genius if they take this pill?"
Probability ( Took a Pill | Given Is a Genius )
Why is it Probability ( Pill | Genius ) and not Probability ( Genius | Pill )
Probability ( Took red pill when a child | Were going to be a genius when taking pill )
Great session sir! Just one question: why did you take rand() < a particular column.. why not >?
Hi. Sorry for the slow reply! It's just a handy way to get a random variable with a specific probability. rand() is uniform between 0 and 1, so the condition rand() < p comes out TRUE with probability p. Using > would simply give the complement, with probability 1 - p.
How can I simulate the billiards example in R?
I did it myself. Actually, I was doing it correctly. I was getting the posterior mean of the Bernoulli probability x as 0.4, which is correct based on the solution given in BDA, i.e. (k+1)/(n+2) = (3+1)/(8+2) = 0.4. Also the MAP estimate was 3/8. My mistake was that I was taking the point estimate and cubing it. In reality I had to play 3 Bernoulli trials with the obtained value of theta and see which one actually wins, using the uniform RNG like you did in Excel. After doing that, I can see the mean of the quantity generated is 0.09 with an SD of 0.29.
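In Python, that posterior-predictive procedure might look like this (a sketch of what the comment describes; it reproduces the ~0.09 mean and ~0.29 SD):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Posterior for Bob's per-round win probability after 3 wins in 8, uniform prior: Beta(4, 6)
theta = rng.beta(4, 6, size=n)
# Three Bernoulli trials per posterior draw; Bob must win all three
bob_wins = (rng.random((n, 3)) < theta[:, None]).all(axis=1)
print(bob_wins.mean(), bob_wins.std())  # ~ 0.09 and ~ 0.29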
Thanks for this great intro. However, I got a bit confused around 1:06 when you evaluate the numerator. You even make the remark "I simplify a bit ...". And then, it seems you actually evaluate the numerator as "P(a=5, b=3).P(Bob wins)", instead of "P(a=5, b=3 | Bob wins).P(Bob wins)". What exactly was simplified here? Is "P(a=5, b=3)" indeed equal to "P(a=5, b=3 | Bob wins)"?
Great question. This is a slippery part of the argument and you're right to ask about it.
Essentially you are right, "P(a=5, b=3)" does equal "P(a=5, b=3| Bob wins)", or at least, the expressions in terms of x are the same.
If I ask you what's the probability of Alice having 5 and Bob having 3, the answer is (8C3)*x^3*(1-x)^5. If I told you that Bob actually went on to win, the probability of Alice having 5 and Bob having 3 after 8 rounds is still (8C3)*x^3*(1-x)^5. The difference is that you might need to adjust your thinking about what x was. But this is all taken care of by the integration and division.
Hope that helps clarify a bit!
I think there is a mistake in calculation at 28:58 P(E) = 0.445 and not 0.085 (In question related to recession and losing job)
I'm not seeing the error. P(E) = P(he loses his job). We calculate this by saying it's either that there is a recession and he loses it, or there's not a recession and he loses it. The calculation for that is: 0.1x0.4 (recession and he loses his job) + 0.9x0.05 (no recession and he loses his job) = 0.085. Let me know if that makes sense or if you think there is an error with that.
@@woodyrow Understood, sorry, my mistake. Thanks for the reply.
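Putting the whole example together as a worked calculation (a small sketch with the numbers quoted above):

# P(recession) = 0.1, P(lose job | recession) = 0.4, P(lose job | no recession) = 0.05
prior = 0.1
p_e_given_h = 0.4
p_e_given_not_h = 0.05

evidence = prior * p_e_given_h + (1 - prior) * p_e_given_not_h  # = 0.085
posterior = prior * p_e_given_h / evidence                      # P(recession | lost job)
print(evidence, posterior)  # 0.085 and ~ 0.47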
Thanks for the interesting and clear video! I have a doubt about the Amira and Jane problem: why do you assume that Jane being late has an impact on Amira being late? From what we know, one could come from the Moon and the other from Mars. Did I miss something?
Good question. I don't assume that Jane being late had an impact on Amira (in the sense that one caused the other), but we have proved that the events are not independent. This means there is some relationship between one being late and the other being late. If they were truly independent of one another, we would see the property that P(A and J)=P(A)P(J). Since we don't see this, they are not independent. So this could be that Jane being late caused Amira to be late, it could be that Amira being late caused Jane to be late, or it could be some other thing caused both to be late. Eg one is coming from the moon and one is coming from Mars, and a meteor shower caused both rockets to do an extra loop before landing!
Excellent!
38:44 I don't understand how P(taking the pill given they are a genius) = 0.66 translates to the number who take the pill being double the number who don't. Please explain this part.
Wow that was very clear and engaging
Thanks!
Question: at 34:18 the question is, if a child takes the pill, how likely is it that they will be a genius. So why, at 35:25, is being a genius treated as the given? Shouldn't it be P(genius | pill)?
I'm not sure I agree with your solution at 20:13. This works assuming conditional probability (i.e. that Jane being late is influenced by Amira). But if these were independent events then the answer would just be 20%. If you know nothing about Jane and Amira, is it rational to assume they're dependent or independent? Lastly, does causal reasoning play a part here? Thanks :)
You are absolutely right that if the events were independent then the probability of Amira being late would just be 20% still. However, the information given in the question actually proves they are not independent. If we knew nothing about them then it's hard to say if it's rational to assume independence. But we DO know something about them: the information given in the question. So this shows their lateness is not independent of each other. But this does not indicate causality. It might be that Jane's lateness causes Amira's, or the other way around. Or maybe something else causes both (eg maybe they catch the same bus and when this arrives late they are often both late).
@@woodyrow Thanks for your reply! I only just discovered your channel and I really like it. I've been going through examples slowly to build intuition.
What if Jane's lateness was always caused by bus delays, and Amira's lateness was always caused by bad weather? In this case, they would be independent because there's no way Jane's lateness could depend on Amira's lateness and vice versa. Why does the information in the question prove that they're dependent on each other? Is it because of the "70% neither of them is late" fact?
@@matthewjames7513 Sorry for the slow reply! If Jane's lateness was always caused by the bus, and Amira's was caused by bad weather, and these didn't interact, then we might expect the events to be independent. However, the probabilities absolutely prove they are not! Remember, the formula P(A&B) = P(A)P(B) is the necessary and sufficient condition for events to be independent.
Look at the conditional calculation for P(Amira late given Jane is late). In general, Amira is late 20% of the time. But when Jane is late, Amira is late 60% of the time! Here's what this means: If I just asked you "what is the probability that Amira is late?", you should say "20%". However, if I also tell you that Jane is late, you should say "60%". That is, if you learn that Jane is late, you change your view about the likelihood of Amira being late. This is what it means for events to be not independent. If one happens, you adjust your view on the other. These figures (20% vs 60%) emerge mathematically from the Venn diagram. And they tell us the events aren't independent.
I hope that all makes sense! I'm really glad you enjoyed this video by the way. Thanks for the questions!
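To see that check as code, here is a minimal sketch; note that P(Jane late) = 0.25 and P(neither late) = 0.7 are assumed figures, chosen only because they reproduce the 20%-vs-60% numbers quoted above:

```python
p_amira = 0.20    # P(Amira late), as quoted above
p_jane = 0.25     # P(Jane late) -- assumed
p_neither = 0.70  # P(neither late) -- assumed

# Inclusion-exclusion recovers the overlap from the Venn diagram.
p_either = 1 - p_neither              # P(at least one late) = 0.30
p_both = p_amira + p_jane - p_either  # P(both late) = 0.15

print(p_both / p_jane)   # P(Amira late | Jane late) = 0.6
print(p_amira * p_jane)  # 0.05 -- the overlap independence would require
# 0.15 != 0.05, so the events are not independent.
```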
Do you also have videos on binomial probability? Or perhaps you know of a good introductory course or book?
Great session. A doubt: in calculating the Bayesian probability (~1:07), the probability of E|H (the numerator) has not been multiplied by the prior probability (presumably 0.5). Am I missing something here?
Hi. Excellent question. You are right to wonder about this, and to keep things simple I COMPLETELY ignored the subtle details. Firstly, the prior for this shouldn't be 0.5, since we just don't have any evidence at all before the game starts. So assuming 0.5 at the beginning artificially skews results. In technical terms, this is known as a "nuisance parameter", and we essentially get to ignore it and just integrate. Check out this article for a full explanation of why: jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/
really enjoyed
Thanks so much! Really glad you enjoyed it!
Hi Woody, thanks for this lesson! It is very useful. It's quite challenging to get rid of the frequentist mindset after spending my entire life with it, though. I just have a question: in the simulation, why did you calculate the last probability using the number of rounds won by either Alice or Bob, and not the total number of rounds (10000)? That's how simulations usually work. So I'll be more than grateful if you could help me out here with this doubt.
Hi Jessica. We are trying to work out the probability of someone winning from a position where they are losing (by 5 points to 3). So we simulate and find all the situations where someone was losing by this score, and then see (out of these!) how many times they go on to win. If we divided by 10000 we would be working out the probability that someone plays the game, falls 5-3 behind and then goes on to win, which is a different question. Our question was: IF someone is already 5-3 behind, what is the probability that they win? To use the lingo, we want the probability they win "given" that they are 5-3 behind. Hope that makes sense!
notsoErudite has sent her trash goblins. Ty sir woody
Thank you for the video. I started to see the real power of Bayesian statistics only after watching this video. In the final example (56:45), the problem was first solved the frequentist way, yielding an incorrect probability of 0.05, and then the Bayesian way, yielding the correct probability of 0.09.
I think the frequentist logic is wrong because it considers Bob's chance of winning a point to always be 3/8. In reality, Bob's chance of winning a point follows a distribution with a mean of 3/8. It is possible to reach the observed evidence from many values of Bob's chance of winning; it does not have to be 3/8. Is this a valid explanation?
I have some doubt about the simulation. You just assume the line and the ball are placed randomly with a normal distribution.
What if Bob is really bad at throwing balls, so his balls always end up on one side of the table, and that's how the score got to 5:3?
Then you should use a different prior probability distribution - one in which the odds of Alice winning the round vs Bob winning the round are always greater than 1. And to be fair, he didn't assume that; it was given as random in the question itself.
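As a purely hypothetical illustration, the only change this needs in a simulation is the line that draws the prior (the 0.5 cap here is an arbitrary choice, not from the video):

```python
import random

# As given in the question: a uniform prior on Bob's per-point probability.
x = random.random()

# A prior under which Alice is favoured on every single draw: Bob's
# per-point probability never reaches 0.5, so the odds are always in
# Alice's favour.
x = random.uniform(0.0, 0.5)
```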
Wow, thank you.
That's a great video, bro! I am doing an essay right now comparing frequentist and Bayesian approaches. In the example with the Genius and the Red Pill, what would a frequentist approach be? Is there any?
Thanks! Glad you liked it. Frequentists would come to exactly the same conclusion for the Genius/Pill example. They agree with Bayes' theorem in all theoretical settings such as this. The disagreement is in two places: 1) a philosophical disagreement about what probability means, and 2) In situations where the parameters are not known, like the billiards example at the end. In situations where the parameters are known, like when dealing with a known normal distribution, Bayesians and Frequentists agree. These examples are there to show how to work with conditional probability in a range of cases. Thanks for the question!
Thank you!
My pleasure!
Correction @26:49: 'we therefore know that there's a 95% chance that he will NOT lose his job' instead of 'we therefore know that there's a 95% chance that he will lose his job'.
Good spot! 4 years and I never noticed this!
Can someone explain why the pool problem isn't 3/8 * 4/9 * 5/10 = 1/12?
Since the data is updated per point scored.
It also hovers around 8%-9%, so the simulation doesn't fully disprove it.
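For anyone checking this numerically: the update-per-point intuition does work, but each factor needs Laplace's rule of succession, (k+1)/(n+2), rather than the raw ratio, and then it lands exactly on 1/11 ≈ 0.0909 rather than 1/12 ≈ 0.0833. A quick check in Python:

```python
from fractions import Fraction

# Raw sequential ratios, as in the comment above:
raw = Fraction(3, 8) * Fraction(4, 9) * Fraction(5, 10)

# The same sequential idea using Laplace's rule of succession, (k+1)/(n+2):
laplace = Fraction(3 + 1, 8 + 2) * Fraction(4 + 1, 9 + 2) * Fraction(5 + 1, 10 + 2)

print(raw, float(raw))          # 1/12 ~ 0.0833 -- close, but not the Bayesian answer
print(laplace, float(laplace))  # 1/11 ~ 0.0909 -- matches the full calculation
```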
well explained
For the running question with probabilities represented as areas, how did you compute P(T < 14) as 0.106? (32:10 mark)
I used a normal distribution calculator, setting the mean as 15.5 and the standard deviation as 1.2, and calculating the probability of a result less than 14. I did it either using Excel's NORM.DIST function or using my calculator. Can't remember! You can also use websites like this: onlinestatbook.com/2/calculators/normal_dist.html
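If you would rather script it, a one-line check (shown in Python with scipy as an example; Excel's NORM.DIST(14, 15.5, 1.2, TRUE) returns the same number):

```python
from scipy.stats import norm

# P(T < 14) for T ~ Normal(mean=15.5, sd=1.2)
print(norm.cdf(14, loc=15.5, scale=1.2))  # ~0.106
```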
Wouldn't a frequentist use the races THAT ALREADY OCCURRED to estimate BayesCamp's chances of winning? That seems both practical and intuitive.
Yes, but those are different races. The strict frequentist approach is to consider the very same event occurring many times. Makes sense for a dice roll, which can essentially be the same every time. Makes no sense with a horse race, since races on previous days with different horses/weather/ground are too different.
But good question. And yes, in reality this is what people do. But this is people being bayesians!
Also, a question regarding your simulation: it's pretty evident, but WHY is the Bayesian answer more precise? Isn't it that the frequentist approach also argues from the law of large numbers?
Not just more precise - it's the only correct answer in this case! The frequentist approach goes wrong when the underlying parameters (e.g. the probability of Bob winning any given game) are not known. Frequentists assume that there is a fixed answer to what this is, and use the available data to determine what they think it is. In this case, they assume it is 3/8. Bayesians don't assume this is fixed, but think there is a distribution of different probabilities that could have explained the data. Depending on how technical you want to get with this, you could check out this article here: www.countbayesie.com/blog/2021/4/27/technically-wrong-when-bayesian-and-frequentist-methods-differ. There's also a good discussion here: stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english. These get quite technical though. Thanks for the question.
But aren't you assuming that P(Bob wins) is 100%? So you start with a very strong prior. Or where am I wrong? ;-)
I don't think I ever assume this, but can you let me know at which point you think I might have?
I guess what you are referring to is the part where he derives the equation for P(a=5 ∧ b=3 | Bob wins)*P(Bob wins) at 1:07:49?
This probability simplifies to P((a=5 ∧ b=3) ∧ Bob wins), which should be equivalent to P(a=5 ∧ b=6): this is the only state under which Bob could win.
I am only 4:30 (min:sec) into the video - I will complete the session; however, for future reference, "die" is the singular of "dice". It's a little unsettling calling a die "dice". ☺️☺️☺️
I will return to this comment with my overall objective opinion - although, to date, I am a Frequentist.
I'm a modern man - "dice" can be both plural and singular (grammarist.com/usage/dice-die/) 😉 "Die" sounds unnatural and unsettling to me!
Hope you enjoy the rest of the video. Let's see if I can convince you bayesianism is the way to go!
@@woodyrow Touché
😀😀😀
I made it safely through it.
If it were possible I would give it 100 likes. Thank you very much.
Thank you so much! A hundred times!
What does "first to 6 wins" mean in the game?
They keep playing this game and recording who wins each time. Once a player has won 6 times they win the whole game. Think of it like tennis! The first player to win 6 games wins the set. It's a bit like that.
I have a slight gripe with the disease example: the false negative rate is independent of the false positive rate, and that should be made explicitly clear - even if it's just stating the false positive probability rather than inferring it without comment.
Absolutely right. False positives and false negatives are almost never equally likely. And fair point, it would have been better to just mention this, even if I kept things as they are to keep the maths simple.
very cool