Dan - watching your videos helped me land a data scientist position at a FAANG company. So grateful for your knowledge and your ability to share it with a wide audience :)
Can you please share what approach you followed to land a job at a FAANG company?
Your English pronunciation is the most comfortable and tolerable throughout RUclips. Thank you so much for that. You saved my life.
This is the best video I’ve seen about A/B testing. This is exactly the day-to-day DS work in tech companies. I really appreciate you sharing this and look forward to your new videos. Wishing you all the best!
This is a great step-by-step explanation of the logic behind an A/B testing setup! I am a recent DS bootcamp graduate and was struggling with this topic, and this video helped tons! Thanks Dan
This is one of the best videos about AB testing interview prep on RUclips. Great job and thanks for sharing!
I was asked a similar A/B testing question on a Data Science interview with LinkedIn. Had I seen this video a week prior, I would’ve made it to the 3rd round. Better late than never. Great content! I just subscribed.
Best of luck with your prep! -- Dan
I am a product designer, and this is so great to learn. Thanks for explaining it in detail in a simple to understand way!
Thank you!
This is the best video about A/B testing in practice I've ever seen. Thank you for sharing
This is such a great video. This video is enough when you are preparing for analytics experimentation interviews
Thanks for sharing! This video on A/B testing is hands down the most informative and practical one I've seen on RUclips.
Hmmm, revenue per day per user. That seems much more variable than conversions per user sessions, which is binomial with bounded variance. Especially for clothing on an ecommerce site that isn't Amazon. Can you help me understand why you chose this as the primary success metric?
oh! I was thinking the same thing!
It depends on the change. If it's related to the recommendation algorithm, average revenue is probably a good fit. If it's related to the UI, session time would probably work.
Best video for AB testing on YT. Thank you Dan 🙌
Excellent video. Should the candidate list the guardrail metrics well before step 7 (i.e., in step 1), so those are known and agreed upon before the test starts?
Hi Dan, first of all, thanks for all the great work. A couple of questions on my end:
- In another video (the Spotify case) you mentioned the success metric shouldn't be a top-line one such as retention. However, most PMs would like to see improvement in KPIs like retention. If the success metric isn't something like retention, how do we guarantee the link between the success metric and the top-line KPI (e.g., does an improvement in CTR also improve retention in the long/short term)?
- Another question: say we saw a significant change in country A but no change in country B. Since a unified product is more desirable (it doesn't make sense to roll out different versions of the product to different countries), how do we make the final decision here?
Can you explain the same concepts using a Jupyter notebook, following everything you talk about in this video?
A very detailed explanation to get started on A/B Testing. Thank you so much.
Of course!
@@DataInterview Can you please explain difference between Frequentist and Bayesian approaches in A/B Testing.
Bro I have an interview in the morning and your video is giving me the confidence I need to go through with it. 😅
Thank you very much, Dan! Very Helpful.
Hi Dan,
Thanks for the video.
1. Isn't it inappropriate to compare the orders per user per day of treatment vs control? What if a big proportion of the treatment group do not use the treatment (don't use the searchbox driven by the new search algorithm)?
2. How do you define orders/user/day? Is it (total orders) / (total users x days run), or do you define the statistic at a user level and generate the distribution of means?
3. How would you assess the influence of position and popularity biases on your results?
Nice questions, now I am also curious to know the answers
On the proportion of groups not using the search box, my thought is we may take care of that by checking the stage at which the user enters the funnel.
The best video about A/B testing, definitely!
This video literally made me subscribe to your channel, Dan! I have gone through countless materials and research papers on experimentation, and this is one of the best summaries of the steps! Great work, and I am excited to see more videos from you (especially around pitfalls for multiple metrics and more)!
Thanks Viabhav! I created the lesson based on what I wish I had when I first learned about AB testing a couple years ago -- Dan
+1 for the pitfalls-for-multiple-metrics idea. Would love a video about that!
Great video and summary. Just note that you've made a subtle mistake in explaining the novelty effect. The novelty effect in interpreting A/B test results refers to the impact that the introduction of a new feature or change may have on user behavior simply because it is new or different, and it's not about differentiating between new and existing users.
🎯 Key Takeaways for quick navigation:
00:00 *🎯 AB testing is vital for data science interviews at top companies.*
01:09 *🛠️ Seven steps of AB testing: Understand problem, define hypotheses, design experiment.*
03:00 *🛍️ Example: Designing experiment for an online store, clarify success metrics.*
05:31 *📊 Success metrics must be measurable, sensitive, and timely.*
08:00 *🧪 Establish hypotheses, set significance level, plan experiment.*
09:38 *📐 Design experiment: Determine sample size, run for 1-2 weeks.*
11:30 *🕵️ Check validity: Perform sanity checks, ensure randomization balance.*
14:44 *📈 Interpret results: Analyze metric direction, P-value, and confidence interval.*
17:23 *💡 Make decisions: Evaluate scenarios based on lift and confidence intervals.*
Made with HARPA AI
Thank you for creating this video. It provides such a clear and concise overview of A/B testing 👏
best a/b test videos i've ever seen, Thanks!
Which statistical test did you use to get the p-value while interpreting the results?
Same question. How did he get a p-value of 0.001?
Looks like it’s just for illustrative purposes. No data is provided, so of course there is no real statistical test. It’s a demonstration of what someone might encounter in this style of interview question.
*Please upload more A/B testing videos. This topic is too confusing. Could you please upload a video explaining the power of hypothesis test and how to choose the sample size for an A/B test?*
Love the content! So much valuable insight into A/B Testing! Keep sharing more content.
I'm currently preparing for Data Scientist interviews and this video dropped right on time. Thank you so much for such amazing content, Dan!!
How was the interview?
This video is awesome and really helpful if anyone wants to learn AB testing but doesn't have the time to finish the entire popular Udacity course. Thank you for making it so clear and interpretable!
And that’s why I made this video. Ran into a lot of frustrations when I was learning AB testing myself.
This video made me subscribe to your channel. Extraordinarily detailed explanation of all of the things that go into experiment design.
I liked the practical representation with steps in this video. It is very nicely done, but a few of the very important points are only mentioned verbally. It might be worth adding them as an extra line on the slides, e.g. for Success Metrics, Launch Decision, and others. Cool one.. 🙂
Great video. But I have a question. Shouldn't the Step 5 validity check be done before running the real experiment?
Please make a video on how the hypothesis is tested and how we get the p-value/CI.
@datainterview This is an amazing review of A/B testing in interviews! Quick question, if we have a minimal sample size value calculated, why do we also determine a duration of an experiment? I.e. if sample size is calculated to 500 users with searches per group, shouldn't the test duration be how long it takes a business to reach 500 users during the test period? And not an arbitrarily chosen time such as 1 to 2 weeks?
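One way to think about the question above: the sample size sets the minimum data you need, while the duration is usually that minimum translated into days of traffic, with a floor of at least one full week so day-of-week effects average out. A minimal sketch (the function name and traffic numbers are illustrative, not from the video):

```python
import math

def experiment_duration_days(n_per_group, daily_eligible_users,
                             traffic_share=1.0, min_days=7):
    """Days needed to collect n_per_group users in each of two
    equal-sized groups, floored at one full week so that day-of-week
    seasonality is averaged out (the usual reason the duration isn't
    purely 'stop as soon as N is reached')."""
    daily_per_group = daily_eligible_users * traffic_share / 2
    days_for_sample = math.ceil(n_per_group / daily_per_group)
    return max(days_for_sample, min_days)

# 500 users per group with 1,000 eligible users/day fills the sample
# in one day, but the test still runs a full week.
print(experiment_duration_days(500, 1000))
```

So the "1 to 2 weeks" is not arbitrary: it is whichever is longer, the time to hit the sample size or one to two full weekly cycles.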
This was so helpful and straight forward. Thank you so much!!
your channel is GOLD! THANK YOU
Hi Dan, for step 6, how do you get the confidence interval (3.4%, 5.4%)? Thank you.
Dan, we've chosen revenue_day_per_user as the success metric, and revenue only happens when a user makes a purchase, so shouldn't our target population be users who purchase rather than users who search? That way we'd actually be able to measure revenue_day_per_user. If we're concerned with users who search, I think CTR would be a good metric, or conversion rate could also work, because not all users who search will purchase; some may drop off while browsing items. Can you please advise on this part?
Excellent content; I would appreciate just a slight improvement.
Once you've assumed you want an alpha of 5% (in this case 2.5% per side) and you've also determined the beta, there is a standard equation that outputs how many people the experiment needs under those assumptions. I don't know which one is better, but from what I see, yours doesn't take the effect size (delta) of the experiment into account.
More importantly, you can't always use sigma; depending on factors like the size of the test, the independence of the samples, and the available unbiased estimator, the variance term could be sigma, S^2, Sp (pooled), or SD:
n >= (sigma * (z_(1-alpha) + z_(1-beta)) / (mu_0 - mu_1))^2
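For reference, the two-sample version of that sample-size formula can be sketched in a few lines. The sigma and minimum detectable effect below are hypothetical inputs you would estimate from historical data, not values from the video:

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, mde, alpha=0.05, power=0.80):
    """n per group for a two-sided, two-sample z-test on means:
        n >= 2 * (sigma * (z_{1-alpha/2} + z_{1-beta}) / mde)^2
    The factor of 2 comes from comparing two independent groups;
    drop it for the one-sample case. sigma (metric std dev) and
    mde (minimum detectable effect) are assumptions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / mde) ** 2)

# e.g. sigma = $10 of revenue/user/day, detect a $0.50 lift:
n = sample_size_per_group(sigma=10, mde=0.5)
```

Note how sensitive n is to sigma: this is exactly why a high-variance metric like revenue per user per day needs far more users than a bounded-variance conversion metric.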
Wouldn't it violate the independence assumption if we use revenue per user per day for A/B testing? A large portion of low-engagement users would just keep not buying anything.
Thank You for the concise explanation.
Of course! -- Dan
Hey Dan, really great video. One question: what would be a possible metric if we want to test a recommendation system for a feed, i.e., for items shown on the homepage rather than when a search happens?
Thank you for the great content, really loved it. It will be great if you could have some videos around most commonly used statistical tests in depth and A/B testing pitfalls. Will keep watching!
Excellent content as always, Dan! As an expansion on the topic, could you consider doing a video on variance reduction techniques (e.g., CUPED, MLRATE)?
Yes!
Extremely helpful. Thank you for this step by guide to A/B Testing.
Why is this a two sided hypothesis test instead of a straightforward one sided test?
Do you think that I need to get a course on data science first and then your course or I can jump straight into yours?
Amazing explanation and super concise, love it
Thanks man! - Dan
Hey Dan, can you talk about the decision to pick revenue-per-user-per-day? Doesn't using this metric require the delta method or bootstrapping if randomizing by user? Thanks!
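On the bootstrapping point in the question above: when the randomization unit is the user but the metric is a per-day ratio, one common fix is to aggregate to one value per user and bootstrap over users. A minimal percentile-bootstrap sketch, with made-up revenue numbers:

```python
import random

def bootstrap_mean_ci(per_user_values, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean of a per-user metric.
    Resampling whole users (the randomization unit) keeps the
    independence assumption intact even when each user contributed
    many days/sessions that were aggregated into a single value."""
    rng = random.Random(seed)
    n = len(per_user_values)
    means = sorted(
        sum(rng.choices(per_user_values, k=n)) / n for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# hypothetical daily revenue per user: mostly zeros, a few buyers
revenue = [0.0] * 80 + [25.0] * 15 + [120.0] * 5
lo, hi = bootstrap_mean_ci(revenue)
```

The delta method gets to a similar place analytically; the bootstrap is just the easier-to-remember version for an interview.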
This is amazing. Thank you. I will send you a Starbucks gift card if I get the job.
Very interesting, well understood except for the launch decision part: the diagram with horizontal lines at different levels.
Can you explain how you got p-value = 0.001 and confidence interval = [3.4, 5.4]?
I have the same question
I have the same question too
We're all waiting for the reply, @Dan
@datainterview
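Since no raw data is shown in the video, those exact numbers can't be reproduced, but mechanically a p-value and a confidence interval on a lift come from a two-sample z-test on group means. A sketch where every summary statistic is hypothetical:

```python
from statistics import NormalDist

def two_sample_z_test(mean_c, mean_t, var_c, var_t, n_c, n_t, alpha=0.05):
    """Two-sided z-test on the difference in means (control c,
    treatment t), plus a CI on the difference. The inputs used
    below are made-up summary stats, not the video's data."""
    diff = mean_t - mean_c
    se = (var_c / n_c + var_t / n_t) ** 0.5   # standard error of diff
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p_value, ci

# hypothetical: control $10.00/user/day, treatment $10.44, 5k users each
p, (lo, hi) = two_sample_z_test(10.0, 10.44, 25.0, 25.0, 5000, 5000)
```

With large samples and a real lift, a p-value like 0.001 and a CI that excludes zero fall straight out of this calculation.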
I recently failed google ds interview in AB Testing ; Feels really bad + next attempt would be after a year which makes me even sad. So yeah if sitting for FAANG always look into AB testing and statistics
Couldn't the alternative hypothesis be that revenue is higher in the treatment group, with a one-tailed test? It doesn't make sense for a business to change the algorithm just to make the same amount of money.
When you increase power (1 - beta), you are actually increasing recall, not precision.
Great content. Simple concise explanation.
what software/ tool is used for the A/B testing example in the video?
Great work. I love this course and it helped me a lot.
In an A/B testing interview, is it common to need to calculate the z-value or the confidence interval endpoints manually?
the best a b testing video
This was excellent
Clear and helpful, thank you
Thank you!
@datainterview Wouldn't click-through rate, or the number of products purchased per user per day, be a more effective success metric? With revenue per user per day, what if the control group purchases only one high-priced product while the treatment group purchases multiple low-priced products? The recommendation system works, but revenue would be higher for the control group.
Great content! Thank you!
What if the same user makes multiple purchases across the day? Does it make sense to have revenue-per-session / revenue-per-search as the metric?
And if we have multiple purchases by the same user in a day while the metric is revenue per user per day, won't we have to change the randomization unit?
Excellent video!
This was really good
Thanks for the great video. I have a question w.r.t the sample size that you mentioned. With a 50:50 split on a website, there will be numerous sessions coming in. So, is the sample size the minimum number of sessions we need on each side to run a test. Or do we randomly sample X samples from all the incoming sessions, X being the sample size?
Randomly sample users, not sessions.
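In practice, that user-level randomization is usually done with a deterministic hash of the user ID, so every incoming session from the same user lands in the same group. A sketch (the function and experiment names are made up):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   treatment_share: float = 0.5) -> str:
    """Deterministic user-level assignment: the same user gets the
    same variant on every session. Salting with the experiment name
    decorrelates assignments across different experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # roughly uniform on [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# stable across sessions of the same user:
assert assign_variant("user_42", "search_algo_v2") == \
       assign_variant("user_42", "search_algo_v2")
```

The "sample size" is then simply the count of distinct users that have flowed through each bucket, not a separate draw from incoming sessions.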
Why do we use a two sided hypothesis test here? Can we not assume the new algorithm is at least the same as the old one and use the extra power?
You're allowing for the possibility that the new algorithm might actually hurt your KPI goals.
What is lift?
Hey Dan, I am currently in my final semester of a Data Science undergrad degree and am graduating a year early. I unfortunately do not have any internships, but I do have some projects. I am a little confused about what to do after I graduate: whether I should start applying to jobs or do a bootcamp. Any advice will help.
How do you get the p-value of 0.001?
this is so helpful
Hi. Thanks for the great video. At the moment I'm stuck on sample size, conversion rate during the test (let's say 2 weeks), and getting a histogram of the data. After 2 weeks I'll get one conversion rate. How can I get a histogram from only one data point? "Inside" the CR there is a huge N (let's say 1,000 users). I'm confused about how to extract the data from 1,000 users to make a CR histogram. Could you help me out? Cheers
Amplitude or have a data engineer/swe get it for you.
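One common answer to the histogram question above: keep (or reconstruct) the user-level 0/1 outcomes behind the single conversion rate and bootstrap them, which turns the point estimate into a sampling distribution you can plot. A sketch with made-up counts:

```python
import random

def bootstrap_cr_samples(conversions, n_users, n_boot=2000, seed=7):
    """Rebuild user-level 0/1 outcomes from an aggregate conversion
    count, then resample them with replacement to get a distribution
    of conversion rates suitable for a histogram. The counts used
    below are illustrative, not real data."""
    rng = random.Random(seed)
    outcomes = [1] * conversions + [0] * (n_users - conversions)
    return [
        sum(rng.choices(outcomes, k=n_users)) / n_users
        for _ in range(n_boot)
    ]

# e.g. 120 conversions out of 1,000 users over the 2-week test:
samples = bootstrap_cr_samples(120, 1000)
```

Each resample is one plausible "what if we reran the test" conversion rate, so a histogram of `samples` shows the spread around the observed CR.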
Welp, these videos tell me I am qualified to work at google in data science. Nevertheless, I don't see job openings in 2023. And I dropped out of school, so I've never been offered an interview despite my ability in statistics.
On the other hand, I'm not sure I really care about how effective ads are and getting users to make purchases. I'd rather study other things like whether or not certain types of jobs accelerate biological aging non-linearly.
Your explanation is great, but you could avoid the background music. Infuriating!
Respect !! ❤️
awesome
Hi Dan, Very informative video!
Would you have suggestions on how to select randomization units for B2B products like data bricks and slack, which have a many to 1 relationship between users and accounts?
Randomize at the account level, sort of like cohort-based randomization, so users within the same cohort are exposed to the same variation condition.
idc about "acing" anything