I really like your videos! Your explanations are so much better than the ones given by my professors!
Thanks!
Man what a good explanation! I was looking for bayesian regression and found your video on it, got it. Now I searched for thompson sampling and it's your channel again! You're saving my day hahaha. Very clear and insightful explanations. Thank you very much!
Beautiful explanation! Had come across Thompson Sampling during Udemy's online course on Recommender Systems.
I like the very clear explanation with a reference to the math details for those who want that. Also appreciate the limitations at the end. Thinking of applications to portfolio optimization
Thanks!
Cool video! There are a lot of videos about DS implementation; I find this channel provides a lot of the math foundations behind the scenes. While a good implementation is important, I believe the theoretical foundation is also very cool and would be crucial to a successful analysis.
Thanks!
Very neat, first time I come across Thompson’s sampling!
This is pretty awesome, thanks for the great explanation!
Thanks!
Very clear explanation! Thank you so much!
Loved the explanation! Before this video I thought I could never learn TS. Thank you :)
Damn bro. You are good at this
Your explanation made me say WOW!
This is really interesting. Never heard of it.
great explanation!!!
Absolutely fantastic content once again, many thanks! However, I would have one important question: you never revisited the assumption that we know sigma_i beforehand, even though in practice it's an unobservable quantity. What should one do with it? Is estimating it from historical data (if such data are available) a big no-no?
Excellent video - thank you!
Very well explained video, helped me a lot!
The posteriors that emerge given the formulas have a standard deviation of 1 after one visit. Does this result depend on the fact that the quality of the restaurants actually have a known standard deviation of 1?
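A quick numeric check of this (a sketch, assuming the conjugate Gaussian update with a very wide prior and the known observation standard deviation of 1 used in the video; the prior variance value is illustrative):

```python
# Posterior variance after n observations under the conjugate Gaussian model:
# 1 / (1/prior_var + n/obs_var). With obs_var = 1 and a very wide prior,
# a single visit leaves a posterior standard deviation of roughly 1.
prior_var = 1e4   # wide prior (assumed value for illustration)
obs_var = 1.0     # known observation variance from the video's setup

def posterior_std(n, prior_var=prior_var, obs_var=obs_var):
    return (1.0 / (1.0 / prior_var + n / obs_var)) ** 0.5

print(posterior_std(1))  # ~1.0: so yes, this hinges on the known sigma = 1
print(posterior_std(4))  # ~0.5: shrinks like sigma / sqrt(n)
```

So the "standard deviation of 1 after one visit" does depend on the assumed observation standard deviation of 1; with obs_var = 4 the same formula would give roughly 2 after one visit.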
Can you please mention the source u studied for this video? Like a journal paper or a textbook u followed. It will help me a lot. Thanks
I wonder if anybody can bring some "Explore-Exploit" thinking to this. Here, Thompson sampling arrives at the optimal solution provided that the 'environment' (restaurant quality) is constant. But what about a changing environment (say, restaurants occasionally going under new management)? In this case, it seems that time spent exploring should always remain higher than it would in a constant environment. Is there an analogous sampling routine for such a situation?
Been thinking about this. I may have a partial solution.
Since, once sufficient data is available, the 'better' option might always outcompete the 'lesser' option, a change to the environment that makes the lesser option the better one will go undetected. So perhaps the goal is to increase the uncertainty in the posteriors in proportion to the number of future events. One way (I think) to do this would be to weight the data by something like [1/total planned visits to any restaurant]. In this way, much of the 'uninformation' of the prior is maintained--permitting increased exploration.
But even if this is okay, what do you do if you plan to visit restaurants infinitely many times?
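One standard answer to both worries (including the infinite-horizon one) is to discount old observations rather than weight by total planned visits: each step, multiply the effective counts and reward sums by a factor gamma < 1, so the posterior uncertainty never collapses to zero and exploration persists forever. A minimal sketch of that idea, assuming the conjugate Gaussian setup from the video (the discount factor, prior variance, and two-restaurant means below are illustrative, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99           # discount factor < 1: old data fades, exploration persists
prior_var = 1e4        # wide Gaussian prior, mean 0
d = 2                  # two restaurants (illustrative)
eff_n = np.zeros(d)    # discounted visit counts
eff_sum = np.zeros(d)  # discounted reward sums

def step(true_means):
    # Sample from each current posterior and pick the argmax
    var = 1.0 / (1.0 / prior_var + eff_n)
    choice = int(np.argmax(rng.normal(var * eff_sum, np.sqrt(var))))
    reward = rng.normal(true_means[choice], 1.0)
    # Discount everything, then fold in the new observation
    eff_n[:] = gamma * eff_n
    eff_sum[:] = gamma * eff_sum
    eff_n[choice] += 1
    eff_sum[choice] += reward
    return choice

# Restaurant 1 is better at first; after "new management", restaurant 0 is better
for t in range(2000):
    means = (0.0, 1.0) if t < 1000 else (2.0, 1.0)
    step(means)
```

Because the discounted count is bounded by 1/(1 - gamma) = 100, the posterior variance has a floor, so the sampler keeps exploring and can notice the regime change even over an infinite horizon.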
how to pick the next visit to 1 or 2? could you explain that?
Could you please also have a video on Importance Sampling?
Amazing. Thanks 😉
Is one shortcoming of this method that the variance of the posterior does not scale to the sample variance of the observations for that restaurant? Like, if I went to Restaurant A 50 times and Restaurant B 50 times, and my sample values from Restaurant A were distributed N(5, 1) but my sample values from Restaurant B were distributed N(6, 10), then you would think that my posterior for Restaurant B should have much wider variance than my posterior for Restaurant A. But Thompson Sampling doesn't seem to account for that, instead just scaling posterior variance by the number of observations per restaurant. Am I missing something here?
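You're not missing anything: under the known-variance model in the video, the posterior variance depends only on the visit count, not the sample spread. If the noise variance is also unknown, the conjugate choice is a Normal-Inverse-Gamma prior, so you sample both the mean and the variance per arm. A rough sketch of that variant (the hyperparameter values are illustrative assumptions, and N(6, 10) is read here as standard deviation 10):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mean_posterior(rewards, mu0=0.0, kappa0=1e-4, alpha0=0.5, beta0=0.5):
    # Normal-Inverse-Gamma conjugate update, then draw (sigma2, mu)
    n = len(rewards)
    if n == 0:
        sigma2 = 1.0 / rng.gamma(alpha0, 1.0 / beta0)  # prior draw
        return rng.normal(mu0, np.sqrt(sigma2 / kappa0))
    x = np.asarray(rewards, dtype=float)
    xbar = x.mean()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * ((x - xbar) ** 2).sum()
              + 0.5 * kappa0 * n * (xbar - mu0) ** 2 / kappa_n)
    sigma2 = 1.0 / rng.gamma(alpha_n, 1.0 / beta_n)  # Inverse-Gamma draw
    return rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))

a = rng.normal(5, 1, size=50)    # Restaurant A: N(5, 1)
b = rng.normal(6, 10, size=50)   # Restaurant B: much noisier
draws_a = [sample_mean_posterior(a) for _ in range(500)]
draws_b = [sample_mean_posterior(b) for _ in range(500)]
print(np.std(draws_a), np.std(draws_b))  # B's draws are much more spread out
```

With this model the noisy restaurant's posterior for the mean really is wider at equal visit counts, which matches the intuition in the question.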
Awesome!!!
What about CRF? r u able to do it?
You are the best
Can you produce a video to explain about moving least squares method? Thank you in advance
Actually you had the board covered the entire video. Couldn’t take a photo unobstructed this time.
Sorry! Will try to remember that
Is it correct to multiply by sigma squared in the posterior formula? It seems we need to multiply by sigma only, otherwise we get the wrong scale: squared length instead of length.
Do you have the article you mentioned (4:42) with the table of prior/posterior distributions?
He was possibly referring to this paper: statweb.stanford.edu/~cgates/PERSI/papers/conjprior.pdf
Just linked! Sorry bout that
@@ritvikmath Thanks!
how do you get the initial posterior distribution of 20 and -12?
Ritvik you are cool
Wow thanks!
Do you know Top-two thompson sampling?
8:47 "we sample from those posteriors"
You mean "priors"?
in the first visit the posterior is equal to the prior
You can use this Python code:
# Thompson Sampling
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')
# Implementing Thompson Sampling
N = 1000
d = 10
prior_variance = 1e4  # wide Gaussian prior, mean 0
ads_selected = []
numbers_of_selections = np.zeros(d)  # Ni(n)
variance_posterior = [prior_variance] * d
mean_posterior = [0.0] * d
sum_sample = [0.0] * d
for n in range(N):
    ad = 0
    max_sample = -np.inf  # don't assume samples are positive
    for i in range(d):
        # np.random.normal expects a standard deviation, not a variance
        sample = np.random.normal(mean_posterior[i], np.sqrt(variance_posterior[i]))
        if sample > max_sample:
            max_sample = sample
            ad = i
    ads_selected.append(ad)
    numbers_of_selections[ad] += 1
    reward = dataset.values[n, ad]
    sum_sample[ad] += reward
    # Conjugate Gaussian update (known observation variance 1, prior mean 0)
    variance_posterior[ad] = 1 / (1 / prior_variance + numbers_of_selections[ad])
    mean_posterior[ad] = variance_posterior[ad] * sum_sample[ad]
# Visualising the results - Histogram
plt.hist(ads_selected)
plt.title('Histogram of ads selections')
plt.xlabel('Ads')
plt.ylabel('Number of times each ad was selected')
plt.show()
print(numbers_of_selections[4] / sum(numbers_of_selections) * 100)  # percentage of selections going to ad index 4
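If you don't have the `Ads_CTR_Optimisation.csv` file, the snippet above can be exercised on synthetic Gaussian rewards instead (the arm means below are made up purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
true_means = rng.uniform(0, 5, size=10)              # hypothetical arm qualities
rows = rng.normal(true_means, 1.0, size=(1000, 10))  # one reward per arm per round
dataset = pd.DataFrame(rows)                         # drop-in replacement for the CSV
print(dataset.shape)  # (1000, 10)
```

Define `dataset` this way before the Thompson sampling loop and the rest of the code runs unchanged.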