❤️ Blog post with code for this video
medium.com/@AmyGrabNGoInfo/recommendation-system-user-based-collaborative-filtering-a2e76e3e15c4
🙏 Give me a tip to show your appreciation: www.paypal.com/donate/?hosted_button_id=4PZAFYA8GU8JW
✅ Join Medium Membership: If you are not a Medium member and would like to support me as a writer (😄 Buy me a cup of coffee ☕), join Medium membership through this link: medium.com/@AmyGrabNGoInfo/membership
You will get full access to posts on Medium for $5 per month, and I will receive a portion of it. Thank you for your support!
📒 Code Notebook: mailchi.mp/04dedbc4e95e/ej3em2e33u
🧱Other tutorials on Recommendation System
ruclips.net/p/PLVppujud2yJqshyM80nNDZgye-AFufyqF
🛎️ SUBSCRIBE bit.ly/3keifBY
🔥 Check out more machine learning tutorials on my website!
grabngoinfo.com/tutorials/
Great tutorial! I rarely comment on YouTube videos, but I just had to on this one, given how thoroughly you explained everything and how straightforward it was! You also provided all of the resources, which helped me a lot, so I have a reference to keep coming back to.
Thank you for taking the time to write the comment, Margate. Glad to hear that you found the tutorial helpful! 🙂
Thank you for sharing your knowledge. Great tutorial for learning user based CF
Thank you, Poonam!
How can I divide the data into train and test sets to evaluate this model? It's not easy, because if I want to predict the top movies for a user I need the similarity matrix of all users. How can I predict the rating for a new user who is not in the dataset?
Great video series! How would you approach this if you did not have user ratings? I would like to apply this technique to fashion items and was thinking about using purchase frequency. So, for example, user i purchased item x 2 times. Do you have another idea, or do you think this is a reasonable approach?
Thank you for your efforts! I have one question. You set a condition to keep only the movies with more than 100 ratings. Could you explain why you chose a condition like that?
This is a great tutorial. Thank you for your efforts.
Thanks so much! It really helps! But I do have a question about the part on finding similar users. The user-item matrix I have contains a lot of missing values as well, just like yours. However, when I used the Pearson correlation, I did not get NaN for the entries with missing values; instead, I still got numbers. That's so weird, because based on your illustration, it should return NaN for correlations involving missing values.
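One possible explanation, assuming the correlation is computed with pandas `DataFrame.corr`: pandas drops missing values pairwise, so two users still get a number whenever they have enough co-rated movies, and NaN only shows up when the overlap is too small to compute a correlation. The toy ratings and user/movie names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy user-movie matrix (rows = users, columns = movies); values are made up.
matrix = pd.DataFrame(
    {
        "movie_a": [5.0, 4.0, np.nan],
        "movie_b": [3.0, 2.0, np.nan],
        "movie_c": [1.0, np.nan, 4.0],
        "movie_d": [np.nan, 1.0, 2.0],
    },
    index=["user_1", "user_2", "user_3"],
)

# corr() correlates columns, so transpose to correlate users with each other.
# Missing values are excluded pair by pair, not for the whole matrix.
user_similarity = matrix.T.corr(method="pearson")

# user_1 and user_2 share two rated movies, so they get a number.
# user_1 and user_3 share only one, which is too few for a correlation.
print(user_similarity)

# Requiring more overlap turns weakly supported pairs into NaN.
strict_similarity = matrix.T.corr(method="pearson", min_periods=3)
print(strict_similarity)
```

If numbers based on very few shared movies are not wanted, raising `min_periods` is one way to filter them out.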
very clean very clear
Thank you, Sinan. Glad you think so!
Great tutorial!
How can it be user-based if the columns are movie names? Did you transpose the matrix?
Best tutorial I have ever found. Can you give a further explanation of the performance evaluation? Thanks.
Thank you, Choirul! I will consider writing a tutorial on recommendation system performance evaluation in the near future.
Hello, great tutorial! I noticed at 8:15 there are numbers by the movie recommendations. E.g the movie at the top - Harry potter - has number 16. What are these numbers? Tried comparing them to either the movieId or the index number at 3:23 but "A beautiful mind" was neither at index 3 or had 3 as its movieId.
Thank you, Vanessa! The numbers by the movie recommendations are the index for the dataframe called item_score. The ranked_item_score is the sorted item_score and it kept the index from item_score.
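A small sketch of why the numbers look arbitrary: sorting a dataframe keeps each row's original index label. The names item_score and ranked_item_score come from the reply above; the column names and values here are made up, and the reset_index step is only a suggestion for anyone who wants the labels to match the rank.

```python
import pandas as pd

# Made-up scores; only the dataframe names come from the tutorial.
item_score = pd.DataFrame(
    {"movie": ["Movie A", "Movie B", "Movie C"], "movie_score": [0.2, 0.9, 0.5]}
)

# Sorting reorders the rows but keeps the original index labels.
ranked_item_score = item_score.sort_values(by="movie_score", ascending=False)
print(ranked_item_score)

# Optional: renumber the rows so the label reflects the rank position.
renumbered = ranked_item_score.reset_index(drop=True)
print(renumbered)
```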
Can we build one recommendation system that uses both collaborative filtering approaches (user-based and item-based)? If yes, how can we generate recommendations?
Good question! You can combine user-based and item-based collaborative filtering in different ways. For example, one way is to assign different weights to the recommendations from the two methods. I plan to write a tutorial on this topic in the near future, so stay tuned.
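The weighting idea could look roughly like the sketch below. This is not the tutorial's method: the function name blend_scores, the 50/50 default weight, and the choice to keep only movies scored by both methods are all assumptions made for illustration, and it presumes both score sets are already on a comparable scale.

```python
import pandas as pd

def blend_scores(user_based, item_based, user_weight=0.5):
    """Weighted average of two score Series indexed by movie title (hypothetical helper)."""
    # Keep only movies that both methods scored (an assumption; other choices are possible).
    combined = pd.concat(
        {"user_based": user_based, "item_based": item_based}, axis=1, join="inner"
    )
    blended = (
        user_weight * combined["user_based"]
        + (1 - user_weight) * combined["item_based"]
    )
    return blended.sort_values(ascending=False)

# Made-up scores from the two methods.
user_based = pd.Series({"Movie A": 1.0, "Movie B": 0.5, "Movie C": 0.2})
item_based = pd.Series({"Movie A": 0.2, "Movie B": 0.9, "Movie D": 0.7})

blended = blend_scores(user_based, item_based, user_weight=0.5)
print(blended)
```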
So if I enter a different user_id, I would get different recommendations, right? And how can I deploy it on Streamlit? Please help me out.
Hi, I wanted to ask about the result of the rating prediction. In this video the top predicted rating is 6.28. Why does the predicted rating exceed the maximum rating of 5?
Hi Gregorius, thank you for watching the tutorial! The predicted rating is the sum of the movie_score and the average movie rating for a user. It can exceed 5 if a user tends to give high ratings to movies. You can align the predicted rating with the existing rating system by putting the predicted_rating into buckets. For example, if predicted_rating>=4.75, then prediction_rating_adjusted=5, if predicted_rating>=4.25 and predicted_rating
@grabngoinfo How can I define the buckets to use, and what is the range of the predicted rating value if my rating range is 1 to 5?
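One way to define such buckets, sketched under assumptions: snap each prediction to the nearest allowed step and clip it to the rating range. The function name and the default low, high, and step values are hypothetical; set low to 1 for a 1 to 5 scale. Half-star steps are chosen only because they agree with the 4.75 to 5 example above.

```python
import numpy as np

def snap_to_rating_scale(predicted, low=0.5, high=5.0, step=0.5):
    """Round predictions to the nearest step (half up) and clip to [low, high]."""
    predicted = np.asarray(predicted, dtype=float)
    # floor(x / step + 0.5) rounds halves up, so 4.75 becomes 5.0 and 4.25 becomes 4.5.
    snapped = np.floor(predicted / step + 0.5) * step
    # Anything outside the rating scale is pulled back to the nearest end.
    return np.clip(snapped, low, high)

adjusted = snap_to_rating_scale([6.28, 4.75, 4.3, 0.1])
print(adjusted)
```

Before adjustment the prediction has no fixed range, since it depends on the similarity-weighted scores, which is why clipping is part of the sketch.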
Amazing Sir
Many many thanks
thank you
Hi, I wanted to ask where you created the user/item matrix you mentioned in the code 'matrix = df_GT100.pivot_table', and what that code means.
I also wanted to ask what other alternatives there are to matrix normalization.
Hi Shah, thank you for your question!
In the code 'matrix = df_GT100.pivot_table', matrix is the name of the output user-movie matrix. df_GT100 is the name of the input dataset after data processing. pivot_table is the function that transforms the data.
For a collaborative filtering recommendation system, subtracting the mean is the most commonly used matrix normalization.
I hope the information is helpful. Feel free to let me know if you have other questions.
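A minimal sketch of those two steps on made-up data. The names df_GT100 and matrix come from the reply above; the column names userId, title, and rating, the variable name matrix_norm, and the choice to subtract each user's own mean are assumptions for illustration.

```python
import pandas as pd

# Made-up long-format ratings: one row per (user, movie) rating.
df_GT100 = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2, 3],
        "title": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
        "rating": [4.0, 2.0, 5.0, 3.0, 1.0],
    }
)

# Reshape to a user-movie matrix: rows = users, columns = movies.
# Movies a user has not rated become NaN.
matrix = df_GT100.pivot_table(index="userId", columns="title", values="rating")

# Mean normalization: subtract each user's average rating from that user's row.
matrix_norm = matrix.subtract(matrix.mean(axis=1), axis=0)
print(matrix_norm)
```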
Hello. At 3:54, you used mean_rating = ('rating', 'mean') and number_of_ratings = ('rating', 'count'). I was wondering where they came from.
Hi Chigoziri,
Thank you for watching the video!
Are you referring to the code agg_ratings = df.groupby('title').agg(mean_rating = ('rating', 'mean'), number_of_ratings = ('rating', 'count')).reset_index()? In this code, the data is aggregated by movie title. For each movie title, two variables are created, mean_rating and number_of_ratings. mean_rating is calculated by taking the mean of all the ratings for a movie, and number_of_ratings is calculated by counting the number of ratings for each movie title.
Hi, can someone tell me how to evaluate the performance of this model?
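Here is the same line run on a tiny made-up dataset, to show what each piece produces. In this pandas syntax (named aggregation), the keyword on the left becomes the new column name, and the tuple gives the source column and the function to apply.

```python
import pandas as pd

# Made-up ratings; only the column names match the code above.
df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie A", "Movie B"],
        "rating": [4.0, 5.0, 3.0],
    }
)

# new_column_name = (source_column, aggregation_function)
agg_ratings = (
    df.groupby("title")
    .agg(
        mean_rating=("rating", "mean"),
        number_of_ratings=("rating", "count"),
    )
    .reset_index()
)
print(agg_ratings)
```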
Here is a good summary of recommender sys evaluation: www.math.uci.edu/icamp/courses/math77b/lecture_12w/pdfs/Chapter%2007%20-%20Evaluating%20recommender%20systems.pdf I may create a tutorial on that in the future.
@@grabngoinfo thanks
If this is user-based collaborative filtering, aren't we supposed to find the similarity between the users?
So why did you find the similarity between the items?
How do I modify the code if I want to run for all the users and not want to pick one user as you did?
You can write a function using user_id as the input, and loop through all the users using the function.
@grabngoinfo Hi! I was able to modify the entire code with a loop except the last part:
picked_userid = [407303734, 529893144]
def gen_item_score(data):
    df_item_scores = pd.DataFrame()
    for customerid in picked_userid:
        # A dictionary to store item scores
        item_score = {}
        # Loop through items
        for i in data[data.index == picked_userid].columns:
            # Get the ratings for movie i
            category_rating = data[data.index == picked_userid][i].drop_duplicates()
            # Create a variable to store the score
            total = 0
            # Create a variable to store the number of scores
            count = 0
            # Loop through similar users
            for u in df_similar_users[df_similar_users.index== picked_userid].index:
                # If the movie has rating
                if pd.isna(category_rating[u]) == False:
                    # Score is the sum of user similarity score multiply by the movie rating
                    score = df_similar_users[df_similar_users.index== picked_userid][u] * category_rating[u]
                    # Add the score to the total score for the movie so far
                    total += score
                    # Add 1 to the count
                    count +=1
            # Get the average score for the item
            item_score[i] = total / count
        # Convert dictionary to pandas dataframe
        item_score = pd.DataFrame(item_score.items(), columns=['category_id', 'category_score'])
        df_item_scores = df_item_scores.append(item_score)
    return df_item_scores
df_item_scores = gen_item_score(df_similar_user_category_ordered)
Can you please tell where I am going wrong?
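A few things in the code above look like likely culprits, though this is a guess without seeing the data: the loop variable customerid is never used (the filters compare against the whole picked_userid list instead), total / count fails when no similar user rated an item, and DataFrame.append was removed in pandas 2.0. Below is a sketch of one way to restructure it. The inputs are hypothetical stand-ins: ratings is a user-item matrix, and similar_users maps each target user to a Series of similarity scores indexed by similar user ID. The toy values are made up.

```python
import pandas as pd

def gen_item_scores(ratings, similar_users):
    """Average similarity-weighted rating per item, for each target user (sketch)."""
    frames = []
    for user_id, similarities in similar_users.items():
        # Ratings given by this user's similar users only.
        neighbor_ratings = ratings.loc[similarities.index]
        rows = []
        for item in neighbor_ratings.columns:
            item_ratings = neighbor_ratings[item].dropna()
            if item_ratings.empty:
                # No similar user rated this item, so skip it instead of dividing by zero.
                continue
            weighted = item_ratings * similarities.loc[item_ratings.index]
            rows.append(
                {
                    "user_id": user_id,
                    "category_id": item,
                    "category_score": weighted.sum() / len(item_ratings),
                }
            )
        frames.append(pd.DataFrame(rows))
    # pd.concat replaces the removed DataFrame.append.
    return pd.concat(frames, ignore_index=True)

# Made-up inputs for illustration.
nan = float("nan")
ratings = pd.DataFrame({"x": [1.0, nan, 0.5], "y": [nan, 2.0, 1.0]}, index=[10, 20, 30])
similar_users = {
    10: pd.Series({20: 0.5, 30: 1.0}),
    20: pd.Series({10: 0.5}),
}

df_item_scores = gen_item_scores(ratings, similar_users)
print(df_item_scores)
```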
Can you make me a project collaborative recommendations system
how would you evaluate your work?
Great question! In practice, AB testing can be set up to evaluate the result of the recommendation system vs. the existing algorithm.
I am not able to access the whole code bro.
Here is the link to the notebook: mailchi.mp/04dedbc4e95e/ej3em2e33u
please try to add notebook link directly
Hi Ravi, I am working on making the notebooks available in a central place, so stay tuned.
Hi Ravi, here is the notebook with code: mailchi.mp/04dedbc4e95e/ej3em2e33u
Please give a code
Hi Kachhadiya, here is the notebook with code: mailchi.mp/04dedbc4e95e/ej3em2e33u
jai shree ram
Thank you!