User-Based Collaborative Filtering In Python | Machine Learning

  • Published: Nov 6, 2024

Comments • 50

  • @grabngoinfo  2 years ago +2

    ❤️ Blog post with code for this video
    medium.com/@AmyGrabNGoInfo/recommendation-system-user-based-collaborative-filtering-a2e76e3e15c4
    🙏 Give me a tip to show your appreciation: www.paypal.com/donate/?hosted_button_id=4PZAFYA8GU8JW
    ✅ Join Medium Membership: If you are not a Medium member and would like to support me as a writer (😄 Buy me a cup of coffee ☕), join Medium membership through this link: medium.com/@AmyGrabNGoInfo/membership
    You will get full access to posts on Medium for $5 per month, and I will receive a portion of it. Thank you for your support!
    📒 Code Notebook: mailchi.mp/04dedbc4e95e/ej3em2e33u
    🧱Other tutorials on Recommendation System
    ruclips.net/p/PLVppujud2yJqshyM80nNDZgye-AFufyqF
    🛎️ SUBSCRIBE bit.ly/3keifBY
    🔥 Check out more machine learning tutorials on my website!
    grabngoinfo.com/tutorials/

  • @chrispy_baecon 1 year ago +2

    Great tutorial! I rarely comment on YouTube videos, but I had to on this one, given how thoroughly and straightforwardly you explained everything. You also provided all of the resources, which helped me a lot, so I have a reference to keep coming back to.

    • @grabngoinfo  1 year ago +1

      Thank you for taking the time to write the comment, Margate. Glad to hear that you found the tutorial helpful! 🙂

  • @poonamnagpal970 1 year ago +1

    Thank you for sharing your knowledge. Great tutorial for learning user based CF

  • @lucianozaffaina9853 1 year ago +3

    How can I split the data into train and test sets to evaluate this model? It's not easy, because to predict the top movies for a user I need the similarity matrix of all users. How can I predict the rating for a new user who is not in the dataset?

  • @moritzvoss3056 1 year ago +1

    Great video series! How would you approach this if you did not have user ratings? I would like to apply this technique to fashion items and was thinking about using purchase frequency. For example, user i purchased item x 2 times. Do you have another idea, or do you think this is a reasonable approach?

  • @쪼이-s5h 1 year ago

    Thank you for your efforts! I have one question. You set a condition to keep only the movies with more than 100 ratings. Could you explain why you set a condition like that?

  • @RodrigoMaldonado-l6v 1 year ago

    This is a great tutorial. Thank you for your efforts.

  • @kenziecheng6660 11 months ago

    Thanks so much! It really helps! But I do have a question about the part on finding similar users. The user-item matrix I have contains a lot of missing values as well, just like yours. However, when I used the Pearson correlation, I did not get NaN for the entries with missing values; instead, I still got numbers. That's weird, because based on your illustration it should return NaN for correlations involving missing values.

  • @sinan_islam 1 year ago +1

    very clean very clear

    • @grabngoinfo  1 year ago

      Thank you, Sinan. Glad you think so!

  • @SihanLi-m5c 1 year ago

    Great tutorial!

  • @Krm3458 1 year ago

    How can it be user-based when the columns are movie names? Did you transpose the matrix?

  • @choirulhuda8606 1 year ago

    The best tutorial I have ever found. Could you explain the performance evaluation further? Thanks.

    • @grabngoinfo  1 year ago

      Thank you, Choirul! I will consider writing a tutorial on recommendation system performance evaluation in the near future.

  • @vanessahaaland2954 2 years ago +1

    Hello, great tutorial! I noticed at 8:15 there are numbers next to the movie recommendations. E.g., the movie at the top, Harry Potter, has the number 16. What are these numbers? I tried comparing them to either the movieId or the index number at 3:23, but "A Beautiful Mind" was neither at index 3 nor had 3 as its movieId.

    • @grabngoinfo  2 years ago

      Thank you, Vanessa! The numbers next to the movie recommendations are the index of the dataframe called item_score. ranked_item_score is item_score after sorting, and it keeps the original index from item_score.

  • @ridakhan7934 1 year ago +1

    Can we build one recommendation system that uses both collaborative filtering approaches (user-based and item-based)? If yes, how can we generate recommendations?

    • @grabngoinfo  1 year ago

      Good question! You can combine user-based and item-based collaborative filtering in different ways. For example, one way is to assign different weights to the recommendations from the two methods. I plan to write a tutorial on this topic in the near future, so stay tuned.
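
One way to read the weighting idea in the reply above is a weighted average of the two score lists. This is only a minimal sketch: the function name `blend_scores`, the `user_weight` parameter, the toy scores, and the choice to fall back to the single available score when only one method scored a movie are all assumptions made for illustration, not the tutorial's method.

```python
import pandas as pd

# Hypothetical predicted scores from the two methods (toy numbers).
user_based = pd.Series({"Movie A": 4.5, "Movie B": 3.0, "Movie C": 4.0})
item_based = pd.Series({"Movie A": 3.5, "Movie B": 4.0, "Movie D": 5.0})

def blend_scores(user_scores, item_scores, user_weight=0.5):
    """Weighted average of user-based and item-based scores per movie."""
    combined = pd.concat(
        [user_scores.rename("user"), item_scores.rename("item")], axis=1
    )
    # Assumption: a movie scored by only one method keeps that method's score.
    combined["user"] = combined["user"].fillna(combined["item"])
    combined["item"] = combined["item"].fillna(combined["user"])
    blended = user_weight * combined["user"] + (1 - user_weight) * combined["item"]
    # Highest blended score first.
    return blended.sort_values(ascending=False)

blended = blend_scores(user_based, item_based, user_weight=0.6)
print(blended)
```

The weight itself would normally be tuned on held-out ratings rather than fixed by hand.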

  • @indianarrmy3148 1 year ago

    So if I enter a different user_id, I would get different recommendations, right? How can I deploy it on Streamlit? Please help me out.

  • @gregoriusoscar 1 year ago +1

    Hi, I wanted to ask about the result of the rating prediction. In this video the top predicted rating is 6.28. Why does the predicted rating exceed the maximum rating value of 5?

    • @grabngoinfo  1 year ago

      Hi Gregorius, thank you for watching the tutorial! The predicted rating is the sum of the movie_score and the user's average movie rating. It can exceed 5 if a user tends to give high ratings to movies. You can align the predicted rating with the existing rating system by putting predicted_rating into buckets. For example, if predicted_rating>=4.75, then prediction_rating_adjusted=5, if predicted_rating>=4.25 and predicted_rating

    • @gregoriusoscar 1 year ago

      @grabngoinfo How can I define the buckets to use, and what is the range of predicted rating values if my rating range is 1 to 5?
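
The bucketing idea from the reply above could be sketched as clipping the prediction to the rating range and then snapping it to the nearest allowed value. The reply is cut off after its first two thresholds, so the half-star step used here is an assumption; it does reproduce the two thresholds that are quoted (4.75 and above becomes 5, 4.25 and above becomes 4.5). The function name `snap_to_rating_scale` is hypothetical.

```python
import numpy as np

def snap_to_rating_scale(predicted, low=1.0, high=5.0, step=0.5):
    """Clip a predicted rating to [low, high], then round to the nearest step.

    Assumption: ratings come in half-star steps; use step=1.0 for whole stars.
    """
    clipped = np.clip(predicted, low, high)
    # floor(x + 0.5) rounds halves up, so 4.75 maps to 5.0 and 4.25 to 4.5.
    return np.floor(clipped / step + 0.5) * step

for raw in [6.28, 4.75, 4.3, 0.2]:
    print(raw, "->", float(snap_to_rating_scale(raw)))
```

If only the ranking of recommendations matters, the raw predicted values can be used as they are, since clipping and bucketing only affect how the number is displayed.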

  • @shrikantganduri5482 2 years ago +1

    Amazing Sir

  • @mohammadhegazy1285 1 year ago +1

    thank you

  • @shahali6880 2 years ago +1

    Hi, I wanted to ask where you created the user/item matrix. You mentioned 'matrix = df_GT100.pivot table' in the code, and I wanted to ask what that means.
    I also wanted to ask what alternatives there are to matrix normalization.

    • @grabngoinfo  2 years ago

      Hi Shah, thank you for your question!
      In the code 'matrix = df_GT100.pivot_table', matrix is the name of the output user-movie matrix. df_GT100 is the name of the input dataset after data processing. pivot_table is the function that transforms the data.
      For collaborative filtering recommendation systems, subtracting the mean is the most commonly used matrix normalization.
      I hope the information is helpful. Feel free to let me know if you have other questions.
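
The pivot and normalization steps described in the reply above might look roughly like this on toy data. The names `df_GT100`, `matrix`, `title` and `rating` come from the thread; the `userId` column name, the toy values, and the name `matrix_norm` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the processed ratings data (assumed columns).
df_GT100 = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
    "rating": [5.0, 3.0, 4.0, 2.0, 1.0],
})

# Rows are users, columns are movie titles; unrated movies become NaN.
matrix = df_GT100.pivot_table(index="userId", columns="title", values="rating")

# Mean-centering: subtract each user's average rating from that user's row,
# so positive values mean "above this user's average".
matrix_norm = matrix.subtract(matrix.mean(axis=1), axis=0)
print(matrix_norm)
```

Because users are the rows, similarity is then computed between rows, which is what makes the approach user-based even though the columns are movie titles.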

  • @chigozirioyiogu7804 2 years ago

    Hello. At 3:54, you used mean_rating("rating", "mean"), number_of_rating("rating","count"). I was wondering where they came from.

    • @grabngoinfo  2 years ago +1

      Hi Chigoziri,
      Thank you for watching the video!
      Are you referring to the code agg_ratings = df.groupby('title').agg(mean_rating = ('rating', 'mean'), number_of_ratings = ('rating', 'count')).reset_index()? In this code, the data is aggregated by movie title. For each movie title, two variables are created, mean_rating and number_of_ratings. mean_rating is calculated by taking the mean of all the ratings for a movie, and number_of_ratings is calculated by counting the number of ratings for each movie title.
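
The quoted line uses pandas named aggregation, where each keyword argument is the name of a new column and its value is a (source column, function) pair. A small runnable illustration on made-up ratings (the toy data is an assumption; the column and variable names are the ones quoted above):

```python
import pandas as pd

# Toy ratings: two ratings for Movie A, three for Movie B.
df = pd.DataFrame({
    "title": ["Movie A", "Movie A", "Movie B", "Movie B", "Movie B"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
})

# new_column_name=(source_column, aggregation_function)
agg_ratings = (
    df.groupby("title")
      .agg(mean_rating=("rating", "mean"),
           number_of_ratings=("rating", "count"))
      .reset_index()
)
print(agg_ratings)
```

So mean_rating and number_of_ratings are not built-in functions; they are simply the output column names chosen in the call.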

  • @lucianozaffaina9853 1 year ago +1

    Hi, can someone tell me how to evaluate the performance of this model?

    • @grabngoinfo  1 year ago +1

      Here is a good summary of recommender system evaluation: www.math.uci.edu/icamp/courses/math77b/lecture_12w/pdfs/Chapter%2007%20-%20Evaluating%20recommender%20systems.pdf I may create a tutorial on that in the future.

    • @lucianozaffaina9853 1 year ago

      @grabngoinfo thanks

  • @lachimolala7839 1 year ago

    If this is user-based collaborative filtering, aren't we supposed to find the similarity between the users?
    So why did you find the similarity between the items?

  • @himanshusharma2861 2 years ago

    How do I modify the code if I want to run it for all users instead of picking one user as you did?

    • @grabngoinfo  2 years ago +1

      You can write a function using user_id as the input, and loop through all the users using the function.

    • @himanshusharma2861 2 years ago

      @grabngoinfo Hi! I was able to modify the entire code with loop except the last part:
      picked_userid = [407303734, 529893144]
      def gen_item_score(data):
          df_item_scores = pd.DataFrame()
          for customerid in picked_userid:
              # A dictionary to store item scores
              item_score = {}
              # Loop through items
              for i in data[data.index == picked_userid].columns:
                  # Get the ratings for movie i
                  category_rating = data[data.index == picked_userid][i].drop_duplicates()
                  # Create a variable to store the score
                  total = 0
                  # Create a variable to store the number of scores
                  count = 0
                  # Loop through similar users
                  for u in df_similar_users[df_similar_users.index== picked_userid].index:
                      # If the movie has rating
                      if pd.isna(category_rating[u]) == False:
                          # Score is the sum of user similarity score multiply by the movie rating
                          score = df_similar_users[df_similar_users.index== picked_userid][u] * category_rating[u]
                          # Add the score to the total score for the movie so far
                          total += score
                          # Add 1 to the count
                          count +=1
                  # Get the average score for the item
                  item_score[i] = total / count
              # Convert dictionary to pandas dataframe
              item_score = pd.DataFrame(item_score.items(), columns=['category_id', 'category_score'])
              df_item_scores = df_item_scores.append(item_score)
          return df_item_scores
      df_item_scores = gen_item_score(df_similar_user_category_ordered)
      Can you please tell where I am going wrong?
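
A few things in the quoted code stand out: it filters on the whole `picked_userid` list instead of the loop variable `customerid`, `total / count` fails when no similar user rated an item, and `DataFrame.append` was removed in pandas 2.0. The sketch below shows one possible way to score items for several users. It is not the tutorial's code: the function names, the `n_similar` and `threshold` parameters, and the toy matrix are assumptions, and it expects a mean-centered user-item matrix with users as rows.

```python
import pandas as pd

def score_items_for_user(user_id, matrix_norm, user_similarity,
                         n_similar=10, threshold=0.3):
    """Return {item: score} for items that user_id has not rated yet."""
    # Similarity of every other user to user_id, strongest first.
    sims = user_similarity[user_id].drop(index=user_id).dropna()
    sims = sims[sims > threshold].sort_values(ascending=False).head(n_similar)
    # Items the target user has not rated.
    unseen = matrix_norm.columns[matrix_norm.loc[user_id].isna()]
    neighbor_ratings = matrix_norm.loc[sims.index, unseen]
    item_score = {}
    for item in unseen:
        ratings = neighbor_ratings[item].dropna()
        if ratings.empty:
            continue  # no similar user rated it; avoids dividing by zero
        # Average of similarity * rating over the similar users who rated it.
        item_score[item] = (sims[ratings.index] * ratings).sum() / len(ratings)
    return item_score

def score_items_for_users(user_ids, matrix_norm, user_similarity, **kwargs):
    """Loop over users and collect all scores in one long dataframe."""
    rows = []
    for user_id in user_ids:
        scores = score_items_for_user(user_id, matrix_norm, user_similarity, **kwargs)
        rows.extend({"user_id": user_id, "item": item, "item_score": score}
                    for item, score in scores.items())
    return pd.DataFrame(rows, columns=["user_id", "item", "item_score"])

# Toy mean-centered matrix: rows are users, columns are movies.
nan = float("nan")
matrix_norm = pd.DataFrame(
    {"Movie A": [1.0, 0.5, -1.0], "Movie B": [-1.0, -0.5, 1.0], "Movie C": [nan, 1.0, -0.5]},
    index=pd.Index([1, 2, 3], name="userId"),
)
# Pearson correlation between users (rows), computed on co-rated movies only.
user_similarity = matrix_norm.T.corr()

result = score_items_for_users([1, 2], matrix_norm, user_similarity)
print(result)
```

Collecting plain rows and building the dataframe once at the end avoids repeated appends, which also tends to be faster for many users.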

  • @iasupsc100 1 year ago

    Can you make a collaborative recommendation system project for me?

  • @haydenpour9133 2 years ago

    How would you evaluate your work?

    • @grabngoinfo  2 years ago

      Great question! In practice, A/B testing can be set up to compare the results of the recommendation system against the existing algorithm.

  • @kushagrayadav1659 1 year ago

    I am not able to access the whole code bro.

    • @grabngoinfo  1 year ago

      Here is the link to the notebook: mailchi.mp/04dedbc4e95e/ej3em2e33u

  • @RaviKumar-nz7rv 2 years ago

    Please try to add the notebook link directly.

    • @grabngoinfo  2 years ago +1

      Hi Ravi, I am working on making the notebooks available in a central place, so stay tuned.

    • @grabngoinfo  2 years ago +1

      Hi Ravi, here is the notebook with code: mailchi.mp/04dedbc4e95e/ej3em2e33u

  • @kachhadiyahemal3807 2 years ago

    Please share the code.

    • @grabngoinfo  2 years ago

      Hi Kachhadiya, here is the notebook with code: mailchi.mp/04dedbc4e95e/ej3em2e33u

  • @hariharibolll3459 1 year ago +1

    jai shree ram