Term Frequency Inverse Document Frequency (TF-IDF) Explained

  • Published: Sep 9, 2024

Comments • 25

  • @datamlistic
    @datamlistic  2 years ago +3

    There are several ways of computing TF-IDF (e.g. tf = log(1 + word_freq)). Have you experimented with other variations?
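A minimal sketch of the variations mentioned above, for anyone who wants to compare them side by side. The function name `tf_variants` and the sample sentence are made up for illustration; only the log-scaled form `log(1 + word_freq)` comes from the comment itself, and the natural log is an assumption.

```python
import math
from collections import Counter


def tf_variants(tokens, term):
    """Return three common term-frequency weightings for `term` in `tokens`."""
    counts = Counter(tokens)
    raw = counts[term]          # how many times the term occurs
    total = len(tokens)         # document length in tokens
    return {
        "raw_count": raw,
        "relative": raw / total if total else 0.0,  # count / document length
        "log_scaled": math.log(1 + raw),            # tf = log(1 + word_freq)
    }


# Hypothetical toy document, not the one used in the video.
doc = "the cat sat on the mat".split()
variants = tf_variants(doc, "the")
print(variants)
```

The log-scaled variant grows slowly, so a word repeated 100 times does not count 100 times as much as a word that appears once.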

  • @chiedozieonyearugbulem9363
    @chiedozieonyearugbulem9363 3 months ago

    Amazing. This video explained exactly what I wanted from it. Thanks a bunch!

  • @noblemaster0173
    @noblemaster0173 1 month ago

    Best explanation ever, well detailed

    • @datamlistic
      @datamlistic  20 days ago

      Thanks! Glad you think so! :)

  • @TJ-hs1qm
    @TJ-hs1qm 3 months ago

    4:54 the motivation behind the log is more likely this: assume there is only 1 document, cat.txt (N=1), with the word "cat" in it. Without the log, idf("cat", cat.txt) would be 1 / 1 = 1. However, we want the idf to equal 0 for super common terms (terms with no specific information about the document => noise). The log gives us idf("cat", cat.txt) = log(1) = 0. So whenever the number of documents matches the number of documents containing the term, the idf = 0... exactly like at 5:38
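The point above can be checked with a few lines of code. This is only a sketch: it assumes the plain form idf = log(N / df) with the natural log and no smoothing, which may differ from the exact formula in the video, and the names `idf_ratio` and `idf_log` are hypothetical.

```python
import math


def idf_ratio(n_docs, doc_freq):
    """Inverse document frequency without the log: N / df."""
    return n_docs / doc_freq


def idf_log(n_docs, doc_freq):
    """Inverse document frequency with the log: log(N / df)."""
    return math.log(n_docs / doc_freq)


# One document that contains "cat": N = 1, df = 1.
print(idf_ratio(1, 1))  # the bare ratio is 1.0, so the term still gets weight
print(idf_log(1, 1))    # log(1) is 0.0, so the term is treated as noise

# The same holds for any corpus where the term appears in every document.
print(idf_log(4, 4))    # 0.0
print(idf_log(4, 1))    # positive: the term appears in only one of four documents
```

Because TF-IDF multiplies tf by idf, an idf of 0 zeroes out the score of a term that occurs in every document, no matter how often it appears.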

    • @datamlistic
      @datamlistic  3 months ago

      Thanks for the clarification! :)

  • @urjitpatil9712
    @urjitpatil9712 1 year ago +1

    This was very helpful. Thank you!

  • @annette4718
    @annette4718 4 months ago

    Very well explained, thank you!

    • @datamlistic
      @datamlistic  4 months ago

      Thanks! Glad it was helpful! :)

  • @OmarAmil-n7u
    @OmarAmil-n7u 10 days ago

    thank you for this video

    • @datamlistic
      @datamlistic  5 days ago

      You're welcome! Glad you liked it! :)

  • @ABear999
    @ABear999 4 months ago

    Well explained, thank you!

    • @datamlistic
      @datamlistic  4 months ago

      Thanks! Glad it was helpful! :)

  • @abhiramivenugopal9868
    @abhiramivenugopal9868 7 months ago

    excellent explanation

    • @datamlistic
      @datamlistic  7 months ago

      Thanks! Glad you liked it! :)

  • @serafeiml1041
    @serafeiml1041 4 months ago

    nice explanation thanks

    • @datamlistic
      @datamlistic  4 months ago

      Glad you found it helpful! :)

  • @IO279K4RMN
    @IO279K4RMN 6 months ago

    thank you so much!!!

    • @datamlistic
      @datamlistic  6 months ago

      You're welcome! I'm happy you enjoyed the explanation! :)

  • @a7d2e5aff
    @a7d2e5aff 7 months ago

    Thank you!

  • @TJ-hs1qm
    @TJ-hs1qm 3 months ago

    3:16 you didn't count the dot at the end of each line, so technically 2/7 etc. 😜
    8:33 TF-IDF(doc2, "and") should be 0.022 not 0.22