BM25 : The Most Important Text Metric in Data Science

  • Published: 30 Sep 2024

Comments • 20

  • @tantzer6113
    @tantzer6113 2 hours ago +1

    This particular IDF has nothing to do with occupation, apartheid, or genocide.

  • @wildlifeonscreen
    @wildlifeonscreen 1 year ago +7

    Thanks for your video! Doesn't TF already take into account the length of the document? It's a proportion of the number of times a word appears out of the total number of words in the document. So, issue #1 wouldn't be a problem?

    • @ugestacoolie5998
      @ugestacoolie5998 8 months ago +1

      Yeah, I was thinking that too. Long documents shouldn't really matter either, because the ratios are equivalent, like 1/10 = 100/1000, so I don't really get it either.

    • @EOI-KSA
      @EOI-KSA 5 months ago

      Rightly pointed out! But yes, the other parts of the problem make sense.

    • @roct07
      @roct07 2 months ago

      Raw frequency is used as is; what you're describing is frequency normalised by the length of the document. While that is fine, it scales linearly, whereas BM25 saturates: each additional occurrence adds less to the score instead of a constant amount, which produces better results in real use cases.
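
The contrast drawn in this thread can be checked numerically. The following is a minimal sketch, assuming the commonly cited Okapi BM25 term-frequency component with the conventional defaults k1 = 1.2 and b = 0.75; the exact formula and parameter values used in the video are not quoted in these comments, and the function names are hypothetical.

```python
def normalized_tf(count: int, doc_len: int) -> float:
    """Length-normalised term frequency: grows linearly with the count."""
    return count / doc_len


def bm25_tf(count: int, doc_len: int, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency part of BM25 (assumed Okapi form, IDF omitted).

    The value rises with `count` but can never exceed k1 + 1.
    """
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return count * (k1 + 1.0) / (count + k1 * length_norm)


if __name__ == "__main__":
    doc_len = avg_doc_len = 100  # document of exactly average length
    for count in (1, 2, 4, 8, 16):
        print(count,
              round(normalized_tf(count, doc_len), 3),
              round(bm25_tf(count, doc_len, avg_doc_len), 3))
```

Doubling the count doubles the normalised TF every time, while the BM25 column flattens out as it approaches k1 + 1.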

  • @rabiumuhammedeffect423
    @rabiumuhammedeffect423 6 months ago +2

    This is way way more informative than my lecturer's lecture

  • @gopesh97
    @gopesh97 5 months ago +1

    What a superb explanation! I really appreciate how you broke down the problem and its solution into manageable pieces. Your clarity and approach are consistently impressive! 🍻

  • @MilesLabrador
    @MilesLabrador 1 year ago +2

    This was wonderfully explained, thank you for the great video!

  • @vishnum9613
    @vishnum9613 4 months ago

    Isn't the term frequency already dependent on the total number of words in the document?

  • @rsilveira79
    @rsilveira79 8 months ago +2

    Great explanation, thanks!

  • @haloandavatar1177
    @haloandavatar1177 8 months ago +1

    Remarkably well explained, and with such concise elegance too. Extra +100 pts for explaining in layman's terms what a partial derivative is.

  • @shohrehhaddadan8922
    @shohrehhaddadan8922 4 months ago +1

    Not only does this video explain the specific metric, but it also teaches how to analyze metrics! Amazing!

    • @ritvikmath
      @ritvikmath 4 months ago +1

      Glad it was helpful!

  • @ArunKumar-bp5lo
    @ArunKumar-bp5lo 2 months ago

    What a great explanation!

  • @hungchen6604
    @hungchen6604 1 year ago +1

    Analyzing an equation using derivatives is brilliant. Thanks for yet another outstanding video, as always.

  • @naughtrussel5787
    @naughtrussel5787 1 year ago +1

    Awesome video, very clear and helpful!

  • @Hemewl
    @Hemewl 1 year ago +1

    So why not just apply length normalization to the document so that tf of cat in A = 1/10 and tf of cat in B = 10/1000?

    • @siddhantrai7529
      @siddhantrai7529 1 year ago

      I suppose we are doing that here as well, but instead of treating each document as IID and normalising it on its own, we are also taking into account the relative difference in size. It's like a weighted normalisation.
      Plain normalisation would be like comparing a bunch of sigmoids with cross entropy.
      More specifically, we are trying to take mutual information among the docs into account in the calculations.
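
One way to read the "weighted normalisation" idea in this reply is through BM25's b parameter, which blends between ignoring document length (b = 0) and scaling fully by length relative to the corpus average (b = 1). The sketch below applies this to the cat example from the question (1 occurrence in a 10-word document A, 10 occurrences in a 1000-word document B). It assumes the common Okapi BM25 form with k1 = 1.2 and treats A and B as the whole corpus; the function and variable names are hypothetical, and the video's own numbers are not given in these comments.

```python
def bm25_tf(count: int, doc_len: int, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency part of BM25 (assumed Okapi form, IDF omitted)."""
    # b = 0 ignores length; b = 1 scales fully by doc_len / avg_doc_len.
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return count * (k1 + 1.0) / (count + k1 * length_norm)


# Assumed two-document corpus taken from the question above.
docs = {"A": (1, 10), "B": (10, 1000)}  # name -> (count of "cat", length)
avg_len = sum(length for _, length in docs.values()) / len(docs)


def scores(b: float) -> dict:
    """BM25 term-frequency score of each document for a given b."""
    return {name: bm25_tf(count, length, avg_len, b=b)
            for name, (count, length) in docs.items()}


if __name__ == "__main__":
    for b in (0.0, 0.75, 1.0):
        print(b, {name: round(s, 3) for name, s in scores(b).items()})
```

Unlike the plain ratio count / length, the length penalty here depends on how each document compares with the average length, so the ranking of A and B can change as b moves from 0 to 1.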

  • @daspradeep
    @daspradeep 1 year ago

    randomly landed on your channel, your explanations are fundamentally so awesome 🙏🏽

  • @Thebrotrain
    @Thebrotrain 11 months ago

    Why would you use additional shorthand when trying to teach something? I get that the paper is small, but adding one more thing for the learner to keep track of is a bad teaching strategy.