Ajinkya More | Resampling techniques and other strategies

  • Published: 23 Oct 2024

Comments • 15

  • @OmyTrenav • 7 years ago +1

    Great talk. Thanks!

  • @Johnnyboycurtis • 8 years ago

    Great presentation!

  • @WalterReade • 8 years ago

    Thanks for the video; it was excellent and I learned a great deal. I'd suggest, though, that you split out the test data _before_ you apply the under/over sampling algorithm (to the train data only). That would give a much better comparison of the algorithms, showing how they perform on the unmodified test data.

    • @ajinkyamore7090 • 8 years ago +1

      Thanks. The train/test split is the first step (see cell number 2 in the notebook) and none of the under/over sampling methods are applied to the test set. The performance comparison is indeed on the unmodified test data.
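The workflow described in this exchange (split first, then resample only the training set, and evaluate on the untouched test set) can be sketched as follows. This is a minimal pure-Python illustration of random oversampling, not the notebook's actual code, which uses library implementations; all function names here are made up for the example:

```python
import random

def train_test_split(data, test_frac=0.25, seed=0):
    """Shuffle and split (x, y) pairs BEFORE any resampling."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    cut = int(len(data) * (1 - test_frac))
    return data[:cut], data[cut:]

def random_oversample(train, seed=0):
    """Duplicate minority-class examples until the classes are balanced."""
    rng = random.Random(seed)
    pos = [d for d in train if d[1] == 1]
    neg = [d for d in train if d[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train + extra

# 90/10 imbalanced toy data: label 1 is the rare class.
data = [(float(i), 0) for i in range(90)] + [(float(i), 1) for i in range(10)]
train, test = train_test_split(data)
balanced = random_oversample(train)

# Only the train set is modified; the test set stays as drawn.
print(len(train), len(test), len(balanced))
```

The key point from the reply above is that `random_oversample` never sees `test`, so the performance comparison reflects the unmodified data distribution.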

    • @WalterReade • 8 years ago

      I just noticed that as I was going through your notebook on GitHub (thanks for uploading!) and was going to edit my comment. Yes, that makes perfect sense. What initially confused me was that the graphs show the decision boundary on the train data (and I was thinking it was the test data).

    • @WalterReade • 8 years ago

      I do like the graphs showing the decision boundary on the train data, since they show how the under/over sampling algorithms modify the data. I forked the notebook and am going to add plots of the decision boundary on the test data as well.

    • @ajinkyamore7090 • 8 years ago +1

      Yes, the idea was to show how the changes in the data distribution affect the decision boundary.

  • @rebiiahmed7836 • 8 years ago +1

    Thank you for your presentation! Could you please upload the code as a notebook file?

    • @ajinkyamore7090 • 8 years ago +2

      Thanks! Here is a link to the notebook github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb and the slides www.slideshare.net/AjinkyaMore3/python-resampling

    • @WalterReade • 8 years ago

      I found it with a bit of digging:
      github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb

    • @WalterReade • 8 years ago

      LOL . . . I should have refreshed the comments before posting my comment. :-)

    • @rebiiahmed7836 • 8 years ago

      Many thanks to you, Mr. Ajinkya More.

  • @EarlWallaceNYC • 8 years ago

    Great video, thanks.
    Where can I get the slides you used?
    (I found your paper on arXiv, but it doesn't have the code)

    • @ajinkyamore7090 • 8 years ago +1

      Thanks! Here is a link to the notebook github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb and the slides www.slideshare.net/AjinkyaMore3/python-resampling

  • @berry4862 • 8 years ago

    Optimizing an arbitrary metric is rather useless for business. In particular, what is the business meaning of optimizing for precision on the normal cases? Something like alarms per month may well be meaningful, but that would be Recall(pos)/Prec(pos).
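For reference, the per-class quantities this comment refers to can be computed directly from raw labels. A minimal sketch (the function name and toy labels are made up for illustration; in the "alarms per month" framing, positive-class recall tracks incidents caught and positive-class precision tracks the alert burden):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for one class, from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of the alarms raised, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the real incidents, how many were caught
    return precision, recall

# Toy example: 3 positives, 8 samples total.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
p, r = precision_recall(y_true, y_pred, positive=1)
```

Swapping `positive=0` gives the precision/recall of the normal class that the comment questions the business value of.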