The Science Behind InterpretML: Explainable Boosting Machine

  • Published: 8 Nov 2024

Comments • 24

  • @MensRea78 · 3 years ago · +7

    This looks like an amazing library. Where it looks thin is the documentation. I hope the documentation is updated to the same standard as LightGBM's. This has the potential to be a game changer.

    • @richcaruana2067 · 3 years ago · +5

      The documentation was recently updated and is now much improved. We still need to do more, but it's getting there.

    • @juliocardenas4485 · 1 year ago

      @richcaruana2067 thank you 🙏🏾.

  • @allanhansen6923 · 3 years ago · +7

    XGBoost & friends are good at capturing interactions between features, but can EBMs do the same? Being restricted to just one feature per tree would intuitively seem to limit the model's ability to discover interaction effects (which of course is the trade-off for interpretability).

    • @richcaruana2067 · 3 years ago · +6

      Yes, by default EBMs also model pairwise interactions. EBMs do this by fitting the main effects first using trees that can only use one feature at a time. After fitting the mains, EBMs then use a procedure to find which pairwise interactions look like they would help improve the model most. Then EBMs fit these important pairwise interactions using boosted trees that can only use two features in each tree, and thus can model pairwise interactions. Pairwise interactions are important in some problems, and less important in others. By being able to model both main effects and pairwise interactions, EBMs are able to be as accurate as XGBoost on most tabular datasets.

    • @allanhansen6923 · 3 years ago

      @richcaruana2067 Cool, can you link to something explaining exactly how these pairs are chosen?
      I (for some unknown reason) thought it would multiply two features together to construct a new feature and then fit a tree to that new feature, rather than having the tree split on the "original" features?

    • @richcaruana2067 · 3 years ago

      @allanhansen6923 Multiplying two features together to create pairwise interaction terms is a common practice in statistics if you then want to fit the interaction using things like linear or logistic regression. But if you're using modern machine learning methods, there are advantages to directly modeling the interaction instead of multiplying features. Our code uses a heuristic procedure to quickly determine which interaction terms are most important. It's designed to be fast (because there are on the order of n^2 pairwise interactions from which to select), and pretty good. In practice it works well, but there are other methods in the literature for selecting pairwise interactions that are based on analyzing what is learned by random forests or by neural nets. If you prefer an alternate method for selecting what pairs to include in the model, our code allows you to bypass automatic pair selection and instead specify which pairs you want in the model. This is also useful if you've been working with the same dataset for a long time and want to use the same pairs you've used in previous models.

    • @allanhansen6923 · 3 years ago · +3

      @richcaruana2067 Cool! Would you be able to provide a GitHub link to the code handling the pairwise interactions? Or perhaps a link to an open-access article or preprint server describing the technique?

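To make the replies above concrete: in interpret's API, the interactions parameter of ExplainableBoostingClassifier covers both cases Rich describes, either an integer (let the heuristic pick that many pairs after the mains are fit) or an explicit list of feature-index pairs to bypass automatic selection. A minimal sketch, assuming interpret and scikit-learn are installed; the dataset and parameter values below are illustrative, not from the video:

    # A minimal sketch; the dataset and values are illustrative assumptions.
    from interpret.glassbox import ExplainableBoostingClassifier
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)

    # Automatic selection: fit the main effects first, then let the
    # heuristic pick the 10 most promising pairwise interactions.
    ebm_auto = ExplainableBoostingClassifier(interactions=10).fit(X, y)

    # Bypass automatic selection: specify the pairs yourself as tuples
    # of column indices (e.g., pairs you trust from earlier models).
    ebm_manual = ExplainableBoostingClassifier(interactions=[(0, 1), (2, 7)]).fit(X, y)
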
  • @carlofoschi-p7z · 1 year ago

    If, among many features, two collinear features are ordered so that one comes right before the other, will EBMs assign more predictive power to whichever of the two comes first?

  • @yanzhao1281 · 3 years ago · +5

    Can anyone explain why the order of the features does not matter when the learning rate is small?

    • @richcaruana2067 · 3 years ago · +7

      If the learning rate is very small, then so little is learned each time you add a tree for a feature that most of the variance is still unexplained, and as you move to subsequent features they also get an opportunity to learn something useful for reducing variance. The lower the learning rate, the less important the order in which you visit the features. If the learning rate is very low, it's almost as if you're doing parallel updates for all the features simultaneously.

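A toy sketch of the intuition in the reply above. This is hand-rolled round-robin boosting, not interpret's actual implementation: one depth-limited tree per feature per pass, each tree fit to the current residual. With a small learning rate, different visiting orders give nearly identical fits:

    # Toy cyclic boosting: each tree sees exactly ONE feature, and features
    # are visited in a fixed order. Not the interpret implementation.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

    def cyclic_boost(X, y, order, learning_rate=0.01, n_passes=200):
        pred = np.zeros(len(y))
        for _ in range(n_passes):
            for j in order:                     # visit features in the given order
                residual = y - pred             # what is still unexplained
                tree = DecisionTreeRegressor(max_depth=2).fit(X[:, [j]], residual)
                pred += learning_rate * tree.predict(X[:, [j]])
        return pred

    # With learning_rate=0.01 the two orders give almost the same MSE.
    for order in ([0, 1, 2], [2, 1, 0]):
        pred = cyclic_boost(X, y, order)
        print(order, round(float(np.mean((y - pred) ** 2)), 4))
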
  • @aaron10k · 1 year ago

    I'm having trouble understanding the explanation at 9:52 of the model determining that cancer/asthma is good for COVID. I don't understand the explanation of the sample bias.

  • @zeno.x-edit02 · 1 year ago

    Please tell me: can we use EBMs for multi-class classification?

  • @beautyisinmind2163 · 1 year ago

    What is the difference between a white-box and a glass-box model? Are they the same?

  • @amerryatheist · 1 year ago

    Are these models robust to correlated features?

  • @aaron10k · 1 year ago

    This is sooo sick

  • @bktsys · 3 years ago · +1

    Is CatBoost available for EBM now?

    • @richcaruana2067 · 3 years ago · +1

      I like CatBoost, but it's not available right now as a method for training EBMs. Because EBMs are restricted by the need to be intelligible (i.e., they are restricted to being GAMs of main effects and pairwise interactions), we don't think CatBoost would make a significant change in the expected accuracy of the final EBMs. But it would be fun to try anyway, and behind the scenes we have explored a variety of algorithms for training EBMs, including things like BART (Bayesian Additive Regression Trees) and LightGBM.

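For reference, the "restricted to being GAMs of main effects and pairwise interactions" structure mentioned above can be written (with a link function g, as in the EBM papers) as:

    g(E[y]) = \beta_0 + \sum_i f_i(x_i) + \sum_{(i,j)} f_{ij}(x_i, x_j)

where each f_i is a learned shape function of a single feature and each f_ij covers one selected pair; the choice of boosting algorithm only changes how the f's are fit, not this additive form.
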
  • @piotr780 · 1 year ago

    But we can create the same kind of plots from gradient boosting machines, so what's the point? Having separate trees does not provide any insight by itself, nor does their being additive. The only interesting thing I see is that the predictions go flat where the data becomes sparse.

  • @이준현-m3m · 3 years ago

    Can anyone explain how to get each feature's importance in an EBM?

    • @richcaruana2067 · 3 years ago · +1

      Try something like this (where ebm is a fitted EBM):

          # loop over feature names paired with their global importance scores
          for f_name, importance in zip(ebm.feature_names, ebm.feature_importances_):
              print(f"{f_name}: {importance}")

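One follow-up note on the snippet above: recent interpret releases expose per-term names and importances instead, where "terms" include both mains and pairs. Assuming interpret 0.3+ and its term_names_/term_importances() API, the equivalent loop is roughly:

    # Assumes interpret 0.3+; "terms" cover both single features and pairs.
    for name, importance in zip(ebm.term_names_, ebm.term_importances()):
        print(f"{name}: {importance:.4f}")
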
  • @mbappekawani9716 · 4 years ago · +2

    Can you share a source for the estimate that 2,500 lives a year would be saved (5:26) if the BUN risk curve were flattened?

    • @richcaruana2067 · 3 years ago

      We are working on that paper now, but we don't currently have any other publications that explain this in detail. BTW, the dataset used to discover this dates back to 1989 and thus is quite old; since then, doctors have changed how they treat patients with high BUN and have begun to treat patients with BUN lower than 100 more aggressively, in part because the treatment itself has changed and become safer. Hope this helps. Sorry we don't have the BUN work written up yet!