Decision Tree Regression in Python (from scratch!)

  • Published: 3 Dec 2024

Comments • 29

  • @piyushnashani7162
    @piyushnashani7162 3 years ago +6

    You are doing great work; the videos are really helpful. Keep up the good work!!

  • @dominiquedewet3311
    @dominiquedewet3311 1 year ago +2

    Just beautiful! Thank you so much for all your effort, it is greatly appreciated!

  • @averyiorio4337
    @averyiorio4337 3 years ago +1

    You make videos so quickly; keep up the good work!!!

  • @rishabhjain1418
    @rishabhjain1418 6 months ago

    Thank you so much for the awesome explanation!

  • @mohandev7385
    @mohandev7385 3 years ago +3

    Please make a video on decision trees using grid search CV, explaining all of their hyperparameters

  • @torvess
    @torvess 4 months ago

    I am watching from Brazil, great video

  • @nipunkulshrestha4329
    @nipunkulshrestha4329 6 months ago

    Please make a complete series on how you create such astonishing animations to explain the concepts. I think you use the manim library for Python. It would be of great help.

  • @Tusharlone-hc4zo
    @Tusharlone-hc4zo 2 years ago +1

    Thank you.

  • @spartias6154
    @spartias6154 18 days ago

    I have a question about your excellent video: what is the goal of the stopping criterion min_samples_split? I understood the rest of the video, which is very clear, but I missed this point. Thank you from France

  • @keshia822
    @keshia822 3 years ago +3

    This is a great video! I have watched all your decision tree videos, but I am still wondering how to build a decision tree for regression or classification when the data contains categorical features.

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      In the case of categorical variables, the nodes will ask a question like "if x1 == category 1". Instead of less-than-or-equal, we'll check equality.

    • @kayodeoyeniran2862
      @kayodeoyeniran2862 1 year ago

      Do we need to explicitly state the equality condition for categorical features in the code?

    • @BigNickPoodle
      @BigNickPoodle 1 year ago

      @@kayodeoyeniran2862 For numerical features you have to check all the unique values of that feature (1,2,3,4,5...) as potential thresholds. For categorical features you do the same but for each category (color=red,color=blue,color=yellow, ...)
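
The equality-based test for categorical features described in this thread can be sketched as follows. This is a minimal illustration with made-up data; split_rows is a hypothetical helper, not a function from the video.

```python
import numpy as np

def split_rows(feature_col, value, is_categorical):
    """Return boolean masks for the left/right child of a split.

    Numerical features use a <= threshold test; categorical
    features use an equality test instead.
    """
    if is_categorical:
        left = feature_col == value   # "if x1 == category"
    else:
        left = feature_col <= value   # "if x1 <= threshold"
    return left, ~left

colors = np.array(["red", "blue", "red", "yellow"])
left, right = split_rows(colors, "red", is_categorical=True)
print(colors[left])   # rows where color == "red"
print(colors[right])  # all other rows
```

As the reply notes, the candidate "thresholds" for a categorical feature are simply its unique categories, so the same search loop applies with the test swapped.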

  • @g_arm_
    @g_arm_ 2 years ago +1

    If max_depth is too high, I get a KeyError during the best-split routine. Is there a way to fix that?

  • @TTS888
    @TTS888 1 year ago

    Besides the airfoil CSV, can I apply other datasets to this model?

  • @juanrenatojimenezpasache5910
    @juanrenatojimenezpasache5910 2 years ago

    Why do you use variance reduction? I see other sources use the sum of squared errors for splitting.

  • @maryamazeez7830
    @maryamazeez7830 2 years ago

    Hello, I have a problem: when I ran the code, Jupyter said "invalid syntax" for the first part, class Node. Any help, please?

  • @_jakiya_hereweare5270
    @_jakiya_hereweare5270 2 years ago

    Hello, very nice video!
    Is there a way to check whether this model's R² value is above 0.7 or so for a decision tree regression model? If yes, how can we extract that R² for your given example?

    • @aakashkarmakar7478
      @aakashkarmakar7478 1 year ago +1

      You can calculate the R² value for any regression model as one minus the residual sum of squares divided by the total sum of squares:
      sum of squares total (SST) = sum of ( y(actual) - y(mean) )^2
      sum of squares residual (SSR) = sum of ( y(actual) - y(pred) )^2
      r2 = 1 - (SSR/SST)
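
These formulas translate directly into NumPy. A minimal sketch with made-up numbers (not the airfoil data):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
ssr = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
r2 = 1 - ssr / sst
print(r2)  # → 0.925
```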

  • @TheTOMAVITAN
    @TheTOMAVITAN 2 years ago

    Can you help me convert the variance_reduction into a calculation of SSR?
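
There is a direct link between the two criteria: with the population variance, var(y) = SSE/n, so a split's variance reduction times the parent node size equals the drop in the children's total SSE, and both criteria rank candidate splits identically. A small sketch with made-up numbers:

```python
import numpy as np

def sse(y):
    # Sum of squared errors around the node mean
    return np.sum((y - y.mean()) ** 2)

def variance_reduction(y, left, right):
    # Weighted-variance criterion, as used in the video
    w_l, w_r = len(left) / len(y), len(right) / len(y)
    return np.var(y) - (w_l * np.var(left) + w_r * np.var(right))

y = np.array([1.0, 2.0, 8.0, 9.0])
left, right = y[:2], y[2:]

vr = variance_reduction(y, left, right)
sse_drop = sse(y) - (sse(left) + sse(right))
print(np.isclose(vr * len(y), sse_drop))  # → True
```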

  • @amirtaghavy7647
    @amirtaghavy7647 3 years ago

    Thanks for the great videos. A question though: to quantify the accuracy of your predictions you use RMSE, which is not a dimensionless measure of error. I am just wondering about the value of RMSE normalized by mean(Yi). Thanks again.
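
The normalized RMSE asked about here (sometimes called CV(RMSE)) is just RMSE divided by the mean target value, which makes it unit-free. A minimal sketch with made-up numbers, not the airfoil data:

```python
import numpy as np

y_true = np.array([120.0, 125.0, 130.0, 135.0])
y_pred = np.array([118.0, 127.0, 129.0, 138.0])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
nrmse = rmse / np.mean(y_true)  # relative error: fraction of the mean response
print(round(nrmse, 4))  # → 0.0166
```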

  • @JimWright1950
    @JimWright1950 2 years ago +1

    Noise is indicative of vibration.

  • @SAGAR-vv3pc
    @SAGAR-vv3pc 1 month ago

    This code actually gives wrong results on the bike-sharing dataset when some of the columns are taken out, because it sometimes tries to access var_red before placing it in the dictionary. I'll paste the bug-fixed code below; check it against the dataset. But thank you, you gave great code.
    class DecisionTreeRegressor():
        def __init__(self, min_samples_split=2, max_depth=2):
            ''' constructor '''
            self.root = None
            self.min_samples_split = min_samples_split
            self.max_depth = max_depth

        def build_tree(self, dataset, curr_depth=0):
            ''' recursive function to build the tree '''
            X, Y = dataset[:, :-1], dataset[:, -1]
            num_samples, num_features = np.shape(X)
            # Split until the stopping conditions are met
            if num_samples >= self.min_samples_split and curr_depth <= self.max_depth:
                best_split = self.get_best_split(dataset, num_samples, num_features)
                # Guard against a missing split: get_best_split may return None,
                # which caused the KeyError on "var_red" in the original code
                if best_split is not None and best_split["var_red"] > 0:
                    left_subtree = self.build_tree(best_split["dataset_left"], curr_depth + 1)
                    right_subtree = self.build_tree(best_split["dataset_right"], curr_depth + 1)
                    return Node(best_split["feature_index"], best_split["threshold"],
                                left_subtree, right_subtree, best_split["var_red"])
            # Compute leaf node
            leaf_value = self.calculate_leaf_value(Y)
            return Node(value=leaf_value)

        def get_best_split(self, dataset, num_samples, num_features):
            ''' function to find the best split '''
            best_split = None
            max_var_red = -float("inf")
            for feature_index in range(num_features):
                feature_values = dataset[:, feature_index]
                possible_thresholds = np.unique(feature_values)
                for threshold in possible_thresholds:
                    dataset_left, dataset_right = self.split(dataset, feature_index, threshold)
                    if len(dataset_left) > 0 and len(dataset_right) > 0:
                        y, left_y, right_y = dataset[:, -1], dataset_left[:, -1], dataset_right[:, -1]
                        curr_var_red = self.variance_reduction(y, left_y, right_y)
                        # Update the best split if this one reduces variance more
                        if curr_var_red > max_var_red:
                            best_split = {
                                "feature_index": feature_index,
                                "threshold": threshold,
                                "dataset_left": dataset_left,
                                "dataset_right": dataset_right,
                                "var_red": curr_var_red
                            }
                            max_var_red = curr_var_red
            return best_split

  • @kritiakash
    @kritiakash 2 years ago

    getting error
    1 frames

  • @PsynideNeel
    @PsynideNeel 3 years ago +1

    Where facecam?