You are doing great work, the videos are really helpful, keep up the good work!!
Thanks a lot!
Just beautiful! Thank you so much for all your effort, it is greatly appreciated!
you make videos so quickly, keep up the good work!!!
Thanks, will do!
Thank you so much for the awesome explanation!
Please make a video on decision trees using grid search CV, explaining all of its hyperparameters.
I am watching from Brazil, great video!
Please make a complete series on how you create such astonishing animations to explain the concepts... I think you use the manim library in Python... It would be of great help.
Thank you.
Please, I have a question about your excellent video: what is the goal of the stopping criterion min_samples_split? I understood the whole video, which is very clear, but I missed this point. Thank you from France.
This is a great video! I have watched all your decision tree videos, but I'm still wondering how to build a decision tree for regression or classification when the data contains categorical features.
In case of categorical variables, the nodes will ask a question like "if x1 = category 1". Instead of less-than-or-equal, we'll check equality.
Do we need to explicitly state the equality condition for categorical features in the code?
@@kayodeoyeniran2862 For numerical features you have to check all the unique values of that feature (1, 2, 3, 4, 5, ...) as potential thresholds. For categorical features you do the same, but for each category (color=red, color=blue, color=yellow, ...).
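To make that concrete, here is a minimal sketch of what the equality check could look like, assuming the same NumPy row layout as the video's split() helper (split_categorical is a hypothetical name, not from the video; the numerical version uses <= threshold instead of ==):

    import numpy as np

    def split_categorical(dataset, feature_index, category):
        # Rows whose feature equals the category go left, all others go right
        dataset_left = np.array([row for row in dataset if row[feature_index] == category])
        dataset_right = np.array([row for row in dataset if row[feature_index] != category])
        return dataset_left, dataset_right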
If the max depth is too high, I get a KeyError during the best-split routine. Is there a way to fix that?
Besides the airfoil CSV, can I apply other data to this model?
why do you use variance reduction? I see other sources use sum of squared errors for splitting
Hello, I have a problem, please: when I ran the code, Jupyter said "invalid syntax" for the first piece of code, the class Node definition. Any help, please?
Hello, very nice video!
Is there a way to check whether this model's r2 value is above 0.7 or so for a decision tree regression model? If yes, how can we extract that r2 in the case of your given example?
You can calculate the r2 value for any regression model as one minus the ratio of the residual sum of squares to the total sum of squares:
sum of squares total (SST) = sum of (y_actual - y_mean)^2
sum of squares of residual error (SSR) = sum of (y_actual - y_pred)^2
r2 = 1 - (SSR / SST)
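In code that is just a few NumPy lines; a minimal sketch, assuming y_actual and y_pred are NumPy arrays holding the true targets and the model's predictions:

    import numpy as np

    def r2_score(y_actual, y_pred):
        # Total sum of squares: spread of the targets around their mean
        sst = np.sum((y_actual - np.mean(y_actual)) ** 2)
        # Residual sum of squares: spread of the targets around the predictions
        ssr = np.sum((y_actual - y_pred) ** 2)
        return 1 - ssr / sst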
Can you help me convert the variance_reduction to a calculation of SSR?
Thanks for the great videos. A question though: to quantify the accuracy of your predictions, you use RMSE, which is not a dimensionless measure of error. I am just wondering about the value of RMSE normalized by mean(Yi). Thanks again.
Yeah you can surely do that.
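That normalized RMSE (sometimes called the coefficient of variation of the RMSE) is a one-liner on top of the video's evaluation step; a minimal sketch, assuming NumPy arrays of actual and predicted targets:

    import numpy as np

    def normalized_rmse(y_actual, y_pred):
        # RMSE in the units of Y, then divided by mean(Y) to make it dimensionless
        rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))
        return rmse / np.mean(y_actual)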
Noise is indicative of vibration.
This code actually gives you wrong results on the bike sharing dataset when some of the columns are taken out, because it sometimes tries to access var_red before placing it into the dictionary. I'll paste the bug-fixed code in the next comment; check it against that dataset. But thanks, you gave great code.
import numpy as np

# Note: relies on the Node class from the video (not reproduced in this comment)
class DecisionTreeRegressor():
    def __init__(self, min_samples_split=2, max_depth=2):
        ''' constructor '''
        self.root = None
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth

    def build_tree(self, dataset, curr_depth=0):
        ''' recursive function to build the tree '''
        X, Y = dataset[:, :-1], dataset[:, -1]
        num_samples, num_features = np.shape(X)
        # Split until stopping conditions are met
        if num_samples >= self.min_samples_split and curr_depth <= self.max_depth:
            best_split = self.get_best_split(dataset, num_samples, num_features)
            # Bug fix: only recurse if a valid split was found and it reduces variance
            if best_split is not None and best_split["var_red"] > 0:
                left_subtree = self.build_tree(best_split["dataset_left"], curr_depth + 1)
                right_subtree = self.build_tree(best_split["dataset_right"], curr_depth + 1)
                return Node(best_split["feature_index"], best_split["threshold"],
                            left_subtree, right_subtree, best_split["var_red"])
        # Compute leaf node
        leaf_value = self.calculate_leaf_value(Y)
        return Node(value=leaf_value)

    def get_best_split(self, dataset, num_samples, num_features):
        ''' function to find the best split '''
        best_split = None  # Bug fix: start as None instead of a dict missing "var_red"
        max_var_red = -float("inf")
        for feature_index in range(num_features):
            feature_values = dataset[:, feature_index]
            possible_thresholds = np.unique(feature_values)
            for threshold in possible_thresholds:
                dataset_left, dataset_right = self.split(dataset, feature_index, threshold)
                if len(dataset_left) > 0 and len(dataset_right) > 0:
                    y, left_y, right_y = dataset[:, -1], dataset_left[:, -1], dataset_right[:, -1]
                    curr_var_red = self.variance_reduction(y, left_y, right_y)
                    # Print the current variance reduction being calculated
                    # print(f"Variance reduction for feature {feature_index} at threshold {threshold}: {curr_var_red}")
                    # Update the best split if needed
                    if curr_var_red > max_var_red:
                        best_split = {
                            "feature_index": feature_index,
                            "threshold": threshold,
                            "dataset_left": dataset_left,
                            "dataset_right": dataset_right,
                            "var_red": curr_var_red
                        }
                        max_var_red = curr_var_red
        return best_split
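One caveat: the class above still calls self.split, self.variance_reduction, and self.calculate_leaf_value, plus the Node class, none of which are in this paste. A minimal sketch of those three helpers, inferred from the call sites above rather than copied from the video (they would live inside the class):

    def split(self, dataset, feature_index, threshold):
        # Rows with feature value <= threshold go left, the rest go right
        dataset_left = np.array([row for row in dataset if row[feature_index] <= threshold])
        dataset_right = np.array([row for row in dataset if row[feature_index] > threshold])
        return dataset_left, dataset_right

    def variance_reduction(self, parent, l_child, r_child):
        # Variance of the parent minus the weighted variance of the children
        weight_l = len(l_child) / len(parent)
        weight_r = len(r_child) / len(parent)
        return np.var(parent) - (weight_l * np.var(l_child) + weight_r * np.var(r_child))

    def calculate_leaf_value(self, Y):
        # A regression leaf predicts the mean target of the samples that reach it
        return np.mean(Y)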
Getting an error.
Where facecam?
Will do in future ;)