Decision tree from scratch - theory | ML fundamentals | ML in Julia [Lecture 13]
- Published: Feb 6, 2025
- Lecture article: open.substack....
You might have heard (or seen people write) that XGBoost is an incredibly good ML model that can solve complex industry-level problems.
At its core, XGBoost consists of a bunch of decision trees.
But decision trees are so simple. How, then, are they so impactful? How can they be used for complex problems?
In fact, decision trees matter more than convolutional neural networks for most industry-level problems, which tend to involve tabular data.
If you can pick only three things to learn from ML, learn gradient descent, decision trees, and XGBoost. You will be better than 99% of the folks out there.
Okay, enough of the big-picture stuff. Let us talk about decision trees.
A decision tree is a predictive model that maps decisions and their possible outcomes in a tree-like structure. It is used for both classification and regression tasks.
A normal tree has roots, branches, and leaves. A decision tree has those too, except it is upside down: the root is at the top and the leaves are at the bottom. (A small Julia sketch of this structure follows the list below.)
⇒ Root Node: Represents the entire dataset and is split into 2 or more branches.
⇒ Internal Nodes: Represent decision points based on feature conditions.
⇒ Leaf Nodes: Represent the final output (class or value).
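To make the structure above concrete, here is a minimal sketch of how such a tree could be represented in Julia. This is not the lecture's code: the type names (InternalNode, LeafNode), their fields, and the predict helper are illustrative assumptions.

```julia
# A node is either an internal (decision) node or a leaf.
abstract type TreeNode end

struct InternalNode <: TreeNode
    feature::Symbol      # which feature the split tests, e.g. :weight or :height
    threshold::Float64   # split condition: feature value > threshold goes right
    left::TreeNode       # branch for samples that fail the condition
    right::TreeNode      # branch for samples that satisfy the condition
end

struct LeafNode <: TreeNode
    prediction::String   # final output, e.g. "cow", "dog", or "pig"
end

# Prediction walks from the root down to a leaf, following the split conditions.
predict(node::LeafNode, x) = node.prediction
predict(node::InternalNode, x) =
    x[node.feature] > node.threshold ? predict(node.right, x) : predict(node.left, x)

# Example: a tiny hand-built tree that checks weight once and returns a class.
tree = InternalNode(:weight, 120.0, LeafNode("dog"), LeafNode("cow"))
predict(tree, (weight = 450.0, height = 140.0))   # → "cow"
```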
Okay, let us now consider a numerical example:
♢ Say you want to classify animals into cow, dog, or pig based on their height and weight. At the root node, you have all the data points.
What feature will you use to create branches from the root node? Weight or height? What is the rationale? This is when entropy enters the picture.
We then select the split that yields the largest reduction in entropy, or equivalently, the largest increase in information.
Consider splitting the root node using one of these two conditions:
♢ Weight above 120kg
♢ Height above 90cm
Which one is a better choice?
You decide based on entropy: whichever split results in the larger reduction in entropy, you go with it.
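Here is a small sketch in Julia of how that comparison could be done. The entropy of a node is H = -Σ p_k log2(p_k) over its class proportions, and we pick the split with the larger information gain (entropy reduction). The animal numbers below are made up for illustration, not taken from the lecture.

```julia
# Shannon entropy of a vector of class labels: -Σ p_k * log2(p_k).
function entropy(labels)
    isempty(labels) && return 0.0
    n = length(labels)
    probs = [count(==(c), labels) / n for c in unique(labels)]
    return -sum(p * log2(p) for p in probs)
end

# Information gain = parent entropy - weighted average entropy of the two children.
# `mask` is a Boolean vector; `true` sends a sample to the right branch.
function information_gain(labels, mask)
    n = length(labels)
    left, right = labels[.!mask], labels[mask]
    child_entropy = (length(left) / n) * entropy(left) + (length(right) / n) * entropy(right)
    return entropy(labels) - child_entropy
end

# Toy data (made-up numbers): weight in kg, height in cm.
weights = [650.0, 30.0, 250.0, 700.0, 25.0, 300.0]
heights = [140.0, 60.0, 90.0, 150.0, 55.0, 95.0]
labels  = ["cow", "dog", "pig", "cow", "dog", "pig"]

information_gain(labels, weights .> 120)   # ≈ 0.92 bits
information_gain(labels, heights .> 90)    # ≈ 0.67 bits → the weight split wins here
```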
Steps involved in building a decision tree:
⇒ Split the data: At each node, the dataset is split on the feature and threshold that give the best entropy reduction.
⇒ Recursively repeat: The process continues on each branch until one of the following is met (a minimal Julia sketch of this loop follows the list):
♢ All data points in a node belong to the same class.
♢ A pre-defined maximum depth of the tree is reached.
♢ The number of samples in a node is below a threshold.
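Putting these pieces together, a minimal recursive builder could look like the sketch below. It reuses the entropy / information_gain helpers and the InternalNode / LeafNode types from the earlier sketches, and the hyperparameter names (max_depth, min_samples) are my own illustrative choices, not the lecture's.

```julia
# Most common class in a node; used as the leaf prediction.
majority(labels) = argmax(c -> count(==(c), labels), unique(labels))

function build_tree(X, labels; depth = 0, max_depth = 5, min_samples = 2)
    # Stopping criteria: pure node, maximum depth reached, or too few samples.
    if length(unique(labels)) == 1 || depth >= max_depth || length(labels) < min_samples
        return LeafNode(majority(labels))
    end

    # Greedy search: try every feature and every observed value as a threshold,
    # keep the split with the largest information gain.
    best_gain, best_feature, best_threshold = 0.0, nothing, 0.0
    for feature in keys(X), threshold in unique(X[feature])
        gain = information_gain(labels, X[feature] .> threshold)
        if gain > best_gain
            best_gain, best_feature, best_threshold = gain, feature, threshold
        end
    end
    best_feature === nothing && return LeafNode(majority(labels))  # no useful split left

    mask = X[best_feature] .> best_threshold
    left  = build_tree(map(v -> v[.!mask], X), labels[.!mask];
                       depth = depth + 1, max_depth = max_depth, min_samples = min_samples)
    right = build_tree(map(v -> v[mask], X), labels[mask];
                       depth = depth + 1, max_depth = max_depth, min_samples = min_samples)
    return InternalNode(best_feature, best_threshold, left, right)
end

# Usage with the toy animal data from the entropy sketch:
X = (weight = weights, height = heights)
tree = build_tree(X, labels)
predict(tree, (weight = 500.0, height = 130.0))   # expected: "cow" on this toy data
```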
I will be failing in my duty to teach you the decision tree if I don’t share this MLU link with you. Hats off to the creators of this module: mlu-explain.gi...
Basically, you need a good tree that learns general patterns, not a "perfect" tree that beats entropy to death (i.e., drives it all the way to zero), memorizes all the noise in the training data, and then scores poorly on the test data.
I have recorded a full-blown lecture on this topic for Vizuara's YouTube channel. Do check it out: • Decision tree from scr...