Decision Tree with R | Complete Example

Dr. Bharatendra Rai

131K views

Published: 11 Sep 2024

Comments • 261

@olivergasior8005 1 year ago ⁺¹
I watched your videos to help me through a data analytics degree. I'm now working in a role similar to a business analyst and looking back at these videos. Very easy to follow, to the point, and informative for getting the job done. Thank you!
@bkrai 1 year ago
You are welcome, and good luck!
@animeshdevarshi 7 years ago ⁺⁵
Sir, I've been following a lot of courses but never found anything with such clarity. Thanks for posting these!
@bkrai 7 years ago
Thanks for the feedback!
@user-uf5bk8zc7n 4 years ago ⁺³
Thanks Doc, after my 6-hour class... you cleared up all my confusion in just 18:43 minutes. Such a worthwhile job!!!
@bkrai 4 years ago
Thanks for your feedback and comments!
@vijayarjunwadkar 3 years ago ⁺¹
Take a bow, sir! For the first time, I have full clarity on decision trees and their usage! Thanks a lot for this superb tutorial; lucky to have found your channel. Stay blessed! 👌👍🙏
@bkrai 3 years ago
Thanks for comments!
@ivanjcardona 2 years ago ⁺¹
You really made it simple. I had been watching other tutorials, but not anymore. I've already subscribed. Thanks a lot.
@bkrai 2 years ago
You are welcome!
@askpioneer 2 years ago ⁺¹
Hello sir, your way of explaining is so simple and effective; it made the topic simple.
I would also like to add a note for everyone: I was getting an error while using controls=ctree_control, and after some googling and forum support I am now able to run it and view the tree. Great work, sir.
@bkrai 2 years ago
Thanks for the update!
@UmairSajid 5 years ago ⁺²
Hello Dr. Rai, thank you for a very informative video.
One thing I would like to add, based on my limited knowledge:
For a skewed class distribution such as the one in this data, it is more important that the model predicts the abnormal cases than the normal ones. If we only look at the mis-classification error, the model may be biased toward the class with the larger share of the data. One way to avoid that is to reduce the disparity between the classes with over/under-sampling techniques. Another is to use the area under the precision-recall curve as the evaluation measure.
Your comments and feedback on this would be appreciated.
@bkrai 5 years ago
That's correct. For more details about the class imbalance problem, refer to this link:
ruclips.net/video/Ho2Klvzjegg/видео.html
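As a rough illustration of the over-sampling idea in this thread, here is a minimal base-R sketch of random over-sampling. The data frame `d` and its "normal"/"abnormal" labels are made up for the example and are not the video's CTG data.

```r
# Hypothetical imbalanced data: 90 "normal" cases vs 10 "abnormal" cases
set.seed(1234)
d <- data.frame(x = rnorm(100),
                y = factor(rep(c("normal", "abnormal"), times = c(90, 10))))
table(d$y)

# Random over-sampling: draw minority rows with replacement until both classes
# have the same count. Apply this to the training data only, never to the
# data used for validation, or the error estimate becomes optimistic.
minority <- d[d$y == "abnormal", ]
n_extra  <- sum(d$y == "normal") - nrow(minority)
extra    <- minority[sample(nrow(minority), n_extra, replace = TRUE), ]
balanced <- rbind(d, extra)
table(balanced$y)
```

Duplicating rows adds no new information, so it mainly changes how much weight the minority class gets during fitting.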
@ShivaKumarbudda 4 years ago ⁺¹
Hi, a video posted 4 years ago has today become a saviour for my internal assessment.
Thank you 😃
@bkrai 4 years ago
Welcome! You may also find this recent one useful:
ruclips.net/p/PL34t5iLfZddvGr66DPf-L-sSJ50XNwN3K
@plum-ish6679 2 years ago ⁺²
You are truly remarkable! The way you explain things is very easy to understand.
@bkrai 2 years ago
Thanks for comments!
@kabeeradebayo9014 7 years ago ⁺¹
Thank you again for these complete episodes. You have been a great help to me, Dr. Rai. I'd appreciate a complete episode on ensembles, especially heterogeneous ensembles that use DT, SVM, etc. as the base classifiers.
Comprehensive videos on ensembles are not common; in fact, I haven't come across any. It would go a long way if you could put something together on this. Thank you for your help!
@bkrai 7 years ago ⁺¹
Thanks for the suggestion, I'll do it in the near future!
@kabeeradebayo9014 7 years ago
Sounds really great. Looking forward to it. Can't wait!
@sujitcap 6 years ago ⁺¹
Sir, so much clarity... how simple and easy you made it! Thank you.
@bkrai 6 years ago
Thanks for comments!
@sudzbyte2215 4 years ago ⁺²
This is a great example of decision trees. Thank you!
@bkrai 4 years ago
Thanks for comments!
@ekfistek 4 years ago ⁺¹
Dr Rai, thanks for your videos. I have found them useful in explaining basic machine learning methods. Thank you!
@bkrai 4 years ago
Thanks for comments!
@wasafisafi612 2 years ago ⁺¹
Thank you so much for your videos. I am learning every day with them. May God bless you.
@bkrai 2 years ago
Thanks for comments!
@christan7434 5 years ago ⁺¹
Thank you, Professor Rai, for taking the time to show us the ropes. Regarding the mis-classification error table: what is the difference between it and the confusion matrix? I notice the calculation for "accuracy" is the same as for the confusion matrix, simply sum(diag(tab))/sum(tab), but in the confusion matrix the actuals are on the vertical, whereas in the video you put the actuals on the horizontal. Thanks, and looking forward to more videos from you.
@bkrai 5 years ago
The confusion matrix and the mis-classification table are the same thing.
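Following up on this question, here is a minimal base-R sketch with made-up labels (not the CTG data). Naming the arguments of `table()` fixes which dimension is Predicted and which is Actual. Accuracy and the mis-classification error come from the diagonal, so they are the same in either orientation.

```r
# Hypothetical predicted and actual class labels for six cases
actual    <- factor(c("1", "1", "2", "3", "1", "2"), levels = c("1", "2", "3"))
predicted <- factor(c("1", "2", "2", "1", "1", "2"), levels = c("1", "2", "3"))

# The first argument becomes the rows, the second the columns;
# the names label each dimension in the printed table
tab <- table(Predicted = predicted, Actual = actual)
print(tab)

accuracy <- sum(diag(tab)) / sum(tab)  # correct predictions sit on the diagonal
misclass <- 1 - accuracy               # mis-classification error

# Transposing swaps rows and columns but leaves the diagonal unchanged
sum(diag(t(tab))) == sum(diag(tab))
```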
@DABANG125 4 years ago ⁺³
Sir,
Greetings from the US,
I have also enrolled in a machine learning course on Udemy, but your explanation is super simple and easier to implement.
Please guide me to a book I can use to practice more with such datasets.
@bkrai 4 years ago
Deep learning is currently the hottest topic in the machine learning field. To get started with practical examples, you can try:
www.amazon.com/Advanced-Deep-Learning-designing-improving/dp/1789538777
@shaliniguha1822 6 years ago ⁺¹
Sir, it would be really nice if you could write a blog post explaining the output in more detail, for instance the statistical parameters measured in the confusion matrix. Your videos are really helpful! :)
@bkrai 6 years ago
Thanks for your comments and suggestion! You may also find decision-tree explanations in the following video:
ruclips.net/video/J2a9yV3kl-M/видео.html
@shesadevsha1994 5 years ago ⁺¹
Hi Sir, I am so glad to see all your videos on machine learning in R. One request: it would be great if you could share the datasets you used in your sessions.
@bkrai 5 years ago
You can get the data file from the link in the description area below the video.
@rithishvikram1759 4 years ago ⁺³
Wow, thank you sir....!!!! Sir, please make a video on how entropy-based splits are calculated; it would be very useful.
@bkrai 4 years ago ⁺¹
Thanks for the suggestion, I've added it to my list.
@akshitbhalla874 5 years ago ⁺¹
Your videos are honestly so amazing.
@bkrai 5 years ago ⁺¹
Thanks for comments!
@hridayborah9750 4 years ago ⁺¹
Very, very clear and helpful. Thanks a ton.
@bkrai 4 years ago
Thanks for comments!
@nayeemislam8123 5 years ago ⁺¹
Sir, I have a few questions:
1. How do you find the statistically significant variables after developing a decision tree model with all variables? Ho
2. Suppose all variables in a decision tree are coded as POOR, FAIR, GOOD. How do I find the probabilities of each (POOR, FAIR, GOOD) at the non-terminal nodes of the tree, and the number of samples in each category? I need to show this in my plot.
3. What is the best approach to developing a decision tree model: developing a model on the training data using k-fold cross-validation, or developing a model on the training data and then going through cross-validation and pruning with a function like cv.tree(), which lets us choose the tree with the lowest cross-validation error rate? Which method is better?
4. How do I find the standardized importance of the independent variables using CART in R?
@bkrai 5 years ago
1. P-values on the tree indicate statistical significance.
2. You can find it only at the terminal nodes.
3. k-fold CV is always better for avoiding over-fitting.
4. The higher a variable is on the tree, the more important it is. For variable importance you can also try this link:
ruclips.net/video/dJclNIN-TPo/видео.html
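For question 4, the documentation of rpart's fitted-object components lists a `variable.importance` vector, which `summary()` prints rescaled to add to 100. Here is a minimal sketch using the built-in iris data as a stand-in for the video's data. It is one way to get CART importances in R, not necessarily the approach in the linked video.

```r
library(rpart)  # ships with R as a recommended package

# Classification tree on the built-in iris data
fit <- rpart(Species ~ ., data = iris, method = "class")

# Importance scores accumulated from primary and surrogate splits;
# larger values mean the variable contributed more to the tree
imp <- fit$variable.importance
round(imp, 1)

# The same scores rescaled to add to 100
round(100 * imp / sum(imp))
```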
@carlosfernandezgalvez3023 5 years ago ⁺³
Hi! Thank you for all your videos.
I'd just like to make a small comment: the ctree function implements a 'Conditional Inference Tree', not a 'Classification Tree'. It can build classification trees, but the fundamentals are different.
Thank you for all the work you are doing! Very useful.
Carlos
@bkrai 5 years ago ⁺¹
Thanks for the update!
@AmarLakel 5 years ago ⁺¹
Thank you for your help and all your videos. They have helped me a lot.
@bkrai 5 years ago
Thanks for your comments!
@halyad4384 7 years ago ⁺¹
Very informative and easy to understand. Thanks for sharing such a useful video.
@bkrai 7 years ago
Thanks for the feedback!
@ehtishamraza2623 3 years ago ⁺¹
Really great explanation.
@bkrai 3 years ago
Thanks for comments!
@bkrai 3 years ago
Also, here is a link to a more recent one:
ruclips.net/video/RCdu0z2Vyrw/видео.html
@tarapaider1729 7 years ago
Your videos are always very easy to follow!!
@bkrai 7 years ago
+Tara Paider thanks for the feedback 👍
@bonelwamnyameni 7 years ago ⁺¹
This video has helped me a lot with my assignment, thank you so much.
@bkrai 7 years ago
That's great!
@rakeshv6322 2 years ago ⁺¹
Thanks, sir, for the detailed video.
@bkrai 2 years ago
Most welcome!
@fadedmachine 6 years ago ⁺¹
You're the man. Keep up the great work!
@MrCaptainJeeves 8 years ago ⁺¹
Love all your videos... please keep uploading.
@bkrai 8 years ago
+pradeep paul Thanks for your feedback!
@atiquerahman3766 7 years ago ⁺¹
Hi Sir, your videos are really helpful and have helped me a lot. I have a few doubts, though. I have just started learning data science, so these doubts may be naive.
1) On what basis do we decide how much data to put into training, validation, and testing, respectively?
2) Is there a criterion for decision trees (like R-squared for regression models or chi-square for logistic regression) that tells us how good the model is?
@bkrai 7 years ago
1) One may experiment with different partitions such as 50:50, 60:40, 70:30, etc., and see what works best. There is no single partition ratio that works well in all situations.
2) If your y variable is categorical, the mis-classification error is used to assess model performance.
@atiquerahman3766 7 years ago ⁺¹
Thank you, sir!!
@sushantchaudhary2008 3 years ago
Thank you Dr Rai. I have a question about tree pruning. Before pruning, some of the trees were able to classify patients as pathological, but after pruning (by changing the control functions) none of the trees identify the pathological patients. If we specifically want to identify patients with suspected pathology, how can we modify the control functions or the initial formula in the ctree() function?
@mateuszbielik2912 2 years ago ⁺¹
Greetings! I came back to this video after a while, as it still seems to be the best one on decision trees out there. I have a question about the significance of variables. Do you have a video covering this subject? Are there techniques I could apply while working on my decision tree? Thank you.
@bkrai 2 years ago
You can use this link. For tree-based methods, it provides variable importance plots that show which variables are important and which ones contribute little.
ruclips.net/video/hCLKMiZBTrU/видео.html
@mayankhmathur 6 years ago ⁺¹
Nice explanation, thanks.
@aditidalvi255 5 years ago ⁺¹
Sir, can you please suggest a good book for machine learning beginners that gives basic knowledge of all the statistical tools?
@Fsp01 4 years ago ⁺¹
Brilliant! Thank you, Dr.
@bkrai 4 years ago
You're most welcome!
@leolee618 6 years ago ⁺¹
Thank you so much for your awesome video. I've learned a lot from it.
@bkrai 6 years ago
Thanks for your feedback!
@raymondjiii 2 years ago ⁺¹
That was awesome, but with my dataset I get a completely different decision tree using the rpart package. Without rpart, the tree is what I expected; with rpart, it is in some ways almost the opposite. I'm only comparing the two trees on my training data.
@raymondjiii 2 years ago ⁺¹
I think I know what the problem is: with rpart trees you only get a small "yes" and "no" marker at the root node. In my case "yes" goes to the left of the tree and "no" goes to the right. If I assume that direction always holds, things are okay. I do wish the little white "yes"/"no" boxes were printed at every non-leaf node, so it's very clear which way the path goes. (I wonder if there's an option for that?) Thanks for the great video.
@bkrai 2 years ago
See the link below, which has more detailed coverage:
ruclips.net/video/6SMrjEwFiQY/видео.html
@takakosuzuki2514 5 years ago ⁺¹
Hi Dr. Rai. I encountered an error in the #Misclassification part. I got the table using library(party), but I got "all arguments must have the same length" when using rpart(). If I use the validate set with the rpart package, though, the table can be generated.
@bkrai 5 years ago
Difficult to say much without looking at the code. But you can review your code again; there may be a typo.
@lorihearn6859 3 years ago ⁺¹
Is it only useful for numerical data, when all the independent variables are continuous? Or can it be used for categorical ones too?
@bkrai 3 years ago
It's useful for both. See this more detailed example:
ruclips.net/video/6SMrjEwFiQY/видео.html
@muhammadnurdzakki1605 4 years ago ⁺²
Reading/preparing CSV data: 0:32
Decision tree using the rpart package: 11:22
@bkrai 4 years ago
Thanks!
@ricardobrubaker4109 2 years ago
How can we export the first tree's predictions (View(predict(tree,validate,type="prob"))) to Excel? When I use a data frame, they come out horizontally and are unreadable.
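Here is a hedged sketch for this question. It assumes the tree was fit with party::ctree, whose `predict(..., type = "prob")` returns a list with one probability vector per observation; that is why the values display sideways. Stacking the vectors with `do.call(rbind, ...)` gives one row per observation, and `write.csv()` produces a file Excel can open. The probability values, the class labels "1", "2", "3", and the file name are placeholders.

```r
# Hypothetical stand-in for predict(tree, validate, type = "prob") from party::ctree:
# a list with one class-probability vector per observation
probs <- list(c(0.90, 0.08, 0.02),
              c(0.10, 0.75, 0.15),
              c(0.05, 0.15, 0.80))

# Stack the vectors as rows: one row per observation, one column per class
prob_df <- as.data.frame(do.call(rbind, probs))
names(prob_df) <- c("1", "2", "3")   # assumed class labels (the NSP levels)

# Write a CSV that Excel can open (the file name is only an example)
out_file <- file.path(tempdir(), "tree_probabilities.csv")
write.csv(prob_df, out_file, row.names = FALSE)
```

For an rpart tree, `predict(..., type = "prob")` already returns a matrix with one row per observation, so it can go straight into `write.csv()`.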
@nagarajaraja2546 7 years ago ⁺¹
Hi sir,
I am S. Nagaraj Adiga. Your videos are very easy to follow and understand. Thank you very much.
@bkrai 7 years ago
Thanks for the feedback!
@vairachilai3588 4 years ago ⁺¹
In the confusion matrix (tab), the columns are the predicted data and the rows are the actual data.
@bkrai 4 years ago
In this video I have put the predicted data in the rows and the actual data in the columns of the confusion matrix.
@vairachilai3588 4 years ago ⁺¹
Kindly check it: with table(predict(tree), data$NSP), the output is laid out as follows: the columns are predicted data and the rows are actual data.
@bkrai 4 years ago
Try this; it will make it clearer:
table(Predicted = predict(tree), Actual = data$NSP)
@rakeshvikhar 2 years ago ⁺¹
I am a beginner. Could you help me understand whether we can use linear/logistic regression to do the prediction here? I have looked at your vehicle example and got confused about whether we can use that model here.
@bkrai 2 years ago
Yes, you can use logistic regression, as the response variable is of factor type. For more, see:
ruclips.net/video/AVx7Wc1CQ7Y/видео.html
@TheIanoTube 4 years ago ⁺³
Would this work just as well if some variables were categorical, i.e. written as text but with a limited set of options?
Thanks for the video
@bkrai 4 years ago ⁺¹
Yes, absolutely.
@bkrai 4 years ago ⁺¹
You may also try this link:
ruclips.net/p/PL34t5iLfZddvGr66DPf-L-sSJ50XNwN3K
@TheIanoTube 4 years ago ⁺¹
Thank you, great channel. Subscribed!
@bkrai 4 years ago
Thanks!
@sovon08 6 years ago ⁺²
Sir, it would be really great if you could create a video on how to calculate Gini and KS using R.
@bkrai 6 years ago
Thanks for the suggestion, I've added this to my list.
@vishalaaa1 4 years ago ⁺¹
ctree doesn't support dates. I tried dates converted from POSIX. Can you please suggest a ctree parameter that resolves this problem?
@bkrai 4 years ago
A decision tree is not a good method for working with dates. For dates you should use time series methods:
ruclips.net/p/PL34t5iLfZddt9X6Q6aq0H38gn-_JQ1RjS
@harishnagpal21 6 years ago ⁺¹
Nice video, Bharatendra. One question: you said that we need to optimize the model. How do we do that, i.e. how do we optimize our model? Thanks
@bkrai 6 years ago ⁺¹
You can change the settings in 'controls' to see what helps improve the model. In the example I used only 3 variables, just for illustration, but you should start with all variables for better performance.
@harishnagpal21 6 years ago
thanks :)
@sachiniwickramasinghe1912 4 years ago ⁺¹
Thank you! So helpful!
@bkrai 4 years ago
Thanks for comments!
@OrcaChess 6 years ago ⁺¹
Hello! I gave my decision tree 97 different features, but it picked only one of them to make its decision. Is it normal that it doesn't use all the features?
@bkrai 6 years ago
It runs with the default settings. By changing them you may be able to make it include a few more. But features that have very little impact on the response are unlikely to be included.
@DhingraRajan 6 years ago
It can happen when one of the features is a close predictor of y. Then that single feature is enough to predict y on its own.
@Twiste_Z 5 years ago ⁺¹
I followed your method with a dataset I created. It's a simple one, but the output just prints the values of my dataset instead of plotting a tree and predicting. Can you help me understand why?
@bkrai 5 years ago
Difficult to say much without looking at the data and code.
@satishbharadwaj9539 6 years ago ⁺¹
Sir, please post a video on regression splines, polynomial regression, step functions, etc.
@bkrai 6 years ago
Thanks for the suggestion, I've added it to my list.
@mateuszbielik2912 2 years ago ⁺¹
Great video, everything explained step by step. I have a question, though. Some of the data in my DB file is of type character, and I keep getting the error "data class "character" is not supported". How can I include this data in my experiments?
@bkrai 2 years ago ⁺¹
Change such variables to 'factor'.
@mateuszbielik2912 2 years ago ⁺¹
@bkrai OMG, thank you. So I can just use data$variableF
@bkrai 2 years ago ⁺¹
Yes, that should work.
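Here is a minimal base-R sketch of the conversion suggested in this thread, on a made-up data frame. The column names `rating` and `score` are hypothetical. It shows converting one column into a new `F`-suffixed factor column, and converting every character column at once.

```r
# Toy data frame with a character column (hypothetical example)
df <- data.frame(rating = c("POOR", "GOOD", "FAIR", "GOOD"),
                 score  = c(1.2, 3.4, 2.2, 3.9),
                 stringsAsFactors = FALSE)

# Convert one column into a new factor column, with an explicit level order
df$ratingF <- factor(df$rating, levels = c("POOR", "FAIR", "GOOD"))

# Or convert every remaining character column in one step
chr_cols <- vapply(df, is.character, logical(1))
df[chr_cols] <- lapply(df[chr_cols], factor)

str(df)
```

Giving `levels` explicitly keeps a meaningful order (POOR < FAIR < GOOD). Otherwise `factor()` sorts the levels alphabetically.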
@sallymusungu8983 1 year ago
How do you remove the ticks on the axes, or realign the axis labels?
@romanozzie3530 6 years ago ⁺¹
Amazing, thanks
@sudiptomitra 3 years ago
A comparative analysis of pre- and post-pruning of the model would have completed the tutorial on decision trees.
@bala4you01 8 years ago
Thank you, Dr. Rai, for sharing a simple and detailed explanation of decision trees. My query: can we plot a ROC curve for multiclass data? (The pROC package can calculate the AUC, but I could not find how to plot a ROC graph for multinomial data.)
@bkrai 7 years ago
At this time it only does this for the binomial case. You can now find the ROC curve video here:
ruclips.net/video/ypO1DPEKYFo/видео.html
@m.z.1809 5 years ago ⁺¹
How can we validate the accuracy or discriminatory power of this model?
I believe you can use the model outputs from train and validate to somehow calculate chi-square, etc.?
@bkrai 5 years ago
You can validate the model built on the training data with the help of the validate data.
@javeda 6 years ago
Hi,
I wanted to ask which software is most appropriate for conducting SEM with moderation analysis when the outcome/dependent/endogenous variables are categorical: nominal (binary and multinomial) and ordinal.
P.S.: The predictor variables are scale, nominal, and ordinal variables.
Regards
@oguzyavuz2010 4 years ago ⁺¹
Let me ask: the variable at the top of the picture is not the dependent variable, right? 5:46
@bkrai 4 years ago
It's an independent variable.
@oguzyavuz2010 4 years ago
@bkrai Sir, can I ask some simple questions about the tree diagram, if you don't mind? I'm leaving my Gmail address here: ogzhnyvzz@gmail.com
@uhsay1986 5 years ago ⁺¹
Hi Sir, how do we pass a test set to the predict function when the target variable has NA values? When I run the function, it says the predictor must have 2 levels.
@bkrai 5 years ago
You need to impute missing values before developing the model.
@anananan3635 2 years ago ⁺¹
Is it only for numeric variables? Is there other code for character variables?
@bkrai 2 years ago
Change character variables to factor variables before using this.
@raniash3ban383 6 years ago ⁺¹
Very helpful, thanks
@bkrai 6 years ago
Thanks for comments!
@satyanarayanajammala5129 7 years ago
Very nice explanation, keep it up.
@bkrai 7 years ago
Thanks for the feedback!
@kanhabira 3 years ago
Thanks, sir, for this interesting video. I am facing a problem. My dependent variable is binary (0, 1). When I run predict, the estimated values come out as decimals even after removing "type", so the mis-classification error is close to 1. Could you please suggest how I can get the predicted values as 0/1?
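One likely cause is sketched below with rpart on the built-in mtcars data; the thread does not say which package was used. When a 0/1 response is stored as a number, rpart grows a regression ("anova") tree, so `predict()` returns node means rather than classes. Converting the response to a factor gives a classification tree with 0/1 class predictions. Thresholding the numeric predictions is an alternative, and the 0.5 cut-off used here is an assumption.

```r
library(rpart)

d <- mtcars
# am is coded 0/1 as a number, so rpart treats it as a regression target
reg_fit <- rpart(am ~ mpg + wt + hp, data = d)
head(predict(reg_fit))            # fitted means between 0 and 1, not classes

# Option 1: make the response a factor so a classification tree is grown
d$amF <- factor(d$am)
cls_fit <- rpart(amF ~ mpg + wt + hp, data = d, method = "class")
pred_class <- predict(cls_fit, d, type = "class")  # factor with levels "0", "1"

# Option 2: threshold the numeric predictions (0.5 is an arbitrary cut-off)
pred_01 <- ifelse(predict(reg_fit, d) > 0.5, 1, 0)

table(Predicted = pred_class, Actual = d$amF)
```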
@uchenzei5160 4 years ago ⁺¹
When I try to create the misclassification table, it always gives me the error "all arguments must have the same length". What can I do? I am new to data science.
@neera842006 4 years ago ⁺¹
I am also getting the same error message.
@dhavalpatel1843 4 years ago ⁺¹
You should always pass the model as the first argument of the predict function. The second argument should be a data frame of the predictor variables only. You can specify type="prob" as an extra argument to get the probability of every level of y, or type="class" to get the predicted classes directly. The default value of the type argument differs from one model's predict method to another.
@bkrai 4 years ago ⁺¹
Thanks for the update!
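Here is a hedged reproduction of the "all arguments must have the same length" error discussed above, using rpart on the built-in iris data rather than the video's data. The error appears when predictions for one data set are tabulated against the labels of another. Here, `predict(fit)` without `newdata` returns predictions for the training rows, while the labels come from `validate`.

```r
library(rpart)

# 80/20 partition, as in the video's workflow
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
train    <- iris[ind == 1, ]
validate <- iris[ind == 2, ]

fit <- rpart(Species ~ ., data = train, method = "class")

# Mismatch (causes the error): training-row predictions vs validate labels
# table(predict(fit, type = "class"), validate$Species)

# Match: predict on validate and compare with validate's own labels
pred <- predict(fit, newdata = validate, type = "class")
tab  <- table(Predicted = pred, Actual = validate$Species)
1 - sum(diag(tab)) / sum(tab)   # mis-classification error on validate
```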
@sudanmac4918 4 years ago ⁺¹
Sir, what is the difference between rpart() and ctree(), and when should each be used?
@bkrai 4 years ago
It's just a different way to represent a tree. Note that both use the same algorithm.
@abhinavmishra7786 6 years ago ⁺¹
Hi sir, nice explanation... I learnt about the ctree function. Can you please illustrate how we can tune the decision tree model?
@bkrai 6 years ago ⁺¹
Around the 7:30 point in the video, tuning is shown using "mincriterion" and "minsplit".
@abhinavmishra7786 6 years ago
Bharatendra Rai, my mistake sir... I meant pruning the decision tree.
@bkrai 6 years ago ⁺¹
You can prune by increasing the values of "mincriterion" and "minsplit".
@abhinavmishra7786 6 years ago ⁺¹
Bharatendra Rai, thank you for clarifying, sir.
@piyalichoudhury3493 5 years ago ⁺¹
I like your videos... can you upload some on ensembles and AIC as well? It would be very kind of you.
@bkrai 5 years ago ⁺¹
Thanks for comments and suggestion, I've added it to my list.
@preeyank5 8 years ago ⁺¹
Thanks a ton!!
@bkrai 8 years ago
+Preeyank Pable 👍👍👍
@tayabakhanum9707 8 years ago
Sir, please tell me about classical or crisp decision trees.
@zahraadamabdallah4116 1 year ago ⁺¹
Very useful.
@bkrai 1 year ago
Thanks!
@mahumadil 8 years ago
I have a query. I tried to google it but couldn't find a satisfactory answer: what is the difference between a ctree and an rpart tree?
@bkrai 8 years ago
+Mahum Khan ctree is a decision tree function in the package called "party". Similarly, rpart is a function in the package of the same name, "rpart". Both are used for decision trees. I prefer party, as it is said to be more accurate. If you search "party vs rpart" you can find many good explanations.
@aravindhp5612 4 years ago ⁺¹
Sir, why do you use set.seed(1234)? Why can't you use set.seed(12345)? Can you please explain?
@bkrai 4 years ago ⁺¹
It can be any number, but to get the same samples, use the same number next time too.
@saniamadoo5558 6 years ago ⁺¹
Hello sir... can you please make a tutorial on how to implement FP-growth in RStudio!!! It's urgent! Please help!
@ateendraagnihotri9744 3 years ago ⁺¹
Sir, can you provide the dataset you used?
@bkrai 3 years ago
There is a link below this video.
@gebriadinda6405 7 years ago
Excuse me, sir. Can you help me? I tried this script on my data, which has 100 observations of 1383 variables. I got the result "Conditional inference tree with 1 terminal nodes" and "Number of observations: 83". However, I can't get the decision tree; I just get a histogram. Can you help me, sir? Why does this happen? Thank you, sir.
@bkrai 7 years ago
+Gebri Adinda you can send the data and I can look into it.
@aisha555ms2000 5 years ago
@bkrai Sir, I get the same result: "Conditional inference tree with 1 terminal nodes", only a histogram, and number of observations = 144. Can you help?
@ronithNR 7 years ago ⁺¹
Sir, could you make a video on random forests?
@Steamlala 5 years ago ⁺¹
Dear Sir,
Thank you for your video. Could you do a tutorial in R where multiple models (decision tree, random forest, gradient boosting, logistic regression, etc.) are compared on the same ROC chart, split by training vs validate data set? This kind of visualization would be a great help, especially when presenting to management. Thank you!
@bkrai 5 years ago
Thanks for comments and suggestion, which I'll work on in the near future. Meanwhile, here is a link where you can quickly get ROC curves that compare several methods such as decision tree, logistic regression, SVM, random forest, etc., on the same ROC plot.
ruclips.net/video/J2a9yV3kl-M/видео.html
@Steamlala 5 years ago ⁺¹
Thank you, Sir. The YouTube tutorial above is really good. Looking forward to your awesome tutorial comparing multiple classification models in one graph, split between train and validate.
@bkrai 5 years ago
Thanks!
@kartikchauhan2845 4 years ago ⁺¹
Sir, how would you increase the number of nodes?
@bkrai 4 years ago
You can change mincriterion and minsplit in the controls part for that.
@bkrai 4 years ago
For a more recent one, see below:
ruclips.net/p/PL34t5iLfZddvGr66DPf-L-sSJ50XNwN3K
@ITGuySam 8 years ago
Thank you for your video. I'd like to know what you mean by "set.seed(1234)". Why not use set.seed(2) or something else?
And can we use "ifelse" instead of defining "pd"? Which way is better?
@bkrai 8 years ago
+Info A set.seed(1234) is just an example; you may use any other number. The idea is to be able to reproduce results, which any number can achieve. 'pd' was used for 'partitioning data' and is just a name; you may use any other name, and that will be fine too.
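Here is a minimal base-R sketch of the point made in this reply. Re-using a seed reproduces the same partition, and a different seed gives a different but equally valid one. The helper name `make_partition` and the 150-row size are illustrative assumptions. The `ifelse()` version at the end is one possible alternative to `sample()`, not the video's code.

```r
# Hypothetical helper: assign each of n rows to group 1 (~80%) or group 2 (~20%)
make_partition <- function(n, seed) {
  set.seed(seed)
  sample(2, n, replace = TRUE, prob = c(0.8, 0.2))
}

pd1 <- make_partition(150, 1234)
pd2 <- make_partition(150, 1234)   # same seed -> identical split
pd3 <- make_partition(150, 12345)  # different seed -> a different split

identical(pd1, pd2)
identical(pd1, pd3)
table(pd1)

# An ifelse() formulation of the same idea, with named groups
set.seed(1234)
grp <- ifelse(runif(150) < 0.8, "train", "validate")
table(grp)
```

Both approaches give random group sizes close to 80/20 rather than exact counts. Which one to use is mostly a matter of taste.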
@ningrongye339 7 years ago
Hi sir, thank you for the video, it's very helpful! But I still don't understand why your model could not predict class 3. If we used all the items, could we predict more precisely? Thank you!
@bkrai 7 years ago
That's correct! To obtain the final model we need to include all the items, and that will improve model performance.
@subashinirajan2841 7 years ago
Hello sir, I'm applying the same steps to my own data, but I get the error "all arguments must have the same length" in the misclassification part. Would it be OK if you checked my code and let me know where I am going wrong? If that's OK with you, I will send you the code and data.
@bkrai 7 years ago
Yes, send the code.
@subashinirajan2841 7 years ago
Thank you, sir. To which email ID should I send the code? My email ID is subashinivec@gmail.com
@sriharshabsathreya 7 years ago ⁺¹
Sir, how do we choose the complexity parameter (CP value) for tree pruning?
@kumarmithun2723 6 years ago
For this, you have to build an rpart model, and then you can prune the tree based on the CP value: view the CP table with printcp(rpart_model) and choose the CP value with the minimum cross-validated error to prune the tree further.
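Here is a sketch of the procedure described in this reply, using rpart on the built-in iris data rather than the video's data. The tree is grown with `cp = 0`, the CP table is inspected, and the tree is pruned at the CP with the smallest cross-validated error (`xerror`). The "1-SE rule" is a common alternative: pick the smallest tree whose `xerror` is within one `xstd` of the minimum.

```r
library(rpart)

set.seed(1234)  # rpart's internal cross-validation (xval) is random
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, xval = 10))

printcp(fit)    # columns: CP, nsplit, rel error, xerror, xstd

# Choose the CP value with the smallest cross-validated error
cp_tab  <- fit$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]

pruned <- prune(fit, cp = best_cp)
pruned
```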
@caterinacevallos9822 6 years ago
Could you please explain this to me a little more?
pd
@bkrai 6 years ago
You can go over this video, which has more detail:
ruclips.net/video/aS1O8EiGLdg/видео.html
@VenkateshDataScientist 6 years ago
RStudio doubt:
I am building a predictive model with 1 million observations and 15 variables. I am getting errors like "cannot allocate vector of size 432 Gb" or "cannot allocate vector of size 3.8 Gb".
I am using 16 GB of RAM, my file size is just 140 MB, and I closed all the other applications on my system. The error remains the same.
Any suggestions would be much appreciated.
@bkrai 6 years ago
With such huge data, you can probably take a sample for creating the model. The difference between a model based on a good sample and one based on all the data may not be significant. You can also try faster algorithms such as extreme gradient boosting:
ruclips.net/video/woVTNwRrFHE/видео.html
@VenkateshDataScientist 6 years ago ⁺¹
Bharatendra Rai, sure sir, I will try today.
@akkimalhotra26 8 years ago ⁺¹
Dear sir, how can I get the data set that you are using?
@bkrai 8 years ago ⁺¹
Your email?
@bkrai 8 years ago ⁺¹
Actually, I don't need your email. You can get the data from:
sites.google.com/site/raibharatendra/home/decision-tree
@sndrstpnv8419 7 years ago ⁺¹
Maybe add more about CHAID trees.
@bkrai 7 years ago ⁺¹
Thanks! I'll keep it in mind.
@maxxsensation3516 6 years ago
Right after installing the package, my commands aren't working. Please reply ASAP.
install.packages("party")
library(party)
data$NSPF
@bkrai 6 years ago
Which line is not working?
@anigov 6 years ago
try
footballtree
@ronithNR 7 years ago
Hello sir, it's a great video. Does rpart use the Gini index?
@bkrai 7 years ago
It uses altered priors method.
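For reference, the rpart documentation for its `parms` argument states that classification splits default to the Gini index, with "information" as the alternative. Prior probabilities and a loss matrix can be supplied through the same list. Here is a minimal sketch on the built-in iris data (not the video's data) that fits both criteria and compares their predictions.

```r
library(rpart)

# Default classification splitting index: Gini
fit_gini <- rpart(Species ~ ., data = iris, method = "class")

# Information (entropy-based) splitting, requested through parms
fit_info <- rpart(Species ~ ., data = iris, method = "class",
                  parms = list(split = "information"))

# Compare the two trees' predictions on the same data
pred_gini <- predict(fit_gini, iris, type = "class")
pred_info <- predict(fit_info, iris, type = "class")
mean(pred_gini == pred_info)   # share of identical predictions
```

On well-separated data like iris the two criteria usually give similar trees. On other data the chosen splits can differ.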
@vishnukowndinya 7 years ago
Hi sir, can you please explain tree pruning? On what basis do we prune?
@bkrai 7 years ago ⁺¹
When a decision tree is too big, 'pruning' reduces its size by removing the parts that do little to help predict the outcome correctly. It helps avoid over-fitting and improves prediction accuracy.
@meghadabhade6967 7 years ago
Hello Bharatendra Sir, can you please guide me on how to implement the perturbation method in R?
I have classified data using a decision tree. Now I want to perturb the data and apply the same classification, but I am unable to proceed. Can you please upload some videos illustrating how to implement the perturbation method in R? It's very urgent for me.
@bkrai 7 years ago ⁺¹
Megha, here is the link for perturbation analysis. Note that it can be used only for regression-like models. It may not work with decision trees.
ruclips.net/video/Jz97ccAIyj8/видео.html
@anandsalunke180 8 years ago
What if there are two target variables, like NSP and some other one? Which decision tree technique should we use, and what will the formula be?
@bkrai 8 years ago
You can make two separate trees.
@anandsalunke180 8 years ago
How do we derive the formula? Based on which attributes?
@bkrai 8 years ago
The decision tree algorithm automatically chooses the attributes (independent variables), depending on the parameters you set, such as the minimum sample size for splitting, statistical significance, etc.
@atanunow 7 years ago
Getting an error at #Misclassification error in testing data.
It says "all arguments must have the same length".
Sir, please help me out.
@bkrai 7 years ago
There is probably some mix-up between the training and testing data.
@atanunow 7 years ago
Bharatendra Rai okay sir! Let me try once again... if I get stuck again, can I share my code here?
@atanunow 7 years ago ⁺¹
Bharatendra Rai sir, it was my fault; you were right.
Now it is working fine.
@pawanmishra5472 7 years ago
I cannot download the data from the link; it does not exist.
@bkrai 7 years ago
Here is the link for the data:
sites.google.com/site/raibharatendra/home/decision-tree
@bkrai 7 years ago
I've created new links now:
drive.google.com/open?id=0B5W8CO0Gb2GGa09Ma3NzTVpyOWM
drive.google.com/open?id=0B5W8CO0Gb2GGMzJGbkdGUGREYjA
@divyadamodaran53 8 years ago ⁺¹
What does the p-value represent?
@bkrai 8 years ago
+divya damodaran A p-value of 0.05 means 95% (1 - 0.05 = 0.95) confidence in concluding the variable to be statistically significant.
@divyadamodaran53 8 years ago
Okay, thank you.
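This sketch links the p-values printed on a party::ctree plot to the `mincriterion` setting mentioned in other replies. According to party's `ctree_control` documentation, with the default p-value-based test a node is split only when 1 - p-value exceeds `mincriterion` (default 0.95). The split uses the variable with the smallest p-value. The base-R arithmetic below mimics that rule with made-up, already-adjusted p-values and does not call party itself. It also shows how a tree can end up with a single terminal node when no variable passes.

```r
# Hypothetical (already multiplicity-adjusted) p-values at one node
p_values     <- c(LB = 0.001, AC = 0.030, FM = 0.200)
mincriterion <- 0.95   # party's default

best       <- names(which.min(p_values))           # variable with the smallest p-value
split_here <- (1 - p_values[[best]]) > mincriterion
best
split_here

# A stricter setting requires stronger evidence before splitting
(1 - p_values[[best]]) > 0.99

# If even the best p-value is too large, the node is not split
p_weak <- c(LB = 0.08, AC = 0.30)
(1 - min(p_weak)) > mincriterion
```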
@raghul4457 6 years ago
Hi, can you explain how over-fitting occurs in a decision tree?
@bkrai 6 years ago
When terminal nodes have very small sample sizes, the decision tree model is likely to over-fit. Because of the small sample sizes, the decisions reached in those terminal nodes may not be very stable.
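To make this concrete, here is a sketch with rpart on the built-in iris data (not the video's data). It compares a tree allowed to grow leaves of a single observation with one that requires at least 20 observations per leaf. `minbucket` is rpart's name for the minimum terminal-node size. For party::ctree, the analogous settings sit in `ctree_control`. The helper `count_leaves` is defined here for illustration.

```r
library(rpart)

# Hypothetical helper: number of terminal nodes in an rpart tree
count_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Very permissive settings: single-observation leaves allowed, no complexity penalty
loose <- rpart(Species ~ ., data = iris, method = "class",
               control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))

# Require at least 20 observations in every terminal node
strict <- rpart(Species ~ ., data = iris, method = "class",
                control = rpart.control(minsplit = 40, minbucket = 20, cp = 0))

c(loose = count_leaves(loose), strict = count_leaves(strict))

# Smallest terminal-node size under the permissive settings
min(loose$frame$n[loose$frame$var == "<leaf>"])
```

The permissive tree fits the training data more closely through tiny leaves. Those are the unstable terminal-node decisions described above.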
@sriharshabsathreya 7 years ago
Sir, how can a decision tree be used for variable selection?
@bkrai 7 years ago
The importance of a variable in the tree is reflected by its position. For example, the one at the top of the tree is the most important.
@meghadabhade6967 7 years ago
Sir, I'm getting the following error when I try to execute #Misclassification error in testing data:
tab
@bkrai 7 years ago
You can use the following lines:
pred
@meghadabhade6967 7 years ago
It throws the error both within the same project and outside it. Why is that? It worked the first time, but now it's not working and throws an error.
@bkrai 7 years ago
+Megha Dabhade send me the entire code to look into.
@meghadabhade6967 7 years ago
I was using the wrong tree in the predict function, one with a different length. Now it's working fine.
Thanks for your guidance.
@bkrai 7 years ago
+Megha Dabhade 👍