Great video Dan, it was eye-opening! Thank you so much from NYC! Just one note: boosting and bagging methods are not just for tree-based ML systems and can be used with any ML method. However, they are much more popular with tree-based methods due to trees' fast training time and relatively straightforward application.
Hi sir, I am Zakiyah Fathima M. I am 12 years old. I used to watch your videos and Sundas ma'am's channel. My dream is to become a data scientist. I know the programming language Python.
Concerning the '# of positive reviews' feature: I have to assume there exists a subset of fraudulent sellers using bots/review farms to boost their number/ratio of positive reviews. If positive reviews are locally important for non-fraudulent true positives, I imagine this could potentially lead to a recall problem in our model. Thoughts?
PCA is a feature extraction technique. Feature selection techniques choose from the existing feature list, while extraction techniques create new features that capture the majority of the variance. Whatever the interviewee chose for feature selection seems good to me.
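A small sketch of that distinction, assuming scikit-learn: selection keeps a subset of the original columns unchanged, while PCA builds new components that are linear combinations of all of them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keeps 2 of the 4 original columns as-is.
selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: creates 2 brand-new components mixing all 4 features.
pca = PCA(n_components=2).fit(X)
extracted = pca.transform(X)

print(selected.shape, extracted.shape)  # both (150, 2)
print(round(pca.explained_variance_ratio_.sum(), 3))
```

Both outputs have two columns, but only the PCA components are "new" features; the explained-variance ratio shows how much of the original spread they retain.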
In bagging, we don't call the models weak learners. We use the term "weak learner" only in boosting, and specifically in AdaBoost, because it uses stumps for prediction rather than full trees; that's why we call AdaBoost's models weak learners.
Isn't the term "variance" in the first question better phrased as "precision"? I think that's the term we use in econometrics class. I would hate to be unable to answer a question well because of terminology.
Higher variance means more flexibility? In general, can't you look at variance the same way you look at overfitting? I.e., a model with very high variance will capture outliers and tend to overfit data that doesn't accurately represent the underlying phenomenon that produced it. In that case, wouldn't it make sense to say it does NOT correspond to more flexibility, since the higher variance means it is better suited ONLY to the training data? Just curious where my logic strays from the interviewer's. Thank you for posting this; it has been very informative!
I thought more flexibility (e.g., a neural net model is more flexible than linear regression) means more precision (aka lower variance) but risks overfitting.
@tuanseattle So the part we disagree on is the definition of variance. In my head, I was using variance as spread from the mean (in which case increasing precision captures strays from the mean, thus increasing variance), whereas you are using variance as the opposite of precision, i.e., deviation from the true data set. Nonetheless, what you say makes sense as well when looked at that way.
@Drewbie_T By flexible, Dan means the complexity of the model. The more complex the model is, i.e., the more its decision boundaries have been fit so that the model performs exceedingly well on the training data, the higher the chance that the model won't perform well on test data. This is the case of high variance and low bias.
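The point in this thread can be sketched with a toy example, assuming scikit-learn: a very flexible model (high-degree polynomial) hugs the training points, while a simpler one typically generalizes better on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

# Alternate points into train/test splits.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def fit_scores(degree):
    """Return (train R^2, test R^2) for a polynomial of the given degree."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train[:, None], y_train)
    return (model.score(x_train[:, None], y_train),
            model.score(x_test[:, None], y_test))

simple_train, simple_test = fit_scores(3)
flexible_train, flexible_test = fit_scores(12)

# The degree-12 model fits the training points more tightly (higher
# train R^2); that extra flexibility is what "high variance" refers to.
print(round(simple_train, 3), round(flexible_train, 3))
print(round(simple_test, 3), round(flexible_test, 3))
```

The train score can only improve as degree grows, but the test score usually degrades once the model starts chasing noise: that is the high-variance/overfitting regime.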
Hmm interesting
excellent mock
Good stuff!
Where do hyperparameters come into the decision boundary? What kind of intangible things are they cooking up on their own? God, please save us.
In classification we usually have a precision-recall tradeoff, right?
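Right, and the tradeoff is easy to see by sweeping the decision threshold. A sketch, assuming scikit-learn: lowering the threshold catches more positives (higher recall) at the cost of more false alarms (typically lower precision).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Imbalanced toy data: ~80% negatives, ~20% positives.
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X)[:, 1]

# Same model, different thresholds: precision and recall move in
# opposite directions as the threshold rises.
for t in (0.2, 0.5, 0.8):
    pred = (proba >= t).astype(int)
    print(t, round(precision_score(y, pred), 2), round(recall_score(y, pred), 2))
```

Recall is guaranteed to be non-increasing as the threshold rises, since the set of predicted positives only shrinks.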
I feel like the dude got lost in the sauce with the seller-based, listing-based type shit.
Is it just me, or would you rather do clustering to find labels, then classify?
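That pattern is sometimes used when no labels exist: cluster first, then train a supervised model on the cluster assignments as pseudo-labels. A minimal sketch, assuming scikit-learn:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Unlabeled toy data with three natural groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Step 1: discover pseudo-labels via clustering.
pseudo_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: train a classifier on the pseudo-labels so new points
# can be assigned cheaply without re-running the clustering.
clf = LogisticRegression(max_iter=1000).fit(X, pseudo_labels)
print(round(clf.score(X, pseudo_labels), 2))
```

The caveat for a fraud problem is that clusters need not align with the fraud/not-fraud distinction you actually care about, so the pseudo-labels would still need validation.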
Is this a typical interview for an L4 or L5 role?
Thanks very much! Learned a lot 🤗
interesting, thank you
Great mock interview and I believe it is pretty representative! Thanks for providing this!