Amazon Data Scientist Mock Interview - Fraud Model

  • Published: 28 Sep 2024
  • Science

Comments • 22

  • @hsoley
    @hsoley 2 years ago +12

    Great video, Dan, it was eye-opening! Thank you so much from NYC! Just one note: boosting and bagging methods are not just for tree-based ML systems and can be used with any ML method. However, they are much more popular with tree-based methods due to their fast training time and relatively straightforward application.
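    A minimal sketch of that note, assuming scikit-learn: both ensembles below use logistic regression, not trees, as the base learner (the dataset and settings are illustrative).

    ```python
    # Bagging and boosting with a non-tree base learner (illustrative sketch).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, random_state=0)
    base = LogisticRegression(max_iter=1000)

    # Bagging: many logistic regressions on bootstrap samples, votes aggregated.
    # (In scikit-learn < 1.2 the parameter is named base_estimator.)
    bagged = BaggingClassifier(estimator=base, n_estimators=25, random_state=0).fit(X, y)

    # Boosting: logistic regressions fit sequentially, with errors reweighted.
    boosted = AdaBoostClassifier(estimator=base, n_estimators=25, random_state=0).fit(X, y)
    print(bagged.score(X, y), boosted.score(X, y))
    ```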

  • @ashokjaiswal7384
    @ashokjaiswal7384 2 years ago +18

    Hmm interesting

  • @zakiyahfathimam.7786
    @zakiyahfathimam.7786 2 years ago +11

    Hi sir, I am Zakiyah Fathima M. I am 12 years old. I used to watch your videos and Sundas ma'am's channel. My dream is to become a data scientist. I know the programming language Python.

  • @rr00676
    @rr00676 2 years ago

    Concerning the '# of positive reviews' feature: I have to assume that there exists a subset of fraudulent sellers using bots/review farms to boost their number/ratio of positive reviews. If positive reviews are locally important for non-fraudulent true positives, I imagine this could potentially lead to a recall problem in our model. Thoughts?
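    One way to probe that worry, sketched below: compare recall overall against recall on the slice of sellers with unusually high positive-review counts. The column names (positive_reviews, is_fraud, pred) and the toy numbers are hypothetical.

    ```python
    # Slice-based recall check for the review-farming scenario (toy data).
    import pandas as pd
    from sklearn.metrics import recall_score

    df = pd.DataFrame({
        "positive_reviews": [5, 900, 12, 850, 3, 700, 8],  # hypothetical feature
        "is_fraud":         [0,   1,  0,   1, 0,   1, 1],  # ground truth
        "pred":             [0,   0,  0,   1, 0,   0, 1],  # model predictions
    })

    overall = recall_score(df.is_fraud, df.pred)               # 2 of 4 frauds caught
    farmed = df[df.positive_reviews > 500]                     # the suspect slice
    slice_recall = recall_score(farmed.is_fraud, farmed.pred)  # 1 of 3 caught
    print(overall, slice_recall)  # a gap here suggests farmed reviews evade the model
    ```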

  • @ozziejin
    @ozziejin 2 years ago

    excellent mock

  • @corymaklin7864
    @corymaklin7864 2 years ago

    Good stuff!

  • @MrMandarpriya
    @MrMandarpriya 1 year ago

    Where do hyperparameters come into the decision boundary? What kind of intangible things are they cooking up on their own? God, please save us.

  • @gpprudhvi
    @gpprudhvi 2 years ago +5

    PCA is a feature extraction technique. Feature selection techniques choose from the existing feature list; extraction techniques create new features which capture the majority of the variance. Whatever the interviewee chose for feature selection is good, I feel.
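    The selection-versus-extraction distinction in a minimal scikit-learn sketch (the data and k are illustrative):

    ```python
    # Feature selection keeps a subset of the original columns;
    # feature extraction (PCA) builds new components capturing most of the variance.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)  # 5 original features
    X_pca = PCA(n_components=5).fit_transform(X)             # 5 new linear combinations
    print(X_sel.shape, X_pca.shape)                          # both (500, 5)
    ```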

  • @shilashm5691
    @shilashm5691 2 years ago +2

    In classification we usually have a precision-recall tradeoff, right?
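    For reference, the tradeoff shows up when sweeping the decision threshold; a minimal sketch on synthetic data:

    ```python
    # Moving the decision threshold trades precision against recall (synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score

    X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
    proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

    for thresh in (0.3, 0.5, 0.7):
        pred = (proba >= thresh).astype(int)
        print(thresh,
              precision_score(y, pred, zero_division=0),  # tends to rise with threshold
              recall_score(y, pred))                      # tends to fall with threshold
    ```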

  • @yoyo-ue5pf
    @yoyo-ue5pf 6 months ago

    I feel like the dude got lost in the sauce with the seller-based, listing-based type shit.

  • @xEl_ence
    @xEl_ence 2 years ago

    Is it just me, or would you rather do clustering to find the labels, then classify?
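    What that pipeline could look like, sketched with scikit-learn; whether the clusters actually align with fraud versus non-fraud is a strong assumption:

    ```python
    # Cluster first to get pseudo-labels, then train a classifier on them.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, _ = make_classification(n_samples=1000, random_state=0)  # labels assumed unknown

    pseudo = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    clf = LogisticRegression(max_iter=1000).fit(X, pseudo)
    # Caveat: nothing guarantees the two clusters map to fraud / not-fraud;
    # the pseudo-labels would need auditing before replacing real labels.
    ```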

  • @aaronrasquinha
    @aaronrasquinha 1 year ago

    Is this a typical interview for an L4 or L5 role?

  • @shilashm5691
    @shilashm5691 2 years ago +3

    In bagging, we don't describe the models as weak learners. We use the term 'weak learners' only in boosting, and specifically in AdaBoost, because it uses only stumps for prediction, not full trees; that is why we call AdaBoost's models weak learners.
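    For reference, scikit-learn's AdaBoostClassifier does default to a depth-1 tree (a stump) as its weak learner; a minimal sketch making that explicit:

    ```python
    # AdaBoost over decision stumps (depth-1 trees), its classic weak learner.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)

    stumps = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # the stump
        n_estimators=50,
        random_state=0,
    ).fit(X, y)
    print(stumps.score(X, y))
    ```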

  • @mikekwabs5131
    @mikekwabs5131 2 years ago

    Thanks very much! Learned a lot 🤗

  • @EveGlowsTLS
    @EveGlowsTLS 2 years ago

    interesting, thank you

  • @danielxing1034
    @danielxing1034 2 years ago +2

    Great mock interview and I believe it is pretty representative! Thanks for providing this!

  • @tuanseattle
    @tuanseattle 2 years ago

    Isn't the term "variance" in the first question better phrased as "precision"? I think that's the term we use in econometrics class.
    I would hate to be unable to answer a question well because of terminology.

  • @Drewbie_T
    @Drewbie_T 2 years ago

    Higher variance means more flexibility? In general, can't you look at variance the same way you look at overfitting? I.e., a model with very high variance will capture outliers and tend to overfit data that doesn't accurately represent the underlying phenomenon that produced it. In that case, wouldn't it make sense to say it does NOT correspond to more flexibility, since the higher variance means it is better suited for ONLY the training data? Just curious where my logic strays from the interviewer's. Thank you for posting this; it has been very informative!

    • @tuanseattle
      @tuanseattle 2 years ago +1

      I thought more flexibility (e.g., a neural net model is more flexible than linear regression) means more precision (aka lower variance) but risks overfitting.

    • @Drewbie_T
      @Drewbie_T 2 years ago +1

      @@tuanseattle So the part we disagree on is the definition of variance. In my head, I was using variance as separation from the mean (in which case, increasing precision captures strays from the mean, thus increasing variance), whereas you are using variance as the opposite of precision, i.e., separation from the true data set, it seems. Nonetheless, what you say makes sense as well when looked at that way.

    • @bhujithmadav1481
      @bhujithmadav1481 8 months ago

      @Drewbie_T By "flexible", Dan means the complexity of the model. The more complex the model is, i.e. the decision boundaries have been fit so that the model performs exceedingly well on the training data, the higher the chance that the model will not perform well on the test data. This is the case of high variance and low bias.
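      A sketch of the flexibility/variance point this thread is circling, on synthetic data: as a tree's allowed depth grows, training accuracy climbs while test accuracy stalls or drops, and that widening gap is the high-variance (overfitting) regime.

      ```python
      # Model flexibility (tree depth) versus train/test accuracy (synthetic data).
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=2000, n_informative=5, flip_y=0.2,
                                 random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      for depth in (1, 3, 10, None):  # None = grow until leaves are pure (most flexible)
          tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
          print(depth,
                round(tree.score(X_tr, y_tr), 3),  # training accuracy keeps rising
                round(tree.score(X_te, y_te), 3))  # test accuracy flattens or falls
      ```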