All Machine Learning Beginner Mistakes explained in 17 Min

  • Published: Jan 23, 2025

Comments • 77

  • @divyanshpandey8355
    @divyanshpandey8355 28 days ago +25

    I was actually gonna watch this while having some pasta, but 2 minutes later I realised I need to get my notebook and pen ASAP! Golden Content my guy, pure GOLD.

  • @nikunjdeeep
    @nikunjdeeep 1 month ago +24

    first 3 mins and i added this to my liked section ....pure gem

  • @petersilie9702
    @petersilie9702 28 days ago +3

    As an amateur data Analyst/scientist, I think This is insanely useful Information. Thanks for Sharing

  • @prison9865
    @prison9865 1 month ago +7

    Seriously good refresher. I like this type of videos. Quick and to the point. Got job

  • @PotatoMan1491
    @PotatoMan1491 27 days ago +1

    I am just a new hobbyist, this content is awesome, I find it very helpful.

  • @michaelmcallister9519
    @michaelmcallister9519 29 days ago +1

    This is a good video, way better than others I’ve seen.

  • @diemilio
    @diemilio 1 hour ago

    Thank you. Very helpful.

  • @anatolsonntag857
    @anatolsonntag857 1 month ago +1

    You have high quality videos.
    If you keep up with those you will be very successful.
    Keep up the good work.
    I'll bet you'll achieve 100k subscribers in 6 months.

  • @olavictor6286
    @olavictor6286 1 month ago

    Your video is top notch as always, just diving into the world of ML

  • @Riku-pv5dc
    @Riku-pv5dc 1 month ago

    Great and informative video! I'm sure I'll rewatch it many times.
    I started learning programming just a month and a half ago (through Udemy courses), and I'm already building my first dataset on EV chargers installed in Europe (I have a dataframe with over a million parameters!). Once I clean it up, I'll move on to running ML algorithms on it.
    Thank you for the effort you're investing in future generations, Mr. Nobody!

    • @Riku-pv5dc
      @Riku-pv5dc 1 month ago

      During my university years (I'm finishing my degree in Engineering Management this semester), I studied statistics, linear algebra, calculus (ended somewhere around Hessian matrices), and optimization over the past 2-3 years. It feels like a dream come true to now apply these concepts in a programming environment, which I previously only worked on theoretically with pen and paper.

  • @ardgeorge4175
    @ardgeorge4175 1 month ago

    Common sense, but very good refresher. Thanks!

  • @gnkartha
    @gnkartha 1 month ago +1

    Great summary!

  • @jomban707
    @jomban707 10 days ago +1

    Would you say that shuffling is needed for the testing and validation sets?

  • @newbie8051
    @newbie8051 1 month ago

    2:20 ahaaaa, had this asked in an interview as well
    6:07 read about annealing learning rates, will try to implement that as well

    • @vamsikrishna6968
      @vamsikrishna6968 24 days ago

      hey, could you explain the data leakage part to me?

  • @bitcode_
    @bitcode_ 1 month ago

    Amazing content, thank you for helping out a noob

  • @notVinodRodrigo
    @notVinodRodrigo 1 month ago

    Wow man! This is gold!

  • @eduardo.z6909
    @eduardo.z6909 1 month ago +1

    Thanks for video!

  • @RolandoLopezNieto
    @RolandoLopezNieto 1 month ago +1

    Great content, thanks

  • @alin50248
    @alin50248 1 month ago +1

    Very good content!

  • @Blooper1980
    @Blooper1980 1 month ago +6

    THIS IS AWESOME!

  • @prison9865
    @prison9865 1 month ago

    actually, could you make an explanation when to do scaling of factors? is it needed only for distance based algorithms and how do you deploy a model if you did the scaling?
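
On the deployment half of this question, a common pattern is to learn the scaling parameters from the training data only, store them next to the model, and reuse them unchanged at prediction time. The sketch below is a minimal standard-library illustration of that idea, not the video's method; `fit_scaler`, `transform`, and the toy numbers are hypothetical names and data chosen for the example.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Learn per-column mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, scaler):
    """Standardize rows with parameters learned earlier (never refit on new data)."""
    means, stds = scaler
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy data: two features on very different scales.
train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
new_rows = [[4.0, 400.0]]  # stands in for data seen after deployment

scaler = fit_scaler(train)                 # fit on training data only
train_scaled = transform(train, scaler)
new_scaled = transform(new_rows, scaler)   # reuse the stored parameters
```

Saving `scaler` together with the trained model means incoming rows are standardized exactly as the training rows were. Fitting the scaler on the full dataset before splitting would let test-set statistics leak into training.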

  • @ramayuda6578
    @ramayuda6578 1 month ago +5

    What about data created dishonestly? Basically, I’m not an IT programmer, but I’m learning data science. As a practitioner, I’ve occasionally created or reported dishonest data. I think, as a human, others might do the same. Can this affect the accuracy of the model in general?

    • @joshuadavid6810
      @joshuadavid6810 1 month ago +4

      Yeah definitely. If the data is wrong, no model can save it.

    • @kyleebrahim8061
      @kyleebrahim8061 22 days ago

      This is where metrics can't reach, as mentioned in the video, domain knowledge is essential because it will most likely tell you why your model performs well in training and testing but fails in the real world. Somewhere, somehow there is always an answer to lies.

  • @bathalamallikarjuna2316
    @bathalamallikarjuna2316 1 month ago

    Absolutely fantastic

  • @hellblazer7
    @hellblazer7 28 days ago +3

    This video is insane. It's so good that it should be included in any ml based academic book as a synopsis.

  • @hanyanglee9018
    @hanyanglee9018 29 days ago

    Version control and docs are very good. I never shuffle data.

  • @ozgurdenizcelik
    @ozgurdenizcelik 1 month ago

    It's a lot to take notes on from one session. I believe you have more detailed videos on your channel; I'll check them out later. Thanks

  • @vamsikrishna6968
    @vamsikrishna6968 24 days ago

    Hey, thank you. Your videos are very informative. However, I have recently started studying ML and have a few questions.
    1. Can you tell me what does "sample" mean?
    2. How many samples are needed minimum for considering a DL approach? Is there any criteria as such? I remember you showing scikit learn ML chart like a decision tree on what model to select based on the number of samples we have.
    Thanks in advance.

  • @Glorytroly9092
    @Glorytroly9092 1 month ago

    Hey, please do the same videos for statistics and other concepts which are foundation ...

  • @Glorytroly9092
    @Glorytroly9092 1 month ago

    Will you create similar content for statistics concepts like your previous videos ?

  • @zub3rahmed76
    @zub3rahmed76 15 days ago

    Can we have similar video for Deep learning as well.

  • @damori3604
    @damori3604 29 days ago +5

    I don't understand anything but I'm still hooked because my brain tells me it's helpful

  • @TinkerRaw
    @TinkerRaw 27 days ago +1

    I submit an abstract on the 7th featuring my first ever ML work in my field, i’ve been very nervous about making simple errors or not presenting the research in a way that ML people would feel satisfied by. This was super helpful, thank you

  • @klane3514
    @klane3514 1 month ago

    well, I would love to do hyperparameter optimization and use cross validation but each epoch is 16h and we need to publish a paper so :(

  • @tonycincera3353
    @tonycincera3353 27 days ago +1

    Having recently retired after having worked as a Data Scientist for over 3 decades, this is a very, very good summary of the issues and fixes not just for ML but for any predictive modeling project.

  • @abhishekdas8807
    @abhishekdas8807 1 month ago

    This is good stuff

  • @abhishekkurup
    @abhishekkurup 3 days ago

    Damn this is good

  • @CAGonRiv
    @CAGonRiv 1 day ago

    Hey Tim. I love you

  • @maitreyimandal8910
    @maitreyimandal8910 1 month ago

    Very good

  • @anrichvanderwalt1108
    @anrichvanderwalt1108 28 days ago

    Great video! All the lessons I had to learn the hard way in my first 2 years!

  • @emont
    @emont 26 days ago

    Now I understand why Splunk makes sense in the AI world: instead of creating AI, you must first make sure you have a better dataset.

  • @Onlyhas99
    @Onlyhas99 1 month ago

    instant subscribe

  • @narangfamily7668
    @narangfamily7668 1 month ago +2

    Thanks! I try to avoid most of these

  • @mogaolimpiu7190
    @mogaolimpiu7190 25 days ago

    This is great. It's something I wish I had when I was banging my head against the wall because my models weren't behaving how I thought they would. This video mentions all the issues I spent weeks working on, and more! Great tool.

  • @piggybox
    @piggybox 28 days ago

    This is years of experience and lessons learnt for "beginner"

  • @cycla
    @cycla 29 days ago

    I just love these types of organized information, it's the data scientist's way

  • @NilasScweisthal
    @NilasScweisthal 1 month ago +1

    This video is a perfect checklist😂. Thanks🙏

  • @andrelim2428
    @andrelim2428 27 days ago

    thank you for creating this video! quick question... re not shuffling data 9:58 for time series data, wouldn't shuffling data introduce train / test set contamination? also, wouldn't the order be important for time series data and shuffling it ruin the time arrow? thank you!
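
The concern raised here is commonly handled by splitting time-ordered data chronologically instead of shuffling it, so that every training observation precedes every test observation. The sketch below assumes each row is a hypothetical `(timestamp, value)` pair; `chronological_split` is an illustrative helper, not something shown in the video.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so the test set is strictly later than the training set."""
    # Sort by timestamp (first element of each row) instead of shuffling.
    ordered = sorted(rows, key=lambda row: row[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Hypothetical toy series that arrives out of order.
series = [(t, t * 1.5) for t in [5, 1, 4, 2, 3, 7, 6, 10, 9, 8]]
train, test = chronological_split(series)

# No training row is later than any test row, so the model never sees the future.
latest_train = max(t for t, _ in train)
earliest_test = min(t for t, _ in test)
```

A random shuffle before splitting would mix later observations into the training set, which is the contamination described in the question.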

  • @alexandrodisla6285
    @alexandrodisla6285 29 days ago

    Ohhh, there is such a thing as model validation.

  • @bluesbucker70
    @bluesbucker70 29 days ago +6

    Ignoring domain knowledge is the worst of these by far. If you don't understand the domain, you will generate trivial, weak or useless solutions even if you do everything else right.

  • @Nino21370
    @Nino21370 1 month ago

    🔥

  • @Qris_7711
    @Qris_7711 1 month ago +1

    Non-stationary data is being missed

    • @acasualviewer5861
      @acasualviewer5861 1 month ago

      what do you mean by this? Can you explain?

    • @FedeAlbertini
      @FedeAlbertini 1 month ago

      @acasualviewer5861 Essentially, unless you have a degree in stats or maths, avoid time series data

    • @FedeAlbertini
      @FedeAlbertini 1 month ago

      @acasualviewer5861 Stationarity is a property of some time series data. It essentially ensures that the distribution out of which the time series data is generated does not change over time (this is for strict stationarity; for weak stationarity, only the first two moments and the autocovariance need to stay the same when analysing two time points that are h time steps apart). But yeah, unless you know what you are doing, stay away from time series
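
The weak-stationarity condition described in this reply is usually written as the three requirements below. The symbols $\mu$, $\sigma^2$, and $\gamma$ are notation introduced here for illustration and do not come from the thread.

```latex
\begin{align}
  \mathbb{E}[X_t] &= \mu && \text{for all } t, \\
  \operatorname{Var}(X_t) &= \sigma^2 < \infty && \text{for all } t, \\
  \operatorname{Cov}(X_t, X_{t+h}) &= \gamma(h) && \text{for all } t \text{ and each lag } h.
\end{align}
```

In words: the mean and variance do not drift over time, and the covariance between two points depends only on how far apart they are, not on where in the series they sit.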

    • @acasualviewer5861
      @acasualviewer5861 1 month ago

      @FedeAlbertini you mean like temperatures tend to be cooler in the winter vs the summer? So you could say it's non-stationary?

    • @FedeAlbertini
      @FedeAlbertini 1 month ago

      @acasualviewer5861 yes, temperatures are non-stationary. They have a trend (global warming) and seasonal components. Most processes are in fact non-stationary, but pretty much all of the time series modelling techniques assume stationarity. Therefore, to model correctly, you need to know how to turn non-stationary processes into stationary ones. Common techniques are differencing, detrending and log transformations.
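
Two of the techniques named in this reply, differencing and log transformation, can be sketched with the standard library alone. The series below are made-up toy data, and `difference` is a hypothetical helper written for this illustration.

```python
import math


def difference(series, lag=1):
    """Return series[i] - series[i - lag]; lag=1 removes a linear trend,
    a larger lag (for example 12 on monthly data) targets a seasonal pattern."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Toy series with a linear trend: first differences come out constant.
trend = [10, 12, 14, 16, 18]
diffed = difference(trend)

# Toy series growing about 10% per step: take logs first, then difference,
# which turns multiplicative growth into a roughly constant step.
growth = [100.0, 110.0, 121.0, 133.1]
log_diffed = difference([math.log(x) for x in growth])
```

Here `diffed` is a flat sequence of 2s, and every entry of `log_diffed` is close to `math.log(1.1)`, so both transformed series have a stable level even though the originals do not.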

  • @carlosandrescastromarin7775
    @carlosandrescastromarin7775 28 days ago +5

    I stopped the video right away when SMOTE was suggested as a solution to class imbalance.

    • @jamestriveri304
      @jamestriveri304 27 days ago

      Same!

    • @somnath3986
      @somnath3986 27 days ago +1

      Why? is it not the actual solution

    • @carlosandrescastromarin7775
      @carlosandrescastromarin7775 26 days ago +2

      @somnath3986 SMOTE is more likely to create synthetic samples of the majority class instead of the minority. To address this issue, under-sampling is preferred to oversampling. However, class imbalance is not a problem but usually the nature of the data, so the best solution would be to actually use a loss function that penalizes the majority class.

    • @kyleebrahim8061
      @kyleebrahim8061 22 days ago

      that's not a very good approach, neither to yourself nor aspiring data scientists, SMOTE is more likely to fail in creating synthetic samples in the minority class if the minority class already contains noisy or imbalanced data, this is not the fault of the technique. Instead, what you should suggest are variants of SMOTE that have been developed to address limitations in this technique. There just simply can't be a preferred sampling method to every class imbalance problem and you should know that if you understand how broad the application of ML is in every industry.
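
One common reading of the loss-function suggestion in this thread is class weighting: errors on the rare class are made to cost more than errors on the common class, with no resampling at all. The sketch below is a minimal standard-library illustration under that assumption. The weight formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`; the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Mean binary cross-entropy where each example is scaled by its class weight.

    probs holds the predicted probability that the label equals 1.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip so log() never receives 0
        loss = -math.log(p) if y == 1 else -math.log(1 - p)
        total += weights[y] * loss
    return total / len(labels)


# Hypothetical 9:1 imbalanced labels.
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)

# An equally confident mistake costs more on the minority class.
minority_miss = weighted_log_loss([1], [0.1], weights)
majority_miss = weighted_log_loss([0], [0.9], weights)
```

With these labels the minority class gets weight 5.0 and the majority class about 0.56, so `minority_miss` is roughly nine times `majority_miss`. Whether weighting, under-sampling, or a SMOTE variant works best depends on the dataset, as the replies above point out.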

  • @init_yeah
    @init_yeah 1 month ago +1

    Duo

  • @pressspacetostop
    @pressspacetostop 5 days ago

    hmm this video is too hard for me someone reply to me in a year reminding me to come back here please thanks

  • @__krossell__
    @__krossell__ 29 days ago

    Using SMOTE is a beginner mistake

  • @Itz_Akashi.134
    @Itz_Akashi.134 1 month ago +4

    First 😂

  • @jamesrosicky2912
    @jamesrosicky2912 27 days ago +1

    Amazing, thanks!