Rachit Toshniwal
  • 77 videos
  • 424,000 views
Removing constant & Quasi constant features using Variance Threshold | Machine Learning
#variancethreshold #constant #quasiconstant
In this video, we will look at how we can effortlessly remove constant and quasi constant features from our datasets and make them leaner and more robust, using scikit-learn's VarianceThreshold implementation.
I've uploaded all the relevant code and datasets used here (and all other tutorials for that matter) on my github page which is accessible here:
Link:
github.com/rachittoshniwal/machineLearning
If you like my content, please do not forget to upvote this video and subscribe to my channel.
If you have any qualms regarding any of the content here, please feel free to comment below and I'll be happy to assist you in whatever capacity possible.
Thank ...
Views: 3,142
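Here is a minimal sketch of the idea in the description above. It assumes scikit-learn and NumPy are installed. The toy matrix and the 0.1 cutoff for "quasi-constant" are illustrative assumptions, not values taken from the video or the linked repository.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: 10 rows x 3 features
#   column 0: constant (always 1)                -> variance 0.0
#   column 1: quasi-constant (nine 0s and one 1) -> variance 0.1 * 0.9 = 0.09
#   column 2: well spread (0, 1, ..., 9)         -> variance 8.25
X = np.column_stack([
    np.ones(10),
    np.array([0] * 9 + [1]),
    np.arange(10),
]).astype(float)

# threshold=0.0 (the default) removes only features whose variance is zero
constant_filter = VarianceThreshold(threshold=0.0)
X_no_constant = constant_filter.fit_transform(X)

# A small positive threshold also removes quasi-constant features:
# a feature is kept only if its variance exceeds the threshold
quasi_filter = VarianceThreshold(threshold=0.1)
X_lean = quasi_filter.fit_transform(X)

print(constant_filter.get_support())  # [False  True  True]
print(X_lean.shape)                   # (10, 1)
```

Where to draw the quasi-constant line is a judgment call, so it is worth inspecting `variances_` before picking a threshold. Variance also depends on each feature's scale, so the same threshold means different things for columns measured in different units.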

Videos

Principal Component Analysis (PCA) Intuition | Machine Learning
923 views • 3 years ago
#pca #machinelearning #intuition In this video, we'll look at the WHAT and the HOW of PCA. Thanks!
Machine Learning Project | Credit Risk Analysis | Learning Curves | Overfitting | Python
21K views • 3 years ago
#machinelearning #python #project In this video, we will look at a Machine Learning project that will try to predict whether someone will get their loan sanctioned or not. We will use a Randomized Search to find the optimal set of parameters. We will then use precision-recall curves and learning curves to assess the model's performance. We will rectify the case of overfitting in the model and make am...
Machine Learning Project | Predicting Student Marks | Python
8K views • 3 years ago
#ml #project #python In this video, we will make a quick and dirty ML model to predict the marks of a student. We will do some basic EDA, then use Column Transformers and Pipelines to make the model, use the GridSearchCV to find the best performing model and then save it using joblib. The link to the data and the notebook can be found here: github.com/rachittoshniwal/ML-projects/ Hope it helps!
How to tune hyper parameters using Grid Search CV | With and without a Pipeline | Machine Learning
4.9K views • 3 years ago
#gridsearch #machine #learning #python In this tutorial, we'll look at Grid Search CV, a technique by which we can find the optimal set of hyper-parameters and fine-tune our ML model to make a better model. Table of contents: 0:00 Intro 1:08 Randomized Search CV 1:52 Python code for Grid Search CV 3:20 Without a pipeline 8:21 With a pipeline I've uploaded all the relevant code and datasets used...
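The sketch below shows Grid Search CV with and without a pipeline. It assumes scikit-learn. The iris dataset, the SVC estimator and the grid values are stand-ins chosen for illustration, not the ones used in the video.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Without a pipeline: grid keys are the estimator's own parameter names
plain_search = GridSearchCV(
    SVC(), param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5
)
plain_search.fit(X_train, y_train)

# With a pipeline: keys are "<step name>__<parameter>", and the scaler
# is re-fit on the training part of every CV fold
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
pipe_search = GridSearchCV(
    pipe, param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}, cv=5
)
pipe_search.fit(X_train, y_train)

print(plain_search.best_params_, plain_search.best_score_)
print(pipe_search.best_params_, pipe_search.score(X_test, y_test))
```

Putting the scaler inside the pipeline keeps each validation fold out of the preprocessing step. Scaling the full training set before the search would leak information into the CV scores.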
Churn Modeling Tableau Project for beginners
21K views • 3 years ago
#tableau #project #beginners In this video, we'll build a simple Tableau dashboard for analyzing customer churn at a bank. We'll use filters, parameters, histograms, dashboard actions, and some formatting to prettify our viz. Table of contents: 0:01 Final dashboard 2:51 Making the sheets 16:08 Making the dashboard 22:50 Dashboard actions You can access the workbooks from my tableau public prof...
How to create and use groups to clean data and create higher level dimensions | Tableau Series
113 views • 3 years ago
#tableau #groups In this video, we'll look at how to create and use groups in Tableau in two different scenarios: 0:01 Using groups for handling messy data 4:23 Using groups for creating higher level "positions" dimension You can access the workbooks from my tableau public profile: public.tableau.com/profile/rachit.toshniwal#!/ Link for all the relevant materials used in these videos: github.co...
How to split different records having same values in a column in Tableau | Tableau Series
563 views • 3 years ago
#tableau In this video, we'll look at how to systematically organize fields into folders in Tableau. You can access the workbook from my tableau public profile: public.tableau.com/profile/rachit.toshniwal#!/ Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvoting this video! Thank y...
How to systematically organize fields into folders | Tableau Series
87 views • 3 years ago
#tableau #folders #organize In this video, we'll look at how to systematically organize fields into folders in Tableau. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvoting this video! Thank you for watching!
How to create hierarchies in Tableau | Tableau Series
141 views • 3 years ago
#tableau #hierarchies In this video, we'll look at how to create hierarchies in Tableau. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvoting this video! Thank you for watching!
Using the replace function in tableau to clean messy columns | Tableau Series
4.4K views • 3 years ago
#tableau #replace #function In this video, we'll look at how to use the in-built "replace" function in tableau to clean messy data. 0:01 Replace function in a calculated field 3:33 Exercises for solidifying concepts 4:17 Solutions for the exercises 7:08 Hiding the unwanted columns Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video...
Using the split function in a calculated field to clean messy data | Tableau Series
460 views • 3 years ago
#tableau #split #calculated #field In this video, we'll look at how to use the in-built split function in Tableau to clean messy data. 0:17 Problems with using auto and custom split for non-uniform columns 1:55 Using split in a calculated field Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my c...
Using auto and custom split in Tableau to split columns | Tableau Series
385 views • 3 years ago
#tableau #split #custom #auto In this video, we'll look at how to use the auto and custom split methods in Tableau to split string columns into n-number of different columns. 0:18 What is splitting and how does it happen? 1:41 Custom split 4:20 Auto split 6:49 Where auto split fails Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this vid...
Editing the metadata in Tableau | Tableau Series
424 views • 3 years ago
#tableau #editing #metadata In this video, we'll look at how to edit the metadata in Tableau to make the data ready for analysis and for drawing inferences. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvoting this video! Thank you for watching!
Data types in Tableau | Numeric, String, Geographic, Boolean, Date, Date & Time | Tableau Series
338 views • 3 years ago
#tableau #data #types In this video, we'll look at the different data types in Tableau. Link for all the relevant materials used in these videos: github.com/rachittoshniwal/tableau If you like this video, please consider subscribing to my channel and upvoting this video! Thank you for watching!
When to add a new connection vs a Data source in Tableau | Tableau Series
1.8K views • 3 years ago
Unions in Tableau | How to use a wildcard for unions in Tableau | Tableau Series
781 views • 3 years ago
Blending multiple distinct data sources in Tableau | Tableau Series
282 views • 3 years ago
Joins in Tableau | Inner, Outer, Left and Right Joins | Physical and Logical Layer | Tableau Series
1.5K views • 3 years ago
Relationships - The new Tableau data model | Understanding the Performance Options | Tableau Series
592 views • 3 years ago
Connecting to a data source in Tableau | Different types of connections | Tableau Series
226 views • 3 years ago
Downloading Tableau | Tableau Desktop or Tableau Public? Advantages and limitations | Tableau Series
427 views • 3 years ago
(Code) Iterative Imputer | MICE Imputer in Python | Machine Learning
14K views • 3 years ago
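Here is a minimal sketch of multivariate (MICE-style) imputation using scikit-learn's `IterativeImputer`. The age/experience/salary values are made up for illustration.

```python
import numpy as np
# IterativeImputer is still flagged as experimental, so this enabling import must come first
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Hypothetical columns: age, experience (years), salary; np.nan marks missing entries
X = np.array([
    [25.0, 2.0, 50000.0],
    [np.nan, 5.0, 65000.0],
    [40.0, np.nan, 90000.0],
    [35.0, 10.0, np.nan],
    [50.0, 20.0, 120000.0],
    [30.0, 6.0, 70000.0],
])

# Each feature with missing values is regressed on the other features in a
# round-robin fashion, for up to max_iter rounds (BayesianRidge is also the default)
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
print(np.round(X_filled, 1))
```

`IterativeImputer` returns a single imputed dataset. R's `mice` package produces several imputed datasets by default. The scikit-learn documentation describes one way to get multiple imputations: fit the imputer repeatedly with `sample_posterior=True` and a different `random_state` each time.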
(Code) What is Winsorization | Using percentiles for capping outliers in Python | Machine Learning
6K views • 3 years ago
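Winsorization with percentiles takes only a few lines of NumPy. In the sketch below, the 10th/90th percentile cutoffs and the sample array are assumptions for illustration; 5th/95th and 1st/99th are also common choices.

```python
import numpy as np

def winsorize(values, lower_pct=5, upper_pct=95):
    """Cap values below the lower percentile and above the upper percentile."""
    arr = np.asarray(values, dtype=float)
    # np.percentile interpolates linearly between ranks by default
    low, high = np.percentile(arr, [lower_pct, upper_pct])
    # Values outside [low, high] are pulled to the boundary instead of being dropped
    return np.clip(arr, low, high)

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
capped = winsorize(data, lower_pct=10, upper_pct=90)
print(capped)
```

Unlike trimming, winsorization keeps every row. Only the extreme values change.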
(Code) Trimming outliers using the IQR method | Machine Learning
2.2K views • 3 years ago
(Code) Capping outliers using the IQR method | Machine Learning
6K views • 3 years ago
Using IQR for handling outliers | Calculating Percentiles | Inner & Outer Fences | Machine Learning
589 views • 3 years ago
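The IQR fences can be computed with NumPy as shown below, which covers both trimming and capping. The sample data is hypothetical. 1.5 × IQR gives the inner fences and 3 × IQR gives the outer fences. Note that `np.percentile` interpolates linearly by default, so quartiles computed by hand may differ slightly depending on the method used.

```python
import numpy as np

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100, 12, 14, 13])

low, high = iqr_fences(data)                     # inner fences (1.5 * IQR)
outer_low, outer_high = iqr_fences(data, k=3.0)  # outer fences (3 * IQR)

# Trimming: drop every value outside the inner fences
trimmed = data[(data >= low) & (data <= high)]

# Capping: keep every row, but pull extreme values back to the nearest fence
capped = np.clip(data, low, high)

print(low, high, outer_low, outer_high)
print(trimmed)
print(capped)
```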
(Code) Trimming outliers using the Z-score method | Machine Learning
997 views • 3 years ago
(Code) Capping outliers using the Z-score method | Machine Learning
1.8K views • 3 years ago
When to and when NOT to use Z-scores for handling outliers | Machine Learning
751 views • 3 years ago
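Capping and trimming with Z-scores can be sketched as follows, assuming a cutoff of |z| > 3 and made-up data. One caveat applies: the outlier itself inflates the mean and standard deviation. The rule also implicitly assumes a roughly normal distribution, so it can miss outliers in small or skewed samples.

```python
import numpy as np

def zscores(values):
    # Population standard deviation (ddof=0), which is numpy's default for std()
    return (values - values.mean()) / values.std()

data = np.array([9, 10, 11] * 6 + [10, 50], dtype=float)
threshold = 3.0
z = zscores(data)

# Trimming: keep only rows whose |z| is within the threshold
trimmed = data[np.abs(z) <= threshold]

# Capping: clip values to mean +/- threshold * std instead of dropping them
low = data.mean() - threshold * data.std()
high = data.mean() + threshold * data.std()
capped = np.clip(data, low, high)

print(np.round(z, 2))
print(trimmed.size, capped.max())
```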

Comments

  • @user-yq8jp6bc5u
    @user-yq8jp6bc5u 11 days ago

    Thank you so much Rachit, video is really awesome

  • @fightsatan2408
    @fightsatan2408 15 days ago

  • @venkyvenky4715
    @venkyvenky4715 17 days ago

    But you can do get_dummies before train_test_split.

  • @MRahdianEgakurnia
    @MRahdianEgakurnia 1 month ago

    I have data that is already scaled with PowerTransformer. Can you really scale new data outside the scaled data, using the scaled data as the standard via fit? I've tried this, and the new data doesn't seem to match the scaled data. Thank you.

  • @DataAnalystVictoria
    @DataAnalystVictoria 1 month ago

    Thanks! ❤

  • @umutg.8383
    @umutg.8383 2 months ago

    MICE part is good but the missingness definitions are all wrong.

  • @rishi1901
    @rishi1901 2 months ago

    Excellent demonstration! I really appreciate your efforts. Very helpful for me as a beginner.

  • @wtfashokjr
    @wtfashokjr 2 months ago

    Why is pd.get_dummies not working for me?

  • @nishantwhig7206
    @nishantwhig7206 2 months ago

    Very clearly explained. Thank you.

  • @SodaPy_dot_com
    @SodaPy_dot_com 2 months ago

    Very detailed with the parameters. Love it.

  • @DhirajSahu-ct1jp
    @DhirajSahu-ct1jp 2 months ago

    Thank you so much!!

  • @r.s.572
    @r.s.572 3 months ago

    thank you for explaining this! :) poor PhDs are thankful for people like you who use their free time to do such videos!

  • @KA00_7
    @KA00_7 3 months ago

    in-depth and best explanation video

  • @KA00_7
    @KA00_7 3 months ago

    learned something new today. Thank you so much

  • @rishidixit7939
    @rishidixit7939 3 months ago

    If I want to use SimpleImputer on two different columns, but with a different strategy for each column, what should I do?

  • @preethirathod6751
    @preethirathod6751 3 months ago

    You have explained so clearly

  • @deeptimittal4552
    @deeptimittal4552 3 months ago

    Wow, now I completely understand pivoting. I was struggling to get the concepts; now it's all clear. Thank you, Rachit.

  • @longtuan1615
    @longtuan1615 4 months ago

    That's the best video I've seen! Thank you so much. But in this video, the "purchased" column is ignored because it is fully observed. So what happens if missing values are present only in the "age" column? I mean, if "experience", "salary" and "purchased" are all fully observed and, for the same reason, we ignore them, we are left with only the "age" column, which can't be imputed with regression. Please help me!

  • @cadeepakgoyal7500
    @cadeepakgoyal7500 5 months ago

    thanks a lot. really helpful

  • @jahnavinama8534
    @jahnavinama8534 5 months ago

    Good explanation, bro. I watched 5 or 6 videos about the split method, but yours is the only one that helped me.

  • @DrizzyJ77
    @DrizzyJ77 5 months ago

    Thanks! Needed a clear explanation for my missed class😅

  • @dinushachathuranga7657
    @dinushachathuranga7657 6 months ago

    Bunch of thanks for the clear explanation❤

  • @philcrom6299
    @philcrom6299 6 months ago

    Wow, that really helped me in evaluating my master's thesis!!!

  • @MrTau123
    @MrTau123 7 months ago

    Top-notch ("ek number").

  • @exanessa1234
    @exanessa1234 7 months ago

    How imbalanced does the data need to be before AUPRC is preferred?

  • @focus72343
    @focus72343 7 months ago

    very simple explanation, thank you and subscribed!

  • @ItzLaltoo
    @ItzLaltoo 8 months ago

    Hey, the video was very helpful. Can anyone explain this to me? When implementing MICE in RStudio, we get two columns, Iteration and Imputation; how can we connect that with this video? In RStudio, for each iteration we get 5 imputed datasets (by default), but in this video we only get one dataset per iteration. It would be really helpful if anyone could explain this. Thanks in advance.

  • @cuoivelo8360
    @cuoivelo8360 9 months ago

    Can you turn on subtitles for the videos? I'm bad at understanding spoken English.

  • @anonymeironikerin2839
    @anonymeironikerin2839 9 months ago

    Thank you very much for this great explanation.

  • @shoaibahmed5848
    @shoaibahmed5848 9 months ago

    What about the missing values in row 1 and row 4? Is it necessary to fill those values?

  • @ShubhamKumar-xy6kj
    @ShubhamKumar-xy6kj 10 months ago

    Great video bro...

  • @roshantonge1952
    @roshantonge1952 10 months ago

    very good video

  • @modhua4497
    @modhua4497 10 months ago

    Thanks! Do you have an example of how to incorporate a LOG or SQRT transformation of features before modeling?

  • @zk321
    @zk321 10 months ago

    The word “namaste" in Sanskrit means “bowing to you". Muslims believe that one can bow/prostrate only to Allah. We don't bow down to any human. It's important to note that religious beliefs and practices can vary among individuals and communities.

  • @ubaidghante8604
    @ubaidghante8604 10 months ago

    Brother found some specific examples to explain MAR and MNAR 😅

  • @martinngobye3574
    @martinngobye3574 11 months ago

    Great explanation of ColumnTransformer and Pipeline. However, how do you get the DataFrame column names back instead of numbers? Thank you!!

  • @ishikaagarwal6945
    @ishikaagarwal6945 11 months ago

    Nicely explained

  • @osoriomatucurane9511
    @osoriomatucurane9511 11 months ago

    Namaste! Awesome, Sir. I must admit this is by far the best tutorial on pandas groupby I have ever come across. Keep it up.

  • @bellatrixlestrange9057
    @bellatrixlestrange9057 11 months ago

    best explanation!!!

  • @mathewfernand8460
    @mathewfernand8460 11 months ago

    Sir, how can I get the dataset?

  • @meenatyagi9740
    @meenatyagi9740 1 year ago

    Very good explanation. I was struggling to get clarity on it.

  • @skyrayzor3693
    @skyrayzor3693 1 year ago

    This tutorial is awesome!!

  • @subtlehyperbole4362
    @subtlehyperbole4362 1 year ago

    (Note: this is not an issue specific to your video, but something I have been confused by for a long time; this is just the first time I decided to stop and try to ask about it in the comment section.) It seems like it should be necessary (or, if not necessary, at least useful) to tell the model which column each imputed-data indicator is indicating for, right? But in the final dataset that you produce, the imputed-data indicators are all bunched up as the first four columns. How does the model know A) that these features are imputed-data indicators and, more importantly, B) which of the remaining 93 columns in the dataset each one is supposed to be for? They could be indicating for columns 5, 6, 7, and 8, or for columns 45, 72, 8, and 92, or any other combination of the remaining 93 feature columns. How does this not affect how the model trains? My brain is thinking that the algorithm possibly susses that out on its own... but I don't understand how or why it can do that. Am I making much ado about nothing here?

    • @rachittoshniwal
      @rachittoshniwal 1 year ago

      At the very core of things, the computer only understands 0s and 1s, haha. For human interpretability, yes, it might be necessary to label the columns to see what a specific missing-indicator column is for, but for a computer it doesn't matter. The column headers are just for us; the model only cares about the data being in a 2D array format. To your points: A) the model doesn't know that a particular column is indicating missing values in some other column; it only cares about the values in it. B) Again, the model doesn't care what each of the other 93-odd columns stands for; it only looks at their values. You can shuffle the column ordering or pass in X.values instead of a DataFrame X to the model, and it will not affect performance.

    • @subtlehyperbole4362
      @subtlehyperbole4362 1 year ago

      @@rachittoshniwal Yeah, I understand that, but the 93 columns are all features in and of themselves; the 4 imputation indicators aren't really features of the data in the same underlying way, right? They are more like features about other features, not features about the event whose label the model is trying to learn. It seems like the imputation indicator's entire utility is essentially to point at a single data point in another column and say "don't take this data point too seriously because it was made up". Wouldn't the entire weighting system need to treat those types of columns differently? It feels like it would be problematic (or at least not useful) to treat those columns as if they were just additional feature columns, like any of the existing 93 features. (I guess it depends on the particular algorithm; I imagine decision-tree-based algorithms would probably be able to handle that kind of thing, but others wouldn't be well served by treating those columns like any other feature columns.)

    • @rachittoshniwal
      @rachittoshniwal 1 year ago

      @@subtlehyperbole4362 although the missing indicator column is based off some column X, it is essentially a brand new column holding the information that "there is a column X which has missing values for these rows" so the model will test whether NOT having values in that column X is indicative of something or not.
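The exchange above can be made concrete with a small sketch of `SimpleImputer(add_indicator=True)`, assuming scikit-learn. The indicators are ordinary 0/1 columns, and `indicator_.features_` records which original column each one refers to. The toy array is hypothetical. In this standalone case, the indicators are appended after the imputed features.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical 4 x 2 array with one missing value in each column
X = np.array([
    [1.0, np.nan],
    [2.0, 5.0],
    [np.nan, 7.0],
    [4.0, 8.0],
])

imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_out = imputer.fit_transform(X)

# Columns 0-1: mean-imputed features; columns 2-3: 0/1 missing-value indicators
print(X_out)
# Index of the original feature behind each indicator column, in order
print(imputer.indicator_.features_)
```

To the model, the indicator columns are plain numeric features. The mapping back to the source column exists only as metadata on the fitted imputer, for human interpretation.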

  • @imranyounas4478
    @imranyounas4478 1 year ago

    I didn't understand: if s = df['population'], then how do we remove outliers from the whole dataset instead of just the population column?

  • @prashu25925
    @prashu25925 1 year ago

    Do we apply scaling techniques to categorical columns after encoding? Please help.

    • @rachittoshniwal
      @rachittoshniwal 1 year ago

      even if you apply scaling after encoding, the 0s and 1s will be converted to some new numbers, but all 0s will be the same number x and all 1s will be the same number y. So it is again essentially encoded, just that instead of 0s and 1s you have x and y
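The reply above can be checked quickly, assuming scikit-learn and a made-up one-hot column. Standardizing a 0/1 column maps every 0 to one value and every 1 to another, so the column still has exactly two levels.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A single one-hot encoded column (hypothetical values)
encoded = np.array([[0], [1], [1], [0], [1]], dtype=float)
scaled = StandardScaler().fit_transform(encoded)

# Every 0 becomes the same number x and every 1 the same number y
print(np.unique(scaled))
```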

  • @amitblizer4567
    @amitblizer4567 1 year ago

    Very well explained video. Thank you!

  • @kylehankins5988
    @kylehankins5988 1 year ago

    I have also seen univariate imputation refer to a situation where you are only trying to impute one column, as opposed to multiple columns that might each have missing values.

  • @olatheog
    @olatheog 1 year ago

    Great video, Rachit. Thank you. I also heard OneHotEncoding is not good for large categorical data in real-world projects. Which method do you advise, or is there a video of yours doing it that we can watch? Thank you so much.

  • @tarun94060sharma
    @tarun94060sharma 1 year ago

    Sometimes we come across videos with a great explanation.

  • @olatheog
    @olatheog 1 year ago

    This is such a great video. I am just sad you did not end it by fitting and training a model after transforming, as that is where I have problems. Is there another video of yours where you did that? I would really appreciate it. Thank you.

    • @rachittoshniwal
      @rachittoshniwal 1 year ago

      Thanks! I do have a couple of end to end project videos where I've fitted models after transforming. Hope they help!