Handle Categorical features using Python

Поделиться
HTML-код
  • Опубликовано: 11 сен 2024

Комментарии • 45

  • @HrishikeshShinde495
    @HrishikeshShinde495 4 года назад +1

    This is the simplest way of encoding the categorical features. Thanks man!!

  • @programsolve3053
    @programsolve3053 4 месяца назад

    Well explained🎉🎉

  • @sreedharsree361
    @sreedharsree361 4 года назад +5

    Thanks krish for this video ..
    I have a doubt, at last part of the video .. while converting from categorical feature to numerical feature 2001 pincode represented at one instance as 1 and at other instance it is represented as 0 .. on what basis we represented like this ?

  • @ishmeetsinghrayat
    @ishmeetsinghrayat 2 года назад

    you explain like pro bro.....

  • @MrKB_SSJ2
    @MrKB_SSJ2 2 года назад

    Thank you so much... It was so easy...

  • @amylock8275
    @amylock8275 Год назад

    Exactly what I was looking for! Thank you

  • @UnstoppableBird
    @UnstoppableBird 3 года назад

    thank you so much, this is actually clearer than the stupid class I enrolled earlier

  • @louerleseigneur4532
    @louerleseigneur4532 3 года назад

    Thanks Krish

  • @timothythampy7202
    @timothythampy7202 3 года назад

    Thank you so much, sir! You are the best teacher

  • @nikhilshingadiya7798
    @nikhilshingadiya7798 3 года назад +1

    what happen if we have all independent variable as categorical i.e. movies data set country origin,movie_type,director now i want to predict the imdb missing data how can i handle those categorical variable

  • @dilipgawade9686
    @dilipgawade9686 5 лет назад +4

    Hi Krish,
    can you show how to convert categorical variable to numeric variable through coding ?

  • @slowhanduchiha
    @slowhanduchiha 4 года назад +1

    One Question Sir. I was working on a classification dataset. My out put variable is also categorical in nature . I applied OHE and later when i saw the heatmap it made no sense because the columns were bit blank. Correct me where i am wrong here

  • @inderpreetkaur1441
    @inderpreetkaur1441 4 года назад +1

    I want to create box plot for categorical variable (like subscribed: yes/no)
    Firstly I wrote the code: train= pd.get_dummies(train['subscribed'], drop_first=1)
    And then for creating box plot: train['subscribed'].plot.box()
    But this will show error as- keyword: 'subscribed'
    Please let me know my mistake.

  • @simanchalpatnaik2566
    @simanchalpatnaik2566 4 года назад

    Thanks for the video Krish.
    When I ran the "df" command after concating, why all the values of Florida & New York comes as "NaN" ?

  • @51kaushal
    @51kaushal 3 года назад

    Kris, but what if we have a regression problem then we would not have output as 0/1, then how do we encode the categorical features like pincode, do we use frequency/count encoding in there??

  • @parakhsrivastava7743
    @parakhsrivastava7743 5 лет назад +1

    I have a doubt...
    When dealing with categorical values having many classes, you took all 2001's and find out the probability where O/P is 1.
    Suppose, that is coming 0.6(as in video). Now you are replacing all 2001's with 0.6, no matter O/P is 0 or 1... WHY?
    Should we not replace 2001's by 0.6 only if O/P is 1, else replace it with 0.4?
    Thanks for the video btw!

    • @swinal9710
      @swinal9710 5 лет назад

      The O/p column comes from where can you explain me that?

    • @theeagleseye4989
      @theeagleseye4989 4 года назад

      Same doubt here

    • @sahilkamboj747
      @sahilkamboj747 4 года назад

      @Premjith Augustine what if the output is not a classification variable. Target variable can be Continuous like Price, Fees,Profit

    • @aakashsinghrawat3313
      @aakashsinghrawat3313 4 года назад

      that 0.6 is mean for 0 as well as to 1

  • @sahanjayarathne3472
    @sahanjayarathne3472 4 года назад

    Thank you so much for the video

  • @aanniirr100
    @aanniirr100 2 года назад

    how can we save the count of a particular category obtained to be used later in any calculation

  • @somnathbanerjee2057
    @somnathbanerjee2057 3 года назад

    Hi Krish, could you please guide me how can I handle text column for a regression problem. It's not about encoding categorical features. But what I am looking for is---extracting some meaningful information from the existing column containing text data using string manipulation method from regex...Please recommend me an effective way of doing this.

  • @ijeffking
    @ijeffking 5 лет назад

    Thank you Krish.

  • @ranoyavanniysingam3390
    @ranoyavanniysingam3390 3 года назад

    if we have more than 5 categorical-feautures column, what to do for that? for example -- country, age group like this?

  • @swaruppanda2842
    @swaruppanda2842 5 лет назад +1

    At 17:08 u made it clear for 2001 as 1 as output will be 2/3=0.6 what about 2001 as 0 as output?

    • @aakashsinghrawat3313
      @aakashsinghrawat3313 4 года назад

      actually the idea is getting mean of 2001, suppose 2001 has value 1 and 0 obvious mean will be 0.5 for both

  • @chaithanyaravi8400
    @chaithanyaravi8400 4 года назад

    Hi Krish,
    suppose in the case if we have 8 categorical names at that time it will generate 8 new columns?

  • @arjyabasu1311
    @arjyabasu1311 4 года назад

    Sir, please upload a video on how to perform mean encoding !!

  • @priyak1008
    @priyak1008 3 года назад

    How to apply onehot encoding if we have categorical data in Y (dependent column).

  • @surbhisarawagi1821
    @surbhisarawagi1821 5 лет назад +1

    If we have large number of categorical variables say 21, then if we use get dummies we'll have large number of columns so how to deal such a case?

    • @gurjotsingh752
      @gurjotsingh752 4 года назад

      are u able to find, how to cater your problem ie 21 categorical variables?

    • @aakashsinghrawat3313
      @aakashsinghrawat3313 4 года назад

      i think, we can use feature_selection like SelectKBest to get top k(any N no. upto total columns) which means these new features have strong relationship with target

    • @gurjotsingh752
      @gurjotsingh752 4 года назад

      @@aakashsinghrawat3313 yes, your approach is good. I also find one more approach. Here we will replace categorical values with their number of count. Eg) We have 29 states. and suppose we have 10K records and Delhi has come 500 times, Karnataka come 900 times. So, i will replace delhi by 500 and Karntaka by 900.

    • @aakashsinghrawat3313
      @aakashsinghrawat3313 4 года назад

      @@gurjotsingh752 isn't it inefficient to labelling 500,900 to categorical feature? This method might be good to ordinal features.

  • @nashrakhan5361
    @nashrakhan5361 3 года назад

    getting key error for the column which I used for categorical data, please help

  • @SuperJg007
    @SuperJg007 3 года назад

    what to do if there is mixed data, continuous and categorical?

  • @ankita684
    @ankita684 4 года назад

    Hi Krish.. I wrote exactly the same codes simultaneously to practice, but my score came out to be -5.667 (I got a negative value) whereas you got 0.9304. I am not able to understand why am I not getting the same value. Please explain.

  • @satishakuthota6290
    @satishakuthota6290 5 лет назад

    please make video on visuvalistion using matplotlib

  • @vijaynale7893
    @vijaynale7893 5 лет назад

    Thanks bro

  • @aakashsinghrawat3313
    @aakashsinghrawat3313 4 года назад

    can someone please give me link of solved example using target encoding, mean encoding like above

  • @vipinamar8323
    @vipinamar8323 3 года назад

    What is the last encoding type called?