Machine Learning Tutorial Python - 6: Dummy Variables & One Hot Encoding

  • Published: 11 Sep 2024

Comments • 664

  • @codebasics
    @codebasics  2 years ago +13

    Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

  • @celestineokpataku
    @celestineokpataku 4 years ago +46

    I have watched only 4 minutes so far, and I had to pause and write this comment. This is one of the best tutorials I have seen in data science. Sir, you need to take this to another level. What a great teacher you are!

    • @codebasics
      @codebasics  4 years ago +5

      Thanks for the feedback, my friend 😊👍

  • @TheSignatureGuy
    @TheSignatureGuy 4 years ago +49

    For anyone stuck with the categorical_features error:
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder='passthrough')
    X = ct.fit_transform(X)
    X
    Then you should be able to continue the tutorial without further issues.

    • @muhammadhattahakimkeren
      @muhammadhattahakimkeren 4 years ago +1

      thanks bro

    • @fatimahazzahra6181
      @fatimahazzahra6181 4 years ago

      thanks a lot! it helps

    • @souvikdas3189
      @souvikdas3189 1 year ago +1

      Thank you brother.

    • @Ran_dommmm
      @Ran_dommmm 11 months ago +1

      Hey, thanks for the code.
      I tried using your code, but it gives me an error. Despite converting X to an array, I get this error:
      "TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."

    • @TheSignatureGuy
      @TheSignatureGuy 11 months ago

      @@Ran_dommmm I know you said "despite converting X to an array", but double-check that you have used the .toarray() method correctly. The error message seems pretty clear on this one.
      This function may help confirm that a dense NumPy array is being passed:
      import numpy as np
      import scipy.sparse
      def is_dense(matrix):
          # scipy.sparse matrices are not np.ndarray instances, so this returns False for them
          return isinstance(matrix, np.ndarray)
      if scipy.sparse.issparse(X):
          X = X.toarray()
      Pass in X for matrix and it should return True.
      Good luck fixing this.
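A minimal sketch of that fix, assuming a small hypothetical [town, area] array (made-up values): check the ColumnTransformer output with scipy.sparse.issparse and call .toarray() if needed before fitting. Passing sparse_threshold=0 to ColumnTransformer is another way to always get a dense result.

```python
import numpy as np
import scipy.sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is a town name, column 1 is an area value.
X_raw = np.array([
    ["monroe township", 2600],
    ["west windsor", 3000],
    ["robbinsville", 3200],
], dtype=object)

ct = ColumnTransformer([("town", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(X_raw)

# Depending on how sparse the result is, ColumnTransformer may return a
# SciPy sparse matrix; estimators that require dense input then fail.
if scipy.sparse.issparse(X):
    X = X.toarray()

# Cast to float so downstream models receive purely numeric data.
X = X.astype(float)
print(type(X), X.shape)
```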

  • @venkatesanrf
    @venkatesanrf 3 years ago +21

    Hi,
    Your explanation is very simple and effective.
    Answers for the practice session: A) Price of a 4-year-old Mercedes Benz with mileage 45000 = 36991.31721061
    B) Price of a 7-year-old BMW X5 with mileage 86000 = 11080.74313219
    C) Model score (R²) = 0.9417050937281082 (94 percent)

    • @ANIMESH_JAIN04
      @ANIMESH_JAIN04 3 months ago

      Same bro

    • @fathoniam8997
      @fathoniam8997 2 months ago

      Same, bro... thanks for replying, so I can check my results.

  • @jhagaurav8292
    @jhagaurav8292 6 years ago +113

    Sir, please continue your machine learning tutorials. Yours are among the best I have seen so far.

    • @codebasics
      @codebasics  5 years ago +23

      Sure Gaurav, I just started a deep learning series. Check it out.

    • @samrahafeez5001
      @samrahafeez5001 3 years ago +3

      @@codebasics
      Kindly explain the concept of dummies in deep learning as well

  • @codebasics
    @codebasics  4 years ago +15

    Exercise solution: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb
    Everyone, the error with categorical_features is fixed. Check the new notebook on my GitHub (link in the video description). Thanks to Kush Verma for sending me a pull request with the fix.

    • @urveshdave1861
      @urveshdave1861 4 years ago

      Thank you for the wonderful explanation, sir. However, I am getting the error __init__() got an unexpected keyword argument 'categorical_features' for this line of my code: onehotencoder = OneHotEncoder(categorical_features = [0]). Is it because of a change of versions?
      What is the solution to this?

    • @bishwarupdey10
      @bishwarupdey10 4 years ago

      __init__() got an unexpected keyword argument 'categorical_features': sir, I get this error when I specify categorical features.

    • @sejalmittal1326
      @sejalmittal1326 4 years ago

      @@urveshdave1861 Have you found an answer for this? I am getting the same error.

    • @sejalmittal1326
      @sejalmittal1326 4 years ago

      @@urveshdave1861 Okay, I will do that. Thanks.

    • @tanvisingh9298
      @tanvisingh9298 4 years ago

      @@urveshdave1861 Hey, I am also getting the same error. How did you resolve it?

  • @Genz111-o4r
    @Genz111-o4r 4 years ago +24

    I was confused about where to start studying ML, and then my friend suggested this series... It's great :-)

    • @rishabhjain7572
      @rishabhjain7572 3 years ago

      Are you following any other courses or sources? And have you begun any development?

    • @sauravmaurya6097
      @sauravmaurya6097 2 years ago

      I want to know how helpful this playlist is. Kindly reply.

    • @carti8778
      @carti8778 2 years ago

      @@sauravmaurya6097 It's quite helpful if you are a beginner, meaning not from an engineering or programming background. You can accompany it with Andrew Ng's Coursera course.

    • @carti8778
      @carti8778 2 years ago +1

      @@sauravmaurya6097 If you already know calculus and Python programming (intermediate level), ML will feel easy. After this, go to the deep learning series, because that's what is used in industry.

  • @sreenufriendz
    @sreenufriendz 5 years ago +3

    Anyone can be a teacher, but a real teacher eliminates the fear from students... and you did that!! Excellent knowledge and skills.

    • @codebasics
      @codebasics  5 years ago

      Sreenivasulu, your comment means a lot to me, thanks 😊

  • @noubaddi8567
    @noubaddi8567 3 years ago +3

    This guy is AMAZING! I have spent 2 days trying dozens of other methods, and this is the only one that worked for my data without throwing an error. This guy totally saved my sanity; I was growing desperate, as in DESPERATE! Thank you, thank you, thank you!

    • @codebasics
      @codebasics  3 years ago +1

      I am glad it was helpful to you 🙂👍

  • @vaishalibisht518
    @vaishalibisht518 5 years ago +11

    Wonderful video.
    This is so far the easiest explanation of one hot encoding I have seen. I had been struggling for a very long time to find a proper video on this topic, and my quest ended today.
    Thanks a lot, sir.

  • @programmingwithraahim
    @programmingwithraahim 3 years ago +49

    At 15:50, write your code like this:
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    ct = ColumnTransformer(
        [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
        remainder='passthrough'
    )
    X = ct.fit_transform(X)
    X
    That way it works fine; otherwise it gives an error.

    • @AxelWolf26
      @AxelWolf26 3 years ago +1

      What is the use of categories='auto' and 'one_hot_encoder'?
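Each ColumnTransformer entry is a (name, transformer, columns) tuple. The name ('one_hot_encoder' here) is just a label you choose; it lets you fetch the fitted transformer later through named_transformers_. categories='auto' (the default in recent scikit-learn versions) tells the encoder to learn the categories from the training data. A sketch with made-up rows:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: town name in column 0, area in column 1.
X_raw = np.array([
    ["monroe township", 2600],
    ["west windsor", 3000],
    ["robbinsville", 3200],
    ["west windsor", 3600],
], dtype=object)

ct = ColumnTransformer(
    # (name, transformer, columns): the name is an arbitrary label.
    [("one_hot_encoder", OneHotEncoder(categories="auto"), [0])],
    remainder="passthrough",
)
ct.fit(X_raw)

# The label lets you look up the fitted encoder afterwards.
encoder = ct.named_transformers_["one_hot_encoder"]
# With categories="auto", the categories are learned (sorted) from the data.
print(encoder.categories_)
```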

    • @jollycolours
      @jollycolours 2 years ago +1

      Thank you, you're a lifesaver! I was trying multiple ways, since categorical_features has now been deprecated.

    • @adilmajeed8439
      @adilmajeed8439 2 years ago +8

      @@jollycolours Correct, the categorical_features parameter is deprecated; here are the steps to follow instead:
      import numpy as np
      from sklearn.compose import ColumnTransformer
      from sklearn.preprocessing import OneHotEncoder
      ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
      X = np.array(ct.fit_transform(X), dtype=float)
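An end-to-end sketch of the same ColumnTransformer approach, assuming made-up [town, area] -> price data and a scikit-learn version that supports OneHotEncoder(drop='first') (0.21 or later, as far as I know). OneHotEncoder accepts the town strings directly, so no LabelEncoder step is needed, and drop='first' removes one dummy column, much like drop_first=True in pd.get_dummies. New rows are encoded with the fitted transformer instead of typing the dummies by hand.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data (made-up values): [town, area] -> price.
X_raw = np.array([
    ["monroe township", 2600], ["monroe township", 3000],
    ["west windsor", 2600], ["west windsor", 3000],
    ["robbinsville", 2600], ["robbinsville", 3000],
], dtype=object)
y = np.array([550000, 590000, 600000, 640000, 575000, 615000])

ct = ColumnTransformer(
    # drop="first" removes one dummy column (here: "monroe township").
    [("town", OneHotEncoder(drop="first"), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X = np.array(ct.fit_transform(X_raw), dtype=float)

model = LinearRegression().fit(X, y)

# Encode a new row with the *fitted* transformer, then predict.
new = np.array([["west windsor", 2800]], dtype=object)
pred = model.predict(np.array(ct.transform(new), dtype=float))
print(pred)
```

The toy prices were chosen to lie exactly on a plane, so the prediction here is 620000; real data will not fit this neatly.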

  • @shrutijain1628
    @shrutijain1628 3 years ago +5

    This ML tutorial is by far the best one I have seen. It is so easy to learn and understand, and your exercises also help me apply what I have learned so far. Thank you.

  • @ankitparashar7
    @ankitparashar7 5 years ago +51

    Merc: 36991.317
    BMW: 11080.743
    Score: 94.17%

    • @codebasics
      @codebasics  5 years ago +7

      Your answer is perfect, Ankit. Good job! Here is my answer sheet for comparison: github.com/codebasics/py/blob/master/ML/5_one_hot_encoding/Exercise/exercise_one_hot_encoding.ipynb

    • @vishalrai2859
      @vishalrai2859 4 years ago +2

      Thanks for posting the answer, bro.

    • @mutiulmuhaimin9156
      @mutiulmuhaimin9156 4 years ago +2

      Could we upvote this comment to the top? Been looking for this for quite some time now. This is important, and this comment matters.

    • @Augustus1003
      @Augustus1003 4 years ago +4

      @@codebasics I used pandas dummy variables instead of OneHotEncoder, because it is too confusing.

    • @clashcosmos4641
      @clashcosmos4641 4 years ago +2

      Got the same answer using OneHotEncoder after correcting tons of errors and watching videos over and over.

  • @tech-n-data
    @tech-n-data 1 year ago +4

    Your ability to simplify things is amazing, thank you so much. You are a natural teacher.

  • @tushargahtori1570
    @tushargahtori1570 1 year ago +1

    Even in 2023 your video is such a relief... kudos to your teaching.

  • @mk9834
    @mk9834 4 years ago +2

    I was shocked after the first 5 minutes of the video; I never thought it would be so easy and fast! Thanks A LOT!

    • @codebasics
      @codebasics  4 years ago +1

      Miyuki... I am glad you liked it

  • @wangangcwayi9420
    @wangangcwayi9420 4 years ago +4

    You have a gift for explaining things even to a layman. Big up to you!

    • @codebasics
      @codebasics  4 years ago

      Thanks a ton Wangs for your kind words of appreciation.

  • @phil97n
    @phil97n 9 days ago

    I'm reading a textbook that has an exercise using this same dataset to predict "survived". I just finished the exercise from the book, and I can't seem to get past an 81% score.
    Thanks for your awesome explanation.

  • @HashimAli-tz8fw
    @HashimAli-tz8fw 1 year ago +3

    I achieved the same result using a different method that doesn't require dropping columns or concatenating dataframes. This alternative approach can lead to cleaner and more efficient code:
    df = pd.get_dummies(df, columns=['CarModel'], drop_first=True)
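A quick sketch of that one-liner on a hypothetical frame (made-up values; only the CarModel column name comes from the comment). Passing dtype=int gives 0/1 integers; recent pandas versions otherwise return True/False columns.

```python
import pandas as pd

# Hypothetical toy frame; only the "CarModel" column name mirrors the comment.
df = pd.DataFrame({
    "CarModel": ["Audi A5", "BMW X5", "Mercedes Benz C class", "BMW X5"],
    "Mileage": [69000, 35000, 57000, 22500],
})

# Replace CarModel with dummy columns in one step. drop_first=True drops the
# first category in sorted order ("Audi A5"), which becomes the baseline.
encoded = pd.get_dummies(df, columns=["CarModel"], drop_first=True, dtype=int)
print(encoded)
```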

  • @hiver6411
    @hiver6411 3 years ago +1

    The god of data science... Amazing explanation, sir. Kudos to your patience in explaining.

  • @snom3ad
    @snom3ad 5 years ago +5

    This was really well done! Kudos to you! It's hard to find clear and concise free tutorials nowadays. Subscribed and hope to see more awesome stuff!

  • @armagaan007
    @armagaan007 5 years ago +11

    Wait, wait... I don't see the point 😕
    The first half of the video does the same thing as one hot encoding (the second half of the video), but the second half is more tedious and takes more steps.
    Then why not use pd.get_dummies instead of OneHotEncoder?
    What's the advantage of using OneHotEncoder?

    • @codebasics
      @codebasics  5 years ago +9

      I personally like pd.get_dummies, as it is convenient to use. I just wanted to show two different ways of doing the same thing, and there are some subtle differences between the two. Check this: stackoverflow.com/questions/36631163/pandas-get-dummies-vs-sklearns-onehotencoder-what-is-more-efficient

    • @armagaan007
      @armagaan007 5 years ago +1

      @@codebasics Thank you :] ... BTW, you make great videos.
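One of those subtle differences, sketched with made-up town names: pd.get_dummies only creates columns for the values present in the data it is given, whereas a fitted OneHotEncoder remembers the categories from fit() and reuses them on new data. With handle_unknown='ignore', an unseen category is encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with three towns, and a new batch containing only one.
train = pd.DataFrame({"town": ["monroe township", "west windsor", "robbinsville"]})
new = pd.DataFrame({"town": ["west windsor"]})

# get_dummies only knows the values present in the frame it is given ...
new_dummies = pd.get_dummies(new["town"])
print(new_dummies.columns.tolist())  # ['west windsor']

# ... while a fitted OneHotEncoder reuses the categories it learned in fit().
ohe = OneHotEncoder(handle_unknown="ignore").fit(train)
encoded_new = ohe.transform(new).toarray()
print(encoded_new)  # [[0. 0. 1.]]
```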

  • @bandhammanikanta1664
    @bandhammanikanta1664 4 years ago +2

    First of all, 1000 × thanks for sharing such content on YouTube.
    I got a score of 94.17% on the training data.

    • @codebasics
      @codebasics  4 years ago

      Bandham, I am glad you liked it buddy 👍

  • @ymoniem1
    @ymoniem1 4 years ago +1

    You really made it very easy to understand such new concepts. Thanks a lot.
    Starting from minute 12:30, about OneHotEncoder: some updates in sklearn prevent using categorical_features=[0].
    Here is the code update as of April 2020:
    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
    X = np.array(columnTransformer.fit_transform(X), dtype=float)  # numeric dtype; np.str no longer exists in NumPy
    X = X[:, 1:]  # drop the first dummy column
    model.fit(X, y)
    model.predict([[1, 0, 2800]])
    model.predict([[0, 1, 3400]])

    • @petermungai5508
      @petermungai5508 4 years ago

      The code works, but it gives a different prediction compared to dummies.

    • @petermungai5508
      @petermungai5508 4 years ago

      Plus, my X is showing 5 columns instead of 4.

    • @petermungai5508
      @petermungai5508 4 years ago

      I was entering the 0 and 1 wrongly. I am getting the same answer now; thank you for the code.

    • @rameshkrishna1956
      @rameshkrishna1956 7 months ago

      Thanks, buddy.

  • @tanmaykapure81
    @tanmaykapure81 2 years ago +1

    This is the best machine learning playlist I have come across on YouTube 😃👍. Hats off to you, sir.

  • @roastwithmeall
    @roastwithmeall 7 months ago

    You are the best teacher on YouTube; I have never seen anyone like you before.

  • @ZehraKhuwaja65
    @ZehraKhuwaja65 11 months ago

    I must say this is the best course I've come across so far.

  • @abhinavb717
    @abhinavb717 1 year ago

    I am getting an 84% score without encoding the variable, but after encoding I am getting 94% on the model. Thank you for your teaching. You are doing a great job.

  • @vishwa4908
    @vishwa4908 4 years ago +2

    Awesome, you explain concepts in a very simple manner.

    • @codebasics
      @codebasics  4 years ago +2

      Vishwa I am happy to help 👍

  • @gokkulkumarvd9125
    @gokkulkumarvd9125 3 years ago +4

    How can I like this video more than 100 times!

    • @codebasics
      @codebasics  3 years ago

      I am happy this was helpful to you.

  • @omharne1386
    @omharne1386 1 year ago

    This is one of the best tutorials I have seen in ML.

  • @prasadjoshi8213
    @prasadjoshi8213 4 years ago +3

    Hi sir!! You teach ML in the easiest way. Thanks a lot!!! I am going through your videos and assignments. I got these answers: Mercedes: 36991.31, BMW: 11080.74, and model score: 0.9417 (94.17%). My question is: how can I improve the model score? Is there any way to work on the features?

  • @cahitskttaramal3152
    @cahitskttaramal3152 4 years ago +5

    Thank you for a very well explained tutorial. I have one question, though: you are training on all of your data here, and yet the model score is only 0.95. Why is that? Shouldn't it be 1? If you split your data and trained on part of it, that would make sense, but in your case it doesn't. What am I missing here?

    • @codebasics
      @codebasics  4 years ago +7

      Alper, it is not true that if you train on all of your data the score will always be 1. Ultimately, for a regression problem like this you are trying to find a best-fit line (scikit-learn's LinearRegression does this with ordinary least squares). Unless every data point lies exactly on that line, some error remains, so the fit is an *approximation* and will not be perfect. I am not saying you can never get a score of 1, but a score less than 1 is normal and accepted.
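To see this concretely, here is a tiny sketch with four made-up points that do not lie on one straight line: least squares still finds the best line for them, yet R² measured on the very same training points comes out below 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Four toy points that do not lie on a single straight line.
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 3, 2, 4])

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # R^2 on the training data itself

# Best line here: slope ~0.8, intercept ~0.5; residuals remain, so R^2 ~0.64.
print(model.coef_, model.intercept_, r2)
```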

  • @sanjanatarekar5942
    @sanjanatarekar5942 2 years ago +2

    Hi,
    Since OneHotEncoder's categorical_features has been deprecated... can you please explain here how to proceed?

  • @datasciencewithshreyas1806
    @datasciencewithshreyas1806 3 years ago +1

    One of the best explanations of encoding 👌👍

  • @srinivasreddy1709
    @srinivasreddy1709 4 years ago +2

    Hi Dhaval, your explanation of all the topics is crystal clear.
    Can you please make videos on NLP also?

  • @ZOSELY
    @ZOSELY 1 year ago

    I wish I could give this video 2 thumbs up! Great explanation of all the steps in one-hot encoding! Thank you!!

  • @weshallneversurrender
    @weshallneversurrender 2 years ago

    The Data Science GOAT! One day I will send you a nice donation for all that you have contributed to my journey, sir!

  • @himanshusingh-vt9do
    @himanshusingh-vt9do 5 months ago

    My model score is 94%. Thank you, sir, for the amazing video.

  • @deekshithkumar3234
    @deekshithkumar3234 3 years ago +1

    Superb and precisely explained.

  • @komalsunandenishrivastava9211
    @komalsunandenishrivastava9211 20 days ago

    That image on one hot encoding 🤣🔥

  • @NoureddineBahi
    @NoureddineBahi 3 years ago

    Thank you very much... wonderful work. Special thanks from Morocco, in North Africa.

  • @timse699
    @timse699 3 years ago +1

    You teach with passion! thank you for the series!

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago +6

    Please make a regression video using the preprocessing library, with standardization and normalization of variables.

  • @maruthiprasad8184
    @maruthiprasad8184 2 years ago

    For the Mercedes Benz I got 51981.26, for the BMW I got 39728.19, and the score is 94.17%. Thank you very much for making ML easy.

  • @richard_shaju
    @richard_shaju 1 year ago +1

    You are a Gem

  • @subrahmanyamkesani7304
    @subrahmanyamkesani7304 2 years ago +2

    Can you please explain the difference between get_dummies and OneHotEncoder?

  • @dineshgaddi1843
    @dineshgaddi1843 3 years ago +4

    First of all, thank you for making life easier for people who want to learn machine learning. You explain really well. Big fan. When I tried to execute categorical_features=[0], it gave an error. It seems this parameter has been deprecated in the latest version of scikit-learn; instead, they recommend using ColumnTransformer. I was able to get the same score, 0.9417050937281082. Another thing I wanted to know: when you initially used the label encoder and converted the categorical values to numbers, why did we specify the first column as categorical when it was already an integer value?

  • @bharathdwarakanath1587
    @bharathdwarakanath1587 4 years ago

    I think the label encoding done on the independent variable column 'town' in the second half of the video isn't needed; just doing one hot encoding is enough. Wonderful contribution anyway. Thanks!!

  • @betzthomas9693
    @betzthomas9693 4 years ago +1

    Sir, what is the best method to do label encoding for job designations like management, blue-collar, technician, etc.? Please let me know the best practice.

  • @AruLcomments
    @AruLcomments 4 years ago

    You are doing a wonderful job; people like you inspire me to learn and share the knowledge I gain. It is very useful for me. All the best.

  • @shekharbabar2496
    @shekharbabar2496 4 years ago

    The best video series on ML, sir... Thank you very much, sir...

  • @preetipisupati2308
    @preetipisupati2308 4 years ago +2

    Thanks for the excellent video... but due to recent changes, ColumnTransformer from sklearn.compose has to be used for one hot encoding.

    • @codebasics
      @codebasics  4 years ago

      Preeti, can you send me a pull request?

  • @uvinodh90
    @uvinodh90 5 years ago +2

    Thanks for the excellent tutorial...
    I see the score is lower on the exercise data than here. Maybe that is due to the extra column in the exercise data? Will the LinearRegression score decrease as the number of columns in X increases?

  • @hamzazidan6093
    @hamzazidan6093 2 months ago

    I am here in 2024, 6 years later, and I want to say that this playlist is wonderful!
    I hope you update it, because there have been many changes in the sklearn syntax since then.

    • @codebasics
      @codebasics  2 months ago +1

      Hey, next week I am launching an ML course on codebasics.io which will address this issue. It has the latest API, in-depth math, and end-to-end projects.

  • @debaratighatak2211
    @debaratighatak2211 3 years ago +1

    I learned a lot from the exercise that you gave at the end of the video, thank you so much sir!

  • @Adnan25048
    @Adnan25048 5 years ago +1

    That's a great tutorial on one-hot encoding. I was unable to find a complete example anywhere else. Thanks for sharing.

    • @codebasics
      @codebasics  5 years ago +1

      Thanks Adnan for your valuable feedback

  • @tanmayck9887
    @tanmayck9887 2 years ago +1

    Why did we apply LabelEncoder and then OneHotEncoder in the 2nd method, when we can apply OHE directly to the data?

  • @ayushi6424
    @ayushi6424 4 years ago

    Sir, please cover label encoding as well... and also the difference between label encoding and one hot encoding.

  • @nationhlohlomi9333
    @nationhlohlomi9333 1 year ago

    A PLACE TO RUN TO WHEN ONE IS STUCK. THANK YOU SO MUCH, SIR

  • @mayanktripathi4u
    @mayanktripathi4u 5 years ago +2

    Hi Sir, how do we select/choose the correct model for prediction? Is there a way? Please create a tutorial on this, and if you already have one, please share the link.

    • @codebasics
      @codebasics  5 years ago +2

      Mayank, selecting the appropriate model for a given problem is an art as well as a science. I will probably create a separate tutorial on this, but to give you an idea, here is what people do: first, do exploratory data analysis and visualization to find out the nature of the dataset. Based on this visualization and primary data analysis, you might get an idea of which set of models might be worth using. Then you try multiple models and compare their performance (score). One technique to use is k-fold cross validation, which evaluates the performance of various models on your dataset and helps you identify the best model for it. Again, there is no fixed technique to find the final answer; you use popular approaches and some trial and error to find which model works best for you. Hope this helps!

    • @sujithramanathan3275
      @sujithramanathan3275 4 years ago

      @@codebasics Thanks for your time. It would be great if we got a tutorial from you on "finding the best model for a given dataset".
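A minimal sketch of the k-fold idea from the reply above, assuming synthetic data and two arbitrarily chosen candidate models (LinearRegression and DecisionTreeRegressor). cross_val_score returns one score per fold (R² by default for regressors), and comparing the fold means is one simple way to shortlist a model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only: a linear signal plus noise.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    # One R^2 score per fold; average them to compare models.
    scores = cross_val_score(model, X, y, cv=cv)
    mean_scores[name] = scores.mean()
    print(name, len(scores), mean_scores[name])
```

On this made-up linear data the linear model should come out ahead; on real data the ranking can go either way, which is the point of measuring it.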

  • @Dim-zt5ei
    @Dim-zt5ei 1 year ago +1

    Great videos! Unfortunately, it becomes harder and harder to code along with the video, because there are more and more changes in the libraries you use. For example, the sklearn library removed the categorical_features parameter from the OneHotEncoder class. The same happens in other videos of the playlist. It would be great to have the same playlist in 2022 :)

    • @codebasics
      @codebasics  1 year ago +1

      Point noted. I will redo this playlist when I get some free time from the tons of priorities on my plate at the moment.

    • @Dim-zt5ei
      @Dim-zt5ei 1 year ago +1

      @@codebasics Thank you for the reply, and again: great job on all the quality tutorials!

  • @yahiagamal2876
    @yahiagamal2876 2 years ago +2

    Hi sir,
    thank you for trying to simplify ML.
    But honestly, this lecture has many unexplained steps, like:
    Why did you convert X to a 2-dimensional array?
    After encoding into the ohe dataframe, you made many amendments with very, very quick explanations!
    Last thing: executing the command below always gives an error, although I checked that everything is OK:
    from sklearn.preprocessing import OneHotEncoder
    ohe = OneHotEncoder(categorical_features=[0])
    Thank you again, and I look forward to hearing from you.
    BR
    YG
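On the first question: scikit-learn estimators expect X to be 2-D with shape (n_samples, n_features), even when there is a single feature, while y is 1-D. By the usual convention, the capital X marks that matrix and the lowercase y the target vector. A short sketch with made-up area/price values:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up values for illustration.
df = pd.DataFrame({"area": [2600, 3000, 3200], "price": [550000, 565000, 610000]})

X = df[["area"]]  # double brackets -> DataFrame, shape (3, 1): 2-D feature matrix
y = df["price"]   # single brackets -> Series, shape (3,): 1-D target vector
print(X.shape, y.shape)  # (3, 1) (3,)

# With a plain NumPy array, reshape(-1, 1) turns a 1-D array into one column.
area = df["area"].to_numpy()
print(area.shape, area.reshape(-1, 1).shape)  # (3,) (3, 1)

# Passing df["area"] (1-D) to fit() instead would raise a ValueError.
model = LinearRegression().fit(X, y)
```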

  • @mallikasrivastava
    @mallikasrivastava 3 years ago +1

    Your videos are awesome

  • @piyushjha8888
    @piyushjha8888 4 years ago

    model.predict([[45000,4,0,0]])=array([[36991.31721061]]),
    model.predict([[86000,7,0,1]])=array([[11080.74313219]]),
    model.score(X,Y)=0.9417050937281082.
    Thank you, sir, for this exercise.

  • @elinem5311
    @elinem5311 4 years ago +1

    Thank you, this helped me so much with multivariate regression with many categorical features!

  • @rooshanghous6912
    @rooshanghous6912 9 months ago

    This is an amazing tutorial! saved me so much time and brought so much clarity!!! Thank you!

  • @jayshreedonga2833
    @jayshreedonga2833 1 year ago

    Thanks, sir, nice lecture.
    Sir, you are really a great teacher.
    You teach everything so nicely;
    even tough things become easy when you teach.
    Thanks a lot.

  • @ayushmanjena5362
    @ayushmanjena5362 2 years ago +1

    At 15:50, write this code:
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    ct = ColumnTransformer([('town', OneHotEncoder(), [0])], remainder = 'passthrough')
    x = ct.fit_transform(x)
    x

  • @ramakrishnayellela7455
    @ramakrishnayellela7455 10 months ago +1

    The parameters of OneHotEncoder have been updated: there is no categorical_features parameter anymore, and it gives an error saying OneHotEncoder has no such parameter. Does anyone know the solution?

  • @leooel4650
    @leooel4650 5 years ago +1

    Mercedes = array([[36991.31721061]])
    BMW = array([[11450.86522658]])
    Accuracy = 0.9417050937281082
    Thanks for your time and knowledge once again!

  • @geekyprogrammer4831
    @geekyprogrammer4831 3 years ago

    This is really the best series to get started with ML

  • @thanusan
    @thanusan 5 years ago +3

    Excellent video - thank you!

  • @late_nights
    @late_nights 4 years ago +10

    If anyone gets stuck at the one hot encoder at 16:26, run this command: pip install -U scikit-learn==0.20 (this pins an old release in which categorical_features still works, with a deprecation warning).

  • @manasaraju8552
    @manasaraju8552 1 year ago

    Difficult topics become easy to understand. Thank you so much for the content, sir.

  • @gisantarem
    @gisantarem 3 years ago

    Thanks a lot for the tutorial, but I have a doubt: why didn't you split the dataset into train and test sets? It seems you used your entire dataset to train the model, didn't you?

  • @8biitbluee183
    @8biitbluee183 23 days ago

    Hi, I just saw the video. However, I have a question: what is the difference between dummy variables and the one hot encoder, since both require dropping one of the columns?

  • @souravdey8236
    @souravdey8236 2 years ago +1

    After applying get_dummies, the order of the values in the column gets changed. How do we keep the same order?
    Ex: column values A, B, C -> dummy columns A | B | C
    I want this order maintained. Please tell me how I can ensure this.
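get_dummies never reorders the rows; what it orders is the new columns, which by default follow the sorted category values. One way to control that order, sketched here with toy values, is to convert the column to a pandas Categorical with explicit categories first (here the order of first appearance, via pd.unique).

```python
import pandas as pd

s = pd.Series(["C", "A", "B", "A"])

# By default the dummy columns come out in sorted order: A, B, C.
default_cols = pd.get_dummies(s, dtype=int).columns.tolist()
print(default_cols)  # ['A', 'B', 'C']

# Fixing the category order (order of first appearance) fixes the column order.
ordered = pd.Categorical(s, categories=pd.unique(s))
ordered_cols = pd.get_dummies(ordered, dtype=int).columns.tolist()
print(ordered_cols)  # ['C', 'A', 'B']
```

Any explicit list works as categories, for example categories=["A", "B", "C"].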

  • @kenzhao6236
    @kenzhao6236 3 years ago +1

    If the code in cell [26] doesn't run successfully, try:
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough')
    x = ct.fit_transform(x)
    x

  • @asamadawais
    @asamadawais 2 years ago

    Simply excellent explanation with very simple examples!

  • @vipin2872
    @vipin2872 15 days ago

    In the one hot encoding method, X had only 2 features, so how come you use 3 during prediction, sir? It's confusing 🤔

  • @brijesh0808
    @brijesh0808 4 years ago +2

    At 13:20, shouldn't we do:
    dfle = df.copy()
    Because otherwise changes to dfle will be reflected back in df.
    Thanks :)
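The comment is right that plain assignment does not copy a DataFrame: both names refer to the same object. A small sketch with made-up values showing the difference:

```python
import pandas as pd

df = pd.DataFrame({"town": ["monroe township", "west windsor"]})
dfle = df                # plain assignment: both names point to the same DataFrame
dfle["town"] = [0, 1]
after_alias = df["town"].tolist()
print(after_alias)       # [0, 1] -> df was changed as well

df = pd.DataFrame({"town": ["monroe township", "west windsor"]})
dfle = df.copy()         # independent (deep) copy
dfle["town"] = [0, 1]
after_copy = df["town"].tolist()
print(after_copy)        # ['monroe township', 'west windsor'] -> df is unchanged
```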

  • @sarafatima2252
    @sarafatima2252 3 years ago

    Definitely one of the best videos to learn from!

  • @farjadmir8842
    @farjadmir8842 4 years ago

    I also got them correct. Sir, this course is amazing. You have made it so easy to understand.

  • @ramanandr7562
    @ramanandr7562 1 year ago

    Thank you sir🎉. You made my ML Journey Better.. 🤩

  • @irmscher9
    @irmscher9 5 years ago +1

    Why is 'X' capitalized and 'y' is not?

  • @claude-olivierbatungwanayo9059
    @claude-olivierbatungwanayo9059 6 years ago +1

    Excellent as usual!

  • @mapa5000
    @mapa5000 1 year ago

    You make it easy with your explanation !! Thank you !!

  • @mithunjain4834
    @mithunjain4834 3 years ago +2

    Hey, I got a TypeError for "OneHotEncoder(categorical_features=[0])" when trying the exercise problem.

    • @bindiyaroy8701
      @bindiyaroy8701 3 years ago +1

      The latest build of the sklearn library removed the categorical_features parameter from the OneHotEncoder class.

  • @maxb.w5170
    @maxb.w5170 2 years ago

    What would help inform a decision to drop one of the dummy variables? You mentioned that the linear regression model will typically be able to handle any interaction between most dummy variables. When should we drop one?
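The usual reason to drop one dummy is the so-called dummy variable trap: with an intercept in the model, the full set of dummies always sums to 1, so together they duplicate the intercept column and the design matrix loses rank. The sketch below (made-up town values) checks this with NumPy's matrix_rank. scikit-learn's LinearRegression will typically still fit in that case, because its least-squares solver tolerates rank-deficient input, but the individual dummy coefficients are then not uniquely determined.

```python
import numpy as np
import pandas as pd

# Made-up town column with three categories.
towns = pd.Series(["monroe", "west windsor", "robbinsville", "monroe", "robbinsville"])
dummies = pd.get_dummies(towns, dtype=float).to_numpy()
intercept = np.ones((len(towns), 1))

# Intercept + all three dummies: the dummies sum to the intercept column.
full = np.hstack([intercept, dummies])
rank_full = np.linalg.matrix_rank(full)
print(rank_full, full.shape[1])        # 3 4 -> rank-deficient

# Intercept + two dummies (first one dropped): no redundancy left.
reduced = np.hstack([intercept, dummies[:, 1:]])
rank_reduced = np.linalg.matrix_rank(reduced)
print(rank_reduced, reduced.shape[1])  # 3 3 -> full column rank
```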

  • @halibrahim
    @halibrahim 1 year ago +2

    Great tutorial, but the code is outdated: OneHotEncoder's categorical_features parameter was deprecated in version 0.20 and removed in 0.22. Instead, you should use the ColumnTransformer class to specify which columns in your input data should be one-hot encoded. Here is more up-to-date code:
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    # define the column transformer
    ct = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
    # fit and transform the input data
    X = ct.fit_transform(X)

  • @scriptfox614
    @scriptfox614 4 years ago

    The import linear regression statement lol. Amazing tutorial. :D

  • @nikhilrajput5030
    @nikhilrajput5030 4 years ago +1

    The model also makes a prediction when we set both categories to 1. That shouldn't be possible, because a car is either a Mercedes or a BMW X5. Sir, please help me: how do I solve this?
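The model itself cannot know that the car-model dummies are mutually exclusive; it simply multiplies whatever numbers it receives by its coefficients, so it is up to us to build valid input rows. One way to do that with pd.get_dummies is sketched below, assuming made-up car data and a hypothetical helper make_row that reindexes a new row to the training columns.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (made-up values, not the exercise CSV).
df = pd.DataFrame({
    "model": ["audi", "audi", "bmw", "bmw", "merc", "merc"],
    "age": [2, 5, 3, 6, 2, 4],
    "price": [40000, 31000, 42000, 33000, 50000, 44000],
})

# Training features: age plus dummies, with "audi" dropped as the baseline.
X = pd.get_dummies(df[["model", "age"]], columns=["model"], drop_first=True).astype(float)
y = df["price"]
model = LinearRegression().fit(X, y)

def make_row(car_model, age):
    """Hypothetical helper: build one input row with exactly the training columns."""
    row = pd.get_dummies(pd.DataFrame({"model": [car_model], "age": [age]}), columns=["model"])
    # Dummy columns missing from this row become 0; an unused baseline column is dropped.
    return row.reindex(columns=X.columns, fill_value=0).astype(float)

row = make_row("bmw", 7)  # exactly one car-model dummy is set to 1
pred = model.predict(row)
print(row)
print(pred)
```

The same idea applies to the OneHotEncoder route: encode new rows with the fitted encoder rather than typing the 0/1 values by hand.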

  • @MrArunlama
    @MrArunlama 9 months ago

    I was learning through a paid course, and then I had to come here to understand this concept of dummy variable.

  • @Piyush-yp2po
    @Piyush-yp2po 1 month ago

    For the exercise I only used the first method, pandas dummies, and it worked well. The other method, using OneHotEncoder and LabelEncoder, was confusing for me. Should I focus more on learning the other way, or is method 1 fine?

  • @prashantgajjar1431
    @prashantgajjar1431 3 years ago +2

    Hi, can we find out which integer values LabelEncoder assigns to the categories? Here only three categories are available, so the encoded values are easy to tell apart. But what if there are 10 categories: how do we know the exact encoded value for each category?
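A fitted LabelEncoder keeps the categories in its classes_ attribute, sorted, and each category's position in that array is its integer code; inverse_transform maps codes back to labels. A sketch with made-up town names:

```python
from sklearn.preprocessing import LabelEncoder

towns = ["west windsor", "monroe township", "robbinsville", "west windsor"]
le = LabelEncoder()
codes = le.fit_transform(towns)

# classes_ holds the sorted categories; each category's index is its code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)   # {'monroe township': 0, 'robbinsville': 1, 'west windsor': 2}
print(codes)     # [2 0 1 2]
print(le.inverse_transform([0, 2]))  # ['monroe township' 'west windsor']
```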

  • @felixgallo5132
    @felixgallo5132 3 years ago

    They're basically the same; however, pandas dummy variables are easier to use.
    Thank you, sir.

  • @judeleon8485
    @judeleon8485 3 years ago +1

    Thanks for your very nice and explanatory tutorial. However, I have two questions.
    First, when you have more than one column to encode, how do you go about it using OneHotEncoder?
    Second, in your video you did not mention ColumnTransformer, but it appears in the notebook in the GitHub repository. Am I missing something?

    • @codebasics
      @codebasics  3 years ago +4

      When I uploaded this video, the sklearn API was different. Later they added ColumnTransformer, so I updated the notebook, but of course the video doesn't have it, as it is old. For your first question: for multiple columns you use exactly the same method and encode those columns too.

    • @Justahobbylife
      @Justahobbylife 3 years ago

      @@codebasics Thanks. I had the same question.
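For the multiple-column case discussed above, a sketch assuming hypothetical [town, house_type, area] rows: pass all categorical column indices to a single OneHotEncoder entry in ColumnTransformer (here [0, 1]) and let the numeric column pass through.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: [town, house_type, area]
X_raw = np.array([
    ["monroe township", "condo", 2600],
    ["west windsor", "single family", 3000],
    ["robbinsville", "condo", 3200],
], dtype=object)

ct = ColumnTransformer(
    # List every categorical column index; one encoder handles them all.
    [("categorical", OneHotEncoder(), [0, 1])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X = np.array(ct.fit_transform(X_raw), dtype=float)
print(X.shape)  # (3, 6): 3 town dummies + 2 house-type dummies + area
```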

  • @-GCET-RiyaGupta
    @-GCET-RiyaGupta 3 years ago

    Sir, I am not able to access the 'Mercedes Benz C class' variable. It's giving a KeyError. I am practicing in a Jupyter notebook.
    All the other variables are accessible, though.

  • @swaruppanda2842
    @swaruppanda2842 5 years ago +1

    Nicely explained 👌