Machine Learning / Deep Learning Tutorials for Programmers playlist: ruclips.net/p/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
Keras Machine Learning / Deep Learning Tutorial playlist: ruclips.net/p/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
RUclips's recommendation system must be really broken, since it isn't recommending this magic to me!
Thanks Harsh! Hopefully soon 🦎🚀
They are hypocrites. Money, money, money. They have turned RUclips into a third-class blog site flooded with ads.
This video is phenomenal, thank you
OMG. Such a brilliant explanation. I have been watching your videos for almost an hour now and couldn't stop myself from watching the next one each time. Subscribed, liked, and commented. Thank you so much. You are a lifesaver!
I'm in a coding bootcamp and this explained the concept far better than the lesson.
A very high-quality video -- I got the idea in no time. Thanks.
2:40 Watching the 10 seconds starting here is what made it click for me. Thanks
Greatest explanation I ever heard! Thank you!
Within 3 minutes my mind's been completely opened
While the concept itself is incredibly simple, I keep hearing so many people over-complicate or not explain why it's specifically called "one-hot" encoding. Great video! Exceptionally well explained!
Thank you, Sterling!
Just went through the entire series to date and found it very helpful and easy to follow. I'm sure you have a plan all mapped out, but further topics might be why you would want/need to add layers, how many nodes/neurons you'd want in each layer and how to optimize those (i.e., find out you don't have enough or too many), finding out when you have hit local minima, choosing different functions such as ReLU, tanh, etc. You're really on a good roll here and I look forward to more!
Thanks for the suggestions, Brent! I'll definitely keep these topics in mind for future videos. I'm glad you're finding the videos helpful so far!
Good quality in both the videos and blogs. Really amazed by your work!!
This explanation was super clear! Awesomeee
😊perfect explanation! Loved the animals too 🐈⬛🐕🦎🦙 thanks!!
Easy to understand, even for foreigners. Thanks a lot! :)
Thanks a lot. Saved my 3 hours.
amazingly explained!!
Thank you so much for this video! Now I finally understand what this term means!
This was great, thank you for sharing
Thank you so much :) Clear and to the point explanation!!!!!
Thank you, this really cleared up the concept for me. Big fan
Love the channel!!! Killin it! Subscribed!
Just wow!! Thank you for this video!! Keep doing the good work :)
Thank God I came across your channel; scientists these days just like to sound smart
This was really well explained. Thank you very much.
You did a great job explaining!
great video! really good teaching, simple and engaging. bless you for making this
Very helpful... Thank you.
It's really very well explained
That was so helpful! Thanks so much for the animations and detail in the explanation
- One-hot encoding is one way of converting categorical data into a (sparse) vector
- Each one-hot code represents the category of the data, so that the machine can interpret each word.
- As you add more categories, the length of the vector increases. (A minimal sketch follows below.)
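A minimal sketch of this summary in plain NumPy (the label names come from the video; everything else is illustrative):

```python
import numpy as np

# The video's example categories
labels = ["cat", "dog", "lizard", "llama"]
index = {name: i for i, name in enumerate(labels)}

def one_hot(name: str) -> np.ndarray:
    """Return a vector of zeros with a single 1 at the label's index."""
    vec = np.zeros(len(labels))
    vec[index[name]] = 1.0
    return vec

print(one_hot("lizard"))  # [0. 0. 1. 0.]
```

Adding a fifth category would simply grow every vector to length 5.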
It reminds me of dummy variables.
Great video, thanks!
The playlist could be split into ANN and CNN, so the list is shorter and more focused.
thank you very much for this clear and helpful explanation.
great explanation, thanks
this video is just amazing, thank you very much
Thanks for explaining in layman's terms.
Liked and subscribed!
To the point explanation.. Thank you for this quality content :)
Thank you! You explain things really well! 😃
Great content!!!
I'm still not clear on why the image tags couldn't be encoded as numerical values just counting up from 0. What is the benefit behind this?
Nice explanation
Best and clearest explanation
Thanks a lot!
What if you want to have both categorical and continuous inputs to the same NN?
Much easier than I expected. But why can't the NN take these inputs as decimal numbers? Like 1, 2, 3, or as the sum of the ASCII equivalents of each character in their name?
well explained!!
thank you, this was really helpful !!
4:26 I thought [0,0,0,1] was going to be a turtle 😂. Thanks again for the videos
I have a question:
If I have 4 columns in my dataset, then the input layer of my NN will be [x1, x2, x3, x4], right?
Now suppose we have a categorical column with 4 categories...
My understanding is that when we apply one-hot encoding to the categorical column, we will get 4 more columns, [cate1, cate2, cate3, cate4], right?
The question is: am I right, or will it be compressed into one label? If I'm right, that means our input layer in the NN will be [numerical columns + all categories].
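For illustration, a small pandas sketch of the scenario described in this question (the column and category names are made up); get_dummies does indeed replace the categorical column with one indicator column per category:

```python
import pandas as pd

# Hypothetical frame: 3 numeric columns plus 1 categorical column
df = pd.DataFrame({
    "x1": [0.5, 1.2, 0.7],
    "x2": [3.1, 2.8, 3.3],
    "x3": [7.0, 6.5, 7.2],
    "animal": ["cat", "dog", "lizard"],
})

# get_dummies replaces 'animal' with one indicator column per category,
# so the network's input width becomes: numeric columns + category columns.
encoded = pd.get_dummies(df, columns=["animal"])
print(encoded.columns.tolist())
# ['x1', 'x2', 'x3', 'animal_cat', 'animal_dog', 'animal_lizard']
```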
This is an awesome series. Clear, concise, and crisp. A question though: I trained a VGG19 network using PyTorch to classify species of flowers. There were 102 categories: 0 through 101. I didn't have to one-hot encode them, so when do you have to do this? I get that you may want to do it when you have multiple classes for the same input; you could then read off a vector like [1 0 0 1] as the 1st and 4th class. The other reason given is that the model may give greater weight to a higher number, so does that mean it will predict the higher integer more often? Can anyone shed some light on this? I got decent results (~95% accuracy on validation) without one-hot encoding. Do you always have to one-hot encode? Thanks!
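One likely explanation, sketched below in PyTorch (the tensors are invented): the standard classification loss there accepts integer class indices directly and handles the equivalent of one-hot encoding internally, so you rarely need to build the vectors by hand.

```python
import torch
import torch.nn as nn

# Hypothetical logits for a batch of 2 samples over 102 flower classes
logits = torch.randn(2, 102)

# CrossEntropyLoss consumes integer class indices (0..101) directly;
# no explicit one-hot encoding of the targets is required.
targets = torch.tensor([0, 101])
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```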
great explanation
Well explained
Great explanation. You mentioned a previous video to check out. Can you post the link?
Yes, here you go!
ruclips.net/video/pZoy_j3YsQg/видео.html
brilliant explanation
Do gradient-boosted trees also use one-hot encoding?
Cool, but what happens if we add a fifth element? What does the vector look like then?
/s
One more element is added, so each vector of 0s and 1s now has length 5. The fifth category could be [0,0,0,0,1]
@@sushilarya6994 woooosh
Pardon my stupidity, but why do we not use numbers instead of a binary vector for such classifications?
Wouldn't it be more logical to use numbers when we have larger data sets?
as in:
0-dog
1-cat
2-lizard
3-llama
and so on?
Is there an encoding type that uses all possible variations of binary numbers? For example, we can use 2 bits to represent 4 classes (00, 01, 10, 11). I just thought that would be more space-efficient.
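Such a scheme does exist; it's usually called binary encoding. A tiny sketch of the 2-bit idea from this comment (class names reused from the video):

```python
# Binary coding packs 4 classes into 2 bits instead of a length-4 vector
classes = ["cat", "dog", "lizard", "llama"]
for i, name in enumerate(classes):
    print(f"{name}: {i:02b}")
# cat: 00, dog: 01, lizard: 10, llama: 11
```

The trade-off is that unrelated classes now share bits, which reintroduces artificial similarity between them.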
Good explanation, keep doing more videos :)
Thanks a lot😁
great explanation!
great video - thanks!
What about a snake or a horse? When we use softmax, the overall probability must equal 1. How will the neural network know to output all 0s?
Hey Dr. Khan - The network will only predict classes for which it has a corresponding output class. For example, if the network only knows about dogs and cats (2 output nodes), and we pass it a snake, then the network will give corresponding probabilities only for whether it thinks the snake image is more like a dog or a cat. In the snake case, the network may output around 50% probability for dog and 50% for cat. This indicates that the network is not confident at classifying this sample, and the prediction is no better than chance, which makes sense because the image is neither a dog nor a cat.
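A tiny NumPy sketch of that point (the logit values are invented): softmax normalizes over the known classes, so the outputs always sum to 1 and can never all be 0.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the known classes."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical 2-class (dog, cat) logits for a snake image:
# near-equal logits yield roughly 50/50 probabilities.
print(softmax(np.array([0.1, 0.0])))  # ~[0.52, 0.48]
```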
subscribed. no questions asked ;)
I do understand one-hot encoding, but why is it necessary to use these vectors for classification instead of a single variable? Why is it not sufficient to use a mapping of 1 for cat, 2 for dog, 3 for lizard, and so on?
Thumbs up!
Thanks
Another great video. Just curious, though: I keep hearing that at least 80% of machine learning involves data cleaning. Is that the case? If so, do you have any video training on that?
But how do you code this?
Thank you :D
What about integer encoding?
Thanks !
It answered my question, thanks!
Nice Video :)
When should we use this encoding? Why not just assign dog to number 1, cat to number 2, and so on?
Hey, thanks for the amazing explanation!! But could I ask why we can't simply assign them integer values? For instance, map dog to 0, lizard to 1, cat to 2, and so on... Why is one-hot encoding needed?
Because these names of animals are categorical variables, i.e., there is no ordered relationship between cat and dog; they are discrete names. The same goes for city names, e.g., Boston, New York, etc. So when you encode them as integers, it is a misrepresentation of the data, since the algorithm would interpret them as interval variables, i.e., 3 is bigger than 2, and 2 is bigger than 1. The algorithm would then try to exploit this relationship, which does not actually exist. Therefore we encode categorical variables with one-hot encoding.
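A quick scikit-learn sketch of that contrast (the animal values just reuse the video's examples):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

animals = np.array([["cat"], ["dog"], ["lizard"]])

# Integer (ordinal) encoding imposes a fake order: lizard > dog > cat.
print(OrdinalEncoder().fit_transform(animals).ravel())  # [0. 1. 2.]

# One-hot encoding keeps the categories equidistant: no class is "bigger".
print(OneHotEncoder().fit_transform(animals).toarray())
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```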
What if I have features within the factors (cat, dog, lizard, llama) themselves? Like, what if I want my algorithm to differentiate between black dog, white cat, small lizard, and big llama? @deeplizard
Hey Karan - For this, you would need labels and output classes for each possible category. For example, rather than the class being cat, you would instead have more descriptive and granular classes like black cat, white cat, orange cat, etc.
What's the benefit of doing one-hot encoding as opposed to label encoding?
Suppose cat is 1, dog is 2, and lizard is 3, and the model predicts 1.5 as a single variable. Would that be reliable? Even more so, if a picture is ambiguous between dogs and lizards (stupid analogy), this poses a problem: the predictor may make an entirely incorrect prediction of a value close to 2. One-hot encodings give independent, separate, equal weight to every categorical prediction. Since the encoding is binary, the prediction could be [0.6, 0, 0.4], and we would select the highest number, so choice 1, with choice 3 as the runner-up. This is more robust because it selects the outcome the model is most confident in; the NN is a probabilistic model, and the outcomes do not interfere with each other in an unusual way. The neural network works with numbers, and categorical values are not really numbers; they have to be translated into a pure numerical form it can work with. A binary array of "yes it is" or "no it's not" works well.
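A minimal sketch of reading off such a prediction vector (the numbers reuse the [0.6, 0, 0.4] example above):

```python
import numpy as np

labels = ["cat", "dog", "lizard"]

# Hypothetical softmax output matching the example above
prediction = np.array([0.6, 0.0, 0.4])

# Take the class with the highest probability; ambiguity between cat and
# lizard never "averages" into dog the way 1.5 would with integer labels.
print(labels[int(np.argmax(prediction))])  # cat
```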
{
  "question": "In supervised learning, the index of the \"hot one\" corresponds to a(n):",
  "choices": [
    "Label",
    "Action",
    "Function",
    "Neuron"
  ],
  "answer": "Label",
  "creator": "Chris",
  "creationDate": "2019-12-13T04:14:11.847Z"
}
Thanks, Chris! Just added your question to deeplizard.com
Great vid, thanks, subscribed.
Now, is that lizard ([0, 0, 1]) called Elizabeth?
Haha no, it's not 🦎
@@deeplizard hadda ask....
No way, what a beautiful video!!
Shouldn't we always leave one category out when one-hot encoding, since keeping all of them impacts our models by introducing correlation? Is that not always the case, or was this left out so as not to overcomplicate the concept? Thank you for the explanation and the whole series.
Deeplizard, I did a "deep learning" course on udemy. Your course is incomparably more thought-out, refined and the graphical side beats the hell out of all other courses. And all this for free!
I am glad that there are such people and I hope that you will never run out of motivation to continue sharing your knowledge. All the best for you and keep up the great job.
P.S. your voice is sexy
Thank you again for these videos! I would like to know if this is the best approach for large data. For instance, I have 10,000 categories; which approach would be adequate?
Hey Isaque - Yes, one-hot encoding is still feasible for a larger category space like the one you mentioned.
You rock!
Appreciate that, silenta :)
Thank youuu :D
What is a case where one-hot encoding isn't the preferred method?
Hey Anton - Good question. One-hot encoding wouldn't be preferred in a scenario where your labels have a natural order to them, _and_ you want to preserve this order. If your labels were, say, height of individuals, then you may not want to one-hot encode them because, if you did, then your network wouldn't be able to extract/learn anything regarding the order.
great explanation! Thank you
I think the better explanation is that one-hot works better for classification than for regression.
Now I just need a vector containing every single English word, plus any words that don't exist yet and/or typos.
It's going to be one really, really long vector...
(Yeah, not using this method 😅)
Why would you use one-hot encoding when you can use integers?
For example, 1 maps to cat, 2 maps to dog, and 3 maps to lizard.
Normal integer encoding can be used for ordinal data. Your example has no ordinal relationship between the categories, so one-hot encoding is preferred.
Why is one-hot encoding used?
I’ve never seen something so simple so over-explained.
Yeah. It is. Big 🧠 💪
Why is one-hot encoding used? Why can't the label be 1, 2, 3, and so on?
Hm... I think this is a very complex-looking explanation for a very simple topic, "one-hot encoding". Maybe you could add "deep learning" to the title. That way people would see that this is not (only) a simple explanation of the very old and easy one-hot encoding.
No, this explanation is not complete. I would have expected to hear WHY one should use one-hot (e.g., categorical cross-entropy in TensorFlow) instead of ordinal category indexes (sparse categorical cross-entropy). Following Richard Feynman's statement, "You cannot explain stuff well until you really understand it," I wonder if the channel owner knows what one-hot encoding is, but not really how it helps. And one of the first questions of a beginner (who sees the first set of tutorials and examples) will be, "Why one-hot [1, 0, 0] and not just [0, 1, 2] (ordinal)?" (In terms of TensorFlow training, e.g., a CNN on a multi-category image dataset like CIFAR or MNIST, both approaches are possible, but which one would yield better NN performance? Would there be a difference at all?)
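On that last question, a minimal TensorFlow/Keras sketch of the two setups (the toy model and labels are invented). The two losses compute the same quantity, so performance should not differ; the sparse variant simply avoids materializing the one-hot target vectors:

```python
import numpy as np
import tensorflow as tf

# Hypothetical 3-class labels as plain integers: cat=0, dog=1, lizard=2
y_int = np.array([0, 2, 1])
y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# One-hot targets (y_onehot) pair with categorical cross-entropy...
model.compile(optimizer="adam", loss="categorical_crossentropy")

# ...while integer targets (y_int) pair with sparse categorical cross-entropy.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```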
This is not specific to neural networks. Other than that, good work.
Are you sure you want to say length and not dimension? The length of a vector is its magnitude.
All 6 minutes could easily have been squeezed into 6 seconds by saying that the labels are passed in a vector. OK, max 30 seconds. The term "one-hot" is not explained, btw.
Can you please replace my professor?
(...) who came up with this name?
I love the channel. Kinda dislike this video tho :O ... It's not complicated, what you're saying. This should be a two-minute video, not drawn out to six.
Spent too much time talking about how categories map to the vectors and not enough time on why one-hot encoding even matters. Like, why don't we just assign a number to each category rather than making it a vector?
OMG. Such a brilliant explanation. I have been watching your videos for almost 3 hours and couldn't stop myself from watching the next one each time. Subscribed, liked, and commented. Thank you so much.