Train, Test, & Validation Sets explained

  • Published: 19 Oct 2024

Comments • 184

  • @deeplizard
    @deeplizard  6 лет назад +8

    Machine Learning / Deep Learning Tutorials for Programmers playlist: ruclips.net/p/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
    Keras Machine Learning / Deep Learning Tutorial playlist: ruclips.net/p/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL

  • @messapatingy
    @messapatingy 6 лет назад +43

    I love how concise (waffle free) your videos are.

  • @dwosllk
    @dwosllk 5 лет назад +70

    You said that "test set is unlabeled" but actually it is a labeled dataset. Of course it could be unlabeled, because it isn't adding anything to the model while it is training, but we use a labeled test set to quickly determine our model's performance when it has finished training.

    • @alfianabdulhalin1873
      @alfianabdulhalin1873 4 года назад +5

      Hi @Gábor Pelesz
      ... That's what I thought too. I was wondering if I could get your insights on the main difference between the validation and test set. From what I understand, the validation set is used during training. Meaning, after training, say, a Logistic Regression model (~100,000 iterations with specific hyperparameters)... we run the validation set through this trained model... after which some metrics are calculated. If the error is bad, then we tune the hyperparameters, or do whatever is necessary... and then train some more based on the changes. Then after that... we validate again using the validation data... and this goes on until we get a satisfactory error chart.
      Wouldn't the TEST set now be redundant? Since we already achieved good performance on the validation set. From what I've self-learned, we sample all sets (training, validation and test) from basically the same distribution... right? Would appreciate any insights.

    • @dwosllk
      @dwosllk 4 года назад +17

      @@alfianabdulhalin1873 1. In an ideal world, where we can train with data that completely covers the space of the variables, the test set might be useless because it isn't adding any information for us (i.e. it's redundant, it was already in the training set). Therefore our model's performance would be exactly what it achieved while training. But sadly we are so far from this world that even with additional test sets, we are only able to estimate the performance of our models. So summing up, the training and validation sets are, let's say, 80%. The 20% that's left is more likely (and it is also important for it) to be unique and different.
      2. We are training with the training set, so our model is most biased towards the training set. Let's assume the model is tested against the validation set after the model has gone through all our data once and wants to start over for another iteration (i.e. if we have 10 training samples, then after every 10th step we test against the validation set). While validating, we modify some hyperparameters accordingly (e.g. learning rate). What's important is that we change things after seeing how our validation tests performed, thus our model is also biased towards the validation set (although not as much as towards the training set). This emphasizes the relevance of a test set, a set of data points that the model has probably never seen before (the test set also needs to be unique, different from the others, to make sense).
      Hope your questions are answered!

    • @tamoorkhan3262
      @tamoorkhan3262 4 года назад +1

      Yes, the reason we pass labels with test data is to determine the accuracy; otherwise, those labels play no other role. It is like this: you pass your unlabelled test data through the model, collect all the predictions, and then use the correct labels to compute the accuracy.

    • @aroonsubway2079
      @aroonsubway2079 2 года назад +1

      @@tamoorkhan3262 Do you mean the test dataset is another validation set? After all, they are the same in the sense that their labels will not be used to update model parameters, and their labels are only used to generate some accuracy numbers.

    • @mikeguitar-michelerossi8195
      @mikeguitar-michelerossi8195 2 года назад +5

      ​@@aroonsubway2079 To the best of my knowledge, the main point in distinguishing between the validation set and test set is the following. During the training phase, we want to maximize the performance (accuracy) calculated on the validation set. By doing this, after a while we are adjusting hyperparameters (no. of neurons, activation functions, no. of epochs...) to perform well on "that particular" validation set! (That's why cross-validation is generally a good choice.)
      The test set should be considered "one shot". We do not generally adjust hyperparameters to get a better performance on the test set, because that was the role of the validation set. (Also, the test set is labelled.)
      It's an approximation, but in general:
      👉 train set -> to adjust the weights of our model
      👉 valid set -> to adjust hyperparameters
      👉 test set -> to calculate the final accuracy
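
      As a rough, self-contained sketch of that three-way workflow (the toy data, layer sizes, and split proportions below are invented for illustration, not taken from the video):

      import numpy as np
      from tensorflow import keras

      # Toy data standing in for a real dataset (random features, 0/1 labels).
      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 8)).astype("float32")
      y = (X[:, 0] > 0).astype("int32")

      # Roughly 70% train / 15% validation / 15% test.
      X_train, y_train = X[:700], y[:700]
      X_valid, y_valid = X[700:850], y[700:850]
      X_test,  y_test  = X[850:],  y[850:]

      model = keras.Sequential([
          keras.layers.Dense(16, activation="relu", input_shape=(8,)),
          keras.layers.Dense(2, activation="softmax"),
      ])
      model.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])

      # Train set -> adjusts the weights; validation set -> only monitored each
      # epoch so *we* can tune hyperparameters (it never updates the weights).
      model.fit(X_train, y_train,
                validation_data=(X_valid, y_valid),
                epochs=5, batch_size=32, verbose=0)

      # Test set -> used once at the end for the final accuracy estimate.
      test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
      print("test accuracy:", test_acc)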

  • @mohamedlichouri5324
    @mohamedlichouri5324 5 лет назад +10

    I have finally understood the difference between the validation and test sets as well as the importance of the validation set. Thanks for the clear and simple explanation.

    • @adanegebretsadik8390
      @adanegebretsadik8390 5 лет назад +1

      Mr. Mohamed, could you please tell me the importance of the validation set and how to prepare it? Because I don't understand it well.
      Thank you

    • @mohamedlichouri5324
      @mohamedlichouri5324 5 лет назад +6

      @@adanegebretsadik8390 Let's consider that you have a dataset D which we will split as follows:
      1- 70% of D as train set = T'
      2- 30% of D as test set = S
      We will further split the train set T' as follows:
      1- 70% of T' as train set = T
      2- 30% of T' as valid set = V
      To construct a good classifier model, we need it to learn all the important information contained in T, then validate it first on V, and finally test it on S.
      The perfect model will have a good score on both V and S.
      A simple explanation of this setup would be this: if you are learning a new course (Machine Learning) T, you will have to pass some labs (V).
      If you have scored well in V, you are eligible to take the final test S with confidence. Otherwise you will have to re-learn the course material T and test yourself again on V until you achieve good results in V.

    • @adanegebretsadik8390
      @adanegebretsadik8390 5 лет назад +1

      @@mohamedlichouri5324 Thank you so much bro, I finally understood. But one thing which is not clear for me is how to split V from the train set in Keras/Python?
      Again, thank you

    • @mohamedlichouri5324
      @mohamedlichouri5324 5 лет назад +1

      @@adanegebretsadik8390 I often use the train_test_split function like this:
      from sklearn.model_selection import train_test_split
      # 1- Split the data into 70% train T' and 30% test S.
      X_trn, X_test, y_trn, y_test = train_test_split(X, y, test_size=0.30)
      # 2- Re-split T' into 70% train T and 30% validation V.
      X_train, X_valid, y_train, y_valid = train_test_split(X_trn, y_trn, test_size=0.30)

    • @adanegebretsadik8390
      @adanegebretsadik8390 5 лет назад +1

      Thank you, sir. I want to contact you so you can share your deep knowledge about machine learning, since all the tips that I have gotten from you have been essential for me so far. Do you mind if I contact you via social media? For general information, I am a master's student in computer engineering, so it may help me do my thesis.

  • @sagar11222
    @sagar11222 4 года назад +4

    Because of you, i am learning ANN during corona lockdown. Thank you very much.

  • @hmmoniruzzaman8537
    @hmmoniruzzaman8537 5 лет назад +6

    Really helpful. Finally understood the difference between Validation and Test set.

  • @ulysses_grant
    @ulysses_grant 4 года назад +1

    Your videos are neat. Sometimes I even have to pause them and digest all the information before moving on. Thanks for your work.

  • @hiroshiperera7107
    @hiroshiperera7107 6 лет назад +3

    Best video series I've found so far that explains the concepts of neural networks :)

  • @sunainamukherjee7648
    @sunainamukherjee7648 2 года назад +2

    Loved all the videos and extremely clear with the concepts and the foundations of ML, often we run models but don't have in depth understanding of what exactly it is. Your explanation is by far the best across all videos I have seen. I can actually go ahead and explain the concepts to others with full clarity. Thank you so much for your efforts. One request, I think there is one concept that got missed, " regularizers ". It will be nice to have a short video on that too. Thanks again for your precious time and super awesome explanation. Looking forward to being an expert like you :)

    • @deeplizard
      @deeplizard  2 года назад

      Thanks, sunaina! Happy to hear how much you enjoyed the course :)
      Btw, regularization is covered here:
      ruclips.net/video/iuJgyiS7BKM/видео.html

  • @nahomghebremichael1956
    @nahomghebremichael1956 5 лет назад +2

    I really appreciate how simply you explained the concept. Your videos really help me to get the basic concepts of DNNs.

  • @gaborpajor3459
    @gaborpajor3459 Год назад +1

    well done; straightforward and clear; thanks a lot

  • @patchyst7577
    @patchyst7577 5 лет назад +4

    Very helpful, precise definition. I appreciate it :)

  • @MurodilDosmatov
    @MurodilDosmatov 2 года назад +1

    Thank you very much. I understood everything, literally. Big thanks

  • @analuciademoraislimalucial6039
    @analuciademoraislimalucial6039 3 года назад +1

    Thanks Teacher!!! Grateful!

  • @ivomitdiamonds1901
    @ivomitdiamonds1901 5 лет назад +2

    Perfect rate at which you speak. Perfect.

  • @atakanbilgili4373
    @atakanbilgili4373 2 года назад +1

    Very clearly explained, thanks.

  • @TheBriza123
    @TheBriza123 5 лет назад +3

    Thanks a lot for these videos.. I was trying to use CNNs and Keras without any explanation and I was just lost - now I get it.. Thx again angel

  • @mikeguitar-michelerossi8195
    @mikeguitar-michelerossi8195 2 года назад +1

    To the best of my knowledge, the main point in distinguishing between the validation set and test set is the following. During the training phase, we want to maximize the performance (accuracy) calculated on the validation set. By doing this, after a while we are adjusting hyperparameters (no. of neurons, activation functions, no. of epochs...) to perform well on "that particular" validation set! (That's why cross-validation is generally a good choice.)
    The test set should be considered "one shot". We do not generally adjust hyperparameters to get a better performance on the test set, because that was the role of the validation set. (Also, the test set is labelled.)
    It's an approximation, but in general:
    👉 train set -> to adjust the weights of our model
    👉 valid set -> to adjust hyperparameters
    👉 test set -> to calculate the final accuracy

  • @hiroshiperera7107
    @hiroshiperera7107 6 лет назад +1

    Best video series I've found so far that explains the concepts of neural networks :) ... One small suggestion: it would be better if the font size in the Jupyter Notebook were a bit bigger, so it would be easier to check the code :)

    • @deeplizard
      @deeplizard  6 лет назад +1

      Thanks for the suggestion, Hiroshi! In later videos that show code, I've started zooming in to maximize the individual code cells I'm covering. As an example, you can see the code starting at 7:33 in this video: ruclips.net/video/ZjM_XQa5s6s/видео.htmlm33s
      Let me know what you think of this technique.

  • @draganatosic9638
    @draganatosic9638 6 лет назад +4

    Thank you for the video! Super concise and clear. If you could shortly mention some real world examples in the future videos, that would be great, I see in the comments that people have been wondering about similar things as I have. Or maybe you have done that, I'm about to check the other videos as well :)

    • @deeplizard
      @deeplizard  6 лет назад +2

      You're welcome, Dragana! And thank you!
      Yes, as the playlist progresses, I do introduce some examples. More hands-on examples (with code) are shown in the Keras and TensorFlow.js series below. Those series more so focus on how to implement the fundamental concepts we cover in this series.
      I hope you enjoy the next videos you check out!
      Keras: ruclips.net/p/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
      TensorFlow.js: ruclips.net/p/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ-

  • @tamoorkhan3262
    @tamoorkhan3262 4 года назад +1

    Loving this series. Concise and to the point. (Y)

  • @actechforlife
    @actechforlife 5 лет назад +12

    comments = "Thank you for your videos"

  • @maxmacken8859
    @maxmacken8859 2 года назад +1

    Fantastic video, one aspect I am confused about is what is the algorithm doing when it is 'training' the data? How does it train on data and how do we know it is correct? Do you have any videos on this question or know where I could look to understand? Thank you.

    • @deeplizard
      @deeplizard  2 года назад

      Yes, check out the Training and Learning lessons in the course:
      deeplizard.com/learn/playlist/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU

    • @sahilvishwakarma6509
      @sahilvishwakarma6509 2 года назад

      Check 3blue1brown's Neural Network videos

  • @annarauscher8536
    @annarauscher8536 3 года назад +1

    Thank you so much for your videos! You make machine learning so much more understandable and fun :) I really appreciate your passion! Keep it up!!!

  • @tymothylim6550
    @tymothylim6550 3 года назад +1

    Thank you very much for this video! It helps me get a quick understanding of the use of these 3 separate datasets (i.e. Train, Test and Validation)!

  • @qusayhamad7243
    @qusayhamad7243 3 года назад

    thank you very much for this clear and helpful explanation.

  • @wolfrinn2538
    @wolfrinn2538 3 года назад

    Well, I was reading Deep Learning with Python and got a bit lost; this video explained it very well, so thank you, and keep up the hard work.

  • @zenchiassassin283
    @zenchiassassin283 4 года назад +2

    Hello, do you have videos explaining different types of activation functions, when to use a specific one?
    And do you have a video about optimizers ? Like Momentum

    • @deeplizard
      @deeplizard  4 года назад +1

      This episode explains activation functions:
      deeplizard.com/learn/video/m0pIlLfpXWE
      No specific episode on optimizers, although we do have several explaining how a NN learns via SGD optimization.

  • @tomatosauce9561
    @tomatosauce9561 6 лет назад +3

    Such a great video!!!!! Thank you!!!!!!

  • @11MyName111
    @11MyName111 4 года назад +1

    One question:
    At 1:40 you said weights won't be updated based on validation loss. If so, how does validation set help us? Since we are not using it to update the model... Later at 1:57 you said it's used so model doesn't overfit. How? When does it come into play?
    Goes without saying, great video! I'm on a spree!

    • @deeplizard
      @deeplizard  4 года назад +1

      Hey Ivan - We, as trainers of the model, can use the validation set metrics as a monitor to tell whether or not the model is overfitting. If it is, we can make appropriate adjustments to the model. The model itself though is not using the validation set for learning purposes or weight updates.

    • @11MyName111
      @11MyName111 4 года назад +1

      @@deeplizard we adjust the model manually! Ok, it makes sense now :)
      Thank you!
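
      A small sketch of that monitoring idea (it assumes the compiled Keras model and train/validation arrays from the earlier sketch; nothing here updates weights from the validation data):

      # fit() records per-epoch metrics for both sets in its History object.
      history = model.fit(X_train, y_train,
                          validation_data=(X_valid, y_valid),
                          epochs=20, batch_size=32, verbose=0)

      for epoch, (tr, va) in enumerate(zip(history.history["loss"],
                                           history.history["val_loss"]), start=1):
          print(f"epoch {epoch:2d}  train loss {tr:.3f}  val loss {va:.3f}")

      # If val_loss keeps rising while train loss keeps falling, the model is
      # overfitting; we (not the model) then adjust hyperparameters and retrain.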

  • @muhammadsaleh729
    @muhammadsaleh729 4 года назад

    Thank you for your clear explanation

  • @_WorldOrder
    @_WorldOrder 3 года назад +1

    I hope I'll get a reply. My question is: do I have to dig deeper into machine learning concepts before starting deep learning, or are these fundamentals enough to start deep learning? Btw, thank you for providing us such valuable content for free.

    • @deeplizard
      @deeplizard  3 года назад +1

      Yes, you can start with this DL Fundamentals course without prior ML experience/knowledge. Check out the recommended deep learning roadmap on the homepage of deeplizard.com
      You can also see the prerequisites for each course there as well.

    • @_WorldOrder
      @_WorldOrder 3 года назад +1

      @@deeplizard thank you so much, I'm falling in love with a lizard for the first time xD

  • @justchill99902
    @justchill99902 6 лет назад +1

    Really cleared my mind! Thank you :) Keep up the good work.

    • @justchill99902
      @justchill99902 6 лет назад

      I have one question. In TensorFlow's Object Detection API, they tell us to create a training directory and a test directory with, as usual, a 90-10 distribution. But we have to label all of them. So this means the test directory, in the case of TensorFlow's API, is actually a validation set, right?

    • @deeplizard
      @deeplizard  6 лет назад +1

      Hey Nirbhay - Not necessarily. Sometimes we'll label our test sets so we can see the stats from how well the model predicted on the test data. For example, we may want to plot a confusion matrix with the results from the test set. More on this here: ruclips.net/video/km7pxKy4UHU/видео.html
      If the test set is labeled, we just have to take extra precaution to make sure that the labels are not made available to the model, like they are for the training and validation sets.
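
      A brief sketch of keeping the test labels away from the model and only using them afterwards (assumes scikit-learn plus the model and X_test / y_test arrays from the earlier sketch):

      import numpy as np
      from sklearn.metrics import accuracy_score, confusion_matrix

      probs = model.predict(X_test, verbose=0)   # the model sees only the samples
      y_pred = np.argmax(probs, axis=-1)         # predicted class per sample

      print("test accuracy:", accuracy_score(y_test, y_pred))
      print(confusion_matrix(y_test, y_pred))    # rows: true class, columns: predicted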

    • @justchill99902
      @justchill99902 6 лет назад

      Ok I have to dig up more in order to understand it. By the way the page at the weblink you sent ain't available. Could you please post it again? or perhaps the title of the video? Thanks :)

    • @deeplizard
      @deeplizard  6 лет назад

      The ")" was caught on the end of the URL. Here's the link: ruclips.net/video/km7pxKy4UHU/видео.html

  • @DataDrivenDecision
    @DataDrivenDecision 2 года назад

    I have a question: why can we not use the validation approach in normal machine learning? Why do we only use it in deep learning problems to prevent overfitting?

  • @radouanebouchou7488
    @radouanebouchou7488 4 года назад +1

    Quality content , thanks

  • @thegirlnextdoor2660
    @thegirlnextdoor2660 4 года назад +1

    Explanation was really good ma'am but the white screen console that you showed could not be read. Please make those contents brighter and in big fonts.

    • @deeplizard
      @deeplizard  4 года назад +1

      Thanks for the feedback, Sayantani. In later videos, I zoom in on the code, so it is much easier to read. Also, note that most videos have corresponding text-based blogs that you can read at deeplizard.com
      The blog for this video can be found at
      deeplizard.com/learn/video/Zi-0rlM4RDs

  • @mauriciourtadoilha9971
    @mauriciourtadoilha9971 5 лет назад +1

    I believe you're wrong about the test set being unlabeled. As far as I remember from Andrew Ng course in Stanford, the training set is used for model tuning for multiple models; the validation set is used for model selection (this is where you compare different models to check which one best performs on data not used for training). Once you choose a definitive model, you still have to check if it generalizes well for data never seen before, that does not carry any bias on model selection. At this point, you don't do any further tuning. Besides, having a labeled test set allows you to define test error. If data are unlabeled, this term doesn't make any sense, does it?

    • @deeplizard
      @deeplizard  5 лет назад

      The test set's labels just cannot be known to the model in the way that the train and validation sets are. So as far as the model knows, the test set is unlabeled. You may have the test labels stored elsewhere though to do your own analysis.

  • @ismailhadjir9703
    @ismailhadjir9703 4 года назад +1

    thank you for this interesting video

  • @bonalareddy5339
    @bonalareddy5339 3 года назад

    I'm kinda confused about this, for a very large dataset - say, 10 million records.
    In general, in a production environment, how would the train-test split be done to evaluate how our model is working?
    -> I have heard in a few resources that it is okay to split the data into 98% for training, 1% for validation (100,000 rows) and 1% for testing (100,000 rows). The theory behind this is that 1% of the data most probably represents the maximum variance in the data.
    -> And some say we have to split the data more or less into 70% for training, 15% for validation and 15% for testing. The theory behind this is that if we have a large amount of data for validation and testing and the model gives good accuracy on that, then we can say with "confidence" that it would work nearly the same in real time as well.
    Whichever of these is right or wrong, could you please explain it with a reason?

  • @carlosmontesparra8548
    @carlosmontesparra8548 3 года назад +1

    Thanks very much for the videos!! Then the training set and the array of its labels should be ordered to match properly, correct? E.g. if index 0 of the training array is the picture of a cat, then index 0 of the label array should be 0 (and 1 if a dog)?
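
    A tiny sketch of that pairing (file names and labels are made up; 0 = cat, 1 = dog as in the question):

    import numpy as np

    image_paths = ["cat_001.jpg", "dog_001.jpg", "cat_002.jpg"]
    labels = np.array([0, 1, 0])   # must stay in the same order as image_paths

    # If you shuffle, shuffle both together so the pairing is preserved.
    perm = np.random.permutation(len(labels))
    image_paths = [image_paths[i] for i in perm]
    labels = labels[perm]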

  • @arifurrahaman6493
    @arifurrahaman6493 6 лет назад

    It's indeed helpful and understandable. As I am at beginner level, I wonder if there is any way to get the demo code you are using for making these videos. Thanks in advance.

    • @deeplizard
      @deeplizard  6 лет назад

      Thanks, Arifur! Download access to code files and notebooks are available as a perk for the deeplizard hivemind. Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind

  • @nadaalay4528
    @nadaalay4528 6 лет назад +1

    Thank you for this video. In the results of the example, the validation accuracy is higher than the training accuracy. Is this considered a problem?

    • @deeplizard
      @deeplizard  6 лет назад +1

      Hey Nada - It's typically going to be up to the engineer of the network to determine what is considered acceptable regarding their results. In general, I would say that you typically want your training and validation accuracy to be as close as you can get them to each other. If the validation accuracy is considerably greater than the training accuracy, then you may want to take steps to decrease the difference between those metrics. If the model used in this video was one I planned to deploy to production, for example, then I would take steps to close this gap. This would be considered a problem of underfitting. I talk more about that here: ruclips.net/video/0h8lAm5Ki5g/видео.html

  • @kajalnadhawadekar3326
    @kajalnadhawadekar3326 5 лет назад +1

    @Deeplizard u explained very well...

  • @montassarbendhifallah5253
    @montassarbendhifallah5253 4 года назад

    Hello,
    Thank you for this playlist. It's awesome!
    My question is: in some cases, we don't specify a validation set. Why? And when is it not important to set validation data?

    • @nandinisarker6123
      @nandinisarker6123 3 года назад

      This is my question too. Hope someone answers.

    • @montassarbendhifallah5253
      @montassarbendhifallah5253 3 года назад +1

      @@nandinisarker6123 Well, I found these 2 links:
      stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set
      machinelearningmastery.com/difference-test-validation-datasets/

  • @alfianabdulhalin1873
    @alfianabdulhalin1873 5 лет назад +1

    One question. Data in the test set does have labels, right? But they're not known to the classifier... It's only labeled so that we can calculate all the metrics at the end more easily... right?

    • @deeplizard
      @deeplizard  5 лет назад +1

      Sometimes we'll have labels for the test set, and other times we may not. When we do have the labels, you're correct that the network will not be aware of the labels that correspond to the samples. The network will understand that there are, say 10 different classes, for which it will need to classify the data from the test set, but it will not know the individual labels that correspond to each sample.

  • @neocephalon
    @neocephalon 4 года назад +3

    You're like the 3blue1brown of deep learning! You deserve waaay more subs.
    Maybe if you include tensorflow tutorials in this format you could get a crap ton of more subs because there'll be others out there looking for explanations of how tensorflow works in intuitive ways, who aren't mathematically literate. Key here is to reduce the need for mathematical literacy and make the concepts more intuitive and easier to get into.
    If you were to introduce math literacy needed to explain these concepts, then you'd need to hope that the people who are looking to understand these concepts have figured out that by watching the likes of 3blue1brown (assuming that they've found him in the process of wanting to understand the math (hint: most people don't want to learn the math, they just want to understand the code)).
    So there you have it, a possible method for you to gain more subs :P

    • @deeplizard
      @deeplizard  4 года назад +1

      Hehe thank you, Jennifer! :D

  • @fatihaziane4443
    @fatihaziane4443 5 лет назад +1

    Thanks

  • @boulaawny4420
    @boulaawny4420 6 лет назад +2

    Could you give a real-world example of a training set and validation set?!
    Kind of like, I want to predict whether it's a blue or red flower depending on its height or width... and I use k-nearest neighbours, so what does the validation set consist of?

    • @deeplizard
      @deeplizard  6 лет назад

      Hey Andy,
      So, sticking with your example of flowers-- You would start out by gathering the data for red and blue flowers. This data would presumably be numerical data containing the height and width of the flowers, and each sample from your data set would be labeled with "blue" or "red." You would then split this data up into a training set and validation set. A common split is 80% training / 20% validation. You would then train your model on the data in the training set, and validate the model on the data in the validation set.
      Does this example make things more clear?
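
      A quick sketch of that flower setup with k-nearest neighbours (the measurements and the labeling rule are invented; 0 = red, 1 = blue):

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.default_rng(42)
      X = rng.uniform(low=[2.0, 1.0], high=[10.0, 5.0], size=(200, 2))  # height, width
      y = (X[:, 0] > 6.0).astype(int)   # toy rule standing in for real labels

      # 80% training / 20% validation, as suggested above.
      X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.20)

      knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
      print("validation accuracy:", knn.score(X_valid, y_valid))
      # We could now try other values of n_neighbors and keep whichever one
      # scores best on the validation set.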

  • @tvrtkokotromanic9158
    @tvrtkokotromanic9158 4 года назад +2

    Universities are getting obsolete when you have RUclips. I mean, I have seriously learned more from RUclips about machine learning and C# coding than from professors at university. Thanks for this great explanation. 🙏

  • @dazzaondmic
    @dazzaondmic 6 лет назад +2

    Thank you for the video. I just have one question. How do we know how well our model is performing on the test set if we don't have labels to tell us the "correct answer" and if we don't even know what the correct answer is ourselves. How do we then know that the model performed well or badly on the test set? Thanks again.

    • @deeplizard
      @deeplizard  6 лет назад +1

      Hey dazzaondmic - You're welcome! If we don't know the labels to the test set ourselves, then the only way we can gauge the performance of the model is based on the metrics observed during training and validating. We won't have a solid way of judging the exact accuracy of the model on the test set. If we have a decently sized validation set though, and the data contained in it is a good representation of what the model will be exposed to in the test set and in the "real world," then that increases confidence in the model's ability to perform well on new, unseen data if the model indeed performs well on the validation data.
      Does this help clarify?

    • @aroonsubway2079
      @aroonsubway2079 2 года назад

      IMO, the test dataset should have labels so that we can at least have some accuracy numbers to look at in the end. The only difference between the validation dataset and the test dataset is that we still have a chance to update the model based on the validation results by tuning hyperparameters. However, the test dataset only provides us a final accuracy number; even if it is bad, we won't perform additional training.

  • @OpeLeke
    @OpeLeke 2 года назад

    The validation set is used to tweak the hyperparameters of a model.
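
    A small self-contained sketch of that idea (synthetic data; logistic regression's C is the hyperparameter being tuned here, chosen purely for illustration):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Split off a test set first, then carve a validation set out of the rest.
    X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    X_train, X_valid, y_train, y_valid = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=1)

    best_C, best_val_acc = None, -1.0
    for C in [0.01, 0.1, 1.0, 10.0]:              # candidate hyperparameter values
        clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
        val_acc = clf.score(X_valid, y_valid)     # the validation set guides the choice
        if val_acc > best_val_acc:
            best_C, best_val_acc = C, val_acc

    # The untouched test set then gives the final, unbiased estimate.
    final_clf = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
    print("chosen C:", best_C, "test accuracy:", final_clf.score(X_test, y_test))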

  • @richarda1630
    @richarda1630 3 года назад

    Silly question: can you also pass in pandas DataFrames, or is that irrelevant and NumPy is enough?

  • @Normalizing-polyamory
    @Normalizing-polyamory 5 лет назад +1

    If the test set is unlabeled, how can you measure accuracy? How can you know that the model works?

    • @marwanelghitany8875
      @marwanelghitany8875 5 лет назад

      You actually have the labels of your test set, but you don't feed them through your model. You wait until the model makes its predictions, then compare them with the labels which you held back at first, and calculate the accuracy based on how similar they are.

  • @thespam8385
    @thespam8385 4 года назад

    {
      "question": "The test set differs from the train and validation sets by:",
      "choices": [
        "Being applied after training and being unlabeled",
        "Being applied after training and being labeled",
        "Being randomly selected data",
        "Being hand-selected data"
      ],
      "answer": "Being applied after training and being unlabeled",
      "creator": "Chris",
      "creationDate": "2019-12-11T04:29:35.828Z"
    }

    • @deeplizard
      @deeplizard  4 года назад

      Thanks, Chris! Just added your question to deeplizard.com

  • @ranasagar1201
    @ranasagar1201 4 года назад +1

    Ma'am, can you just tell me, do you have an NLP (natural language processing) playlist?

  • @Appletree-db2gh
    @Appletree-db2gh 3 года назад

    Why can't you use the test set after each epoch of training, since no weights will be updated from it?

  • @LeSandWhich
    @LeSandWhich Год назад

    fit does NOT have validation?
    Most of the time, people's code looks like this:
    clf = svm.SVC().fit(X_train, y_train)
    The validation_set and validation_split arguments are nowhere to be found; even the sklearn docs don't mention them.
    What is going on? How come these models don't overfit without a validation set?
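
    For what it's worth, a sketch of two common ways to add validation around scikit-learn's fit (synthetic data; the SVC settings are arbitrary):

    import numpy as np
    from sklearn import svm
    from sklearn.model_selection import train_test_split, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))
    y = (X[:, 0] > 0).astype(int)

    # Option 1: hold out a validation set yourself.
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = svm.SVC(C=1.0).fit(X_train, y_train)
    print("hold-out validation accuracy:", clf.score(X_valid, y_valid))

    # Option 2: k-fold cross-validation on the training data.
    scores = cross_val_score(svm.SVC(C=1.0), X_train, y_train, cv=5)
    print("5-fold CV accuracy:", scores.mean())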

  • @sathyakumarn7619
    @sathyakumarn7619 4 года назад +3

    My dear Channel,
    I only wish for you to change the ominous music in the beginning!
    TY :-(

    • @deeplizard
      @deeplizard  4 года назад +1

      Lol it has been changed in later episodes :D

    • @sathyakumarn7619
      @sathyakumarn7619 4 года назад

      @@deeplizard Looking forward! Thanks again for the video

  • @GelsYT
    @GelsYT 5 лет назад

    Thanks, it made my mind clear; you deserve a sub. But I have a question: what if we don't train? Does that mean the accuracy will drop, like to 0%? Will it give an error? Or maybe training is simply a must and I'm asking a stupid question hahahaha
    Btw, learning NLP here using Python - nltk

    • @GelsYT
      @GelsYT 5 лет назад

      model? you mean the software or system right?

    • @deeplizard
      @deeplizard  5 лет назад

      If you don't train the model, then it will likely perform no better than chance for the given task. By "model," I mean the neural network.

  • @bobjoe275
    @bobjoe275 4 года назад

    There's an error in using validation_data in model.fit. The format should be a tuple of NumPy arrays, i.e. valid_set = (np.array([0.6,0.5]), np.array([1,1]))

    • @deeplizard
      @deeplizard  4 года назад

      Yes, this is specified in blog for the episode below:
      deeplizard.com/learn/video/dzoh8cfnvnI
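
      A minimal sketch of that corrected format (the one-unit model and the tiny arrays are placeholders; only the tuple shape of validation_data matters here):

      import numpy as np
      from tensorflow import keras

      model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid", input_shape=(1,))])
      model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

      train_samples = np.array([[0.1], [0.2], [0.8], [0.9]])
      train_labels  = np.array([0, 0, 1, 1])

      # validation_data is a (samples, labels) tuple of NumPy arrays.
      valid_set = (np.array([[0.6], [0.5]]), np.array([1, 1]))

      model.fit(train_samples, train_labels, validation_data=valid_set,
                epochs=3, verbose=0)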

  • @bisnar1307
    @bisnar1307 4 года назад

    You are the best :)

  • @mohamamdazhar6813
    @mohamamdazhar6813 4 года назад

    does this work for unsupervised and reinforcement learning?

    • @deeplizard
      @deeplizard  4 года назад +1

      No, RL works differently. Check out our RL course:
      deeplizard.com/learn/playlist/PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv

    • @mohamamdazhar6813
      @mohamamdazhar6813 4 года назад

      @@deeplizard ty

  • @joxa6119
    @joxa6119 Год назад

    Where to get the dataset?

  • @denisutaji2094
    @denisutaji2094 3 года назад

    What is the difference between model.evaluate() and model.predict()?
    model.predict() got a lower accuracy than model.evaluate()
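
    A short sketch of the difference (assumes the compiled Keras model and the X_test / y_test arrays from the earlier sketch):

    import numpy as np

    # evaluate() is given the labels and returns the configured loss and metrics.
    loss, acc = model.evaluate(X_test, y_test, verbose=0)

    # predict() never sees the labels; it returns raw outputs (class probabilities here).
    probs = model.predict(X_test, verbose=0)
    y_pred = np.argmax(probs, axis=-1)
    manual_acc = np.mean(y_pred == y_test)

    # Computed on the same data with the same argmax rule, the two accuracies
    # should agree; mismatches usually come from evaluating on different data
    # or thresholding the outputs differently.
    print(acc, manual_acc)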

  • @tusharvatsa5293
    @tusharvatsa5293 4 года назад

    You are fast!

  • @poojakamra1177
    @poojakamra1177 6 лет назад +1

    I have trained and validated the data. Now how can I test an image with the model?

    • @deeplizard
      @deeplizard  6 лет назад +1

      Hey Pooja - Check out this video, and let me know if it answers your question.
      ruclips.net/video/bfQBPNDy5EM/видео.html

  • @peaceandlove5855
    @peaceandlove5855 5 лет назад

    How do you test accuracy when predicting non-binary output?
    (As far as I know, they use a "confusion matrix" when the output is binary.)

    • @deeplizard
      @deeplizard  5 лет назад

      You can also use a confusion matrix for non-binary output. I show an example of this towards the end of this video: ruclips.net/video/FNqp4ZY0wDY/видео.html
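
      A tiny sketch showing that a confusion matrix also works for multi-class (non-binary) output; the label vectors are made up:

      from sklearn.metrics import confusion_matrix

      y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # actual classes
      y_pred = [0, 1, 2, 1, 1, 0, 2, 2]   # predicted classes

      print(confusion_matrix(y_true, y_pred))   # 3x3: rows = true class, columns = predicted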

  • @hasnain-khan
    @hasnain-khan 4 года назад

    If I have 1000 rows in my dataset, how can I select 800 rows for training and 200 for testing instead of selecting them randomly when splitting?

    • @deeplizard
      @deeplizard  4 года назад

      Keras can automatically split out a percentage of your training data for validation only.
      More here: deeplizard.com/learn/video/dzoh8cfnvnI
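
      A small sketch of a fixed (non-random) 800/200 split by simple slicing, with validation_split then taking the tail of the training portion (synthetic data; the fit call is only indicative):

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 4)).astype("float32")
      y = (X[:, 0] > 0).astype("int32")

      X_train, y_train = X[:800], y[:800]   # first 800 rows for training
      X_test,  y_test  = X[800:], y[800:]   # last 200 rows for testing

      # In Keras, validation_split=0.1 would additionally hold out the last 10%
      # of X_train / y_train (without shuffling) as a validation set, e.g.:
      # model.fit(X_train, y_train, validation_split=0.1, epochs=10)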

  • @aceespadas
    @aceespadas 5 лет назад

    Using Google Colab on this.
    I've set up the same model and hyper-parameters. I've also used the same code to preprocess the data and the same params to train the model (batch_size, Adam's lr, validation_split, epochs), but I'm not getting the same metrics as you while training, no matter how much I try.
    The validation accuracy plateaus around 0.75 and the val_loss starts at around 0.68, decreases, then starts increasing around the 12th epoch to end around 0.66. This is bugging me and I can't figure it out.
    PS: I also tried with Theano as a backend for Keras

    • @deeplizard
      @deeplizard  5 лет назад

      Hey Yassine - Are you using the same data to train your model? This data was created in the Keras series. If you did use the same data, then be sure that you caught the reference in the Keras validation set video to reverse the order of the for-loops that generates the data. Let me know.

    • @aceespadas
      @aceespadas 5 лет назад

      @@deeplizard Thank you for reaching back.
      Yes, I generated the data in the same fashion as you did in the preprocessing-data video from the Keras series. I caught the for-loop reversal reference in that same series, and after rectifying my code the validation accuracy and loss behaved normally while fitting. But I'm not sure why the behaviour changed. As you explained, doing the validation split will take a percentage of the training set prior to fitting (I don't know if the validation set is generated after a shuffle or not in this case) and it won't be regenerated on each shuffle in each epoch. But why did switching the for-loop order matter? You are still taking the bottom 10 or 20% of your data regardless of the for-loop order, and you end up with the same validation data in each epoch.
      Also, I used a sigmoid function in the output as you did in this series, yet my prediction probabilities don't sum to 1 as you depicted in the prediction video within the same playlist. Using a softmax function as in the Keras API series works fine. It would help if you could clear up this confusion.

    • @deeplizard
      @deeplizard  5 лет назад

      The validation_split parameter takes the last x% of the data in the training set (10% in our example), and doesn't shuffle it. With the way I had the for-loops organized originally, the validation split would completely capture all of the data in the second for loop, which was the 5% of younger individuals who did experience side effects and the 5% of older individuals who did not experience side effects. Therefore, none of the data in the second for-loop would be captured in the training set.
      With the re-ordering of the for-loops, the training set is made up of the data that is now generated in both for-loops.
      Another (better) approach we could have taken is, after generating all the data with both for-loops (regardless of which order the loops are in), we could shuffle all the data completely, and then pass that shuffled data to our model. With shuffled data, the training set and validation sets would be a more accurate depiction of the true underlying data since there would be no real ordering to it. As long as your data in the real world is shuffled before you pass it to your model, you shouldn't experience this problem.
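
      A sketch of that "shuffle first" approach (the variable names scaled_train_samples and train_labels and the fit settings are illustrative, assuming a compiled Keras model):

      import numpy as np

      # Shuffle samples and labels together so the pairing is preserved,
      # then let validation_split take the (now random) tail.
      perm = np.random.permutation(len(train_labels))
      shuffled_samples = scaled_train_samples[perm]
      shuffled_labels  = train_labels[perm]

      model.fit(shuffled_samples, shuffled_labels,
                validation_split=0.1, epochs=20, batch_size=10, verbose=2)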

  • @balabodhiy730
    @balabodhiy730 6 лет назад +1

    All fine, but I came to know that the dependent variable is not included in the training set; how is that? Thank you

    • @deeplizard
      @deeplizard  6 лет назад

      Hey balabodhi - I'm not sure what you mean by "dependent variable." Can you please elaborate?

    • @balabodhiy730
      @balabodhiy730 6 лет назад

      For prediction we split a dataset into two sets: one is the train set and another is the test set. But when we are given separate datasets for training and testing, do we include the dependent variable (the response variable, or the Y variable) in the testing dataset? Because in one of my simple logistic regression analyses, they gave three datasets separately: training, validation and testing. In the testing dataset, I don't have the response variable, i.e. the Y variable. So this is my question: can we test on a dataset without the response variable Y?

    • @deeplizard
      @deeplizard  6 лет назад

      I see. Yes, many times, we don't have the labels for the test data.
      This is completely fine. The labels for training and validation data are required, but labels for test data are not required.

    • @balabodhiy730
      @balabodhiy730 6 лет назад

      Ok, that's fine, but I couldn't understand what labels are.

  • @alexiscruz5854
    @alexiscruz5854 4 года назад +1

    I love your videos

  • @greeshmanthmacherla2105
    @greeshmanthmacherla2105 4 года назад

    what happens if i give same set of images for both validation and training datasets?

    • @deeplizard
      @deeplizard  4 года назад

      You will not be able to identify overfitting or see how your model is generalizing to data it wasn't trained on.

    • @greeshmanthmacherla2105
      @greeshmanthmacherla2105 4 года назад

      @@deeplizard i got an error :"`validation_split` is only supported for Tensors or NumPy " what should I do now?

    • @deeplizard
      @deeplizard  4 года назад

      Either convert your data to a supported data type, or manually create a separate validation set. More details in this blog:
      deeplizard.com/learn/video/dzoh8cfnvnI

  • @marouaomri7807
    @marouaomri7807 4 года назад

    I think you should slow down when you are explaining to let the information sink in :)

    • @deeplizard
      @deeplizard  4 года назад +1

      I have in later videos. In the mean time, each video has a corresponding written blog on deeplizard.com that you can check out for a slower pace :)

    • @marouaomri7807
      @marouaomri7807 4 года назад

      @@deeplizard Thank you so much the course helped me to understand better

  • @amiryavariabdi8962
    @amiryavariabdi8962 3 года назад

    Dear artificial intelligence community,
    I am pleased to introduce the DIDA dataset, which is the largest handwritten digit dataset. I would be grateful if you could help me introduce this dataset to the community.
    Thanks

  • @literaryartist1
    @literaryartist1 5 лет назад +1

    I'm lost. Are we really talking about weight training or something else?!

  • @appinventorappinventorpak9154
    @appinventorappinventorpak9154 6 лет назад +1

    Please share the code as well

    • @deeplizard
      @deeplizard  6 лет назад

      Hey App Inventor- The code files for this series are available as a perk for the deeplizard hivemind at the following link: www.patreon.com/posts/code-for-deep-19266563
      Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind

    • @appinventorappinventorpak9154
      @appinventorappinventorpak9154 6 лет назад

      Thanks a lot, this is a really excellent tutorial, explained so well in a simple manner

    • @appinventorappinventorpak9154
      @appinventorappinventorpak9154 6 лет назад

      I request you to please provide a tutorial on the 3D convolution algorithm, to process medical image files

    • @deeplizard
      @deeplizard  6 лет назад

      Thank you, I'm glad you're enjoying the videos!
      I'll add 3D convolutions to my list of potential topics to cover in future videos. Thanks for the suggestion.
      In the mean time, I do have a video on CNNs in general below if you've not yet seen that one.
      ruclips.net/video/YRhxdVk_sIs/видео.html

  • @Leon-pn6rb
    @Leon-pn6rb 4 года назад

    1:00 - 1:55 You lost me there. First you said Validation is for HP tuning and then you say that it is Not
    -_- off to another video/article

    • @deeplizard
      @deeplizard  4 года назад +2

      By validating the model against the validation set, we can choose to adjust our hyperparameters based on the validation metrics. The weights of the network, however, are not adjusted during training based on the validation set. (Note that weights are not hyperparameters.) The weights are only adjusted according to the training set. This is what is stated in the video. Hope it's clear now.

  • @radouaneaarbaoui7206
    @radouaneaarbaoui7206 4 года назад

    Speaking very fast, as if we were computers that could keep up with the speed.

  • @big-blade
    @big-blade 4 года назад

    why is the music so scary?

  • @TP-gx8qs
    @TP-gx8qs 5 лет назад +5

    Just talk more slowly. I had to put you at 0.75 speed and you sound like you are drunk.

    • @deeplizard
      @deeplizard  5 лет назад +2

      Lol
      The blogs are helpful for a slower pace as well:
      deeplizard.com/learn/video/Zi-0rlM4RDs

    • @MrKrasi97
      @MrKrasi97 4 года назад

      haha same issue here

  • @blankslate6393
    @blankslate6393 2 года назад

    The role of the validation set in adjusting weights is still unclear to me after listening to that part 3 times. So not a great explanation. Maybe you need to make a video specifically about this.

  • @GQElvie
    @GQElvie 3 года назад

    Not helpful at all. Why can't anybody just show an EXAMPLE so that we can really wrap our heads around this? I have a vague idea of validation. I take it the validation just "updates" because of the new information?? If that is it, then why are there 10 different definitions? What is an example of the validation set demonstrating underfitting or overfitting?

  • @MrUsidd
    @MrUsidd 5 лет назад

    Use 0.5x speed.
    Thank me later.

  • @nmcfbrethren1407
    @nmcfbrethren1407 3 года назад

    I think you could redo this video and speak slowly and calmly.

  • @styloline
    @styloline 4 года назад

    wayway

  • @rankzkate
    @rankzkate 4 года назад

    Too fast for me. I kept on rewinding

    • @deeplizard
      @deeplizard  4 года назад

      You can use the corresponding blogs for every video on deeplizard.com to move at a slower pace as well.

    • @deeplizard
      @deeplizard  4 года назад

      deeplizard.com/learn/video/Zi-0rlM4RDs

  • @moyakatewriter
    @moyakatewriter 4 года назад

    Maybe talk slower. It's hard to understand people when they're racing.

  • @OscarRangelMX
    @OscarRangelMX 5 лет назад +1

    man!!! you speak soooo fast it is hard to keep up with the video and what you are saying, great content, but you need to slow down......

  • @ShivamPanchbhai
    @ShivamPanchbhai 3 года назад

    speak slowly

  • @omidasadi2264
    @omidasadi2264 5 лет назад

    Too fast, and the explanation of the concept is poor quality.

  • @534A53
    @534A53 2 года назад

    This is wrong, the test set must also be labelled. Otherwise you cannot evaluate the model at the end. The video should be corrected because it is teaching incorrect information to people.