The greatest teaching skills in one guy. Many thanks!
My pleasure!
Sir, providing real training with coding makes your videos exceptional.
Thanks a ton
Just found your channel and must say I am enjoying the way you make it easy. I would like to be in your network so I can share ideas with you.
Everything in life is easy if you understand the fundamentals. I am trying my best to communicate the fundamentals so you feel more comfortable with coding. Thanks for watching and you can follow me on Twitter or connect over LinkedIn. (Both: @digitalsreeni)
Sir, all your videos are awesome. I have suggested to many of my friends to go with Digital Sreeni.
Thanks
Thankyou so much sir you're the best
Most welcome
1. Is this also suitable for large datasets? If so, then why do we even bother using UNet?
2. What are its limitations that are fulfilled by training a UNet model from scratch?
So unique , thanks sir so much , huge respect n support
Most welcome
This is gold. Really.
The NVIDIA RAPIDS framework provides cuML for GPU-accelerated machine learning algorithms. cuDF is the GPU implementation of Pandas (dataframes).
thank you so much Sreeni😇
Wow, thank you. I am confused: your masks only have some cells highlighted, as you showed in the beginning, yet in the end the predicted image has more or less segmented all cells of the brain. In training only a few cells were labeled, so I don't understand how the network learned not to classify the others as background.
This is what machine learning is - you train a machine to do tasks by learning from a few examples and then extending the learning to other cases.
I am also surprised by the same thing. It didn't fit my knowledge and I feel that I have to do some experiments to update my understanding of ML. :-D
interesting
But there is an error on features=new_model.predict(X_train)
KeyError: 'pop from an empty set'. What could it be?
It says 'empty set' - looks like there is an issue reading images or masks.
I was a bit confused at 9:11 where you said that opencv is reading images in bgr, and you want to convert them to rgb. But what you are doing in line 40 is converting from rgb to bgr, which is exactly the opposite thing? Am I misunderstanding smth. here?
cv2 reads images in BGR order rather than RGB. The conversion just swaps the first and last channels, so RGB2BGR does exactly the same thing as BGR2RGB.
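A minimal sketch of why the two conversion flags behave identically. NumPy slicing is used here as a stand-in for `cv2.cvtColor` (an assumption, so the snippet runs without OpenCV), and `swap_first_and_last_channel` is a hypothetical helper name. Swapping the first and last channel is its own inverse, so the same operation turns BGR into RGB and RGB into BGR:

```python
import numpy as np

def swap_first_and_last_channel(image):
    """Reverse the channel axis of an (H, W, 3) image: BGR -> RGB or RGB -> BGR."""
    return image[..., ::-1]

# A 1x2 "image" stored in BGR order, the way cv2.imread would return it.
# The first pixel is pure blue when read as (B, G, R).
bgr = np.array([[[255, 0, 0], [0, 128, 64]]], dtype=np.uint8)

rgb = swap_first_and_last_channel(bgr)        # BGR -> RGB
bgr_again = swap_first_and_last_channel(rgb)  # RGB -> BGR, same operation

print(rgb[0, 0])                       # the blue pixel expressed as (R, G, B)
print(np.array_equal(bgr, bgr_again))  # applying the swap twice restores the input
```

So whichever of the two flag names appears in the code, the channel order ends up flipped relative to what `cv2.imread` returned.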
God bless you man
Thank you so much for your amazing educational channel.
In this video, you dropped pixels that had a zero label to speed up the process.
But if we drop background pixels in training, how can the model learn to distinguish background pixels?
In this example pixel value 0 does not represent background, it represents unlabeled pixels so I dropped them as they do not represent any real features (including background) in my image.
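A small sketch of that filtering step, assuming the per-pixel features and the masks are held as NumPy arrays (the video uses a dataframe; the array names and tiny shapes here are hypothetical). Each pixel becomes one row, and rows whose label is 0 are dropped before training the classifier:

```python
import numpy as np

# Hypothetical stand-ins: 2 images of 4x4 pixels with 3 features per pixel.
features = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)
masks = np.zeros((2, 4, 4), dtype=np.uint8)   # 0 = unlabeled
masks[0, 0, :2] = 1                           # a few pixels labeled as class 1
masks[1, 3, 1:] = 2                           # a few pixels labeled as class 2

# One row per pixel, one column per feature; labels flattened in the same order.
X = features.reshape(-1, features.shape[-1])
y = masks.reshape(-1)

labeled = y != 0                              # keep only pixels that were annotated
X_labeled, y_labeled = X[labeled], y[labeled]

print(X.shape, X_labeled.shape)
print(np.unique(y_labeled))
```

Pixels from all training images end up in the same table because the classifier treats each pixel as an independent sample.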
Hello Mr. Sreeni. When I changed the images and ran it, this error showed up: ValueError: Length of values (3059712) does not match length of index (1019904)
This video is extremely helpful for the beginner! Thank you very much :)
Hello Sreeni,
I'm a bit confused about your statement that UNets do not work well if you don't have 'tremendous amounts' of training data.
However, according to the UNet paper by Ronneberger, UNets are specifically designed to "be trained end-to-end from very few images". In fact, it repeatedly states that this architecture has been created precisely because biomedical tasks have very little training data; the authors intended to tackle the issue that successful training of deep networks requires many thousand annotated training samples.
So, as far as how I understand it, UNets can be used as a way of bypassing the issue of limited training data. Or did I actually completely misunderstand what Ronneberger et al. said in their paper, did I confuse some things there? Do you think there are any contradictions on this matter? Please help me out on this. Thanks in advance!
"UNets do not work well if you don't have 'tremendous amounts' of training data" - Here, tremendous refers to the data size in comparison with the data required for Random Forest. With traditional machine learning, you just need a few scribbles of ground truth from your images. With U-net, you need a lot more than a few scribbles. I did a video on the topic of limited training data for U-net. In the video, I've demonstrated using only 12 images (fully annotated) for U-net segmentation. While we ended up with acceptable results, there was still a lot of room for improvement. In summary, U-net does require a lot more training data than random Forest. And U-net may be efficient with smaller training datasets compared to other semantic segmentation deep learning architectures.
But why are you combining the 8 images together? 8 images of size 1024x996x3 are convolved to 1024x996x64, so technically that makes 8 images of 1024x996 with 64 feature channels. It is logical to flatten each 1024x996 image. But why would you combine all 8 training images when flattening? I don't understand that, could you explain?
For some reason, my GPU memory overflows if I have more than 8 images in my dataset. I have a GTX 1650Ti with 4GB VRAM and the images are reduced to 256x256. I tried using batch size = 1 as well but it made no difference. Please help.
Dr Sreeni, good day sir. Thank you for the video. Please, I have a question: what effect does the convolutional filter size have on the SVM? Thanks
Thank you so much. I'm getting the following type error on the line "feature = new_model.predict(X_train)" (the shape of my 'X_train' is (163, 450, 300, 3)): "TypeError: tf__predict_function() missing 19 required positional arguments: 'x', 'y', 'batch_size', 'epochs', 'verbose', 'callbacks', 'validation_split', 'validation_data', 'shuffle', 'class_weight', 'sample_weight', 'initial_epoch', 'steps_per_epoch', 'validation_steps', 'validation_batch_size', 'validation_freq', 'max_queue_size', 'workers', and 'use_multiprocessing'". Would you perhaps know what my problem is? Thank you.
There are 10k videos of echoNet dataset. How can I segment so many images?
Would you please provide a training video about apeer, this tool? Thank you.
Excellent video. I wonder if there is a way to quantitatively evaluate the model accuracy for image segmentation as the case for classification.
Yes, use IoU (intersection over union) for evaluating semantic segmentation. I’ll try to make a video on this topic.
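Until then, here is a rough sketch of per-class IoU with NumPy. The function name `iou_per_class` and the toy label arrays are made up for illustration; for each class, IoU is the number of pixels where prediction and ground truth agree on that class, divided by the number of pixels where either of them has that class:

```python
import numpy as np

def iou_per_class(y_true, y_pred, class_ids):
    """Return {class_id: IoU} for two integer label arrays of the same shape."""
    scores = {}
    for c in class_ids:
        true_c = y_true == c
        pred_c = y_pred == c
        union = np.logical_or(true_c, pred_c).sum()
        if union == 0:
            continue  # class absent from both arrays; IoU is undefined
        intersection = np.logical_and(true_c, pred_c).sum()
        scores[c] = float(intersection / union)
    return scores

# Toy ground truth and prediction with three classes.
y_true = np.array([[1, 1, 2], [2, 2, 3]])
y_pred = np.array([[1, 2, 2], [2, 2, 3]])

scores = iou_per_class(y_true, y_pred, class_ids=[1, 2, 3])
mean_iou = sum(scores.values()) / len(scores)
print(scores, mean_iou)
```

Averaging the per-class values gives the mean IoU that is usually reported for segmentation.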
Hi, thank you for sharing this video. Can we use this model (VGG16+RF) for RGB pictures? I think in this model we could not do upsampling and transpose convolution, am I right?
Thank you so much. Does every image in the training set need its own mask? Or does the number of masks just have to be the same as the number of training samples?
Every image must have a corresponding labeled mask. So yes, the number of images and masks need to be the same.
You are greaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaat
Lots of love from US to Germany. I miss travel, otherwise I'd be at Hofbräuhaus in München by now....
@@DigitalSreeni Christmas market is missing you already :D
Dear Sir,
All very nice and amazing lesson. Thank you so much.
In tutorial 91 you mention that by 'block5_conv3' the CNN has learned almost everything it needs to classify the image. In this tutorial you only took the layers up to 'block1_conv2', so my questions are: 1. Is this amount of features really enough to classify a new image? 2. If I want to take more features, perhaps up to 'block5_conv3', how can I create the dataset?
TIA
Isalm
block1_conv2 in VGG16 gives 64 features by keeping my original image size the same. So I don't have to do any additional reshaping of my image arrays. This is why I picked that block. 64 filters (features) are more than enough, in my experience. You can try deeper features but you need to reshape arrays to match input shape.
@@DigitalSreeni Do you have an example of how to reshape this?
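One possible way to do it, offered as a sketch and not as the method from the video: in VGG16 each block after the first sits behind a 2x2 pooling step, so deeper feature maps are smaller than the input. Upsampling them back to the input size gives one feature vector per pixel again. The helper `upsample_nearest` and the random array standing in for real features are hypothetical:

```python
import numpy as np

def upsample_nearest(feature_map, factor):
    """Nearest-neighbour upsampling of an (H, W, C) feature map by an integer factor."""
    return np.repeat(np.repeat(feature_map, factor, axis=0), factor, axis=1)

# Hypothetical stand-in for features taken after one 2x2 pooling step:
# half the height and width of a 4x6 input image, 128 channels.
pooled_features = np.random.rand(2, 3, 128).astype(np.float32)

full_res = upsample_nearest(pooled_features, factor=2)   # back to (4, 6, 128)
per_pixel = full_res.reshape(-1, full_res.shape[-1])      # one row per input pixel

print(full_res.shape, per_pixel.shape)
```

The upsampled features are blockier than block1_conv2 features, so whether they help is something to test on your own images.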
I just want to know how to find out the accuracy for this model and to plot it .... will you please help me to do this??
Hi Sreeni, thank you for sharing this video. I was wondering if I have a picture that has more than 3 channels, is there any way I can get pre-trained weight? If not, what may be the best way to extract features. Can you please help me? Thank you!
Can we use this approach using few layers in pretrained model and built multi classifier?
Hi Sir, when you use this pre-trained model, you need to pre-process the images in the same way that images were pre-processed to train VGG16. Why did you not do this in your tutorial? Is it necessary? Thank you in advance.
You don't have to preprocess images the same way as the original model unless you are using the original model in its entirety for prediction. Here, I am just using pre-trained weights as feature extractors, so it does not matter whether I scale or normalize or follow my own pre-processing steps, as long as the same steps are applied to the training images and to the images being segmented.
Thanks for this informative content.
Can we use any label/image format other than TIFF in this code? My dataset is not microscopy related. I tried to label it using the APEER tool but failed. So I was wondering if I could use another tool like labelme to annotate my data and then use your code for semantic segmentation. Looking forward to your reply. Thanks in advance!
Hello sir, I wanted to ask if this combination of VGG16 and Random Forest could be used for road segmentation. I have satellite images and masks for where the roads are located. Could it work?
Thank you !
Yes, absolutely. I see no reason why it would not work.
@@DigitalSreeni Our dataset is (300, 608, 608, 3), which results in an out-of-memory problem... Maybe I will try with batches. Thank you
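A rough sketch of the batching idea, under the assumption that the feature extractor can be called on a slice of the image array. `extract_in_batches` and `fake_extractor` are hypothetical names; in the tutorial's setting the callable would be something like `new_model.predict`:

```python
import numpy as np

def extract_in_batches(images, extract_fn, batch_size):
    """Apply extract_fn to images batch by batch and concatenate the results."""
    outputs = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        outputs.append(extract_fn(batch))
    return np.concatenate(outputs, axis=0)

def fake_extractor(batch):
    """Hypothetical stand-in for a feature extractor such as new_model.predict."""
    return batch.mean(axis=-1, keepdims=True)  # (N, H, W, 3) -> (N, H, W, 1)

images = np.random.rand(10, 8, 8, 3).astype(np.float32)
features = extract_in_batches(images, fake_extractor, batch_size=4)
print(features.shape)
```

This only limits how much the extractor handles at once. The concatenated features still have to fit in RAM, so for 64 features per pixel it may also be necessary to keep only the labeled pixels of each batch before concatenating.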
Hi,
I really appreciate your videos, which are very helpful for beginners. I would like to know if an array of float64 can be given as input to VGG16?
Yes.
I just want to know how to drop more labels from the dataframe.
Thank you so much for your great efforts.
Please, how can one get the top-5 accuracy of a classifier (e.g. RF or SVM)?
Just use top_k_categorical_accuracy from tensorflow or keras.
Link: www.tensorflow.org/api_docs/python/tf/keras/metrics/top_k_categorical_accuracy
I will probably record a video but here is how to implement it...
from keras.metrics import top_k_categorical_accuracy
def top_5_categorical_accuracy(y_true, y_pred):
    # Correct if the true class is among the 5 highest-scoring predictions
    return top_k_categorical_accuracy(y_true, y_pred, k=5)
# Add this as a metric to track during training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy', top_5_categorical_accuracy])
When I run it on my system I see this during training...
Epoch 1/2
386/1000 [==========>...................] - ETA: 2:37 - loss: 1.8852 - accuracy: 0.3315 - top_5_categorical_accuracy: 0.8160
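The metric above is tracked by a Keras model during training. For a Random Forest or SVM, a comparable number can be computed from the per-class scores of the fitted classifier. The sketch below assumes those scores are available as an array (for example from `predict_proba`, which for an SVM in scikit-learn requires `probability=True`) and that the labels are class indices matching the columns. The function name and the toy numbers are made up; recent scikit-learn versions also ship `top_k_accuracy_score` in `sklearn.metrics`, which should serve the same purpose.

```python
import numpy as np

def top_k_accuracy(y_true, proba, k):
    """Fraction of samples whose true class index is among the k highest scores."""
    top_k = np.argsort(proba, axis=1)[:, -k:]      # indices of the k largest scores per row
    hits = (top_k == y_true[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 4 samples and 3 classes, e.g. from clf.predict_proba(X).
proba = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
])
y_true = np.array([0, 2, 2, 2])   # class indices matching the columns of proba

top1 = top_k_accuracy(y_true, proba, k=1)
top2 = top_k_accuracy(y_true, proba, k=2)
print(top1, top2)
```

With only 3 classes the example uses k=1 and k=2; for top-5 accuracy pass k=5 on a problem with more than 5 classes.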
at line 108 in your code i get this error message "ValueError: Length of values (0) does not match length of index (12238848)" i am trying to get rid of it for some time now, do u think you can help?
Sounds like an issue with reading your masks. Please make sure they are being properly read with the right dimensions. Check whether you see what you expect in the output from the previous line where you print out the unique pixel values.
@@DigitalSreeni You were perfectly right. Thank you!
I’m glad it helped 😌
Sir, can I do this for landslide prediction based on remote sensing images?
I have 256 landslide points and 15 predictor factors, and I want to predict landslide hazard zones.
I am not familiar with your application but if you want to segment pixels in an image to display regions of specific interest (e.g. landslide prone) then this approach may work for you.
thank you sir I will try it
Transfer learning is amazing, thank u for explanation, I wonder if I can run that on CPU with a fair speed for video processing.
A dedicated video on using 'Apeer' would be very nice. If u plan to do one, please use common images as examples, like cars and airplanes, cats and dogs, nuts and screws etc., so it would be easier for us to run the same procedures as shown in the video :)
I ran the code in my video on CPU so I hope it works even for video processing. There are a lot of videos on APEER on its channel, just look for apeer_micro on YouTube.
Thank you so much for this excellent video! It would be great if you could add an "evaluation module" at the end of your code. :-)
Evaluating semantic segmentation with plain accuracy is not very informative, you need IoU. I will record a video on that topic soon.
Sir, I was trying this on 1000 images in google colab but RAM is getting exhausted. How can I resolve it?
Machine learning demands a lot of resources and unfortunately the only way is to find a way to get additional resources. That said, I wonder why you need 1000 images? Try 100 images first and see how the result looks; if it is not good then increase a bit more. I have never used 1000 images for semantic segmentation, that is a lot of data and may not be needed to begin with.
I got excellent results with 10 images, each 1kx1k size.
Where can we find the dataset?
can you make some examples on Hyperspectral images?
I will try.
Great video and content. How can we train a model from scratch, or fine-tune DL models, and extract features to pass to a traditional ML model for semantic segmentation? Can you make a video on that? Sometimes transfer learning might not perform well on medical images. Thanks in advance
If you want to train a model from scratch for semantic segmentation then please watch my U-Net videos. Also, watch my videos on traditional segmentation, videos 67 and 67b. Training your own neural network just to use it as a feature generator rarely makes sense, as VGG16 and others spent thousands of hours doing the same on many images. Of course, you can always train a network yourself, save the model and follow the process from this tutorial.
Sir, great video series. How can I do image segmentation for 3D images, or can you suggest some guides to follow?
For 3d images you can consider them as a stack of 2d images. That way you can process one 2d image at a time and put them into a stack. If you have features in the 3rd dimension where having the extra dimension helps then you need to consider using 3d kernels and computation will be very heavy. Keras already has 3d conv that you can use out of the box.
from keras.layers import Conv3D
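The slice-by-slice option can be sketched like this. `segment_slice` is a hypothetical placeholder (a simple threshold) standing in for whatever trained 2D segmenter you have, and the random volume is only there to make the snippet runnable:

```python
import numpy as np

def segment_slice(image_2d):
    """Hypothetical 2D segmenter; a simple threshold stands in for the trained model."""
    return (image_2d > 0.5).astype(np.uint8)

volume = np.random.rand(5, 16, 16)   # (slices, height, width)

# Segment each 2D slice independently, then stack the results back into a volume.
segmented = np.stack([segment_slice(s) for s in volume], axis=0)

print(segmented.shape)
```

Each slice is processed without any knowledge of its neighbours, which is exactly the limitation that 3D kernels address.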
Hi Sreeni! Thanks for sharing this video, you teach better than my college professors. I love to learn from you. How can I use SVM in this case, since so far you used Random Forest? Can I use a pretrained CNN (VGG16 - imagenet) for the classification of microscopic images?
Syed, thanks for your compliments. For SVM just swap Random Forest with SVM, it's that easy!!!
Yes, of course this process can be used for microscope images. Pretrained CNN (VGG16) is trained to 'understand' various image features. With this approach we are just using those as feature generators (digital filters).
@@DigitalSreeni Thank you. So I can use the same model (Pretrained CNN (VGG16 - imagenet)) as the feature generator, and for classification I can use SVM. Please correct me if I am wrong. Also please let me know how to generate a graph comparing the training and validation of the model, that is, accuracy per epoch and loss per epoch.
@@DigitalSreeni Hi Sreeni Sir, will you please let me know how to use vggnet on this image dataset www.kaggle.com/c/recursion-cellular-image-classification/data and classify the cells. Can you please make a tutorial? I would be very grateful to you.
Would you recommend me some guidance or blog videos
Well, my videos should be useful :)
Otherwise, just search for content on YouTube. For most people it is easier to learn via videos than by reading text.
this model🤔 outperform unet or not?
Please use this approach if you have limited training images. If you have lots of training data, I recommend trying U-net.
can I detect a person with this method?
Can you make this tutorial with a non-microscopy dataset?
It doesn't matter what data you have, this approach should work. So please label your own images and try it yourself. If you have natural scenes with a lot of information in the scene then this approach may not be ideal. You need a full deep learning approach (e.g. U-Net).
I would suggest to change your channel name. I think low viewers are one reason for this. Your videos are gold btw.
Thanks for the tip. I was considering it and thinking about naming the channel same as my social media profile name. Looks like a lot of people think this is only for microscopy related topics.
Thanks a lot for your great courses, is it possible for you to explain my question? How should we add non-image features (features like object prices) to the flatten layer of our CNN model? How does the CNN model know which input image the newly added features belong to?
*Pooing*