Great video. Now we are waiting for SAM2 using custom data.
Great video, thank you! It would be interesting to know how to relate SAM to other models for additional classification! Could you possibly make a video about it?
Awesome. Thanks for this detailed explanation. It helped me a lot as a starter practitioner of SAM.
Glad it helped!
Great video as always. I think the function to find bboxes might be improved to take care of the fact that you might have multiple objects in a patch (I guess you could do a simple watershed and then find min and max for each instance). Also I'm wondering if you could improve results by adding some heuristics to how you choose your grid points, for instance concentrating points in darker areas in this case?
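A minimal sketch of that per-instance idea (my own, not from the video), using connected-component labelling to get one box per object; a watershed step could replace ndimage.label if objects touch:

import numpy as np
from scipy import ndimage

def get_instance_bounding_boxes(ground_truth_map):
    # Label connected blobs so each object gets its own bounding box
    labeled_mask, num_objects = ndimage.label(ground_truth_map > 0)
    boxes = []
    for obj_id in range(1, num_objects + 1):
        y_indices, x_indices = np.where(labeled_mask == obj_id)
        x_min, x_max = np.min(x_indices), np.max(x_indices)
        y_min, y_max = np.min(y_indices), np.max(y_indices)
        boxes.append([x_min, y_min, x_max, y_max])  # same box format as the tutorial's get_bounding_box
    return boxes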
Thank you for the video, your videos are always helpful! I'm facing this error and can't find a solution. In block 16, when accessing 'train_dataset[0]', I encounter the error: 'ValueError: Unsupported number of image dimensions: 2'.
Skipping the block doesn't help as the same error occurs during training. I've searched online but couldn't find anything useful.
I'm using Google Colab and these library versions: transformers 4.39.0.dev0, torch 2.1.0+cu121, datasets 2.18.0.
I would greatly appreciate it if you could help me solve this problem. Thanks in advance.
I'm having the same issue, how did you solve it?
@@adikrish6926 I haven't figured it out yet, have you?
Yes, I figured it out. The solution was to simply convert the grayscale images to RGB images by reshaping their arrays. The masks still need to stay as grayscale though.
def __getitem__(self, idx):
    item = self.dataset[idx]
    image = item["image"]
    image = np.array(image)

    # Check if the image is grayscale and convert it to RGB
    if image.ndim == 2:  # Image is grayscale
        image = np.expand_dims(image, axis=-1)  # Expand dimensions to (H, W, 1)
        image = np.repeat(image, 3, axis=2)  # Repeat the grayscale values across the new channel dimension

    ground_truth_mask = np.array(item["label"])

    # Get bounding box prompt
    prompt = get_bounding_box(ground_truth_mask)

    # Prepare image and prompt for the model
    inputs = self.processor(image, input_boxes=[[prompt]], return_tensors="pt")

    # Remove batch dimension which the processor adds by default
    inputs = {k: v.squeeze(0) for k, v in inputs.items()}

    # Add ground truth segmentation
    inputs["ground_truth_mask"] = ground_truth_mask

    return inputs
Here is the code for it. This works for me. I hope it will work for you as well.
@@AakashGoyal25 It worked for me! Thank you so much!!
Thanks!
Thank you very much.
Great video. If we have multiple objects in an image that we want to fine-tune on, should we create one mask per image with all objects masked and multiple bboxes, or a separate mask for each object in the same image?
Hi did you ever figure this one out?
I had the same problem. I solved it by pairing the image with a bounding box and the mask corresponding to that bounding box as one training sample. This way you can have the same image in different training samples, but what differs is the bounding box and the ground truth mask. Hope it helps.
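To make that pairing concrete, here is a rough sketch of a per-instance dataset (my own illustration, assuming RGB images or the grayscale-to-RGB fix discussed elsewhere in these comments, and reusing the tutorial's get_bounding_box helper on a single-instance mask):

import numpy as np
from scipy import ndimage
from torch.utils.data import Dataset

class PerInstanceSAMDataset(Dataset):
    """One training sample per (image, object instance) pair."""
    def __init__(self, dataset, processor):
        self.dataset = dataset
        self.processor = processor
        # Flat index of (image index, instance label) pairs
        self.index = []
        for i in range(len(dataset)):
            labeled, num = ndimage.label(np.array(dataset[i]["label"]) > 0)
            self.index.extend((i, lab) for lab in range(1, num + 1))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, k):
        img_idx, lab = self.index[k]
        item = self.dataset[img_idx]
        image = np.array(item["image"])
        labeled, _ = ndimage.label(np.array(item["label"]) > 0)
        instance_mask = (labeled == lab).astype(np.uint8)  # this object only
        prompt = get_bounding_box(instance_mask)            # box around this object only
        inputs = self.processor(image, input_boxes=[[prompt]], return_tensors="pt")
        inputs = {key: v.squeeze(0) for key, v in inputs.items()}
        inputs["ground_truth_mask"] = instance_mask
        return inputs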
Thanks for such an elaborate explanation, learned a lot 🙏
Hello Sir! I want to fine-tune on my satellite datasets to delineate crop field parcels, but I am confused about how to prepare the masks for them. I want each crop parcel to have a different number (like instance segmentation), but it seems this tutorial covers binary segmentation. How can I solve this? Can you give me some advice on preparing the mask datasets?
Excellent tutorial Sreeni!!! 👏👏Thank you so much!!!
Could you make a video on how to use the SAM image encoder only as a feature extractor and then use any other decoder to get the prediction mask?
When changing patch_size from 256 to 512 and step size from 256 to 512 I get this error:
"Error: AssertionError: ground truth has different shape (torch.Size([2, 1, 512, 512])) from input (torch.Size([2, 1, 256, 256]))"
Why is this?
There is a part in the image processor class behind 'from transformers import SamProcessor' where it calls a function, and the default maximum patch size is stated there as 256x256. It took me a couple of hours to realize, and I hope it helps somebody. I encourage everyone who wants to understand the code to check the library source.
@@carlosjarrin3170 Is there any chance to use a bigger patch size, or is fine-tuning SAM only possible with 256x256? Maybe by using another image processor?
@@FelixWei-rn4bt I tried to scale the predicted_masks. And it worked for me. Try this:
import torch.nn.functional as F  # needed for F.interpolate

predicted_masks = outputs.pred_masks.squeeze(1)
gt_shape = (640, 640)  # the shape of your patch
interpolated_mask = F.interpolate(predicted_masks, gt_shape, mode="bilinear", align_corners=False)
predicted_masks = interpolated_mask.float()
@@carlosjarrin3170 Is there any way to fix it? Because my dataset has all images of dimension 64x273, I did not make patches of the images, and because of this size problem I am not able to train SAM.
Hey man, nice job, you're amazing. I have a problem at 26:00 in the video: on that 'example' cell I get an error. If anyone can help me, I'd really appreciate it. This is the last part of the ERROR:
...raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
ValueError: Unsupported number of image dimensions: 2
I have the same problem... I wish he did this in the Spyder IDE so we could see the variable explorer. I need to see the dimensions of the input images and masks (hope he can give an answer soon).
@@lee-ちゃん The data that gets returned is a dict with 2 keys. We can also use '.dataset' with it, but I don't really know what to do next. Also, 2 or 3 lines later, this piece of code: "batch = next(iter(train_dataloader))" fails with the same error. Hope someone can help...
got the same error
@@Theredeemer-wc6ly Uh mate
@@mmd_punisher there was a fix a few comments ahead
Thank you very much for such a wonderful tutorial!!!
I was going through the same problem; the fix is drop_last=True. This is simply because if the last batch in your dataset contains only 1 training sample, you will get this error, since batch normalization cannot be applied to a single training sample. For instance, if the batch size is 2 and your training dataset has 101 samples, you have 51 batches; the last batch contains only one training sample, and this will absolutely throw an error. You can reproduce this error and comment right here.
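For reference, a minimal sketch of that fix, assuming the train_dataset from the tutorial (batch size 2 is just an example):

from torch.utils.data import DataLoader

# drop_last=True discards the final incomplete batch, so a lone leftover sample
# never reaches layers that cannot handle a batch of one.
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True, drop_last=True)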
Is there a way that we can use SAM for an image sequence? I'm trying to segment grains and pore area for small sand.
Hey there! Great work. I came across this video while researching segmentation using Transformers. However, on my dataset I am facing a problem. In the cell
train_dataset = SAMDataset(dataset=dataset, processor=processor)
example = train_dataset[0]
for k,v in example.items():
print(k,v.shape)
I am getting an error which says Unsupported number of image dimensions: 2. I am using grayscale images here and have tried expanding the dimension of the images while reading them, only to get the same error. If anyone has any suggestion or is aware of some update I have missed, then please go ahead and educate me :). I am in dire need of some help. Thanks.
Thank you very much for this amazing tutorial
thanks for the great video
Can you please tell me how to add class names in the predicted segmentation?
Hi, good content. How can we train the overlapping case? Train with one box and its segmentation mask at a time? Or can we train with all boxes at a time, utilising the three output channels?
this is gold, thanks
Your videos are so good.. please post a video on deep image prior..
Thanks
Thanks for sharing the video!
At 1:44, you mention SAM is designed to take a text prompt describing what should be segmented.
I am not sure that is the case; can you explain how?
It's called LangSAM. You can find it by searching for segment-geospatial.
I think it works by using a combination of object detection and segmentation. The object detection is done with Grounding DINO, which returns a bunch of bounding boxes. The objects inside these bounding boxes are then segmented using SAM.
Hi, I have used your code to fine-tune SAM to segment aerial images, but when I use my finetunedsam.pth it doesn't even segment the images that it used to segment with no fine-tuning. What do you think is the problem? Thank you in advance!!
If we already have the prompt (mask) for the test image as an input, why do we use SAM to get the mask? I mean, we already have the answer - how does using SAM help us?
Can someone please explain to me how I can use this model in the same context but with multiple classes? I'm trying to train a SAM model on the Flickr material dataset so that it detects the materials composing objects.
Can I train a multi-class semantic segmentation SAM model on my custom dataset?
How do I make a tif file for images and masks if I have custom data to train, or is there any workaround to train the model on custom data?
Thanks for the video! I am getting the error "ValueError: Unsupported number of image dimensions: 2" in the SAMDataset, and I am struggling to fix it. Anyone with a similar error?
I guess you are working with a grayscale image and SAM expects a color image with 3 channels. If this is the case, you can copy your array twice to create an array with shape (x, y, 3) instead of just (x, y).
@@DigitalSreeni That was exactly the problem, thank you!
Hello Sreeni, first off, I really enjoy your videos and they are really awesome. I was trying to re-run the code you have, but I am facing an issue on the line where you have example = train_dataset[0]. I get the following error: ValueError: Unsupported number of image dimensions: 2. Is there any package I am missing? Your help would be appreciated.
May I know where the 12-image tif is? The website only gives us two sets of tif files, each with 165 images.
Great video, and great instructor. However...
This get_bounding_box is not very good for multiple objects. Furthermore, I could not make it work for more than one bounding box as a prompt. Do you have an idea how to generalize it?
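In case it helps, a hedged sketch of passing several boxes for one image with the Hugging Face SamProcessor, whose input_boxes argument is nested per image (roughly num_images x num_boxes x 4); the pixel coordinates below are made up:

boxes = [[70, 40, 160, 120], [200, 180, 250, 240]]  # two hypothetical [x_min, y_min, x_max, y_max] boxes
inputs = processor(image, input_boxes=[boxes], return_tensors="pt")
outputs = model(pixel_values=inputs["pixel_values"], input_boxes=inputs["input_boxes"], multimask_output=False)
# outputs.pred_masks then holds one low-resolution (256x256) mask per box,
# so the loss / ground-truth handling in the training loop has to change accordingly.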
Thank you so much for this incredible and practical video. Is there a way to segment multiple different objects within the same model, or does it need to be two separate models? For example, if I wanted to segment both mitochondria and lysosomes (and train a model to recognize BOTH those things, but as different things), would I need a separate SAM for mito vs lysosomes? Is there a way to do it combined?
Thanks for this amazing share.
Is there any possibility that SAM outputs the label associated with the predicted mask, in order to know the name of the instance segmented using SAM, please?
Thanks in advance
This is great, thanks a lot! However, since you deleted the images with empty masks, this means that this can only work for images where there are mitochondria. Could this be extended so that the model returns an empty mask when there is no mito? (or other things, for other applications)
Can we train SAM on a custom image size? I have a dataset with an image size of 128x128 and I am unable to figure out how to train the model. Any help would be appreciated.
SAM was originally trained on 1024x1024 images. It uses a ViT (Vision Transformer) backbone that expects this input size. Training directly on 128x128 images is challenging because SAM's architecture is designed for larger images. The model's receptive field and positional encodings are tailored for 1024x1024 inputs. You could upsample your 128x128 images to 1024x1024 before feeding them into SAM.
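If you want to try resizing, here is a minimal sketch using OpenCV (the helper name and default sizes are my own choices, not from the video). One caveat from the other comments: in the tutorial's training loop the ground-truth mask is compared against SAM's 256x256 low-resolution output, so you may want to resize the mask to 256x256 (or interpolate the predictions, as suggested above) rather than to 1024x1024.

import cv2
import numpy as np

def resize_pair(image, mask, image_size=1024, mask_size=256):
    # Bilinear for the image, nearest-neighbour for the mask so labels stay crisp
    image_up = cv2.resize(image, (image_size, image_size), interpolation=cv2.INTER_LINEAR)
    mask_up = cv2.resize(mask.astype(np.uint8), (mask_size, mask_size), interpolation=cv2.INTER_NEAREST)
    return image_up, mask_up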
Hi! This was great - thank you very much for the tutorial! I was also trying to extend your work and use RGB images rather than single-channel ones. I adjusted the code to deal with the RGB images; however, I don't think I have the loss calculations right, since I am getting a huuuge negative loss value. I was wondering if you have attempted to work with RGB images as well?
Hello. I also need to work with RGB data. Could you please share your modified code with me?
Is there any progress on it?
Have you already figured out why the loss function has such a high negative value? I have the same problem
Thanks for the great video. I am getting this error: AssertionError: ground truth has different shape (torch.Size([1, 1, 1024, 1024])) from input (torch.Size([1, 1, 256, 256])). Does anyone know how to solve it without using interpolation?
How do I fine-tune with multi-class segmentation labels? And how do I make the prompt based on the label too?
Have you found anything related to it?
Great tutorial. Can you share your notebook?
github.com/bnsreenu/python_for_microscopists/blob/master/331_fine_tune_SAM_mito.ipynb
How can you know if you overtrain?
How do I measure the masks created by the SAM model? Thank you very much!
Good day Sir. Please, is it possible to use the SamAutomaticMaskGenerator with a fine-tuned model? How can we generate the masks in the same way SamAutomaticMaskGenerator works?
how could I train this on my datasets on roboflow?
Great work, but I have some trouble.
Instead of the example images you provided, I have used mine, which are 200x200. However, I have encountered two problems:
- The images have to be in grayscale; if they are RGB the program stops working at "batch = next(iter(train_dataloader))"
- The images have to be 256x256. If I use my 200x200 grayscale images it crashes when training, more specifically when calculating the loss. It says that the ground truth is 200x200 and the prediction is 256x256.
Do you know how I can fix this problem?
My guess is you can just zero pad your image and it should work (np.pad makes that very easy)
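A quick sketch of that padding idea for the 200x200 grayscale case (the helper name is mine; it zero-pads on the bottom and right so both image and mask end up 256x256):

import numpy as np

def pad_to_256(image, mask):
    pad_h = 256 - image.shape[0]
    pad_w = 256 - image.shape[1]
    image_padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="constant")
    mask_padded = np.pad(mask, ((0, pad_h), (0, pad_w)), mode="constant")
    return image_padded, mask_padded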
@@NicolaRomano Thank you! Did you manage to get it working with RGB images?
@@DDDOOO-r9e You should definitely be able to; I haven't tried, honestly. You'll probably simply need to take into account the different shape of the image (e.g. (3,256,256) instead of (256,256)). But also, it depends on what you want to do (e.g. do you need to segment the three channels together or separately?)
Really great video. Thank you so much.
Thanks a lot for the informative video! Do you have any videos applying MedSAM3D?
Not yet!
Hello DigitalSreeni, thank you for this tutorial. I'm getting an error and it's driving me crazy, because I am running your notebook and the same dataset. Everything runs fine, getting exactly the same results, up to the moment where we check an example from the dataset:
example = train_dataset[0]
for k,v in example.items():
print(k,v.shape)
I am getting the following error (Unsupported number of image dimensions: 2):
ValueError Traceback (most recent call last)
Cell In[17], line 1
----> 1 example = train_dataset[0]
2 for k,v in example.items():
3 print(k,v.shape)
Cell In[14], line 24
21 prompt = get_bounding_box(ground_truth_mask)
23 # prepare image and prompt for the model
---> 24 inputs = self.processor(image, input_boxes=[[prompt]], return_tensors="pt")
26 # remove batch dimension which the processor adds by default
27 inputs = {k:v.squeeze(0) for k,v in inputs.items()}
File c:\Users\F72070\Document\FC20-dipnn-sot\env_fc20\Lib\site-packages\transformers\models\sam\processing_sam.py:71, in SamProcessor.__call__(self, images, segmentation_maps, input_points, input_labels, input_boxes, return_tensors, **kwargs)
57 def __call__(
58 self,
59 images=None,
(...)
65 **kwargs,
66 ) -> BatchEncoding:
67 """
68 This method uses [`SamImageProcessor.__call__`] method to prepare image(s) for the model. It also prepares 2D
69 points and bounding boxes for the model if they are provided.
70 """
...
--> 200 raise ValueError(f"Unsupported number of image dimensions: {image.ndim}")
202 if image.shape[first_dim] in num_channels:
203 return ChannelDimension.FIRST
ValueError: Unsupported number of image dimensions: 2
Any ideas or suggestions would be very appreciated!
Try this:
image = np.expand_dims(image, axis=-1) # Add channel dimension
image = np.repeat(image, 3, axis=-1) # Repeat grayscale channel to create 3 channels
The SAM Processor expects to get 3 input channels. Adding these above two lines of code to the __getitem__ method in the SAMDataset class should solve this issue. See the full example below
#######################################################
from torch.utils.data import Dataset

class SAMDataset(Dataset):
    """
    This class is used to create a dataset that serves input images and masks.
    It takes a dataset and a processor as input and overrides the __len__ and __getitem__ methods of the Dataset class.
    """
    def __init__(self, dataset, processor):
        self.dataset = dataset
        self.processor = processor

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        image = item["image"]
        image = np.expand_dims(image, axis=-1)  # Add channel dimension
        image = np.repeat(image, 3, axis=-1)  # Repeat grayscale channel to create 3 channels
        ground_truth_mask = np.array(item["label"])

        # get bounding box prompt
        prompt = get_bounding_box(ground_truth_mask)

        # prepare image and prompt for the model
        inputs = self.processor(image, input_boxes=[[prompt]], return_tensors="pt")

        # remove batch dimension which the processor adds by default
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}

        # add ground truth segmentation
        inputs["ground_truth_mask"] = ground_truth_mask

        return inputs
@@davidsolooki3051 thanks!
Great tutorial as always Sreeni, thank you. There is a project called Medical SAM that has already been custom-trained with thousands of medical images, in case you want to check it out. On social media you mentioned a tutorial on going from binary masks to polygon masks. Is there any resource that I can base myself on to do this process?
Converting annotations will be my focus for the next video - hoping to release it on Sep 20th. I need to collect my code from different projects and put it together into a single video tutorial. Please stay tuned :)
@@DigitalSreeni thank you Sreeni, I'll stay tuned.
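In the meantime, here is a minimal sketch of the binary-mask-to-polygon idea using OpenCV (my own illustration, not the code from the upcoming video). Each outer contour becomes a list of [x, y] points; cv2.boundingRect(contour) would give a bounding box from the same contour if needed.

import cv2
import numpy as np

def mask_to_polygons(binary_mask):
    # findContours expects uint8; RETR_EXTERNAL keeps only the outer outlines
    contours, _ = cv2.findContours(binary_mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for contour in contours:
        if len(contour) >= 3:  # a polygon needs at least 3 points
            polygons.append(contour.reshape(-1, 2).tolist())
    return polygons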
And do I get the bounding boxes from the resulting mask?
how to unpatch the images?
Hi, how can we train SAM with RGB images and masks, like the Dubai aerial segmentation dataset? Can you help with some feedback?
Hello. I also want to modify the code for RGB images. Did you successfully execute the code?
Thank you sir, got clear understanding
Are you planning on a similar tutorial for SAM2?
SAM2 is similar but I can do a video on multi-class segmentation using SAM2. This example is just a single class.
Where in the notebook is the segment-anything repo used?
Please can you make a video on fine-tuning with a coco.json dataset? Is it possible to fine-tune the model for multi-class images?
Thanks for the great video.
Can I apply it the same way on multi-class data?
Sorry, I haven't tested this for multi-class.
can we try??
How does this model compare to the nnUNetv2 model?
predicted_masks = outputs.pred_masks.squeeze(1)
ground_truth_masks = batch["ground_truth_mask"].float().to(device)
loss = seg_loss(predicted_masks, ground_truth_masks.unsqueeze(1))
can you explain the output shapes and why ground_truth masks are unsqueezed?
can you do freelancing ? "solar panel counting from UAV images using SAM"
Is it possible to use text prompts for fine tuning?
hi sreeni n ppl! does anyone know about any computer vision ML online forum, to post related questions?. Thx!
Nice video. Thanks for sharing!!!
Hi, thanks for the video. Is there an option to add point prompts?
hello, I'm trying to do that right now. Please tell me if you were able to do it
Hi Sreeni, great video, it is very helpful for me. I was trying to fine-tune the model for my own custom data, but it has 3 channels. While preparing the PyTorch custom dataset I had an error like "ValueError: zero-size array to reduction operation minimum which has no identity". Can you help me sort out this issue?
This error probably refers to one of your training masks being blank. Try to sort your masks so you only use the ones where you have some information, otherwise the tensor would be empty.
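For reference, a small sketch of that filtering step (image_patches and mask_patches are placeholder names for however you stored your patches):

import numpy as np

# Keep only the patches whose mask actually contains labelled pixels
valid = [i for i, m in enumerate(mask_patches) if np.any(np.array(m) > 0)]
image_patches = [image_patches[i] for i in valid]
mask_patches = [mask_patches[i] for i in valid]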
Hi Sreeni, thanks for your reply. I have trained the SAM model for RGB images, but the prediction result was empty. Can you please tell me what could be wrong?
@@DigitalSreeni
I am trying this tutorial on the Breast-Ultrasound-Images-Dataset on Kaggle, and I get the same error message when creating a DataLoader instance. When I try to convert the mask into an np.array to get the ground_truth_seg, np.unique(ground_truth_seg) does not output array([0, 1], dtype=int32). Instead it outputs an array of a bunch of numbers, and the dtype is uint8 instead.
@@DigitalSreeni Thank you! Yes I was getting the same error as I mentioned before and it was because of the blank masks. I filtered them and the error went away.
Hello. I also need to work with RGB data. Could you please share your modified code with me?
Please post a video on deep image prior.Thanks
Dear, how can I modify it to train with input shape (512x512x3)? Reply me plz~~~
The x3 means that it is a color image; change it to grayscale so it is 2D, 512 by 512.
@@Theredeemer-wc6ly thank you bro for replying to me 🙏
Kindly run the DF-GAN and HiFi-GAN code. Your code videos are really helpful; please help me in running these codes.
Nice tutorials
Great video! Thanks for the detailed explanation!
great job! thanks!
What if you have bigger objects than mitochondria, so that the patches of 256x256 are too small? In this video (video 206) ruclips.net/video/LM9yisNYfyw/видео.html you say that patches should be at least 4 times bigger than the objects. But what if the object is big and I try to change the patch size from 256 to e.g. 512 in your colab script? I get this error: "Error: AssertionError: ground truth has different shape (torch.Size([2, 1, 512, 512])) from input (torch.Size([2, 1, 256, 256]))"
What's up with the tiffs, dude.
darkmode please....... for the love of all that is holy.....
this shi complicated af
Thanks!
Thank you
Thank you
Thanks!