Hope something similar to FastSAM and MobileSAM comes for SAM2 as well. Also, combining it with Grounding DINO to autodistill to a smaller model would really be something amazing for video segmentation.
I have no doubt people are already working on projects like this!
It was an awesome and fun session, thank you so much! :)
Always a pleasure to see you in the chat!
Greetings from Egypt! Thanks for your awesome presentation.
My pleasure!
Hi, can we have a video on fine-tuning SAM on videos, pleaseeeeee....
For video stream inference, maybe you could still downscale the video to one sixteenth of its original size (a quarter along each edge); that might help with increasing the speed 😂
Haha, maybe ;) I'll be reading the paper today (crazy that I haven't had time to do it so far), and I'll try to learn more about how they benchmarked it.
Thank you for sharing. I wonder how to use our own mask in the predictor.predict method instead of a box and/or points.
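For anyone with the same question: the segment-anything-style predictors accept a `mask_input` argument, but it expects low-resolution mask logits of shape (1, 256, 256) rather than a full-resolution binary mask, so the mask has to be downscaled and converted first. A minimal sketch, assuming SAM2's image predictor keeps that parameter name (verify against the sam2 version you have installed; the helper name and logit values here are my own choices):

```python
import numpy as np

def mask_to_prompt(binary_mask: np.ndarray, low_res_size: int = 256) -> np.ndarray:
    """Convert a full-resolution binary mask (H, W) into the (1, 256, 256)
    low-resolution logit array that SAM-style predictors expect as mask_input.

    Nearest-neighbour downscaling via index sampling; positive logits inside
    the mask, negative outside."""
    h, w = binary_mask.shape
    rows = np.arange(low_res_size) * h // low_res_size
    cols = np.arange(low_res_size) * w // low_res_size
    small = binary_mask[np.ix_(rows, cols)].astype(np.float32)
    # Map {0, 1} to logits: strongly negative outside, positive inside.
    logits = np.where(small > 0, 8.0, -8.0).astype(np.float32)
    return logits[None, :, :]  # shape (1, 256, 256)

# Hypothetical usage with a loaded SAM2 image predictor:
# masks, scores, _ = predictor.predict(
#     mask_input=mask_to_prompt(my_mask), multimask_output=False)
```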
WoW man. Insane value
Doing my best!
thanks a lot for sharing!
Fra! What’s up?
@Roboflow, I would be grateful if you could answer a question. I have several images, each containing a few objects. How can I segment a specific object whose orientation is not fixed in an image? In your code you demonstrate only one image: you manually draw bounding boxes/points, pass those bounding-box parameters as a prompt to the model, and the model returns the segmentation masks for those objects. Is it possible to define a rule once, like focusing only on a specific object in an image, and apply it to other images to extract those specific masks?
What do you mean by a specific object? Would you, for example, like to segment all dogs in all images?
@@Roboflow Is it possible to segment all dogs / all persons in a video, without any manual intervention or our help? Maybe combining it with YOLO to detect persons could eliminate the need for our help?
@@Roboflow Sorry for my late response. Yes, let's say we have many dogs in an image and we are interested in only one specific dog. We have multiple images containing more or less the same number of dogs, but each time we only want to segment, say, the white dog, whose position varies from image to image. In the video at 26:29 you plot all possible binary masks.
Can we somehow select the mask we are interested in automatically from each image? E.g. the right dog mask is in the third position (at 26:29), so to select it manually we can use an index like mask[2] and do whatever we want with it.
I want to automate this selection process so that the right mask is selected automatically. Hope you understand my concern.
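One way to automate the selection described in this thread: score each candidate mask by a property of the pixels it covers and pick the best-scoring one. For "the white dog" specifically, mean brightness is a reasonable first heuristic; a sketch under that assumption (the function name is hypothetical; swap the scoring for CLIP similarity or a small classifier if colour alone is not enough):

```python
import numpy as np

def select_brightest_mask(image: np.ndarray, masks: np.ndarray) -> int:
    """Given an RGB image (H, W, 3) and a stack of binary masks (N, H, W),
    return the index of the mask covering the brightest region on average.

    A simple heuristic for 'pick the white dog out of several dogs'."""
    gray = image.mean(axis=2)  # rough luminance
    scores = []
    for m in masks:
        # Empty masks score -inf so they are never selected.
        scores.append(gray[m > 0].mean() if m.sum() > 0 else -np.inf)
    return int(np.argmax(scores))

# Usage: idx = select_brightest_mask(image, masks); white_dog = masks[idx]
```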
Thank you awesome guy ~~
Thank you
Thanks for the video. Does it work with CUDA 11.8?
I’m not sure. I used 12.2.
Is it compatible with Jetson Orin? I assume it is, since SAM is; however, I'm having issues installing via pip:
UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
And at the end of the pip output:
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
However, numpy, CUDA, and the CUDA_HOME variable are all set and working fine.
As far as I know, if you install from source it is only compatible with NVIDIA GPUs. But the unofficial samv2 package is installable on other devices.
@@Roboflow Thanks for the reply! Jetson Orin is actually NVIDIA, so it should work, though? Can I please get a link to the unofficial package? Thanks again!
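A possible explanation for the CUDA_HOME error even though the variable "is set": pip's build isolation can hide environment variables from the package's setup.py. A configuration sketch for the Jetson case (the /usr/local/cuda path is the typical JetPack location, an assumption; adjust to wherever nvcc lives on your device):

```shell
# Point CUDA_HOME at the toolkit root and make nvcc reachable.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"

# Sanity check: the source build needs the CUDA compiler on PATH.
command -v nvcc || echo "nvcc not found - check CUDA_HOME"

# Building without isolation keeps the exported variable visible
# to setup.py (run from the cloned sam2 repo):
# pip install --no-build-isolation -e .
```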
Did you do something like Lang Sam 2 already? You mentioned that at one point in the video
I did combo Florence2 + SAM2: huggingface.co/spaces/SkalskiP/florence-sam
Are there any modifications needed if I want to track only one object? The code works perfectly with multiple objects, but with a single object I get the error 'not enough values to unpack (expected 2, got 1)' on this line: xyxy = sv.mask_to_xyxy(masks=masks). I think it's related to this: masks = np.squeeze(masks). I need your help, thanks!
I think the code is written for multiple objects. When I try to segment one single object, like just a ball, I get the same error, but when I add another object to the object list and include it in the prompt, the model runs. However, I could not find the part of the code that handles the multiple-object case.
Can you provide the notebook link for video segmentation? Thanks
You can find the notebook here: github.com/roboflow/notebooks
@Roboflow do you know if the model works in real time on a camera, live on device? Without a video input file, no user input of any kind, just a plain camera running on, for example, an iPhone, with the model inferring the segmentation live as the person moves? Thanks a lot!!
It can and can't, depending on what you want to do. Could you be a bit more specific? What outcome do you expect?
A mobile-friendly, real-time camera on a phone, with the model inferring live as the person moves, separating the person from the background.
Can the model be used to delineate objects using satellite imagery as input?
Do you mean "detect"?
Does SAM2 allow for instance or panoptic segmentation?
Great question. Unfortunately not. SAM2 only gives you masks without classes.
Finetuning soon? : )
Hahaha. Not sure if I’m smart enough to fine-tune SAM ;)
Can I segment clothes with SAM 2?
If not, do you know any pretrained model?
SAM2 can segment anything, clothes included. But it would need a little bit of help to understand what clothes are. You'd need to prompt it the right way.
@@Roboflow Any tips on doing this would be appreciated. You may have covered it already; I just found your channel via a search for SAM 2 and labeling, so I will look around. I'm still in the image-segmentation part of this video, enjoying the session so far.
Did you try the re-ID?
I did! Take a look at this Twitter post: x.com/skalskip92/status/1818648396002951178?s=46&t=PmKZyPs_J7tyW5sS8kHeLg
@@Roboflow Thanks for your reply, man!! Do you have the reference code for the re-ID anywhere? Did you just concatenate the 3 videos? Will that alone handle the re-ID?
My question is more whether we can use it for real-time use cases, like a retail store where a person moves from one camera to another and still gets re-identified. Also, can we use this to save people's features and re-identify a person who reappears after a long time?
I wanted to count bees and small chickens in their hive
without labeling and training. It was not successful with SAM (1). Hope it can succeed using SAM 2.
How did you try to do it last time?
@@Roboflow Using CVAT annotation + the SAM online service. I have not yet tried Roboflow's segmentation tools.
Detection was done with Detectron2, but the results are not yet accurate.
Using napari and StarDist yields better results, but the annotation tooling does not yet give consistent results and requires coding the whole process.
Any support for Mac?
There is a pip package called samv2 that you can run on a Mac. Unfortunately, only the CPU device is available. No MPS as far as I know :/
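Given the CPU-only situation on Mac, it helps to make the device fallback explicit so the same script runs on CUDA machines, Apple Silicon (if/when MPS support lands), and plain CPU. A small sketch; the helper name is my own, and torch is deliberately not imported so the logic stands alone:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the best available torch device string, preferring
    CUDA, then MPS, then falling back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Typical use with torch:
# device = pick_device(torch.cuda.is_available(),
#                      torch.backends.mps.is_available())
# predictor.model.to(device)
```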
Why, when I try to segment an object with a point, does it seem to "think" of it as a box first? It makes the accuracy low. Can someone help?
Just to make sure: is your question why you get better masks when you prompt with boxes compared to points?
@@Roboflow I will try to be more precise: I want to use SAM2 to segment objects by point, and the notebook you shared only shows the masks. So my question is: how do I show the object markers as you did with the box (before and after, side by side)?
And thanks a lot!! You are super talented :)
👏👏