I’m curious to know the rationale behind combining loss functions. Do you know in advance that loss function X will handle task x and loss function Y will handle task y, and then combine both losses? Or is there deeper literature on which losses do what, where you combine them and hope they work as expected?
I think the combination of Dice & Focal loss is one of those empirically "tried and tested" combos that tend to do well in segmentation tasks. Many segmentation papers mention it, including Meta's Segment Anything paper from last year. Intuitively, focal loss/BCE is good for pixel-wise or low-level classification (focal loss also addresses class imbalance), while Dice loss measures a more area-wise or higher-level notion of segmentation quality. At the end of the day, people try different loss functions/hyperparameters to see what works best for their dataset/model, and for segmentation, Dice+Focal is a traditional place to start experimenting.
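For concreteness, here's a minimal PyTorch sketch of what a combined Dice + Focal loss could look like for binary segmentation. The function name, default hyperparameters, and the simple additive weighting are illustrative assumptions, not the exact recipe from the video or the SAM paper:

import torch
import torch.nn.functional as F

def dice_focal_loss(logits, targets, alpha=0.25, gamma=2.0, smooth=1.0, focal_weight=1.0):
    # Illustrative sketch, not the video's actual code.
    # logits:  raw model outputs, shape (B, 1, H, W)
    # targets: binary ground-truth masks, same shape, float values in {0, 1}
    probs = torch.sigmoid(logits)

    # Dice term: area-level overlap between prediction and ground truth.
    intersection = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * intersection + smooth) / (union + smooth)

    # Focal term: pixel-level BCE, down-weighted on easy pixels via gamma.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = probs * targets + (1 - probs) * (1 - targets)  # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    focal = (alpha_t * (1 - p_t) ** gamma * bce).mean(dim=(1, 2, 3))

    return (dice + focal_weight * focal).mean()

The relative weight between the two terms (focal_weight here) is itself a hyperparameter that people tune per dataset.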
Hello, could you please upload tutorials to provide direction on referring image segmentation (RIS) research?
Great suggestion. Referring Image Segmentation is pretty wild. The next video will be a follow-up to this UNET project, where I'll be implementing YOLO from scratch, but I'll add RIS to my queue. Meanwhile, I'd also suggest checking out my Multimodal Neural Nets video, which has a ton of info on the evolution of text+image models: Multimodal AI from First Principles - Neural Nets that can see, hear, AND write.
ruclips.net/video/-llkMpNH160/видео.html
Can we get access to the code for this project? It's really interesting, as I'm a big football fan.
Ping me on Twitter - @neural_avb
What are its advantages and disadvantages over YOLO?
The short answer is that YOLO is generally used for object localization + detection with anchor boxes (or bounding boxes), while UNET operates at the pixel level, as shown in the video, and is used for pixel-perfect object segmentation. So they kinda fulfill different purposes.
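The difference shows up directly in the output shapes. A tiny illustration (all shapes and counts below are made-up examples, not either model's real head layout):

import torch

batch, num_classes, H, W = 2, 3, 256, 256

# UNET: one score per class per pixel -> a dense segmentation map.
unet_logits = torch.randn(batch, num_classes, H, W)
seg_mask = unet_logits.argmax(dim=1)  # (2, 256, 256): a class label for every pixel

# YOLO-style detector: a small set of box predictions,
# each roughly (x, y, w, h, objectness, class scores...).
num_boxes = 100
yolo_preds = torch.randn(batch, num_boxes, 5 + num_classes)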
@avb_fj What about the segmentation YOLO models, not the detection ones? For example, YOLOv8-seg.