Thanks a lot for this great series!
I'm happy you find it helpful!
Here goes the giant creature! Thanks for the video, high quality as always.
Thanks for watching!
There is a very confusing issue with the model name:
* Facebook has another model called DINO, which is a self-supervised ViT model
* The DETR lineage has a Semantic-SAM model, again with the same name as Facebook's segmentation model
* And to make it more confusing, the original DETR was developed by Facebook
From what I see, all these models are very capable and interesting
Yeah, that's why I've put the entire paper title as the video name.
Search optimisation is hard as it is...
Love the video and the series in general! Would love to see something similar for other topics, such as NLP or maybe super-resolution - is anything like that planned? Also, keep up the great work 🔥
Was just planning to record a video about GPT-2, and then other NLP topics - various BERTs, RoBERTas and DeBERTas, T5, E5, RAG, RLHF and all that stuff. Also don't want to stop covering computer vision - exciting topics still to come.
I do make 1 video a week though, so it will take some time )
@makgaiduk Yes, I have noticed the upload schedule, and honestly I love that it's pretty often yet not overwhelming, giving me time to work on other personal stuff
It would be helpful if you could remove the smaller window with the presenter's video - that would make it easier to focus on the main content.
Awesome thumbnails :)
1 week of your time = -3 weeks of research time * number of subscribers
Plus gamma * expected subscribers in the next week + gamma squared * expected subscribers in 2 weeks plus ...
@makgaiduk Oh yes, my fault. It is also missing the derivative of the exponential growth of AI researchers per week.
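A rough formalization of the joke, purely illustrative: with γ the weekly discount factor, N₀ the current subscriber count, and ΔN_t the new subscribers in week t,

```latex
\text{value of 1 week of videos} \approx 3\ \text{weeks saved} \times \Big( N_0 + \sum_{t=1}^{\infty} \gamma^{t}\, \mathbb{E}[\Delta N_t] \Big)
```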
Hey Mak! Thanks for such a great video. Since I'm working with DINO or something similar for my thesis project, I was wondering: could I use the model with only the denoising queries and deformable attention, excluding the dynamic anchor boxes, since they may not lead to significant performance improvements for my use case?
DAB-DETR concepts (i.e., dynamic anchor boxes) are unfortunately baked into the foundation of DN-DETR and DINO. DAB-DETR was the one to propose separating "anchor boxes" and "content embeddings" in the decoder input; without it, query denoising makes little sense.
You might try disabling some aspects of the dynamic anchor boxes, like hw_attention_modulation (github.com/IDEA-Research/DINO/blob/main/models/dino/deformable_transformer.py#L1032), though support for that seems rather limited: it is not a config option but a hard-coded constant, and I am not 100% sure the model will work correctly if you change it.
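For intuition, here is a minimal sketch of the anchor-box/content-embedding split that DAB-DETR introduced and that DN-DETR and DINO build on. This is not the repo's actual code - the names (sine_embed, content_embed, anchor_boxes, to_pos_query) and the shapes are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

def sine_embed(coords, dim=128, temperature=10000):
    # Sinusoidal encoding of normalized box coordinates: each of the
    # 4 coordinates (x, y, w, h) maps to a `dim`-sized vector, the same
    # scheme DETR uses for positional encodings.
    t = torch.arange(dim // 2, dtype=torch.float32, device=coords.device)
    t = temperature ** (2 * t / dim)
    pos = coords.unsqueeze(-1) * 2 * math.pi / t        # (..., 4, dim/2)
    pos = torch.stack((pos.sin(), pos.cos()), dim=-1)   # (..., 4, dim/2, 2)
    return pos.flatten(-3)                              # (..., 4 * dim)

num_queries, d_model = 300, 256

# DAB-DETR's key idea: a decoder query is two separate things.
content_embed = nn.Embedding(num_queries, d_model)  # "what" to look for
anchor_boxes = nn.Embedding(num_queries, 4)         # "where": (x, y, w, h) logits

# Positional queries are derived from the anchor boxes rather than
# learned as free parameters.
to_pos_query = nn.Sequential(
    nn.Linear(4 * 128, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
)

boxes = anchor_boxes.weight.sigmoid()        # normalize boxes to [0, 1]
pos_query = to_pos_query(sine_embed(boxes))  # (num_queries, d_model)
# The decoder attends with content_embed.weight + pos_query; w and h can
# further modulate the cross-attention map (the hw_attention_modulation
# constant mentioned above), and the boxes get refined layer by layer.
```

Query denoising then feeds noised ground-truth boxes through this same anchor-box pathway, which is why the two techniques are hard to separate.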
Check out my next video - reading DINO source code ruclips.net/video/513MgXnqEhk/видео.html