DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

  • Published: 16 Dec 2024

Comments • 18

  • @fayezalhussein7115
    @fayezalhussein7115 8 months ago +2

    Thanks a lot for this great series

    • @makgaiduk
      @makgaiduk 7 months ago

      I'm happy you find it helpful!

  • @anhduy9553
    @anhduy9553 8 months ago +3

    Here goes the giant creature! Thanks for the video, high quality as always

    • @makgaiduk
      @makgaiduk 7 months ago +1

      Thanks for watching!

  • @eliaweiss1
    @eliaweiss1 8 months ago +4

    There is a very confusing issue with the model name:
    * Facebook has another model called DINO, which is a self-supervised ViT model
    * The DETR lineage has a Semantic-SAM model, again sharing a name with Facebook's segmentation model
    * And to make it more confusing, the original DETR was developed by Facebook
    From what I see, all these models are very capable and interesting

    • @makgaiduk
      @makgaiduk 8 months ago +1

      Yeah, that's why I've put the entire paper title as the video name.
      Search optimisation is hard as it is...

  • @matejsirovatka
    @matejsirovatka 8 months ago +1

    Love the video and the series in general! I would love to see something similar on other topics, such as NLP or maybe super-resolution. Is anything like that planned? Also, keep up the great work 🔥

    • @makgaiduk
      @makgaiduk 8 months ago

      I was just planning to record a video about GPT-2, and then other NLP topics: various BERTs, RoBERTas and DeBERTas, T5, E5, RAG, RLHF and all that stuff. I also don't want to stop at computer vision; exciting topics are still to come.
      I do make one video a week though, so it will take some time )

    • @matejsirovatka
      @matejsirovatka 8 months ago

      @@makgaiduk Yes, I have noticed the upload schedule, and honestly I love that it's pretty often yet not overwhelming, giving me time to work on other personal stuff

  • @bmonamie
    @bmonamie 14 days ago

    It would be helpful if you could remove the smaller window with the presenter's video; that would help viewers focus on the main content.

  • @vslaykovsky
    @vslaykovsky 8 months ago +1

    Awesome thumbnails :)

  • @davidro00
    @davidro00 8 months ago +3

    1 week of your time = -3 weeks of research time * number of subscribers

    • @makgaiduk
      @makgaiduk 8 months ago

      Plus gamma * expected subscribers in the next week + gamma squared * expected subscribers in 2 weeks plus ...

    • @davidro00
      @davidro00 8 months ago

      @@makgaiduk Oh yes, my fault. It is also missing the derivative of the exponential growth of AI researchers per week.
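The back-and-forth above is just a discounted geometric sum, RL-style. A toy sketch with made-up numbers (the function name and figures are purely illustrative):

```python
# Value of one video as a discounted sum over expected future subscribers:
# sum over weeks k of gamma**k * expected_subs[k], with gamma in (0, 1).
# All numbers here are hypothetical, for illustration only.

def discounted_value(expected_subs_per_week, gamma=0.9):
    """Sum gamma**k * expected_subs_per_week[k] over future weeks k."""
    return sum(gamma ** k * s for k, s in enumerate(expected_subs_per_week))

# e.g. 100 expected subscribers per week for 3 weeks, gamma = 0.5:
value = discounted_value([100, 100, 100], gamma=0.5)  # 100 + 50 + 25 = 175
```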

  • @subramanyabhat446
    @subramanyabhat446 7 months ago

    Hey Mak! Thanks for such a great video. Since I'm working with DINO or something similar for my thesis project, I was wondering: could I use the model with only the denoising queries and deformable attention, excluding the dynamic anchor boxes, since they may not lead to significant performance improvements for my use case?

    • @makgaiduk
      @makgaiduk 7 months ago +1

      DAB-DETR concepts (i.e., dynamic anchor boxes) are unfortunately baked into the foundation of DN-DETR and DINO. DAB-DETR was the one to propose separating the decoder input into "anchor boxes" and "content embeddings". Without it, query denoising makes little sense.
      You might try disabling some aspects of the dynamic anchor boxes, like hw_attention_modulation (github.com/IDEA-Research/DINO/blob/main/models/dino/deformable_transformer.py#L1032), though support for that seems rather limited: it is not a config option but a constant in the code, and I am not 100% sure it will work correctly if you change it.
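The coupling described in the reply can be sketched in a few lines. This is a minimal, hypothetical illustration (not the DINO repo's actual code), assuming normalized (cx, cy, w, h) boxes: DAB-DETR separates each decoder query into a 4D anchor box plus a content embedding, and DN-DETR/DINO build extra denoising queries by jittering ground-truth boxes, which the model learns to reconstruct.

```python
import random

# Hypothetical sketch of DN-style box noising. Each denoising query starts
# from a ground-truth (cx, cy, w, h) box in normalized coordinates; the
# center is shifted by up to half the box size and the width/height are
# rescaled, both controlled by box_noise_scale.

def jitter_box(box, box_noise_scale=0.4, rng=random):
    """Perturb a (cx, cy, w, h) box to produce one denoising query's anchor."""
    cx, cy, w, h = box
    cx += (rng.random() * 2 - 1) * w * 0.5 * box_noise_scale  # shift center x
    cy += (rng.random() * 2 - 1) * h * 0.5 * box_noise_scale  # shift center y
    w *= 1 + (rng.random() * 2 - 1) * box_noise_scale          # rescale width
    h *= 1 + (rng.random() * 2 - 1) * box_noise_scale          # rescale height
    clamp = lambda v: min(max(v, 0.0), 1.0)  # stay inside the image frame
    return (clamp(cx), clamp(cy), clamp(w), clamp(h))

gt_boxes = [(0.5, 0.5, 0.2, 0.3), (0.25, 0.4, 0.1, 0.1)]
denoising_queries = [jitter_box(b) for b in gt_boxes]
```

Because the denoising queries are themselves anchor boxes fed through the same decoder path, removing the anchor-box formulation while keeping query denoising would leave nothing to jitter, which is the point the reply makes.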

  • @makgaiduk
    @makgaiduk 7 months ago

    Check out my next video, reading the DINO source code: ruclips.net/video/513MgXnqEhk/видео.html

  • @yunootsuka9093
    @yunootsuka9093 8 months ago +1

    1