Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (Paper Explained)

  • Published: 27 Oct 2024

Comments • 33

  • @Kram1032
    @Kram1032 4 years ago +15

    It's gonna take a lot of doing to make that feasible but I'm really curious what could happen with this attentional type of processing for multimodal data.
    Like, imagine you could scrape the web like they did for GPT-3, but include not just text but also images. Entire illustrated books. Embedded videos with spoken language.
    Language is fundamentally dependent on the real world. It's crazy how far we can get with *just* text but I'd imagine a lot of things could be easily disambiguated if words aren't just typed but also heard or in the context of other stuff.
    So making attention more efficient for images is a solid step towards something like this and I'm really looking forward to what'll come of it.

    • @felipemello1151
      @felipemello1151 4 years ago +2

      Google actually has a trained NN that accepts all sorts of inputs (images, text, etc.). The idea was to have a single model for everything. I can't remember the name of it, though.

    • @Kram1032
      @Kram1032 4 years ago +2

      @@felipemello1151 Google Brain, I think, but I'd imagine there has been quite some progress since.

    • @jasdeepsinghgrover2470
      @jasdeepsinghgrover2470 4 years ago +2

      I think we are very close to something like this, but positional embeddings would then need to become something more general, like context embeddings. Something like an image caption should be associated with both the image and the text referring to the image. Maybe after that, this will be possible.

  • @herp_derpingson
    @herp_derpingson 4 years ago +6

    27:00 I think a better interpretation would be: "When I am at this position, I am more important, or less important."
    Also, attention-based models are inherently more interpretable than convolution-based models, so I think these will win out in the long run. Perhaps we can have a hybrid of CNN and attention.

    • @socratic-programmer
      @socratic-programmer 4 years ago +4

      To an extent, convolutional models can also be analysed to see which parts were the most excited (and contributed to the final prediction). The other main advantage - and the reason I think we will at least have some hybrid of conv + attention - is that convolutions are much more parameter-efficient than FC or self-attention layers.
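      A minimal sketch of that parameter-count argument (plain Python; layer sizes are made up purely for illustration):

      ```python
      # Parameter counts for one layer mapping 64 -> 64 channels
      # on a 32x32 feature map.
      c_in, c_out, h, w = 64, 64, 32, 32

      conv3x3 = c_in * c_out * 3 * 3                      # 36,864 weights
      fully_connected = (c_in * h * w) * (c_out * h * w)  # ~4.3 billion weights

      # Note: a plain self-attention layer's q/k/v/output projections are
      # also small (4 * c_in * c_out = 16,384 here); what grows quadratically
      # with the number of pixels is its compute and memory, not its parameters.
      print(f"3x3 conv:        {conv3x3:,}")
      print(f"fully connected: {fully_connected:,}")
      ```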

    • @redjammie8342
      @redjammie8342 4 years ago +2

      @@socratic-programmer Also, local connectivity for low-level visual features makes perfect sense.

  • @whatdl6002
    @whatdl6002 4 years ago +12

    Are we a couple of million dollars of Neural Architecture Search away from the end of convolutions???

  • @binjianxin7830
    @binjianxin7830 4 years ago +1

    When convolutions go deep, they seem not only to be more efficient but also to condense information in various abstract and profound ways. Attention layers certainly need to become more efficient.

  • @alceubissoto
    @alceubissoto 4 years ago +1

    Thanks for the video, Yannic. Amazing explanation!

  • @jahcane3711
    @jahcane3711 4 years ago +1

    Beautiful. Thank you, Yannic.

  • @marcussky
    @marcussky 4 years ago +4

    Check out TabNet... Attention is coming for tabular data as well...

  • @PaganPegasus
    @PaganPegasus 2 years ago

    7:55 Yannic just predicted the Perceiver architecture. Madman.

  • @TechVizTheDataScienceGuy
    @TechVizTheDataScienceGuy 4 years ago +1

    Nicely explained! 👍

  • @blizzard072
    @blizzard072 3 years ago

    As the subscript implies, there seems to be a positional embedding r_p for every output position o. I'm not sure that would be memory-friendly... having relative positional embeddings for every pixel seems intense.
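    For what it's worth, in this line of work (following Shaw et al.'s relative position embeddings, which stand-alone self-attention builds on) the embeddings are typically learned per relative offset and shared across all output positions, so the table stays small. A hedged sketch with illustrative names and sizes:

    ```python
    import torch

    # One learned vector per *relative offset* within the axial span,
    # shared by every output position o (sizes here are illustrative).
    span, d_k = 64, 16  # axial span and per-head depth

    # one row per relative distance in [-(span-1), ..., span-1]
    rel_table = torch.nn.Parameter(torch.randn(2 * span - 1, d_k))

    # gather r_{p-o} for all (output o, context p) pairs along one axis
    offsets = torch.arange(span)[None, :] - torch.arange(span)[:, None]
    r = rel_table[offsets + span - 1]  # (span, span, d_k), no per-pixel params

    print(rel_table.numel())  # 2,032 parameters, independent of image size
    ```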

  • @shrutishrestha8296
    @shrutishrestha8296 4 years ago +3

    Is there any code using this for segmentation?

  • @sahilriders
    @sahilriders 3 years ago +1

    Did you check out the MaX-DeepLab paper? It would be nice if you could make a video on that.

  • @jackeown
    @jackeown 4 years ago +1

    You should do a video on TabNet for tabular data using neural nets. I feel like there's a lot there and the explanations online kind of suck.

  • @seyeeet8063
    @seyeeet8063 3 years ago

    Can someone explain to me what "axial" means? :) I have a hard time getting it.
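    Roughly: "axial" means attending along one axis of the image at a time - all pixels in the same column, then all pixels in the same row - instead of over all H×W pixels at once. A minimal sketch of the idea (the 1-D attention modules are left abstract; names are illustrative, not the paper's code):

    ```python
    import torch

    def axial_attention(x, attn_h, attn_w):
        # x: (batch, channels, H, W); attn_h / attn_w: any self-attention
        # modules mapping sequences of shape (N, L, C) -> (N, L, C)
        b, c, h, w = x.shape
        # height axis: every column becomes an independent sequence of length H
        x = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        x = attn_h(x)
        # width axis: every row becomes an independent sequence of length W
        x = x.reshape(b, w, h, c).permute(0, 2, 1, 3).reshape(b * h, w, c)
        x = attn_w(x)
        return x.reshape(b, h, w, c).permute(0, 3, 1, 2)

    # usage sketch, e.g. with PyTorch's built-in attention:
    # mha = torch.nn.MultiheadAttention(embed_dim=c, num_heads=4, batch_first=True)
    # out = axial_attention(x, lambda s: mha(s, s, s)[0], lambda s: mha(s, s, s)[0])
    ```

    Stacking a height-axis layer and a width-axis layer gives every pixel a path to every other pixel, while each layer only ever attends over H or W positions.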

  • @freddiekalaitzis5708
    @freddiekalaitzis5708 3 years ago

    In times when SOTA is unfortunately king to young reviewers, I can appreciate the authors' need to perform well at least within some class of models. Imagine the frustration when all you offer the community is a competitive alternative, only for a reviewer to retort that it's not the best tool by some arbitrary margin.
    Great video.

  • @GyuHobbyRC
    @GyuHobbyRC 4 years ago +1

    I enjoyed a great video, let's be friends 😊😊😊😊
    Let's be friends!!!~~^^

  • @trevormartin1944
    @trevormartin1944 4 years ago

    Does anyone know what Yannic uses to be able to draw and edit over the PDFs?

  • @az8134
    @az8134 3 years ago

    Attention is the new MLP when you are rich

  • @monstrimmat
    @monstrimmat 3 years ago

    "What's a good number?"

  • @Lee-vs5ez
    @Lee-vs5ez 4 years ago +1

    So many tricks for reducing computational cost lately. Intuitive, but also questionable.

    • @autonomous2010
      @autonomous2010 4 years ago +1

      Yep. A lot of approaches scale very poorly, requiring exponentially more resources the more data you have. So there's a lot of experimenting to try to get around that major limitation.
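      (Strictly, self-attention over images is quadratic rather than exponential, but the point stands. A quick worked example of what the axial trick saves, with an illustrative feature-map size:)

      ```python
      h = w = 128
      n = h * w                       # 16,384 positions
      full_attention = n * n          # every pixel attends to every pixel
      axial_attention = n * (h + w)   # each pixel attends along its row + column

      print(f"full:  {full_attention:,} pairs")   # 268,435,456
      print(f"axial: {axial_attention:,} pairs")  # 4,194,304 (64x fewer)
      ```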

  • @mariomariovitiviti
    @mariomariovitiviti 4 years ago +4

    These names are getting out of hand

  • @qimingzhong1044
    @qimingzhong1044 4 years ago

    With transformers dominating the leaderboards, lightweight neural networks might be a thing of the past.

    • @redjammie8342
      @redjammie8342 4 years ago

      What do you mean by lightweight neural network?