Vision Transformers (ViT) PyTorch code

  • Published: 27 Sep 2024

Comments • 7

  • @mashaan14
    @mashaan14  28 days ago

    Hi everyone 👋
    It’s been a while since I posted this video, and it’s time to reflect on it. First, there are multiple ways to visualize attention in vision transformers. This paper (“Transformer Interpretability Beyond Attention Visualization”, arxiv.org/abs/2012.09838v1) compared different visualization methods.
    What I did in this video is simply test the attention at the first layer: feed in a test image and pull out the query, key, and value matrices. Multiplying the query matrix by the transposed key matrix gives a square matrix showing how strongly each patch is “paying attention” to every other patch. If we order the patches back to their positions in the original image, we can see which patches have the highest attention values (sketched below).
    I updated the code by adding more comments and printouts, just to make it more readable.
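
    A rough sketch of that computation in plain PyTorch, using random stand-ins for the patch embeddings and the first layer’s weights (img_size, patch_size, and dim here are hypothetical, not the video’s actual settings):

        import torch
        import torch.nn.functional as F

        img_size, patch_size, dim = 224, 16, 64
        n_side = img_size // patch_size        # 14 patches per side
        n_patches = n_side ** 2                # 196 patches in total

        # Stand-ins for the first layer's query/key projections.
        w_q = torch.nn.Linear(dim, dim, bias=False)
        w_k = torch.nn.Linear(dim, dim, bias=False)

        # Pretend these are the patch embeddings of one test image.
        x = torch.randn(n_patches, dim)

        q = w_q(x)   # (196, dim) query matrix
        k = w_k(x)   # (196, dim) key matrix

        # Query times transposed key: a square matrix whose entry
        # (i, j) says how strongly patch i attends to patch j.
        attn = F.softmax(q @ k.T / dim ** 0.5, dim=-1)   # (196, 196)

        # Average attention each patch receives, reordered back to
        # the 14x14 patch grid of the original image.
        attn_map = attn.mean(dim=0).reshape(n_side, n_side)
        print(attn_map.shape)   # torch.Size([14, 14])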

  • @tylervandermate
    @tylervandermate 29 days ago

    This is exactly what I've been trying to find for model visualization. Thank you! It's difficult finding any in-depth info on transformers involving the query, key, and value matrices.

    • @mashaan14
      @mashaan14  28 days ago

      Thank you, that's great to hear.
      I'm currently working on a Swin transformer visualization video. Hopefully, I'll post it in a week or so.

  • @doublesami
    @doublesami 4 months ago +1

    Very informative! Could you please make a video on Vision Mamba or VMamba and explain the theoretical as well as the implementation part? Looking forward to it.

    • @mashaan14
      @mashaan14  4 months ago

      Thanks, I just checked VMamba on GitHub. Sure, I'll add it to my to-do list. The thing is, I'm recording a series on graph neural networks at the moment. Once I'm done with that, I'll get back to vision transformers.

  • @dossantos4415
    @dossantos4415 5 months ago +1

    Could you do the same but for NLP?

    • @mashaan14
      @mashaan14  5 months ago

      I guess you want maps similar to the ones in this paper:
      Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate”, 2014 (arxiv.org/abs/1409.0473).
      If that's the case, I'm sorry, but I'm not familiar with that topic.