Swin Transformer - Paper Explained

  • Published: 25 Dec 2024

Comments • 31

  • @VedantJoshi-mr2us · 6 months ago · +4

    By far one of the best and most complete Swin Transformer explanations on the entire Internet.

    • @soroushmehraban · 6 months ago

      Thanks!

    • @FinalProject-rw1yf · 6 months ago

      @@soroushmehraban Hi sir, could you also explain the FasterViT and GCViT papers...

  • @tukaramugile573 · 1 day ago

    Very good explanation. Thank you

  • @omarabubakr6408 · 1 year ago

    That's the most illustrative video of Swin Transformers on the Internet!

    • @soroushmehraban · 1 year ago

      Glad you enjoyed it 😃

    • @omarabubakr6408 · 1 year ago

      @@soroushmehraban Yes, thanks so much. I have a quick question more related to PyTorch, actually: at 12:49, in line 239 of the code, (1) what does the -1 mean and what exactly does it do to the tensor, and (2) where does the [4, 16] come from? The 4 isn't mentioned in the reshaping. Thanks in advance.
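
      (Editor's note: the exact tensor shapes at line 239 of the referenced code
      aren't reproduced here, but the -1 in a PyTorch reshape/view is simply an
      inferred dimension. A minimal, hypothetical sketch of that behaviour,
      with shapes chosen for illustration rather than taken from the repo:)

        import torch

        x = torch.arange(16)          # 16 elements

        # -1 asks PyTorch to infer that dimension from the element count:
        # 16 / 4 = 4, so the result has shape [4, 4].
        a = x.reshape(4, -1)
        print(a.shape)                # torch.Size([4, 4])

        # An explicit target such as [4, 16] must multiply out to the number
        # of elements already in the tensor; the 4 usually comes from an
        # earlier dimension (e.g. a batch or window count), not from the
        # reshape call itself.
        b = torch.zeros(64)
        c = b.reshape(4, 16)          # works because 4 * 16 == 64
        d = b.reshape(4, -1)          # same result, with 16 inferred
        print(c.shape, d.shape)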

  • @yehanwasura · 1 year ago · +2

    Really informative, helped me a lot to understand many concepts here. Keep up the good work.

  • @SizzleSan · 1 year ago · +1

    Thorough! Very comprehensible, thank you.

  • @kerenc91 · 17 days ago

    Great explanation, thanks!

  • @rohollahhosseyni8564 · 1 year ago

    Very well explained, thank you Soroush.

  • @kundankumarmandal6804 · 11 months ago

    You deserve more likes and subscribers

  • @symao-ir9vw · 26 days ago

    At 17:15, may I ask why the number at the bottom right of the 3rd Swin block is 6?

    • @soroushmehraban · 25 days ago

      That's a hyperparameter, I believe. It's hard to use many layers in the first and second stages because of the memory constraints with 4x4 and 8x8 patches, and the 32x32 patches at the last stage are the coarsest (least attention to detail). So they stacked the most blocks at the 16x16 patch size instead.
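
      (Editor's note: the per-stage block counts are hyperparameters. In the
      paper, Swin-T uses depths (2, 2, 6, 2), and the larger variants only
      deepen the third stage, where the 16x-downsampled feature map gives the
      best compute/detail trade-off. A minimal pure-Python sketch, assuming a
      224x224 input:)

        # Depths from the paper; stage 3 gets the most blocks in every variant.
        swin_configs = {
            "Swin-T": dict(embed_dim=96,  depths=(2, 2, 6, 2)),
            "Swin-S": dict(embed_dim=96,  depths=(2, 2, 18, 2)),
            "Swin-B": dict(embed_dim=128, depths=(2, 2, 18, 2)),
        }

        img = 224
        for name, cfg in swin_configs.items():
            for stage, depth in enumerate(cfg["depths"], start=1):
                patch = 4 * 2 ** (stage - 1)      # effective patch size: 4, 8, 16, 32
                tokens = (img // patch) ** 2      # tokens in the grid at this stage
                print(f"{name} stage {stage}: {depth} blocks, "
                      f"{patch}x{patch}-px patches, {tokens} tokens")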

  • @antonioperezvelasco3297 · 1 year ago

    Thanks for the good explanation!

  • @symao-ir9vw · 26 days ago

    The discussion about patch size at around 16:40 is confusing

    • @soroushmehraban · 25 days ago

      I was comparing a 4x4 Swin Transformer against a 4x4 ViT. In a 4x4 ViT, every layer works on patches of 4x4 pixels, so all layers keep good attention to detail. But in the Swin Transformer we merge these tokens as we go deeper, so the deep layers have less attention to detail (that's why the final-layer output on its own isn't enough for segmentation).
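
      (Editor's note: a minimal sketch of this comparison, assuming a 224x224
      input: a 4x4-patch ViT keeps the same 56x56 token grid at every layer,
      while Swin halves the grid after each stage via patch merging, which is
      why dense-prediction heads typically consume features from all stages
      rather than only the last one:)

        # Token-grid resolution per layer/stage for a 224x224 input.
        img, patch = 224, 4

        vit_grid  = [(img // patch, img // patch)] * 4                 # constant 56x56
        swin_grid = [(img // (patch * 2 ** s),) * 2 for s in range(4)] # halved per stage

        print("ViT  per layer:", vit_grid)    # [(56, 56), (56, 56), (56, 56), (56, 56)]
        print("Swin per stage:", swin_grid)   # [(56, 56), (28, 28), (14, 14), (7, 7)]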

  • @proteus333 · 1 year ago

    Amazing video!

  • @SaniaEskandari · 1 year ago

    Perfect description.

  • @siarez · 1 year ago

    Great video! Thanks

  • @pradyumagarwal3978 · 3 months ago

    Where is the code that you were referring to?

    • @soroushmehraban · 3 months ago

      github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L222

  • @akbarmehraban5007 · 1 year ago

    I enjoyed it very much.

  • @Karthik-kt24 · 5 months ago

    Very nicely explained, thank you! Likes are at 314 so I didn't hit like 😁 subbed instead.

  • @dslkgjsdlkfjd · 5 months ago

    2:43 C would be equal to the number of filters, not the number of kernels. In the torch.nn.Conv2d operation being performed, there are C filters, and each filter has 3 kernels (one per input channel), not C kernels.
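
    (Editor's note: this matches PyTorch's Conv2d weight layout. A minimal
    sketch of the patch-embedding convolution, with C = 96 as in Swin-T; the
    exact module from the video isn't reproduced here:)

      import torch
      import torch.nn as nn

      # Each non-overlapping 4x4 RGB patch is mapped to a C-dimensional token.
      C = 96
      patch_embed = nn.Conv2d(in_channels=3, out_channels=C, kernel_size=4, stride=4)

      # C filters, each holding 3 kernels (one per input channel).
      print(patch_embed.weight.shape)   # torch.Size([96, 3, 4, 4])

      x = torch.randn(1, 3, 224, 224)
      tokens = patch_embed(x)
      print(tokens.shape)               # torch.Size([1, 96, 56, 56])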