Pixel Shuffle - Changing Resolution with Style

  • Published: 3 Nov 2024

Comments • 31

  • @salmiac-3105
    @salmiac-3105 1 year ago +30

    would've loved an example image for the pixel shuffle there too, to really grasp what is happening

    • @ziggycross
      @ziggycross 1 year ago +7

      Was just about to leave a comment to say this! Was waiting for some example images, would be great to keep in mind for future videos!

    • @ELjoakoDELxD
      @ELjoakoDELxD 1 year ago +1

      @@ziggycross I wanted some images too. I didn't fully understand what the output of pixel shuffle is going to be
      Edit: grammar will always be difficult for me

    • @djmips
      @djmips 1 year ago +2

      It's not working on actual pixels. The 'depth' or input to the shuffle is the feature maps generated from the low-res image, and it's at this last stage that the image is upsampled. This is in contrast to older methods that would upsample the image straight away and then try to process that into the super-resolution output, which was both less efficient and potentially introduced the artifacts mentioned in the video. For more information, see the paper referenced in the video: "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" by Shi et al.
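The rearrangement described above can be sketched in a few lines of NumPy. This is a hypothetical stand-alone implementation of the depth-to-space step (following PyTorch's nn.PixelShuffle channel ordering), not code from the video:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into a (C, H*r, W*r) image.

    Each group of r*r channels becomes an r-by-r block of output pixels,
    matching PyTorch's nn.PixelShuffle channel ordering.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

# four channels of a 2x2 feature map interleave into one 4x4 channel
feat = np.arange(16).reshape(4, 2, 2)
out = pixel_shuffle(feat, 2)
```

No pixel values are invented here; the numbers 0..15 are only rearranged, which is the whole point of the operation.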

    • @johnpope1473
      @johnpope1473 7 months ago

      @@djmips - now I understand. thanks

  • @VFXVideoeffectsCreator
    @VFXVideoeffectsCreator 1 year ago +9

    I have to say, that's really awesome! Especially the hint that transposed convolution is just the gradient computation of convolution w.r.t. its inputs. I regularly contribute to the backends of Deep Learning Frameworks in the Julia Programming Language, and transposed convolution (or deconvolution, or some freaky way to say it: fractionally strided convolution) is really just a function call to the function calculating the adjoint (gradient) of a normal convolution (except output_padding, but this just affects the size calculation anyway).

  • @chrisminnoy3637
    @chrisminnoy3637 1 year ago +3

    Thanks. Was already using that for quite some time in my super-resolution upscaler. A downside of the TensorFlow implementation: as far as I know, you can only use square factors, but it would make sense to also do it in just one dimension, or as a rectangle. Some work to be done there ...
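For what it's worth, the rectangular (and 1-D) variant asked for here is mostly an indexing exercise. A hypothetical NumPy sketch with independent height and width factors — this is not an existing TensorFlow API (tf.nn.depth_to_space does take a single square block_size):

```python
import numpy as np

def pixel_shuffle_rect(x, rh, rw):
    """Pixel shuffle with separate upscale factors per axis.

    (C*rh*rw, H, W) -> (C, H*rh, W*rw); rh=1 or rw=1 gives a 1-D shuffle.
    """
    c_r, h, w = x.shape
    c = c_r // (rh * rw)
    x = x.reshape(c, rh, rw, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * rh, w * rw)

# a purely horizontal (1-D) shuffle: 3 channels -> one 3x-wider channel
feat = np.arange(12).reshape(3, 2, 2)
wide = pixel_shuffle_rect(feat, 1, 3)
```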

  • @kevalan1042
    @kevalan1042 1 year ago +2

    Beautiful work as always

  • @I77AGIC
    @I77AGIC 5 months ago

    this made it make a ton of sense. but one problem is pixel shuffle does not get rid of the artifacts. it introduces its own artifacts

  • @Firestorm-tq7fy
    @Firestorm-tq7fy 6 months ago

    One of the best channels! I wish u‘d be covering more topics than only CNN, but guess can’t be a top pro in every topic. I def subbed and wished u‘d have way more videos already. But i can see that it takes alot of time and effort so i will wait. Thank u so much for this work ❤

  • @Biuret.
    @Biuret. 1 year ago +2

    Great content. Thank you!

  • @ScottzPlaylists
    @ScottzPlaylists 1 year ago +2

    👍 you make awesome illustrations.. ❤ Can you explain Transformers encoding and inference? ❓
    That would be a big hit also. 👏

  • @MarvinEckhardt
    @MarvinEckhardt 11 months ago +1

    this video should have way more likes...

  • @HinaTan250
    @HinaTan250 1 year ago +2

    This is really cool! 😄 Thanks for the information.

  • @AdmMusicc
    @AdmMusicc 5 months ago

    Loved the animation thank you!!

  • @j________k
    @j________k 1 year ago +2

    Great series! Keep it up :)

  • @SrDlay
    @SrDlay 1 year ago

    thanks for your effort

  • @chrisminnoy3637
    @chrisminnoy3637 1 year ago

    Would be nice to have a video about TensorTrain technique

  • @coryfan5872
    @coryfan5872 7 months ago

    Hi, isn't this virtually the same effect as a stride-2, 2x2 transposed convolution with the output channels just being 4 times smaller? It's a convolutional filter with some binary weights that maps each pixel's channels to some new channel. The aforementioned transposed convolution would be the same if you just had a linear layer before the pixel shuffle.
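This equivalence is easy to check numerically. A hypothetical NumPy sketch (naive loops rather than a framework call): a stride-2, 2x2 transposed convolution whose weights are one-hot — each output channel and kernel position reads exactly one input channel — reproduces pixel shuffle exactly:

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """(C*r*r, H, W) -> (C, H*r, W*r), PyTorch channel ordering."""
    c, h, w = x.shape[0] // (r * r), x.shape[1], x.shape[2]
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

def conv_transpose_2x2_s2(x, weight):
    """Naive transposed convolution, stride 2, 2x2 kernel.

    weight has shape (C_in, C_out, 2, 2), PyTorch's ConvTranspose2d layout.
    """
    cin, h, w = x.shape
    cout = weight.shape[1]
    out = np.zeros((cout, h * 2, w * 2))
    for y in range(h):
        for xx in range(w):
            for ky in range(2):
                for kx in range(2):
                    # each input pixel scatters a 2x2 block into the output
                    out[:, 2 * y + ky, 2 * xx + kx] += weight[:, :, ky, kx].T @ x[:, y, xx]
    return out

rng = np.random.default_rng(0)
cin, cout = 8, 2            # cin = cout * r*r with r = 2
feat = rng.standard_normal((cin, 3, 5))

# one-hot ("binary") weights: output channel c at kernel position (ky, kx)
# reads input channel c*4 + ky*2 + kx -- exactly pixel shuffle's indexing
w = np.zeros((cin, cout, 2, 2))
for c in range(cout):
    for ky in range(2):
        for kx in range(2):
            w[c * 4 + ky * 2 + kx, c, ky, kx] = 1.0

assert np.allclose(conv_transpose_2x2_s2(feat, w), pixel_shuffle(feat, 2))
```

A learned transposed convolution mixes channels on top of this rearrangement, which matches the comment's point that a linear layer followed by pixel shuffle spans the same function family.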

  • @Erosis
    @Erosis 1 year ago +1

    Do you have a paper or resource about the artifacts in the gradient when using strided 3x3 convs?

    • @animatedai
      @animatedai  1 year ago +3

      If you accept that transposed convolution (kernel size=3, stride=2) produces gridding artifacts in the output image then by definition, standard convolution (kernel size=3, stride=2) produces gridding artifacts in the input image gradient. The reason is that transposed convolution is implemented as a literal call to the gradient function of standard convolution in TensorFlow and PyTorch.
      I learned this at some point studying the papers and code of the StyleGAN saga. (nvlabs.github.io/stylegan2/versions.html) I wish I could narrow it down more for you, if you're trying to cite this. I have a feeling I learned it from reading their code or one of their references. You'll notice in all the versions of their code, they go out of their way to implement downsampling as a blur -> convolution rather than just a plain strided convolution. StyleGAN3 is all about aliasing.
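The claim that transposed convolution is literally the input gradient of strided convolution can be checked numerically in 1-D. A hypothetical NumPy sketch (not from the video or the StyleGAN code) — the scatter loop below is exactly a stride-2 transposed convolution with the same kernel, and the uneven values it produces are the gridding pattern in the gradient:

```python
import numpy as np

def conv1d(x, k, stride):
    """Valid strided cross-correlation, as conv layers compute it."""
    n_out = (len(x) - len(k)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(k)], k)
                     for i in range(n_out)])

def conv1d_input_grad(gout, k, stride, n_in):
    """Gradient of conv1d's output w.r.t. its input x.

    Each output element scatters its gradient back through the kernel --
    this scatter IS a transposed convolution with the same kernel.
    """
    gx = np.zeros(n_in)
    for i, g in enumerate(gout):
        gx[i * stride:i * stride + len(k)] += g * k
    return gx

k = np.array([1.0, 2.0, 3.0])
x = np.arange(7.0)
y = conv1d(x, k, stride=2)          # forward pass: 3 outputs

gout = np.ones_like(y)              # uniform upstream gradient
gx = conv1d_input_grad(gout, k, stride=2, n_in=7)
# gx alternates between singly- and doubly-covered positions:
# a uniform output gradient comes back as a non-uniform input gradient
```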

    • @coryfan5872
      @coryfan5872 7 months ago +1

      It's probably because some pixels overlap the convolutional filter only once (the ones in the centers), some overlap it twice (the ones on the sides but not the corners), and some overlap it four times (the ones in the corners). I wonder if using ConvNeXt's 2x2 convolutional layers still results in this sort of gradient artifact.
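Those 1/2/4 overlap counts can be computed directly. A hypothetical NumPy sketch that just counts, for a 3x3 kernel with stride 2, how many output positions each input pixel feeds:

```python
import numpy as np

# how often does each input pixel feed a 3x3, stride-2 convolution?
# scatter a count from every output position back through the kernel window
k, s, n = 3, 2, 9                       # kernel, stride, input size (one axis)
cov1d = np.zeros(n)
for o in range((n - k) // s + 1):
    cov1d[o * s:o * s + k] += 1

cov2d = np.outer(cov1d, cov1d)          # 2-D coverage is the outer product
# interior pixels land on counts 1, 2, or 4, as described above
```

With a 2x2 kernel and stride 2, the same count comes out uniform (every pixel covered exactly once), which is why 2x2 stride-2 layers should not show this particular gradient checkerboard.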

  • @azatahmedov4308
    @azatahmedov4308 8 months ago

    Can you explain how you would pixel_unshuffle if the resolution is 4000x3000 (WxH) and the downscale_factor is 16?
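Note that 4000 / 16 = 250 works directly, but 3000 / 16 = 187.5 does not, so the height has to be padded (or cropped) to a multiple of 16 first. A hypothetical NumPy sketch following PyTorch's nn.PixelUnshuffle channel ordering:

```python
import numpy as np

def pixel_unshuffle(x, r):
    """(C, H, W) -> (C*r*r, H/r, W/r); H and W must be divisible by r."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

r = 16
img = np.zeros((3, 3000, 4000))         # C, H, W for a 4000x3000 (WxH) image
pad_h = (-img.shape[1]) % r             # pad height up to the next multiple of 16
img = np.pad(img, ((0, 0), (0, pad_h), (0, 0)))
out = pixel_unshuffle(img, r)           # (3*256, 3008//16, 4000//16)
```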

  • @fqidz
    @fqidz 1 year ago

    there's zero explanation about how this would work with real images

    • @djmips
      @djmips 1 year ago

      It's not working on actual pixels. The 'depth' or input to the shuffle is the feature maps generated from the low-res image, and it's at this last stage that the image is upsampled. This is in contrast to older methods that would upsample the image straight away and then try to process that into the super-resolution output, which was both less efficient and potentially introduced the artifacts mentioned in the video. For more information, see the paper referenced in the video: "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network" by Shi et al.

  • @yinwong667
    @yinwong667 7 months ago

    But why is it necessary to do pixel shuffle? Why can't we just output an rH x rW x 3 matrix directly?

  • @周胜辉
    @周胜辉 6 months ago

    This reminds me a bit of sub-pixel interpolation

  • @ucngominh3354
    @ucngominh3354 1 year ago

    hi

  • @krum3155
    @krum3155 1 year ago +2

    jif

  • @muthukamalan.m6316
    @muthukamalan.m6316 1 year ago

    super cool. waiting for Transformers and BN,LN