Animated AI
  • 11 videos
  • 191,216 views
Multihead Attention's Impossible Efficiency Explained
If the claims in my last video sound too good to be true, check out this video to see how the Multihead Attention layer can act like a linear layer with far less computation and far fewer parameters.
Patreon: www.patreon.com/Animated_AI
Animations: animatedai.github.io/
4,597 views

Videos

What's So Special About Attention? (Neural Networks)
6K views · 5 months ago
Find out why the multihead attention layer is showing up in all kinds of machine learning architectures. What does it do that other layers can't? Patreon: www.patreon.com/Animated_AI Animations: animatedai.github.io/
Pixel Shuffle - Changing Resolution with Style
8K views · 10 months ago
Patreon: www.patreon.com/Animated_AI Animations: animatedai.github.io/#pixel-shuffle
Source of confusion! Neural Nets vs Image Processing Convolution
4.2K views · 11 months ago
Patreon: www.patreon.com/Animated_AI All Convolution Animations are Wrong: ruclips.net/video/w4kNHKcBGzA/видео.html My Animations: animatedai.github.io/ Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/ Bee image: catalyststuff on Freepik www.freepik.com/free-vector/cute-bee-flying-cartoon-vector-icon-illustration-animal-nature-icon-concept-isolated-premium-vector_31641108.htm#q...
Groups, Depthwise, and Depthwise-Separable Convolution (Neural Networks)
32K views · 1 year ago
Patreon: www.patreon.com/Animated_AI Fully animated explanation of the groups option in convolutional neural networks followed by an explanation of depthwise and depthwise-separable convolution in neural networks. Animations: animatedai.github.io/ Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/
Stride - Convolution in Neural Networks
7K views · 1 year ago
Patreon: www.patreon.com/Animated_AI A brief introduction to the stride option in neural network convolution followed by some best practices. Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/
Convolution Padding - Neural Networks
8K views · 1 year ago
Patreon: www.patreon.com/Animated_AI A brief introduction to the padding option in neural network convolution followed by an explanation of why the default is named "VALID". Intro sound: "Whoosh water x4" by beman87 freesound.org/s/162839/
All Convolution Animations Are Wrong (Neural Networks)
59K views · 1 year ago
Patreon: www.patreon.com/Animated_AI All the neural network 2d convolution animations you've seen are wrong. Check out my animations: animatedai.github.io/
Filter Count - Convolutional Neural Networks
15K views · 2 years ago
Patreon: www.patreon.com/Animated_AI Learn about filter count and realistic methods for finding the best values. My Udemy course on High-resolution GANs: www.udemy.com/course/high-resolution-generative-adversarial-networks/?referralCode=496CFB7F680D78F02798
Kernel Size and Why Everyone Loves 3x3 - Neural Network Convolution
27K views · 2 years ago
Patreon: www.patreon.com/Animated_AI Find out what the Kernel Size option controls and which values you should use in your neural network.
Fundamental Algorithm of Convolution in Neural Networks
20K views · 2 years ago
Patreon: www.patreon.com/Animated_AI See convolution in action like never before!

Comments

  • @ananyapamde4514
    @ananyapamde4514 5 days ago

    Such a beautiful video

  • @user-zb9ub5nd1z
    @user-zb9ub5nd1z 5 days ago

    Thank you. Great job

  • @minecraftermad
    @minecraftermad 10 days ago

    wanna bet e by e would be somehow mathematically optimal?

  • @thrisharamkumar9566
    @thrisharamkumar9566 12 days ago

    Coming here after "All convolution animations are wrong", brilliant work! thank you very much!

  • @thecodegobbler2179
    @thecodegobbler2179 12 days ago

    The other drawings and visuals can't keep up with this! Great content! I love the visualizations!

  • @harrydawitch
    @harrydawitch 17 days ago

    My second favourite channel after 3Blue1Brown

  • @keihoag6467
    @keihoag6467 28 days ago

    Please do a video on backpropagation (since it's another convolution)

  • @feddyxdx272
    @feddyxdx272 28 days ago

    thx

  • @bibimblapblap
    @bibimblapblap 1 month ago

    Why does your input tensor have so many dimensions? Shouldn't the depth be only 3 (one for each color channel)?
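
A minimal PyTorch sketch of the point at issue (layer sizes here are illustrative, not from the video): only the raw image has a depth of 3; each convolution layer outputs one feature map per filter, so tensors deeper in the network have much greater depth.

```python
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)  # RGB input: depth 3
conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

x = conv1(image)  # depth 64: one feature map per filter
x = conv2(x)      # depth 128
print(x.shape)    # torch.Size([1, 128, 224, 224])
```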

  • @lizcoultersmith
    @lizcoultersmith 1 month ago

    These videos are outstanding! Finally, true visualisations that get it right. I'm sharing these with my ML Masters students. Thank you for your considerable effort putting these together.

  • @captainjj7184
    @captainjj7184 1 month ago

    I like it, really, love it! But... I don't see what's wrong with other illustrations, and peculiarly, I think yours just reiterates what they already clearly illustrate. I was even expecting CNN representations in XYZ visuals. Am I missing some points here? Honest question, would appreciate any enlightenment! (btw, thank you for sharing your own version of splendid animation with the world!) PS: If you're up for the challenge, do Spiking NNs, I'll buy you a beer in Bali!

  • @mayapony
    @mayapony 1 month ago

    thx!!

  • @afrolichesmain777
    @afrolichesmain777 1 month ago

    It's funny you mention that the number of kernels is the least exciting part; my thesis was an attempt at finding a systematic way to reduce the number of kernels by correlating them and discarding kernels that “extract roughly the same features”. Great video!

  • @alexvillalobos8245
    @alexvillalobos8245 2 months ago

    jiff

  • @nikilragav
    @nikilragav 2 months ago

    The reason the filter at 3:00 being 2D gets glossed over is that most image signal processing is taught in grayscale.

  • @nikilragav
    @nikilragav 2 months ago

    2:13 how does it stay at the same size? Padding the edges of the original image?
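
For what it's worth, yes: "same" padding pads the input border (with zeros by default) so the output keeps the input's spatial size. A small PyTorch sketch with arbitrary sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)
same = nn.Conv2d(8, 8, kernel_size=3, padding="same")  # or padding=1 for 3x3
valid = nn.Conv2d(8, 8, kernel_size=3, padding=0)

print(same(x).shape)   # torch.Size([1, 8, 32, 32]) - unchanged
print(valid(x).shape)  # torch.Size([1, 8, 30, 30]) - shrinks by k-1
```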

  • @nikilragav
    @nikilragav 2 months ago

    What actually is the 3rd dimension in this context for the source giant cube? Is that multiple colors? A batch of multiple images?

  • @sensitive_machine
    @sensitive_machine 2 months ago

    Lol grayscale is a real thing still! Medical and microscopy imaging

  • @sensitive_machine
    @sensitive_machine 2 months ago

    this is awesome and is inspiring me to learn blender!

  • @hieuluc8888
    @hieuluc8888 2 months ago

    0:16 If filters are stored in a 4-dimensional tensor and one of them represents the number of filters, then what does the depth represent?
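
A short PyTorch sketch of the four dimensions (sizes are arbitrary): the remaining "depth" dimension is the number of input channels, so each filter spans the full depth of its input.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
print(conv.weight.shape)
# torch.Size([128, 64, 3, 3]) = (filter count, input depth, height, width)
```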

  • @FrigoCoder
    @FrigoCoder 2 months ago

    I have a few decades of hobbyist signal processing experience, and these new methods seem so amateurish compared to what we had in the past: FFT, FHT, DCT, MDCT, FIR filters, IIR filters, FIR design based on frequency response, edge-adapted filters (so no need for smaller outputs), filter banks, biorthogonal filter banks, window functions, wavelets, wavelet transforms, Laplacian pyramids, curvelets, contourlets, non-separable wavelets, multiresolution analysis, compressive sensing, sparse reconstruction, SIFT, SURF, BRISK, FREAK, yadda yadda. Yes, we even had even-length filters, and different filters for analysis than for synthesis.

    • @equationalmc9862
      @equationalmc9862 1 month ago

      There are equivalents in AI model development and inference for those, though. Many of these signal processing techniques have analogs or are directly applicable in AI and machine learning:
      - **FFT, FHT, DCT, and MDCT:** used in feature extraction and preprocessing steps for machine learning models, especially in audio and image processing.
      - **FIR and IIR filters:** used in preprocessing steps to filter and clean data before feeding it into models.
      - **Wavelets and wavelet transforms:** applied for feature extraction and data compression, useful for handling time-series data.
      - **Compressive sensing and sparse reconstruction:** important for developing models that can work with limited data and for reducing the dimensionality of data.
      - **SIFT, SURF, BRISK, and FREAK:** feature detection and description techniques that are foundational in computer vision tasks like object recognition and image matching.
      In AI, techniques like convolutional neural networks (CNNs) often use concepts from signal processing (like filtering and convolutions) to process data in a way that mimics these traditional methods. Signal processing principles help in designing more efficient algorithms and models, improving performance in tasks such as image recognition, speech processing, and time-series analysis.
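
One concrete bridge between the two toolboxes, along the lines of the reply above: classic spectral features feeding a learned layer. A minimal sketch with arbitrary sizes; the FFT-magnitude preprocessing is just one example of the techniques listed.

```python
import torch
import torch.nn as nn

signals = torch.randn(32, 1024)                # batch of 1-D signals

spectra = torch.fft.rfft(signals).abs()        # FFT magnitude, shape (32, 513)
features = torch.log1p(spectra)                # compress dynamic range

classifier = nn.Linear(features.shape[1], 10)  # learned part on top
logits = classifier(features)
print(logits.shape)                            # torch.Size([32, 10])
```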

  • @rafa_br34
    @rafa_br34 2 months ago

    Incredibly helpful, keep up the good work!

  • @commanderlake7997
    @commanderlake7997 2 months ago

    I'm confused because you make it look like an attention layer could be used as a drop-in replacement for a linear layer but GPT-4o says: "No, an attention layer cannot be used as a direct drop-in replacement for a linear layer due to the fundamental differences in their functionalities and operations."?

    • @animatedai
      @animatedai 2 months ago

      That’s correct that an attention layer is not functionally equivalent to a linear layer. This efficiency comes with its own trade-offs. But it’s going to make more sense to talk about those trade-offs a couple more videos down the line in this series, so I didn’t go over them in this video.

    • @commanderlake7997
      @commanderlake7997 2 months ago

      @@animatedai Thanks for clearing that up. Also, I ran some quick tests comparing the performance of a PyTorch MultiheadAttention layer with a Linear layer, and the linear layer is significantly faster on CPU and GPU in every test I can run, so I hope that's something you could clarify in a future video. Looking forward to the next one!
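
A rough timing sketch along the lines the commenter describes (not a rigorous benchmark; sizes are arbitrary, and results will vary by hardware and shape):

```python
import time
import torch
import torch.nn as nn

seq_len, batch, embed_dim, num_heads = 128, 32, 512, 8
x = torch.randn(seq_len, batch, embed_dim)

attn = nn.MultiheadAttention(embed_dim, num_heads)
linear = nn.Linear(embed_dim, embed_dim)

def time_fn(fn, iters=50):
    # Warm-up once, then average wall-clock time over `iters` runs.
    fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

with torch.no_grad():
    t_attn = time_fn(lambda: attn(x, x, x))
    t_linear = time_fn(lambda: linear(x))

print(f"attention: {t_attn*1e3:.2f} ms, linear: {t_linear*1e3:.2f} ms")
```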

  • @hieuluc8888
    @hieuluc8888 2 months ago

    Sir, Thanks for doing god's work!!! I wonder why this channel has so few viewers; it deserves to be known by more people. Deep Learning is much simpler if learned from this guy. Honestly, I truly admire you for taking the time to research and visualize something so complex, making it easy for everyone to understand.

  • @rafa_br34
    @rafa_br34 2 months ago

    This is so unfairly underrated, I have never seen such a good video about CNNs.

  • @jameshopkins3541
    @jameshopkins3541 2 months ago

    Which is correct?????

  • @honourable8816
    @honourable8816 2 months ago

    The stride value was 2 pixels

  • @architech5940
    @architech5940 2 months ago

    You did not introduce convolution in any informative way, nor define any terms for your argument, and you didn't explain the purpose of 3D convolution or why 2D convolution is inaccurate in the first place. There is also no closing argument for what appears to be your proposition for the proper illustration of CNNs. This whole video is completely open-ended and thus ambiguous.

  • @AdmMusicc
    @AdmMusicc 3 months ago

    Loved the animation thank you!!

  • @martinhladis1941
    @martinhladis1941 3 months ago

    Excellent!!

  • @kage-sl8rz
    @kage-sl8rz 3 months ago

    Cool! Even better: adding names to the objects (kernel, etc.) would be helpful to new people.

  • @edsparr2798
    @edsparr2798 3 months ago

    I adore your content, genuinely can’t wait for more videos of your visualizations. Feels like I’m building real intuition about what I’m doing watching you :)

  • @trololollolololololl
    @trololollolololololl 3 months ago

    Keep it up, great videos!

  • @rubiczhang5593
    @rubiczhang5593 3 months ago

    That's a really good job; you have saved me from struggling with AI. Thank you from China!

  • @zukofire6424
    @zukofire6424 3 months ago

    Thanks! great explanation :)

  • @____-gy5mq
    @____-gy5mq 3 months ago

    best generalization ever, covers all the corner cases

  • @bengodw
    @bengodw 3 months ago

    Hi Animated AI, thanks for your great video. I have a question: 4:45 indicates that the colors of the filters (i.e. red, yellow, green, blue) represent the "Features". But a filter (e.g. the red one) is itself 3-dimensional (Height, Width, Feature), so it also includes a "Feature" dimension. Thus, "Feature" appears twice. Could you please advise why we need "Feature" twice?

  • @danielprovder
    @danielprovder 3 months ago

    udiprod for ai

  • @yousrakateb2383
    @yousrakateb2383 3 months ago

    Please continue making such amazing videos... they really helped me

  • @PurnenduPrabhat
    @PurnenduPrabhat 3 months ago

    Good job

  • @happyTonakai
    @happyTonakai 3 months ago

    This is so great!

  • @mdnaseif7599
    @mdnaseif7599 3 months ago

    You are a legend keep it up!

  • @leonardommarques
    @leonardommarques 3 months ago

    You got a new subscriber. You are the 3b1b of AI. Thanks for existing.

  • @user-el1hd3iz6m
    @user-el1hd3iz6m 3 months ago

    I feel like a lot of work was put into making the animations for this series, and that I should come away having learned something, but somehow I am more confused after watching the entire series than before I started. Not sure if this is due to the need to visualize something that cannot be represented in 3D space, a knowledge gap created by assumptions made during the explanation, or me simply being too stupid.

  • @macewindont9922
    @macewindont9922 3 months ago

    sick

  • @Karmush21
    @Karmush21 4 months ago

    Maybe someone can help me understand this: if I have just one 3D volume, would it ever make sense to do a 3D convolution in, say, PyTorch? Doing a 2D convolution will work on all the slices, right? So say I have a volume that's 300x300x100. Should I just move the slice dimension to the channel dimension and apply a 2D convolution? What would a 3D convolution even do here?
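
A sketch of the two options from the comment (spatial size reduced from 300x300 to keep it quick; layer widths are arbitrary): 2D convolution over slices-as-channels mixes all slices into each output but has no depth axis, while a 3D convolution slides its kernel along the slice dimension and shares weights across slices.

```python
import torch
import torch.nn as nn

volume = torch.randn(1, 1, 100, 64, 64)  # (batch, channels, slices, H, W)

# Option 1: slices become channels, then convolve in 2D.
as_channels = volume.squeeze(1)          # (1, 100, 64, 64)
conv2d = nn.Conv2d(in_channels=100, out_channels=16, kernel_size=3, padding=1)
print(conv2d(as_channels).shape)         # torch.Size([1, 16, 64, 64])

# Option 2: a true 3D convolution keeps the slice axis and shares
# weights along it, which suits data that is genuinely volumetric.
conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
print(conv3d(volume).shape)              # torch.Size([1, 16, 100, 64, 64])
```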

  • @wilfredomartel7781
    @wilfredomartel7781 4 months ago

    😊

  • @__-de6he
    @__-de6he 4 months ago

    It would be good to know the rationale behind this way of calculating, besides computational efficiency.

  • @coryfan5872
    @coryfan5872 4 months ago

    Saying that Multihead Attention has fewer parameters than a token-wise linear is true for NLP models but not for ViT. Additionally, simply creating a mechanism that incorporates the entirety of the features does not explain away the success of attention mechanisms -- looking again at computer vision tasks, MLP-Mixer also incorporates the entirety of the features in its computations, but is still less successful than the attention-based ViTs. Part of the strength of the attention layer is its adaptability, whose value you can see in things like GAT. Otherwise, it could just be replaced with a generic low-rank linear layer.
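
A back-of-the-envelope count behind the first sentence (illustrative ViT-Base-like sizes; the dense layer would mix the whole flattened sequence, so it is only counted, not instantiated):

```python
import torch.nn as nn

seq_len, embed_dim, num_heads = 196, 768, 12

attn = nn.MultiheadAttention(embed_dim, num_heads)
attn_params = sum(p.numel() for p in attn.parameters())

# A dense linear layer over the flattened (seq_len * embed_dim) vector
# would be far too large to create, so just compute its parameter count.
d = seq_len * embed_dim
dense_params = d * d + d  # weight + bias

print(f"attention:    {attn_params:,}")   # ~2.4 million
print(f"dense linear: {dense_params:,}")  # ~22.7 billion
```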

  • @jaredtweed7826
    @jaredtweed7826 4 months ago

    I have been waiting for this video! Very much worth the wait!