The fastest matrix multiplication algorithm

  • Published: 14 May 2024
  • Keep exploring at ► brilliant.org/TreforBazett. Get started for free, and hurry: the first 200 people get 20% off an annual premium subscription.
    0:00 Multiplying Matrices the standard way
    2:05 The Strassen Method for 2x2 Matrices
    3:52 Large matrices via induction
    7:25 The history and the future
    10:19 brilliant.org/TreforBazett
    In this video we explore how to multiply very large matrices as computationally efficiently as possible. The standard algorithm from linear algebra uses n^3 multiplications to multiply two nxn matrices. But can we do better? The Strassen algorithm improves this to about n^2.8, first in the 2x2 case, and then induction shows it works in general. Over the last 50 years this exponent has seen a series of further improvements, inching closer to, but still far from, the theoretical limit of n^2.
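    The 2x2 Strassen step the video walks through can be sketched in a few lines. This is only an illustration in plain Python over scalars (the function name is chosen here, not from the video); the recursive algorithm applies the same seven formulas with blocks in place of scalars.

    ```python
    def strassen_2x2(A, B):
        """Multiply two 2x2 matrices using Strassen's 7 products."""
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        p1 = a11 * (b12 - b22)
        p2 = (a11 + a12) * b22
        p3 = (a21 + a22) * b11
        p4 = a22 * (b21 - b11)
        p5 = (a11 + a22) * (b11 + b22)
        p6 = (a12 - a22) * (b21 + b22)
        p7 = (a11 - a21) * (b11 + b12)
        # Reassemble the four entries of the product from the 7 products
        return [[p5 + p4 - p2 + p6, p1 + p2],
                [p3 + p4, p5 + p1 - p3 - p7]]
    ```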
    Further Reading:
    Going into the details of the laser method (what happened after the Strassen algorithm I showed): theoryofcomputing.org/article...
    The AlphaTensor AI paper: www.nature.com/articles/s4158...
    A nice summary from Quanta: www.quantamagazine.org/mathem...
    Check out my MATH MERCH line in collaboration with Beautiful Equations
    ►beautifulequations.net/pages/...
    COURSE PLAYLISTS:
    ►DISCRETE MATH: • Discrete Math (Full Co...
    ►LINEAR ALGEBRA: • Linear Algebra (Full C...
    ►CALCULUS I: • Calculus I (Limits, De...
    ► CALCULUS II: • Calculus II (Integrati...
    ►MULTIVARIABLE CALCULUS (Calc III): • Calculus III: Multivar...
    ►VECTOR CALCULUS (Calc IV) • Calculus IV: Vector Ca...
    ►DIFFERENTIAL EQUATIONS: • Ordinary Differential ...
    ►LAPLACE TRANSFORM: • Laplace Transforms and...
    ►GAME THEORY: • Game Theory
    OTHER PLAYLISTS:
    ► Learning Math Series
    • 5 Tips To Make Math Pr...
    ►Cool Math Series:
    • Cool Math Series
    BECOME A MEMBER:
    ►Join: / @drtrefor
    MATH BOOKS I LOVE (affilliate link):
    ► www.amazon.com/shop/treforbazett
    SOCIALS:
    ►Twitter (math based): / treforbazett
    ►Instagram (photography based): / treforphotography

Comments • 225

  • @planaritytheory · 1 year ago · +612

    It's important to note that, although additions _are_ considered cheaper than multiplications, what we _aren't_ doing is making a tradeoff between the two operations (accepting many more additions to save a few multiplications). You didn't explicitly say that's what's going on, but someone could get the impression from this video that that's the deal---I myself used to think that. Instead, the number of additions, multiplications, and total number of operations are all lower for Strassen's algorithm, and it's because the number of "multiplications" at each step is counting the number of recursive calls that are made.
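    This claim can be checked with a toy operation counter; a sketch under the usual assumptions (n = 2^k, 7 recursive products and 18 block additions per Strassen step; function names are invented here):

    ```python
    def naive_ops(k):
        """Scalar multiplies and adds of the standard algorithm on n = 2**k."""
        n = 2 ** k
        return n ** 3, n ** 2 * (n - 1)

    def strassen_ops(k):
        """Scalar multiplies and adds of recursive Strassen on n = 2**k."""
        if k == 0:
            return 1, 0
        m, a = strassen_ops(k - 1)
        # 7 recursive block products, plus 18 additions of blocks
        # with (2**(k-1))**2 entries each
        return 7 * m, 7 * a + 18 * 4 ** (k - 1)
    ```

    With these constants the multiplication count is lower at every k ≥ 1, while the addition count only drops below the naive one around k = 14; asymptotically both counts, and hence the total, are lower, which is the commenter's point.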

    • @DrTrefor · 1 year ago · +158

      Thanks for the clarification. Indeed, I left the induction proof for additions as an "exercise" so this got glossed over.

    • @samueljehanno · 1 year ago · +1

      69th

    • @gregorymorse8423 · 1 year ago · +4

      Considering that addition is O(n) and multiplication is O(n log n) in the best case for n-digit numbers in a two's-complement number system, treating multiplication and addition as like operations is completely incorrect, especially as the values in the matrix grow larger. So it's not just the recursion that is relevant.

    • @destiny_02 · 1 year ago · +5

      If I'm not wrong, multiplication is cheaper than addition for floating-point numbers on a modern CPU.

    • @gregorymorse8423 · 1 year ago · +5

      @Jotaro Kujo just because floating-point multiplication is simpler to implement doesn't make it cheaper. It's not faster even though addition requires resources to shift the smaller argument and the result; there is still an integer multiply or add in the middle. My point was only about unbounded big integers anyway. Treating addition and multiplication as like ops does make sense when using a finite bit width like 32 or 64, be they integers or floats, even if their clock cycles are slightly different. The story just gets more complicated given that a fused multiply-add (FMA) is used in modern hardware to improve accuracy and speed, and it applies to complex-number and matrix multiplication very specifically in fact. But unbounded data types must treat addition and multiplication separately.

  • @Mattes_H · 1 year ago · +282

    I think it should be noted that all of these faster algorithms are so-called galactic algorithms, meaning the constant hidden by the phrase "order of" is so big that they become useless for any practical application.
    In practice the challenge is more about efficient and parallel implementation. In fact GEMM, the subroutine most often used for matrix-matrix multiplication, usually implements either the naive or a Strassen-based algorithm but uses all the hardware tricks available on modern processors to get its high performance.

    • @maxrush206 · 1 year ago · +1

      nice icon lol

    • @vaff69420 · 1 year ago

      I don't exactly understand your comment. Why aren't library developers implementing these algorithms if they are simply faster than the naive or other slower algorithms?

    • @pablote325 · 1 year ago · +29

      @@vaff69420 Because of the constants. Take for example the last algorithm, which does O(n^2.37285) multiplications. What this means is that the number of multiplications is roughly:
      Mult = A * n^2.37285 + B, for some constants A and B, usually unknown.
      That's what big-O notation means.
      So like the previous comment said, these galactic algorithms have HUGE constants, so it's not even practical to use them to multiply a 64x64 matrix (I'm just putting numbers here, but that's usually what happens; the real benefit of these algorithms only shows up when n is literally an astronomical number xd)

    • @porschepanamera92 · 1 year ago · +2

      @@pablote325 ... and in that case it might be much faster but still extremely slow using current hardware?

    • @luckyw4ss4bi · 1 year ago · +30

      @@porschepanamera92 I think the idea he's hinting at in general here is that the size of matrices you'd have to be multiplying to gain a performance boost from the current fastest algorithm is too large for modern computers to handle, anyway.
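    The hidden-constant point in this thread can be made concrete with a back-of-the-envelope crossover estimate. The constant below is purely hypothetical (the real constants for the fast algorithms are not pinned down in the sources here), and the helper name is invented:

    ```python
    import math

    def crossover_n(hidden_constant, fast_exp=2.3729, naive_exp=3.0):
        """Smallest n where hidden_constant * n**fast_exp < n**naive_exp,
        i.e. where the 'fast' algorithm would start to win on paper."""
        return math.ceil(hidden_constant ** (1.0 / (naive_exp - fast_exp)))

    # Even a modest-looking hypothetical constant of a million pushes the
    # break-even matrix size into the billions of rows.
    n_star = crossover_n(1e6)
    ```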

  • @michaelsaba6769 · 1 year ago · +37

    Just read about this algorithm in CLRS. It's a perfect teaching tool to show why constant factors might not matter in asymptotic notation, but how they often do matter when working with recurrences. Like in this case, where 7 multiplications are preferred over 8, even if at each level we're only multiplying the runtime by a constant factor: Strassen's yields a Theta(n^log2(7)) time complexity, while the standard divide-and-conquer approach yields Theta(n^3) time.

  • @magicflour · 1 year ago · +38

    I had a class where a whole third of it was dedicated to these linear algebra algorithms. The most I learned really was "optimize the hardware and optimize with hardware" to get good results lol. This was a computer engineering class so it made sense.

    • @DrTrefor · 1 year ago · +13

      Ha, yes that has been my sense too even as mathematicians have fun with asymptotic limits:D

    • @marcotroster8247 · 9 months ago

      Finally someone said it so that I don't need to comment it myself 😂

  • @cbunix23 · 1 year ago · +10

    I wrote a cache friendly matrix multiplication algorithm that improves performance by 40% when matrices are very large; basically, the multiplications and additions are the same, but the order is done in a way that respects cache memory.
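    The loop-interchange idea behind cache-aware multiplication like this can be sketched as follows; this only illustrates the access order (in Python the cache effect itself is largely invisible; the reported gains need a compiled language, and often blocking/tiling on top):

    ```python
    def matmul_ikj(A, B):
        """Triple loop in i-k-j order: both B and C are scanned row by row,
        walking memory contiguously instead of striding down columns."""
        n, m, p = len(A), len(B), len(B[0])
        C = [[0] * p for _ in range(n)]
        for i in range(n):
            for k in range(m):
                a_ik = A[i][k]
                row_b, row_c = B[k], C[i]
                for j in range(p):
                    row_c[j] += a_ik * row_b[j]
        return C
    ```

    The multiplications and additions are exactly the same as in the textbook i-j-k loop; only their order changes, which is the commenter's point.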

  • @frfr1022 · 1 year ago · +68

    It reminds me of an algorithm for natural-number multiplication which is theoretically a lot faster than any other algorithm we know of, but only becomes worth using at ridiculously huge numbers. I can't remember what it's called, but I think it was mentioned in 3b1b's latest video on convolutions.

    • @DrTrefor · 1 year ago · +24

      If I recall correctly, some of the ideas of large number multiplications are related to ideas from this very Strassen algorithm.

    • @adamel-sawaf4045 · 1 year ago · +7

      @@DrTrefor Strassen also worked on a multiplication algorithm for numbers (the Schönhage-Strassen algorithm), so that would make sense!

    • @oyaoya2468 · 1 year ago · +3

      The Karatsuba algorithm, is that what you mean? Because it utilizes the same technique as the Strassen algorithm.
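      For reference, Karatsuba splits each factor in two and spends 3 recursive multiplications where the schoolbook method spends 4, which is exactly the Strassen-style trade. A minimal sketch (non-negative integers only):

      ```python
      def karatsuba(x, y):
          """Multiply non-negative integers using 3 recursive products per split."""
          if x < 10 or y < 10:
              return x * y
          m = max(x.bit_length(), y.bit_length()) // 2
          xh, xl = x >> m, x & ((1 << m) - 1)   # high/low halves of x
          yh, yl = y >> m, y & ((1 << m) - 1)   # high/low halves of y
          z0 = karatsuba(xl, yl)                            # low product
          z2 = karatsuba(xh, yh)                            # high product
          z1 = karatsuba(xl + xh, yl + yh) - z0 - z2        # cross terms
          return (z2 << (2 * m)) + (z1 << m) + z0
      ```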

  • @eti-iniER · 1 year ago · +24

    As a first year CS student, I'm completely blown away by the fact that humans know and understand concepts like this. Amazing video, btw
    Cue the mild depression

  • @discreet_boson · 1 year ago · +42

    Please make more computational math videos like this!
    Programming documentation never explains the math like you do

  • @alphalunamare · 1 year ago · +13

    Fascinating! It brings back old memories. In 1977 I was mulling over the problem and managed, over a few days, to create an algorithm for 3x3 multiplication requiring only 24 multiplications and 66 additions. It reduced to Strassen's method in the 2x2 case but was no improvement, as that would have required 21 multiplications. I did this longhand over yards of line-printer paper. Unfortunately a Canadian published an algorithm with 23 multiplications and 96 additions. It pissed me off no end, especially as it had no information regarding its derivation. (If I could trade 1 multiplication for 29 additions I could improve, but I left it on the shelf.) Sulking, I guess. Over the years I wondered if it was just a novelty; then I realised that in an object-oriented sense the savings are manifestly more than for simple arithmetic multiplications. An intriguing thought is whether the intermediate Pn multiplications of 'objects' have a real meaning in the physics of the model under analysis that could perhaps shed some light? Perhaps in the geometric-algebra treatment of spacetime?

  • @kruksog · 1 year ago · +34

    I remember learning the Karatsuba algorithm for multiplication. It is also computationally cheaper than the standard way of multiplying numbers. It had a very similar feel to this.
    Cool stuff. Thanks!

    • @adamel-sawaf4045 · 1 year ago · +7

      This also reminded me of Karatsuba! I learned it in my algorithms class. I implemented it in a C++ program for extra credit.

  • @wcdeich4 · 1 year ago · +7

    Interesting! I previously knew some algorithms could do matrix multiplication faster for large sparse matrices, but those algorithms are actually slower for large dense matrices.

  • @L4ZY2137 · 1 year ago · +5

    There is a thing called a "hierarchical matrix" that lets you convert a matrix into a more compact form in which many operations, like inversion or multiplication, can be computed in almost O(n) time. If I remember correctly, treating the constant k that controls the approximation as fixed, the complexity is about O(n * (log n)^2).

  • @mohammadhomsee8640 · 1 year ago · +3

    That's a great video; I really love hearing your lessons. Good job, professor!

  • @adamel-sawaf4045 · 1 year ago · +10

    Great vid! Just as informative as it needed to be, while sparking inspiration to learn more. One thing I think you should’ve mentioned was how much the space complexity grows for the trade-off in speed.

  • @airsquid8532 · 1 year ago · +17

    The amount of power I wield is unrivaled with this knowledge, thank you for yet another interesting video full of passion!!

  • @ddognine · 1 year ago · +3

    Interesting video of an area of active research of which I was unaware. I would be interested in a similar video covering the state of the art in matrix inversion which I believe is much more problematic for large matrices than multiplication and is typically the first operation before any multiplication.

  • @rezamiau · 1 year ago · +3

    Absolutely brilliant!
    One of the major problems that engineers like me deal with is time-consuming eigenvalue calculations, because of the need to increase precision due to ill-conditioning of a matrix. Please let us know if there is a good solution to it.
    Thank you so much.

  • @prashantsingh6370 · 1 year ago · +17

    As someone who absolutely loved solving matrices & determinants problems in higher secondary mathematics, way back in 2011-12, a huge thumbs up for the excellent video. 👍

    • @DrTrefor · 1 year ago · +3

      Thanks for sharing!

    • @freshrockpapa-e7799 · 5 months ago

      enjoying computations doesn't make you special in any way, there was no reason to mention that

  • @cmilkau · 1 year ago · +6

    One important question with this algorithm: how many additions does the recursive algorithm use? Because in the end, we want a faster algorithm, not just a reduction in elementary multiplications. It's also a bit counterintuitive that this strategy *reduces* the total number of additions asymptotically.

  • @guigazalu · 1 year ago · +2

    While watching, I asked myself whether you would cite the AlphaTensor article and results, as I had read it in recent months.
    It turned out you did! It is a very cool article, even though it's "heavy reading" for me.

  • @gooball2005 · 1 year ago · +2

    I do enjoy a good video on complexity theory! Also, your thumbnail game is on point!

  • @quarkonium3795 · 1 year ago · +5

    One thing to keep in mind is that most of these improvements from the Strassen algorithm are theoretical. Most only improve on the Strassen algorithm for extremely large matrices-so large that these algorithms will probably never be practical

  • @christossymA3A2 · 1 year ago · +4

    Man, it feels almost illegal for such great knowledge to be available for free.

  • @noot3316 · 4 months ago · +1

    I just finished my first term of Engineering at UVic (you were an amazing calc prof by the way!) and I can FINALLY understand how cool this video is!

    • @DrTrefor · 4 months ago · +1

      Haha that’s awesome, thank you!

  • @lebdesmath2510 · 1 year ago · +5

    nice vid, keep up the good work

  • @taotaotan5671 · 1 year ago · +2

    Very very interesting topic! Thanks for sharing this, Trefor!
    Wondering if the algorithm applies to a more general case of matrix multiplication, e.g. (m x n) * (n x p)
    Another thing I found interesting is, after some research, matrix inversion and matrix multiplication have the same time complexity. The optimization method you mentioned in this video also applies to matrix inversion. That contradicts my impression that matrix inversion causes more headaches than multiplication.

    • @DrTrefor · 1 year ago · +4

      For non-square matrices you can just imagine they are square with extra zeros so the same upper bound applies here.

  • @kristas-not · 1 year ago · +1

    ok, you officially kick ass in my book.

  • @garrettkajmowicz · 1 year ago

    One area I found missing from your video was in showing that using block matrices got us the correct result. That is, that treating a 4x4 matrix as a 2x2 matrix of 2x2 matrices would, when multiplied, get us the same result as we would otherwise expect.

  • @dvir-ross · 1 year ago · +5

    According to Wikipedia, there is already an improvement from 2022.

    • @DrTrefor · 1 year ago · +5

      Oh nice! Good to be obsolete already!

  • @gzhang207 · 1 year ago

    One specialization of matrix multiplication is the matrix multiplication with a vector (or filter) that is widely used in convolution based ML. Winograd convolution seems to follow the same optimization idea.

  • @yutubl · 1 year ago

    Thanks for this very interesting information.

  • @streampunksheep · 1 year ago · +1

    ok this is cool. it makes me want to study matrices again

  • @MrHaggyy · 1 year ago · +2

    A huge deal in practical matrix multiplication is taking the naive approach: look for redundant operations, look for dependencies in the computation, group up independent operations, and parallelize them.
    A 3x3 matrix multiply on a GPU would schedule every single one of those 27 multiplies on one 32-core chunk. Then you would copy the data into another 32-core chunk doing the addition. You could also do the additions of three matrix multiplies in one execution of 32 additions.
    That's why Strassen is such a big deal: its 7 multiplies and 18 additions fit beautifully into 32-core chunks. And once you have set up your execution pipeline you can feed it any matrix. If a dimension is odd, the CPU will make it even by adding zeros before passing it to the GPU.

  • @dr.rahulgupta7573 · 1 year ago · +1

    Excellent presentation 👌

  • @changjeffreysinto3872 · 6 months ago

    Solution to 7:17:
    Let f(n) be the number of scalar additions in the product of 2^n x 2^n matrices. We make 7 recursive multiplications of 2^(n-1) x 2^(n-1) blocks, costing 7f(n-1) additions, and we form sums/differences of 2^(n-1) x 2^(n-1) blocks 18 times, costing 18*(2^(n-1))^2 = (9/2)*4^n scalar additions. So the recurrence is f(n) = 7f(n-1) + (9/2)*4^n, a first-order linear difference equation. Substituting f(n) = 7^n * g_n gives g_n = g_(n-1) + (9/2)*(4/7)^n, so g_n = (9/2) * sum_{k=1..n} (4/7)^k, which is less than (9/2) * sum_{k=1..inf} (4/7)^k = 6. Finally f(n) = 7^n * g_n < 6 * 7^n (in fact exactly f(n) = 6*(7^n - 4^n)).

  • @U.Inferno · 1 year ago · +2

    "This doesn't sound like a big improvement."
    Me, a computer scientist: "No. This does."

    • @GreatJO · 1 year ago

      Is SIMD faster?

  • @johnniefujita · 1 year ago · +3

    Single operations are not really what is expensive or what drives algorithmic complexity, the "O"s; complexity relates directly to nested computations. Usually we want an algorithm to be linear or linearithmic. Polynomial can be tractable, but never exponential or factorial; those cannot scale. Also, algorithms vary in worst-, average-, and best-case behavior across implementations, and sometimes, as with sorting algorithms, the best choice depends on how well organized the data already is.

  • @user-yk7kq3rf7z · 10 months ago · +2

    Congratulations on 100π subscribers! :)

    • @DrTrefor · 10 months ago · +2

      Oh cool, thank you!

  • @gofaonepaul · 1 year ago · +1

    This is so cool. ❤

  • @jebarijihed · 11 months ago

    This reminds me a lot of the FFT and DFT algorithms!

  • @Moinsdeuxcat · 1 year ago · +1

    8:40 THIS HAS TO BECOME A SUMMONING SALT VIDEO

  • @flexeos · 1 year ago · +4

    I am not sure that pure optimization of operation counts is very relevant. The limiting factor is often moving the data in and out of memory; what you have to optimize is how many operations you can do for each load.

    • @Miguel_Noether · 1 year ago · +1

      Given the same amount of memory and the same allocation speed, doing faster operations is going to be the most efficient way to optimize everything else (that's all the hype of quantum computing).

  • @Oscar1618033 · 1 year ago · +1

    I have the feeling that such algorithms might involve some kind of matrix log and fourier transforms. Could it be?

  • @DiabloQFDB · 1 year ago

    I need to research and see if these methods can be implemented with AVX.

  • @pyropulseIXXI · 1 year ago · +1

    I know the *proof by induction* works absolutely, but the step of _finding_ the formula in question is often skipped. I know it is not strictly necessary for the formal proof, but it is still very nice to get this info via intuitive derivations.

  • @D_VAULTZ · 1 year ago · +2

    Does this only apply to large ints? I'm wondering if there is an improvement available for much smaller values, such as 8-bit integers.

    • @jonatanlind5408 · 1 year ago · +2

      These algorithms only tend to make sense for large inputs. They were developed to speed up computation; your 8-bit example can already be calculated incredibly quickly, and even if you have a large number of calculations it is possible to precompute every single 8-bit multiply and simply look up the answer when you need it. I am unaware of any algorithm that can beat such a method.
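      The lookup idea in the reply above is easy to sketch: for 8-bit operands the full product table has only 256x256 entries. A toy version (the table and function names are made up here):

      ```python
      # Precompute every possible 8-bit x 8-bit product once.
      MUL8_TABLE = [[a * b for b in range(256)] for a in range(256)]

      def mul8(a, b):
          """'Multiply' two 8-bit integers by a single table lookup."""
          return MUL8_TABLE[a][b]
      ```

      On real hardware a table like this rarely beats the multiplier, since a cache hit and a multiply cost similar cycles; the trick matters mainly when "multiply" stands for some costlier custom operation.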

  • @Dawg1561 · 1 year ago · +1

    Didn't know Dr. Strange left sorcery and started teaching mathematics. Btw, this was a great explanation.

  • @jimwinchester339 · 1 year ago

    Interesting! I was class of 1980, so I learned only of the Strassen algorithm.

  • @swagatochatterjee7104 · 1 year ago

    Hey, the AlphaTensor record at 9:50 has been beaten by two researchers at the University of Linz, who showed that you can multiply 5x5 matrices in 95 multiplications instead of the 96 the paper showed.

  • @realdragon · 1 year ago

    Speaking from a hardware perspective: if possible, avoid jumping between columns. Even a 2D array is just one very long row, so jumping between columns means jumping across huge parts of the array, while accessing the next number in the same row is very fast.

  • @Yupppi · 6 months ago

    Since this is specifically computation-related, it might be worth talking about not only the algorithm for calculating sub-matrices but also cache memory (its layout and use): how transposing the matrix gives better performance because the elements are laid out in memory more efficiently and the relevant ones stay in cache over the time you need them, avoiding cache misses. Furthermore, copying the matrices (to a local buffer) interestingly enough helps as well. And lastly, splitting the work across multiple threads. That leads to performance taking such a leap that you can go from, say, 2000 ms to 20 ms execution time. I feel like n^3 vs n^2.4 isn't quite as concrete a demonstration compared to that. But of course Intel has a Basic Linear Algebra Subprograms (BLAS) implementation that does it in roughly 2 ms. These are, to my understanding, matrices of size around 1000x1000.

  • @notgabby604 · 1 year ago

    The Walsh-Hadamard transform is kind of the fastest dense matrix multiply you can do.
    It's a fixed transform, but you can somewhat unfix it with cheap operations like prior sign-flipping of the input or prior permutations, etc.
    The number of multiplies required is then zero.

  • @aliuncu2815 · 1 year ago · +1

    You forgot to mention that not a week after the AlphaTensor group's paper, Kauers and Moosbauer improved their results on 5x5 matrices.

  • @ericphantri96734 · 1 year ago

    Divide into 2x2 or 3x3 blocks, then use the grid-point inverter and addition, performed electronically across the board: reduce each small matrix to its determinant, move those together into a smaller matrix of determinants, and repeat until each is a single number, then multiply out; find the angle between the two numbers by trigonometric values to get the final vector. So if n = 50 numbers per matrix, use the 2x2 case: even so, the total is about 50x49, an estimated 2450 calculations; done electronically at 20 MHz that takes 0.00000085 second, and on a slower CPU like 1 MHz it takes 0.005 second for 50x50.

  • @wilurbean · 1 year ago · +1

    I feel like I've made it in my physics education when that slightly extended decimal exponential got me excited

    • @DrTrefor · 1 year ago · +2

      lol

    • @wilurbean · 1 year ago

      @@DrTrefor Fr though.
      I went back to school at 30. Started at a tiny community college in rural northern Michigan a few years ago and got accepted to a top-tier engineering physics program (#1 in the US in my particular sub-field).
      My current physics classes are Modern Physics (and lab), where we're doing a lot of modeling/data analysis with big matrices, and E&M based around Griffiths' Electrodynamics.
      Wouldn't have passed vector calc or linear algebra without your videos. It's all coming full circle now with Modern and linear algebra, and E&M with the vector calc.
      Ty!

  • @rumasc9854 · 1 year ago

    Dude, I was waiting for the cool dance the thumbnail portrayed to happen.

  • @codethinki · 1 year ago

    Title: fastest matrix multiplication algorithm. Video explains: the first improvement to matrix multiplication 😆. Oh, and on top of the clickbait, a fat sponsorship. I love quality content.

  • @jcsjcs2 · 1 year ago

    The reason you don't care about additions is not that additions are "much faster" than multiplications; an FPU will usually take the same amount of time for either, and will probably even feature combined multiply-accumulate instructions.
    The reason is that the number of each kind of operation is of order n^3, so adding both together is still of order n^3.

  • @deeveshjain4246 · 1 month ago

    is this the same thing as batch matrix multiplication?

  • @pierreabbat6157 · 1 year ago

    What happens to the accuracy of computation with these matrix multiplication algorithms? I use the standard n³ algorithm, with pairwise addition for accuracy.

    • @calvindang7291 · 1 year ago · +2

      It's assumed that you want an exact result (and so won't get hit by rounding errors) when doing this - if you only need approximate answers and so are using an approximate data type, that's something else entirely.

  • @dickybannister5192 · 1 year ago

    a nice work-through. for some heavy lifting, there is a nice new video from IAS called "On Matrix Multiplication and Polynomial Identity Testing" on here. I didn't get all of it, but I'm fairly sure there are some nice concepts in there which could be explained: the uses for MM are various after all. but PIT always fascinates me as a subject.

  • @lunalildragon241 · 1 year ago

    Well, the calculations themselves don't really take long; the biggest problem is memory read time when you read and write more data than the CPU can store.
    Meaning that for such huge operations you have to split the data into chunks that fit in that area without having to re-read.

  • @lucavogels · 1 year ago

    Can you also explain the Alman-Williams algorithm, please! 🤓

  • @jaimeduncan6167 · 1 year ago · +1

    If one is working with not-so-big numbers (in terms of the number of relevant digits), like 64-bit floating-point numbers, modern computers will produce multiplications in the same time as additions for this kind of problem (fully pipelined); in fact many will produce A*x + y at 1 per cycle per pipeline. That means additions can't be ignored.

    • @ultimatedude5686 · 1 year ago · +1

      I think the reason reducing multiplications is so important is that multiplying matrices is way more computationally expensive than adding them. That means when you’re working with block matrices, you want the minimum number of multiplications of the components; extra additions are fine and won’t really affect the asymptotic complexity.

  • @rob876 · 6 months ago

    Here's a better version of Strassen's algorithm due to Winograd for A.B = C:
    a = A21 + A22
    b = a - A11
    c = B12 - B11
    d = B22 - c
    s = A11.B11
    t = a.c
    u = s + b.d
    v = u + (A11 - A21).(B22 - B12)
    C11 = s + A12.B21
    C12 = u + t + (A12 - b).B22
    C21 = v - A22.(d - B21)
    C22 = v + t
    This has 7 multiplications and 15 additions/subtractions.
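    The listing above can be checked mechanically; here is a small sketch (one block level, scalars standing in for blocks; the function name is ad hoc) that runs the scheme and can be compared against the definition of the product:

    ```python
    def winograd_2x2(A, B):
        """Winograd's variant of Strassen: 7 multiplications, 15 additions."""
        (a11, a12), (a21, a22) = A
        (b11, b12), (b21, b22) = B
        a = a21 + a22
        b = a - a11
        c = b12 - b11
        d = b22 - c
        s = a11 * b11
        t = a * c
        u = s + b * d
        v = u + (a11 - a21) * (b22 - b12)
        return [[s + a12 * b21, u + t + (a12 - b) * b22],
                [v - a22 * (d - b21), v + t]]
    ```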

  • @soonts · 1 year ago

    Modern computers don't necessarily do additions much faster than multiplications. A computer I use to watch your video has AMD Zen 3 CPU. For 32-bit floating point numbers, on my computer the performance is absolutely equal. Specifically, both vaddps (adds 8 numbers) and vmulps (multiplies 8 numbers) AVX instructions have 3 cycles latency, and 0.5 cycles throughput.

  • @frankyin8509 · 1 year ago

    Love his T-shirt, 👕 topology meme🎉

  • @ivanxdxd · 1 year ago · +2

    Which algorithm is implemented in MATLAB?

    • @DrTrefor · 1 year ago · +3

      The normal one by default I would presume. It takes truly gigantic matrices for this kind of asymptotic difference to make a measurable advantage.

  • @Petch85 · 1 year ago

    What if you have a matrix x vector, are there similar tricks?

  • @David-gu8hv · 1 year ago

    No wonder donuts and coffee go so well together...

  • @jimjacobs2817 · 1 year ago

    Is there a quicker way to calculate a 4x4 matrix product? That is, is there a similar approach to the speedup for 2x2 matrices?

  • @dougpark1025 · 1 year ago

    An interesting set of algorithms. However, as it turns out getting the numbers from memory into the CPU is a bottleneck that is easy to forget. If you can arrange to get data to load into the cache of the processor in contiguous blocks there is a great deal of performance gain to be had over just pulling the numbers from memory without any consideration. This was a technique used decades ago to do matrix multiplies efficiently when computer memory wasn't large enough to hold all of the matrices. Blocks would be loaded from disk, multiplied and then blocks written back out to disk. Same idea, but memory is a lot faster than writing to disk. Today we usually don't run into anything that big, but the concept still works.
    Next step, enter the multiple core processor. You can take the same idea and subdivide the work of multiplication into parts that run independently on separate cores. In the end you can approach an N times speed up where N is number of cores. (Subject to Amdahl's law of course). The nice thing about the matrix multiply is it falls into the category of being embarrassingly parallel. Which is to say, it doesn't require much in the way synchronization constructs.
    I have not tried any of these faster algorithms in a parallel implementation. It might be interesting to benchmark that sort of implementation.

  • @cantkeepitin · 1 year ago

    What about rounding errors?

  • @ankurraj6560 · 1 year ago · +3

    Can we somehow use the concept of a Matrix as a linear transformation of a space to simplify our Matrix multiplication problems!? 🤔🤔

    • @DrTrefor · 1 year ago · +4

      Well, composition of linear transformations is just matrix multiplication, so they are definitely related.

  • @jessstuart7495 · 1 year ago · +1

    Do modern CPUs detect when you are trying to multiply by zero and return the result (zero) faster than when multiplying two non-zero numbers?

    • @DrTrefor · 1 year ago

      Yes I believe so, but likely depends on exact algorithm one uses.

    • @jonatanlind5408
      @jonatanlind5408 1 year ago

      Yes, speed of computation is dependent on the data itself. Floating-point numbers actually have 5 distinct data representations they can assume, each of which must be handled separately: zero, denormal, normal, infinity, and NaN. I believe computations on denormal numbers are usually slower than on normal numbers or zero, and significantly so.
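      The five representations mentioned here can be inspected from Python; a minimal sketch for IEEE-754 doubles, where the denormal/normal boundary is `sys.float_info.min`:

```python
import math
import sys

def fp_class(x):
    """Classify an IEEE-754 double into the five categories."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "infinity"
    if x == 0.0:
        return "zero"
    # Subnormal (denormal) doubles are nonzero but smaller in magnitude
    # than the smallest normal double, sys.float_info.min (~2.2e-308).
    if abs(x) < sys.float_info.min:
        return "denormal"
    return "normal"
```

      (Whether denormal inputs actually run slower depends on the CPU and on flush-to-zero settings; this only shows how the categories are distinguished.)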

  • @jimnewton4534
    @jimnewton4534 1 year ago

    I'm surprised that if the number of rows is not a power of 2, you can simply add rows and columns of zeros. I had always thought I needed to adjoin rows and columns of zeros but with 1's on the main diagonal.
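    Zero padding really is enough for multiplication: the extra zero rows and columns only ever contribute zero terms, and the true product sits in the top-left block (the 1's-on-the-diagonal trick matters only if you want the padded matrix to stay invertible). A quick sketch:

```python
def pad_to_size(M, m):
    """Embed an n x n matrix M into an m x m matrix, filling with zeros."""
    n = len(M)
    return [[M[i][j] if i < n and j < n else 0 for j in range(m)]
            for i in range(m)]

def matmul(A, B):
    """Plain n x n matrix product, used here just to check the claim."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

    Multiplying the padded matrices reproduces the original product in the top-left block, and the padding rows and columns of the result stay zero.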

  • @saitougin7210
    @saitougin7210 1 year ago +1

    7:05 "using some log rules": starting from the change-of-base formula (log_b c) / (log_b a) = (ln c) / (ln a):
    (log_b c) * (ln a) = (log_b a) * (ln c)
    ln[a^(log_b c)] = ln[c^(log_b a)]
    a^(log_b c) = c^(log_b a), q.e.d.
    You're welcome. Hm, maybe someone should add this formula to Wikipedia or something. It might come in handy, if one just knew it.
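    The resulting identity a^(log_b c) = c^(log_b a) is easy to sanity-check numerically, e.g. for the video's exponent, where 2^(log_2 7) = 7^(log_2 2) = 7:

```python
import math

def check_identity(a, b, c):
    """Check a**log_b(c) == c**log_b(a) up to floating-point error."""
    return math.isclose(a ** math.log(c, b), c ** math.log(a, b))
```

    A few spot checks with arbitrary positive bases confirm it holds in general.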

  • @jawadshinwari7365
    @jawadshinwari7365 1 year ago

    Hi sir! Please, could this algorithm be explained with an example? 🙏🙏🙏

  • @abdulshabazz8597
    @abdulshabazz8597 1 month ago

    I would think the computational lower bound is less than O(N^2) if any digit repeats in more than one quadrant...

  • @joshualickey3426
    @joshualickey3426 1 year ago

    I think I found a mistake at 2:44. In the bottom lower right it might be p1 - p2 + p3 + p6.

  • @telotawa
    @telotawa 1 year ago

    9:09 but you can do integer multiplication in n log n tho

  • @iwersonsch5131
    @iwersonsch5131 1 year ago +3

    At 5:32, if I look at matrices of size n=2k+1 for k approaching infinity, the complexity with this naive extension approaches (2n-1)^log2(7) = (n-0.5)^log2(49) > (n-1)^5, doesn't it? Of course there are ways to reconcile this but this argument appears to be insufficient for the claim

  • @chxyike
    @chxyike 1 year ago

    You can improve from n^3 to n^2.807, which is not significant, but the complexity of the implementation increases a lot. The gain is small with this algorithm, in my opinion.

    • @MrHamof
      @MrHamof 1 year ago

      Say you have a matrix that's 100x100. With n^3, you need 1,000,000 multiplications to solve it. With n^2.807 you need 411,150 multiplications, and with the current record holder of n^2.3728596 we're down to 55,683. I don't know enough to speak to the real-world performance difference, but it definitely has a huge impact on the number of multiplications needed for large matrix multiplication.
      Depending on how much harder the implementation is, I would agree that it's probably not worth it for smaller matrices; the difference between O(n^3), O(n^2.8), and O(n^2) isn't that big if n is a small number.
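      The counts quoted above can be reproduced directly. A quick sketch (these are asymptotic estimates n^ω rounded to integers, not exact operation counts of real implementations):

```python
def mult_count(n, exponent):
    """Rough number of scalar multiplications for an n x n matrix product."""
    return round(n ** exponent)
```

      For n = 100 this gives 1,000,000 for the standard n^3 method, about 411,150 for Strassen's n^2.807, and about 55,683 for the n^2.3728596 record.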

    • @chxyike
      @chxyike 1 year ago

      @MrHamof You are right. It makes sense for large matrices.

  • @artus9022
    @artus9022 1 year ago

    But what about SIMD? Isn't this n²? How would the algorithms perform in that case?

  • @gnomeba12
    @gnomeba12 1 year ago +1

    Ok now show us the best algorithm for SVD!

    • @DrTrefor
      @DrTrefor 1 year ago +1

      ha actually I've been wanting to do a SVD video

  • @holyshit922
    @holyshit922 1 year ago

    Standard multiplication may not be the fastest, but it is still not bad, because it has polynomial growth.
    Compare it with algorithms like calculating a determinant by cofactor expansion, or by summing products over all permutations, which have factorial growth.
    If we want to calculate the coefficients of the characteristic polynomial by expanding the determinant via minors to avoid symbolic computation, we get exponential growth.
    2:16 And here I catch you: we accelerate the algorithm, but at the cost of memory, and both RAM and disk space (or paper and ink) cost dollars.
    Was it worth it? The acceleration isn't significant, and you won't be able to accelerate the algorithm to O(n^2).
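    The factorial-versus-polynomial contrast is easy to make concrete by counting multiplications. A sketch using the cofactor-expansion recurrence M(n) = n·(1 + M(n−1)), M(1) = 0 (one multiplication per cofactor term, plus the cost of each (n−1) x (n−1) minor):

```python
def cofactor_mults(n):
    """Multiplications used when expanding an n x n determinant by cofactors."""
    return 0 if n == 1 else n * (1 + cofactor_mults(n - 1))

def standard_matmul_mults(n):
    """Multiplications in the standard n x n matrix product."""
    return n ** 3
```

    Already at n = 10, cofactor expansion needs over six million multiplications, while the cubic matrix product needs only a thousand.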

  • @drwizard2875
    @drwizard2875 1 year ago

    Any advancement on the non-commutative behavior of matrix multiplication?

  • @alekthpariso8606
    @alekthpariso8606 1 year ago +1

    still trying to figure out where i am gonna need this.

    • @DrTrefor
      @DrTrefor 1 year ago +2

      haha probably never tbh:D

  • @cmilkau
    @cmilkau 1 year ago +6

    I don't like title cheating. I somehow expected this would present a state-of-the-art algorithm, not just explain Strassen and then continue with "there are better ones this fast".

    • @Miguel_Noether
      @Miguel_Noether 1 year ago

      It's the fastest, where is the cheating part?

    • @SimGunther
      @SimGunther 1 year ago +2

      @@Miguel_Noether An explanation on the latest Williams equation would've sufficed for this video, but he chooses not to elaborate on its mechanics whatsoever.

  • @Rin-qj7zt
    @Rin-qj7zt 1 year ago +1

    apparently the link for brilliant is still active cuz it gave me the discount lol

    • @DrTrefor
      @DrTrefor 1 year ago +1

      Awesome! Hope you enjoy:)

  • @iwersonsch5131
    @iwersonsch5131 1 year ago +2

    Asymptotically could mean "asymptotically if the number of trials with 3x3 matrices approaches infinity" but I think i get the idea

  • @seanpe8474
    @seanpe8474 1 year ago

    came cause of the thumbnail, stayed for the math

  • @mathematicia
    @mathematicia 1 year ago

    This could be an easier method, but learning, memorizing, and using it in practical scenarios is really painful. As far as calculation is concerned it could be useful somewhere, but when we do any type of analysis with matrices, it couldn't replace ordinary multiplication.

  • @dmitrykazakov2829
    @dmitrykazakov2829 1 year ago

    This misses the question of accuracy. Additions (and subtractions) suffer precision loss in floating-point arithmetic. If you have accumulating additions in your algorithm, you risk catastrophic precision loss for some inputs. So shuffling a multiplication into a bunch of additions might not be such a bright idea as it appears.
    In general, floating-point operations are not associative, thus all optimization methods based on reshuffling operations, like this one, must be applied with great care.
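    The non-associativity is easy to demonstrate with two classic examples:

```python
# Grouping changes the result in IEEE-754 double arithmetic.
left = (0.1 + 0.2) + 0.3    # accumulates rounding error one way
right = 0.1 + (0.2 + 0.3)   # happens to round differently
# left and right differ in the last bit, so left != right.

# Absorption: a small addend can vanish next to a huge one.
lost = (1e16 + 1.0) - 1e16  # mathematically 1, but the 1.0 is absorbed
```

    This is exactly the risk with reshuffled algorithms: the sums are mathematically equal but numerically different, and terms of very different magnitudes can cancel badly.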

  • @unlucky-777
    @unlucky-777 1 year ago

    As a computer engineer, this didn't help me at all, because my coworkers are still using the most basic methods in our code.

  • @farfa2937
    @farfa2937 1 year ago

    If you hardcode a matrix, then you can get the result in constant time, and you'll be right as long as you only try to multiply matrices whose product is your chosen one.

    • @tissuepaper9962
      @tissuepaper9962 1 year ago

      This is the well-known "divine intervention" algorithm which is also employed in "miracle sort" and the "birthday attack" on hashing algorithms.

  • @dmitryincog7455
    @dmitryincog7455 1 year ago

    Is there a matrix multiplication library that uses these algorithms? And if not, what would that mean?

  • @GradientAscent_
    @GradientAscent_ 1 year ago

    Can you show a proof of n^2 being the theoretical minimum?

    • @L2M2K2
      @L2M2K2 1 year ago +1

      Depends on what you mean by the minimum. The trivial proof: there are n^2 unique elements in each input and they are all used at least once to compute the output. As such, one cannot possibly compute all n^2 output elements with fewer than n^2 operations in total (doing so would require a fixed number of operations per input element to generate all n^2 output elements, a very tough task). Note, this does not say anything about the ability to get anywhere near this limit of n^2 operations; it is simply a strict lower bound one cannot possibly beat. (Improving theoretical lower bounds, as in a proof that the problem must be harder than the trivial n^2 limit, is its own field of mathematics. And once the best known lower bound and the best known algorithm meet, the problem is settled. Well, kind of: one can still improve the practical implementations a lot. And not every problem can even be settled this way.)

  • @MrTyty527
    @MrTyty527 1 year ago +1

    I can come up with an O(1) algorithm for this, but the result might be less accurate!

  • @Petch85
    @Petch85 1 year ago

    What if I know most of the numbers in my matrix are just 0?
    Sparse matrices.
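    For matrices that are mostly zeros, sparse formats skip the zero entries entirely, so the cost scales with the number of nonzeros rather than n^3. A minimal dictionary-of-keys sketch (real code would use something like SciPy's csr_matrix instead):

```python
def sparse_matmul(A, B):
    """Multiply two sparse matrices stored as {(row, col): value} dicts.

    Only nonzero entries are stored and visited, so zero entries cost
    nothing at all.
    """
    # Index B's nonzeros by row for fast lookup of partners of A[i, k].
    B_by_row = {}
    for (k, j), v in B.items():
        B_by_row.setdefault(k, []).append((j, v))
    C = {}
    for (i, k), a in A.items():
        for j, b in B_by_row.get(k, []):
            C[(i, j)] = C.get((i, j), 0) + a * b
    return C
```

    Any (i, j) pair absent from the result dict is an implicit zero of the product.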