Regarding the final point about reading a private key from power usage, wouldn't it already need to be on the hacker's computer anyway for this to work? In which case, they already have your private key.
No; see "side-channel attack". For example, the encryption may be running in a trusted execution environment (like ARM TrustZone or Apple Secure Enclave) and the adversary is running untrusted code outside of that environment that can determine power consumption / voltage levels / timings. Another example: the adversary is sitting outside your house with a supersensitive probe in the electricity lines that run into your bedroom, where your desktop computer is plugged in, and is able to determine changes in voltage on the lines that way, which correspond to your computer's power consumption. In both examples, the adversary does _not_ have any access to the trusted environment, but is able to acquire information about the behaviour of that environment which can be reverse-engineered to determine what was actually happening in that environment.
If you can't do "mod" at any step of the way, you have to allocate enough memory for a multi-word number having, as a number of bits, at least the sum of the number of bits in both multiplicands at every multiplication. For example, if you're at a point where base^64 is 415 bits, you need at least 830 bits for a square (base^64 x base^64), since both multiplicands are 415 bits and 415 + 415 = 830. Then the multiplication proceeds as usual: on a 64-bit computer, bits 63 to 0 of each multiplicand contribute to partial sums at bits 127 to 0 of the result, and so on, until bits 447 to 384 contribute to partial sums at bits 895 to 768 of the result. You could always pre-allocate enough bits for the entire number in advance. Then you would execute a modified square and multiply algorithm that just calculates how many bits you're likely to need to hold the final result given all the multiplications involved, allocate that (say it's 16190 bits), and execute the proper square and multiply in multi-word arithmetic on 16190 bits.
I wonder if square and multiply would lose to a naive multiply-only method for a small modulo m and a large exponent x. The naive method would multiply everytime and mod m, then cache the result, quickly generating a repeating pattern of length at most m, and then x mod patternLength would give the place in the pattern which is the answer. So the number of operations is at most m, compared to a square and multiply which can be very expensive for very large x.
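Here's a minimal Python sketch of that naive cycle method (my own toy helper, assuming gcd(base, m) = 1 so the sequence of residues is purely periodic):
def cycle_pow(base, exp, m):
    powers = [1 % m]                      # powers[k] = base^k mod m
    value = 1
    while True:
        value = value * base % m
        if value == powers[0]:            # cycle closed; length = len(powers)
            return powers[exp % len(powers)]
        powers.append(value)

assert cycle_pow(3, 45, 7) == pow(3, 45, 7)   # 6
As you say, the cache is bounded by m, so for tiny moduli and huge exponents this can win; for cryptographic-size moduli the cache is hopeless.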
It is true that in base _n,_ you will determine which of _n_ operations to do (e.g. for _n_ = 3, these are multiply, square, or cube). However, cubing is just multiplying a number by itself twice, so _n_ = 3 actually gives *_worse_* performance. In fact, _n_ ≥ 4 also gives worse performance, since what would be squaring twice in the _n_ = 2 case would become multiplying a number by itself four times. Thus, _n_ = 2 gives the best average performance. Even more generally, square and multiply does not give the best possible performance / shortest possible algorithm for any given exponent, but it is the best we can do without solving a harder problem for the exponent first, which on average would give us much worse performance. It is this: if we know the totient of the modulus _m_ (or equivalently, the order of the multiplicative group { g^x mod m | x ∈ 𝐙 }, where _g_ is the generator of the group, i.e. the base of the modular exponential we're trying to compute, which was 23 in the video), which is denoted φ(m), then we have g^[φ(m)] mod m = 1 (by an extension of Fermat's Little Theorem), and so g^x mod m = g^[q φ(m) + r] mod m = ( g^[φ(m)] ^ q ) g^r mod m = g^r mod m, where _q_ and _r_ are the quotient and remainder of _x_ divided by φ(m), respectively. However, computing φ(m) is hard unless the prime factorisation of _m_ is known, so in practice this is not used often. The hardness of this problem is what underpins the security of RSA.
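A quick Python check of that last identity, with toy primes of my own choosing (the real m would be thousands of bits):
p, q = 743, 751                        # φ(m) is trivial when you know the factors
m, phi = p * q, (p - 1) * (q - 1)
g, x = 23, 10**9 + 7                   # gcd(g, m) = 1 here
assert pow(g, x, m) == pow(g, x % phi, m)   # exponent reduced mod φ(m)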
yeah! in my assembly class, when we had to program there was no multiplication instruction, so you'd have to shift the bits of the binary value in the registers to actually multiply two numbers. Don't even get me started on bringing a value into the registers, and these values can't be stored in just any register... and then pulling the value out... storing it elsewhere... god bless...
MOD is one of the slowest operations on a computer, normally taking 32 or 64 cycles. Instead of MODing by the actual number each time, can you MOD by the next higher power of 2 and then do a MOD by the real value at the end? MOD by a power of 2 is simply an AND with a bit-mask, which is very fast.
No, we cannot lop off upper bits that way and mod by the real number at the end. Here's the answer if we rerun the square-and-multiply but use "mod 8" to reduce at intermediate steps. For those unfamiliar reading this comment, this is "and 7", because "and 7" is a bitmask that keeps the lower 3 bits, which is equivalent to "mod 8": 15 mod 8 = 7, and 15 and 7 = 1111b and 0111b = 0111b = 7. Anyway, 3^45 mod 7 = 3^101101b mod 7 starts with 3:
3 and 7 = 3
Square (3^1 -> 3^2): 3x3 = 9; 9 and 7 = 1
Square (3^2 -> 3^4): 1x1 = 1; 1 and 7 = 1
Multiply (3^4 -> 3^5): 1x3 = 3; 3 and 7 = 3
Square (3^5 -> 3^10): 3x3 = 9; 9 and 7 = 1
Multiply (3^10 -> 3^11): 1x3 = 3; 3 and 7 = 3
Square (3^11 -> 3^22): 3x3 = 9; 9 and 7 = 1
Square (3^22 -> 3^44): 1x1 = 1; 1 and 7 = 1
Multiply (3^44 -> 3^45): 1x3 = 3
Final reduction with mod 7: 3 mod 7 = 3
The answer given by Dr Pound (and indeed WolframAlpha) is 6, but with "mod 8" intermediate reduction, we get 3. We could reduce "mod" a multiple of 7, so "mod 14" would be a valid intermediate reduction, as would "mod 21", "mod 28" and so on. However, none of those multiples of 7 are going to be a power of 2 and allow for fast lopping off of bits with AND: a power of 2 has only 2s as factors, and no 7.
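The same point as a quick Python check (the helper below is my own sketch, not from the video; it takes the intermediate reduction as a parameter):
def square_and_multiply(base, exp, reduce):
    acc = base
    for bit in bin(exp)[3:]:              # binary digits after the leading 1
        acc = reduce(acc * acc)
        if bit == '1':
            acc = reduce(acc * base)
    return acc

print(square_and_multiply(3, 45, lambda v: v % 7) % 7)   # 6, correct
print(square_and_multiply(3, 45, lambda v: v & 7) % 7)   # 3, wrong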
OpenSSL has a big number library. If you look at their documentation, look up BIGNUM and any function starting with BN_*. It dynamically allocates a buffer of however many bits it needs. So I guess the data type is an array of bytes. There are other bignum libs out there, that's just the one I've used.
Easier for humans, not for computers; please don't forget that computers don't have instinct like us humans, and writing out each case scenario is a nightmare. This method can be generalised with complexity O(2*floor(log2(n))+1), where n is the exponent of the number, which is very efficient in terms of complexity (big O means worst case scenario).
People who have worked with assembly programming will be really used to this. In the past, CPUs did not have a multiply instruction, and you often had a table of the fastest way to multiply numbers, which, you guessed it, was shifts (which are like the squaring step here) and addition.
even modern CPUs have shortcuts for squaring and for multiplying by small numbers, so this algo is still beneficial
Johnny Ball covered this on Numberphile before - search "Russian multiplication".
Let's call the two numbers A and B.
With number A, we shift all the bits to the right once.
The rightmost bit "falls out" of the number and typically gets shifted into a flag on most CPUs - let's say our CPU shifts the bit out of the right into the carry flag.
So, the carry flag now contains the rightmost (least significant) bit of number A. If it's a one, then we add number B to our running total. If it's a zero, carry on (so you could code that as a "branch if carry not set" over an "add B to total" instruction).
Then we shift B left one, which doubles it.
Then we do it again. Shift A to the right. The rightmost bit falls out into the carry flag. If it's a one, then we add B (which we've just doubled, remember) to the running total.
Shift B left one, doubling it again.
Shift A to the right. If it's a one, then add B (now four times bigger than it originally was, as we're shifting it left every round) to the running total.
You keep doing this until you've shifted every original bit of A out of the right side. Stop. You've multiplied the two numbers together and your answer is in the running total register.
And all we did was shifting bits left and right, and simple addition. And there's only as many "rounds" of this as there are bits in A.
Another bit of useful binary maths is that the result cannot be more than twice the number of bits in B. That is, if A and B are 8-bit numbers, the result register only needs to be 16 bits - as it's just not possible for two 8-bit numbers to multiply to more than a 16-bit number.
Indeed, if you're implementing this algorithm, then stick B in a register with twice as many bits and have your "running total" register be twice as many bits. Then you can run the algorithm blindly.
Which is, as you may have guessed, what the CPU's actually doing in the circuitry with a hardware multiply.
It's just multiplying by 2 - which, in binary, is nothing more than shifting all the bits to the left once - and addition.
So, yeah, it's basically the same algorithm as this video, but working one order higher. So multiplication -> exponentiation. And multiplying by 2 -> squaring. And addition -> multiplication.
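The whole loop fits in a few lines of Python, if anyone wants to play with it (a sketch; Python ints stand in for the registers and the carry flag):
def shift_add_multiply(a, b):
    total = 0
    while a:                  # one round per bit of A
        if a & 1:             # the bit that just "fell out" on the right
            total += b        # add the current (doubled) B
        a >>= 1               # shift A right
        b <<= 1               # shift B left, doubling it
    return total

assert shift_add_multiply(23, 45) == 23 * 45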
@@NoNameAtAll2 Are those tables actually part of the CPU (making use of microcode) or done by a compiler?
@@RegrinderAlert multipliers are simple enough to be just logic gates
what I was talking about is "check if top 48 bits are 0, so we don't need to wait/use most of the circuit"
that, too, is simple enough to be just logic
about compilers... idk how common the explicit "square" command is in processors
yes, except that's a shift & add algorithm, & shifts aren't like squaring but like multiplying by a power of 2.
Note that for RSA and similar, the modular multiplication operation itself can be quite expensive, so modern implementations typically convert the numbers involved to an intermediate representation, called a Montgomery form, after Peter Montgomery. The binary exponentiation method can use Montgomery forms throughout, so only at the end is the result converted back to a conventional representation. Montgomery multiplication is also resistant to the side channel attacks mentioned at the end of the video.
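For the curious, here's a toy Python sketch of the Montgomery trick (the names are mine; real implementations work limb by limb in constant time, and stay in Montgomery form across the whole exponentiation rather than converting back per multiply):
def mont_mul(a, b, n):                    # n must be odd
    k = n.bit_length()
    R = 1 << k                            # R = 2^k > n, gcd(R, n) = 1
    n_prime = (-pow(n, -1, R)) % R        # n' = -n^(-1) mod R (Python 3.8+)
    def redc(T):                          # REDC: returns T * R^(-1) mod n
        m = ((T & (R - 1)) * n_prime) & (R - 1)
        t = (T + m * n) >> k              # dividing by R is just a shift, no slow mod
        return t - n if t >= n else t
    a_bar, b_bar = a * R % n, b * R % n   # convert to Montgomery form
    return redc(redc(a_bar * b_bar))      # multiply, then convert back

assert mont_mul(23, 23, 747) == 23 * 23 % 747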
Now I have to look up Montgomery forms - or wait for the Computerphile video. I do like internet rabbit holes.
@@andrewharrison8436 also look up Montgomery Ladder which is the similar algorithm for elliptic curves
fascinating, I had no idea this exists
I think the last bit of the video is fascinating - that you could perform an attack to work out a key based on the CPU time to calculate a square vs a square & multiply. A great example of the theoretical mathematics being ideal vs. the real-world implementation being fundamentally vulnerable.
Yes! And the technical term for it is a "timing attack".
Timing attacks can be so insidious that you need to resort to assembly language just to get everything out of the way.
Dr Pound's example has us doing an "always multiply", multiplying by one (the multiplication identity) after squaring for an unset bit in the exponent, so that we execute the algorithm in constant time. However, using a statement like [if (bit is zero) { multiplicand = one; } else { multiplicand = base; }] to do this "always multiplication" can end up in the *branch predictor's way.* If there are lots of zeroes in your key, it's going to take the "if" path more of the time; conversely, if there are lots of ones in your key, it's going to take the "else" path more of the time. Either way, the branch predictor will execute it faster than if the key were evenly-distributed ones and zeroes.
To execute *that* in constant time, the CPU has to have a branchless test instruction to set the new multiplicand. Either you validate that the compiler uses the branchless instruction in your C code (say), or you write at least that part of the algorithm in assembly language.
Edit: Or use the Montgomery form of the numbers, per David Gillies's comment, which makes it easier to have constant-time algorithms
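To illustrate the branchless select idiom from a couple of paragraphs up (sketched in Python for readability; in practice you'd check the compiled C or write the assembly yourself, since Python ints are not remotely constant-time):
def select_multiplicand(bit, base):
    mask = -(bit & 1)                    # all-ones if bit == 1, all-zeros if bit == 0
    return (base & mask) | (1 & ~mask)   # bitwise choice, no branch

assert select_multiplicand(0, 23) == 1    # unset bit: dummy multiply by one
assert select_multiplicand(1, 23) == 23   # set bit: multiply by the base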
In data centers that have servers that the government uses, the government requires that their servers are plugged into an air-gapped power supply, unconnected to the power that every other server uses, so that a spy couldn't surreptitiously measure changes in their power usage. These are surprisingly effective attacks.
That's why cryptographic implementations need to run in constant time: no optimisation allowed for the dummy multiplications by one.
@@nebuleon Multiply by base^bit. Thus if the bit is 0, it will be multiplied by 1, and for a 1 bit it's the ordinary multiply. As x^n where n is 0 or 1 is easily calculated, that should take the same time (?)
Why not just simply add a random sleep at the end of the algorithm? The size of the sleep would be a question, but if it's a random amount each time that is sufficiently large to mask any work being done (or lack thereof), it would remove this timing attack issue. It's certainly wasteful to simply sleep, but it's also wasteful to do unnecessary calculations to remain in constant time, no?
Dr Pounds breadth and depth of knowledge in computer science never cease to amaze me!!
"Man from the future"
Absolutely, but this specific piece of knowledge is pretty much common knowledge for anyone in CS. I studied this in college about 7 years ago, it was NOT a fun time since we had a pretty bad professor...
Not necessarily the breadth, but connecting the theoretical with the practical. Bit on timing attacks was pretty nice.
@@quincy2142 he has a very impeccable pedagogy
Even if he doesn't have the knowledge I still like his teaching style, it's engaging and fun.
Interesting algorithm.
At first I thought it was just going to be a simple, "first we build our list of binary equivalents and then just multiply them all together in the end."
As an example, calculate 3^1, 3^2, 3^4, 3^8, 3^16, etc. And then choose the values our exponent actually contains.
Then the sleight of hand of the mathematicians came in at 9:40 and made things far, far simpler and much easier to execute in practice.
You might have noticed that Mike said that this was the left to right method. Yours (with taking the modulus) is the right to left variant and is just as quick.
I always liked calculating that recursively. For example, 2^6 is (2^3)*(2^3), 2^3 is (2^2)*(2^1) and 2^2 is (2^1)*(2^1).
It's extremely simple to code it too. Here's the algorithm that performs the calculation:
1) If exponent is zero, return 1
2) Divide exponent by two, and save both the quotient and the remainder
3) Call algorithm recursively with (exponent = quotient) and save the result
4) If remainder is zero, return result*result
5) If remainder is one, return result*result*base
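In Python that's a near-literal transcription (my sketch; put "% m" after each multiplication for the modular version):
def power(base, exponent):
    if exponent == 0:                              # step 1
        return 1
    quotient, remainder = divmod(exponent, 2)      # step 2
    result = power(base, quotient)                 # step 3
    if remainder == 0:                             # step 4
        return result * result
    return result * result * base                  # step 5

assert power(2, 6) == 64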
Unless you cache the results, this is no faster than multiplying one-by-one (probably slower because of recursion overhead)
@@canaDavid1 I don't think you appreciate the difference between O(N) and O(logN).
@@longlostwraith5106 you have O(2^log(N)) so O(N)
@@schwingedeshaehers How, exactly? Are you taking the division into account?
@@longlostwraith5106 you have log n layers, but the layers involve more and more calculation at each level, and that grows exponentially up to the log n barrier set by the number of layers
9:34 It's actually not the minimum number of operations. For example, to make 31 by this method takes 8 operations (SMSMSMSM), whereas the minimum is only 7 operations (N^2, N*(N^2), (N^3)^2, (N^6)^2, (N^12)^2, (N^6)*(N^24), N*(N^30)). However, in general finding the minimum number for a given exponent is NP-complete, so in practice square and multiply is presumably what you'd do. Otherwise, great video!
The NP-completeness is a common misconception: this is only proven for sets of numbers, not single numbers. In practice I believe a window method is used, for which one precomputes some values so one can "combine" some multiplications.
@@pikasnoop6552 Thanks for the correction
Efficient exponentiation is a very interesting topic. The minimum number of multiplications required for N is an open problem in mathematics, you can read more about that on OEIS A003313, "Length of shortest addition chain for n".
That example seems to use more space (you need to remember N^6 until after you finish (N^12)^2). May be the minimum operations in fixed space, where the space is exactly the size of the input string.
Also important for cryptographic libraries which shouldn’t be allocating memory dynamically.
If you introduce division as an additional operation, the example of 31 can be reduced to 6 operations: SSSSSD
I read binary numbers left to right by starting at 1, doubling for each bit, and adding 1 if the bit was a 1. Very cool to see this pattern coming up in exponents
Also called Russian peasant multiplication. It works for any power operation tbh, not only scalar multiplication. The power operation on matrices can for instance be used to compute large Fibonacci numbers very quickly, using the 2x2 matrix [[1, 1], [1, 0]]
This is also known as "Fast Binary Exponentiation", which calculates pow(a, b, mod) in logarithmic time.
I love how entertaining the video is given that I already know what the answer is and have used this quite a lot
We actually did this in school😅 This really helped me understand it
Thank you very much!
what a coincidence, i started studying RSA and y'all put out this banger, thanks Mike and Sean
binary to decimal: go left to right, start from 0 and double if it's a 0 and double and add one if it's a 1. e.g. for 101010 you do 0 -> 1 -> 2 -> 5 -> 10 -> 21 -> 42. so 101010 = 42.
decimal to binary: halve your number rounding down until you reach 1, e.g. 42 -> 21 -> 10 -> 5 -> 2 -> 1. now go backwards through this sequence and put a 1 if it's odd and put a 0 if it's even. so 42 = 101010.
This video was a lovely reminder of my time spent with number theorists in college, cryptography is so damn fascinating
I am currently purchasing master degree in cybersecurity and this guy summerize a 2h of lectures in literally 17min ;)
Why are you purchasing a degree? Maybe do it somewhere with better teaching then?
@@QuantumHistorian he probably meant pursuing
I’m afraid that is the model for university education these days
@@ait-gacemnabil9181yeah, you right. but, in some sense, it means actually a business activities for university.
In a way, he is right. Nowadays everything needs to be purchased -- even knowledge.
1) You rock, thank you for making world better.
2) Focus fail hurts eyes.
This was fascinating, so simple and intuitive once explained but so powerful
This is also called "exponentiation by squaring" and it's super useful in many cases.
One quick example is in computing the nth Fibonacci number using the 2x2 matrix formula, where one raises a matrix to the nth power. But using this method, the number of multiplications is greatly reduced. There is also a closed form expression using a power of the golden ratio but that requires a lot of numerical precision for large n.
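A sketch of that Fibonacci trick, using plain exponentiation by squaring on 2x2 lists (the helper names are mine):
def mat_mul(a, b):
    return [[a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
            [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

def fib(n):
    result = [[1, 0], [0, 1]]            # identity matrix
    m = [[1, 1], [1, 0]]
    while n:
        if n & 1:
            result = mat_mul(result, m)  # this power of two is in n
        m = mat_mul(m, m)                # square the matrix
        n >>= 1
    return result[0][1]                  # M^n = [[F(n+1), F(n)], [F(n), F(n-1)]]

assert [fib(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]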
i cant believe how brilliant this explanation actually was!!Kudoss
I did the 3^45 mod 7 in my head fairly simply. 3 and 7 are coprime, so you know the powers of 3 cycle, with 3^6 ≡ 1 mod 7. Then we can do 3^42 * 3^3 mod 7, which is just 1 * 3^3, or 27 mod 7, which is 6. Still a very useful algorithm though
Well, the algorithm you use is efficient for a human, but unfortunately computers don't see things the way we do and can't instinctively spot tricks like that, and it only covers one particular case. This algorithm can be applied in every case scenario, with complexity O(2*floor(log2(n)) + 1) (worst case scenario, hence big O), where n is the exponent of the number, so it is very efficient in terms of complexity. Anyway, the method you use is very useful too, just for humans, not PCs.
In encryption, properties like this are what make the selection of values so necessary. In this case, the modulus is generally thousands of bits, while the public exponent is typically either 3 or 65537 (and the base is the message, which must be less than the modulus)
So, it turns out Square and Multiply on its own is not the greatest scheme for cryptography, because it lends itself to side channel attacks (timing and power usage).
The way this is addressed is through a Montgomery Ladder, where every square operation is performed, and every multiply is performed, but the bit that determines whether it's a simple square or a multiplication actually determines where the output is placed. If it's intended to be used, it goes in the "correct" output location and mixed usefully in with the result. If it's not, it goes into an incorrect output location, and mixed in with all the other side-effect garbage from the function. This results in the power draw and time being constant, which defeats those side-channel attacks.
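A toy Python sketch of the ladder structure (mine, and only illustrative; real code also does the selection branchlessly and works in Montgomery form). Both registers get written every round, and the secret bit only decides which register receives the square and which the product:
def ladder_pow(base, exp, m):
    r0, r1 = 1, base % m                 # invariant: r1 = r0 * base (mod m)
    for bit in bin(exp)[2:]:             # most significant bit first
        if bit == '1':
            r0, r1 = r0 * r1 % m, r1 * r1 % m
        else:
            r0, r1 = r0 * r0 % m, r0 * r1 % m
    return r0

assert ladder_pow(23, 373, 747) == pow(23, 373, 747)   # 131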
You Sir saved my life before the exam, I can't thank you enough.
I love the idea of solving a complicated problem by solving a lot of simpler problems. Genius!
Excellent video! Alternative summarised explanation: exponentiation in a sense "converts" addition to multiplication (see 3Blue1Brown's excellent intro to group theory and e^(iπ) = -1 for an exploration of this). The algorithm for converting a bitstring to a number (or equivalently, the binary representation of a number to its decimal representation) is to start with zero and read the number from left to right, doubling when you see a new digit, and then adding the value of that digit (i.e. add nothing if it's "0", or add 1 / increment if it's "1"). For example, the binary number 101110101 is equal to decimal 373, as follows, reading the digits of the binary representation from left to right:
• Start with 0.
• Read a digit, "1": double, then increment, giving 1.
• Read "0": double, giving 2.
• Read "1": double, then increment, giving 5.
• Read "1": double, then increment, giving 11.
• Read "1": double, then increment, giving 23.
• Read "0": double, giving 46.
• Read "1": double, then increment, giving 93.
• Read "0": double, giving 186.
• Read "1": double, then increment, giving 373.
The square and multiply algorithm just starts off with the base of the exponent (i.e. 23 as in the video) rather than 0, and replaces each doubling operation with a squaring, and each increment operation with a multiplication by the base. That is, exponentiation with base 23 has converted addition of 1 into multiplication by the base, 23. Likewise, doubling a number (which is the same as adding a number to itself) has been converted into squaring a number (which is the same as multiplying a number by itself).
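Side by side in Python, the correspondence is plain to see (my sketch; starting the accumulator at 1 rather than the base lets the same loop handle the leading bit):
def value_of(bits):                      # double-and-increment
    n = 0
    for b in bits:
        n = 2 * n + int(b)
    return n

def pow_of(base, bits, m):               # square-and-multiply, same shape
    acc = 1
    for b in bits:
        acc = acc * acc % m              # "double" has become "square"
        if b == '1':
            acc = acc * base % m         # "increment" has become "multiply by base"
    return acc

bits = "101110101"
assert value_of(bits) == 373
assert pow_of(23, bits, 747) == pow(23, 373, 747)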
You could explain this much more easily and without binary like this:
You take the exponents and apply two rules until you get to 1:
- if it's odd, subtract one
- if it's even, divide by two
Then you just do the squares/multiplies in reverse order of these operations.
This is basically how you create the binary number out of base 10 ^^ So it's the same
That might explain it, but that is not how it would be programmed efficiently
@@Loldemord True
@@deanjohnson8233 That's probably true, if we're talking about something like assembly (those bit shifts are single opcodes, aren't they?), but if I were doing this in Python, it probably wouldn't matter...
@@user-vn9ld2ce1s you would implement it like this in assembly, c, c++, go, rust, c#, Java and many more. Bit shifting is not a rare and unusual thing.
In python it would be strange because python does not have fixed numeric sizes. Using bitwise operations on something like that can easily lead to the wrong result if you don’t carefully study what Python does in various cases.
Also, this video was about how to efficiently do this math. If you are concerned with the efficiency of math operations, you probably aren’t going to be using python.
It's true that using this algorithm on the private key exponent is more expensive than the specially chosen public exponent. (2048 bit exponent -> 4096 multiplies). However since for the RSA algorithm, the private key holder knows the factors used for the key, an algorithm based on the Chinese Remainder Theorem can reduce the cost of the private key operations.
yes, but CRT reduces the cost by a factor of about 3, so the private key operations are still slower than the public key operations which need to calculate power by a 16 bit number
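A hypothetical sketch of the CRT shortcut in Python, for the curious (toy primes, and no padding or blinding; never do RSA this way for real):
def rsa_crt(c, p, q, d):
    dp, dq = d % (p - 1), d % (q - 1)    # reduced private exponents
    mp = pow(c % p, dp, p)               # half-size exponentiation mod p
    mq = pow(c % q, dq, q)               # half-size exponentiation mod q
    q_inv = pow(q, -1, p)                # precomputable (Python 3.8+)
    h = q_inv * (mp - mq) % p            # Garner's recombination
    return mq + h * q

p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
assert rsa_crt(pow(42, e, n), p, q, d) == 42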
I remember using this algorithm for a competitive programming question on one of the codechef's monthly contests, didn't know it had a name.
i did this as well in codeforces
i love the channel. the slowness of the math demonstrations made me itchy!
I literally submitted my rsa cryptography coursework in two weeks ago. This is all fresh in my mind. I'd love to see this go to the next step.
Whats the next step?
For people not understanding binary, here is a way to do it without using binary:
1) Let the power be x.
2a) If x is odd, subtract 1 from it and write M (multiply) on a piece of paper.
2b) If x is even, halve it and write S (square) on the paper.
3) With the new x (either half or one less), repeat steps 2a/2b until x becomes 1.
4) When x is 1, do the steps shown in the video, but start from the bottom of the list.
For example: For 373,
M 372
S 186
S 93
M 92
S 46
S 23
M 22
S 11
M 10
S 5
M 4
S 2
S 1
Then the correct step would be S S M S M S M S S M S S M(this is just the order of operations, you would still have to do all the mod and other things)
Note: S=Square; M=Multiply
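The recipe transcribes directly to Python (my sketch; the schedule function records the letters top-down and replays them bottom-up):
def schedule(x):
    ops = []
    while x != 1:
        if x % 2:                # odd: subtract one, write M
            ops.append('M')
            x -= 1
        else:                    # even: halve, write S
            ops.append('S')
            x //= 2
    return ops[::-1]             # start from the bottom of the list

def pow_mod(base, exp, m):
    acc = base
    for op in schedule(exp):
        acc = acc * acc % m if op == 'S' else acc * base % m
    return acc

assert ''.join(schedule(373)) == 'SSMSMSMSSMSSM'
assert pow_mod(23, 373, 747) == pow(23, 373, 747)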
I love this (semi-)collaboration. You are the Computerphile version of Matt Parker in every way.
Matt Parker and Mike Pound. MP=MP.
The one involving modulo 7 can be done relatively easily- as it’s prime we know 3^6 is congruent to 1 mod 7 ( Fermat’s little theorem ), then do 45 mod 6 and hence get to (3^6)^7*(3^3) mod 7 which is (1)^7*(3^3) mod 7 which is of course 6 mod 7.
This is the coolest maths/CS video I've seen in a long time. Wow!
Dude, you’re such a fun guy to listen to. Keep it up, cheers!
The last minute was spot on - avoiding side attacks!
You can do multiplication this way, too (by doubling & adding). Very useful on CPUs that can only do addition.
This is also how I've seen multiplication done on mechanical calculators.
You can. But it's faster to subtract squares. You have to build the table first. But you don't need multiplies to do it.
@@PvblivsAelivs this depends on the speed of memory accesses, and how much memory space is available. But yes, table lookups are usually faster.
I always did square and multiply from the least significant bit first. In C, this is:
double pow(double x, int exp) {
    unsigned int e = (exp >= 0) ? exp : -exp;  /* work with the magnitude */
    double result = 1.0;
    while (e) {
        if (e & 1) {       /* low bit set: multiply the current square in */
            result *= x;
        }
        x *= x;            /* square for the next bit */
        e >>= 1;
    }
    if (exp < 0)
        return (1.0 / result);
    return (result);
}
When I do it on paper by hand, I just calculate all squares first. Then I multiply every result corresponding to a 1 bit, starting from the least significant bit. Of course, the order in which I multiply does not matter, but it is how I always did it.
Truly interesting lesson, just studied this at school!
Fantastic! I watched the same Numberphile video and did the high power mod p calculation on my calculator. And during the process I realised that to get the right number of squares, you turn the power into its binary number and then square the same number of times as the power of two. (And mod every time the answer is bigger than 747). I even checked it by finding the nearest actual primes, which are 743 and 751. Perfect 1s for both!
Saw an implementation of this in a programming tutorial video but they just rushed over the details. Computerphile does a wonderful job at filling this gap (as always I might add!)
This guy should have his own channel lol great stuff tho love it
Such a cool method! The very fact that you can calculate 3^45 mod 7 on paper in a few minutes is awesome considering 3^45 has 22 digits!
You can do it faster. For instance, observe that 3^6 mod 7 is 1. So 3^45 mod 7 is going to be the same as 3^3 mod 7, which is 6.
@@trejkaz That depends on you knowing the totient of the modulus / order of the multiplicative group, which is hard if you don't know the prime factorisation of the modulus.
@@JivanPal I don't know much about groups at all and didn't really use any group theory to do that solution, just normal modular arithmetic. Although, in the video he did say that the modulus is usually prime for these examples, so I don't think I'd have too much trouble determining the factors either.
@@trejkaz It depends. The hardness of that factorisation problem is what gives RSA its security. The totient function, denoted φ(x), counts how many numbers less than x are coprime to x. It is such that φ(p) = p-1 where _p_ is any prime, and φ(ab) = φ(a)·φ(b) where _a_ and _b_ are any two integers. The encryptor's/signer's secret is a pair of large primes, _p_ and _q,_ that serve as the private key, and the public knowledge that serves as the public key is their product, _n_ = _pq._
Thus, the encryptor/signer is always dealing with _p_ and _q,_ whereas the decryptor/verifier is always dealing with _n,_ whose prime factorisation he doesn't know. Without that knowledge, computing φ(n) is hard; with that knowledge, it is trivial: φ(n) = (p-1)(q-1). If he could figure out the prime factorisation, the encryption scheme is broken, precisely because he then knows φ(n) and can thus quickly compute these modular exponentials we're interested in: g^x mod n = g^[x mod φ(n)] mod n.
@@trejkaz can you generalise this method to every case scenario for computers? This method can be generalised to every case scenario with complexity O(2*floor(log2(n))+1), and it is very efficient already (n being the exponent of the number to verify). Since I'm at it, I'll also remind you that computers don't have instinct or intelligence like us
Thank you for this video. One of the most interesting things I've ever done in school, and I'd almost forgotten about it.
You guys and numberphile my 2 favorite channels
I know it was touched on toward the end of the video, but I think a part II to this video, where Dr. Pound goes into a specific application example of where this is used, would be great. Can always use more videos with him!
Always wondered how the key reading timing/power attacks worked, that makes a lot of sense, cheers!
god Dr Pound is so good at explaining things
We call it the binary exponentiation algorithm.
For example, 3^10 = ?
We write its power in binary: 10 = 1010.
The bits which are set will be included in the final answer, as we can calculate all exponents which are powers of two very quickly:
3^10 = 3^8 * 3^2
This is just mind-blowing! Awesome video!
Basically Russian Multiplication (covered by Johnny Ball on Numberphile), but square and multiply instead of double and add. Surprised that wasn't mentioned.
Indeed! That is the basis of a common efficient algorithm for converting string representations of integers expressed in base _n_ into actual integer datatypes, too, e.g. for decimal in C (note: multiply the running total up *before* adding each digit, and stop at the terminating NUL):
char* input_string = "285657";
int result = 0;
for (char* c = input_string; *c != '\0'; c++) {
    result *= 10;              /* shift the previous digits up one place */
    result += *c - '0';        /* bring in the new digit */
}
Or for capitalised hexadecimal (isdigit comes from <ctype.h>):
char* input_string = "45BD9";
int result = 0;
for (char* c = input_string; *c != '\0'; c++) {
    result *= 16;
    result += isdigit(*c) ? *c - '0' : 10 + *c - 'A';
}
Great video, thanks! He belongs in Numberphile videos too
I love the step-by-step simplistic explanations you give and the focus on the core concepts. Thanks Mike! bloody champion! Peace and Love from Perth Australia!✌❤✌
5:24 - cleaner explanation to convert 45 -> 101101 -> 32 + 8 + 4 + 1.
that's the reverse of this algorithm
you calculated powers of 2 and multiplied them together
in the video 101101 -> (((((1)*2+0)*2+1)*2+1)*2+0)*2+1
@@NoNameAtAll2 Or in postfix notation to avoid all those parentheses: 1 2× 0+ 2× 1+ 2× 1+ 2× 0+ 2× 1+.
@@JivanPal why not prefix then?
+ * + * + * + * + * 1 2 0 2 1 2 1 2 0 2 1
:)
12:55 you can work backwards: if it's odd, subtract 1; if it's even, divide by 2 ;)
Interesting algorithm and great video as usual
This man is a legend
I LOVE this type of content !!!! Thanks a million for making and sharing :)
Waiting for those follow-up videos!
This was a fascinating video even if the algorithm was rather obvious.
nice how you built the eyes-glazing-over effect into the video, so my eyes didn't have to this time like they have in some other videos (because things are complex, not because they're boring) :D
Being half asleep when watching this, for a moment when he was going over all the steps I thought I was nodding off, but nope, it was just the camera. Perhaps get the camera some coffee in the future.
Dr. Pound-- @12:56 "I could have worked backwards, right? If only we could do that."
You can! Call it the "subtract and divide" method.
PSEUDO-CODE
k = 373
when k is even, k = k//2 and left-insert a 0 into result
when k is odd, k = (k-1)//2 and left-insert a 1 into result

# python code
# assume k is positive; no error checking included
# longer code shown for readability - not pythonic, just another way to convert k to binary
k = 373
result = []
# result.insert(i, v) inserts value v at index i
print(k)
# printing k at every step to reflect the exact powers shown in the video
while k != 1:  # stopping at 1 is better than 0, but requires the final left-insert of a 1 below
    if k % 2 == 0:
        k //= 2
        print(k)
        result.insert(0, 0)
    else:
        k -= 1
        print(k)
        k //= 2
        print(k)
        result.insert(0, 1)
result.insert(0, 1)
print(result)
A cool video and topic, but I would like to nitpick the camera placement when Dr Mike writes on his paper. Surely the camera could have been on his left side?
Nice use of symmetry and multiplication.
cool trick, didn't realize that relation
i love this series
Is there a proof that that's the fastest decomposition to exponentiate a number? The algorithm here clearly works in all cases in, at worst, 2·log2(m) multiplications when exponentiating by m, but it's not at all obvious to me that this is the fewest possible number of steps for all m. I vaguely recall hearing a few years ago that in general this was actually still an open problem.
That's because it's not. For instance, if you knew your exponent was x^y with x odd, it could be faster to raise to the power x repeatedly.
However, this approach is useful on computers because they work in binary, so finding which powers of 2 make up the exponent is trivial.
It indeed is still an open problem (finding the shortest addition chain for an arbitrary exponent). The fact that square-and-multiply is not always fastest can be seen even for n = 15: one can compute (x^3)^5 and save a step.
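To make that concrete, here is a small Python sketch counting multiplications both ways (the modulus 747 is arbitrary, just to keep the numbers small):

x, m = 23, 747

# Square-and-multiply on 15 = 0b1111: 3 squarings + 3 multiplies = 6 multiplications.
acc, mults = x, 0
for bit in bin(15)[3:]:       # the bits after the leading 1
    acc = acc * acc % m; mults += 1
    if bit == '1':
        acc = acc * x % m; mults += 1
print(acc, mults)             # result, 6

# Addition chain (x^3)^5: x^2, x^3, x^6, x^12, x^15 = only 5 multiplications.
x2 = x * x % m        # 1
x3 = x2 * x % m       # 2
x6 = x3 * x3 % m      # 3
x12 = x6 * x6 % m     # 4
x15 = x12 * x3 % m    # 5
print(x15, 5)

assert acc == x15 == pow(x, 15, m)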
Are you going to make a followup talking about addition chains?
My eyes hurt after watching the autofocus go berserk :D
The original comment (with 1 mistake and 1 misunderstanding):
23
(23 * 23) mod 747 === 529 (this number is ignored in the final step)
(529 * 529) mod 747 === 463
(463 * 463) mod 747 === 727 (this number is ignored in the final step)
then
400
142
742
25 (this number is ignored in the final step)
625
The final result is (23 + 463 + 400 + 142 + 742 + 625) mod 747 === 154 (note: this line is wrong)
Edit:
According to the 3rd comment in this thread, from Jivan Pal, the last line is wrong: the correct operation is multiplication, not addition.
Also, my version is not an in-place algorithm. Jivan Pal shows exactly what the method in the video does.
No; this appears to be a common misconception amongst viewers of this video. Mike is / we are *_not_* raising 23 to the powers of 2, and then multiplying together the values that correspond to the bits that are set to 1 in the binary representation of the exponent. That is, we are *_not_* computing 23, 463, 400, 142, etc. and then multiplying them together. That is an alternative algorithm which reads the binary digits from right to left (least significant first) and requires more memory, since you need to store all those intermediate values somewhere until you've got them all. _[Note also that you made a mistake in that final step, summing the values rather than multiplying them; you should've got 131 as the answer, not 154.]_
What we *_are_* doing is reading the binary representation of the exponent _from left to right (most significant bit first),_ squaring a working value (a.k.a. "accumulator" or "acc" for short) with each digit that is read, and subsequently multiplying the acc by the base if that digit happened to be a "1". The acc starts out equal to the base. The working is as follows for the example in the video, i.e. base 23, exponent 373 (bitstring "101110101"), with all instances of "=" being congruences modulo 747:
• Read the first (nonzero) digit, which is (of course, being nonzero) "1". Set acc ← base = 23.
• Read the next digit, which is "0". Set acc ← acc² = 23² = 529.
• Read "1":
(a) Set acc ← acc² = 529² = 463.
(b) We read a "1", so multiply by the base: acc ← acc × base = 463 × 23 = 191. *_[This is where square-and-multiply deviates from your method!]_*
• Read "1": acc ← acc² × base = 191² × 23 = 182.
• Read "1": acc ← acc² × base = 182² × 23 = 659.
• Read "0": acc ← acc² = 659² = 274.
• Read "1": acc ← acc² × base = 274² × 23 = 431.
• Read "0": acc ← acc² = 431² = 505.
• Read "1": acc ← acc² × base = 505² × 23 = 131.
The answer is the final acc value: 23^373 mod 747 = 131.
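For anyone who wants to run this, here is a minimal Python version of exactly that left-to-right procedure (my own sketch, not code from the video; it assumes exp >= 1):

def square_and_multiply(base, exp, m):
    # Read the exponent's bits most-significant first: square for every bit,
    # and additionally multiply by the base when the bit is 1.
    acc = base % m                 # the leading bit of exp is always 1
    for bit in bin(exp)[3:]:       # remaining bits after '0b1'
        acc = acc * acc % m
        if bit == '1':
            acc = acc * base % m
    return acc

print(square_and_multiply(23, 373, 747))  # 131
print(pow(23, 373, 747))                  # 131, Python's built-in as a reference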
@@JivanPal Yeah, many thanks. I guess this could be described as an in-place calculation. OK, I'm going to edit my comment.
A comment that I've liked:
"This man has forgotten more about computers than I will ever learn."
Huh. I was using a similar but substantially slower method. Good to know it can be improved upon.
That's a wicked smart algorithm.
So is modulo distributive over multiplication? Is that why we can keep the numbers small?
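Essentially yes: the identity at work is (a × b) mod m = ((a mod m) × (b mod m)) mod m, which is why every intermediate result can be reduced and kept small. A quick brute-force sanity check in Python (my own, purely illustrative):

import random

# (a * b) % m == ((a % m) * (b % m)) % m for all positive a, b, m
for _ in range(1000):
    a = random.randrange(1, 10**9)
    b = random.randrange(1, 10**9)
    m = random.randrange(1, 10**9)
    assert (a * b) % m == ((a % m) * (b % m)) % m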
Amazed each time I watch your videos
Is there a video about modulo math commuting for both addition and multiplication? I don't remember one, and it seems to be explicitly required here.
What do you mean by "commuting" here?
This was fantastic!
Regarding the final point about reading a private key from power usage, wouldn't it already need to be on the hacker's computer anyway for this to work? In which case, they already have your private key.
No; see "side-channel attack". For example, the encryption may be running in a trusted execution environment (like ARM TrustZone or Apple Secure Enclave) and the adversary is running untrusted code outside of that environment that can determine power consumption / voltage levels / timings.
Another example: the adversary is sitting outside your house with a supersensitive probe in the electricity lines that run into your bedroom, where your desktop computer is plugged in, and is able to determine changes in voltage on the lines that way, which correspond to your computer's power consumption.
In both examples, the adversary does _not_ have any access to the trusted environment, but is able to acquire information about the behaviour of that environment which can be reverse-engineered to determine what was actually happening in that environment.
@@JivanPal thanks for the reply, that clears things up
Does this represent the smallest number of steps for any given exponent?
12:58 why can't we walk back? If you have the binary, you know the sequence of squares and multiplies, and there's an easy reverse...
This is really cool! I'm glad that i learned about binary numbers, because I wouldn't have been able to understand any of this otherwise.
Mike Pound - Always good value 8-)
Amazing video, I almost want to incorporate it into my program.
When making a really big number, if you can't do the intermediary mod trick, how would the computer handle overflow to another word?
If you can't do "mod" at any step of the way, you have to allocate enough memory for a multi-word number whose number of bits is at least the sum of the number of bits in both multiplicands, at every multiplication.
For example, if you're at a point where base^64 is 415 bits, you need at least 830 bits for a square (base^64 x base^64), since both multiplicands are 415 bits and 415 + 415 = 830. Then the multiplication proceeds as usual: on a 64-bit computer, bits 63 to 0 of each multiplicand contribute to partial sums at bits 127 to 0 of the result, and so on, until bits 447 to 384 contribute to partial sums at bits 895 to 768 of the result.
You could always pre-allocate enough bits for the entire number in advance. Then you would execute a modified square-and-multiply algorithm that first calculates how many bits the final result is likely to need given all the multiplications involved, allocates that (say it's 16190 bits), and then executes the proper square and multiply in multi-word arithmetic on 16190 bits.
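As a quick illustration of that sizing rule (Python's built-in bignums make the check easy; the 415-bit figure is just the example from the comment above):

a = (1 << 415) - 1           # a 415-bit number, standing in for some base**64
print(a.bit_length())        # 415
print((a * a).bit_length())  # 830 - a product never needs more bits than
                             # the sum of its operands' bit lengths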
I wonder if square and multiply would lose to a naive multiply-only method for a small modulus m and a large exponent x. The naive method would multiply every time and mod m, then cache the result, quickly generating a repeating pattern of length at most m, and then x mod patternLength would give the place in the pattern which is the answer. So the number of operations is at most m, compared to square and multiply, which can be very expensive for very large x.
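A rough Python sketch of that caching idea (entirely mine, purely illustrative; assumes exp >= 1 and m >= 2). One subtlety: if the base and m share a factor, the sequence can have a non-repeating prefix before the cycle begins, so the sketch records where the cycle actually starts:

def powmod_by_cycle(base, exp, m):
    # Build base^1, base^2, ... mod m until a value repeats
    # (at most m + 1 steps, by the pigeonhole principle).
    seen = {}    # value -> exponent at which it first appeared
    seq = []     # seq[i] == base^(i+1) mod m
    value = 1
    for i in range(1, m + 2):
        value = value * base % m
        if value in seen:
            start = seen[value]      # exponent where the cycle begins
            period = i - start       # cycle length
            if exp < start:
                return seq[exp - 1]
            return seq[start - 1 + (exp - start) % period]
        seen[value] = i
        seq.append(value)

print(powmod_by_cycle(3, 45, 7))  # 6
print(pow(3, 45, 7))              # 6, reference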
Interesting, so a different base (instead of 2) can give different performance. For base = 3, would it be called the cube, square and multiply algorithm?
It is true that in base _n,_ you will determine which of _n_ operations to do (e.g. for _n_ = 3, these are multiply, square, or cube). However, cubing is just multiplying a number by itself twice, so _n_ = 3 actually gives *_worse_* performance. In fact, _n_ ≥ 4 also gives worse performance, since what would be squaring twice in the _n_ = 2 case would become multiplying a number by itself four times. Thus, _n_ = 2 gives the best average performance.
Even more generally, square and multiply does not give the best possible performance / shortest possible algorithm for any given exponent, but it is the best we can do without first solving a harder problem for the exponent, which on average would give us much worse performance. That harder problem is this: suppose we know the totient of the modulus _m,_ denoted φ(m) (or equivalently, the order of the multiplicative group { g^x mod m | x ∈ 𝐙 }, where _g_ is the generator of the group, i.e. the base of the modular exponential we're trying to compute, which was 23 in the video). Then we have g^[φ(m)] mod m = 1 (by Euler's generalisation of Fermat's Little Theorem), and so
g^x mod m
= g^[q·φ(m) + r] mod m
= ( g^[φ(m)] )^q · g^r mod m
= g^r mod m,
where _q_ and _r_ are the quotient and remainder of _x_ divided by φ(m), respectively. However, computing φ(m) is hard unless the prime factorisation of _m_ is known, so in practice this is not used often. The hardness of this problem is what underpins the security of RSA.
yeah! in my assembly class, when we had to program there was no multiply instruction, so you'd have to shift the bits of the binary values in the registers to actually multiply two numbers. Don't even get me started on bringing values into registers - and these values can't be stored in just any register... and then pulling the value out... storing it elsewhere... god bless...
Very helpful.
thank you guys🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻, it was really insightful.
♥️
Could you make a video about opaque?
MIND BLOWING !
I like the fact that he just watches Numberphile casually
FINALLY SOMETHING I CAN APPLY MY MATH III KNOWLEDGE ON.
math rules, i love these kind of tricks
MOD is one of the slowest operations on a computer, normally taking 32 or 64 cycles. Instead of MODing by the actual number each time, can you MOD by the next higher power of 2 and then do a MOD by the real value at the end? MOD by a power of 2 is simply an AND with a bit-mask, which is very fast.
No, we cannot lop off upper bits that way and mod by the real number at the end.
Here's the answer if we rerun the square-and-multiply but use "mod 8" to reduce at intermediate steps. For those unfamiliar: "mod 8" is the same as "and 7", because "and 7" is a bitmask that keeps the lower 3 bits. E.g. 15 mod 8 = 7, and 15 and 7 = 1111b and 0111b = 0111b = 7.
Anyway,
3^45 mod 7 = 3^101101b mod 7
starts with 3
3 And 7 = 3
Square (3^1 -> 3^2): 3x3 = 9
9 And 7 = 1
Square (3^2 -> 3^4): 1x1 = 1
1 And 7 = 1
Multiply (3^4 -> 3^5): 1x3 = 3
3 And 7 = 3
Square (3^5 -> 3^10): 3x3 = 9
9 And 7 = 1
Multiply (3^10 -> 3^11): 1x3 = 3
3 And 7 = 3
Square (3^11 -> 3^22): 3x3 = 9
9 And 7 = 1
Square (3^22 -> 3^44): 1x1 = 1
1 And 7 = 1
Multiply (3^44 -> 3^45): 1x3 = 3
Final reduction with mod 7: 3 mod 7 = 3
The answer given by Dr Pound (and indeed WolframAlpha) is 6, but with "mod 8" intermediate reduction, we get 3.
We could reduce "mod" a multiple of 7, so "mod 14" would be a valid intermediate reduction, as would "mod 21", "mod 28" and so on. However, none of those multiples of 7 are going to be a power of 2 and allow for fast lopping off of bits with AND: a power of 2 would have only 2s as factors, and not any 7.
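To see both outcomes side by side, here is a small Python check (my own sketch; "mod 14" is valid at intermediate steps because 7 divides 14, whereas "mod 8" is not):

def modexp_reduced(base, exp, inter_mod, final_mod):
    # Left-to-right square-and-multiply, reducing by inter_mod at each step,
    # with a final reduction by final_mod. Assumes exp >= 1.
    acc = base % inter_mod
    for bit in bin(exp)[3:]:
        acc = acc * acc % inter_mod
        if bit == '1':
            acc = acc * base % inter_mod
    return acc % final_mod

print(modexp_reduced(3, 45, 14, 7))  # 6 - correct, since 7 divides 14
print(modexp_reduced(3, 45, 8, 7))   # 3 - wrong, since 7 does not divide 8
print(pow(3, 45, 7))                 # 6 - reference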
Crazy impressive
fooking genius mate
What data types hold numbers that big? If this is so common why is it so hard to find libraries to do computations on numbers this big?
OpenSSL has a big number library. If you look at their documentation, look up BIGNUM and any function starting with BN_*. It dynamically allocates a buffer of however many bits it needs, so I guess the data type is an array of bytes. There are other bignum libs out there; that's just the one I've used.
@@DFPercush thanks, I’ll give it a try.
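Worth noting that some languages have arbitrary-precision integers built in, so no library is needed. Python's three-argument pow, for instance, does modular exponentiation on ints of any size:

print(pow(23, 373, 747))  # 131 - reduces as it goes, stays fast
print(23**373 % 747)      # same answer, but materialises the enormous intermediate value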
Squaring is a bit like multiplication, while multiplication is a bit like addition.
Just in the amount it will change the overall result.
Cubes will be easier in this case. 3^3 = 27, which is (-1) in mod 7. 45 = 3 * 15, so the answer is (-1)^15 = -1 = 6 in mod 7.
Easier for humans, not for computers - please don't forget that computers don't have instinct like us humans, and writing out each special case is a nightmare. This method can be generalised with complexity O(2*floor(log2(n))+1), where n is the exponent, which is very efficient (big O meaning worst case).
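(For the record, the parent comment's shortcut checks out; here is a one-line Python verification:)

# 3^3 = 27 = -1 (mod 7) and 45 = 3 * 15, so 3^45 = (-1)^15 = -1 = 6 (mod 7)
assert pow(3, 45, 7) == (-1) % 7 == 6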