Generating an AST instead of raw code would cut out a lot of non-runnable code. Asymptotically the same, but it makes a little more sense. Oh, and it's also funny that once in a while it is going to generate itself, so it's kind of a quine :)
The moment I got my head around it, my brain instantly went to a version of universal search that uses assembly instructions and numbers instead of all characters, with the goal of finding optimal algorithms for math functions through brute force.
That actually exists. Compilers encounter multiplication by constants (e.g. size * 13) or other math functions. Compilers that generate highly optimized code have libraries of small chunks of code that compute specific math functions. They build that library by trying all possible small chunks of machine language.
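As a toy illustration of that idea (nothing like a production superoptimizer; synthesize_multiply and the shift-and-add-only "instruction set" are made up just for this example), here is a brute-force search over tiny chunks in Python:

    from itertools import product

    def synthesize_multiply(constant, max_terms=4, max_shift=5):
        # enumerate all short "chunks" of the form (x << s1) + (x << s2) + ...
        # and return the first one that computes x * constant
        for n_terms in range(1, max_terms + 1):
            for shifts in product(range(max_shift + 1), repeat=n_terms):
                if sum(1 << s for s in shifts) == constant:
                    return " + ".join(f"(x << {s})" for s in shifts)
        return None

    print(synthesize_multiply(13))  # finds something like (x << 0) + (x << 2) + (x << 3)

Real superoptimizers search over actual machine instructions and verify candidates exhaustively or with an SMT solver, but the shape of the search is the same.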
Asymptotic complexity is great most of the time; you just have to not throw out the constant factor, and it might not be the best idea to lexicographically order all programs by it.
This is why we don't use big O in our course about algorithms. We use something like tilde notation, which keeps the coefficient of the highest-exponent term, and you get a much better representation of the performance of an algorithm when comparing it to other algorithms.
You might enjoy looking at the video description! :) Turns out you can make the multiplicative constant 1.01 and hide all the horrible stuff in the additive constant!
Any notation that hides information has to be assumed to only be useful in certain cases. The tilde that gives you slightly more information is nice, but is a level of abstraction that you sometimes don't need when simply analyzing an algorithm; I really don't care if an algorithm is 1000(n log n), since that's worse than being n^2 regardless, and the coefficient isn't going to be higher in any reasonable algorithm.
Well, it's a worse bogosort for all algorithms. It's basically a random output generator where for each output you check whether it is correct. If you get a correct output from an algorithm, it doesn't mean it will give a correct output for a different input. You could optimise it for the problem you are solving, for example by generating pairs of 2 INTEGERS, which would never output 'banana', but that's no fun given the context of universal search. This also only works for problems with constant time complexity for checking the solution, otherwise it wouldn't be "asymptotically optimal".
It's actually better than bogosort, because this algorithm is at least guaranteed to give you the correct answer in a finite amount of time while the time complexity for bogosort is unbounded, meaning it could just never land on the correct permutation if you're "lucky"
@@postmodernist1848 It's better than random bogosort, but roughly equivalent to deterministic bogosort. Deterministic bogosort works in effectively the same way as universal search, by iterating through all the possible solutions.
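A tiny sketch of that deterministic flavour, just to make the analogy concrete (generate candidate outputs in a fixed order and return the first one that passes the check, which is exactly the generate-and-verify shape of universal search):

    from itertools import permutations

    def deterministic_bogosort(xs):
        # try candidate outputs in a fixed order; keep the first one that checks out
        for candidate in permutations(xs):
            if all(a <= b for a, b in zip(candidate, candidate[1:])):
                return list(candidate)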
A simple way to speed up the search in this specific case, is to simply start from the other end. We know that in encryption, it will be two fairly large numbers of fairly similar magnitude, there won't be a factor of, say, 17. So, just start at the sqrt(x) and work downwards. To speed it up a bit more, remove simple to detect factors (2, 3, 5...). It won't speed up the general factorization case, but for cryptography, it'll make a big difference.
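A minimal Python sketch of that strategy, under the stated assumption that the input really is a product of two primes of similar magnitude (as in RSA):

    import math

    def factor_near_sqrt(n):
        # strip a few small factors first, as suggested above
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return p, n // p
        # then walk downwards from sqrt(n) looking for a divisor
        a = math.isqrt(n)
        while a > 1:
            if n % a == 0:
                return a, n // a
            a -= 1
        return 1, n  # n is 1 or prime

    print(factor_near_sqrt(3233))  # (53, 61)

Of course this is still exponential in the number of digits; it just shaves the constant for balanced semiprimes.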
I find it weird that the most common measure of complexity in computer science is big O, ignoring the constant factor of the highest-order term. In mathematics, I would only ever mean the equivalence relation "~" by the phrase "asymptotic to...", i.e. the ratio of the two functions tends to 1 in some limit (which could be n tending to infinity, as is always meant in computer science). And I think this is a much more generally useful notion for algorithms whose behaviour is well understood (i.e., all the constants are easily measured and/or bounded). Even just giving an order-of-magnitude upper/lower bound on the coefficient of the leading term is enough information to tell you whether the algorithm is practical or impractical.
For example, someone might see that heap sort and merge sort are both O(n log n) comparison sorts and just think "they are both optimally fast, so I'll use heap sort because merge sort has a higher space complexity". But this misses the point that merge sort takes ~ n*log_2(n) comparisons, while heap sort takes ~ 2n*log_2(n) comparisons, in the worst case (this can be optimised, but it's the simple case that demonstrates the point). So the truth is that heap sort takes O(n log n) *more* comparisons than merge sort, which might help explain the trade-off merge sort is making of space for time.
I totally agree that this way, you get more information about the algorithm! But it also gets hard to compute the leading constant, unless you decide to measure something as simple as "number of comparisons". Also, you don't capture cache behavior (important for quicksort) etc. Finally, it turns out (check the video description) that there is a universal search with the property that if there is an f(n) algorithm, the search has complexity 1.01f(n) + O(1), so even in your way of measuring complexity, universal search looks like an amazing algorithm.
You gave so little spotlight to the fact that Universal Search is an asymptotic limit of all those sequences of better and better algorithms... It is truly a wonder that problems don't define open lower boundaries in the space of algorithms under the asymptotic-complexity topology!
So you can theoretically apply this idea to find the asymptotically fastest algorithm for matrix multiplication? Take the input size to be N = n^2, where we are dealing with square matrices; then we know that there exists a (probabilistic) algorithm running in linear time (linear in N, the input size) that can check whether the resulting matrix C is in fact the solution to A * B (two matrices of size N). Then we can simply use this search to generate C, use this linear-time check, and get the "best" algorithm!
not really - what if you got lucky and found a random program that happened to find the solution for A*B, but it's useless for multiplying other matrices? you would need to run Universal Search for each new pair of matrices A and B you need to compute.
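For reference, the linear-time probabilistic check mentioned above is usually done with Freivalds' trick; a minimal sketch assuming NumPy and integer matrices:

    import numpy as np

    def probably_product(A, B, C, rounds=20):
        # Freivalds' check: multiply by a random 0/1 vector instead of computing A @ B.
        # Works with exact equality for integer matrices; use a tolerance for floats.
        n = C.shape[1]
        for _ in range(rounds):
            r = np.random.randint(0, 2, size=(n, 1))
            if not np.array_equal(A @ (B @ r), C @ r):
                return False
        return True

Each round costs only O(n^2) work and catches a wrong C with probability at least 1/2, so a handful of rounds is plenty. And as the reply above points out, passing this check for one pair (A, B) says nothing about other inputs.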
Imo it would be fun to use Symmetric Interaction Nets (which are useful for implementing something like Lamping's Abstract Algorithm) as the basis for Universal Search.
Nice catch! If you want to get rid of this, you have to think of checking together with the simulated program as one program. By that I mean that your allocated time is counted also for checking, so maybe you run out of it and you have to wait until the next iteration to continue checking. If you do it this way, everything sums up nicely and you get O(f(n)). We really did not want to get into that, since anyway for f(n) > n log(n) this term is dominated by O(f(n)), so you can also think of this imperfect search as achieving O(f(n) + n log n).
So the guarantee on any problem (say P problems) should be O(f(n)*log(f(n))) right? Checking can only be as hard as just solving the problem all over again!
@@yanntal954 We assume that, in this case, checking works in linear time and f(n) is the running time of the algorithm we compare ourselves with. Then, I claim the universal search with exponentially allocated steps with the trick I mentioned above is O(f(n)). Does it make sense?
@@PolylogCS I was talking more in general not just factoring, You should get this as a guarantee. It may not be tight but it will always be a valid upper bound! I think assuming Factoring is Omega(n*log(n)) is pretty reasonable and thus for factoring you should get O(f(n)). In general the checking process can take at most O(f(n)) (yet can't be more than that!) thus you get the aforementioned guarantee I mentioned
How much would it be possible to optimize Universal Search with AST-based code generation rather than text strings? If you start iterating over the input space of all possible grammatically valid programs rather than all possible text strings, this would cut down your constant factor by a fair degree. Of course, Python's not a particularly metaprogramming-friendly language, but I bet it could be done with some package somewhere...
Let us know if you manage to make this work :) Perhaps you can reach the naive factoring algorithm in such a universal search, but not much farther, I guess...
This is basically what Program Synthesis has been doing since the 1980s. How it usually works in practice is that Python is too big, so you usually define a custom DSL, a specialized mini language, tailored to the algorithm you're trying to generate. For example, if you're trying to find a sorting algorithm, you define a mini language that has lists and common operations on them (swapping, iterating, ...) as primitives. More importantly, the mini language *doesn't* have anything else not relevant to sorting: no exceptions, no OOP, etc... You're essentially 'cheating', making things easier for the search algorithm by restricting its search space. Come to think of it, this bears resemblance to neural network architectures as well: they have certain human-designed hyperparameters, and the training algorithm can focus on merely tweaking trainable parameters within those constraints. Program Synthesis is fascinating, with plenty of applications and analogs. SQL databases already have a program synthesizer that millions use every day (the query planner).
I think this is analogous to asking a neural network to do it. The reward function would be checking if the output is correct and rewarding the network accordingly
yes, because the output of training is a model, and those models could represent the outputs of any function on a finite domain (to even put the input into a computer requires that the domain is finite)
I love how it's just a matter of having an infinite time span, filling an infinite memory bank, processed through an infinite number of computational units, to get the best search algorithm. 👍
A problem I see with this is that as the number to be factored gets larger, any program using actual variables will start producing wrong results due to variable size limitations. So while L is a constant for each program, it might diverge towards infinity if there is no program that will accurately factor _all_ numbers at the fastest possible time complexity. If there is a variable type that can handle any unlimited amount of digits, and that can be used in a program with the fastest possible time complexity, then that might be solved. But I don't know whether such a variable exists and can be used in the programming language this algorithm runs in
This is a great question! It ultimately leads to "what is your mathematical model of a computer?" Let's say that our model is the Turing machine. There, there are no "variable types" and so on, so I hope that with this model of a computer, everything we say in the video makes sense and you can see that we prove a mathematically correct statement there. The Turing machine is not a good model of real computers; a more usual model is the so-called Word RAM model, which is basically C, but you assume that if the input has n bits, the basic variables you use have log(n) bits, so that you can e.g. index your input. For this model, our statement is also correct. Your problem, as I see it, is basically that for very large inputs, maybe the Word RAM model is not a good model of a computer. And in that you are right; similarly to asymptotic complexity, Word RAM is just another model. Not sure how much sense I make :)
@@PolylogCS yeah, if you have a variable type that can have any power of 2 as its size (and thereby display any number in a finite variable) then this works out
Python supports arbitrarily large integers out of the box.
@ It's likely not "real" if you can get beyond 512- or 1024-bit integers, meaning it's likely in part software giving you the abstraction of super large integers via some collection of more standard 32/64-bit ones. The reason I say likely is that you do have some 128-bit registers nowadays; there's a flag you can throw at your C compiler to see. And since we have SSE, AVX and their variants, it may be that Python, when it's built on a system, passes some flags to the C compiler to build with AVX extensions, and maybe you can get some 512- to 1024-bit ones. But typically that's something I've seen when installing pandas/numpy. More broadly though, you can't address anything smaller than a byte on modern systems, so an 8-bit integer; anything smaller would be technically impossible. So if you can have "odd"-bit integers or anything beyond the max register size among all the AVX/SSE registers, then I do think it has to be a software abstraction. Probably zero-padded ints for bit sizes less than 64, and some arrays/linked lists as a software implementation, with software-defined integer operations under the hood, for anything crazy large beyond your max AVX/SSE register size.
@@stevepa3416 It doesn't matter if those numbers are not like that in hardware, as long as they are useful in software, so calling them "not real" is a disservice to them =). Those numbers being "arbitrarily large" basically means that instead of being limited by the architecture-defined word size, they are limited by how much RAM you (can) give to the program. So they can be useful for universal search.
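A quick way to see that behaviour in plain CPython (no extra libraries; the Mersenne exponents below are just convenient examples):

    p = 2**521 - 1            # a Mersenne prime, far beyond any 64-bit register
    q = 2**607 - 1            # another Mersenne prime
    n = p * q
    print(n.bit_length())     # 1128 -- the product is exact, no overflow
    print(n // q == p)        # True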
What is the smallest number for which this algorithm ends with the result of a sub-algorithm that didn't simply guess the right result but really did some calculation? ^^
Good question. My guess is that the naive algorithm we showed, written as concisely as possible, could be among the first nontrivial algorithms to finish.
My favorite universal search algorithm is very simple and very powerful and it runs in the same time complexity as it takes to check if an answer is correct. step 1, create a universe for every possible input. step 2, check if the answer is correct. step 3, if not correct. Destroy the universe. Leaving you with only one universe with the correct answer right away
I've been thinking a bit about this vid and while I do appreciate the concept, I'd like to point out a flaw. It's important to note that any brainf*ck program can be represented exactly by a one-tape Turing machine (so it is bounded in speed). The extended Church-Turing thesis only yields that a 1-tape Turing machine can implement an algorithm from any other computational system with some polynomial slowdown, and that polynomial slowdown can pretty trivially be shown to be problematic even for basic problems like recognizing palindromes, where any matching 1-tape machine is at best O(n^2) while with a multi-tape setup we can see a simple O(n) method. You can say that if factoring is in P, you've found a poly-time algorithm for it, but you can't be more fine-grained than that without more care.
Small correction: the typical one-tape turing machine model involves pre-pasting the input onto the work tape, so brainf*ck could be faster in some very specific instances due to its relative freedom in i/o. We can properly bound the problem with a 3-tape turing machine with a work tape, input tape and output tape where the head on i/o tapes can only go right, though the analysis does not change (and the example of palindromes is not deprecated either due to our restriction on the input tape head movement and lack of knowledge of the string length)
What if the algorithm, by pure chance, found the RSA factorization to the bank account with the most money in it, and then, by pure chance, moved all that money into your account, and then, by pure chance, made it seem like no money ever left (just duplicated it), and then, by pure chance, any flags that would pick up on this on your account were 'disabled,' and then, by pure chance, any possible way for this to be noticed was 'programmed out,' all via 'pure chance.' So... what then?
@@PolylogCS Hopefully we find a 'computer virus' that hacks the world banks and "clones" money directly into your bank account... all without being traced!
This is interesting! I always thought of running some kind of Genetic Algorithm for a long time on a PC to generate a code to solve a very simple problem. I was curious how long it takes for an evolutionary process to come up with a solution (read find a species that fits the environment). Maybe that idea can be used here?!
I implemented this idea in Lisp a few years ago; I can give you the GitHub link if you're interested. Although it was very poorly implemented: for example, finding a program that could solve a quadratic equation took something like 20 minutes, if I remember correctly.
Let's just run this algorithm on a supercomputer with something other than python. After all we need to find an algorithm for it only once to break encryption.
This is a universal valid-program generator, valid for one criterion and one language set. Many programs do something, but not always what you'd think. Universal search is a very interesting thing anyway. Just my noob opinion on searching.
If you wanted to generate a program which multiplies two large prime numbers, you would have to factorise each result you get, but you don't know how to do that, so you can run a universal search for it.
I feel like if you actually write this program, the simplest program to “factor” a number will literally just print its factors without actually factoring
If you apply this algorithm to a case where there is no answer, (like factoring 5) your program would never finish. We know that none of its trial programs can ever output a factorization of 5, i.e. the algorithm will keep spawning new trial programs indefinitely; and it would never tell you that 5 is not factorizable. This means that the algorithm would be limited to inputs of which we _know_ ahead of time that they can be factored into at least two integers (i.e. only non-primes). And it doesn't seem like the primality question is solvable with a universal search. It would only work for tasks like sorting, where we know that every input has _an_ answer that satisfies the "is this sorted" test. One further bonus problem that you would care about if you wanted to run this algorithm in practice is that the algorithm would also eventually write all _viruses_ that are possible in python or brainfuck. I.e. your computer might crash before it gets to the factorization of 4. [Insert "In soviet Russia, factoring numbers hacks YOU" style joke.]
There are no viruses in Brainfuck, since the only operations that interact with the environment are "get a character from the input" and "write a character to the output." And remember, we can't use `eval`, since we need to run things one step at a time, so we actually need to write our own interpreter. So, just leave out all the file/network access, and everything that can be used to make a virus.
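A minimal sketch of what such a sandboxed, step-at-a-time interpreter could look like in Python (just an illustration, not the implementation from the video; it assumes balanced brackets, and the lenient-bracket variant discussed further down the thread would replace the jump-table construction):

    def make_bf_stepper(code, input_bytes=b""):
        # precompute bracket jump targets (assumes balanced brackets here)
        jump, stack = {}, []
        for i, c in enumerate(code):
            if c == '[':
                stack.append(i)
            elif c == ']':
                j = stack.pop()
                jump[i], jump[j] = j, i
        tape, out, inp = [0], [], list(input_bytes)
        state = {"ip": 0, "ptr": 0}

        def step():
            ip, ptr = state["ip"], state["ptr"]
            if ip >= len(code):
                return bytes(out)                 # halted: hand back the output
            c = code[ip]
            if c == '>':
                ptr += 1
                if ptr == len(tape):
                    tape.append(0)
            elif c == '<':
                ptr = max(ptr - 1, 0)
            elif c == '+':
                tape[ptr] = (tape[ptr] + 1) % 256
            elif c == '-':
                tape[ptr] = (tape[ptr] - 1) % 256
            elif c == '.':
                out.append(tape[ptr])
            elif c == ',':
                tape[ptr] = inp.pop(0) if inp else 0
            elif c == '[' and tape[ptr] == 0:
                ip = jump[ip]
            elif c == ']' and tape[ptr] != 0:
                ip = jump[ip]
            state["ip"], state["ptr"] = ip + 1, ptr
            return None                           # still running

        return step

Calling step() repeatedly returns None while the program is still running and the accumulated output bytes once it halts, which is exactly the interface a dovetailing loop needs; the only "I/O" is an in-memory input string and output list, so no generated program can touch the system.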
What if you find a non-asymptotically-optimal program so long before you find an asymptotically optimal one that it successfully factors your number before you ever get a chance to find the better one?
To avoid syntax errors being generated from your random programs, you simply need to define a way to store the program such that syntax errors aren't possible; bytecode, for instance, can be good at this.
9:19 Wouldn't you get something that's roughly O(f(n)*log(f(n)))? Perhaps even O(f(n)*log^2(f(n))) by analogy to the previous construction, even though you won't get a triangle as before but you could perhaps rearrange the runs in such a manner?
@@PolylogCS I am not sure where that extra n*log(n) addition came from. But to my knowledge, log-RAM machines aren't physically possible, because log(n) for every n means your machine has an unbounded word size that it can get constant-time access to. For that reason I think the O(n log n) time you get from the Harvey and van der Hoeven algorithm is (currently) your best bet, meaning the assumption that multiplication takes linear time probably isn't yet realizable?
@@yanntal954 If we take this discussion further, even the Harvey and van der Hoeven algorithm is not physically realizable, since it assumes you can do random access, which is not possible in our 3D space! (If you put too much information in one place, it collapses into a black hole.) So maybe the Turing machine is the right model after all? But for large enough n, the universe suffers heat death before you finish computing anyway. What I want to say is that for every model of a computer, you can make an argument that it is not physically possible. In practice, Word RAM or the pointer machine (there is btw no assumption of log(n) word size in the pointer machine model) are good models, and maybe just view our video as a mathematical statement about these models.
There's a paper called "A note on probabilistically verifying integer and polynomial products" by Michael Kaminski from 1989 that basically lets us test in linear time, probabilistically (with high probability of course), given integers a, b and c, whether a * b = c. The reason it can be faster than Harvey and van der Hoeven's algorithm is that it only checks; it doesn't actually perform the multiplication (at least not over Z). So, if a good enough pseudo-random generator is used, you can do this in linear time in the multitape Turing machine model!
I wonder if you can improve universal search? If you have N=p*q, then when factoring N, a program of the sort "print(p,q)" will show up. Since the length of p and q is roughly the length of N, that means the order of the algorithm is faster than O(n). (I mean, of course we knew that: even guessing each number is O(sqrt(n)).) But if you knew that a factoring algorithm existed and ran in "K" steps, you could provide an upper bound on time, and any program running more than K steps can be stopped. But I suppose the idea of that would be to find the holy-grail factoring algorithm, and not just have to search every time. If you set a bound with the exponential growth as mentioned previously, you would have a runtime of K*L + 2*f(n). While L is ungodly big, as in the video, the factor in front isn't as bad.
If the upper bound K is based on some known factoring algorithm (e.g. guessing each number) then K is a function of n. Let's call that K(n). Then the runtime bound you give for the search algorithm is K(n)*L + 2*f(n) = O(K(n)), i.e. this expression is actually dominated by K(n). The algorithm with the upper bound is actually still faster than universal search, it's just that you don't actually run all those (L-1) bad programs the full K(n) steps, since the optimal algorithm L may terminate before you simulate all those extra steps in all those bad programs. This makes a big difference, if n is large enough. So the runtime is still O(f(n)) overall, and the constant factor is probably still something like 2^L if I had to guess.
Yes, I think that's mostly correct. However, I don't think the constant factor is of order 2^L. The number of operations is bounded by K(n)*L + 2*f(n), and since a simple algorithm is to guess, K(n)=n in this scenario. So n*L + 2f(n) is the time, and of course n*L dominates, but is far less than 2^L. (Would it be appropriate to call this O(nL)?)
@@purplenanite n is variable, 2^L is constant, so n gets much larger than 2^L as n gets arbitrarily large (i.e. asymptotically). In general, K(n) is asymptotically larger than f(n), so a*K(n) + b*f(n) is on the order of K(n), regardless of constants a and b. The point is that this bound isn't better than the original bound of 2^L * f(n), which still applies.
For an algorithm that works, there will be numbers such that writing print(p, q) is an even longer query thanks to how many digits it takes to represent p and q.
I wonder if it would be "useful" to have some sort of hyper threaded supercomputer somewhere just constantly running universal search looking for solutions to famous problems. And it just runs forever. And spits out what it finds for us to look at.
Problems for which no optimal solution exists aren't necessarily bizarre/pathological: it's thought that matrix multiplication is such a problem. Strassen-like algorithms have been able to push the complexity down from O(n^3) to O(n^2.3728596). However, it is known that the infimum O(n^w) complexity is not achieved by any algorithm. I don't know enough to know if this precludes optimal algorithms with complexities like O(n^a * log(n)) or others not of the form O(n^a).
Hm, you can actually check the result of matrix multiplication in quadratic time, modulo randomness, and probably throwing one log factor to the complexity. This should imply via our universal search that there exists asymptotically optimal algorithm, modulo randomness, and one log factor. Is this contradicting your claim or not? Do you have a link to a discussion about this topic?
@@PolylogCS I don't have a better source than wikipedia / Computational_complexity_of_matrix_multiplication, which references a 1981 paper by Coppersmith and Winograd. And yeah, as far as I can tell, this doesn't mean there's no optimal algorithm with a log factor or something like that. There are a few ways that the impossibility of achieving the infimum w could be reconciled with the universal optimality of universal search: - w is a more rough measure than O since it ignores log factors - I only know of a randomized algorithm to certify matrix multiplication in quadratic time like you alluded to, and I'm not sure how that interacts with universal search - Universal search has the runtime of the optimal algorithm, so if there is no optimal algorithm, i.e. for every algorithm with a runtime of O(n^a) with a > w, there is another algorithm with runtime O(n^b) with a > b > w, then the time complexity of universal search simply wouldn't be well defined
@@hacatu Regarding your last point: the complexity of universal search is always well-defined in the sense that it is some function of n (I am not claiming it is a simple function or anything like that, just that it is well defined in the sense that it makes sense to talk about it). So, you can view the argument in our video as saying that under certain circumstances I can actually prove that there exists an asymptotically optimal algorithm for a given problem. For example, whenever you give me a sequence of algorithms with complexities n^a1, n^a2, ... with a1>a2>... for matrix multiplication, I look at my Universal search (checking by Freivald's trick which I assume should be in O(n^2 log(n)) randomized complexity) and if I then go through the argument in our video, I conclude that there exists a concrete algorithm (universal search for this problem) that has asymptotic complexity O(n^a1 + n^2 logn), O(n^a2 + n^2 logn), ... In particular, assuming that all a_i's are strictly bigger than 2, my algorithm is asymptotically as fast as any of the algorithms in the list. I conclude that there exists a "limit" algorithm at least as good as any algorithm in your list. Does it make sense?
@@PolylogCS I think that the closer an algorithm gets to the limit w, the longer the length of the shortest implementation L, and so if a program actually achieved the limit w, it would have infinite length. Universal search usually sweeps the length of the program under the rug, because it hugely affects the runtime but is independent from the input size, but it can't sweep an infinite length under the rug. But that's not a fully satisfying argument, because universal search matrix multiplication obviously has some fixed length K, so how could L(a) ever exceed K? If any program has length L ≥ K and is O(n^a), then by construction the complexity of universal search is at most O(n^a). So I'm not sure
@@hacatu Think of it in this way. Given an algorithm A, denote with A(n) the maximum time it takes to solve any instance of a problem of size n. The construction presented in the video shows that the algorithm US has the property limsup US(n)/A(n) < +oo, for every algorithm A. This implies that there cannot be any algorithm which is better than US in the sense that lim A(n)/US(n) = 0, i.e. US is optimal.
What if it just finds the program: printf("2 2")? Wouldn't that also be considered a solution? Or do you just have to check the program with multiple numbers so that it's unlikely the simple printf cheat would work?
This is absolutely possible and very likely. The search will just terminate there and will be happy with the valid solution. Note that we are only making a worst-case argument about the time complexity of the algorithm, but it is very possible that it will coincidentally finish faster when it encounters a completely wrong algorithm that accidentally produces the correct answer. The point is that in the extremely unlikely case that no such wrong algorithm exists, it will still eventually get to the 'correct' algorithm that produces the correct answer.
@@PolylogCS I see. So an upper limit to how much time is needed to find the wrong or right solution since it only has a flawed acceptance function to check. This feels a lot like training a neural net with a naive fitness function 😂
Funnily enough, if I had to guess, this algorithm will probably just generate a print statement before actually generating something that solves the problem, since the print would probably be shorter character-wise.
Hmm, in practice, how often does base US actually find a general algorithm? Ignoring the input and just emitting the constants that happen to be the factors for the given instance of factoring seems like it would be shorter in length + execution time than a general algorithm that reads the input, does work to factor it, and then emits the factors. At least with the improved one and sufficiently large inputs, the actual size of an algorithm discovered by US is likely to be shorter than the length of the input, so that is good. I'm reminded of Permutation City, which you have almost certainly read, but if you haven't, check it out. It is an excellent book by Greg Egan.
simply using the fastest multiplication algorithm for checking isn't good enough, since you're not just multiplying numbers of length n. You have to check numbers of any length that can be generated in the available number of steps, which will put the time taken around f(n)log(f(n)) (multiplication can be done in nlogn for numbers of length n) for the biggest multiplications. To get there your checking function can just count the length of the factors, since if they're too long they can't be correct. That will make it around nlogn, rather than f(n)logf(n). doing all checks puts you around n*logn*log(f(n)) (overestimate for log(f(n)) checks of numbers with length f(n), in reality the max output length is decreasing for the later programs). this gives you O(f(n)+n*logn*log(f(n))) in the very end, for factoring we can assume that the first term will outgrow the second, so O(f(n)).
Actually, multiplication can be done in linear time in the standard RAM model or on a pointer machine. But you are totally right that if you choose a more fine-grained model, then we are not optimal and are probably losing at least one log factor for the multiplication.
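For concreteness, a sketch of what the checking step for factoring could look like in Python (the length cutoff mirrors the suggestion above; the function name is made up, and the model-of-computation caveats from this reply still apply, since big-int multiplication here is whatever CPython provides):

    def check_factorization(n, output):
        # the candidate output must be two integers > 1 whose product is n
        try:
            p, q = output
        except (TypeError, ValueError):
            return False
        if not (isinstance(p, int) and isinstance(q, int)) or p <= 1 or q <= 1:
            return False
        # reject absurdly long "factors" before multiplying, as suggested above:
        # if p * q == n, the bit lengths of p and q sum to at most n.bit_length() + 1
        if p.bit_length() + q.bit_length() > n.bit_length() + 1:
            return False
        return p * q == n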
To accurately say that this is an asymptotically optimal algorithm, there is a need to explain why we wouldn't stop at some solution with complexity g(n) such that O(g(n)) != O(f(n)). Suppose L is the so-called `number` of an algorithm with complexity g(n) within the list of all `compiling` programs, and L' is the similar number of an algorithm with O(f(n)), where f is the optimal complexity. The number of steps allotted to the g(n) algorithm is actually the number of steps allotted to the f(n) one multiplied by some constant (somewhere near 2^(L' - L)). So, as n approaches infinity, g(n) * C > f(n), due to O(g(n)) != O(f(n)) and f(n) being the optimal algorithm. Also, the last statement, that there is no infinite sequence of decaying complexity functions, is unclear to me. We assumed that L in the analysis is a finite value, but if such a sequence exists, wouldn't that assumption be wrong? Correct me if I am mistaken somewhere. Also sorry for my poor English, it is not my native language :sadge:.
Great question! I agree that the argument about the infinite sequence of better and better algorithms is not explained in very much detail. Let me give you a small hint: imagine there were such a sequence of algorithms; our "Universal Search" algorithm is asymptotically not slower (or faster) than all of them. Well, then this very algorithm, which we implemented in Python, is also somewhere in the list, and it would itself be one finite algorithm with some very specific (maybe unknown) complexity. Does that help you understand it better?
So, the only problem is O? You can solve that fairly easily, if you design the hardware specifically for the task at hand, even more than CPU and GPU are designed for graphics and computation, then you can cut down on O by huge amounts and even on the time complexity itself by streamlining processes. This is the exact method the NES used to avoid the lag so common on other systems of its time and it shows.
Amazing video! I have a question though... wouldn't it be possible to run into an asymptotically unoptimal code before the asymptotically optimal one? For example, the universal search might start simulating an implementation of selection sort (with O(n^2) time) before merge sort (with O(nlogn) time). In this case, the universal search wouldn't be asymptotically optimal.
This is the same principle that's behind the universal Turing machine, right? Especially the diagonalization part. I wonder if it's called universal search because of "universal" Turing machines.
Yes, I believe these are very similar concepts. I am not sure if in the Universal Turing Machines you typically generate all the valid programs and run them in parallel like we do, but I don't see why you could not. I think the UTM was mostly a concept that the machine can execute any other program that it is given as an input. On the other hand the "Universal Search" does a bit more than that as it uses the UTM as a subroutine while generating all the possible programs and running them in parallel in a smart way. Interesting question about the origin of the name - this website seems to elaborate a bit more about the connection and also suggest that the name was inspired by the UTM: www.scholarpedia.org/article/Universal_search#Universal_search
@@PolylogCS The dovetailing of operations is what made me think of the universal Turing machine. If I recall correctly, the universal Turing machine uses the concept to simulate branches of execution. If it didn't, then the whole machine would loop if only one branch looped. Interesting read as well, thanks!
Idk what needs two videos to make and explain when all you have to do to find all the factors is this:
let num = 4;
function betterFactorFinder(n) {
  let factors = new Set();
  for (let i = 1; i <= n; i++) if (n % i === 0) factors.add(i);
  return factors;
}
If we assume that this algorithm will go over all possible programs, then for each composite number there exists a program that returns its factors... thus making the asymptotic time O(1) ? For example, in the case of 4, the algorithm might find the following program "return 2, 2"
Not quite - if your number is **really large**, the search will find a "correct" algorithm before any other one. It's not O(1) because going through all the algorithms to find it is really difficult.
I didn't really understand one thing: if the universal search has time complexity O(f(n)) if there is an algorithm with time complexity f(n), why should it "pick" the fastest algorithm? Like say there are two algorithms, A and B, to solve problem P. A has time complexity O(x) and B has time complexity O(x^2). Why would universal search have a time complexity of O(x) instead of O(x^2) when used to solve problem P?
Wouldn't universal search in general keep finding hardcoded solutions before any actual algorithm? Because if that's the case, the algorithm should have the same complexity as a brute-force algorithm.
For practical purposes, yes. You have to imagine what happens when the number to factor is so large it does not fit into the universe!
Think about what happens if hardcoding the answer takes more bytes than computing the answer. As an example, instead of thinking about factoring, think about finding the asymptotically optimal algorithm to decode an mp3 file to wav. For the sake of argument, let's assume that the optimal algorithm takes 1 GiB to write down, then you will find it before hitting the hardcoded solution, if your output wav-file is longer than 1 GiB.
@Patrick Lenihan Well, hardcoding is just a special case of your 'incorrect algorithm'.
Great video. What's nice about this algorithm is that it also solves the halting problem for free; each time you run it you could add each program that completes to a database, and then anyone who writes a program and wants to know whether it halts can simply look it up in that DB and get their answer, and depending on the length of the program this could be a lot quicker than running it in a simulator.
Well, I fear that you still end up with the same problem: you don't actually know how long you should wait for a program to halt before pronouncing it infinite.
This algorithm will also eventually simulate itself, and by tracking steps allocated to each successful algorithm, and keeping the fastest one, you can generate the fastest one possible! Shame it'd take until the next upload. :(
You should probably run it in a sandbox and simulate a language that is not allowed to execute any system commands and can only access a virtual memory :-)
Universal search is a neat way to solve any problem, but checking if each string has valid syntax is quite a bottleneck. Maybe Intel can be persuaded to add some hardware instructions to make this a single cycle operation
Programs that don't halt are no issue because while they loop endlessly, the algorithm continues to search new programs until the correct one is found. The halting problem is avoided by allowing the algorithm to prematurely pause programs throughout the search process. Another way to think about it is that in the beginning of the video, the algorithm presented searches for the proper program serially, so a single non-halting program will suspend the search. By the end of the video, the algorithm presented searches for the program in parallel, so whether each individual program halts is of no consequence, as long as there exists a correct algorithm which does halt.
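Here's a runnable toy of that dovetailing schedule, with Python generators standing in for the simulated programs (universal_search, make_program and the generator convention are all made up for the illustration; the real thing would enumerate and step actual program strings):

    def universal_search(candidates, check):
        # `candidates` is an (in principle infinite) iterable of zero-argument functions,
        # each returning a generator: every `yield` is one simulated step and the
        # generator's return value is that program's output.
        sims = []
        for t, new_program in enumerate(candidates, start=1):
            sims.append(new_program())                    # start simulating one more program
            for i, sim in enumerate(sims):
                if sim is None:
                    continue                              # already halted without a valid answer
                try:
                    for _ in range(2 ** (t - i - 1)):     # older programs get exponentially more steps
                        next(sim)
                except StopIteration as halted:
                    sims[i] = None
                    if halted.value is not None and check(halted.value):
                        return halted.value

    # toy usage: the k-th "program" spins for k steps and then outputs k
    def make_program(k):
        def program():
            for _ in range(k):
                yield
            return k
        return program

    print(universal_search((make_program(k) for k in range(10**9)), check=lambda x: x == 42))  # 42

Non-halting "programs" just keep absorbing their (bounded) step budget each round while the search moves on, which is exactly why the halting problem never comes up.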
And that's exactly why the hypothetical algorithm I thought of, theoretically capable of compressing the Bible, the Divine Comedy and my elementary school science essay written back to back (yep, three very important texts) into likely less than 20 pages of Word at font size 12 without any loss of information, is not available.
Just to be clear, the algorithm I'm talking about (which I won't explain in detail) can compress pretty much any enormous file into an incredibly small one relative to its original size. The idea is to use the data to generate pseudo-random new data (all without any loss of information) until you find one of two specific conditions (which I won't explain, since I would have to explain the whole algorithm) where the file gets compressed by an extreme amount (kind of as if you randomly got 10000000000000000000000000000000000000 and compressed it by writing down the number of zeros, but more efficient).
The problem is, obviously, a computation time that puts most algorithms to shame (probably not the one in this video), even if decompression is a bit faster... Another problem is that short enough files become much more massive instead. So while it could theoretically compress all the information on the internet into a surprisingly small size (relative to the original), a Word file containing only "Hello" would become bigger instead... probably.
Well, it's fun knowing you've created an incredibly powerful and useless algorithm, no?
Edit: Thanks to circuit craft, who responded to my comment, I read this again and noticed I had written a few parts badly. Sadly English isn't my first language, and I didn't properly express that this algorithm working is my assumption, or hypothesis if you prefer, based on a few facts. The workings of the algorithm and the continuation of the discussion are found below; thanks again to circuit craft for making me realize my comment was poorly stated.
The algorithm you are proposing cannot possibly exist, due to information theory. If you do have such an algorithm, you will either find that it does not terminate on some inputs, or the random data it generates is, on average, at least as long as the information you are trying to compress.
@@circuitcraft2399 The concept is that it generates random data until it finds a prime, which is then represented as a 0 followed by a separator and a number that indicates how many primes there are before it. The final compression also has a counter of how many operations are needed, and another counter that implies an addition of a term to the original file written as a number. The compression works by associating the file with a single number (but not all numbers have a correspondence).
What you've said is partially true, but it doesn't get stuck, and the problem could at most be that some files exceed their original size. The logic is that, for a given size N of compressed file, it associates (N/4)^4 possible correspondences of files, with a tendency for them to be bigger. It's true that not all of the numbers represent a file, but every file has a one-to-one correspondence with a number. For an arbitrarily big original size there should be an N after which the compression tends to make files smaller and where the file will most likely be found.
Well, this is just an idea I had and I've not studied it too thoroughly, so there is a good possibility that it's wrong, but that can only be said after a more rigorous study. I would love to have it studied and get the proof that I'm a fool, but sadly I don't have the time, exact knowledge and motivation to do this myself.
@@silver_3552 As I understand it, you search for the least (greatest) prime number that is greater than (less than) your file's representation, then pair it with the difference between the two? It's true that the prime's index is logarithmic with respect to the input length, but your offset will grow with the size of the files. After all, as primes become more sparse, the nearest one will be farther away. Here's why it must be impossible: suppose that the average compressed size of a file of length n is less than n. Then, there is some N where the compressed size of every n
@@circuitcraft2399 I truly didn't want to write this all... OK.
The method is an iteration of factorization. You can use whatever base you want, but let's say you use base 10 for the example. You start by writing 0C, which indicates 0 iterations; C is a simple separator. You write a 1 or a 0 to differentiate even and odd numbers. You then write the power of each prime factor, with an A to indicate the next one and a B to indicate a repetition.
For example, 559021 becomes 1C 12A1B3A3, and it means:
0C first iteration
1 odd
2A 2^2
1B3A 3×5×7
3 11^3
When you get this, you reiterate the factorization on the number you got, in base 12 (1C becomes 2C and isn't part of the factorization, since it is the counter). A prime is written as nC 10Bm, where n is the number of iterations and m is the number of primes with a value less than half the original number. Obviously you don't only need a prime: any number with the format xC (1/0)yBz is OK, where the parenthesis is for even or odd, x is still the counter, y is the power of all the prime factors, and z is the number of the prime factor "used", which represents a prime if y is 0.
To ensure there is no loop, you can then add a new number before the C, like this: 0D157C125A0B17 (not a real compression, just a random valid combination I wrote), where the number before D can be signed and represents a number to add to or subtract from the input. This allows a parallel computation that will give solutions, since at worst you just add 1 until you find a prime, and if there is a repetition you simply go to the next D. Finally, you could also add an E number to imply that at some point you took a compression represented in base 14, converted it back to base 12 and started over.
A little note: when I made the example with a base-10 number it was slightly wrong, since the factorization is always done in base n+2 and the results are written in base n; in our example A and B are needed as separator and compressor, but it's similar for any given base. The example started in base 10 and the result was also in base 10 only to make it more readable for everyone.
If you wish to try and prove me wrong, I would be happy no matter the result, since knowing I was wrong would be an interesting result in and of itself. If you have other questions, I'm all ears.
Edit: it should be obvious that if 125Cx is a valid compression then so are 124Cx, 123Cx and so on, since they were steps of the compression. But not all numbers are valid inputs, since you can always try to decompress only to find AB somewhere. You could make it so that AB means A0B, but we are interested in compressing and then decompressing, not just decompressing. And, as you can see, all steps are done in a way that lets you uniquely convert each successive step back to the previous one, so there is no loss here. I'll also add that the step with an E could be invoked at any moment, and you could ideally make an F with the same function as the E, and so on.
@@silver_3552 Okay, I can see that this algorithm would translate any input number into some other number, in a reversible way. What I'm questioning is that the result is *shorter* than the input, on average. Can you justify that?
So this is basically a Carnot engine, but instead of thermodynamic machines it applies to computing ones. Basically a metric for efficiency that can't be built in reality.
@@deltamico If an algorithm that solves the problem exists, eventually (for high enough n) the universal search will reach said algorithm and thus won't check the stuff past that point. And we don't care about small n for an asymptotic analysis.
Blog post with more clarifications/details: vasekrozhon.wordpress.com/2024/09/03/the-most-powerful-and-useless-algorithm/
I think that's pretty funny, because most of the time the algorithm will stop not because it ran "the correct optimal algorithm" but because it just printed the answers and they happened to be right... It would only start to do better than "print all pairs of 2 numbers and check" when the code for a better algorithm is shorter than one that just prints the answer, because the answer is too long.
"when the code for a better algorithm is shorter than just one that prints the answer bc the answer is too long" -- it's funny because a lot of programming is exactly that, writing a piece of code that's shorter than the answer
Presumably, there exists an algorithm that requires a finite amount of code to implement regardless of the size of the input or output. That algorithm may be longer than the shortest "print()" solutions up to a point, but again: it doesn't grow as the input size grows, while "print()" does. Just like how the program Gzip remains the same size whether you want to compress a megabyte file or a petabyte file. There will be some largest number where "print()" is more efficient; beyond that, there are infinitely many numbers where implementing the actual algorithm is more efficient.
Therefore the phrase "most of the time" is misleading here, because only finitely many inputs behave that way.
Waiting for this algorithm to finish is more akin to gambling than anything else. I love it 0_0
gambling addictions can be very serious 0_0
@@wadswa6958 there is a reason why we do computer science
Well..... doooof! It is 'randomly generating,' so of course it is 'akin to gambling.'
I would love to introduce you to ✨Bogo Sort✨
@@abraxas2658 bogos sorted 👽
Let's also admire the fact that this algorithm implements and executes perfectly aligned AGI. And of course all the paperclip maximizers and the like. The slowest doomsday device.
This is why we added the try catch block to the checking function, we don't want the paperclip maximizers to escape!
Shoot I didn't think of that! QUICK someone stop the factoring machine!
@@PolylogCS You try to catch it, but what about the escape key on your keychain? Did you remember to remove it after locking the AGI in its blockchain?
Only if they are generated before an algorithm that produces a valid solution (even in an invalid way), and finish executing before that one outputs anything.
It'll also eventually output a 4K video of you and the grandma next door starring in a hardcore adult film directed by Michael Bay and narrated by Morgan Freeman
at 4:09, you can simply write a function that inputs the program and outputs if it runs infinitely in a loop or not ;)
One problem is that there is no program that could determine whether an input program halts or not (this is the so-called halting problem)
this is one of the funniest either dumb or bait comments I've read in a while. "At this point, you could have simply solved the halting problem ;)"
@@PolylogCS Actually, it's possible for computers with finite memory!!!!!!
@@user-dh8oi2mk4f If we're talking about things with finite memory, they're technically not Turing Machines, which have an infinite amount of memory, so it's technically a different problem (hence why saying this doesn't break math and win you a Turing award)
Haha, good one, lol
Seems like for composite numbers where the string containing the solution program is big enough, before reaching that solution, the universal search algorithm might at one step generate a program which is also a universal search algorithm itself, searching for the same answer, which would also create another nested universal search algorithm program before reaching the answer and so on. That would be pretty cool :D
Yes, that's absolutely correct! Unfortunately, that other nested instance of itself won't have any chance of finishing faster than the high-level universal search, because it needs to do at least as many steps. What could be even more funny is that it'll implement some better universal search that might use some better programming language and its simulator which will run much faster and generate the correct program faster...
This reminds me of the Library of Babel for some reason. Somewhere in there, anything you can think of exists. Good luck finding it lol. (Although in that case there is a straightforward mapping between text and location.)
In the case of factoring primes, I'm pretty sure it would be faster to brute force all the numbers < sqrt(p*q) than to randomly generate a program that would do it. I also wonder if the universal search algorithm could be quantum-ified to check all possibilities of, say, length n, at once. But of course a sophisticated quantum computer like that could crack large primes in its sleep anyway. Perhaps other problems would benefit from it though.
Interesting connection with the library of babel! Regarding quantum computers, one (relatively boring) thing you can do is to adapt universal search for them so that it is as fast as any quantum algorithm. But as you say, factoring is one of a few problems that quantum computers are known to solve quickly, so the question about the fastest quantum algorithm for them is perhaps not so interesting.
Well, maybe you can use quantum computers to find the fastest algorithm so that it can later on be used by traditional computers
Because it is exactly the same. Which proves the P, NP problems are all bunk science.
It isn't the same. Universal search is an algorithm that implements every algorithm a Turing machine can simulate, and so it can solve every problem. Taking a large number modulo every number smaller than it WILL solve prime factorization, but it will only ever solve prime factorization.
> _"Library of Babel"_
i was thinking the same thing...
aside: when i watched a vid about that a few weeks ago, i never thought i would run into that concept again so soon.
Amazing video, I didn't know about the universal search algorithm before and found it very interesting.
I wish you had done the derivation for why doing exponential steps on previous machines had complexity 2^L + f(n) since I would've imagined a log_2 to show up, but I managed to figure it out (I think) for anyone interested:
Since each level grows by powers of 2 when the next machine gets added, you need to add log_2(f(n)) new machines to the search before the machine at level L will finish (2^log_2(f(n)) steps = f(n) steps). This means the total height you get to is L + log_2(f(n)) and since the total number of operations is the sum of many consecutive powers of 2
2^0 + 2^1 + 2^2 + ... + 2^(L - 1 + log_2(f(n))) =
2^(L + log_2(f(n)) ) =
2^L + f(n)
You get the final complexity in the video.
Yes! One reason why we did not go into this is that there is one subtlety lurking there that Ondřej Bouchala promptly uncovered. Check out also his comment! :) Also, there is a small mistake in your derivation: 2^(L + log(f(n))) = 2^L * f(n).
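For completeness, writing out that last step as a geometric series (same setup as the derivation above):
$\sum_{i=0}^{L - 1 + \log_2 f(n)} 2^i = 2^{L + \log_2 f(n)} - 1 < 2^{L + \log_2 f(n)} = 2^L \cdot f(n),$
so the total number of operations is at most $2^L \cdot f(n)$, i.e. $O(f(n))$ with the horrible $2^L$ hidden in the constant.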
@@PolylogCS could you pin it?
Great video! I'd like to add that the "Polynomial time?" level of abstraction is useful in areas like complexity theory (well, at least "old school" style sub-areas of it), since you might want to study time complexity as a property of a problem and not of an algorithm that solves it, so you would like to use a measure of complexity that depends as little as possible on the computational model you use. Different models of computation often disagree on the power of a polynomial, but very rarely disagree at the polylogarithmic/polynomial/superpolynomial/subexponential/exponential/etc. level of precision.
8:04 It's fun to think about how having no syntax errors is required for this "worst case" scenario.
Syntax errors are really the same thing as program termination. So, it isn't hard to come up with a system where any code is valid code, i.e. syntax errors are impossible.
In Brainfcuk, there exists only one syntax error: Incorrect brackets, like for example "][".
If one redefines [ and ] to have well defined behavior in these cases, any string of BF commands will be a valid program.
For example, redefine ] such that in the case of no matching [ it moves the instruction pointer to the beginning of the program, and redefine [ such that in the case of no matching ] it moves to the end of the program, i.e. just terminates it.
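A minimal sketch of that permissive bracket rule (in Python, my own toy helper, not code from the video): map every bracket to a jump target, with unmatched ']' pointing at the start and unmatched '[' pointing past the end.
def bracket_targets(code):
    # Map each bracket to a jump target under the permissive rules:
    # an unmatched ']' gets target 0 (the start of the program),
    # an unmatched '[' gets target len(code) (past the end, i.e. halt).
    targets, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            if stack:
                j = stack.pop()
                targets[j], targets[i] = i, j   # matched pair
            else:
                targets[i] = 0                  # unmatched ']' -> beginning
    for j in stack:                             # leftover '[' with no ']'
        targets[j] = len(code)                  # -> past the end
    return targets
With a table like this, every string of BF commands becomes a valid program, which is exactly what the comment above describes.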
This is a chronically underrated channel. Keep up the great videos!
Thanks!
Generating an AST instead of raw code would cut out a lot of non-runnable code. Asymptotically the same, but it makes a little bit more sense.
Oh, and it's also funny that once in a while it is going to generate itself, so it's kind of a quine :)
The moment I got my head around it, my brain instantly went to a version of universal search that uses assembly instructions and numbers instead of arbitrary characters, with the goal of finding the most optimal algorithms for math functions through brute force.
That actually exists. Compilers encounter multiplication by constants (e.g. size * 13) or other math functions. Compilers that generate highly optimized code have libraries of small chunks of code that compute specific math functions. They build that library by trying all possible small chunks of machine language.
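A toy illustration of that "try all small chunks" idea (a made-up example in Python, not taken from any real compiler): brute-force the shortest shift-and-add sequence that computes x * 13.
from itertools import product
import random

OPS = ['acc += x', 'acc <<= 1']   # a deliberately tiny instruction set

def run(program, x):
    acc = 0
    for op in program:
        acc = acc + x if op == 'acc += x' else acc << 1
    return acc

def superoptimize(target, max_len=6, trials=20):
    # Try every instruction sequence up to max_len, shortest first, and keep
    # the first one that agrees with `target` on a batch of random inputs.
    tests = [random.randrange(1, 10**6) for _ in range(trials)]
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, x) == target(x) for x in tests):
                return program
    return None

print(superoptimize(lambda x: 13 * x))
# prints one of the shortest such sequences (6 instructions): build 3x, double twice, add x
Real superoptimizers verify candidates properly rather than testing on random inputs, but the brute-force spirit is the same.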
Asymptotic complexity is great most of the time; you just have to not throw out the constant factor, and it might not be the best idea to lexicographically order all programs by it.
Python obeys a grammar. You could write an algorithm to generate valid python syntax and drastically reduce the search space
This is why we don't use big O in our course about algorithms. We use something like tilde notation, which keeps the coefficient on the highest-order term, and you get a much better representation of the performance of an algorithm to compare against other algorithms.
You might enjoy looking at the video description! :) Turns out you can make the multiplicative constant 1.01 and hide the horrible stuff in the additive constant!
Any notation that hides information has to be assumed to only be useful in certain cases. The tilde that gives you slightly more information is nice, but it is a level of abstraction that you sometimes don't need when simply analyzing an algorithm; I really don't care if an algorithm is 1000(n log n), since that's still better than n^2 asymptotically, and the coefficient isn't going to be that high in any reasonable algorithm.
This is like the physics troll meme of computer science, I love it.
So… it’s bogo sort for factoring algorithms… but even worse. I love it!
Well, worse bogo sort for all algorithms. It's basically a random output generator where for each output you check if it is correct.
If you get a correct output from an algorithm, it doesn't mean it will give correct output for a different input.
You could optimise it for the problem you are solving, for example generating pairs of 2 INTEGERS which would never output 'banana'
but that is no fun given the context of universal search.
This also works only for problems which have a constant time complexity for checking the solution; otherwise it wouldn't be "asymptotically optimal".
No, you gotta sell it. It's the All-Purpose General Genetic Algorithm
It's actually better than bogosort, because this algorithm is at least guaranteed to give you the correct answer in a finite amount of time while the time complexity for bogosort is unbounded, meaning it could just never land on the correct permutation if you're "lucky"
@@postmodernist1848 It's better than random bogosort, but roughly equivalent to deterministic bogosort. Deterministic bogosort works in effectively the same way as universal search, by iterating through all the possible solutions.
i used to wonder why bogo sort even had a name - now i know.
what's more; i genuinely believed that it was just the youtube channel's given name
A simple way to speed up the search in this specific case, is to simply start from the other end. We know that in encryption, it will be two fairly large numbers of fairly similar magnitude, there won't be a factor of, say, 17. So, just start at the sqrt(x) and work downwards. To speed it up a bit more, remove simple to detect factors (2, 3, 5...).
It won't speed up the general factorization case, but for cryptography, it'll make a big difference.
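A sketch of that suggestion in Python (just illustrating the comment's idea; it only helps when the two factors really are of similar magnitude, as in RSA-style moduli):
from math import isqrt

def factor_near_sqrt(n):
    # Walk downward from floor(sqrt(n)); the first divisor found gives the
    # pair of factors closest to each other.
    for d in range(isqrt(n), 1, -1):
        if n % d == 0:
            return d, n // d
    return None   # no nontrivial factors, i.e. n is prime (or n < 4)

print(factor_near_sqrt(3233))   # (53, 61), a toy RSA-style modulus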
I find it weird that the most common measure of complexity in computer science is big O, ignoring the constant factor of the highest order term. In mathematics, I would only ever mean the equivalence relation "~" by the phrase "asymptotic to...", i.e. the ratio of the two functions tends to 1 in some limit (which could be n tending to infinity, as is always meant in computer science). And I think this is a much more generally useful notion for algorithms whose behaviour is well understood (i.e., all the constants are easily measured and/or bounded). Even just giving an order-of-magnitude upper/lower bound on the coefficient of the leading term is enough information to tell you if the algorithm is practical or impractical.
For example, someone might see that heap sort and merge sort are both O(n log n) comparison sorts, so would just think "they are both optimally fast, so I'll just use heap sort because merge sort has a higher space complexity". But this misses the point that merge sort takes ~ n*log_2(n) comparisons, while heap sort takes ~ 2n*log_2(n) comparisons, in the worst case (this can be optimised, but it's the simple case that demonstrates the point). So the truth is that heap sort takes O(n log n) *more* comparisons than merge sort, which might help explain the trade-off merge sort is making of space for time.
I totally agree that this way, you get more information about the algorithm! But it also gets hard to compute the leading constant, unless you decide to measure something as simple as "number of comparisons". Also, you don't capture cache behavior (important for quicksort) etc. Finally, it turns out (check the video description) that there is a universal search with the property that if there is an f(n) algorithm, the search has complexity 1.01f(n) + O(1), so even in your way of measuring complexity, universal search looks like an amazing algorithm.
Man, I finally found a channel which actively and passionately discusses CS problems and their algorithms; your explanation of the analysis is very good.
You gave so little spotlight to the fact that Universal Search is an asymptotic limit of all those sequences of better and better algorithms... It is truly a wonder that problems don't define open lower boundaries in algorithm space under the asymptotic-complexity topology!
So you can theoretically apply this idea to find the asymptotically fastest algorithm for Matrix Multiplication?
Take the input size to be N = n^2, where we are dealing with square matrices; then we know that there exists a (probabilistic) algorithm running in linear time (linear in N, the input size) that can check whether the resulting matrix C is in fact the product A * B of two matrices of size N. Then we can simply use this search to generate C, use this linear-time check, and get the "best" algorithm!
not really - what if you got lucky and found a random program that happened to find the solution for A*B, but it's useless for multiplying other matrices? you would need to run Universal Search for each new pair of matrices A and B you need to compute.
@@tantarudragos If you get lucky you still run in at most n^ω steps where ω is probably 2.
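The linear-time randomized check mentioned above is Freivalds' algorithm; a minimal Python sketch, assuming square matrices given as lists of lists:
import random

def freivalds(A, B, C, rounds=20):
    # Checks whether A*B == C (as matrices) in O(n^2) time per round: pick a
    # random 0/1 vector r and compare A(Br) with Cr; a wrong C survives each
    # round with probability at most 1/2.
    n = len(A)
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False        # definitely not the product
    return True                 # correct with probability >= 1 - 2**(-rounds)
The check is randomized, which is the caveat other replies here bring up.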
Imo it would be fun to use Symmetric Interaction Nets (which are useful for implementing something like Lamping's Abstract Algorithm) as the basis for Universal Search.
Lovely video about a fascinating concept, thanks!
Very good video, i'm a computer science student from Brazil!!
In the improved case we need to run the checking of the result log(f(n)) times. That gives log(f(n))*n. How do we know that it is O(f(n))?
Nice catch! If you want to get rid of this, you have to think of checking together with the simulated program as one program. By that I mean that your allocated time is counted also for checking, so maybe you run out of it and you have to wait until the next iteration to continue checking. If you do it this way, everything sums up nicely and you get O(f(n)). We really did not want to get into that, since anyway for f(n) > n log(n) this term is dominated by O(f(n)), so you can also think of this imperfect search as achieving O(f(n) + n log n).
So the guarantee on any problem (say P problems) should be O(f(n)*log(f(n))) right?
Checking can only be as hard as just solving the problem all over again!
@@yanntal954 We assume that, in this case, checking works in linear time and f(n) is the running time of the algorithm we compare ourselves with. Then, I claim the universal search with exponentially allocated steps with the trick I mentioned above is O(f(n)). Does it make sense?
@@PolylogCS I was talking more in general not just factoring, You should get this as a guarantee. It may not be tight but it will always be a valid upper bound!
I think assuming Factoring is Omega(n*log(n)) is pretty reasonable and thus for factoring you should get O(f(n)). In general the checking process can take at most O(f(n)) (yet can't be more than that!) thus you get the aforementioned guarantee I mentioned
My favourite non-constant time complexity function is the disjoint-set data structure search, which is amortized O(inverse_Ackermann(n))
Ah yes, the classic case where big O hides a dependence on another quantity than the input size
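For anyone curious, the structure in question is union-find with path compression and union by rank, which is what gives the amortized inverse-Ackermann bound; a standard Python sketch:
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving, a form of path compression: every other node on the
        # way up gets re-pointed closer to the root.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1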
3:05 As someone who loves writing in Raku (perl 6), I hate to say it but you are correct.
I did a spit-take when you said to 'simply allocate exponentially more simulation steps to earlier programs'. Grotty! Imagine how slow that is!
It works! Until you hit a program that loops forever, like +[]. (The universal search algorithm they implemented is in brainfuck...)
"Is often obvious just by looking at it" oh I wish
Great video, love seeing accessible cs theory on yt
Glad you enjoyed!
7:44 or more likely, it would terminate because of an algorithm that says something like this:
return 2,2
was waiting for it
Just out of curiosity, did your program ever give you a result?
No.
How much would it be possible to optimize Universal Search with AST-based code generation rather than text strings? If you start iterating over the input space of all possible grammatically valid programs rather than all possible text strings, this would cut down your constant factor by a fair degree.
Of course, Python's not a particularly metaprogramming-friendly language, but I bet it could be done with some package somewhere...
Let us know if you manage to make this work :) Perhaps you can reach the naive factoring algorithm in such a universal search, but not much farther, I guess...
This is basically what Program Synthesis has been doing since the 1980s.
How it usually works in practice is that Python is too big; you usually define a custom DSL - a specialized mini language - tailored to the algorithm you're trying to generate. For example, if you're trying to find a sorting algorithm, you define a mini language that has lists and common operations on them (swapping, iterating, ...) as primitives. More importantly, the mini language *doesn't* have anything else not relevant to sorting: no exceptions, no OOP, etc... You're essentially 'cheating', making things easier for the search algorithm by restricting its search space. Come to think of it, this bears resemblance to neural network architectures as well: they have certain human-designed hyperparameters, and the training algorithm can focus on merely tweaking trainable parameters within those constraints.
Program Synthesis is fascinating, with plenty of applications and analogs. SQL databases already have a Program Synthesizer that millions use everyday (the query planner).
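A toy illustration of that DSL idea (all names here are invented for the example, not from any real synthesis system): enumerate pipelines over a few list primitives and keep the first one consistent with the given input/output examples.
from itertools import product

PRIMITIVES = {
    'reverse': lambda xs: list(reversed(xs)),
    'tail':    lambda xs: xs[1:],
    'sort':    lambda xs: sorted(xs),
}

def synthesize(examples, max_depth=3):
    # Enumerate every pipeline of primitives up to max_depth (shortest first)
    # and return the first one that matches all (input, output) examples.
    for depth in range(1, max_depth + 1):
        for pipeline in product(PRIMITIVES, repeat=depth):
            def run(xs, pipeline=pipeline):
                for name in pipeline:
                    xs = PRIMITIVES[name](xs)
                return xs
            if all(run(list(inp)) == out for inp, out in examples):
                return pipeline
    return None

# "Drop the smallest element": found as ('sort', 'tail').
print(synthesize([([3, 1, 2], [2, 3]), ([5, 4], [5])]))
Because the DSL only has three primitives, the search space at depth d is 3^d instead of "every possible string", which is the whole point of the restriction.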
"... and in the remaining cases PERL programs" :D haha ❤
Love the "lessons" at the end!
Thanks, Przemek!
I think this is analogous to asking a neural network to do it. The reward function would be checking if the output is correct and rewarding the network accordingly
yes, because the output of training is a model, and those models could represent the outputs of any function on a finite domain (to even put the input into a computer requires that the domain is finite)
It's definitely not a genetic algorithm.
I love how it's just a matter of having an infinite time span, an infinite memory bank, and an infinite amount of computational units to get the best search algorithm. 👍
A problem I see with this is that as the number to be factored gets larger, any program using actual variables will start producing wrong results due to variable size limitations. So while L is a constant for each program, it might diverge towards infinity if there is no program that will accurately factor _all_ numbers at the fastest possible time complexity.
If there is a variable type that can handle any unlimited amount of digits, and that can be used in a program with the fastest possible time complexity, then that might be solved. But I don't know whether such a variable exists and can be used in the programming language this algorithm runs in
This is a great question! It ultimately leads to "what is your mathematical model of computer?" Let's say that our model is Turing machine. There, there are no "variable types" and so on, so I hope that with this model of computer, everything we say in the video makes sense and you can see that we prove a mathematically correct statement there. Turing machine is not a good model of computers, more usual model is the so-called WordRAM model, which is basically C, but you assume that if the input has n bits, the basic variables you use have log(n) bits, so that you can e.g. index your input. For this model, our statement is also correct. Your problem as I see it is basically that for very large inputs, maybe the WordRAM model is not a good model of computer. And in that you are right, similarly to asymptotic complexity, wordRAM is just another model. Not sure how much sense I make :)
@@PolylogCS yeah, if you have a variable type that can have any power of 2 as its size (and thereby display any number in a finite variable) then this works out
Python supports arbitrarily large integers out of the box.
@ It's likely not 'real' hardware support if you can get beyond 512- or 1024-bit integers. Meaning it's likely, at least in part, software giving you the abstraction of super large integers via, in some form or fashion, a collection of more standard 32/64-bit ones.
The reason I say "likely" is that you do have some 128-bit registers nowadays; there's a flag you can throw at your C compiler to see. And since we have SSE, AVX and their variants, it may be that Python, when built on a system, passes some flags to the C compiler to build with AVX extensions, and maybe you can get some 512- to 1024-bit ones that way. But typically that's something I've seen when installing pandas/numpy.
More broadly though, you can't address anything smaller than a byte on modern systems, so an 8-bit integer is the floor; anything smaller would be technically impossible. So if you can have "odd" bit widths, or anything beyond the max register size among all the AVX/SSE registers, then I do think it has to be a software abstraction: probably zero-padded ints for bit sizes less than 64, and some arrays/linked lists with software-defined integer operations under the hood for sizes beyond your max AVX/SSE register size.
@@stevepa3416 it doesn't matter if those numbers are not like that in hardware, as long as they are useful in software, so calling them "not real" is a disservice to them =). those numbers being able to be "arbitrary large" basically means that instead of being limited by architecture-defined word size, these numbers are limited by how much RAM you (can) give to the program. So they can be useful for universal search.
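A quick illustration of what those arbitrary-precision integers buy you in plain CPython (they are indeed a software abstraction over machine words, but that's all universal search needs):
p = 2**607 - 1            # a Mersenne prime, 607 bits
q = 2**521 - 1            # another Mersenne prime, 521 bits
n = p * q                 # far beyond any hardware register
assert n % p == 0 and n // p == q
print(n.bit_length())     # 1128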
What is the smallest number for which this algorithm ends with the result of a sub-algorithm that didn't simply guess the right result but really used some calculation? ^^
Good question. My guess is that the naive algorithm we showed, written as concisely as possible, could be among the first nontrivial algorithms to finish.
My favorite universal search algorithm is very simple and very powerful and it runs in the same time complexity as it takes to check if an answer is correct.
step 1, create a universe for every possible input.
step 2, check if the answer is correct.
step 3, if not correct. Destroy the universe.
Leaving you with only one universe with the correct answer right away
that's called a nondeterministic turing machine
7:30 nice arpeggio
Should have written a program that generates valid python asts to make it faster :D
well it generates valid brainfuck which is easier to generate
I've been thinking a bit about this vid and while I do appreciate the concept, I'd like to point out a flaw.
It's important to note that any brainf*ck program can be represented exactly by a one-tape Turing machine (so it is bounded in speed). The extended Church-Turing thesis only yields that a 1-tape Turing machine can implement an algorithm from any other computational system with at most polynomial slowdown, and that polynomial slowdown can be pretty trivially shown to be problematic for even basic problems like classifying palindromes, where any matching 1-tape machine is at best O(n^2) while with a multi-tape setup we can see a simple O(n) method.
You can say that if factoring is in P, you've found a poly time algo for it, but you can't be more fine grained than that without more care.
Small correction: the typical one-tape turing machine model involves pre-pasting the input onto the work tape, so brainf*ck could be faster in some very specific instances due to its relative freedom in i/o. We can properly bound the problem with a 3-tape turing machine with a work tape, input tape and output tape where the head on i/o tapes can only go right, though the analysis does not change (and the example of palindromes is not deprecated either due to our restriction on the input tape head movement and lack of knowledge of the string length)
What if the algorithm ended up checking and running a dangerous program before it found the factorisation?
You definitely need to be careful with the sandboxing since you are iterating also over all computer viruses in universal search... :)
What if the algorithm, by pure chance, found the RSA factorization to the bank account with the most money in it, and then, by pure chance, moved all that money into your account, and then, by pure chance, made it seem like no money ever left (just duplicated it), and then, by pure chance, any flags that would pick up on this on your account were 'disabled,' and then, by pure chance, any possible way for this to be noticed was 'programmed out,' all via 'pure chance.'
So... what then?
@@PolylogCS Hopefully we find a 'computer virus' that hacks the world banks and "clones" money directly into your bank account... all without being traced!
@@pyropulseIXXI Then oof
How do we know this comment was not generated by Universal Search during the factorization of 42?
Amazing video!
When you realized that Genetic Algorithm is just a special case of Universal Search 😶
Your room is so cool!
This is interesting! I always thought of running some kind of Genetic Algorithm for a long time on a PC to generate code that solves a very simple problem. I was curious how long it takes for an evolutionary process to come up with a solution (read: find a species that fits the environment). Maybe that idea can be used here?!
I implemented this idea in Lisp a few years ago; I can give you the GitHub link if you're interested. Although it was very poorly implemented: for example, finding a program that could solve a quadratic equation took something like 20 minutes, if I remember correctly.
Neat video, reminds me of the library of babel
Let's just run this algorithm on a supercomputer with something other than python. After all we need to find an algorithm for it only once to break encryption.
This is a universal valid-program generator, valid for one criterion and one language set. Many programs make some sense, but not always the way you think. Universal search is a very interesting thing anyway. My noob opinion on searching.
What's the memory complexity of this algorithm?
if you wanted to generate a program which multiplies two large prime numbers, you would have to factorise each result you get, but you don't know how to do that, so you can run a universal search for it
I thought you would continue by addressing p vs np
I feel like if you actually write this program, the simplest program to “factor” a number will literally just print its factors without actually factoring
If you apply this algorithm to a case where there is no answer (like factoring 5), your program would never finish. We know that none of its trial programs can ever output a factorization of 5, i.e. the algorithm will keep spawning new trial programs indefinitely; and it would never tell you that 5 is not factorizable.
This means that the algorithm would be limited to inputs of which we _know_ ahead of time that they can be factored into at least two integers (i.e. only non-primes). And it doesn't seem like the primality question is solvable with a universal search.
It would only work for tasks like sorting, where we know that every input has _an_ answer that satisfies the "is this sorted" test.
One further bonus problem that you would care about if you wanted to run this algorithm in practice is that the algorithm would also eventually write all _viruses_ that are possible in python or brainfuck. I.e. your computer might crash before it gets to the factorization of 4.
[Insert "In soviet Russia, factoring numbers hacks YOU" style joke.]
There are no viruses in Brainfuck, since the only operations that interact with the environment are "get a character from the input" and "write a character to the output." And remember, we can't use `eval`, since we need to run things one step at a time, so we actually need to write our own interpreter. So, just leave out all the file/network access, and everything that can be used to make a virus.
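A minimal sketch of such a step-limited, sandboxed interpreter in Python (my own toy version, not the one from the video): the only "environment" is an input string and an output buffer, so there is nothing for a generated virus to touch.
def run_bf(code, inp, max_steps):
    # Returns the program's output, or None if it has a bracket error
    # or does not halt within max_steps simulated steps.
    stack, match = [], {}
    for j, c in enumerate(code):            # precompute matching brackets
        if c == '[':
            stack.append(j)
        elif c == ']':
            if not stack:
                return None
            k = stack.pop()
            match[k], match[j] = j, k
    if stack:
        return None
    tape, ptr, pc, out, i = [0] * 30000, 0, 0, [], 0
    for _ in range(max_steps):
        if pc >= len(code):
            return ''.join(out)             # halted normally
        c = code[pc]
        if c == '>': ptr = (ptr + 1) % 30000
        elif c == '<': ptr = (ptr - 1) % 30000
        elif c == '+': tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif c == '.': out.append(chr(tape[ptr]))
        elif c == ',':
            tape[ptr] = ord(inp[i]) % 256 if i < len(inp) else 0
            i += 1
        elif c == '[' and tape[ptr] == 0: pc = match[pc]
        elif c == ']' and tape[ptr] != 0: pc = match[pc]
        pc += 1
    return None                             # ran out of allocated steps
The search only ever calls something like run_bf with a step budget, so even +[] just burns its budget; the real search would resume it with a bigger budget later, but the sandboxing point is the same.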
Video description contains an additional trick with which you can generalize universal search even to these cases.
What if you find a non-asymptotically-optimal program so long before you find an asymptotically optimal one that it successfully factors your number before you have a chance to find the better one?
Good stuff, my friend
Oh my god you even got a haircut to match Levin
confused about everything, but great video!
Somebody really liked bogosort and wanted to apply it to every other problem in computer science...
To avoid syntax errors being generated from your random programs, you simply need to define a way to store the program such that syntax errors aren't possible; byte code, for instance, can be good at this.
Never mind, you used BrainF, which happens to be just as effective.
9:19 Wouldn't you get something that's roughly O(f(n)*log(f(n)))?
Perhaps even O(f(n)*log^2(f(n))) by analogy to the previous construction, even though you won't get a triangle as before but you could perhaps rearrange the runs in such a manner?
Check out the answers to comments from TheDmviper and Ondřej Bouchala!
@@PolylogCS I am not sure where that extra n*log(n) addition came from?
but to my knowledge, log-RAM machines aren't physically possible because log(n) for every n means your machine has an unbounded word size that it can get constant time access to.
For that reason I think the O(n log n) time you get from the Harvey and van der Hoeven algorithm is (currently) your best bet, meaning the assumption that multiplication takes linear time probably isn't realizable yet?
@@yanntal954 If we take this discussion further, even the Harvey and van der Hoeven algorithm is not physically realizable, since it assumes you can do random access, which is not possible in our 3D space! (if you put too much information in one place, it collapses into a black hole) So maybe the Turing machine is the right model after all? But for large enough n the universe suffers heat death before you finish computing anyway. What I want to say is that for every model of computation, you can make an argument that it is not physically possible. In practice, Word RAM or the pointer machine (there is btw no assumption of log(n) word size in the pointer machine model) are good models, and maybe just view our video as a mathematical statement about these models.
There's a paper called "A note on probabilistically verifying integer and polynomial products" by Michael Kaminski from 1989 that basically lets us probabilistically test in linear time (with high probability of course), given integers a, b and c, whether a * b = c.
The reason why it's possible to do this faster than Harvey and van der Hoeven's algorithm is that it simply checks; it doesn't actually perform the calculation (at least not over Z).
So, if a good enough pseudo-random generator is created, you can do this in linear time using the multitape Turing machine model!
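I haven't read that exact paper, but the general flavor of such checks is modular fingerprinting, which is easy to sketch in Python (sympy is assumed here just for sampling primes; this illustrates the idea, not Kaminski's actual scheme):
from sympy import randprime

def probably_equal_product(a, b, c, rounds=30):
    # a*b == c implies equality modulo every prime; if a*b != c, a random
    # 64-bit prime divides the difference only with tiny probability.
    for _ in range(rounds):
        p = randprime(2**63, 2**64)
        if (a % p) * (b % p) % p != c % p:
            return False    # definitely a*b != c
    return True             # a*b == c with high probability
Each round only needs the remainders of a, b and c modulo a word-sized prime, which is where the (near-)linear running time comes from.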
Love this stuff, but i might not be your regular audience, as this is the first video of yours I've watched.
I wonder if you can improve Universal Search?
if you have N=p*q, then factoring N will have a program of the sort "print(p,q)" show up. Since the length of p and q is roughly the length of N, that means the order of the algorithm is faster than O(n).
(I mean, of course we knew that - even guessing each number is O(sqrt(n)) )
But if you knew that a factoring algorithm existed and ran in "K" steps, you could provide a lower bound on time, and any program running more than K steps can be stopped.
But i suppose the idea of that would be to find the holy grail factoring algorithm, and not just have to search every time.
if you set a bound with the exponential growth as mentioned previously, you would have a runtime of K*L + 2*f(n)
While L is ungodly big, as in the video, the factor in front isn't as bad.
Nice idea!
If the upper bound K is based on some known factoring algorithm (e.g. guessing each number) then K is a function of n. Let's call that K(n). Then the runtime bound you give for the search algorithm is K(n)*L + 2*f(n) = O(K(n)), i.e. this expression is actually dominated by K(n).
The algorithm with the upper bound is actually still faster than universal search, it's just that you don't actually run all those (L-1) bad programs the full K(n) steps, since the optimal algorithm L may terminate before you simulate all those extra steps in all those bad programs. This makes a big difference, if n is large enough. So the runtime is still O(f(n)) overall, and the constant factor is probably still something like 2^L if I had to guess.
yes, i think that's mostly correct. However, I don't think the constant factor is of order 2^L.
The number of operations is bounded by K(n)*L + 2*f(n), and since a simple algorithm is to guess, K(n)=n in this scenario.
So n*L + 2f(n) is the time, and of course, n*L dominates, but is far less than 2^L.
(Would it be appropriate to call this O(nL)?)
@@purplenanite n is variable, 2^L is constant, so n gets much larger than 2^L as n gets arbitrarily large (i.e. asymptotically).
In general, K(n) is asymptotically larger than f(n), so a*K(n) + b*f(n) is on the order of K(n), regardless of constants a and b. The point is that this bound isn't better than the original bound of 2^L * f(n), which still applies.
For an algorithm that works, there will be numbers such that writing print(p, q) is an even longer query thanks to how many digits it takes to represent p and q.
I wonder if it would be "useful" to have some sort of hyper threaded supercomputer somewhere just constantly running universal search looking for solutions to famous problems. And it just runs forever. And spits out what it finds for us to look at.
Problems for which no optimal solution exists aren't necessarily bizarre/pathological: it's thought that matrix multiplication is such a problem. Strassen-like algorithms have been able to push the complexity down from O(n^3) to O(n^2.3728596). However, it is known that the infimum O(n^w) complexity is not achieved by any algorithm. I don't know enough to know if this precludes optimal algorithms with complexities like O(n^a * log(n)) or others not of the form O(n^a)
Hm, you can actually check the result of matrix multiplication in quadratic time, modulo randomness, and probably throwing one log factor to the complexity. This should imply via our universal search that there exists asymptotically optimal algorithm, modulo randomness, and one log factor. Is this contradicting your claim or not? Do you have a link to a discussion about this topic?
@@PolylogCS I don't have a better source than wikipedia / Computational_complexity_of_matrix_multiplication, which references a 1981 paper by Coppersmith and Winograd. And yeah as far as I can tell, this doesn't mean there's no optimal algorithm with a log factor or something like that. There's a few ways that the impossibility of achieving the infimum w could be reconciled with the universal optimality of universal search:
- w is a more rough measure than O since it ignores log factors
- I only know of a randomized algorithm to certify matrix multiplication in quadratic time, like you alluded to, and I'm not sure how that interacts with universal search
- Universal search has the runtime of the optimal algorithm, so if there is no optimal algorithm, ie for every algorithm with a runtime of O(n^a) with a > w, there is another algorithm with runtime O(n^b) with a > b > w, then the time complexity of universal search simply wouldn't be well defined
@@hacatu Regarding your last point: the complexity of universal search is always well-defined in the sense that it is some function of n (I am not claiming it is a simple function or anything like that, just that it is well defined in the sense that it makes sense to talk about it). So, you can view the argument in our video as saying that under certain circumstances I can actually prove that there exists an asymptotically optimal algorithm for a given problem. For example, whenever you give me a sequence of algorithms with complexities n^a1, n^a2, ... with a1>a2>... for matrix multiplication, I look at my Universal search (checking by Freivald's trick which I assume should be in O(n^2 log(n)) randomized complexity) and if I then go through the argument in our video, I conclude that there exists a concrete algorithm (universal search for this problem) that has asymptotic complexity O(n^a1 + n^2 logn), O(n^a2 + n^2 logn), ... In particular, assuming that all a_i's are strictly bigger than 2, my algorithm is asymptotically as fast as any of the algorithms in the list. I conclude that there exists a "limit" algorithm at least as good as any algorithm in your list. Does it make sense?
@@PolylogCS I think that the closer an algorithm gets to the limit w, the longer the length of the shortest implementation L, and so if a program actually achieved the limit w, it would have infinite length. Universal search usually sweeps the length of the program under the rug, because it hugely affects the runtime but is independent from the input size, but it can't sweep an infinite length under the rug. But that's not a fully satisfying argument, because universal search matrix multiplication obviously has some fixed length K, so how could L(a) ever exceed K? If any program has length L ≥ K and is O(n^a), then by construction the complexity of universal search is at most O(n^a). So I'm not sure
@@hacatu Think of it in this way. Given an algorithm A, denote with A(n) the maximum time it takes to solve any instance of a problem of size n. The construction presented in the video shows that the algorithm US has the property limsup US(n)/A(n) < +oo, for every algorithm A. This implies that there cannot be any algorithm which is better than US in the sense that lim A(n)/US(n) = 0, i.e. US is optimal.
What if it just finds the program:
Printf("2 2")
Wouldn't that also be considered a solution?
Or do you just have to check the program with multiple numbers so it is unlikely that the simple printf cheat would work?
This is absolutely possible and very likely. The search will just terminate there and will be happy with the valid solution. Note that we are only making a worst-case argument about the time complexity of the algorithm, but it is very possible that it will coincidentally finish faster when it encounters mostly completely wrong algorithm that accidentally produces the correct answer. The point is that in the extremely unlikely case that no such wrong algorithm that produces the correct answer exists, it will still eventually get to the 'correct' algorithm that produces the correct answer.
@@PolylogCS I see. So an upper limit to how much time is needed to find the wrong or right solution since it only has a flawed acceptance function to check. This feels a lot like training a neural net with a naive fitness function 😂
This is hilarious!
Love the video. US seems like an extremely clever hack, I love it.
Funnily enough, if I had to guess, this algorithm will probably just generate a print statement before actually generating something that solves the problem, since the print would probably be shorter character-wise.
Hmm, in practice, how often does base US actually find a general algorithm? Ignoring the input and just emitting the constants that happen to be the factors for the given instance of factoring seems like it would be shorter in length + execution time than a general algorithm that reads the input case, does work to factor it, and then emits the factors. At least with the improved one and sufficiently large inputs, the actual size of an algorithm discovered by US is likely to be shorter than the length of the input, so, that is good
I'm reminded of Permutation City, which, you have almost certainly read, but, if you haven't, check it out. It is an excellent book by Greg Egan
simply using the fastest multiplication algorithm for checking isn't good enough, since you're not just multiplying numbers of length n.
You have to check numbers of any length that can be generated in the available number of steps, which will put the time taken around f(n)log(f(n)) (multiplication can be done in nlogn for numbers of length n) for the biggest multiplications.
To get there your checking function can just count the length of the factors, since if they're too long they can't be correct. That will make it around nlogn, rather than f(n)logf(n). doing all checks puts you around n*logn*log(f(n)) (overestimate for log(f(n)) checks of numbers with length f(n), in reality the max output length is decreasing for the later programs).
this gives you O(f(n)+n*logn*log(f(n))) in the very end, for factoring we can assume that the first term will outgrow the second, so O(f(n)).
Actually, multiplication can be done in linear time in the standard ram model or on pointer machine. But you are totally right that if you choose a more fine grained model, then we are not optimal and probably losing at least one log factor for multiplication.
@@PolylogCS Ah, I was solely thinking about realistic models of computation. Thanks for bringing that up, I have now also seen the video description.
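A sketch of that length filter from a few comments up, in Python and using bit lengths so the test itself is cheap (a toy illustration, not the video's checker):
def quick_check(p, q, n):
    # p*q has either b1+b2-1 or b1+b2 bits, where b1 and b2 are the bit
    # lengths of p and q, so wildly mismatched candidates are rejected
    # without doing any big multiplication at all.
    b = p.bit_length() + q.bit_length()
    if not (n.bit_length() <= b <= n.bit_length() + 1):
        return False
    return p * q == n   # only multiply when the lengths are plausible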
To accurately say that this is an asymptotically optimal algorithm, there is a need to explain why we wouldn't stop at some solution with complexity g(n) such that O(g(n)) != O(f(n)).
Suppose L is the so called `number` of an algorithm with complexity g(n) within the list of all `compiling` programs, and L' is the similar number of an algorithm with O(f(n)), where f is the optimal complexity.
The number of steps allocated to the g(n) algorithm is actually the number of steps allocated to the f(n) one, multiplied by some constant (it is somewhere near 2^(L' - L)).
So, as n approaches infinity, g(n) * C > f(n), because O(g(n)) != O(f(n)) and f(n) is the optimal algorithm.
Also, the last statement is unclear to me, that there is no infinite sequence of complexity functions of decreasing order.
We assumed that L in the analysis is a finite value, but if such a sequence exists, wouldn't this assumption be wrong?
Correct me if I am mistaken somewhere.
Also, sorry for my poor English, it is not my native language :sadge:.
Great question! I agree that the argument about the infinite sequence of better and better algorithms is not explained in very much detail. Let me give you a small hint: Imagine there would be such a sequence of algorithms, our "Universal Search" algorithm is asymptotically not slower (or faster) than all of them - well, then this very algorithm, which we implemented in Python, is also somewhere in the list and that algorithm itself would be the finite one algorithm with some very specific (maybe unknown) complexity. Does that help to understand better?
this sounds like it comes VERY close to the halting problem.
So, the only problem is the constant hidden in the O? You can solve that fairly easily: if you design the hardware specifically for the task at hand, even more than CPUs and GPUs are designed for graphics and computation, then you can cut down on the constant by huge amounts, and even on the running time itself by streamlining processes. This is the exact method the NES used to avoid the lag so common on other systems of its time, and it shows.
Yea it does seem slightly slow, have you tried caching?
Amazing video! I have a question though... wouldn't it be possible to run into an asymptotically unoptimal code before the asymptotically optimal one? For example, the universal search might start simulating an implementation of selection sort (with O(n^2) time) before merge sort (with O(nlogn) time). In this case, the universal search wouldn't be asymptotically optimal.
It doesn't matter what other algorithms it simulates besides the one we want; recall that it doesn't even care if those algorithms run forever.
This is the same principle that's behind the universal Turing machine, right? Especially the diagonalization part.
I wonder if it's called universal search because of "universal" Turing machines
Yes, I believe these are very similar concepts. I am not sure if in the Universal Turing Machines you typically generate all the valid programs and run them in parallel like we do, but I don't see why you could not. I think the UTM was mostly a concept that the machine can execute any other program that it is given as an input. On the other hand the "Universal Search" does a bit more than that as it uses the UTM as a subroutine while generating all the possible programs and running them in parallel in a smart way. Interesting question about the origin of the name - this website seems to elaborate a bit more about the connection and also suggest that the name was inspired by the UTM: www.scholarpedia.org/article/Universal_search#Universal_search
@@PolylogCS The dovetailing of operations is what made me think of the universal Turing machine. If I recall correctly, the universal Turing machine uses the concept to simulate branches of execution. If it didn't, then the whole machine would loop if only one branch looped.
Interesting read as well, thanks!
Hey, but simulating takes n log n, or did I miss some big breakthrough in TCS?
What do you mean by simulating?
Idk what needs two videos to make and explain when all you have to do to find all the factors is this:
let num = 4;
function betterFactorFinder(n){
    let factors = new Set();
    for (let i = 1; i <= n; i++) if (n % i === 0) factors.add(i);
    return factors;
}
console.log(betterFactorFinder(num)); // prints the factors 1, 2, 4
If we assume that this algorithm will go over all possible programs, then for each composite number there exists a program that returns its factors... thus making the asymptotic time O(1) ?
For example, in the case of 4, the algorithm might find the following program "return 2, 2"
Not quite - if your number is **really large**, the search will find a "correct" algorithm before any other one. It's not O(1) because going through all the algorithms to find it is really difficult.
BTW. 7:30 it's totally sound effect from The Matrix. GOTCHA!
Reminds me of fuzzing for vulnerabilities
Okay but can Universal Search Algo be used to create a Universal Search Algorithm?
I didn't really understand one thing: if the universal search has time complexity O(f(n)) if there is an algorithm with time complexity f(n), why should it "pick" the fastest algorithm? Like say there are two algorithms, A and B, to solve problem P. A has time complexity O(x) and B has time complexity O(x^2). Why would universal search have a time complexity of O(x) instead of O(x^2) when used to solve problem P?
When brainf*** becomes relevant.
[HALLELUJAH]
wouldn't universal search in general keep finding hardcoded solutions before any actual algorithm? Because if that's the case, the algorithm should have the same complexity as a brute-forcing algorithm
For practical purposes, yes. You have to imagine what happens when the number to factor is so large it does not fit into the universe!
Think about what happens if hardcoding the answer takes more bytes than computing the answer.
As an example, instead of thinking about factoring, think about finding the asymptotically optimal algorithm to decode an mp3 file to wav.
For the sake of argument, let's assume that the optimal algorithm takes 1 GiB to write down, then you will find it before hitting the hardcoded solution, if your output wav-file is longer than 1 GiB.
@Patrick Lenihan Well, hardcoding is just a special case of your 'incorrect algorithm'.
Great video. What's nice about this algorithm is that it also solves the Halting Problem for free; each time you run it you could add each program that completes to a database, then anyone who writes a program and wants to know if it doesn't have the halting problem can simply look it up in that DB and get their answer, and depending on the length of the program this could be a lot quicker than running it in a simulator.
Well, I fear that you still end up with the same problem: you don't actually know how long you should wait for the program to halt before pronouncing it infinite.
This algorithm will also eventually simulate itself, and by tracking steps allocated to each successful algorithm, and keeping the fastest one, you can generate the fastest one possible! Shame it'd take until the next upload. :(
well, you can check whether numbers even have factors at all with a modular exponentiation algorithm based on Wilson's theorem
you can do the factorial computation instantly
with an expanded-factors approximation with convergence
when you have the universal primality test, you also have the factors
logical proof programming, over math statements, is simpler than programming languages: fewer permutations
you can check whether the program runs in an infinite loop or ends
What if it deletes your drive first?
You should probably run it in a sandbox and simulate a language that is not allowed to execute any system commands and can only access a virtual memory :-)
Universal search is a neat way to solve any problem, but checking if each string has valid syntax is quite a bottleneck. Maybe Intel can be persuaded to add some hardware instructions to make this a single cycle operation
Maybe it was in there and I wasn't paying enough attention, but how does this avoid the halting problem?
Programs that don't halt are no issue because while they loop endlessly, the algorithm continues to search new programs until the correct one is found. The halting problem is avoided by allowing the algorithm to prematurely pause programs throughout the search process. Another way to think about it is that in the beginning of the video, the algorithm presented searches for the proper program serially, so a single non-halting program will suspend the search. By the end of the video, the algorithm presented searches for the program in parallel, so whether each individual program halts is of no consequence, as long as there exists a correct algorithm which does halt.
@@denniszhang9278 Thanks! Clearly I was not paying attention
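To make the parallel, exponentially allocated search concrete, here is a self-contained toy sketch in Python (my own illustration, not the video's code): "programs" are modeled as Python generators so they can be paused and resumed, and in phase k program i gets 2^(k-i) steps.
from itertools import count

def universal_search(make_program, is_correct):
    # Dovetailing: phase k introduces program k and gives program i 2**(k-i)
    # steps, so earlier programs get exponentially more time overall.
    programs = []
    for k in count():
        programs.append(make_program(k))
        for i, prog in enumerate(programs):
            for _ in range(2 ** (k - i)):
                try:
                    status, value = next(prog)
                except StopIteration:
                    break
                if status == 'done' and is_correct(value):
                    return value

# Toy stand-in for "the i-th program": works for a while, then proposes (i, n//i).
n = 35
def make_program(i):
    def gen():
        for _ in range(i):
            yield ('working', None)
        yield ('done', (i, n // i) if i > 1 and n % i == 0 else None)
    return gen()

print(universal_search(make_program,
                       lambda ans: ans is not None and ans[0] * ans[1] == n and min(ans) > 1))
# -> (5, 7)
None of the toy programs here loop forever, but if one did, it would just keep eating its bounded share of steps in every phase while the others continue, which is exactly how the halting problem is sidestepped.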
And that's exactly why the hypothetical algorithm I thought of, capable of theoretically compressing the Bible, the Divine Comedy and my elementary-school science essay written back to back (yep, three very important texts) into likely less than 20 pages of Word at font size 12, without any loss of information due to compression, is not available
Just to be clear, the algorithm I'm talking about (which I won't explain in detail) can compress pretty much any enormous file into an incredibly small one relative to its original size
The idea is to use the data to generate pseudo-random new data (all without any loss of information) till you find one of two specific conditions (which I won't explain, since I would have to explain the whole algorithm) where the file gets compressed by an extreme amount (kind of as if you randomly got 10000000000000000000000000000000000000 and compressed it by stating the number of zeros, but more efficiently)
The problem is, obviously, a computation time that puts most algorithms to shame (probably not the one in this video), even if the decompression is a bit faster...
Another problem is that short enough files become much more massive instead. So while it could theoretically compress all the information on the internet into a surprisingly small size (relative to the original information size), a Word file containing only "Hello" would probably become bigger instead...
Well, it's fun knowing you've created an incredibly powerful and useless algorithm, no?
Edit:
Thanks to circuit craft, who responded to my comment, I read this again and noticed I wrote a few parts in a confusing way.
Sadly English isn't my first language, and I didn't properly express that this algorithm working is my assumption, or hypothesis if you prefer, based on a few facts
The functioning of the algorithm and the continuation of the discussion are found below; thanks again to circuit craft for making me realize my comment was poorly stated
The algorithm you are proposing cannot possibly exist, due to information theory. If you do have such an algorithm, you will either find that it does not terminate on some inputs, or the random data it generates is, on average, at least as long as the information you are trying to compress.
@@circuitcraft2399 The concept is that it generates random data till it finds a prime, which is then represented as a 0 followed by a separator and a number that indicates how many primes come before it
The final compression also has a counter of how many operations are needed, and another counter that implies adding a term to the original file written as a number
The compression works by associating the file with a single number (but not all numbers have a correspondence)
What you've said is partially true, but it doesn't get stuck, and the problem could at most be that some files exceed their original size
The logic is that, for a given size N of compressed file, it associates (N/4)^4 possible correspondences of files, with a tendency for them to be bigger.
It's true that not all of the numbers represent a file, but every file has a 1-to-1 correspondence with a number
For an arbitrarily big original size there should be an N after which the compression tends to make files smaller and where the file will most likely be found.
Well, this is just an idea I had and I haven't studied it too thoroughly, so there is a good possibility that it's wrong, but that can only be said after a more rigorous study
I would love to have it studied and have proof that I'm a fool, but sadly I don't have the time, exact knowledge or motivation to do this myself
@@silver_3552 As I understand it, you search for the least (greatest) prime number that is greater than (less than) your file's representation, then pair it with the difference between the two? It's true that the prime's index is logarithmic with respect to the input length, but your offset will grow with the size of the files. After all, as primes become more sparse, the nearest one will be farther away.
Here's why it must be impossible:
Suppose that the average compressed size of a file of length n is less than n. Then, there is some N where the compressed size of every file of length N would have to be below N, but there are 2^N files of length N and fewer than 2^N strings shorter than N bits, so two different files would have to compress to the same output, and decompression couldn't be lossless.
@@circuitcraft2399 I truly didn't want to write all of this... OK
The method is an iteration of factorization
You can use whatever base you want, but let's say you use base 10 for the example
You start by writing 0C, which indicates 0 iterations; C is a simple separator
You write a 1 or a 0 to differentiate even and odd numbers
You then write the power of each prime factor, with an A to indicate the next one and a B to indicate a repetition
For example 559021 becomes 1C 12A1B3A3 and it means
0C first iteration
1 odd
2A 2^2
1B3A 3×5×7
3 11^3
When you get this you reiterate the factorization on the number you got, in base 12 (1C becomes 2C and isn't part of the factorization, since it is the counter)
A prime is written as nC 10Bm
Where n is the number of iterations and m is the number of primes with a value less than half the original number
Obviously you don't only need a prime: any number with the format xC (1/0)yBz
is OK, where the parenthesis is for even or odd; x is still the counter, y is the power of all the prime factors, and z is the index of the prime factor "used", which represents a prime if y is 0
To ensure that there is no loop you can then add a new number before the C, like this
0D157C125A0B17 (not a real compression, just a random valid combination I wrote)
Where the number before D can be signed and represents a number to add to or subtract from the input.
This allows a parallel computation that will give solutions since, at worst, you just add 1 till you find a prime
And if there is a repetition then you simply go to the next D
Finally you could also add an E number to indicate that at some point you took a compression represented in base 14, converted it back to base 12 and started from the beginning
A little note: when I made the example with a base-10 number it was slightly wrong, since the factorization is always done in base n+2 and the results are written in base n; in our example A and B are needed as separator and compressor, but it's similar for any given base.
The example started in base 10 and the result was also in base 10 only to make it more readable to everyone.
If you wish to try and prove me wrong then i would be happy no matter the result since knowing i was wrong would be an interesting result in and of itself.
If you have other questions then i'm all ears.
Edit:
it should be obvious that if 125Cx is a valid compression then so are 124Cx, 123Cx and so on, since they were steps of the compression
But not all numbers are valid inputs, since you can always try to decompress only to find AB somewhere. You could make it so that AB means A0B, but we are interested in compressing and then decompressing, not just decompressing
And, as you can see, all steps are done with the possibility to uniquely convert the successive step to the previous one so there is no loss in this
I'll also add that the step with an E could be invoked at any moment, and you could ideally make an F with the same function as the E, and so on
@@silver_3552 Okay, I can see that this algorithm would translate any input number into some other number, in a reversible way. What I'm questioning is that the result is *shorter* than the input, on average. Can you justify that?
So this is basically a Carnot engine, but instead of thermodynamic machines it applies to computing ones.
Basically a metric for efficiency, but one that can't be built in reality.
Doesn't L depend on n? Then it isn't a constant and can't be factored out of the big O notation.
It doesn't. L is basically 2^(number of characters of your code).
@@PolylogCS it's independent only if the code represents an algorithm instead of a straight answer, though, right?
@@deltamico If an algorithm that solves the problem exists, eventually (for high enough n) the universal search will reach said algorithm and thus won't check the stuff past that point. And we don't care about small n for an asymptotic analysis.