FAQ! *4:44** Kruithof's method seems flawed. If the number of telephones in Town D stays the same, why does the number of calls within Town D decrease?* The residents of Town D now have more people they can call outside of Town D. So if they make the same number of calls as before, there will be fewer calls within Town D. However, the problem of predicting telephone traffic isn't well defined as stated, so it doesn't have a unique solution, and you could argue that Kruithof's method isn't realistic in some ways. *3:06** If the number of telephones in Town B increases 33%, then shouldn't the number of incoming/outgoing calls increase from 3000 to 3990 rather than 4000?* I'm following Kruithof's example, and Kruithof rounded here. (See Appendix 3d in the English translation: wwwhome.ewi.utwente.nl/~ptdeboer/misc/kruithof-1937-translation.html .) What's important is that the sum of the target row sums is equal to the sum of the target column sums, since both are equal to the sum of all entries in the table; otherwise you don't get convergence, since it's not possible to satisfy both constraints at once. *5:15** How could we get an irrational number after scaling finitely many times?* You're right; we can't. I could have been more explicit that this is a limit process, so we're scaling infinitely many times. *12:25** A degree-6 polynomial has 6 roots; how do we know which root we're interested in?* Great question! The intended root should of course be a real number between 0 and 1, but if there are multiple such roots then I don't know. I don't even know what an answer would look like, except for "It's the root closest to ...", which wouldn't be very satisfying. Anyway, to compute a Sinkhorn limit in applications, it's faster to iteratively scale to get the approximation you want than to compute a big polynomial for each root; it's only mathematicians who are interested in properties of the exact entries. *Why didn't you include the definitions of Delta, Gamma, M, and Sigma? They would have helped.* I struggled with whether to include these definitions. An earlier draft of the video did include them, but Sigma in particular required much more time than I thought I should spend. Maybe I should have included more examples instead. The formal definitions are in our paper if you're interested: arxiv.org/abs/2409.02789
Despite my having a degree qualification in Semiconductor Device Technology and another in Electrical and Electronics Engineering, I have no idea about anything you said after the first couple of minutes in this video. You don't explain what many of the terms you are using mean. You also have obscure acronyms which you do not explain. And at the end of the video you have given no meaningful answers to the questions posed at the start. So for people who are not familiar with your jargon this video is a complete waste of time. I guess that your paper will be of very little use to normal people who are not mathematicians. The video sounds more like an ego trip than anything which is intended to inform your public, and you confirm this by idly throwing out a seemingly impossible challenge for others to figure out. I'll not condone your felony by giving a Like or Dislike. At least your audacity has earned a comment.
@@RWBHere I am not sure what missing explainations you're talking about but if either of the terms "limit", "linear combination", "matrix", or "determinant" doesn't sound familiar to you then you're not part of the video's target audience. Some other terms (e.g. Sinkhorn limit) as well as their notation for Delta, Gamma and M were explained quite nicely and although there were terms that weren't explained (e.g. Gröbner Bases, which I had never heard of before) that's because they're too technical to explain in a youtube video. Your rather unpolite choice of words as well as your objection to them writing a math paper that is aimed towards mathematicians or at least math enthusiasts leads me to assume (and indeed hope) that you're just trolling although I'm not sure about that. Either way, promoting your own breakthroughs in your own videos on your own YT channel can hardly be called a felony and the same goes for posing fun research ideas. I hope that you find the kind of video you're into elsewhere but college math probably isn't for you.
4:44 Basically you've just proved the original projections were wrong/impossible as calls from D to D should instictively remain the same. The method is probably fine. The projected number of calls to/from D in the projection must be raised to ensure D to D calls do not decrease.
I think you have further constraints on the roots: All of the examples you've shown have only positive entries in the A matrix. And for the result we then always get only positive numbers as we are only rescaling with a positive scalar to get to the (positive) target vector. So I think the desired roots lie within: Equation for s11, ... Equation for s33, Row_sum = target_row_sum, column_sum = target_column_sum Such that s11,...,s33 > 0 And that should hopefully solve this Currently we have all the solutions which even would allow for negative entries in the matrix (that would never happen in the iterative approach as the rescale is always positive) Or maybe negative a entries are allowed and we restrict r1..r3 c1..c3 to be positive
What's mindblowing is that this is an active area of research in math but it's still accessible to understand for someone interested in math on a casual level.
I think most of active research is accessible if presented as brilliantly as this one. Starting with a real-world problem and iterating on it in intuitive steps.
This video is a great example showing what math research can be like. You might be going nowhere, but if you have a few ideas you can keep pulling the thread and then one day something interesting pops out!
Indeed, its pretty awesome. I think if they showed this kind of thing to young kids, it would be possible to persuade more people that math isn’t boring.
I find it fascinating how humans could find an overly complex mathematical problem that's left unsolved for many years from a seemingly simple problem like the phone demand given in this video.
Indeed, phone demand has always been something that you would think is simple, but is fiendishly complex. My mentor was an econometrician for Verizon whose job it was to use statistics to analyze the demand for new Internet backbone service, which at the time consisted of things like SONET rings which are now _passe_ , and this being the early 2000s there was a glut of dark fiber, but dark fiber can be expensive to light, since you need high end layer 1 equipment at a minimum, such as expensive high speed extended range transceivers, which will blind you or burn paper, and often there is a need to upgrade the switching and routing infrastructure as well. And also frequently you can do that and make lighting additional fibre superfluous. But the beauty of single mode fiber is that you can get incredible performance out of it, at least until it has an unpleasant encounter with the anchor of a ship, or a shark with an insatiable desire to chew on something, or even more frequently in the case of overland fibre, with a backhoe or shovel. Hence the need for redundant networks to deal with when things are cut. And capacity planning for Internet applications generally involves massive overprovisioning to handle unexpected traffic surges due both to outages of alternate routes or things “going viral” (back in the day, this was largely due to websites being “slashdotted” - those were the days of the Internet, before Eternal September became much more … eternal)
The formula is (2n-2 choose n-1), which is the same as (2k choose k), with k = n - 1. It's the binomial coefficient in the middle, hence it lies on the median of Pascal's triangle.
Wow! My new definition for a mathematician is someone who is not intimidated by apparent overwhelming chaos, but instead strives to uncover the beautiful underlying simplicity.
Almost exactly this problem shows up in road traffic. The problem there is estimating cars driving between pairs of cities, or pairs of areas in a single city. The standard solution is something called Furnessing which is a similar proceedure. If you look for Furness matrices you will find a lot of work.
@@Soken50 we get it, you don't like cars. Me neither, but this isn't about what's better. Richard was just saying that the maths can be used for any traffic network design, not that he prefers cars
Really nice! A couple years ago I implemented a robust point matching algorithm for point cloud registration. It uses deterministic annealing (softassign) to progressively compute the match matrix that associates points between the two clouds, and you have to calculate a Sinkhorn limit each step of the softassign to force the match matrix to be doubly stochastic. I was wondering if a closed-form expression existed, but didn't put much thought into it, as I sometimes have to avoid mathematical side-quests during my engineering tasks... Well I would never have thought that rabbit hole was that deep. Thanks for sharing your work with us!
"I sometimes have to avoid mathematical side-quests during my engineering tasks" only sometimes? I can get so easily distracted by some stupid thing in excel or my software, or the codes, or just a new spec I've never seen before with ALL of this stuff I could look up more info on, I have to actively focus myself.
@@kindlin Yeah I said "sometimes" because most of the time I get sidetracked. But when I have to get sh*t done there's no way! I usually take a note and come back to it later on, just for fun.
Great video. I love how this was motivated by a concrete example but quickly rose to the point where I had forgotten about the telephones. The concept of scaling the rows and columns in a matrix was interesting enough on its own to carry the video.
This video makes me feel smart because, having thought about similar problems at length before, my immediate thought for how to solve the telephone problem was basically exactly Kruithof’s solution. Now, does that mean I’d have been aware of the problem or been confident enough to write a paper castigating others? Very doubtful. But the possibility makes me happy.
End of the 1990s it was my job to plan new mobile cellular and fixed networks around the world - and then went on to design tools to aid the planning. I’m very happy to report that I didn’t know about this method and that the networks do in fact work! The methods actually used by telcos are quite different.
@@JohnLee-bf2ux We created a custom planning tool suite using Excel 2000 to hold the data tables and to provide a graphical/mapping interface - this was a little before Google Maps API became available. Integration with Excel was via VBA stubs to connect to DLLs we wrote to perform all the stages of the modelling, from a Market model, a Gravity model (to determine inter-city traffic), a map-driven core network topology model using Floyd shortest path or least hop, and an access network automated model using simulated annealing to automatically optimise radio or fibre access networks. It was fun - took a year, but enabled us to design 15 networks around Europe in a year!
@@JohnLee-bf2ux Software engineer solutions tend to be rather crude, and tend to deal with more inputs, such as costs of construction, user entered dynamic limits etc. My guess is the algorithm can't be classified as anything other than a hot mess of heuristics and approximation, and does not impress a mathematician.
It's amazing how some people are able to come up with such things. Hell, even just fully understanding them is a feat in itself. & even more so when something so abstract has any applicability to reality whatsoever. Powerful stuff! 💪🧠
Lol @ the final equation of degree 20. But yeah, this does happen sometimes, or it is even the usual case in mathematics: Solving some seemingly simple problem "explicitly" rather than iteratively is possible, but the solution is so complex, that it seems like some fundamental and not-quite-describable aspect of mathematics converged much slower than comfortable - but at least it converged rather than diverged (as in, the problem was solvable using a lot of complexity).
Yup! And I wonder whether these sort of weirdly shaped answers present soft limits related to the boundary of uncomputability, where we know that a 745-state Turing machine cannot be decidable by mathematics, much less by specific, fixed formulas.
Amazing video. Thanks for sharing the fruits of this discovery. A few months ago, I worked out the formula for 0^p+1^p+...+n^p using generating functions, and I imagine that my satisfaction must have been a tenth of the satisfaction of finding that linear combinations of your sigma functions were the key to the coefficients.
WOW! This is first time I see a RUclips video being published along the scientific paper at the same time. I mean video and paper are published *simultaneously*. BTW: great video. I haven’t read the paper yet.
The paper is well beyond what i can understand but this video allows a laymen to appreciate the beauty of the math even if i only have a surface understanding.
2 месяца назад+7
Very well executed! I love experimental mathematics, it should be emphasized more in education. Experiment, conjecture, prove!
As soon as the Theorem popped up at 7:52 I knew it was related to determinants. Never wouldn't have figured out how, but I even looked it up to check the definition of a determinant at that moment.
When I was in graduate school, I worked on this. I got nowhere. I was today years old when I heard of The Sinkhorn Limit. I could not have done what they did in this video though. After grad school I became a Software Engineer and have had a nice career though. So it''s all good. I'm reading about Sinkhorn's Theorem now.
I mean, there has been a lot of progress making groebner bases more feasible in the last couple of years. I think, just doing the gb computation for the 3x3 case would've been quite difficult 20 years ago. (but I checked and it seems like you can get the final polynomial for the 3x3 case within seconds also in OSCAR, which is open source)
I don’t know why I watched this, but as you described the problem, I realized I had had to use the iterative approach for a problem that had to do with populations and the languages they know. So this problem was familiar to me and had practical implications in my work. Of course, this went well beyond what I was doing, where usually about 6 iterations was enough to solve my problem. I’d wondered if there was a generalizable solution to the problem, but had no idea where to start. I’d actually had to approximate the n x n cases, but I had figured out ways to do so with reasonable efficiency by solving smaller cases and then tackling pieces of the bigger cases. None of what I did is really useful to anyone but me, but I was excited to see that something I clicked on at random actually answered a question I’d had for years.
Although you can tell that i have a physics background because when you started manipulating the coefficients to simplify my eyes glazed over until you hit the answer. I would have accepted “and we can manipulate these to get this”. When mathematicians get into the globetrotter math I just say “I believe you can do it! Just show me the end of it 😭 “
This is probably the best view into math research I've seen on RUclips! And it illustrates the kind of problem that is great for an undergrad (or in this case high schooler!) research project which can produce a publishable paper. There really are unsolved research problems that don't require PhD-level background knowledge (beyond things that can be black boxed like Grobner bases in this example, which I assume the PI would have done). Great video!
In the field of statistics, the initial algorithm is also known as Iterative proportional fitting or Raking. Useful to calibrate tables to known totals.
Absolutely Fascinating . I loved the explanation so much. The idea of a video accompanying a publishing should be normalized. You also gave me a way to showcase my research when done independently and not related to any institution, because that has been a tough question in my mind, on how to publish exactly. So a video + arxiv is the way to go.
I really enjoyed this video. I have one criticism (which I am open to discussion about). I feel like the definitions of the delta and gamma functions from around the 20 minute mark are not completely clear. For example, the functions took a single set of variables upfront and, later, a set of two variables. Hence, I have no idea how that would generalize to sets of greater cardinality for higher dimension matrices. I understand that, for brevity, that the definition needed to be cut, but I would have liked a reference to check it so I could follow along with a sufficient basis for understand the derivation for the higher degree coefficients. Otherwise, like I said, this was a really good video with really great motivation and intuition leading to next steps. As for rigor, I refer back to the comment above and hope that it's not just my own brain not parsing your explanation correctly. Thanks so much for posting this!
Thanks for your comments! I did avoid the formal definitions in the video, but maybe I didn’t give enough examples instead. Anyway you can find the definitions in our paper: arxiv.org/abs/2409.02789
I was thinking about this problem invariants noticed some nice intuition for why determinants should pop up here.the short version is that the Sinkhorn limit is invariant under row and column scaling, and row and column scaling winds up scaling the coefficients of this polynomial, so the solutions of the polynomial are invariant. Long version:Observe that if you scale the columns of a Matrix M any way you want to make new matrix M', then run the sinkhorn process on both M and M', beginning with a column scaling step, then the column scaling will give you the same result for M and M'. Then for the rest of the process they will match, so M and M' have the same Sinkhorn limit. This means that the Sinkhorn limit should be invariant under column scaling, and by a symmetrical argument, under row scaling. Now a determinant isn't invariant under such operations, but it scales according to the same factor you scale your rows or columns by, and so if every coefficient of the polynomial has determinants multiplied in just the right way, then the solution of that polynomial is invariant.
Fantastic video. I love the way your narrative carried interest all the way through, and the payoff at the end was worth it even as a viewer. I audibly chuckled when you (metaphorically) opened the hood to the car and revealed the engine of math underpinning the whole thing, what a mess lol.
I was baffled by the fact alone that the algorithm always converges. Anyway, congratulations to your awesome idea and that it worked out. Constructive criticism: I like to see definitions for simpler notations, e.g. of \Gamma, M and \Sigma, when you introduce them. It helps me to follow.
Thank you! I struggled with whether to include definitions. Sigma in particular required much more time than I thought I should spend, but of course the risk of not including a definition is that it seems meaningless. The formal definitions are in our paper: arxiv.org/abs/2409.02789
Great video! I especially liked your storyline, which very well reflects the workflow of solving problems in math or related fields (minus the frustrating moments and dead ends😅). Reading papers often leaves people wondering how they could possibly came up with the results, but a paper is just the "reverse-engineering" of this workflow
Me: "How did a youtuber do such intelligent mathematics?" Post-video me: "Oh, bro is a professor, not a youtuber". Fascinating work, thanks for sharing :)
Wow great video Eric! You introduced the problem and my first thought was that there is no way this should be unsolved for so long. Then you started writing out the details of the polynomial system and I had ptsd flashbacks at grobner bases. Excellent job making sense of the coefficients you got out. If you’re looking for a similar problem for future students I have one from my research on quantum walks (a quantum analogue to random walks, shows up in quantum computing a bunch). Basically the spreading pattern of these things are solutions to these crazy polynomial systems so that even the simple cases which look like easy geometric shapes are crazy to solve. Let me know if you’re interested, I have a bunch of vids on my channel of these simulations
Fascinating look into the way math research is done and how progresses are made by educated guess from numerical approximation, symmetry of the problem, and just beautiful intuitions and lots of work. Thanks, and sub 🙂
Brilliant! Many brilliant moves you two made. This is good, fun, math. With good development like this, even if you have to go back and correct something and redevelop, you get there. 😊 But this seems solid… I love applications that involve the Grobner basis. You showed a good method for working more efficiently. Good work you two!
That is an insane formula.. but now make it fractional and with complex numbers and you'll have a glimpse of what we have to deal with when designing high order digital filters!
Back in the 1990s, I wrote a program on a NeXT computer to compute a Grobner basis. It ran all nigh at the Universityt. It swapped so hard, that the hard drive was destroyed. I eventually figured out the solution, though.
Great work! This looks like a very complicated combinatorial problem, I'm glad to see some progress in it. And I'd be surprised if the conjectures on the degree bounds aren't true. I hope you'll make another video if you can work out the deal with the sign alterations. On another note, I think I've found the formula for the 2x2 Kruithof limit. Though I didn't actually prove this is the right limit, I just assumed the conjectured degree bound and found it by polynomial interpolation. Let M = {{a,b},{c,d}} be our matrix, with a,b,c,d positive, and let {m,n} be the row sum targets and {x,y} the column sum targets, with m,n,x,y positive. Then in order for the Kruithof limit achieving those targets to exist, we need m+n = x+y. Further, letting p = ad, q = bc, the value of the top left entry of the Kruithof limit is given by: (p(m+x)+q(n-x)-Sqrt[(p(m+x)+q(n-x))^2-4p(p-q)mx])/(2(p-q)) One might notice the absence of y. That's because m+n-x-y = 0, so we can add multiples of this combination throughout the formula to re-symmetrise. This form without y was among the cleanest ways I could find to write it, though I imagine there's some more symmetric way to write it. This has worked for countless random matrices and random targets, so I'm fairly certain it's right, and if you plug in m = n = x = y = 1, you do indeed get the Sinkhorn limit formula, after some manipulation.
Actually, I just noticed a slight problem with my formula, it fails for matrices with determinant 0, which corresponds to the case where the entries of the limit are rational, as opposed to algebraic of degree 2. So the formula definitely needs to be manipulated to have it work for determinant 0 matrices, which would involve splitting the p-q in the denominator into (sqrt(p)+sqrt(q))(sqrt(p)-sqrt(q)) and cancelling down the latter factor anywhere it appears, which would make the formula look more like the Sinkhorn limit.
Perhaps it's nicer to just have the polynomial of which the top left entry of the Kruithof limit is a root. The polynomial is simply (p-q)t^2-(pm+qn+(p-q)x)t+pmx
4:54 If number of telephones in towns A, B, and, C all increase while the number of phones in D stays the same, why would the expectation be that there won't be more calls to or from town D? I can see how the assumption that calls from town D to itself might make sense as there's not an increase in households to call. But the table after the iterative scaling assumes those numbers will go down. Is the assumption that each household has a phone budget that won't go up even when more households they know are added to the phone network?
I think it's more accurately just described as to why models are only approximations of reality. As long as the deviation isn't too large in practice the numbers will still be useful. But I too initially thought the video would be about analyzing those pathologies in the method, so I was hoping it would be addressed along with the analysis presented.
How could it be A->A (Ignore all other towns) goes above 1.5 x 3000? Also, D->D shrunk? HUH? HOW? 160->124? With the same number of people in the town? It doesn't make ANY sense. I mean for one, like "If i'm in town D, and my number of options expand outside of town D only, then it makes sense for the number of inter-town d calls to slightly decrease" But, This isn't being done the right way. What the right way is, I couldn't tell you, but this is not correct. It's something like "The more people within someone's people- range there is to call, the more likely someone is to want to use the phone, so as we increase phone lines, people want to use the phone lines even more", but this isn't going to be mapped by some random matrix thing. It's just defintely the wrong way to approach the problem, as we aren't even mentioning at any point how the growth of the telephone lines impacts peoples wants to use the telephone, which is CLEARLY an essential element this is modelling. This is also OBVIOUS to us today, that now that we have all the people connected on telephone lines, the same logic must surely NOT apply, imagine adding 3000 lines in alaska, would they get some exponential growth? Yes, but not the way it would be somewhere more close nit or as telephones were just being adopted. There's some element of "remoteness" and disconected groups that will impact these numbers, surely. And none of it's even being mentioned. It's just completely wrong. There MUST be some "Connectivity constant" at play, that the matrix model might get close to or something, but its defintely not the same for every grouping of cities, and its also defintely not the same for every "phone" like problem. It's just some random matrix thing.
When phone companies used this there was a *lot* of money riding on it working. This is a toy example, but it gets the idea across. You can trust the full thing worked. This kind of research is why engineers dreamt of working at Bell Labs. They where at the forefront of a lot of fields. The math around optimizing utilization of phone lines is amazing.
@@peterbonucci9661 I once wrote a toy tool in Excel that predicted traffic in an Ethernet-like network. It was atrociously simplistic and unrealistic, but it spew out numbers and people absolutely loved it.
@@peterbonucci9661 Imagine City A has nothing but shunin tech nerds who just want to get the new tech thing. City B has a bunch of social people who want to be able to talk to their friends without the hastle of driving to peoples homes every time they want to talk to someone. City A Goes up 50%, City B goes up 10%, because they are poor or something. The matrix will fail spectacularly. Just think about it. The matrix would need to include some ratio to distinguish these cities apart from each other. It would need to only include "like" cities, or something that isn't being covered in the example. It's 100% not just throw every city in america into a 10000x10000 matrix.
I wouldn't say it's just a random matrix thing. I would argue it's rather a model assumption error. I see it as: based on the behavoiral situation in the A matrix how would the dynamics extrapolate to the situation in the S matrix. The "problem" lies within the assumption that the number of calls of D doesn't rise. As D are very antisocial people and they don't necessarily even like to talk to people from their own city. They now have a problem that people from A are now even more willing to talk to D people and well D people would rather talk A vs D and the total number of calls doesn't increase so D - D decreases Same situation in A-A. As A people are very eager to talk to A people the total number can increase more than 1.5. They are even more connected now so given the last years model assumption we should extrapolate this further All models are wrong :)
I find the video fascinating. After trying myself, I only could do so much to find the first row and column if the original row sum is 4000 and column sum 4150 and the intended row sum 6000 and column sum 6225. We can find the coefficient to be Lcm 6000, 6225 over the Lcm 4000, 4150 Which will be equal to 1,5 And if we plug it in, it becomes 3000, 1545, 975, 480 | 6000 1620 1080 525 ------------ 6225
That was nice. I wish there was a reminder of what are delta, gamma, M and sigma visible at 22:20, but the picture is already a bit crawded. 20:52 "[...] stroke of luck" I think you forgot a lot of work and a bit of genius
Very interesting. I wonder if any computational optimisations come out of this research. So far it seems like the original iterative method would be much faster than evaluating high degree polynomials
Oh, this is exactly what I've been thinking about lately ...........unrelated field but same iterative process needed ...I was looking at Newtons method and other iterative approximations
I have used this method for years without knowing its name. Just a few weeks ago I'd tried to use it on a problem and had been dissatisfied with my results, but this presentation helped satisfy why I shouldn't have expected an obvious answer. I assume there's a reasonable simple formula for the 2 by 2 case with non uniform marginal, i.e., non Sinkhorn marginal. There goes my weekend...
Here's an interesting result, that you probably noticed: The Sinkhorn limit of "next year's" table is identical to the Sinkhorn limit of "this year's" table. In fact, this seems to be true for ANY (non-zero) target values for the rows and columns. They don't even have to follow the "multiply the row and column sums by the same value" rule Kruithof's method uses. It looks like you can start with a table and perform Kruithof's method on it multiple times and it will always have the same Sinkhorn limit. More generally, If you have a matrix and two sets of targets A and B, you can use Kruithof's method to match targets A, then use that result to match targets B, and it would be an identical result to just matching the original matrix to B. So this ALSO means that if you take the Sinkhorn limit of this year's data, you can use that with next year's targets and get the same result. Which sounds interesting but might not be helpful. Now, I'm not a mathematician, I'm just playing with some computer code and noticed this. I could be completely wrong and I just happened to find situations where this worked.
@EricRowland - I love how you did this work with gifted high school students, as this will really help jumpstart their careers! Being a high school student that is published in peer reviewed journals must be incredible. Out of curiosity what software were you using to assist you in these calculations? Matlab, or custom programs in python, or did you have to use a compiled language or run on the GPU due to the extreme memory and CPU usage? And were you running this on a cloud computing service, or on a private cluster you owned or had access to? As a network engineer and sysadmin with a desire to get into HPC I’m very interested in how you did the computational aspects of this. Thank you for your work and God bless you! ❤
Oh also, I love this subject matter of this problem, since as a network engineer, I got my start working in the industry working for a legacy telecom provider, specifically Ghana Telecom. I worked for them as a consultant during the preparations for their merger with Vodaphone. A Nigerian mentor of mine, who had previously worked for Verizon and before that, for the Federal Reserve Bank in the US, who had been my professor of economics and econometrics in college, specialized in econometrics relating to telecom network demand, specifically forecasting the need for backbone infrastructure (at the time, there was a still a glut of dark fibre due to the dotcom bust in the West, but in Ghana, there was not quite as much good backbone infrastructure as one would prefer, and furthermore, the equipment was not what I would have preferred - non-redundant switches from Alcatel were handling both voice and data traffic because they were cheap, but for IP operations they were very limited, and there was in particular a lack of good Cisco and Juniper routing and switching, and at the time there was almost no mobile broadband, so the GSM based cellphone network, while modern as far as voice was concerned, was not able to do much more than handle voice and SMS, and broadband internet access broke down outside of the limits of the major cities, and the switched telephone network was a mess - nonetheless, we could tell from all indicators that demand was steadily rising, and the real challenge was financing the capital improvements needed to meet that demand, which is why the system was privatized and sold to Vodaphone - there were several competitors in the mobile phone space but Ghana Telecom had a monopoly on land lines, and this created the basis by which mobile data was implemented in the coming years. After I left Ghana I moved into network engineering for colocation providers, which is a lot of fun as I continued to do BGP routing, which is the mathematically elegant protocol that enables the Internet to work as a confederation of independent networks, or “autonomous systems” as we call them in BGP parlance, without each network having to know any details about the internals of other autonomous systems. But within an autonomous system, there are equally interesting routing protocols like OSPF, IS-IS and Cisco’s proprietary EIGRP, as well as the dated, and nowadays, rather aptly named RIP (Router Information Protocol, but also it Rests In Peace). The math of these systems I find quite interesting, but to an artist like you it might well be boring and trivial.
Thanks! The software we used was Mathematica. The hardware wasn't anything special, just a laptop. For the Gröbner basis computation I talk about in the video, it was more about being clever with the settings than having a lot of compute power. To guess the formula for larger matrices (which we talk about in the paper), we used 1.5 years of CPU time to iteratively scale matrices and run PSLQ on the results, but again most of that was done on a laptop running jobs in parallel.
@@EricRowland well I guess I made the classic sysadmin mistake of over-estimating what mathematicians need for most things - a Freudian slip on our parts because we want more expensive and exotic servers to play with, and mathematicians like yourself who can justify resources create the raison d’etre for us to buy more toys, and those of us who are good pride ourselves on providing a reliable service, even if it is massive overkill. Software developers on the other hand tend to be a lot more conservative, and in the case of some developers, way too conservative, about hardware requirements.
I disagree with the assumption that the # of telephones staying the same in some town means that the # of incoming and outgoing calls stays the same. Even if there are no additional people with telephones in that town, there are still more people in the other towns for those people to call, and more people in other towns that may want to call them. With calls taking place between towns, the call volume should also reflect growth elsewhere in the network (Calls within the town could stay the same though) In the beginning example, this assumption actually causes the calls within town D to go down, in order to keep the total calls constant while volume from elsewhere increases. Calls between C and D also goes down despite and increase within C, and between B and D barely increases ay all. These results seem odd, and the assumption is basically that each individual telephone will make and receive the same number of calls each year which seems unrealistic to me.
I would not get too hung up on the exact method to estimate call growth as it is not the important part here. That said I think you might agree with the following statements. The people in all the towns have the same kind of behaviour. They are no more talkative or more stand-offish. A) Make the assumption that people in all towns make the same average number of calls per day. B) Make the assumption that people in all towns are just as likely to receive as to make a call. The two assumptions seem reasonable if we assume no town is more grumpy or talkative. Assumption A means outgoing calls scale with population. Assumption B means incoming calls scale to march them. Any other model you will get towns which have "more chatty" people and that seems a little odd.
I also got hung up a little bit on this when seeing that D, taken to be a closed system, makes fewer internal calls when other towns increase their phone traffic. Your reply helped me see that, vaguely, the extra calls from A-C that the folks in D have to contend with will crowd out their willingness to call other D.
@@biggiemac42 you could justify it as when the other towns got bigger then D town made friends outside their neighbourhood and so made calls to their neighbours in D town less. In reality it gets complicated - Like maybe the towns get cheap local calls. Maybe A town and B town are really close so they have lots of friends. Maybe C town has a call centre so makes loads of outgoing calls. You sacrifice a bit of reality to get a beautiful simple mathematical model. :)
Yeah I'd wager that through the Network Effect the utility and therefore the call numbers increase even in towns without an increase of internal nodes. So the model is definitely interesting, but it fails to account for that aspect of reality. The math is great though.
@@richardclegg8027 While I agree that this is beside the point for the video, I have a really hard time accepting those assumptions. It's not really about chattiness, but availability. Put simply, people will likely spend more time on the phone if they have more people to call. To make this clearer, take an extreme example. If everyone in my town has a phone, I will find reasons to phone them - perhaps because it's quicker than making house calls, or whatever. On the other hand, if only one other person in my town has a phone, I'll only use it rarely - not because I don't want to, simply because there are limited opportunities to do so. So I think it's quite reasonable to expect phone owners in these two towns to spend very different amounts of time on the phone. Real life examples wouldn't be so extreme, and I'm intentionally ignoring inter-town traffic for simplicity, but you get the idea. Again, I'm not knocking the maths in any way - it's a fascinating result - but applications in the real world are inevitably messy.
15:00 To understand why it's degree 6, not 15 Its because for an n x n matrix we have n + n constraints so 2n variables. Basically we need a normalization factor for each row and each column so for the element in (I,j) you take the original and multiply it by the normalization constant related to row I times the constant for column j. Still seems pretty basic.
Ngl my pet peeve is all these youtube math videos trying to make simple concepts look complicated with unnecessary terminology. Hope this isn't one of those cases 18:00
Oh, I thought there would be a simple expression for a general limit via Sinkhorn limit. But I couldn’t guess what it’d be so I decided to wait until the end of the video. It ended up both surprising and not surprising that there’s none known yet! I guess a useful question to ask here would be does taking Kruithof limits with different row and column sums commute? If it does, probably we can always start from a bi-stochastic matrix as mentioned in a comment by @tzimmermann. Then ideally we could compose two polynomials. I guess the nature of coefficients before Σ might be crucial for working out what’s happening in the general case... I wonder which kinds of linear algebra are still not applied to this problem. Having determinants like these alludes to maybe try and make this coordinate-free in a language of exterior algebras. But it’s not obvious what unexpected things just a mere reformulation would give.
Mind blown! That was a very interesting explanation. Now, from a real word performance point of view, is the Kruithof's method faster of slower? I'm guessing it's likely faster, and since it's all about estimations, maybe it can be stopped before it fully converges and still be "good enough". Wonder how the uncertainty margins in the original data and predictions, impact the final formula.
there exists a sinkhorn limit to the solution of kruithof's algorithm, which just scales each row and column of the solution by a number. maybe a relation between that and the expected sums can be found, in which case finding the solution would be trivial
My take on the problem (most likely not new): rows scaling of matrix A is equivalent to left multiplication by doagonal matrix d, namely A -> dA. Columns scaling A -> Ad'. All infinitely many scalings lead to ...d_4 d_2 A d_1 d_3... = D A D'. For N towns this transformation has 2N parameters (N diagonal entries of D and N of D'). We have 2N coupled quadratic equations for those parameters (N row sums and N column sums). Enough! We take only all-positive solutions. It doesn't prove there is only one solution, but for sure - finitely many.
FAQ!
*4:44** Kruithof's method seems flawed. If the number of telephones in Town D stays the same, why does the number of calls within Town D decrease?*
The residents of Town D now have more people they can call outside of Town D. So if they make the same number of calls as before, there will be fewer calls within Town D. However, the problem of predicting telephone traffic isn't well defined as stated, so it doesn't have a unique solution, and you could argue that Kruithof's method isn't realistic in some ways.
*3:06** If the number of telephones in Town B increases 33%, then shouldn't the number of incoming/outgoing calls increase from 3000 to 3990 rather than 4000?*
I'm following Kruithof's example, and Kruithof rounded here. (See Appendix 3d in the English translation: wwwhome.ewi.utwente.nl/~ptdeboer/misc/kruithof-1937-translation.html .) What's important is that the sum of the target row sums is equal to the sum of the target column sums, since both are equal to the sum of all entries in the table; otherwise you don't get convergence, since it's not possible to satisfy both constraints at once.
*5:15** How could we get an irrational number after scaling finitely many times?*
You're right; we can't. I could have been more explicit that this is a limit process, so we're scaling infinitely many times.
*12:25** A degree-6 polynomial has 6 roots; how do we know which root we're interested in?*
Great question! The intended root should of course be a real number between 0 and 1, but if there are multiple such roots then I don't know. I don't even know what an answer would look like, except for "It's the root closest to ...", which wouldn't be very satisfying. Anyway, to compute a Sinkhorn limit in applications, it's faster to iteratively scale to get the approximation you want than to compute a big polynomial for each root; it's only mathematicians who are interested in properties of the exact entries.
*Why didn't you include the definitions of Delta, Gamma, M, and Sigma? They would have helped.*
I struggled with whether to include these definitions. An earlier draft of the video did include them, but Sigma in particular required much more time than I thought I should spend. Maybe I should have included more examples instead. The formal definitions are in our paper if you're interested: arxiv.org/abs/2409.02789
Despite my having a degree qualification in Semiconductor Device Technology and another in Electrical and Electronics Engineering, I have no idea about anything you said after the first couple of minutes in this video.
You don't explain what many of the terms you are using mean. You also have obscure acronyms which you do not explain. And at the end of the video you have given no meaningful answers to the questions posed at the start. So for people who are not familiar with your jargon this video is a complete waste of time. I guess that your paper will be of very little use to normal people who are not mathematicians.
The video sounds more like an ego trip than anything which is intended to inform your public, and you confirm this by idly throwing out a seemingly impossible challenge for others to figure out. I'll not condone your felony by giving a Like or Dislike. At least your audacity has earned a comment.
@@RWBHere I am not sure what missing explainations you're talking about but if either of the terms "limit", "linear combination", "matrix", or "determinant" doesn't sound familiar to you then you're not part of the video's target audience.
Some other terms (e.g. Sinkhorn limit) as well as their notation for Delta, Gamma and M were explained quite nicely and although there were terms that weren't explained (e.g. Gröbner Bases, which I had never heard of before) that's because they're too technical to explain in a youtube video.
Your rather unpolite choice of words as well as your objection to them writing a math paper that is aimed towards mathematicians or at least math enthusiasts leads me to assume (and indeed hope) that you're just trolling although I'm not sure about that.
Either way, promoting your own breakthroughs in your own videos on your own YT channel can hardly be called a felony and the same goes for posing fun research ideas.
I hope that you find the kind of video you're into elsewhere but college math probably isn't for you.
4:44 Basically you've just proved the original projections were wrong/impossible as calls from D to D should instictively remain the same. The method is probably fine. The projected number of calls to/from D in the projection must be raised to ensure D to D calls do not decrease.
@@RWBHerewhat a strangely hostile comment
I think you have further constraints on the roots:
All of the examples you've shown have only positive entries in the A matrix. And for the result we then always get only positive numbers as we are only rescaling with a positive scalar to get to the (positive) target vector. So I think the desired roots lie within:
Equation for s11,
...
Equation for s33,
Row_sum = target_row_sum,
column_sum = target_column_sum
Such that
s11,...,s33 > 0
And that should hopefully solve this
Currently we have all the solutions which even would allow for negative entries in the matrix (that would never happen in the iterative approach as the rescale is always positive)
Or maybe negative a entries are allowed and we restrict r1..r3 c1..c3 to be positive
What's mindblowing is that this is an active area of research in math but it's still accessible to understand for someone interested in math on a casual level.
I think most of active research is accessible if presented as brilliantly as this one. Starting with a real-world problem and iterating on it in intuitive steps.
These techniques feel a bit like looking up the answers in the back of the book where whoever wrote the answer key didn't include any derivations
Dude discovered research in mathematics.
Fr
YFAI
most mathematic olympiad questions (in my experience and based on my knowledge) require some sort of guessing so, it's not that rare
Indeed, this is where math becomes fun and interesting in its own right. _Mathematica Gratia Mathematicae_ so to speak.
This is amazing!
There are Researchers and then there are Science Comunicators, but someone that does both so well at the same time... wow!
This video is a great example showing what math research can be like. You might be going nowhere, but if you have a few ideas you can keep pulling the thread and then one day something interesting pops out!
Nice to see you here :)
The video itself is well done, but it’s especially neat how it gets to be a companion explanation to your own research. Awesome job on this! :)
Thanks so much!
Man, I wish the papers in *my* field came with such a concise and pleasantly narrated expository video
Congratulations on cracking this 90-year-old problem !
Thank you!
This shows just how far on our way up the growth spike we are.
a mathematical breakthrough in an open problem only 4 hours ago?!
Satisfactory!
@@aloysiuskurnia7643the mathematical community will thrive
of course, these are happening all the time - what makes this exceptional is that it was popularized through video
So cool! Reminds me of Numberphile blowing up when David Smith et al. found the aperiodic monotile
Indeed, its pretty awesome. I think if they showed this kind of thing to young kids, it would be possible to persuade more people that math isn’t boring.
I find it fascinating how humans could find an overly complex mathematical problem that's left unsolved for many years from a seemingly simple problem like the phone demand given in this video.
because phone demand is not a simple problem?
@@kelvinluk27 "...from a _seemingly_ simple problem..." how about read first before replying, we good now?
Indeed, phone demand has always been something that you would think is simple, but is fiendishly complex. My mentor was an econometrician for Verizon whose job it was to use statistics to analyze the demand for new Internet backbone service, which at the time consisted of things like SONET rings which are now _passe_ , and this being the early 2000s there was a glut of dark fiber, but dark fiber can be expensive to light, since you need high end layer 1 equipment at a minimum, such as expensive high speed extended range transceivers, which will blind you or burn paper, and often there is a need to upgrade the switching and routing infrastructure as well. And also frequently you can do that and make lighting additional fibre superfluous. But the beauty of single mode fiber is that you can get incredible performance out of it, at least until it has an unpleasant encounter with the anchor of a ship, or a shark with an insatiable desire to chew on something, or even more frequently in the case of overland fibre, with a backhoe or shovel. Hence the need for redundant networks to deal with when things are cut. And capacity planning for Internet applications generally involves massive overprovisioning to handle unexpected traffic surges due both to outages of alternate routes or things “going viral” (back in the day, this was largely due to websites being “slashdotted” - those were the days of the Internet, before Eternal September became much more … eternal)
@@Somebody71828 how about you chill out before raging, we good now?
@@SwordQuake2 Dude, if I'm raging I would've responded with something irrational that makes me look butt hurt. I just simply pointed out his mistake.
1,2,6,20,70,... is the height of Pascal's Triangle. It's nice to see combinatoric stuff linked to each other
My naive guess halfway through the video was that for an n x n matrix, the polynomial would be degree n!
What precisely do you mean? Surely the height would be 1,2,3,4...? What are you referring to as 'height' here?
@@KeimoKissa They are the numbers in the central column of Pascal's triangle. I'm not sure if this is meant by height.
The formula is (2n-2 choose n-1), which is the same as (2k choose k), with k = n - 1. It's the binomial coefficient in the middle, hence it lies on the median of Pascal's triangle.
@@KeimoKissa I think they meant "altitude" by "height".
For about 5 seconds, I was half a step ahead of the video. Probably the most insight I've ever had or will have. Amazing work.
Fascinating and very entertaining! I love the idea of releasing an expository video to go along with a research paper. Awesome video!
Thanks so much!
YFAI
Wow!
My new definition for a mathematician is someone who is not intimidated by apparent overwhelming chaos, but instead strives to uncover the beautiful underlying simplicity.
well said
Almost exactly this problem shows up in road traffic. The problem there is estimating cars driving between pairs of cities, or pairs of areas in a single city.
The standard solution is something called Furnessing which is a similar proceedure. If you look for Furness matrices you will find a lot of work.
And the result of this very complex problem is a high speed rail line. Saves a lot of "one more lane" math.
@@Soken50 almost exactly the same problem shows up in designing rail networks. Some maths is just very applicable. :)
@@richardclegg8027 You're right! There are so many 10 lane high speed railways in the world! What a headache!
@@Soken50 we get it, you don't like cars. Me neither, but this isn't about what's better. Richard was just saying that the maths can be used for any traffic network design, not that he prefers cars
Really nice! A couple years ago I implemented a robust point matching algorithm for point cloud registration. It uses deterministic annealing (softassign) to progressively compute the match matrix that associates points between the two clouds, and you have to calculate a Sinkhorn limit each step of the softassign to force the match matrix to be doubly stochastic. I was wondering if a closed-form expression existed, but didn't put much thought into it, as I sometimes have to avoid mathematical side-quests during my engineering tasks... Well I would never have thought that rabbit hole was that deep. Thanks for sharing your work with us!
Ah, the mathematical side-quests! Why I get up in the morning!
"I sometimes have to avoid mathematical side-quests during my engineering tasks" only sometimes? I can get so easily distracted by some stupid thing in excel or my software, or the codes, or just a new spec I've never seen before with ALL of this stuff I could look up more info on, I have to actively focus myself.
@@kindlin Yeah I said "sometimes" because most of the time I get sidetracked. But when I have to get sh*t done there's no way! I usually take a note and come back to it later on, just for fun.
@@trevistics Ah, the mathematical side-quests! Why I stay up late at nights...
Excellent explanation of how research is done in mathematics. A mix of intuition, tries, numerical experiments and deduction.
Great video. I love how this was motivated by a concrete example but quickly rose to the point where I had forgotten about the telephones. The concept of scaling the rows and columns in a matrix was interesting enough on its own to carry the video.
YFAI
I read your comment and still forgot about them😂😂😂
This is such a good insight into what academia is like in mathematics!!!
i was completely dumbfounded when i heard the delta, gamma and sigma functions, and the pattern between them is absolutely stunning
As a high school teacher, it’s very special that you’ve worked with a student on a project of this magnitude.
Exactly what I was thinking!
Show your class Gematria.
This video makes me feel smart because, having thought about similar problems at length before, my immediate thought for how to solve the telephone problem was basically exactly Kruithof’s solution. Now, does that mean I’d have been aware of the problem or been confident enough to write a paper castigating others? Very doubtful. But the possibility makes me happy.
My idea would have been this too, but I would have immediately rejected it on the basis of "there is no way it actually converges to a solution"
BEAUTIFUL solution. The gammas-deltas framework is so elegant. Really did not anticipate something so nice falling out of this.
Your videos feel like the brain child of 3b1b and stand up math's, and I'm seriously here for it.
End of the 1990s it was my job to plan new mobile cellular and fixed networks around the world - and then went on to design tools to aid the planning. I’m very happy to report that I didn’t know about this method and that the networks do in fact work! The methods actually used by telcos are quite different.
What did you use Sir ?
@@JohnLee-bf2ux We created a custom planning tool suite using Excel 2000 to hold the data tables and to provide a graphical/mapping interface - this was a little before Google Maps API became available. Integration with Excel was via VBA stubs to connect to DLLs we wrote to perform all the stages of the modelling, from a Market model, a Gravity model (to determine inter-city traffic), a map-driven core network topology model using Floyd shortest path or least hop, and an access network automated model using simulated annealing to automatically optimise radio or fibre access networks. It was fun - took a year, but enabled us to design 15 networks around Europe in a year!
@@JohnLee-bf2ux Software engineer solutions tend to be rather crude, and tend to deal with more inputs, such as costs of construction, user entered dynamic limits etc. My guess is the algorithm can't be classified as anything other than a hot mess of heuristics and approximation, and does not impress a mathematician.
It's amazing how some people are able to come up with such things. Hell, even just fully understanding them is a feat in itself. & even more so when something so abstract has any applicability to reality whatsoever. Powerful stuff! 💪🧠
Lol @ the final equation of degree 20.
But yeah, this does happen sometimes, or it is even the usual case in mathematics: Solving some seemingly simple problem "explicitly" rather than iteratively is possible, but the solution is so complex, that it seems like some fundamental and not-quite-describable aspect of mathematics converged much slower than comfortable - but at least it converged rather than diverged (as in, the problem was solvable using a lot of complexity).
Yup!
And I wonder whether these sort of weirdly shaped answers present soft limits related to the boundary of uncomputability, where we know that a 745-state Turing machine cannot be decidable by mathematics, much less by specific, fixed formulas.
Amazing video. Thanks for sharing the fruits of this discovery. A few months ago, I worked out the formula for 0^p+1^p+...+n^p using generating functions, and I imagine that my satisfaction must have been a tenth of the satisfaction of finding that linear combinations of your sigma functions were the key to the coefficients.
WOW! This is first time I see a RUclips video being published along the scientific paper at the same time. I mean video and paper are published *simultaneously*.
BTW: great video. I haven’t read the paper yet.
Thanks! Posting the paper first would have been easier and safer, but I thought it would be fun to post them at the same time!
The paper is well beyond what i can understand but this video allows a laymen to appreciate the beauty of the math even if i only have a surface understanding.
Very well executed! I love experimental mathematics, it should be emphasized more in education. Experiment, conjecture, prove!
YFAI
Every paper should have a video like this! Great stuff.
As soon as the Theorem popped up at 7:52 I knew it was related to determinants. Never wouldn't have figured out how, but I even looked it up to check the definition of a determinant at that moment.
When I was in graduate school, I worked on this. I got nowhere. I was today years old when I heard of The Sinkhorn Limit. I could not have done what they did in this video though. After grad school I became a Software Engineer and have had a nice career though. So it''s all good. I'm reading about Sinkhorn's Theorem now.
I mean, there has been a lot of progress making groebner bases more feasible in the last couple of years. I think, just doing the gb computation for the 3x3 case would've been quite difficult 20 years ago. (but I checked and it seems like you can get the final polynomial for the 3x3 case within seconds also in OSCAR, which is open source)
this feels so complicated yet you manage to communicate it in a way that i understand it. great job, your vids are amazing :3
Very, very well done video! Probably the best math video I've ever watched.
I don’t know why I watched this, but as you described the problem, I realized I had had to use the iterative approach for a problem that had to do with populations and the languages they know. So this problem was familiar to me and had practical implications in my work. Of course, this went well beyond what I was doing, where usually about 6 iterations was enough to solve my problem. I’d wondered if there was a generalizable solution to the problem, but had no idea where to start. I’d actually had to approximate the n x n cases, but I had figured out ways to do so with reasonable efficiency by solving smaller cases and then tackling pieces of the bigger cases.
None of what I did is really useful to anyone but me, but I was excited to see that something I clicked on at random actually answered a question I’d had for years.
Another fascinating video. You might not pump them out (who could?!) but every one you’ve done has been top notch.
I admire that you were able to get the time to mount such a explicative video while solving the problem.
This is a pretty excellent video Eric! Very accessible and well presented
Although you can tell that i have a physics background because when you started manipulating the coefficients to simplify my eyes glazed over until you hit the answer. I would have accepted “and we can manipulate these to get this”. When mathematicians get into the globetrotter math I just say “I believe you can do it! Just show me the end of it 😭 “
This is probably the best view into math research I've seen on RUclips! And it illustrates the kind of problem that is great for an undergrad (or in this case high schooler!) research project which can produce a publishable paper. There really are unsolved research problems that don't require PhD-level background knowledge (beyond things that can be black boxed like Grobner bases in this example, which I assume the PI would have done).
Great video!
Pretty good summary. I actually already solved all this math in the 2nd grade during my lunch period, but it's nice to see others catching up.
In the field of statistics, the initial algorithm is also known as Iterative proportional fitting or Raking. Useful to calibrate tables to known totals.
Absolutely Fascinating . I loved the explanation so much. The idea of a video accompanying a publishing should be normalized. You also gave me a way to showcase my research when done independently and not related to any institution, because that has been a tough question in my mind, on how to publish exactly. So a video + arxiv is the way to go.
This is awesome. I love how you went through the experiments you did that led you to an insightful and ultimately a conjecture
I really enjoyed this video. I have one criticism (which I am open to discussion about). I feel like the definitions of the delta and gamma functions from around the 20 minute mark are not completely clear. For example, the functions took a single set of variables upfront and, later, a set of two variables. Hence, I have no idea how that would generalize to sets of greater cardinality for higher dimension matrices. I understand that, for brevity, that the definition needed to be cut, but I would have liked a reference to check it so I could follow along with a sufficient basis for understand the derivation for the higher degree coefficients. Otherwise, like I said, this was a really good video with really great motivation and intuition leading to next steps. As for rigor, I refer back to the comment above and hope that it's not just my own brain not parsing your explanation correctly. Thanks so much for posting this!
Thanks for your comments! I did avoid the formal definitions in the video, but maybe I didn’t give enough examples instead. Anyway you can find the definitions in our paper: arxiv.org/abs/2409.02789
Wonderful!
It's so much fun to see original content, on original piece of math, both authord by the same person.
Bravo
Im an English major and this might be the most mind-blowingly beautiful video on RUclips
I was thinking about this problem invariants noticed some nice intuition for why determinants should pop up here.the short version is that the Sinkhorn limit is invariant under row and column scaling, and row and column scaling winds up scaling the coefficients of this polynomial, so the solutions of the polynomial are invariant.
Long version:Observe that if you scale the columns of a Matrix M any way you want to make new matrix M', then run the sinkhorn process on both M and M', beginning with a column scaling step, then the column scaling will give you the same result for M and M'. Then for the rest of the process they will match, so M and M' have the same Sinkhorn limit. This means that the Sinkhorn limit should be invariant under column scaling, and by a symmetrical argument, under row scaling. Now a determinant isn't invariant under such operations, but it scales according to the same factor you scale your rows or columns by, and so if every coefficient of the polynomial has determinants multiplied in just the right way, then the solution of that polynomial is invariant.
Fantastic video. I love the way your narrative carried interest all the way through, and the payoff at the end was worth it even as a viewer. I audibly chuckled when you (metaphorically) opened the hood to the car and revealed the engine of math underpinning the whole thing, what a mess lol.
I love this video! Super clear and exciting. Congrats on your research!
I was baffled by the fact alone that the algorithm always converges.
Anyway, congratulations to your awesome idea and that it worked out.
Constructive criticism: I like to see definitions for simpler notations, e.g. of \Gamma, M and \Sigma, when you introduce them. It helps me to follow.
Thank you! I struggled with whether to include definitions. Sigma in particular required much more time than I thought I should spend, but of course the risk of not including a definition is that it seems meaningless. The formal definitions are in our paper: arxiv.org/abs/2409.02789
Great video! I especially liked your storyline, which very well reflects the workflow of solving problems in math or related fields (minus the frustrating moments and dead ends😅). Reading papers often leaves people wondering how they could possibly came up with the results, but a paper is just the "reverse-engineering" of this workflow
Thanks! Exactly... it's a shame that we rarely get to see behind the curtain!
Me: "How did a youtuber do such intelligent mathematics?" Post-video me: "Oh, bro is a professor, not a youtuber". Fascinating work, thanks for sharing :)
Wow great video Eric! You introduced the problem and my first thought was that there is no way this should be unsolved for so long. Then you started writing out the details of the polynomial system and I had ptsd flashbacks at grobner bases. Excellent job making sense of the coefficients you got out.
If you’re looking for a similar problem for future students I have one from my research on quantum walks (a quantum analogue to random walks, shows up in quantum computing a bunch). Basically the spreading pattern of these things are solutions to these crazy polynomial systems so that even the simple cases which look like easy geometric shapes are crazy to solve. Let me know if you’re interested, I have a bunch of vids on my channel of these simulations
Amazing video! Beautifully explained
Fascinating look into the way math research is done and how progresses are made by educated guess from numerical approximation, symmetry of the problem, and just beautiful intuitions and lots of work.
Thanks, and sub 🙂
this is EXCELLENT work! well done!!
Awesome research and presentation. Ty
Great explanation and result, congrats 👏
Brilliant! Many brilliant moves you two made. This is good, fun, math. With good development like this, even if you have to go back and correct something and redevelop, you get there. 😊 But this seems solid… I love applications that involve the Grobner basis. You showed a good method for working more efficiently. Good work you two!
That is an insane formula.. but now make it fractional and with complex numbers and you'll have a glimpse of what we have to deal with when designing high order digital filters!
Back in the 1990s, I wrote a program on a NeXT computer to compute a Grobner basis. It ran all nigh at the Universityt. It swapped so hard, that the hard drive was destroyed. I eventually figured out the solution, though.
Why did I actually understand most of this wtf
I've never even worked with matrices before
thank you for making a video about this cutting edge research!!!
Great work! This looks like a very complicated combinatorial problem, I'm glad to see some progress in it. And I'd be surprised if the conjectures on the degree bounds aren't true. I hope you'll make another video if you can work out the deal with the sign alterations. On another note, I think I've found the formula for the 2x2 Kruithof limit. Though I didn't actually prove this is the right limit, I just assumed the conjectured degree bound and found it by polynomial interpolation.
Let M = {{a,b},{c,d}} be our matrix, with a,b,c,d positive, and let {m,n} be the row sum targets and {x,y} the column sum targets, with m,n,x,y positive. Then in order for the Kruithof limit achieving those targets to exist, we need m+n = x+y. Further, letting p = ad, q = bc, the value of the top left entry of the Kruithof limit is given by:
(p(m+x)+q(n-x)-Sqrt[(p(m+x)+q(n-x))^2-4p(p-q)mx])/(2(p-q))
One might notice the absence of y. That's because m+n-x-y = 0, so we can add multiples of this combination throughout the formula to re-symmetrise. This form without y was among the cleanest ways I could find to write it, though I imagine there's some more symmetric way to write it. This has worked for countless random matrices and random targets, so I'm fairly certain it's right, and if you plug in m = n = x = y = 1, you do indeed get the Sinkhorn limit formula, after some manipulation.
Actually, I just noticed a slight problem with my formula, it fails for matrices with determinant 0, which corresponds to the case where the entries of the limit are rational, as opposed to algebraic of degree 2. So the formula definitely needs to be manipulated to have it work for determinant 0 matrices, which would involve splitting the p-q in the denominator into (sqrt(p)+sqrt(q))(sqrt(p)-sqrt(q)) and cancelling down the latter factor anywhere it appears, which would make the formula look more like the Sinkhorn limit.
Perhaps it's nicer to just have the polynomial of which the top left entry of the Kruithof limit is a root. The polynomial is simply (p-q)t^2-(pm+qn+(p-q)x)t+pmx
4:54 If number of telephones in towns A, B, and, C all increase while the number of phones in D stays the same, why would the expectation be that there won't be more calls to or from town D? I can see how the assumption that calls from town D to itself might make sense as there's not an increase in households to call. But the table after the iterative scaling assumes those numbers will go down. Is the assumption that each household has a phone budget that won't go up even when more households they know are added to the phone network?
I think it's more accurately just described as to why models are only approximations of reality. As long as the deviation isn't too large in practice the numbers will still be useful.
But I too initially thought the video would be about analyzing those pathologies in the method, so I was hoping it would be addressed along with the analysis presented.
This is incredibly cursed. Congratulations on the paper and a very interesting video!
I got an ad break while that coefficient was scrolling at 15:53 💀
How could it be A->A (Ignore all other towns) goes above 1.5 x 3000?
Also, D->D shrunk? HUH? HOW? 160->124? With the same number of people in the town?
It doesn't make ANY sense.
I mean for one, like "If i'm in town D, and my number of options expand outside of town D only, then it makes sense for the number of inter-town d calls to slightly decrease"
But, This isn't being done the right way. What the right way is, I couldn't tell you, but this is not correct.
It's something like "The more people within someone's people- range there is to call, the more likely someone is to want to use the phone, so as we increase phone lines, people want to use the phone lines even more", but this isn't going to be mapped by some random matrix thing. It's just defintely the wrong way to approach the problem, as we aren't even mentioning at any point how the growth of the telephone lines impacts peoples wants to use the telephone, which is CLEARLY an essential element this is modelling.
This is also OBVIOUS to us today, that now that we have all the people connected on telephone lines, the same logic must surely NOT apply, imagine adding 3000 lines in alaska, would they get some exponential growth? Yes, but not the way it would be somewhere more close nit or as telephones were just being adopted. There's some element of "remoteness" and disconected groups that will impact these numbers, surely. And none of it's even being mentioned. It's just completely wrong.
There MUST be some "Connectivity constant" at play, that the matrix model might get close to or something, but its defintely not the same for every grouping of cities, and its also defintely not the same for every "phone" like problem. It's just some random matrix thing.
When phone companies used this there was a *lot* of money riding on it working. This is a toy example, but it gets the idea across. You can trust the full thing worked.
This kind of research is why engineers dreamt of working at Bell Labs. They where at the forefront of a lot of fields.
The math around optimizing utilization of phone lines is amazing.
Exactly, the assumption about how row and column sums scale makes no sense.
@@peterbonucci9661 I once wrote a toy tool in Excel that predicted traffic in an Ethernet-like network. It was atrociously simplistic and unrealistic, but it spew out numbers and people absolutely loved it.
@@peterbonucci9661 Imagine City A has nothing but shunin tech nerds who just want to get the new tech thing.
City B has a bunch of social people who want to be able to talk to their friends without the hastle of driving to peoples homes every time they want to talk to someone.
City A Goes up 50%, City B goes up 10%, because they are poor or something. The matrix will fail spectacularly.
Just think about it. The matrix would need to include some ratio to distinguish these cities apart from each other. It would need to only include "like" cities, or something that isn't being covered in the example. It's 100% not just throw every city in america into a 10000x10000 matrix.
I wouldn't say it's just a random matrix thing. I would argue it's rather a model assumption error. I see it as: based on the behavoiral situation in the A matrix how would the dynamics extrapolate to the situation in the S matrix. The "problem" lies within the assumption that the number of calls of D doesn't rise. As D are very antisocial people and they don't necessarily even like to talk to people from their own city. They now have a problem that people from A are now even more willing to talk to D people and well D people would rather talk A vs D and the total number of calls doesn't increase so D - D decreases
Same situation in A-A. As A people are very eager to talk to A people the total number can increase more than 1.5. They are even more connected now so given the last years model assumption we should extrapolate this further
All models are wrong :)
Beautiful solution and explanation.
I find the video fascinating. After trying myself, I only could do so much to find the first row and column if the original row sum is 4000 and column sum 4150 and the intended row sum 6000 and column sum 6225.
We can find the coefficient to be
Lcm 6000, 6225
over the
Lcm 4000, 4150
Which will be equal to 1,5
And if we plug it in, it becomes
3000, 1545, 975, 480 | 6000
1620
1080
525
------------
6225
Lovely video and good work. Enjoyed your paper.
25:48 for each, its just ðe n-1þ row of pascals triangle, but each term is squared
That was nice.
I wish there was a reminder of what are delta, gamma, M and sigma visible at 22:20, but the picture is already a bit crawded.
20:52 "[...] stroke of luck" I think you forgot a lot of work and a bit of genius
In case you are wondering, the numbers at the right of equal sign left bottom of screen starting from 5:15 are all within 0.00001 range from 3246.387.
Very interesting. I wonder if any computational optimisations come out of this research. So far it seems like the original iterative method would be much faster than evaluating high degree polynomials
Oh, this is exactly what I've been thinking about lately ...........unrelated field but same iterative process needed ...I was looking at Newtons method and other iterative approximations
I have used this method for years without knowing its name. Just a few weeks ago I'd tried to use it on a problem and had been dissatisfied with my results, but this presentation helped satisfy why I shouldn't have expected an obvious answer.
I assume there's a reasonable simple formula for the 2 by 2 case with non uniform marginal, i.e., non Sinkhorn marginal. There goes my weekend...
I should add that I file this method under entropy maximization under side constraints....at least in certain contexts.
Here's an interesting result, that you probably noticed: The Sinkhorn limit of "next year's" table is identical to the Sinkhorn limit of "this year's" table.
In fact, this seems to be true for ANY (non-zero) target values for the rows and columns. They don't even have to follow the "multiply the row and column sums by the same value" rule Kruithof's method uses. It looks like you can start with a table and perform Kruithof's method on it multiple times and it will always have the same Sinkhorn limit. More generally, If you have a matrix and two sets of targets A and B, you can use Kruithof's method to match targets A, then use that result to match targets B, and it would be an identical result to just matching the original matrix to B.
So this ALSO means that if you take the Sinkhorn limit of this year's data, you can use that with next year's targets and get the same result. Which sounds interesting but might not be helpful.
Now, I'm not a mathematician, I'm just playing with some computer code and noticed this. I could be completely wrong and I just happened to find situations where this worked.
Impressive bit of work.
@EricRowland - I love how you did this work with gifted high school students, as this will really help jumpstart their careers! Being a high school student that is published in peer reviewed journals must be incredible. Out of curiosity what software were you using to assist you in these calculations? Matlab, or custom programs in python, or did you have to use a compiled language or run on the GPU due to the extreme memory and CPU usage? And were you running this on a cloud computing service, or on a private cluster you owned or had access to? As a network engineer and sysadmin with a desire to get into HPC I’m very interested in how you did the computational aspects of this. Thank you for your work and God bless you! ❤
Oh also, I love this subject matter of this problem, since as a network engineer, I got my start working in the industry working for a legacy telecom provider, specifically Ghana Telecom. I worked for them as a consultant during the preparations for their merger with Vodaphone. A Nigerian mentor of mine, who had previously worked for Verizon and before that, for the Federal Reserve Bank in the US, who had been my professor of economics and econometrics in college, specialized in econometrics relating to telecom network demand, specifically forecasting the need for backbone infrastructure (at the time, there was a still a glut of dark fibre due to the dotcom bust in the West, but in Ghana, there was not quite as much good backbone infrastructure as one would prefer, and furthermore, the equipment was not what I would have preferred - non-redundant switches from Alcatel were handling both voice and data traffic because they were cheap, but for IP operations they were very limited, and there was in particular a lack of good Cisco and Juniper routing and switching, and at the time there was almost no mobile broadband, so the GSM based cellphone network, while modern as far as voice was concerned, was not able to do much more than handle voice and SMS, and broadband internet access broke down outside of the limits of the major cities, and the switched telephone network was a mess - nonetheless, we could tell from all indicators that demand was steadily rising, and the real challenge was financing the capital improvements needed to meet that demand, which is why the system was privatized and sold to Vodaphone - there were several competitors in the mobile phone space but Ghana Telecom had a monopoly on land lines, and this created the basis by which mobile data was implemented in the coming years. After I left Ghana I moved into network engineering for colocation providers, which is a lot of fun as I continued to do BGP routing, which is the mathematically elegant protocol that enables the Internet to work as a confederation of independent networks, or “autonomous systems” as we call them in BGP parlance, without each network having to know any details about the internals of other autonomous systems. But within an autonomous system, there are equally interesting routing protocols like OSPF, IS-IS and Cisco’s proprietary EIGRP, as well as the dated, and nowadays, rather aptly named RIP (Router Information Protocol, but also it Rests In Peace). The math of these systems I find quite interesting, but to an artist like you it might well be boring and trivial.
Thanks! The software we used was Mathematica. The hardware wasn't anything special, just a laptop. For the Gröbner basis computation I talk about in the video, it was more about being clever with the settings than having a lot of compute power. To guess the formula for larger matrices (which we talk about in the paper), we used 1.5 years of CPU time to iteratively scale matrices and run PSLQ on the results, but again most of that was done on a laptop running jobs in parallel.
@@EricRowland well I guess I made the classic sysadmin mistake of over-estimating what mathematicians need for most things - a Freudian slip on our parts because we want more expensive and exotic servers to play with, and mathematicians like yourself who can justify resources create the raison d’etre for us to buy more toys, and those of us who are good pride ourselves on providing a reliable service, even if it is massive overkill. Software developers on the other hand tend to be a lot more conservative, and in the case of some developers, way too conservative, about hardware requirements.
I disagree with the assumption that the # of telephones staying the same in some town means that the # of incoming and outgoing calls stays the same. Even if there are no additional people with telephones in that town, there are still more people in the other towns for those people to call, and more people in other towns that may want to call them. With calls taking place between towns, the call volume should also reflect growth elsewhere in the network (Calls within the town could stay the same though)
In the beginning example, this assumption actually causes the calls within town D to go down, in order to keep the total calls constant while volume from elsewhere increases. Calls between C and D also goes down despite and increase within C, and between B and D barely increases ay all. These results seem odd, and the assumption is basically that each individual telephone will make and receive the same number of calls each year which seems unrealistic to me.
I would not get too hung up on the exact method to estimate call growth as it is not the important part here.
That said I think you might agree with the following statements.
The people in all the towns have the same kind of behaviour. They are no more talkative or more stand-offish.
A) Make the assumption that people in all towns make the same average number of calls per day.
B) Make the assumption that people in all towns are just as likely to receive as to make a call.
The two assumptions seem reasonable if we assume no town is more grumpy or talkative.
Assumption A means outgoing calls scale with population.
Assumption B means incoming calls scale to march them.
Any other model you will get towns which have "more chatty" people and that seems a little odd.
I also got hung up a little bit on this when seeing that D, taken to be a closed system, makes fewer internal calls when other towns increase their phone traffic. Your reply helped me see that, vaguely, the extra calls from A-C that the folks in D have to contend with will crowd out their willingness to call other D.
@@biggiemac42 you could justify it as when the other towns got bigger then D town made friends outside their neighbourhood and so made calls to their neighbours in D town less.
In reality it gets complicated - Like maybe the towns get cheap local calls. Maybe A town and B town are really close so they have lots of friends. Maybe C town has a call centre so makes loads of outgoing calls.
You sacrifice a bit of reality to get a beautiful simple mathematical model. :)
Yeah I'd wager that through the Network Effect the utility and therefore the call numbers increase even in towns without an increase of internal nodes. So the model is definitely interesting, but it fails to account for that aspect of reality. The math is great though.
@@richardclegg8027 While I agree that this is beside the point for the video, I have a really hard time accepting those assumptions. It's not really about chattiness, but availability. Put simply, people will likely spend more time on the phone if they have more people to call.
To make this clearer, take an extreme example. If everyone in my town has a phone, I will find reasons to phone them - perhaps because it's quicker than making house calls, or whatever. On the other hand, if only one other person in my town has a phone, I'll only use it rarely - not because I don't want to, simply because there are limited opportunities to do so. So I think it's quite reasonable to expect phone owners in these two towns to spend very different amounts of time on the phone. Real life examples wouldn't be so extreme, and I'm intentionally ignoring inter-town traffic for simplicity, but you get the idea.
Again, I'm not knocking the maths in any way - it's a fascinating result - but applications in the real world are inevitably messy.
Inspiring in so many ways. Thank you
15:00
To understand why it's degree 6, not 15
Its because for an n x n matrix we have n + n constraints so 2n variables. Basically we need a normalization factor for each row and each column so for the element in (I,j) you take the original and multiply it by the normalization constant related to row I times the constant for column j.
Still seems pretty basic.
Ngl my pet peeve is all these youtube math videos trying to make simple concepts look complicated with unnecessary terminology. Hope this isn't one of those cases 18:00
That was superb; very interesting stuff and clear explanation.
Thank you!
1,2,6,20,70,.. are the type B Catalan numbers. One can perhaps get the coefficients by taking a sum over all type-B set partitions.
Oh, I thought there would be a simple expression for a general limit via Sinkhorn limit. But I couldn’t guess what it’d be so I decided to wait until the end of the video. It ended up both surprising and not surprising that there’s none known yet!
I guess a useful question to ask here would be does taking Kruithof limits with different row and column sums commute? If it does, probably we can always start from a bi-stochastic matrix as mentioned in a comment by @tzimmermann. Then ideally we could compose two polynomials.
I guess the nature of coefficients before Σ might be crucial for working out what’s happening in the general case...
I wonder which kinds of linear algebra are still not applied to this problem. Having determinants like these alludes to maybe try and make this coordinate-free in a language of exterior algebras. But it’s not obvious what unexpected things just a mere reformulation would give.
Beautiful video, very well done! I felt the discovery
Thanks, that's great to hear!
Of course, we don't expect a closed-form solution in general. We will have to use numerical methods. You did pretty well though.
Amazing research!!! Congrats!
Iterative methods! This makes me think, it would apply to network traffic, but less obviously to economics and social networks.
Such a great work. You should feel proud of yourselves 🎉
Mind blown! That was a very interesting explanation.
Now, from a real word performance point of view, is the Kruithof's method faster of slower? I'm guessing it's likely faster, and since it's all about estimations, maybe it can be stopped before it fully converges and still be "good enough".
Wonder how the uncertainty margins in the original data and predictions, impact the final formula.
there exists a sinkhorn limit to the solution of kruithof's algorithm, which just scales each row and column of the solution by a number. maybe a relation between that and the expected sums can be found, in which case finding the solution would be trivial
another great discovery!! good job!!
My take on the problem (most likely not new): rows scaling of matrix A is equivalent to left multiplication by doagonal matrix d, namely A -> dA. Columns scaling A -> Ad'. All infinitely many scalings lead to ...d_4 d_2 A d_1 d_3... = D A D'. For N towns this transformation has 2N parameters (N diagonal entries of D and N of D'). We have 2N coupled quadratic equations for those parameters (N row sums and N column sums). Enough! We take only all-positive solutions. It doesn't prove there is only one solution, but for sure - finitely many.
Just finished the video. Another great one from this channel!
I made it 18 minutes before my brain fully melted into a warm puddle. I feel pretty good about that.
cool piece of exploration dude