This is an amazing video, omg. I'm pretty sure many math enthusiasts are gonna enjoy seeing this pop up in their YT feed. Would you ever consider doing a video on the Glicko rating system? Both Glicko and Glicko-2.
This is by far the best explanation of the Elo rating system I've seen. And yes, the mix-up between the normal distribution and the logistic function (Part V) caused me great confusion in the past. Thanks in particular for covering this part!
Respect, you cracked it with your first video (probably not your first, judging by how professional the result is). It's really informative, and there's real depth of work in it, like reading the old papers to explain the model.
Did you miss the part where this is a two-player 1v1 model? I don't know if you meant League or Valorant, but either way, MMR was originally a lightly modified version of Elo early in League's life. We don't know what or how they tweaked their parameters at this point. And it's not because it's a team game that Elo doesn't work; it's because the teams vary. So you need two functions to distribute points: one that scores you against your teammates, and another that feeds the generated team score into a normal Elo system, at the very least. Then at game end, Elo distributes the points and the team system gives you your share.
Hello, this is Gragas from the Rito support team. We've taken a look at this video, and we still don't get it. We just randomly take away and give MMR and watch our players smash their keyboards for the lols. We are still very grateful for your feedback and will do nothing about this and focus on selling you gacha skins. -Gragas, Rito Support Team
This was an incredible video to me. I did not expect math in this much detail, but I'm very glad it was this detailed, because now I can actually say I understand how Elo works, which is what this video promises to explain.
omg i watch a lot of explainer videos, and you are AMAZING!!! there are so many people with good stories to tell, though maybe only a few best stories per person, and those are so powerful. your 1st vid (?) too, so i am curious and grateful for your work, hope to see you again soon!
honestly love chill videos like this that mix in some cool little graphics to explain themselves better. it's literally the perfect way for me to absorb information :3
Great video! Your explanations are very clear and the structure of the video is organized very well. The visualization is done in the same style, and this style is very nice. Great job, thank you.
I can't believe this is your first video, congrats! I wonder if someone has ever played with the idea of a "normally distributed Elo system" but with variable variances for each player instead of a constant one. To me that seems like the most natural way of representing player strength (since two players could have the same average "strength" but one be much more inconsistent than the other) Edit: looks like the Glicko system is almost exactly what I was describing
Great video and masterfully animated! Love it! I didn't know Elo's model was different from what we know as the Elo system today (so I guess the habit some people have of capitalizing the letters in it - ELO - is somewhat justified?). Would love to see a similar analysis of newer systems, like Glicko/Glicko-2.
I'd be interested in a similar explanation for Glicko's various incarnations and a comparison between them and the different models at play here. This was a fun video to watch.
20:30 Actually, I learned in a probability class that logistic probabilities can be derived using a normal distribution. Here's how: suppose you are sending a signal, where +1 indicates a win and -1 indicates a loss, but the signal is noisy, with standard deviation sigma. Then, given the measured signal, the probability of a win or a loss turns out to be logistic, with a scaling factor related to sigma and the gap between -1 (loss) and +1 (win). This can kind of help us understand the underlying mechanism of chess: if we let 0 also represent a draw, then the reason GMs have more draws is that their sigma is smaller than that of low-rated players. However, this model gets complicated when you allow players to have different sigmas. I've always been intrigued by how probabilities from 0 to 1 are so much more complicated than normal variables, which feel natural, and I have never found a satisfying way to combine the two; there is always something that isn't natural. Elo/log odds comes close, but the issue is that you can't find a formula for performance rating based on game results, and varying the K-factor based on (the inverse of) games played is somewhat janky. Glicko fixes some of these issues in a unified model but creates the issue that there's no precise formula; it's just an iterative approximation that can be made arbitrarily close. One alternative I've considered and found more mathematically "beautiful" is thinking of player (or, in my use case of sports leagues like the NBA/NFL, team) skill as fundamentally unknown but able to be narrowed down based on results. In my model, skills lie on a 0 to 1 scale, where 0 is the 0th percentile and 1 is the 100th percentile, so 0.5 is the median player performance. The model I laid out works on the framework that a win or loss is deterministic, not probabilistic, so if A beats B we know for certain that A was better than B, at least for that one game they played.
Then, to account for player talent changing over time, you need to take all these snapshots in time and line them up to separate the natural variation in player performance from the actual true change in skill. Anyway, this is an area I've been obsessed with forever, and it's always unsatisfying how much simpler the computations are when we're not working with a binary win-loss system but rather a linear one on the scale of the ratings. For instance, a performance rating is just a simple average over games, where a win counts as the opponent's rating + 400 and a loss as the opponent's rating - 400. This is a rough approximation of Elo that has the benefit of being easier to compute, but the downfall of not working at all when the rating gap is larger than 400.
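The signal model at the top of this comment can be checked numerically. Here is a minimal sketch (my own illustration, assuming equal prior probabilities on the +1 and -1 signals and Gaussian noise): computing the posterior by Bayes' rule and comparing it against the closed-form logistic.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def posterior_win(s, sigma):
    """P(true signal was +1 | measured value s), equal priors, noise N(0, sigma^2)."""
    num = phi((s - 1) / sigma)
    return num / (num + phi((s + 1) / sigma))

def logistic_form(s, sigma):
    """The same posterior in closed form: a logistic in s, scaled by sigma."""
    return 1 / (1 + math.exp(-2 * s / sigma ** 2))
```

The reason this works: in the ratio of the two Gaussian densities, the quadratic terms in s cancel and only a term linear in s survives, which is exactly the form of a logistic.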
The transitivity assumption is interesting, and I believe it is violated for real chess players. Players like Hikaru sometimes "farm Elo" by playing against lower-rated players. If transitivity held, farming shouldn't be possible, but in reality the actual probability of a strong player beating weaker players is apparently higher than Elo predicts. Furthermore, almost all matches are between players of the same or similar rating. This gives very little data to keep the ratings consistent across large ranges of Elo. This can lead to being stuck in "Elo hell", where a bunch of low-rated players keep playing amongst each other while getting stronger, but only trade points amongst each other, so their ratings do not improve with their actual strength. In general there can be different rating inflation/deflation in different sectors of the rating range. If matchmaking were worse (i.e. paired players of different skill more often), the Elo system could potentially be more accurate.
20:00 it can be equivalent if you use a logistic distribution instead of a normal one, since it's very similar; it just has slightly fatter tails, indicating upsets aren't as unlikely as the normal model suggests
it can be equivalent if you use an extreme value (Gumbel) distribution for both players, because if X ~ EVD and Y ~ EVD then Y-X ~ Logistic, and we need Y-X to be logistic, not X or Y themselves.
Great video, it really showed the fundamentals of probability. If I'm gonna be honest, I didn't know what odds were, since the term is so saturated in gambling lol. Did you use manim for this btw?
Yup, manim and Flash. I also never really knew what odds were before doing this project! That's why I felt the need to include a discussion about them.
If one were to simulate strengths from the historical Elo model but use the modern Elo algorithm, what would the results be? In other words, it would be interesting to investigate how robust the Elo system is to different models of the true strength data-generating process.
What an amazing video! The clear visual style aids the explanations perfectly. I liked it a lot and understood everything. I am only left with one question: why didn't the original Elo rating, based on the normal probability of performance, ever catch on? Is the assumption not correct? Does it converge more slowly? I guess I'm gonna research those questions on my own.
An excellent question. While I don't know the complete answer, I can say this: the integral of the Gaussian distribution at 19:38 (i.e. the cumulative distribution function of the Gaussian) is given by what's called the error function, denoted erf(x). The error function belongs to a class of functions known as sigmoids, named for their resemblance to the letter "s". The logistic function is also a sigmoid, so it already has roughly the same shape. In fact, the derivative of the logistic function, the density of what's somewhat confusingly called the logistic distribution, is very similar to the Gaussian density, with the only qualitative difference being heavier tails. So in short, the logistic model should yield very similar results to a model based on the Gaussian distribution, with discrepancies most noticeable when the difference in player ratings is large.
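To put a number on "roughly the same shape", here is a quick comparison sketch (my own illustration; 1.702 is the standard scale factor used to best align the logistic with the normal CDF):

```python
import math

def normal_cdf(x):
    # CDF of the standard Gaussian, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def logistic(x):
    return 1 / (1 + math.exp(-x))

# With the scale factor 1.702, the two sigmoids agree to within about 0.01
# everywhere, which is why the swap barely matters in practice.
worst = max(abs(normal_cdf(x) - logistic(1.702 * x))
            for x in (i / 100 for i in range(-600, 601)))
print(f"max |Phi(x) - logistic(1.702 x)| = {worst:.4f}")
```

The largest gaps sit out in the tails, matching the "discrepancies most noticeable when the rating difference is large" point above.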
Unlike the logistic, the integral of the Gaussian does not have an explicit formula that's easy to evaluate. That's not an issue now that we have computers in our pockets, but it certainly was back in the 1970s. This might explain why people preferred the logistic over the error function.
Man, this is the best video I have seen on the Elo system. I love it. Do you have an explanation of how physicists may have come up with this model from their physics knowledge? It looks like the Fermi function in Fermi-Dirac statistics.
I didn't dive too deep into the history, but I think basically Elo used the Thurstone model originally, and then the logistic was swapped in for the normal distribution later. To be clear (I've seen people get confused about this, including me): this does NOT mean that the Zermelo/Bradley-Terry model is equivalent to both players generating numbers from a logistic distribution, with the bigger number winning. It's just that the Thurstone and BT models are both of the form p = f(R1 - R2), where f is an increasing function from R into [0, 1]. For the Thurstone, f is the CDF of a Gaussian, and for the BT it's a logistic function. I think they basically just swapped in a different f that gave better results. As for what motivated this specific choice of f... I don't know. In this video I motivate it the way Zermelo does in his paper, using this notion of "strengths", but I'm sure there are different angles that could be motivated by statistical physics, as you suggest. I suppose you could start by looking at sections 8.3 and 8.4 of Elo's book (Elo 1978). In those sections he also cites (Elo 1966) and (Berkson 1929, 1944). I haven't looked into those.
References:
- Elo 1978, The Rating of Chess Players Past and Present
- Elo 1966, Use of the Standard Sigmoid and Logistic Curves in Pairwise Comparisons (sounds very relevant!)
- Berkson 1929, Application of the Logistic Function to Experimental Data
- Berkson 1944, Application of the Logistic Function to Bioassay
@@j3m-math It's true that the Zermelo model is not equivalent to both players generating numbers from a logistic distribution with the bigger number winning, but there's something almost as good. The issue is that the difference of two iid logistic random variables is not logistically distributed, but if we could find a (family of) distribution(s) (say D) such that when X ~ D(a), Y ~ D(b) independent, then X - Y ~ Logistic(a-b,1), we'd be all set. The Zermelo model would then be equivalent to player 1 generating X, player 2 generating Y, and the largest number wins. It turns out there is a distribution that works, called the Gumbel distribution. You can go the other way too. For example, Elo's original model has a nice transitivity property; knowing p_ij and p_jk uniquely determines p_ik.
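The Gumbel fact above is easy to verify by simulation. A sketch (assuming NumPy; the location parameters a and b play the role of ratings on a natural-log scale):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 0.0                          # Gumbel location parameters ("ratings")
n = 200_000
x = rng.gumbel(loc=a, size=n)            # player 1's performance draws
y = rng.gumbel(loc=b, size=n)            # player 2's performance draws

p_empirical = np.mean(x > y)             # fraction of games player 1 wins
p_logistic = 1 / (1 + np.exp(-(a - b)))  # Bradley-Terry prediction f(a - b)
print(p_empirical, float(p_logistic))
```

Repeating this with two iid logistic draws instead of Gumbel draws would not reproduce the Bradley-Terry formula exactly, which is the point of the comment above: the difference of two iid logistics is not itself logistic.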
I used to captain a rec-league tennis team, and there are so many rating systems for tennis players, but I created my own plain old straight-up chess-style Elo scores for every player in the league, and it predicted match results with 80+% accuracy. All the other scoring systems I could find were much worse at predicting outcomes. Even though they all claim to be some derivative of the Elo system. Which I have to question if they really are, or if so, that they modified it beyond all recognition. Because the straight old-fashioned chess-style Elo score worked brilliantly for me. Outside of the first year when I was still building my team, every season after that first year we made the play-offs, and almost every season we won the division. And we weren't always necessarily the best team, but I knew who to put where based on the Elo scores I saw in my spreadsheet, and that let me game the lines just right to squeak out wins against arguably better teams. Never did manage to win a city championship, but we made it to the finals more than once, but at that point you're playing teams that are loaded with first-season self-rated players who clearly lied on their self-rating questionnaire in order to play down a level or two just so they could create a team specifically to win the city, i.e. teams that cheated. All my Elo scores tell me at that point is that the team we're facing is a cheating team and that we really don't have a chance. Of course I can't tell my team that, but I can see it plain as day in the numbers. And unfortunately, as obvious as it may be to me, and to anybody who actually watches the matches in person, the league administration doesn't watch the individual matches in person and is completely head-in-the-sand about the problem of people cheating on the self-rating questionnaire. They refuse to even entertain such a thing is a possibility, even though every single captain on the courts, and most of the players, know it full well. 
Ultimately that's what led me to give up on rec-league tennis. Too much cheating, no administrative willingness to acknowledge it. OK then, I'll go find something more fun to do with my time and money.
This is what school should be. I learned how to calculate probability while not focusing on calculating probability. Also the difference between probability and odds WHILE WE'RE NOT EVEN TALKING ABT EM
Great explanations overall! I just had a couple small gripes. The first is more of a perspective shift than anything (I'm an engineer, not a mathematician), but in your simulation where each player has a "true rating" you mention an offset C, which should equal the average true rating. I found the description you gave to be pretty interesting, since it seemed to be almost entirely motivated by the mathematics, but to me the more useful interpretation is that universal ratings don't exist, and each pool of players can only be compared against players in that same pool, as the rating estimates by definition calibrate to a previously chosen center point, regardless of the true ratings in that pool. This raises some interesting follow-up questions as well, like what happens when players are matched to prioritize close ratings, as that will likely create some kind of locality within the same pool as well. The second gripe is the assertion that historical elo and modern elo are totally different systems. The reasoning you give for this is that historical elo assumes outcomes to be normally distributed, and modern elo assumes they're logistically distributed. While this is strictly true, the difference between the two is more of an implementation detail rather than some fundamental difference as it's possible to get very similar results with both. Perhaps this pedantry is warranted, especially for anyone looking deeper into it, but some mention of their similarity would have been nice.
Indeed, the Thurstone and B-T models are numerically very similar. I felt it was important to be very explicit about the difference because I've seen confusion about this online. I considered talking about how close they are, but to be honest, I left it out to keep that section a bit shorter. I think I mention it in the pinned comment, but I'll add it if not. Your first observation is very valid. Of course, in our case the true ratings do literally exist, in at least the sense that they exist as variables in the code.
@j3m-math Note that I'm not saying true ratings don't exist (one could argue for or against them) but rather that elo scores are not "universal", as depicted in your graphs where C can vary when the average true scores are different. This is to say that someone at 1800 elo in one pool is not comparable to someone at 1800 elo in another, as their true skill may be different.
great video! I always wondered how to derive the Elo system. btw, why did Elo assume the variance stays the same? what if someone is more consistent in their play, while other players are more chaotic and have 'good' and 'bad' days? if we relax this assumption, do we get a more accurate model?
I don't know about Elo specifically, but statistical models are often chosen because they are simple to work with and understand. Also, this was at a time before the PC was invented, so a procedure that players and organizers could easily do and verify by hand was important. Newer models by Mark Glickman (Glicko and Glicko-2) account for the fact that we are less confident about a player's rating when they are new or play infrequently, and that player consistency can vary. These are actually the models that the major chess sites use behind the scenes, although players tend to still refer to them as their Elo ratings. One of the major benefits is that new players' ratings are adjusted much more quickly. So, even if a player starts at 1500, they can drop to 500 or shoot up to 2500 in just a handful of games. Their ratings will change by much smaller amounts once the system has a stronger estimate. Official systems also handle new players with special procedures, but Glicko is a more theoretically rigorous way to handle it.
But how does the Thurstone model work in practice? I understand that it is mostly impossible to see which model is the most correct, but how do they differ and what are the strengths of both models? Great video and I hope you upload more!
According to Wikipedia, Elo is the guy who outlined the properties of what would make a good chess skill rating system. Thurstone was what he had in mind when constructing the outline of those properties, but the downside is that the math is hard. So he proposed using the simplified model that is known as modern Elo. Regardless, both models are simplifications of reality. As the wiki says: "chess performance is almost certainly not distributed as a normal distribution" and "In paired comparison data, there is often very little practical difference in whether it is assumed that the differences in players' strengths are normally or logistically distributed". So basically we are using a simplified model of a simplified model, but it is decent and very simple, so we like it.
17:07 it is very interesting how those edges curve, this seems to indicate that low rated players are slightly overrated while high rated players are slightly underrated.
If I'm understanding correctly, since total rating points are conserved, the offset C between the Elo rating and the actual rating is easily computed as the starting Elo minus the mean of the actual ratings, i.e. C = 1500 - mean(actual ratings).
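The conservation is visible in the update rule itself, since the two players' rating changes are exact negatives of each other. A small simulation sketch (my own, with arbitrary illustrative parameters):

```python
import random

random.seed(1)
K, START, N = 32, 1500, 50
true = [random.gauss(1500, 200) for _ in range(N)]  # hidden true ratings
elo = [float(START)] * N

def expected(ra, rb):
    # modern Elo / Bradley-Terry expected score for A against B
    return 1 / (1 + 10 ** ((rb - ra) / 400))

for _ in range(20_000):
    i, j = random.sample(range(N), 2)
    s = 1.0 if random.random() < expected(true[i], true[j]) else 0.0
    delta = K * (s - expected(elo[i], elo[j]))
    elo[i] += delta          # the two updates cancel exactly...
    elo[j] -= delta

mean_elo = sum(elo) / N      # ...so the mean never leaves the starting value
print(mean_elo)
```

Since the mean Elo is pinned at the starting value while the true ratings have whatever mean they have, the offset is C = 1500 - mean(true ratings), as stated above.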
Hey, great video. Really enjoyed watching it. I was wondering about some other use cases of Elo ratings, like in chess puzzles, where I presume each puzzle gets a rating of its own and wins or loses rating according to whether the player gets the puzzle right or not. Or the rating systems used in competitive programming, like on Codeforces. It would be fascinating to see a video going deeper into less well-known use cases of these systems. Thanks again for the video, great job.
I would have appreciated a simulation where you run the historical Elo (Thurstone) algorithm on the Zermelo model, or run the Zermelo algorithm on the Elo model.
Very good video, nice animations, and understandable explanations for everyone. The only thing that bothers me a bit is that your voice sounds too "sharp". Maybe it's because of suboptimal mastering of the audio or something, because your tone and delivery themselves are very good for reading and explaining.
Interestingly, the gaming website I use a lot not only doesn't let your Elo go below 0, but once you get to 100 Elo in a game, you can never go below that, even if you then repeatedly lose at that game. I assume this is because the website has an arena mode that requires 100 Elo to play a given game, and someone probably decided it wouldn't be fair to take away the arena-mode privilege once it was already earned.
Hi everyone, thanks for the positive response!
ERRORS IN THIS VIDEO:
- At 12:50, the formula for odds transitivity should read w_ij w_jk = w_ik; there is an extra equal sign (pointed out by a commenter below, thank you!).
FAQ/STUFF I DIDN'T TALK ABOUT:
- "I heard that modern Elo is the same as the old one, just they swapped out a logistic for a normal distribution?" Yes - but it's important to be precise about what that means. The Thurstone model involves the players generating independent random variables X and Y, and the first player wins if X > Y. Under the Thurstone model, X and Y are normally distributed. If you instead sample X and Y from a logistic distribution, you *do not* get a Bradley-Terry model. To get a Bradley-Terry model, you have to use something a little more obscure called a Gumbel distribution. The reason a lot of sources say that modern Elo is "the same thing, but they switched to a logistic distribution" is because both the Thurstone and BT models are of the form p = f(R1 - R2) for some function f. For the Thurstone, f is the CDF of a normal (the error function). For the BT, f is the logistic formula. So you are literally swapping out the function f. The two functions are numerically very close, so from a practical point of view this is kind of a technicality. But the mathematical motivations for them are very different.
- "What about draws?" Draws are kind of tacked on as an ad-hoc extension (at least if you motivate the Bradley-Terry model the way I did in the video). You can see that in what I call the Zermelo model here, with random outcomes based on player strengths, there are explicitly never any draws, so extending the system for draws necessarily breaks that mental model. All you do is you say that if a player wins they get a score of 1, if they lose they get a score of 0, and in a draw both get a score of 1/2 (this seems somewhat arbitrary, but it's important that it have the property that the scores of the players always add to 1). Then your rating update at the end is k(s - p) where p is your estimated "win probability" (which is now more properly thought of as your expected score). The draw-free case is then a special case of this.
- I saw an upvoted comment somewhere unsure what I meant by "the update algorithm" in Part IV. I just mean the rule by which you update a player's rating after each game, DeltaR = k(s - p).
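The update rule with the draw extension can be sketched in a few lines (my own illustration; k = 32 is just an example value):

```python
def expected_score(ra, rb):
    """Expected score of player A against player B (modern Elo formula)."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update(ra, rb, score_a, k=32):
    """score_a is 1 for a win, 0.5 for a draw, 0 for a loss.
    The two players' scores sum to 1, so the rating changes cancel exactly."""
    delta = k * (score_a - expected_score(ra, rb))
    return ra + delta, rb - delta
```

For example, a draw between a 1600 and a 1500 pulls the higher-rated player down, since their expected score was above 0.5; with score_a restricted to {0, 1} this reduces to the draw-free rule.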
What about non-constant strengths (players getting better with time / games played)? Is that a problem for the model?
@@Leon-qp9xy That's supposed to be one of the selling points of a model like Elo: that it can keep up with changing skill. So it's funny that that feature isn't really explicitly designed into Elo; it's a kind of incidental benefit. I think basically the idea is that if player skill changes reasonably slowly relative to how often games are played, then you can imagine breaking the game history into "periods" in which player skill is constant, and Elo will converge within each period. Alternatively, if a player's skill changes drastically in a short period of time, then once their true skill has settled a bit and stopped changing, we're basically just running Elo on them again, only with a different initial rating. There is surely some breaking point where, if player skill fluctuates too wildly, Elo can't keep up. There are probably papers on this; I haven't looked into it.
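The "different initial rating" intuition can be illustrated with a small simulation (my own sketch; the jump size, K, and opponent pool are arbitrary choices): a player's true skill jumps suddenly, and the rating re-converges to the new value.

```python
import random

random.seed(0)
K = 24
r = 1500.0                                      # the tracked Elo rating

def p_win(ra, rb):
    return 1 / (1 + 10 ** ((rb - ra) / 400))

history = []
for game in range(6000):
    true_skill = 1500.0 if game < 3000 else 1900.0  # sudden improvement
    opp = random.uniform(1300, 2100)            # opponents act as fixed anchors
    s = 1.0 if random.random() < p_win(true_skill, opp) else 0.0
    r += K * (s - p_win(r, opp))
    history.append(r)

settled = sum(history[-500:]) / 500             # average rating late in phase 2
print(settled)
```

After the jump, the system behaves as if Elo had been restarted with a 1500 "initial" rating chasing a 1900-strength player, and the rating settles near the new value.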
My favorite type of chsnnel:
- Pleasant calm voice
- Dark Mode
- Explains something interesting
If we get this quality for your 1st video I can't wait to see your 10th or 100th video!
"Channel"
@OvejaGD pls don’t correct people’s typos. It just makes you look bad, since it doesn’t add anything, or help anyone.
Wow, Better interact with this comment to ensure more eyes get on this video. Also so I later get more of these recommended (if they make more)
@@builderboy0251 good job on your spelling 👏
@OvejaGD bro got verified with 50 avg viewers 🤣
Man this is your 1st video (probably not, according to professionalism in result) yet it is very well done, good luck to you and your channel, you deserve it, keep rocking!
Most new channels have really well-made videos. It's the new standard now, not really surprising anymore.
@@littlehorn0063 This seems like confirmation bias to me; most new channel vids aren't this good. You're just more likely to be recommended the good ones.
@@DevenRamchandran shouldn't it be survivorship bias?
@@dinhero21 I think both are applicable
@@DevenRamchandran it would only be confirmation bias if he sought out proof for the statement he made. He didn't, so it's survivorship bias (from what I know).
This is hands down the best video about the Elo rating system, congratulations! As someone who studies these kinds of models, it is very satisfying to watch a video crafted with such care and detail. Thank you very, very much.
A couple of comments: In Part IV, it is implied that the algorithm converges when run for a long time. However, since the ratings oscillate back and forth due to randomness, it is not clear what this "convergence" means. All one can hope for is that the expected values of the ratings converge to their true values.
Unfortunately, the ratings are typically biased; that is, their expected values converge to something else, which is close but not equal to their true values. This has been demonstrated numerically; I can share some references if you're interested.
The good news is that the predicted probabilities of winning do converge to their true values. In this sense, the Elo rating system gets the job done. =)
Again, congratulations and thanks for the hard work! I'm looking forward to your next video.
Does this bias involve any sort of tendency for outliers (people with very high or very low true ratings) to lie off the dotted line in the simulations in the video? What I've noticed in some simulations is that sometimes the ratings seem to converge to a kind of sigmoid shape. You can actually see this happening with the k = 10 simulation at 17:10. It often seems like the slope near the edges of the distribution converges to something other than 1, if you see what I mean. I was never quite able to tell if this was a real phenomenon, if it just meant I needed to run the simulation longer, or if it was even a bug in my code (which is linked in the description, for anyone wondering).
@@j3m-math I'm pretty sure that the algorithm slightly over-estimates the rating of an above-average player, and vice versa.
Regarding your simulation, I believe that for small k the algorithm needs to be run for longer to reach equilibrium. I had the same problem with my simulations.
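For anyone who wants to probe this bias/convergence question themselves, here is a minimal sketch of the kind of simulation being discussed (the function names and parameters are mine, not from the video's linked code): modern Elo updates run against games drawn from a Bradley-Terry "true" model.

```python
import random

def win_prob(r_a, r_b):
    # Bradley-Terry / modern Elo expected score for the first player
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def simulate(true_ratings, n_games, k=10, start=1500, seed=0):
    """Run Elo updates on games sampled from the 'true' ratings."""
    rng = random.Random(seed)
    n = len(true_ratings)
    est = [float(start)] * n
    for _ in range(n_games):
        i, j = rng.sample(range(n), 2)  # random matchmaking
        p_i = win_prob(est[i], est[j])  # predicted score from current estimates
        # actual outcome is drawn from the true strengths
        s_i = 1.0 if rng.random() < win_prob(true_ratings[i], true_ratings[j]) else 0.0
        est[i] += k * (s_i - p_i)
        est[j] += k * ((1.0 - s_i) - (1.0 - p_i))
    return est
```

Comparing `est` against `true_ratings` (after shifting by the offset C) over many seeds is one way to look for the systematic over/under-estimation described above.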
I love that you explain math in such an understandable and concise way. It's very hard to make something complex simple, so you're doing outstanding work. Keep it up
Nice video!
IIRC modern games often use the Glicko rating system instead of Elo. Might be material for a future video
Came down to suggest a video about the differences between Glicko and Elo (as well as other, less common rating systems)
As well as some Bayesian approaches (Microsoft's TrueSkill works like that IIRC)
@@roderik1990 ELO is inherently a Bayesian approach; TrueSkill is just 'more Bayesian'
This is an amazing video omg. Im pretty sure many math enthusiasts are gonna enjoy this pop up in their YT feed. Would you ever consider doing a video on the Glicko rating system? Both Glicko and Glicko-2.
Incredible video! Would be fascinating to see how the simulation changes with skill-based matchmaking instead of a randomly selected opponent
This is by far the best explanation of the Elo rating system I've seen. And yes, the mix-up between the normal distribution and the logistic function (Part V) caused me great confusion in the past. Thanks in particular for covering this part!
I loved this video! The fact you concluded by dispelling common misconceptions was very welcome.
This video is of such high quality it deserves a 3B1B's math video award! What a great work of excellence. You've just earned yourself a subscriber.
Keep up the great work! I love the effort that you put into the video. Very impressive for your first upload!
Respect, you cracked it with your first video (probably not your first, judging by the professionalism of the result). Really informative, and there's real depth to it, like going through the old papers to explain the full model
Wow 2k subscribers putting something out this good. I look forward to seeing what you do in the future.
THIS IS YOUR FIRST VIDEO??? It's amazing!!! Absolutely subscribed and can't wait for the next one!!!
13:09 the formula for transitivity has an extra equal sign, I think. Anyway, great explanation!
Can anyone send this to Riot Games
Ong
Good one
Did you miss the part where this is a 2 player 1v1 model?
I don't know if you meant League or Valorant, but either way: originally MMR was a lightly modified version of Elo early in League's life. We don't know what or how they tweaked their parameters at this point. And it's not because it's a team game that Elo doesn't work - it's because the teams vary. So you need two functions to distribute points: one that scores you relative to your teammates, and another that feeds the generated team score into a normal Elo system, at the very least. Then at game end, Elo distributes points and the team system gives you your share.
Hello, this is Gragas from the Rito support team. We've taken a look at this video, and we still dont get it. We just randomly take away and give mmr and watch our players smash their keyboards for the lols.
We are still very grateful for your feedback and will do nothing about this and focus on selling you gacha skins.
-Gragas, Rito Support Team
This is an incredibly high-quality explanation, I look forward to what you do next
Wanted to say again, I'm very impressed with this video. I really wish I was making content like this. Keep up the good work!
Sorry, I hung in as long as I could, but your video is SOOO OVER MY HEAD... good luck, I'm sure the people who get it love it
This was an incredible video to me. I did not expect math in this much detail, but I'm very glad it was this detailed, because now I can actually say I understand how Elo works, which is what this video promises to explain
omg i watch a lot of explainer videos, and you are AMAZING!!! there are so many people with good stories to tell, although maybe only a few best stories per person, but those are so powerful. your 1st vid (?) too, so i am curious and grateful for your work, hope to see you again soon!
Gonna go big for sure...keep doing it
THIS IS YOUR FIRST VIDEO??? I NEED MORE OF THESE!!
honestly love chill videos like this that mix in some cool little graphics to explain themselves better. its literally the perfect way for me to absorb information :3
The meticulous historical background at the end earned you a like from me. Well done sir
Great video
Your explanations are very clear and the structure of the video is organized very well. The visualization is done in the same style. And this style is very nice. Great job, thank you
Bro 1st video is so good, this is unbelievable
Wait, this isn't a video about Electric Light Orchestra...
Really good video! would love to see something similar on the Glicko system :D
really excited for more videos from this channel
A video you'd expect from an already established channel.. Bravo
This is actually so well made
Really excellent stuff! Subscribed and excited for anything you may make next!
This video was so good! Subscribed. I hope you tackle more related topics!
My favorite ELO record is Out of the Blue. Not just great music, but also one of my favorite album covers.
before watching this video i can tell purely based on title and thumbnail that i will absolutely love it
I can't believe this is your first video, congrats!
I wonder if someone has ever played with the idea of a "normally distributed Elo system" but with variable variances for each player instead of a constant one. To me that seems like the most natural way of representing player strength (since two players could have the same average "strength" but one be much more inconsistent than the other)
Edit: looks like the Glicko system is almost exactly what I was describing
The effort put into this video amazes me, cant wait for a new video 🙏
yep this video will be the most viewed on this channel, i'm sure of it because of how accurate, simple, and informative it is
Great work! And looks like the algorithm is supporting you too
I always wondered exactly how the Elo system worked. Thanks!
Great video and masterfully animated! Love it!
I didn't know Elo's model was different from what we know as the Elo system today (so I guess the habit some people have of capitalizing the letters in it - ELO - is somewhat justified?).
Would love to see a similar analysis of newer systems, like Glicko/Glicko-2
I'd be interested in a similar explanation for Glicko's various incarnations and a comparison between them and the different models at play here. This was a fun video to watch.
couldn't have imagined this is the channel's first video without reading the comment section. you earned one more sub from me
Great video, I’m really looking forward to see your upcoming videos, keep it up 👍
I clicked on this video because I saw an English title, didn't know bro speaks Mathematician.
20:30
Actually I learned in a probability class that logistic probabilities can be derived using a normal distribution.
Here's how:
Suppose you are sending a signal:
+1 indicates a win and -1 indicates a loss,
but the signal is noisy, so it has a standard deviation of sigma.
Then, given the measured signal, you can show the probability of a win or a loss is logistic, with a scaling factor related to sigma and the distance between -1 (loss) and +1 (win).
This can kind of help us understand the underlying mechanism of chess: if we let 0 also represent a draw, then the reason GMs have more draws is that their sigma is smaller than low-rated players'.
However, this model gets complicated when you allow players to have different sigmas.
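For anyone who wants to check that claim numerically, here is a small sketch (all names are mine): with equal priors on the signal being +1 or -1 and Gaussian noise of standard deviation sigma, the posterior win probability collapses to a logistic in the measured value z, with scale sigma^2/2.

```python
import math

def posterior_win(z, sigma):
    # P(signal was +1 | measurement z): Bayes' rule, equal priors, Gaussian noise.
    # Normalizing constants of the two densities cancel, so the kernel suffices.
    def kernel(mu):
        return math.exp(-((z - mu) ** 2) / (2.0 * sigma ** 2))
    return kernel(1.0) / (kernel(1.0) + kernel(-1.0))

def logistic_form(z, sigma):
    # the same posterior in closed form: a logistic in z with scale sigma^2 / 2
    return 1.0 / (1.0 + math.exp(-2.0 * z / sigma ** 2))
```

The two agree to machine precision, which is the sense in which "logistic probabilities can be derived using a normal distribution".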
I've always been intrigued by how probabilities from 0 to 1 are so much more complicated than normal variables, which are natural, and I have never found a satisfying way to combine the two, since there is always something that isn't natural.
Elo/log odds comes close, but the issue is that you can't find a closed formula for performance rating based on game results, and varying the K-factor based on (the inverse of) games played is somewhat janky.
Glicko fixes some of these issues into a unified model but creates the issue that there's no precise formula, it's just an iterative approximation that can be arbitrarily close.
One alternative I've considered and found more mathematically "beautiful" is thinking that player (or team, in my use case of sports leagues like the NBA/NFL) skill is fundamentally unknown but can be narrowed down based on results. Players lie on a 0 to 1 scale in my model, where 0 is the 0th percentile and 1 is the 100th percentile, so 0.5 is the median player performance.
The model I laid out works on a framework where a win or loss is deterministic, not probabilistic, so if A beats B we know for certain that A was better than B, at least for that one game they played.
Then to account for player talent changing over time you need to take all these snapshots in time and line them up to find the natural variation in player talent and the actual true change in skill.
Anyways, this is an area I've been obsessed with forever, and it's always unsatisfying how much simpler the computations are when we're working not on a binary win-loss result system but on a linear one on the scale of the ratings. For instance, performance ratings are a simple average of each game's performance, where a win counts as the opponent's rating + 400 and a loss as the opponent's rating - 400. This is just a rough approximation of Elo that has the benefit of being easier to compute, but the downfall of not working at all when the rating gap is larger than 400
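The linear performance-rating rule described above fits in a few lines; a sketch (naming is mine), using the standard "average opponent rating plus 400 times (wins minus losses) over games" form, which reduces to opponent ± 400 per game in the pure win/loss case:

```python
def linear_performance(results):
    """Rough FIDE-style linear performance rating.

    results: list of (opponent_rating, score) pairs,
    with score 1 for a win, 0.5 for a draw, 0 for a loss.
    """
    n = len(results)
    avg_opponent = sum(opp for opp, _ in results) / n
    # 2*s - 1 is +1 for a win, 0 for a draw, -1 for a loss,
    # so this term is 400 * (wins - losses) / n
    return avg_opponent + 400.0 * sum(2 * s - 1 for _, s in results) / n
```

As the comment notes, this is only a linearization; it diverges badly from the Elo expected-score curve once rating gaps exceed about 400 points.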
Talent doesn't generally change over time. But skill levels definitely can.
Yo dude what the heck, this is your first video and it is such high quality
Starting to follow from day one before you become famous🙌
As good as 3B1B but it cites and explores the papers behind it?
HOLY SHIT S TIER ❤️
6:04 yoo I didn't know Zermelo was also into rating systems... as if it wasn't enough to create the whole axiomatic basis of modern mathematics lol
I LOVE YOU SO MUCH YOU'RE SO FUNNY AND YOUR EXPLANATIONS MAKE SO MUCH SENSE
Very accessible and well produced video
The transitivity assumption is interesting, and I believe it is violated for real chess players. Players like Hikaru sometimes "farm Elo" by playing against lower-rated players. If transitivity were correct, farming shouldn't be possible, but in reality the actual probability of a strong player beating weaker players is apparently higher than predicted by Elo.
Furthermore almost all matches are between players of the same/similar rating. This gives very little data to keep the ratings consistent across large ranges of Elo. This can lead to being stuck in "Elo-hell" where a bunch of low rated players keep playing amongst each other while getting stronger, but only trading points amongst each other so their rating does not improve with their actual strength. In general there can be different rating inflation/deflation in different sectors of the rating range.
If matchmaking was worse (i.e. pairing players of different skill more often), then the Elo system could potentially be more accurate.
20:00 it can be equivalent if you use a logistic distribution instead of a normal, since it's very similar - it just has slightly fatter tails, indicating upsets aren't as unlikely as the normal model suggests
it can be equivalent if you use an extreme value distribution for both players, because if X ~ EVD and Y ~ EVD then Y - X ~ Logistic, and we need Y - X to be logistic, not X or Y.
Appreciate the longer writeup and addendum at the link. Great, informative video!
The graphics are awesome. I love the silly pawn dudes.
Well done! Very clear and enjoyable explanations
This guy has captions!
This is the best video on this topic, thank you
Really good stuff, well presented
Great video, it really showed the fundamentals of probability.
If I'm gonna be honest, I didn't know what odds were, since the term is so saturated in gambling lol.
Did you use manim for this btw?
Yup, manim and Flash. I also never really knew what odds were before doing this project! That's why I felt the need to include a discussion about them.
@@j3m-math Did you voice this yourself?
@@shrekeyes2410 Yup
If one were to simulate strengths from the historical Elo model and use the algorithm of the modern Elo, what would the results be? In other words, it would be interesting to investigate how robust the Elo system is to different models of the true-strength data-generating process.
What an amazing video! The clear visual style aids the explanations perfectly. I liked it a lot, and understood everything.
I am only left with one question: why didn't the original Elo rating, based on the normal probability of performance, ever catch on? Is the assumption not correct? Does it converge more slowly? I guess I'm gonna research those questions on my own.
Yeah! I was left hungry for a part 2 covering the Gaussian-based model further!
An excellent question. While I don't know the complete answer, I can say this: the integral of the Gaussian distribution at 19:38 (i.e. the cumulative distribution function of the Gaussian) is, up to rescaling, what's called the error function, denoted erf(x). The error function belongs to a class of functions known as sigmoids, named for their resemblance to the letter "s". The logistic function is also a sigmoid, so it's already roughly the same shape. But in fact, the derivative of the logistic function (the density of what's somewhat confusingly called the logistic distribution) is very similar to the Gaussian density, with the only qualitative difference being heavier tails.
So in short, the Logistic function model should yield very similar results to a model based off the Gaussian distribution, with discrepancies most noticeable when the difference in player ratings is large.
Unlike the logistic, the integral of the Gaussian does not have an explicit formula that's easy to evaluate. That's not an issue now that we have computers in our pockets, but it certainly was back in the 1970s. This might explain why people preferred the logistic over the error function.
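To make the "numerically very close" claim concrete, here is a quick sketch comparing the two curves (the σ = 200√2 scale for rating differences is my own assumption, roughly matching the classic 200-points-per-class tables; function names are mine):

```python
import math

def thurstone(d, sigma=200.0 * math.sqrt(2.0)):
    # Gaussian-CDF model: Phi(d / sigma), written via the error function
    return 0.5 * (1.0 + math.erf(d / (sigma * math.sqrt(2.0))))

def bradley_terry(d):
    # modern Elo / logistic model with the usual 400-point scale
    return 1.0 / (1.0 + 10 ** (-d / 400.0))
```

At a 200-point gap the two predictions differ by well under a percentage point; the gap only becomes noticeable at large rating differences, where the logistic's heavier tails make upsets less improbable than the Gaussian model says.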
Probably easier to compute?
Man this is the best video I have seen on the Elo system. I love it. Do you have an explanation on how physicists may have come up with this model from their physics knowledge? It looks like the Fermi function in Fermi-Dirac statistics?
I didn't dive too deep into the history, but I think basically Elo used the Thurstone model originally, then they swapped in a logistic for a normal distribution later. To be clear (I've seen people get confused about this, including me): this does NOT mean that the Zermelo/Bradley-Terry model is equivalent to both players generating numbers from a logistic distribution, and then the bigger number wins. It's just that the Thurstone and BT models are both of the form p = f(R1 - R2), where f is an increasing function from R into [0, 1]. For the Thurstone, f is the CDF of a Gaussian, and for the BT it's a logistic function. I think they basically just swapped in a different f that gave better results.
As for what motivated this specific choice of f... I don't know. In this video I motivate it the way Zermelo does in his paper, using this notion of "strengths", but I'm sure there are different angles that could be motivated by statistical physics as you suggest. I suppose you could start by looking at sections 8.3 and 8.4 of Elo's book (Elo 1978). In those sections he also cites (Elo 1966) and (Berkson 1929, 1944). I haven't looked into those.
References:
- Elo 1978, The Rating of Chess Players Past and Present
- Elo 1966, Use of the Standard Sigmoid and Logistic Curves in Pairwise Comparisons (sounds very relevant!)
- Berkson 1929, Application of the Logistic Function to Experimental Data
- Berkson 1944, Application of the Logistic Function to Bioassay
@@j3m-math It's true that the Zermelo model is not equivalent to both players generating numbers from a logistic distribution with the bigger number winning, but there's something almost as good. The issue is that the difference of two iid logistic random variables is not logistically distributed, but if we could find a (family of) distribution(s) (say D) such that when X ~ D(a), Y ~ D(b) independent, then X - Y ~ Logistic(a-b,1), we'd be all set. The Zermelo model would then be equivalent to player 1 generating X, player 2 generating Y, and the largest number wins. It turns out there is a distribution that works, called the Gumbel distribution.
You can go the other way too. For example, Elo's original model has a nice transitivity property; knowing p_ij and p_jk uniquely determines p_ik.
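A quick Monte Carlo sketch of the Gumbel fact stated above (in natural units rather than the base-10/400 parametrization, which is just a rescaling; names are mine): sampling each player's number from a Gumbel shifted by their strength makes the empirical win rate match the logistic of the strength difference.

```python
import math
import random

def gumbel(mu, rng):
    # standard Gumbel shifted by mu, via inverse-CDF sampling
    return mu - math.log(-math.log(1.0 - rng.random()))

def win_rate(a, b, n=200_000, seed=1):
    # empirical P(player with location a beats player with location b)
    rng = random.Random(seed)
    wins = sum(gumbel(a, rng) > gumbel(b, rng) for _ in range(n))
    return wins / n

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))
```

With a - b = 1, `win_rate(1.0, 0.0)` lands within Monte Carlo noise of `logistic(1.0)`, which is the Bradley-Terry prediction.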
lowkey goated video remember me when u hit 10k
I used to captain a rec-league tennis team, and there are so many rating systems for tennis players, but I created my own plain old straight-up chess-style Elo scores for every player in the league, and it predicted match results with 80+% accuracy. All the other scoring systems I could find were much worse at predicting outcomes, even though they all claim to be some derivative of the Elo system. Which makes me question whether they really are, or, if so, whether they modified it beyond all recognition. Because the straight old-fashioned chess-style Elo score worked brilliantly for me.
Outside of the first year when I was still building my team, every season after that first year we made the play-offs, and almost every season we won the division. And we weren't always necessarily the best team, but I knew who to put where based on the Elo scores I saw in my spreadsheet, and that let me game the lines just right to squeak out wins against arguably better teams. Never did manage to win a city championship, but we made it to the finals more than once. At that point, though, you're playing teams loaded with first-season self-rated players who clearly lied on their self-rating questionnaire in order to play down a level or two, just so they could create a team specifically to win the city, i.e. teams that cheated. All my Elo scores tell me at that point is that the team we're facing is a cheating team and that we really don't have a chance. Of course I can't tell my team that, but I can see it plain as day in the numbers. And unfortunately, as obvious as it may be to me, and to anybody who actually watches the matches in person, the league administration doesn't watch the individual matches in person and is completely head-in-the-sand about the problem of people cheating on the self-rating questionnaire. They refuse to even entertain that such a thing is a possibility, even though every single captain on the courts, and most of the players, know it full well. Ultimately that's what led me to give up on rec-league tennis. Too much cheating, no administration willingness to acknowledge it. Ok then, I'll go find something more fun to do with my time and money.
thanks for the great video! it seems like your target audience is math nerds that like chess. im here for it.
Initial ELO ratings in the US Chess Federation are 1000, not 1500.
My mind blown by these QEDs
This video is how I wish I could have learnt the content from my Maths undergrad lectures!
This is what school should be.
I learned how to calculate probability while not focusing on calculating probability.
Also the difference between probability and odds WHILE WE'RE NOT EVEN TALKING ABT EM
Wow! Last week, I made a video on this precise topic. This one is better than mine.
Great explanations overall! I just had a couple small gripes.
The first is more of a perspective shift than anything (I'm an engineer, not a mathematician), but in your simulation where each player has a "true rating" you mention an offset C, which should equal the average true rating. I found the description you gave to be pretty interesting, since it seemed to be almost entirely motivated by the mathematics, but to me the more useful interpretation is that universal ratings don't exist, and each pool of players can only be compared against players in that same pool, as the rating estimates by definition calibrate to a previously chosen center point, regardless of the true ratings in that pool.
This raises some interesting follow-up questions as well, like what happens when players are matched to prioritize close ratings, as that will likely create some kind of locality within the same pool as well.
The second gripe is the assertion that historical elo and modern elo are totally different systems. The reasoning you give for this is that historical elo assumes outcomes to be normally distributed, and modern elo assumes they're logistically distributed. While this is strictly true, the difference between the two is more of an implementation detail rather than some fundamental difference as it's possible to get very similar results with both. Perhaps this pedantry is warranted, especially for anyone looking deeper into it, but some mention of their similarity would have been nice.
Indeed, the Thurstone and B-T models are very numerically similar. I felt it was important to be very explicit about the difference because I've seen confusion about this online - I considered talking about how close they are, but to be honest, I just left it out to keep that section a bit shorter. I think I mention it in the pinned comment, but I'll add it in if not.
Your first observation is very valid. Of course, in our case the true ratings do literally exist, in at least the sense that they exist as variables in the code.
@j3m-math Note that I'm not saying true ratings don't exist (one could argue for or against them) but rather that elo scores are not "universal", as depicted in your graphs where C can vary when the average true scores are different. This is to say that someone at 1800 elo in one pool is not comparable to someone at 1800 elo in another, as their true skill may be different.
Well, I can tell this channel is gonna be at 100k subs before the end of 2025
“'elo' is short for 'zermelo'”
great video! I always wondered how to derive the Elo system. btw, why did Elo assume the variance stays the same? what if someone is more consistent in their play, while other players are more chaotic and have 'good' and 'bad' days? if we relax this assumption, do we get a more accurate model?
I don't know about Elo specifically, but statistical models are often chosen because they are simple to work with and understand. Also, this was at a time before the PC was invented, so a procedure that players and organizers could easily do and verify by hand was important. Newer models by Mark Glickman (Glicko and Glicko-2) account for the fact that we are less confident about a player's rating when they are new or play infrequently and that player consistency can vary. These are actually the models that the major chess sites use behind the scenes although players tend to still refer to them as their Elo ratings. One of the major benefits is that new players' ratings are adjusted much more quickly. So, even if a player starts at 1500, they can drop to 500 or shoot up to 2500 in just a handful of games. Their ratings will change by much smaller amounts once the system has a stronger estimate. Although, official systems also handle new players by using special procedures, but Glicko is a more theoretically rigorous way to handle it.
Truly a masterpiece video
I don’t care if this channel is an AI bot, I agree with all the other bots in the comments who wish this channel luck.
It’s definitely not a bot because it actually posts good content
@@Hnxzxvr It actually only posted one thing.
It’d be awesome if you talked about Glicko2 one day… the formulas are very interesting but I’d be amazed to see the whys. This video was awesome btw!
But how does the Thurstone model work in practice?
I understand that it is mostly impossible to see which model is the most correct, but how do they differ and what are the strengths of both models?
Great video and I hope you upload more!
According to Wikipedia, Elo is the guy who outlined the properties of a good chess skill rating system. The Thurstone model was what he had in mind when constructing that outline, but the downside is that the math is hard. So he proposed using the simplified model known as modern Elo. Regardless, both models are simplifications of reality. As the wiki says: "chess performance is almost certainly not distributed as a normal distribution" and "In paired comparison data, there is often very little practical difference in whether it is assumed that the differences in players' strengths are normally or logistically distributed". So basically we are using a simplified model of a simplified model, but it is decent and very simple, so we like it.
>cutesy pawn-shaped stick figures
>D50 music
yeah
Subscribed, just make more videos, don't rest or sleep!
17:07 it is very interesting how those edges curve; this seems to indicate that low-rated players are slightly overrated while high-rated players are slightly underrated.
really good explanation of elo thanks
Wow, great video.
I would love to see a video from you on Swiss scoring.
If I'm understanding correctly, since total rating points are maintained, the offset C between the Elo rating and the actual rating is easily computed as the starting Elo minus the mean of the actual ratings.
I.e., C = 1500 - mean(actual ratings)
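The conservation this relies on is exact: since the scores sum to 1 and the predicted scores sum to 1, the two players' updates cancel term by term, k(s1 - p1) + k(s2 - p2) = k(1 - 1) = 0. A tiny sketch of the standard update (function name and k value are my own choices):

```python
def elo_update(r1, r2, s1, k=32):
    # s1 is player 1's score: 1 win, 0 loss, 0.5 draw; scores sum to 1
    p1 = 1.0 / (1.0 + 10 ** ((r2 - r1) / 400.0))  # expected score for player 1
    p2 = 1.0 - p1
    s2 = 1.0 - s1
    # the two deltas are exact negatives, so r1 + r2 is conserved
    return r1 + k * (s1 - p1), r2 + k * (s2 - p2)
```

Because every game conserves the total, the pool's mean estimated rating stays at the starting value forever, which is exactly why C equals the starting Elo minus the mean true rating.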
Hey, great video. Really enjoyed watching it. I was wondering about some other use cases of Elo ratings, like in chess puzzles, where I presume each puzzle gets a rating of its own and wins or loses rating according to whether the player solved it or not. Or the rating systems used in competitive programming, like on Codeforces. It would be fascinating to see a video going deeper into less-known use cases of these systems. Thanks again for the video, great job.
Just here to boost the algorithm. Well done.
Nice video. Reminds me of my undergrad days
Bro is gonna become absolutely successful
packaged maths, statistics and chess in one video.
I would have appreciated a simulation where you run the historical Elo (Thurstone) algorithm on the Zermelo model, or run the Zermelo algorithm on the Elo model.
Great video. Very educational 🙂
Very good video, nice animations, and an understandable explanation for everyone.
The only thing that's a bit bothersome in my opinion is that your voice sounds too "sharp". Maybe it's because of suboptimal mastering of the audio or something, because your voice tone and expression themselves are very good for reading and explaining
criminally underrated
Love this! Subscribed!
Interestingly, the gaming website I use a lot not only doesn't let your Elo go below 0, but once you get to 100 Elo in a game, you can never go below that, even if you then repeatedly lose at that game. I assume this is because the website has an arena mode that requires 100 Elo to play a given game, and someone probably decided it wouldn't be fair to take away the arena-mode privilege once it was already earned.
I like discussion of metalogic sooooo much more when I'm not the one writing the proof lol