G'day lads, if you have a question let me know here 🔥
How did you get the SLpM of every fighter? It is not in the dataset provided.
I can see SLpM for individual fighters, but not split into R = however many and B = however many.
Is it possible for Excel to say how accurate every prediction is?
Thank you for this video. I used your model along with ELO and the implied probability from the opening line to handicap UFC 298. Your model worked pretty well. Your model was 10/12, ELO was 10/12 and the opening line was 11/12. I thought results of your model might be improved by using the most recent fights but that wasn't the case. Using fight data from the last two years the model correctly predicted 9/12 and using the last three fights the model correctly predicted 10/12 again. Thank you again.
It’s just so based on matchups but I reckon if you combine the model with your own eye test it could be very profitable
Over the long term, I had difficulty making this model profitable in UFC betting. The MMA judging rules changed on 1 Jan 2017 with the new unified MMA rules. The data set that you used covers 1993-2021. I suspect the results of the model would improve if you used only data from 2017 and later. Rerunning the regression is on my list of things to do. When I get to it, I'll post the results here. Thanks again.
Excellent video as always. Thank you!
This is really good. Thank you.
love this video I always wanted something like this !!!!
Lad, updating the fighters stats automatically would be a nice improvement to the model (power query maybe). Anyway, you rock!
Haha thanks for the support lad 🙏 That was the aim when building this model, but I haven’t yet found a good website to pull this data all at once into Excel. 😞
@excel_ladz Yeah, I just noticed that the Kaggle dataset contains stats from 1993 to 2021, so my previous comment doesn't make sense.
@excel_ladz Lad, I checked the Kaggle dataset files but I wasn't able to find the same exact stats that you've shown right at the beginning of the video. Did you change the name of the columns or something?
Looks good again. Can you also make a system like this for darts? That would be cool.
My best wishes for the new year ;)
That thumbnail 🥵
You're on fire! Great video! Can Logistic regression be used for other binary outcome sports, say basketball for example?
For sure 🔥 As long as the predictors are relevant (a good rule of thumb is that the initial p-value for each predictor, after running the regression, is below 0.05) then you absolutely can do that 👍 I suppose you could also reduce events that have multiple possible outcomes to two: e.g. under/over 220 points 😃
@excel_ladz Great, thanks Lad.
Love the video, thank you!
1. For column C, how did you determine who was going to win?
2. If I want to add more variables to correlate, can I just add them, so long as there is a differential to the stat (e.g. KO rate diff, SUB rate diff, height diff and reach diff)?
3. How can I add a weighted variable to this, such as age?
Maybe even something like wins and losses against each type of fighting style, etc.
I was wondering the same thing
Great movie! Could you please show us how to create a model for tennis matches?
Very impressive, I’ll watch a few more times.
Question though, is the model basically making a prediction off of the fighter with better stats?
Would be interesting to find a way to quantify out-of-ring edges, such as strength of gym and previous strength of schedule, and here lately new dads have been locking it down.
I’m curious if finding a way to quantify things like those mentioned above would yield a tighter prediction.
Very cool stuff and super cool of you to upload all of it and let us learn with you!
Question - that Kaggle data set is only up to 2021, is there a more reliable data set available that we can use that will be maintained for future-proofing the model?
G’day lad, unfortunately I haven’t found one yet. Usually to compile an updating list there needs to be some sort of code to scrape data off of the internet (e.g. scraping results/data off of UFC stats).
Thanks for the vid. Can you explain why you calculate the win probability with the EXP() function?
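For anyone with the same question: logistic regression produces a linear score (intercept plus coefficient times each stat difference), and the logistic function 1 / (1 + EXP(-score)) squeezes that score into a probability between 0 and 1. Below is a minimal sketch of that step. The intercept, coefficients and stat differences are made-up placeholders, not the values from the video.

```python
import math

def win_probability(intercept, coefficients, stat_diffs):
    """Logistic function applied to a linear score.

    score = intercept + sum(coefficient * stat difference)
    probability = 1 / (1 + exp(-score))
    """
    score = intercept + sum(c * d for c, d in zip(coefficients, stat_diffs))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical numbers, for illustration only (not the video's fitted values).
intercept = 0.2
coefficients = [0.5, 1.5]

# All stat differences zero: the probability depends on the intercept alone.
p_equal = win_probability(intercept, coefficients, [0.0, 0.0])

# Red fighter ahead on both (hypothetical) stats.
p_red_edge = win_probability(intercept, coefficients, [1.0, 0.1])

print(round(p_equal, 4), round(p_red_edge, 4))
```

With these placeholder numbers a non-zero intercept moves the probability away from 50% even when every stat difference is zero, which is a general property of logistic regression rather than something specific to this spreadsheet.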
Great video. But where did you get the statistics at the beginning, like SLpM and Str Acc? The data set on Kaggle does not contain those numbers. Did you extract them from somewhere else?
Hi lad, thanks for watching 🔥 I selected the latest 1,500 rows from the ‘data.csv’ dataset. All I grabbed from this dataset was the two fighters and the winner. I then grabbed the Red and Blue Fighter’s 8 different UFC Stats from the ‘raw_fighter_details.csv’, and added them into the table. I hope that helps lad 👍
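The lookup described above (take the two fighters and the winner from one file, then pull each fighter's stats from the other) can be sketched outside Excel as well. This is only an illustration under assumptions: the column names `R_fighter`, `B_fighter`, `Winner` and `fighter_name` are assumed and should be checked against the actual CSV headers, and the rows are invented stand-ins so the snippet runs without the files.

```python
import csv
import io

# Tiny inline stand-ins for 'data.csv' and 'raw_fighter_details.csv'.
# Column names and values are assumptions for illustration.
FIGHTS_CSV = """R_fighter,B_fighter,Winner
Fighter A,Fighter B,Red
Fighter C,Fighter A,Blue
"""
DETAILS_CSV = """fighter_name,SLpM,Str_Acc
Fighter A,4.50,52%
Fighter B,3.10,45%
Fighter C,2.80,40%
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def attach_stats(fights, details, stat_columns):
    """Add R_<stat> and B_<stat> columns to each fight by fighter name."""
    by_name = {row["fighter_name"]: row for row in details}
    merged = []
    for fight in fights:
        red = by_name.get(fight["R_fighter"])
        blue = by_name.get(fight["B_fighter"])
        if red is None or blue is None:
            continue  # skip fights where a fighter has no stats row
        row = dict(fight)
        for col in stat_columns:
            row["R_" + col] = red[col]
            row["B_" + col] = blue[col]
        merged.append(row)
    return merged

table = attach_stats(load_rows(FIGHTS_CSV), load_rows(DETAILS_CSV),
                     ["SLpM", "Str_Acc"])
for row in table:
    print(row)
```

The corner comes from the fights file (the red and blue name columns), while the stats file only has one row per fighter, so the red/blue split appears after the lookup rather than in the stats file itself.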
@excel_ladz How did you know which one was red and which one was blue? In that dataset they don't differentiate.
did you figure it out?
This is great stuff. Thanks for sharing! Could you explain why zeroing out the stats affords the red fighter a few percentage points? Does the calculation of actual data (values > 0) remove this seeming bias? Additionally, when swapping the red fighters stats to the blue fighter (and vice versa), the percentages don't equate to the same values. This seems problematic. Is there an explanation for this, or a solution?
Any reason (or references) for why you need to subtract square root functions when you're calculating differences in accuracy? (E.g. 4:39)
Hi lad, the square root function is to calculate a fighter's 'true expected striking accuracy'. For example, Fighter Red has a 60% accuracy and their opponent has a defence rate of 50%. The SQRT function takes the geometric mean of the two figures, SQRT(0.60 × 0.50), so Fighter Red's accuracy decreases to about 54.8%: if Fighter Red lands 60% and Fighter Blue defends 50%, then a midpoint has to be found. This is done for both fighters, so each fighter has a 'true expected striking accuracy'. Blue's figure is then subtracted from Red's, just as is done for the other stats, to see the advantage the Red Fighter has over the Blue Fighter 👍
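The arithmetic in that reply can be checked with a short sketch. It assumes the spreadsheet formula is the square root of (accuracy × opponent's defence rate), which is inferred from the 60% / 50% example rather than confirmed, and Blue's numbers are made up.

```python
import math

def expected_accuracy(striking_accuracy, opponent_defence):
    """Geometric mean of a fighter's accuracy and the opponent's defence rate.

    Assumed form of the spreadsheet formula, inferred from the worked example.
    """
    return math.sqrt(striking_accuracy * opponent_defence)

# Example from the reply: Red lands 60%, Blue defends 50%.
red_expected = expected_accuracy(0.60, 0.50)

# Hypothetical numbers for the other side: Blue lands 45%, Red defends 55%.
blue_expected = expected_accuracy(0.45, 0.55)

# Red minus Blue, the same differential form used for the other stats.
accuracy_diff = red_expected - blue_expected

print(round(red_expected, 4), round(blue_expected, 4), round(accuracy_diff, 4))
```

The geometric mean always lands between the two inputs, which is the 'midpoint' behaviour the reply describes.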
Great videos. I was just wondering what type of math you applied in the NBA prediction videos. I saw you use Poisson for football and logistic regression for UFC. Thanks
Hi lad, in the NBA Model I used the binomial distribution to simulate a player’s points. Specifically, the BINOM.INV function to simulate a player’s expected shots, their distribution of shots (ie the number of threes, twos and fts) and finally the number of shots made out of those attempted 🏀
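As a rough illustration of the idea, here is a simplified sketch. It re-implements the inverse binomial lookup (the smallest number of successes whose cumulative probability reaches a given value) and only simulates makes from a fixed number of attempts, whereas the reply also simulates the number and mix of shots. The attempt counts and percentages are invented.

```python
import math
import random

def binom_inv(trials, p, alpha):
    """Smallest k whose cumulative binomial probability is >= alpha."""
    cumulative = 0.0
    for k in range(trials + 1):
        cumulative += math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
        if cumulative >= alpha:
            return k
    return trials  # guard against floating-point shortfall near alpha = 1

def simulate_points(rng, shot_profile):
    """Simulate one game's points from (attempts, make rate, point value) rows."""
    total = 0
    for attempts, make_rate, value in shot_profile:
        # Feeding a uniform random number into the inverse CDF draws a
        # binomial sample, which is what a random alpha does in a spreadsheet.
        makes = binom_inv(attempts, make_rate, rng.random())
        total += makes * value
    return total

# Hypothetical player: 6 threes at 38%, 10 twos at 52%, 5 free throws at 80%.
profile = [(6, 0.38, 3), (10, 0.52, 2), (5, 0.80, 1)]
rng = random.Random(42)  # fixed seed so repeated runs match
samples = [simulate_points(rng, profile) for _ in range(1000)]
print(sum(samples) / len(samples))
```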
@excel_ladz Thanks lad
I wonder if you can do one for greyhound races, UK & AUS?
Would this account for intangibles like power, speed or finishes before the 5th round, or is it just stat for stat?
If you had data on when and how each fight ended (the round, and whether by Sub, KO/TKO or Dec), could you use this same process to predict the round and method of a fight from the previous data?
Does the dataset need to be updated with more recent fights as time goes on, or is it just plug and play and time-proof?
Also, I really think taking stats from before, say, 2010 is almost a misdirection. Judging, fight rules and basically everything else were totally different from modern MMA, so I wonder if it could skew the data.
How did you copy all the data into Excel quickly from the CSV?
Do you have this in Google Sheets form? That way the data would be live and auto-update.
For football and soccer please
72% doesn't mean anything if the odds are worse than -300...
Hi lad, that's right if the average odds are below $1.39. However there's a good article by Pinnacle Sports saying that the bookie favourite in UFC Fights wins 66% of the time, so any model with an accuracy above this is considered good 👍
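The $1.39 figure is the break-even decimal price for a 72% hit rate (1 / 0.72), and -300 in American odds is about $1.33. A small sketch of that conversion, assuming flat stakes and treating 72% as the long-run accuracy:

```python
def american_to_decimal(american):
    """Convert American odds (e.g. -300 or +150) to decimal odds."""
    if american < 0:
        return 1 + 100 / abs(american)
    return 1 + american / 100

def implied_probability(decimal_odds):
    """Win probability implied by decimal odds, ignoring the bookmaker margin."""
    return 1 / decimal_odds

def break_even_decimal(win_rate):
    """Lowest average decimal price at which a given win rate breaks even."""
    return 1 / win_rate

def flat_stake_ev(win_rate, decimal_odds):
    """Expected profit per $1 staked at a given win rate and price."""
    return win_rate * decimal_odds - 1

price = american_to_decimal(-300)
print(round(price, 3))                        # decimal price for -300
print(round(implied_probability(price), 3))   # probability implied by that price
print(round(break_even_decimal(0.72), 3))     # price needed at 72% accuracy
print(round(flat_stake_ev(0.72, price), 3))   # negative means a losing bet on average
```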