Unbelievable! Well done, you were spot on with your prediction of +3 for the Chiefs!
Talk about a great prediction!
You're one of the best data science channels on YouTube. Thanks for the video.
Cool applications of these models, I would be interested to see what would happen if you weighted each game by the ranking of the opposing team!
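One lightweight version of that weighting idea, sketched here with entirely made-up results and opponent ratings, is to compute recent form as a strength-weighted win rate, so a win over a strong opponent counts for more than a win over a weak one:

```python
# Toy sketch: recent form weighted by opponent strength.
# Results and strength ratings below are hypothetical, not real NFL data.

def weighted_form(results, opponent_strengths):
    """results: 1 for a win, 0 for a loss; strengths: higher = tougher opponent."""
    total = sum(opponent_strengths)
    return sum(r * s for r, s in zip(results, opponent_strengths)) / total

# Last 5 games: W L W W L, against opponents of varying strength.
results = [1, 0, 1, 1, 0]
strengths = [0.9, 0.2, 0.7, 0.4, 0.8]  # hypothetical ratings in [0, 1]

print(round(weighted_form(results, strengths), 3))  # → 0.667
```

Both losses here came against strong opponents, so the weighted form (0.667) reads better than the raw 3-of-5 win rate (0.6) would suggest on its own.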
Thanks for creating. Interesting hearing about the thought process you take and your approach to the problem.
Wow, YOU won 🎉🥇
Glad to meet a Taylor fan data scientist!
Awesome video!
Would love a video on Hamiltonian Monte Carlo and how this links to Metropolis-Hastings and Markov Chains.
Keep up the great content :)
One thing I think is missing from your approach to the team's history is that you aren't telling the model which of the teams they played against were good and which weren't (beating the best team and beating the worst team obviously tell you different things about how good your team is). I've had some reasonable success predicting the other football using two models trained together: the first was an LSTM that was fed the outcomes of the matches between ALL teams over the previous year and produced a status vector for each of them, and the second, given the status vectors of the two teams I cared about plus some other information about the match, predicted the final result.
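A much simpler way to fold opponent quality into the features, as a point of comparison to this LSTM approach (not the commenter's actual model), is a basic Elo-style rating: every result updates both teams, so beating a strong team moves your rating more than beating a weak one. Team names and the toy season below are hypothetical.

```python
# Minimal Elo-style rating sketch (hypothetical teams and results).

def expected_score(r_a, r_b):
    """Probability that A beats B implied by current ratings."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings, team_a, team_b, a_won, k=32):
    """Zero-sum rating update after one game."""
    e_a = expected_score(ratings[team_a], ratings[team_b])
    s_a = 1.0 if a_won else 0.0
    ratings[team_a] += k * (s_a - e_a)
    ratings[team_b] -= k * (s_a - e_a)

ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
# Toy season: A beats B, B beats C, A beats C.
for a, b, a_won in [("A", "B", True), ("B", "C", True), ("A", "C", True)]:
    update(ratings, a, b, a_won)

print(sorted(ratings, key=ratings.get, reverse=True))  # → ['A', 'B', 'C']
```

The resulting ratings (or rating differences between the two matchup teams) could then feed the downstream model, much like the status vectors described above, without needing a neural net at all.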
This sounds interesting! Have you tried treating the two things as separate variables? E.g. clustering the teams and using that as a measure of how good they are, and the sequence of the last games independently of team quality? If so, did it yield good results?
Congratulations on your model being right! Amazing!
Holy cow! What a prediction!!!
Wow your model was right!
Love it! What about doing some feature engineering on the last 5 games of each team instead of using the RNN for that, and then fitting a much simpler model? That could help balance the size of the dataset against the complexity of the model.
well done! would you mind explaining how you derived the baseline probability of winning for each example in the validation set?
There's a lot that's wrong with this model. You would have benefited from looking at the published literature on prediction models for American sports. Specifically, what's been found to work best is either a decision tree or a boosted decision tree. In any case, two points:
1) In selecting the last five games or the last ten games, you aren't following an optimization method to remove bias from your model. You have to implement a cross-validation method. That is, you need a randomized, combinatorial approach that selects, for instance, the most recent game, the first game, the third game, etc. Bias in the model can only be removed if (almost) all random combinations are compared against each other.
2) In selecting the initial features for your logistic model, you didn't implement LASSO. That is, the first step in selecting your features is to test your assumption that this is the correct number of features. A common method is to use LASSO to see how many are independent and therefore how many are predictive. (It could also be that you aren't using enough features, which LASSO would likewise make evident.)
Hey, first off, thank you for taking the time to write up your comment.
I should have mentioned in passing the other models I tried, but every tree-based model (decision tree, random forest, gradient-boosted decision tree) gave weaker performance on the validation set than the logistic regression.
I definitely agree with your point about needing more robust lag-window selection. I basically tried 5 and 10 as a gut check, but we should iterate over all combinations (as well as all the other hyperparameters available to us) if we were trying to get the best performance possible.
I would agree with the feature-selection point if our feature space were a lot bigger (in the dozens, hundreds, or more), since that might interfere with the model learning properly. In fact, richer features are probably the main limitation of this model. But with the 6 features we're using, I think there's an argument that the RNN would learn to ignore those that are ineffective at solving the task. Of course, we'd need to look under the hood at the learned weights of the RNN to know for sure.
Again, really appreciate your insight into how to make this model even better, cheers!
@@ritvikmath Yes, you're right that RNNs do work as an implicit form of LASSO; but this is why it should have been implemented when you were developing your logistic model. That is, because you've chosen to map many features to a single outcome, whether a team won or lost, you've essentially lost the predictive ability of RNNs by not using sequential data that can be predictive. In other words, because you don't know whether multicollinearity affects your features, you won't learn this from an RNN.
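The two suggestions debated in this thread, randomized lag-window selection via cross-validation and L1 (LASSO-style) feature selection, could be sketched roughly as below. The data here is synthetic (only the 5 most recent "games" carry signal by construction), so this is an illustration of the procedure, not the video's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in: one row per matchup, one column per recent game
# (most recent first). Only columns 0-4 are informative by construction.
X = rng.normal(size=(400, 10))
y = (X[:, :5].mean(axis=1) + 0.3 * rng.normal(size=400) > 0).astype(int)

# 1) Randomized combinatorial lag selection: instead of hard-coding
#    "last 5" vs "last 10", score random game subsets with cross-validation.
best_subset, best_score = None, -1.0
for _ in range(30):
    cols = sorted(rng.choice(10, size=5, replace=False).tolist())
    score = cross_val_score(LogisticRegression(), X[:, cols], y, cv=5).mean()
    if score > best_score:
        best_subset, best_score = tuple(cols), score

# 2) L1-penalized logistic regression: uninformative coefficients shrink
#    to exactly zero, giving a quick read on how many features to keep.
lasso_lr = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
n_kept = int((lasso_lr.coef_ != 0).sum())

print(best_subset, round(best_score, 3), n_kept)
```

On this toy data the best-scoring subsets tend to be the ones overlapping the informative columns, and the L1 fit zeroes out most of the pure-noise columns, which is the diagnostic the commenter is describing.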
Both teams in the Super Bowl had to go undefeated in the playoffs to get to the game, so their recent win–loss ratios should be the same apart from whether they got a bye week. Point spreads in playoff games might be more predictive.
The issue with predicting the unpredictable is that the odds, or the price, are always the best indicator.
Could you make a series specifically about implementing data science portfolio projects? I still have no idea what kind of DS projects I should build to apply for a DS job, or what those projects really look like in detail. 🙏
I think the next obvious improvement is taking injuries into consideration. If a team loses 3 key players over their last two games, their next game is not going to go as well. I'm sure injury reports are well documented, and the feature could simply be the number of starting players injured before the game.
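That feature really could be a one-line count per team per game. A minimal sketch, with a completely hypothetical injury report (team codes and players invented):

```python
# Hypothetical pre-game injury report rows: (team, player, is_starter).
injury_report = [
    ("KC", "Player A", True),
    ("KC", "Player B", False),
    ("SF", "Player C", True),
    ("SF", "Player D", True),
]

def starters_out(report, team):
    """Count of injured starters for `team` going into the game."""
    return sum(1 for t, _, is_starter in report if t == team and is_starter)

print(starters_out(injury_report, "KC"), starters_out(injury_report, "SF"))  # → 1 2
```

The resulting integer (or the difference between the two teams' counts) would slot straight into the logistic regression alongside the existing features.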
I'm not into sports (at all), but I do fancy the idea of using data science to make money betting on sports.
Question 1: Is home-field advantage of such significance that it deserves to be featured so strongly in the model?
Question 2: Why stop at only 5 or 10 games? Why not track the performance of the two teams since the last Super Bowl?
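Question 1 can be sanity-checked directly from historical results before deciding how much weight it deserves. A rough sketch with a made-up tally (the 140-of-250 figure is illustrative, not real NFL data):

```python
import math

# Hypothetical tally: home team won 140 of 250 games.
home_wins, games = 140, 250
p_hat = home_wins / games

# Normal-approximation z-score against the "no home advantage" null p = 0.5.
z = (p_hat - 0.5) / math.sqrt(0.25 / games)

print(round(p_hat, 3), round(z, 2))  # → 0.56 1.9
```

A z-score near 2 on a sample like this would suggest a real but modest edge, which is one way to decide whether the feature has earned a strong role in the model.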
Now you made me interested in the outcome. So far I was only interested in the female performer in the break 😂
Interesting video. I wonder, do you use Bayesian inference or Bayesian neural networks in your work?
hey I didn't happen to use those here but will look into them for next time! cheers!
@@ritvikmath He wants to know about your actual day job, I think.
Nice prediction but also lucky hahahaha! Nice job
I wonder when we'll see a video on gen AI.
Can you share the code?
nice
Thanks 🙏
Wow I cannot believe that worked so well!