This was dope man. I was expecting to look at the view counter and see it in the thousands, same with the sub count - undiscovered high quality content! Any plans to do any more stuff similar to this? Would love to see more ML stuff around sports data.
My man, I really appreciate that, Tom, it means a lot. As far as posting more, right now I'm just too busy because I just got a job that I'm starting soon, so I've been getting ready for that!
Hey! This is incredible! I am currently working through the code and trying to predict playoff games using this method! Will let you know how it turns out. Kudos on the impressive work
Awesome to hear, good luck 👍🏽
@SidY.-hj3eq Can you share your latest spreadsheets? I am working towards next season.
@@OvixioTams The data is based on the previous season's game logs and can be found at the website mentioned in the video! Best of luck! I’ll be attempting again once we get 3 weeks into the new season.
@@SidY.-hj3eq Understood. I have the data and spreadsheets, but I need to merge the last few columns: week, opponent score, and home team won (true/false). Can I email you? I tried to do the data cleanup with no success.
Thank you for taking the time to create this. I have a question regarding the data being used to train the model. It appears you are using full-season averages as features while training the model on games that happened earlier in that season. Wouldn't that be data leakage? It seems like this concept would be better applied using rolling averages of the games that occurred just prior to each game in the training data.
That makes a lot of sense, the way you are suggesting. The approach I used isn’t perfect; it doesn’t account for a lot of things, namely player injuries and things of the sort, but I think it’s a good starting point for those who just want to try a project like this out and use machine learning / AI to predict things in the sports world.
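A minimal sketch of the rolling-average idea raised above, assuming pandas and a hypothetical per-team game log. The column names `team`, `week`, and `yards_per_play` and the helper `prior_rolling_mean` are made up for illustration and are not taken from the notebook. Each row's feature is built only from that team's earlier games, so the game being predicted does not feed into its own input:

```python
import pandas as pd

# Hypothetical game log: one row per team per game (illustrative values only).
games = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "week": [1, 2, 3, 4, 1, 2, 3, 4],
    "yards_per_play": [5.0, 6.0, 4.0, 7.0, 3.0, 5.0, 4.0, 6.0],
})

# Rolling windows only make sense in chronological order within each team.
games = games.sort_values(["team", "week"]).reset_index(drop=True)


def prior_rolling_mean(series: pd.Series, window: int = 3) -> pd.Series:
    """Average of up to `window` previous games, excluding the current one."""
    # shift(1) pushes every value down one row, so the current game is left out.
    return series.shift(1).rolling(window, min_periods=1).mean()


games["ypp_prior_avg"] = (
    games.groupby("team")["yards_per_play"].transform(prior_rolling_mean)
)

print(games)
```

The first game for each team has no history, so its value is NaN; that row would need to be dropped or filled (for example, with last season's average) before training.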
Hey! I was just looking into how to clean the data up, and it seems that the steps within are very inconsistent, from variable names to outright missing methods. One main missing method is "match_games_with_scores_updated" which I assume matches the games in both csv files such that they don't overlap each other. That's the problem I'm having now, hope you can get back to me!
Hey Samuil, thanks for watching. So yeah, that process was a bit complicated, and that section is the only one I didn’t document fully. That’s why I tried to give you the cleaned data beforehand in the links, so you didn’t have to do it. If I can find my original code, I’ll update the doc to include it for you.
@@christianandtech That would be awesome yeah! Do you know where you got the game schedule from as well?
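The original `match_games_with_scores_updated` is not shown anywhere in this thread, so the following is only a guess at what a step like it might do: a sketch that joins a schedule to per-team scores with pandas and derives a home-team-won flag. The function name `match_games_with_scores_sketch` and every column name here are hypothetical.

```python
import pandas as pd

# Hypothetical schedule: one row per game.
schedule = pd.DataFrame({
    "week": [1, 1],
    "home_team": ["A", "C"],
    "away_team": ["B", "D"],
})

# Hypothetical scores: one row per team per week.
scores = pd.DataFrame({
    "week": [1, 1, 1, 1],
    "team": ["A", "B", "C", "D"],
    "points": [24, 17, 10, 13],
})


def match_games_with_scores_sketch(schedule: pd.DataFrame,
                                   scores: pd.DataFrame) -> pd.DataFrame:
    """Attach home and away scores to each scheduled game."""
    home = scores.rename(columns={"team": "home_team", "points": "home_score"})
    away = scores.rename(columns={"team": "away_team", "points": "away_score"})
    # validate="one_to_one" raises if a team appears twice in the same week,
    # which would otherwise silently duplicate games.
    merged = schedule.merge(home, on=["week", "home_team"],
                            how="left", validate="one_to_one")
    merged = merged.merge(away, on=["week", "away_team"],
                          how="left", validate="one_to_one")
    merged["home_team_won"] = merged["home_score"] > merged["away_score"]
    return merged


matched = match_games_with_scores_sketch(schedule, scores)
print(matched)
```

A left join keeps every scheduled game, so a game with no matching score shows up with missing values instead of disappearing.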
This is fantastic. I've been trying to do what you accomplished, but I've been working in Access and Excel. Your way is MUCH better, and I'm digging into the code to better understand the details of your approach, so I can port my existing work (if/as possible) to Python--which I'm only basically familiar with.
One thing that I've done in my own Access modeling is to weight the factors I'm using. For some work, I use a 10 year history as-is. But for other work I use two years of history plus the current season. In those cases, I assign a weight of 1 to the furthest season back (2022 currently), a weight of 2 for last season (2023) and a weight of 3 for the current season. I'm attempting to give more weight to more recent data.
Does weighting make sense in your code? I'm not familiar enough yet with all the functions, so I don't know if/how weighting would impact results. If so, what should I learn to make it happen?
But thanks for what you've provided. It's awesome!
Hi Richard, thanks for watching man. That is very interesting what you are doing with Access and Excel. In regards to the weighting, I think that is a great idea, but let me explain a bit how the code works currently. The way the Logistic Regression model in the code works is that you give it the training data and it automatically adjusts how heavily it weighs each factor (yards per play, etc.) to fit the known results as closely as it can; the testing data is then used to check how accurate the predictions are. Now back to your approach. I think your weighting could potentially add to the accuracy; however, I am personally unsure where your weights would fit into this code as it is currently. Great idea though, let me know if you end up doing that.
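One place season weights could plug in, sketched under assumptions: scikit-learn's `LogisticRegression.fit` accepts a `sample_weight` argument, which weights rows (games) and is separate from the per-factor coefficients the model learns on its own. The data below is synthetic, and the column names `season`, `ypp_diff`, `turnover_diff`, and `home_team_won` are hypothetical, not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training table (illustration only).
rng = np.random.default_rng(0)
n = 300
train = pd.DataFrame({
    "season": rng.choice([2022, 2023, 2024], size=n),
    "ypp_diff": rng.normal(size=n),
    "turnover_diff": rng.normal(size=n),
})
noise = rng.normal(scale=0.5, size=n)
train["home_team_won"] = (
    train["ypp_diff"] - 0.5 * train["turnover_diff"] + noise
) > 0

# Richard's scheme: oldest season counts once, the current season three times.
season_weights = {2022: 1, 2023: 2, 2024: 3}
weights = train["season"].map(season_weights)

features = ["ypp_diff", "turnover_diff"]
model = LogisticRegression()
# sample_weight makes errors on recent games cost more during fitting.
model.fit(train[features], train["home_team_won"], sample_weight=weights)

print(dict(zip(features, model.coef_[0])))
```

Whether this improves accuracy would have to be checked on held-out games; the sketch only shows where the weights would go.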
@@christianandtech Thank you for taking the time to reply. I have one more question I hope you can answer regarding 2024. I've updated the 2023 play-by-play data through year end of last season. I opted to do it outside of the project (in Excel) as I'm much more familiar with that approach for the time being.
But I now want to start including 2024 data. Rather than tweak the code to include a THIRD season, can I just merge the 2022 and 2023 data into a single file (retaining it as 2022 play-by-play until I get around to changing the file names throughout the script) and begin adding 2024 play-by-play into the 2023 file (also until I get around to modifying the file names in the code)? The 2022 and 2023 files are eventually merged anyway, so I thought that would be acceptable.
Will the code still function as you describe in your reply above? That is, does it consider DATES when determining weighting? On the surface, I believe this will be OK, but being new to Python and machine learning, I don't want to adopt a wrong approach.
Thanks again.
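Whether the notebook looks at dates is not stated in this thread. One way to avoid depending on file names either way is to tag each row with its season before combining, so recency information survives the merge. A minimal sketch, assuming pandas; the small in-memory frames stand in for the real play-by-play files, and the column names are hypothetical:

```python
import pandas as pd

# Stand-ins for the per-season play-by-play files (illustrative values only).
pbp_2022 = pd.DataFrame({"game_id": [1, 2], "yards": [5, 7]})
pbp_2023 = pd.DataFrame({"game_id": [3, 4], "yards": [6, 2]})
pbp_2024 = pd.DataFrame({"game_id": [5, 6], "yards": [9, 4]})

frames = {2022: pbp_2022, 2023: pbp_2023, 2024: pbp_2024}

# assign() adds a season column to a copy of each frame before stacking them.
combined = pd.concat(
    [df.assign(season=season) for season, df in frames.items()],
    ignore_index=True,
)

print(combined)
```

With a `season` column in place, later steps can filter or weight by season no matter which file a row originally came from.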
Hey man, I am trying to do this project but I don't see any attached Google Colab notebook. Please help me out man, I am trying to follow along.
Thanks for watching. I updated the video description; you should see the link now.
Loved your video! Where can I get data for the 2024 season?
Hey Jose, thanks for watching man. Let me get back to you on that. It’s been a while, I gotta find the site again.
@@christianandtech Please send the link to the site. What I need are the players' box scores; if you can get them, I’d highly appreciate it, man.
The link isn't working. I was mainly looking for your data.
It should be working now. Let me know if you still have an issue. colab.research.google.com/drive/1_myoj4ecB1GuRVRBz-vTrQ-SXg7Ld2qm?usp=sharing