Build Your First Machine Learning Project [Full Beginner Walkthrough]
- Published: 22 Jul 2024
- We'll learn how to build an end-to-end machine learning project. We'll cover the main steps in building a machine learning project, then walk you through writing the Python code to create the project.
In the project, we'll try to predict how many medals each country will win in the Olympics using a linear regression model.
At the end, you'll have a full machine learning project that you can continue working on.
You can find the README and code here - github.com/dataquestio/projec... .
Chapters
00:00 Introduction
00:40 7-step project process
10:15 Loading the data
12:10 Data exploration
18:05 Building our model
22:30 Measuring error
26:30 Is the model good?
34:20 Wrap-up and next steps
---------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef
A very comprehensive and well explained intro into the workings of the project. I got a lot out of it. Thank you.
love the simplicity of your step by step method. I am absorbing a lot in just one pass. Thank you and well done.
I was just studying the concepts for so long and getting overwhelmed, this video definitely helped to get the bigger picture.
Great video. Really liked the way you explained it first, instead of diving straight into the code. Thanks
Thank you for providing such a great resource and making ML so digestible! YOU are who introduced me to machine learning, and I love it. I'm looking forward to applying everything I learn to my own projects!!!
First person I've seen who explains this just perfectly and can be understood by a student. Thanks and keep it up, sir.
@ 12:14
teams.corr()["medals"]
didn't work for me so I did
corr = teams.drop(["team", "country"], axis=1).corr()["medals"]
print(corr)
for those who are also running into the same issues as me :)
It worked thanks
or you can use teams.corr(numeric_only=True)["medals"]
@@user-kj6vz1qo4h thank you!
@@user-kj6vz1qo4h Thanks!
Thanks🙏🙇
Perfect. Best video I've found so far, precise and easy to understand. Thank you
You are a very good teacher and deserve more subs.
excellent video sir ji.... thanks a lot for such concepts... and your English fluency is amazing Indian
Thanks a lot , it helps to understand ML with basic steps
Amazing . this was simple and great . very very very well done !!!!
Thanks! This video is exactly what I needed 😀
Loved it! superb explanation 😍
Your courses are awesome!!!
thank you so much!!! you are a really good teacher
A great video!
This answered some of my questions. Thanks
Glad it was helpful!
Incredible Teaching !
I like how you showed to use the later data to test the model, but do you have a video that shows how to use the data to predict the future Olympics?
This video really shows how things are done.
Thank you for the video.
Albania was in the 1992 Olympics, and I guess the other countries in that CSV were too. They just did not win any medals, which is why there are missing values. So setting them to zero instead of dropping them would actually be more accurate. In theory you would prefer first- or second-party data; in this case you would have to do some research to clarify the reason for the missing values in the data set.
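To make the two options in the comment above concrete, here is a minimal sketch on made-up rows. The column name `prev_medals` and all values are assumptions for illustration, not taken from the real teams.csv, and whether zero is the right fill value depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Toy data; values and the "prev_medals" column name are hypothetical.
teams = pd.DataFrame({
    "team": ["AAA", "AAA", "BBB", "BBB"],
    "year": [1992, 1996, 1992, 1996],
    "medals": [0, 1, 5, 7],
    "prev_medals": [np.nan, 0.0, np.nan, 5.0],
})

# Option 1: drop every row that has a missing value.
dropped = teams.dropna()

# Option 2: keep the rows and treat a missing value as zero medals.
filled = teams.fillna({"prev_medals": 0})

print(len(dropped), len(filled))
```

Dropping loses the 1992 rows entirely, while filling keeps them but bakes in the assumption that "missing" means "zero".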
hi... how to sort the excel data into integer values?
Is there a reason why the Plots disappear after running the code a second time on Jupyter notebook? They don't show anything anymore.
Thank you for the amazing video! However, when I tried running this, I received a value error
teams.corr()["medals"]
This seems to be because the "team" and "country" columns contain strings, which makes it impossible to compute a correlation for them. So I removed them just to obtain the corr values. But it seems to work for you without filtering the string columns out. Any ideas why?
Thank you.
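A likely answer to the question above: in pandas 2.0 the default of `numeric_only` in `DataFrame.corr` changed from `True` to `False`, so string columns are no longer skipped silently. That the video was recorded with an older pandas version is an assumption. The sketch below uses made-up rows in place of teams.csv.

```python
import pandas as pd

# Toy stand-in for teams.csv; all values are made up for illustration.
teams = pd.DataFrame({
    "team": ["AAA", "BBB", "CCC", "DDD"],
    "country": ["Aland", "Bland", "Cland", "Dland"],
    "athletes": [10, 50, 120, 300],
    "medals": [0, 3, 10, 40],
})

# On pandas 2.0+, teams.corr() raises ValueError because of the string columns.
# numeric_only=True restricts the calculation to numeric columns.
corr = teams.corr(numeric_only=True)["medals"]
print(corr)

# Alternative: select the numeric columns explicitly before calling corr().
corr_alt = teams.select_dtypes(include="number").corr()["medals"]
```

Both variants give the same result; `numeric_only=True` is just the shorter spelling.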
Guys, why does the shape of the DataFrame change after test['predictions'] = predictions? Instead of 405 x 8 it comes out as 405 x 413. Can anyone help me out with it?
Please help me, guys. I got stuck on the first step: I cannot import the CSV, some kind of problem with pandas.
Would love a math course that is shown directly relating to ML that I can take to get up to speed. for someone that might be self taught in tech w/ only a highschool education
Where can I get the data?
Thank you for such a nice video! I have a question though about the error_ratio. You said countries like FRA, CAN, and RUS get a lot of medals in the Olympics, and it was shown that their error ratio is low.
With what should I compare the value of error_ratio?
what a good question! I hope he responds back to you.
why do you want to compare it?
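One way to read the error ratio, sketched under assumptions: the mean absolute error per team divided by the mean medal count per team. The column names and values below are made up, and this may not match the video's code line for line.

```python
import pandas as pd

# Toy test set with predictions; all values are hypothetical.
test = pd.DataFrame({
    "team": ["AAA", "AAA", "BBB", "BBB"],
    "medals": [100, 120, 2, 4],
    "predictions": [90, 125, 0, 8],
})

# Absolute error of each prediction.
abs_error = (test["medals"] - test["predictions"]).abs()

# Average error and average medal count per team.
error_by_team = abs_error.groupby(test["team"]).mean()
medals_by_team = test["medals"].groupby(test["team"]).mean()

# Relative error: how large the typical miss is compared to the team's medal count.
error_ratio = error_by_team / medals_by_team
print(error_ratio)
```

Read this way, the ratio is a relative error, so it can be compared between teams or against 1.0: a value near 0 means the typical miss is small relative to the team's medal count, and a value of 1.0 means the typical miss is as large as the medal count itself.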
Sir, that was really simple, very well explained and excellently organised. Yet I struggled at one point: I couldn't convert string (teams) to float while performing the correlation. If you see this, I hope you reply.
Did you find the solution?
@@SiddheshRajale df.corr(numeric_only=True)
Great video! What coding software did you end up using for this (I haven't seen this python software before which is why I ask)?
That's Jupyter
My like turned this to 2K 😊
best
I love your English
Your English is so perfect as indian
Hi, I really loved your video. I was trying to follow along, but got an error and can't move forward. I would love it if you could help me fix it. I got an error for predictions = reg.predict(test[predictors]). It kept saying ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- age
- country
- medals
- team
- year
What do I do?
Hello,
Did you get this resolved yet? Having the same issue now.
It would be good practice to use x_test, y_test, x_train, y_train instead of predictors, target,
and it would also be good practice to use x and y for the independent and dependent variables instead of test, and so on.
It gives an error when I run the correlation step, complaining about the data type of team. How can I handle this?
I have this too, were you able to fix it?
@@Mynamegeoph teams[teams.columns[2:]].corr()["medals"] use this
@@lalithsai5392 Thanks lalithsai5392, would have been stuck without you!
I bet this is an issue with doing it locally and not using a Jupyter Notebook because I had this problem as well.
The best way around this is:
teams.corr(numeric_only=True)["medals"]
That will only generate value against numeric fields.
code community!! @@lalithsai5392
Why seaborn and not matplotlib?
you can use whatever you like, it's all about experimenting ;)
Great video. What python interpreter are you using?
Maybe you meant IDE (integrated development environment)? The standard Python interpreter (CPython) ships with Python and compiles and interprets the code. I'm pretty sure the environment he is using in the video is Jupyter Notebook (Project Jupyter), an interactive environment that is pretty much standard in machine learning, data analytics, statistical analysis, etc.
@@eduardtoronto Sorry, of course I meant IDE. What differs from mine (PyCharm) is that the code gets executed immediately and the results are shown. I have to use the print command for that. Or is the video just edited?
@@baeche In Jupyter, ENTER inserts a new line and SHIFT+ENTER executes the code. Everything gets executed immediately. It depends on the functions he's using, e.g. copy() gets executed but it won't print any output, whereas something like 'shape' will output the result to the console like print.
@@eduardtoronto Thank you very much. I moved to Google Colab, where the last command gets printed too. I find Google Colab handy as I can work in the browser. Where I experience problems is accessing a SQL Server (not SQLite or MySQL). Any idea where I can look for help? ChatGPT could not.
25:45
Do it in Telugu, bro
Great video. Thanks for sharing your knowledge and expertise. I ran into an issue in the "corr()" step.
teams.corr()["medals"]
ValueError: could not convert string to float: 'AFG'. Maybe I can remove this column before doing the corr() call.
i had the same issue. what did you do to solve it?
teams.drop(["country", "team"], axis=1).corr()["medals"]
This code works.
Add numeric_only=True to the call: teams.corr(numeric_only=True)["medals"]
@@hongyangtan9897 thank uu
There seems to be a problem when I run 'teams.corr()["medals"]'. It keeps throwing the error "ValueError: could not convert string to float: 'AFG'". I checked the unique values and NaNs. Confused!
I faced the same problem as well, but managed to solve it. The error is due to some columns in teams that are non-numerical, like team and country, so I created a new table, i.e. teams = teams.drop(columns=['team', 'country']), and it should work. Hope this helps.
teams=pd.read_csv("teams.csv")
This line is giving me a huge error.
How do I correct it, or what did I do wrong?
Go to the file you downloaded, copy its path, and use that instead of teams.csv
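The error message isn't shown above, so this is only a sketch assuming the problem is that pandas cannot find the file. The path `teams.csv` is a placeholder; replace it with wherever the download was saved.

```python
from pathlib import Path

import pandas as pd

# Placeholder path; point this at the downloaded file.
csv_path = Path("teams.csv")

if csv_path.exists():
    teams = pd.read_csv(csv_path)
    print(teams.shape)
else:
    # Show where Python is actually looking, which often reveals the mismatch.
    print(f"File not found: {csv_path.resolve()}")
    print(f"Current working directory: {Path.cwd()}")
```

A relative path like `teams.csv` is resolved against the current working directory, so either move the file next to the notebook or pass the full path.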