Thanks for checking out this video.
Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
If you want to watch a full course on Python Pandas check out Datacamp: datacamp.pxf.io/XYD7Qg
Want to solve Python data interview questions: stratascratch.com/?via=ryan
I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
*Both Datacamp and Stratascratch are affiliate links.
You're a good man Ryan. Hope the run went well
Very tough race, but I completed it. Doing a 100-miler next year
Great work, Ryan. Thank you
Thank you for checking it out
I'm loving your videos. I'm building my own portfolio and your videos have been truly useful. tysm!
Thank you good luck with the portfolio. Make sure to join our discord
Hey Ryan, Greetings from India
I shadowed you for 3 days and completed the project in bits but glad I finished the whole video.
Loved the project and the way you taught it.
(Just a suggestion - Please go by the agenda for the project, so that we can have an outline in our minds of the key things that we as data analysts need to extract from the data.)
Not subscribing would be a sin after going through this beautiful and informative video! Keep going!
I appreciate it! Working on a big class video next! Followed by PyTorch
@@RyanAndMattDataScience Would love it. I'm not sure whether you do it or not, as I just came across your video today, but do try posting a community post some time before the videos; I wouldn't want to miss it.
Appreciate the valuable input from you; really impressed by a tutor's ability to convey this material after more than a decade!!
Hey, Ryan, I am getting this error when combining all the filters together. Could you please guide me on how to sort this out?
MemoryError: Unable to allocate 75.9 TiB for an array with shape (7461195, 1398540) and data type float64
I have an 8th-gen CPU (i5-8350U), 24 GB RAM, a 500 GB SSD (Crucial MX500), and am using a Jupyter notebook in an Anaconda env.
@@rahulpal_dsml can you try running it in Kaggle or Google Colab?
@@RyanAndMattDataScience Hi, yes, it did run on Google Colab, thanks a lot
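For anyone hitting the same MemoryError, here is a minimal sketch of combining the filters as aligned boolean masks on the original dataframe, which usually avoids allocating a huge intermediate array. Column names and filter values are assumptions based on the video, not the exact code:
# hypothetical masks; adjust column names and values to your copy of the dataset
mask_usa = df['Event name'].str.endswith('(USA)', na=False)
mask_year = df['Year of event'] == 2020
mask_dist = df['Event distance/length'].isin(['50km', '50mi'])
df2 = df[mask_usa & mask_year & mask_dist].copy()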
Great job!! Always appreciate the awesome content you make!
Thank you! If you want more content join our discord
Hey Ryan,
Just watching this and following along.
I’ve got a question please;
At the 17:30 mark I noticed that the split you did seemed a bit overwhelming. As a novice in data science, I couldn't help but notice something interesting in the data: there were event names labeled inconsistently for the USA, some as "usaaaaA" and others as "usaaa". So I used a simple str.contains() function with case sensitivity turned off to standardize it, resulting in 1.7 million rows. Wanted to hear your thoughts on this approach.
I know it might be labeled a lazy and easy approach, but I found it catches more rows effectively. Please give me your views (I'm still learning)
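A minimal sketch of that contains() idea for anyone comparing row counts. The na=False handling is an assumption, and note that a case-insensitive 'usa' will also match names where those letters appear outside the "(USA)" suffix:
usa_df = df[df['Event name'].str.contains('usa', case=False, na=False)]
print(usa_df.shape)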
So my .shape is 30120, not 26090
Awesome video, finished it in a day. Planning to do 1 project a day following videos till I get comfortable doing things on my own. Very easy to follow, thank you so much 😊
Awesome, make sure to check out some of the other videos
@@RyanAndMattDataScience yes definitely
Great project. Could you develop a full project? Something that includes SQL, Python, and then a visualization tool. That would be amazing
Yes I’ll be working on one in the future. Focus atm is more models like the one I uploaded today
Thanks for the video! really helpful.
Do you think you can do a Data Scientist Portfolio Project(s) series? I'm sure you'll find a lot of people interested in that (including me haha)!
Hey, I have 2 out so far! And I just published another data analyst project last week
@@RyanAndMattDataScience Okay thanks, I'll definitely check them out!
Fantastic video, Ryan. Thanks for sharing your knowledge with us.
No problem
It's simply amazing... I like the way you teach, and the video is informative
Thank you
@@RyanAndMattDataScience I need one question answered: according to the job market, which Python libraries should I know for a data analyst profile?
@@akshatalanjewar3056 start with pandas and scikit-learn
@@RyanAndMattDataScience Well, I know Python libraries like pandas, NumPy, seaborn, and matplotlib... SQL, Power BI... is this sufficient to get a job?
Awesome work Ryan 🔥🔥🔥🔥
I have just watched it and I appreciate the effort that you put in for the video. I will be using this as my portfolio project.
Thanks for checking it out
Hi, had a quick question!
At 17:27, would there be any downside to me using something like df[df['Event name'].str.contains('USA')] instead?
Thanks!
Thanks Ryan! Great Project.
Thanks for checking it out. New project next week
Really useful project for learning, especially since the data sample is included. Thanks a lot 🙏
Glad it was helpful! Currently in the initial stages of next months project
@@RyanAndMattDataScience Great! Looking forward to that👌
Great video, Ryan!
Thanks!
Thank you so much for bringing back my love for this field.
I needed this so much...
Great content! I am subscribing only on the condition that you upload more and more in-depth analysis using Python. Keep it up
3 videos a week! I’m starting my regression series today
Great Vid mahn .. really helpful
My kernel keeps dying whenever I combine all the filters of the df to create df2. Does anyone know how to resolve this issue? Otherwise I can't progress :(
Same, did you figure it out?
Hi Ryan, Instead of df['Event name'].str.split('(').str.get(1).str.split(')').str.get(0) == 'USA' can we use df['Event name'].str[-5:] == '(USA)'?
Yes, this is somehow giving 1408416 rows, while the method Ryan uses in the video gives 1398540 rows.
Hi, can I get the full syntax at minute 49:07? I can't see the continuation. I need it for my current school assignment, and this will help me a lot😊😊😊
df3.query('race_length == "50mi"').groupby('athlete_age')['athlete_average_speed'].agg(['mean', 'count']).sort_values('mean', ascending=False).query('count > 19').head(15)
Was just doing some Python learning to get some more knowledge and I just found your channel. I heard you say you ran your marathon around UCF. I'm a fellow alum from there haha. Go Knights!
Haha charge on and 25 loops around campus. I’ll never run there again lol
I am not able to find this project in your GitHub profile. Please help, Ryan.
This is an easier way to remove USA from the event name: df2["Event name"] = df2["Event name"].str.replace("(USA)", " ")
Yes, this is better, I agree. I also tried using the split method Ryan used, but all my entries in "Event name" get replaced with "None" somehow! I don't understand.
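A minimal sketch of that replace, assuming you want the literal "(USA)" stripped: regex=False keeps pandas from treating the parentheses as a regex group (the default for str.replace has changed across pandas versions), and .str.strip() removes the leftover trailing space:
df2['Event name'] = df2['Event name'].str.replace('(USA)', '', regex=False).str.strip()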
Great video Ryan!
Thank you
Thank you for sharing. But when you use (df["Event name"].str.split("(").str.get(1).str.split(")").str.get(0) == "USA") to select all the USA races, it will ignore events that contain more than one () in their name, such as Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA). It might be better to use df["Event name"].str.contains(r"\(USA\)").
Ah didn’t realize when doing this project. Great catch and thanks for commenting
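A minimal sketch of the suggestion above; na=False is an assumption to handle any missing event names:
usa_mask = df['Event name'].str.contains(r'\(USA\)', regex=True, na=False)
df_usa = df[usa_mask]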
Thanks for the vid -- very good. Your camera view cuts off the far right side of the visual, though.
Thanks and sorry :/ problem with face cams
@@RyanAndMattDataScience I subscribed man - thanks a million for the vids - very good
@@michaelg9359 no problem
In the event_len column there are many row items with km, mi, h... How can we check all of these to get the correct count, and how do we extract the numbers only? Should we be using regex for that?
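A minimal sketch using regex with str.extract, assuming a column named event_len with values like "50km", "100mi", or "24h" (the column name comes from the comment, not the video):
df['event_num'] = df['event_len'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)
df['event_unit'] = df['event_len'].str.extract(r'([A-Za-z]+)\s*$', expand=False)
print(df['event_unit'].value_counts())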
New subscriber here! Thank you for your good work. Just a quick question. To extract events held in the USA, since we know we are looking for the 3 letters between the 5th-last and last characters as USA, couldn't we use this condition: (df['Event name'].str[-4:-1]=='USA')? I used it, but my dataframe returns 26524 rows, which I thought might be due to a difference in the version of the dataset.
I also tried (df['Event name'].str.endswith("(USA)")) and got the same number of rows.
Hey Ryan, great video. Just wanted to ask how much RAM you suggest for these projects to be able to run quickly?
Great work. Thank you, I was following along hands-on. #Subscribed_and_Liked
No problem
Brilliant Project !
Thanks
Thanks for the video, I enjoyed it. But I cannot download the data!
You are amazing, keep it up bro
Thanks, more vids next week. I’m a bit under the weather atm
subscribed! great video. Thank you for this
Thank you!
Hello
First of all, thank you!!
& how about:
df = pd.DataFrame(data)
usa_events = df[df['Event name'].str.contains('USA')]
print(usa_events)
Thank you very much Ryan
Np
Hello Ryan,
Thanks for sharing your skills.
I would like to understand why you have dropped Athlete Club and Country?
I think it would be better if you had dropped the rows that have an empty value; then you could modify the type of the column.
There's a lot of ways you could look at a dataset. I did this a long time ago so I can't remember exactly why I did it, but for what I was working on I don't believe it mattered
Is this lesson possible in VS Code, or would I need to learn to use Jupyter Notebook?
Can use either
What is the reason behind cleaning up NaN values?
Often with modeling, null values can't be present
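A minimal sketch of inspecting and dropping nulls (the subset column is a hypothetical example):
print(df.isna().sum())                              # missing values per column
df = df.dropna(subset=['Athlete year of birth'])    # hypothetical column; drops rows where it is missing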
thank you👍
No problem
Great video!
Thank you
Between Matplotlib and Seaborn, which one should be used, or should both be used?
I use both in projects
What is the reason behind dropping columns?
Not every column is needed for data analysis
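A minimal sketch, using the two columns mentioned elsewhere in the comments (exact names may differ in your copy of the dataset):
df = df.drop(columns=['Athlete club', 'Athlete country'])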
Is it okay if I add this project to my portfolio after completing it? Great video!
Go for it
Liked, subscribed and commented for stats. Keep going :)
Thanks! Just got my first model video out today
NameError: name 'df' is not defined
Ask in our discord server
@@RyanAndMattDataScience thank you. Please link it. I can't see it on your profile
@@Naadiaajmal pinned comment
@@BreakoutCards thanksss
I can't find the year 2020 in the events... Any solution?
Same problem here, I did it with .contains() and I don't know why I can't return the 2020 rows
Where do I find this project source code?
I plan on putting all the code from videos on my website, but I need to scale up a bit; I don't have the resources atm
Great project!
Thank you!
Great
Thank you!
30:00
Hello, Ryan... can I ask something? Are there any other tools (software/application/website) that can be used for working with Python? I'm so sorry for the question, please don't laugh at me, hehehe... I'm a very new beginner learning to be a data analyst... I have a dream to become a business analyst... do you have any suggestions for me please?
I also use Google Colab and Visual Studio
it has only 7M rows
you really need to expand the screen a little bit
It is in new videos
Subscribed!
youre the best
I want to drop many recommendations on this video.
But for now, here's one:
#changing columns names
dfn.columns = dfn.columns.str.replace(" ","_").str.lower()
dfn.head(3)
Thank me through likes, please 👁👁👁
comment
Thanks Ryan
No problem