Data Analyst Portfolio Project (Exploratory Data Analysis With Python Pandas)

Ryan & Matt Data Science

68K views

Published: Jan 26, 2025
Science

Comments • 130

@RyanAndMattDataScience 5 months ago
Thanks for checking out this video.
Join our Data Science Discord here: discord.com/invite/F7dxbvHUhg
If you want to watch a full course on Python Pandas, check out Datacamp: datacamp.pxf.io/XYD7Qg
Want to solve Python data interview questions? stratascratch.com/?via=ryan
I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
*Both Datacamp and Stratascratch are affiliate links.
@nagarajsanil4241 11 days ago ⁺¹
This video is amazing. I started learning to use Pandas last week and this is exactly what someone like me needs.
It's unfortunate that you don't have the number of subscribers you deserve, and I can see your frustration when you ask us to subscribe. You've gained a new subscriber in me.
@RyanAndMattDataScience 10 days ago ⁺¹
Glad it was helpful!
@WarbossPepe 1 year ago ⁺¹⁴
You're a good man Ryan. Hope the run went well
@RyanAndMattDataScience 1 year ago ⁺¹
Very tough race, but I completed it. Doing a 100m next year
@idreeskhan5129 10 months ago ⁺³
Great work, Ryan. Thank you
@RyanAndMattDataScience 10 months ago
Thank you for checking it out
@vanializandre3493 3 months ago ⁺²
I'm loving your videos. I'm building my own portfolio and your videos are truly useful. tysm!
@RyanAndMattDataScience 3 months ago ⁺¹
Thank you, and good luck with the portfolio. Make sure to join our Discord
@thekendev 9 months ago ⁺²
Hey Ryan,
Just watching this and following along.
I've got a question, please:
At the 17:30 mark I noticed that the split you did seemed a bit overwhelming. As a novice in data science, I couldn't help but notice something interesting in the data. There were event names labeled inconsistently for the USA, some as "usaaaaA" and others as "usaaa". So I used a simple str.contains() call with case sensitivity turned off to standardize it, resulting in 1.7 million rows. Wanted to hear your thoughts on this approach.
I know this might be labeled a lazy and easy approach, but I found it catches more rows. Please give me your views (I'm still learning).
@thekendev 9 months ago
So my .shape shows 30120 rows, not 26090
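A minimal sketch of the case-insensitive match described above, on made-up event names (the data and variable names are assumptions, not the video's dataset). It suggests one reason a loose `str.contains` can return more rows than a check on the country suffix: it matches the letters anywhere in the name.

```python
import pandas as pd

# Hypothetical event names; not taken from the video's dataset.
events = pd.DataFrame({
    "Event name": [
        "Big Elk 50k (USA)",
        "Lake Trail 50mi (usa)",
        "Alpine Ultra 50k (FRA)",
        "Krusa Night Run 50k (DEN)",  # contains "usa" inside another word
    ]
})

# case=False matches "USA", "usa", "Usa", ... anywhere in the string.
loose = events["Event name"].str.contains("usa", case=False)

# Stricter: normalise the case, then require the name to end in "(USA)".
strict = events["Event name"].str.upper().str.endswith("(USA)")

# .shape is an attribute holding (rows, columns), not a method.
print(events[loose].shape)
print(events[strict].shape)
```

Whether the extra rows are wanted depends on the data, so it can help to inspect the rows that only the loose match returns, for example `events[loose & ~strict]`.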
@arun_jakhmola 8 months ago ⁺¹
Hey Ryan, Greetings from India
I shadowed you for 3 days and completed the project in bits, but I'm glad I finished the whole video.
Loved the project and the way you taught it.
(Just a suggestion: please go by the agenda for the project, so that we can have an outline in our minds of the key things that we as data analysts need to extract from the data.)
@minast1622 3 months ago
Great job!! I always appreciate the awesome content you make!
@RyanAndMattDataScience 3 months ago
Thank you! If you want more content, join our Discord
@rahulpal_dsml 1 year ago ⁺³
Not subscribing to you would be a sin after going through this beautiful and informative video! Keep going!
@RyanAndMattDataScience 1 year ago ⁺¹
I appreciate it! Working on a big class video next! Followed by PyTorch
@rahulpal_dsml 1 year ago
@RyanAndMattDataScience Would love it. I'm not sure whether you already do this, as I just came across your video today, but do try posting (a community post) some time before the videos; I would not want to miss them.
I appreciate your valuable input. Really impressed by a tutor's ability to convey things after more than a decade!!
@rahulpal_dsml 1 year ago
Hey Ryan, I am getting this error when combining all the filters together. Could you please guide me on how to sort this out?
MemoryError: Unable to allocate 75.9 TiB for an array with shape (7461195, 1398540) and data type float64
I have an 8th-gen CPU (i5-8350U), 24 GB RAM, a 500 GB SSD (Crucial MX500), and am using Jupyter Notebook in an Anaconda env.
@RyanAndMattDataScience 1 year ago
@rahulpal_dsml can you try running it in Kaggle or Google Colab?
@rahulpal_dsml 1 year ago
@RyanAndMattDataScience Hi, yes, it did run on Google Colab, thanks a lot
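The thread does not show the code that raised the error, so its cause is unknown. The reported shape is two-dimensional (7461195 by 1398540), which hints that some operation produced a 2-D array rather than a 1-D mask. As a sketch of one thing worth checking, the example below builds every condition as a boolean Series on the same DataFrame, combines them with `&`, and indexes once. The column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the race data; column names and values are assumptions.
df = pd.DataFrame({
    "Year of event": [2020, 2020, 2019, 2020],
    "Event distance/length": ["50km", "50mi", "50km", "100km"],
    "Event name": ["A (USA)", "B (USA)", "C (USA)", "D (FRA)"],
})

# Build each condition as a boolean Series aligned to df's index.
is_2020 = df["Year of event"] == 2020
is_50 = df["Event distance/length"].isin(["50km", "50mi"])
is_usa = df["Event name"].str.endswith("(USA)")

# Combine with & (not `and`) and index df once, so the mask stays 1-D
# with exactly one True/False per row.
mask = is_2020 & is_50 & is_usa
df2 = df[mask]

print(mask.shape, df2.shape)
```

Checking `mask.shape` before indexing is a cheap way to confirm the mask has one entry per row of `df`.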
@MiguelGracia-g2d 6 months ago ⁺²
Hi, had a quick question!
At 17:27, would there be any downside to me using something like df[df['Event name'].str.contains('USA')] instead?
Thanks!
@emastehr 1 year ago ⁺⁵
Great project. Could you develop a full project? Something that includes SQL, Python, and then a visualization tool. That would be amazing
@RyanAndMattDataScience 1 year ago
Yes, I'll be working on one in the future. My focus at the moment is more on models like the one I uploaded today
@234bellamkonda 5 months ago
Awesome video, finished it in a day. Planning to do 1 project a day following videos till I get comfortable doing things on my own. Very easy to follow, thank you so much 😊
@RyanAndMattDataScience 5 months ago ⁺¹
Awesome, make sure to check out some of the other videos
@234bellamkonda 5 months ago
@RyanAndMattDataScience yes, definitely
@jkzhakom 9 months ago
Fantastic video, Ryan. Thanks for sharing your knowledge with us.
@RyanAndMattDataScience 9 months ago
No problem
@nlnl72 10 months ago ⁺¹
Thanks for the video! Really helpful.
Do you think you can do a Data Scientist Portfolio Project(s) series? I'm sure you'll find a lot of people interested in that (including me haha)!
@RyanAndMattDataScience 10 months ago ⁺¹
Hey, I have 2 out so far, and I just published another data analyst project last week
@nlnl72 10 months ago
@RyanAndMattDataScience Okay, thanks, I'll definitely check them out!
@shayanakhavan6002 9 months ago
Great video, Ryan!
@RyanAndMattDataScience 9 months ago
Thanks!
@7a30adnanbin5 9 months ago ⁺¹
Great vid, man... really helpful
@takashiiexe 11 months ago
Thanks Ryan! Great Project.
@RyanAndMattDataScience 11 months ago
Thanks for checking it out. New project next week
@shailendra_kunwar 9 months ago
Awesome work Ryan 🔥🔥🔥🔥
I have just watched it and I appreciate the effort that you put in for the video. I will be using this as my portfolio project.
@RyanAndMattDataScience 9 months ago
Thanks for checking it out
@ayantikaC03 1 year ago
Great video Ryan!
@RyanAndMattDataScience 1 year ago
Thank you
@OrochiFlamez 20 days ago
I can't find the syntax at 33:24 on your GitHub. Can you tell me the whole thing? I can't keep following along until I know.
@RoleJohn 1 year ago
Great, great content! I am subscribing only on the condition that you upload more and more in-depth analysis using Python. Keep it up
@RyanAndMattDataScience 1 year ago ⁺²
3 videos a week! I'm starting my regression series today
@akshatalanjewar3056 10 months ago
It's simply amazing... I like the way you teach, and the video is informative
@RyanAndMattDataScience 10 months ago
Thank you
@akshatalanjewar3056 10 months ago
@RyanAndMattDataScience I need an answer to one question: according to the job market, which Python libraries should I know for a data analyst profile?
@RyanAndMattDataScience 10 months ago
@akshatalanjewar3056 start with pandas and scikit-learn
@akshatalanjewar3056 10 months ago
@RyanAndMattDataScience well, I know Python libraries like pandas, NumPy, seaborn, and Matplotlib, plus SQL and Power BI. Is this sufficient to get a job?
@AmbarGharat 10 months ago ⁺⁴
Hi Ryan, instead of df['Event name'].str.split('(').str.get(1).str.split(')').str.get(0) == 'USA' can we use df['Event name'].str[-5:] == '(USA)'?
@shailendra_kunwar 9 months ago
Yes, this is somehow giving 1408416 rows, while the method that Ryan uses in the video gives 1398540 rows.
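One way to investigate a row-count gap like this is to compute both masks and look at the rows where they disagree. The sketch below uses three made-up rows (the second name is the example quoted elsewhere in this thread); on real data the disagreeing rows may differ for other reasons too.

```python
import pandas as pd

names = pd.Series([
    "Big Elk 50k (USA)",
    "Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA)",
    "Alpine Ultra 50k (FRA)",
], name="Event name")

# Method from the video: text between the first "(" and the next ")".
by_split = names.str.split("(").str.get(1).str.split(")").str.get(0) == "USA"

# Method from the comment: compare the last five characters.
by_slice = names.str[-5:] == "(USA)"

# Rows where the two methods give different answers.
disagree = names[by_split != by_slice]

print(by_split.sum(), by_slice.sum())
print(disagree.tolist())
```

Here the split-based mask reads "PUTS" from the first pair of parentheses and so misses the second row, while the slice-based mask keeps it.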
@MilyaYamil 4 days ago
So I've got a question. How would you present this kind of exploratory analysis in a portfolio? Would you just provide the code/notebook? I think it's too many lines of code to know what's going on if you don't explain things extensively (maybe in the same notebook or in a separate document). Thanks
@Nighthunterm 9 months ago
Was just doing some Python learning to get some more knowledge and I just found your channel. I heard you say you ran your marathon around UCF. I'm a fellow alum from there as well haha. Go Knights!
@RyanAndMattDataScience 9 months ago
Haha, charge on, and 25 loops around campus. I'll never run there again lol
@alexrosen8762 1 year ago
Really useful project for learning, especially since the data sample is included. Thanks a lot 🙏
@RyanAndMattDataScience 1 year ago ⁺¹
Glad it was helpful! Currently in the initial stages of next month's project
@alexrosen8762 1 year ago
@RyanAndMattDataScience Great! Looking forward to that👌
@binarify4364 10 months ago
Brilliant project!
@RyanAndMattDataScience 10 months ago
Thanks
@maxnicolasnavarro4017 5 months ago
Thank you so much for bringing back my love for this field.
I needed this so much...
@tarekhusam 1 year ago
You are amazing, keep it up bro
@RyanAndMattDataScience 1 year ago ⁺¹
Thanks, more vids next week. I'm a bit under the weather atm
@geoffreycg5650 1 year ago
Great video!
@RyanAndMattDataScience 1 year ago
Thank you
@dj-mt1pz 10 months ago ⁺³
My kernel keeps dying whenever I combine all the filters of the df to create df2. Does anyone know how to resolve this issue? Otherwise I can't progress :(
@linda_erose 6 months ago ⁺¹
Same, did you figure it out?
@athayaazaria1825 10 months ago ⁺¹
Hi, can I get the full syntax at minute 49:07? I can't see the continuation. I need it for my current school assignment, and this will help me a lot😊😊😊
@MessiBetterNoCap 2 months ago
df3.query('race_length == "50mi"').groupby('athlete_age')['athlete_average_speed'].agg(['mean', 'count']).sort_values('mean', ascending=False).query('count > 19').head(15)
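To see what each step of that chain does, here is the same pipeline on a tiny made-up table. The column names follow the comment; the values are assumptions, and the count threshold is lowered from 19 to 2 so the toy data still returns a row.

```python
import pandas as pd

# Made-up rows; only the column names come from the comment above.
df3 = pd.DataFrame({
    "race_length": ["50mi", "50mi", "50mi", "50mi", "50km", "50mi"],
    "athlete_age": [30, 30, 41, 41, 30, 41],
    "athlete_average_speed": [8.0, 10.0, 7.0, 7.5, 11.0, 8.0],
})

summary = (
    df3.query('race_length == "50mi"')           # keep only 50-mile races
       .groupby("athlete_age")["athlete_average_speed"]
       .agg(["mean", "count"])                   # mean speed and sample size per age
       .sort_values("mean", ascending=False)     # fastest ages first
       .query("count > 2")                       # the comment uses count > 19
       .head(15)
)
print(summary)
```

Filtering on `count` after aggregating drops ages with too few results, so a single fast runner does not dominate the ranking.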
@kokowin5851 8 months ago ⁺⁸
This is an easier way to remove USA from the event name: df2["Event name"] = df2["Event name"].str.replace("(USA)", " ")
@sandydalhousie 5 months ago
Yes, this is better, I agree. I also tried using the split method that Ryan used, but all my entries in "Event name" get replaced with "None" somehow! I don't understand.
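One caveat with the replace approach: before pandas 2.0, `str.replace` treated the pattern as a regular expression by default, so `"(USA)"` was read as a capture group and the parentheses stayed in the name. A sketch that passes `regex=False` explicitly, on made-up names, behaves the same on old and new versions:

```python
import pandas as pd

# Made-up event names for illustration.
df2 = pd.DataFrame({"Event name": ["Big Elk 50k (USA)", "Lake Trail 50mi (USA)"]})

# regex=False treats "(USA)" as plain text rather than a regex group.
cleaned = (
    df2["Event name"]
    .str.replace("(USA)", "", regex=False)
    .str.strip()   # drop the trailing space left in front of the removed suffix
)
print(cleaned.tolist())
```

Replacing with an empty string and stripping avoids the trailing blank that replacing with `" "` would leave behind.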
@stallonengobua8820 7 months ago
Thank you very much Ryan
@RyanAndMattDataScience 7 months ago
Np
@shrushtilonare1034 3 months ago
I am not able to find this project on your GitHub profile. Please help, Ryan
@everywoman2774 1 year ago
Subscribed! Great video. Thank you for this
@RyanAndMattDataScience 1 year ago
Thank you!
@RRangel7b 6 months ago ⁺¹
Hello
First of all, thank you!!
And how about:
df = pd.DataFrame(data)
usa_events = df[df['Event name'].str.contains('USA')]
print(usa_events)
@aminabahloul2175 3 months ago
Thanks for the video, I enjoyed it. But I cannot download the data!
@Al-Ahdal 8 months ago
In the event_len column there are many rows with km, mi, h, ... How can we check all of these to get the correct count, and how can we extract the numbers only? Should we be using regex for that?
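A regex is one reasonable option here. The sketch below assumes values shaped like "50km", "50mi", or "24h" (a number followed by a unit); the sample values and the `value`/`unit` column names are made up for illustration.

```python
import pandas as pd

# Made-up values in the style the comment describes.
lengths = pd.Series(["50km", "50mi", "24h", "100km", "6h"], name="event_len")

# Named groups become columns: a number (optionally with decimals), then letters.
parts = lengths.str.extract(r"^(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)$")
parts["value"] = pd.to_numeric(parts["value"])

# Count how many rows use each unit, then inspect the split columns.
print(parts["unit"].value_counts())
print(parts)
```

Rows that do not fit the pattern come back as NaN in both columns, which makes unexpected formats easy to find with `parts[parts["unit"].isna()]`.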
@lujingyan6853 10 months ago ⁺¹
Thank you for sharing. But when you use (df["Event name"].str.split("(").str.get(1).str.split(")").str.get(0) == "USA") to select all the USA races, it ignores the events that contain more than one pair of parentheses in their name, such as Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA). A better way might be to use df["Event name"].str.contains(r"\(USA\)").
@RyanAndMattDataScience 10 months ago
Ah, I didn't realize that when doing this project. Great catch, and thanks for commenting
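Building on that catch, a sketch of one alternative: pull the country code out of the final pair of parentheses into its own Series, then filter on it. The names below are made up apart from the example quoted in the comment, and the regex is an assumption about how the names are formatted.

```python
import pandas as pd

names = pd.Series([
    "Big Elk 50k (USA)",
    "Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA)",
    "Alpine Ultra 50k (FRA)",
    "No Country Listed 50k",
], name="Event name")

# Capture the text inside the last "(...)" at the end of the name.
# expand=False returns a Series; names without a match become NaN.
country = names.str.extract(r"\(([^()]*)\)\s*$", expand=False)

usa_mask = country == "USA"
print(country.tolist())
print(usa_mask.sum())
```

Keeping the extracted code as a column also makes it easy to count events per country with `country.value_counts()`.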
@charlieadleydog 8 months ago
Hey Ryan, great video. Just wanted to ask how much RAM you suggest for these projects to be able to run quickly?
@navid7467 4 months ago
New subscriber here! Thank you for your good work. Just a quick question. To extract events held in the USA, since we know the 3 letters between the 5th-last and last characters should be USA, couldn't we use this condition: (df['Event name'].str[-4:-1]=='USA')? I used it, but my dataframe returns 26524 rows, which I thought might be due to a difference in the version of the dataset.
I also tried (df['Event name'].str.endswith("(USA)")) and got the same number of rows.
@michaelshepherdmunemo4414 8 months ago
Great work. Thank you, I was following along hands-on. #Subscribed_and_Liked
@RyanAndMattDataScience 7 months ago
No problem
@michaelg9359 1 year ago
Thanks for the vid, very good. Your camera view cuts off the far right side of the visual, though
@RyanAndMattDataScience 11 months ago
Thanks and sorry :/ problem with face cams
@michaelg9359 11 months ago
@RyanAndMattDataScience I subscribed, man. Thanks a million for the vids, very good
@RyanAndMattDataScience 11 months ago
@michaelg9359 no problem
@aadilazeem601 2 months ago ⁺³
It has only 7M rows
@chalamohamed2013 1 year ago
Hello Ryan,
Thanks for sharing your skills.
I would like to understand why you dropped Athlete Club and Country?
I think it would be better to drop the rows that have an empty value; then you can modify the type of the column.
@RyanAndMattDataScience 1 year ago
There are a lot of ways you could look at a dataset. I did this a long time ago, so I can't remember exactly why I did it, but for what I was working on I don't believe it mattered
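The two options discussed here can be compared side by side. This is a sketch on made-up rows; the column names are assumptions modelled on the ones mentioned in the comment, and which option is better depends on what the analysis needs.

```python
import pandas as pd

# Made-up rows; column names are assumptions for illustration.
df = pd.DataFrame({
    "Athlete club": ["Road Runners", None, "Trail Club"],
    "Athlete year of birth": [1985.0, None, 1990.0],
})

# Option A (the comment's suggestion): drop rows missing the value, then cast.
rows_kept = df.dropna(subset=["Athlete year of birth"]).copy()
rows_kept["Athlete year of birth"] = rows_kept["Athlete year of birth"].astype(int)

# Option B: drop a column that the analysis does not use.
cols_dropped = df.drop(columns=["Athlete club"])

print(rows_kept.shape, cols_dropped.shape)
```

Option A loses rows but keeps every column; option B keeps every row but loses the column, so the trade-off is between sample size and available fields.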
@dominiktokarski8054 1 year ago
Liked, subscribed and commented for stats. Keep going :)
@RyanAndMattDataScience 1 year ago
Thanks! Just got my first model video out today
@onurdatascience 1 year ago
Great project!
@RyanAndMattDataScience 1 year ago
Thank you!
@Naadiaajmal 3 months ago
NameError: name 'df' is not defined
@RyanAndMattDataScience 3 months ago
Ask in our Discord server
@Naadiaajmal 3 months ago
@RyanAndMattDataScience thank you. Please link it. I can't see it on your profile
@BreakoutCards 3 months ago
@Naadiaajmal pinned comment
@Naadiaajmal 3 months ago
@BreakoutCards thanksss
@jonathangarcia8124 8 months ago
Is this lesson possible in VS Code, or would I need to learn to use Jupyter Notebook?
@RyanAndMattDataScience 8 months ago
You can use either
@rishidixit7939 8 months ago
Between Matplotlib and Seaborn, which one should be used, or should both be used?
@RyanAndMattDataScience 8 months ago
I use both in projects
@mohamedzrirak5884 4 months ago
thank you👍
@RyanAndMattDataScience 4 months ago
No problem
@katehudson7405 11 months ago
Is it okay if I add this project to my portfolio after completing it? Great video!
@RyanAndMattDataScience 11 months ago
Go for it
@mikefranko2832 1 year ago
What is the reason behind cleaning up NaN values?
@RyanAndMattDataScience 1 year ago ⁺¹
Often with modeling, null values can't be present
@tosinwilliams9343 1 year ago
Thanks Ryan
@RyanAndMattDataScience 1 year ago ⁺¹
No problem
@dennisbunarta1190 9 months ago
I can't find the 2020 year of event. Any solution?
@J4vierC 6 months ago
Same problem here. I did it with .contains() and I don't know why I can't return the 2020 rows
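The thread does not say what went wrong, so the following is only a guess at one common cause: the year column is numeric, while the filter treats it as text. The sketch uses a made-up "Year of event" column (an assumed name) and shows how to check the dtype and filter either way.

```python
import pandas as pd

# Made-up data; the column name is an assumption.
df = pd.DataFrame({"Year of event": [2018, 2020, 2020, 2019]})

print(df["Year of event"].dtype)             # numeric or text?
print(sorted(df["Year of event"].unique()))  # is 2020 present at all?

# Numeric column: compare with a number. Comparing with the string "2020"
# matches nothing on an integer column.
as_number = df[df["Year of event"] == 2020]
as_string = df[df["Year of event"] == "2020"]

# .str methods need text, so cast first if you want to use .str.contains().
via_contains = df[df["Year of event"].astype(str).str.contains("2020")]

print(len(as_number), len(as_string), len(via_contains))
```

If `unique()` shows no 2020 at all, the downloaded file may simply differ from the one used in the video.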
@mikefranko2832 1 year ago
What is the reason behind dropping columns?
@RyanAndMattDataScience 1 year ago
Not every column is needed for data analysis
@aliomar9594 1 year ago ⁺¹
Great
@RyanAndMattDataScience 1 year ago ⁺¹
Thank you!
@SriramKoyalkar 8 months ago
Where do I find this project source code?
@RyanAndMattDataScience 8 months ago
I plan on putting all the code from videos on my website, but I need to scale up a bit; I don't have the resources at the moment
@ammarsyrie1355 3 months ago
You really need to expand the screen a little bit
@RyanAndMattDataScience 3 months ago
It is in the new videos
@iniuntukutube 1 year ago
Hello, Ryan... can I ask something? Are there any other tools (software/application/website) that can be used for Python? I'm so sorry for the question, please don't laugh at me, hehehehe... I'm a complete beginner learning to be a data analyst... I have a dream of becoming a business analyst... do you have any suggestions for me, please?
@RyanAndMattDataScience 1 year ago
I also use Google Colab and Visual Studio
@JC_333 1 year ago
Subscribed!
@RyanAndMattDataScience 1 year ago
You're the best
@zanngoc 3 months ago
30:00
@utkarshpandey2339 2 months ago
I want to drop many recommendations on this video.
But for now, here's one:
# changing column names
dfn.columns = dfn.columns.str.replace(" ","_").str.lower()
dfn.head(3)
Thank me through likes, please 👁👁👁
@GreyHatGenX 6 months ago
comment
