Data Analyst Portfolio Project (Exploratory Data Analysis With Python Pandas)

  • Published: Dec 25, 2024

Comments • 126

  • @RyanAndMattDataScience
    @RyanAndMattDataScience  4 months ago

    Thanks for checking out this video.
    Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
    If you want to watch a full course on Python Pandas check out Datacamp: datacamp.pxf.io/XYD7Qg
    Want to solve Python data interview questions: stratascratch.com/?via=ryan
    I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
    *Both Datacamp and Stratascratch are affiliate links.

  • @WarbossPepe
    @WarbossPepe 1 year ago +13

    You're a good man Ryan. Hope the run went well

  • @idreeskhan5129
    @idreeskhan5129 9 months ago +3

    Great work, Ryan. Thank you.

  • @vanializandre3493
    @vanializandre3493 2 months ago +2

    I'm loving your videos. I'm building my own portfolio and your videos are truly useful. tysm!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  2 months ago +1

      Thank you, and good luck with the portfolio. Make sure to join our Discord.

  • @arun_jakhmola
    @arun_jakhmola 7 months ago +1

    Hey Ryan, Greetings from India
    I shadowed you for 3 days and completed the project in bits but glad I finished the whole video.
    Loved the project and the way you taught it.
    (Just a suggestion - Please go by the agenda for the project, so that we can have an outline in our minds of the key things that we as data analysts need to extract from the data.)

  • @rahulpal_dsml
    @rahulpal_dsml 1 year ago +3

    Not subscribing to you would be a sin after going through this beautiful and informative video! Keep going!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago +1

      I appreciate it! Working on a big class video next! Followed by PyTorch

    • @rahulpal_dsml
      @rahulpal_dsml 1 year ago

      @RyanAndMattDataScience Would love it. I am not sure whether you already do this, as I just came across your video today, but do try posting a community post some time before the videos; I would not want to miss them.
      I appreciate your valuable input. Really impressed by a tutor's ability to convey things after more than a decade!!

    • @rahulpal_dsml
      @rahulpal_dsml 1 year ago

      Hey Ryan, I am getting this error when combining all the filters together. Could you please guide me on how to sort this out?
      MemoryError: Unable to allocate 75.9 TiB for an array with shape (7461195, 1398540) and data type float64
      I have an 8th-gen CPU (i5-8350U), 24 GB RAM, a 500 GB SSD (Crucial MX500), and am using Jupyter Notebook in an Anaconda env.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago

      @rahulpal_dsml can you try running it in Kaggle or Google Colab?

    • @rahulpal_dsml
      @rahulpal_dsml 1 year ago

      @RyanAndMattDataScience Hi, yes, it did run on Google Colab, thanks a lot
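
The size quoted in the MemoryError above can be checked by hand: the requested array holds one float64 (8 bytes) for every cell of a 7,461,195 × 1,398,540 grid. The sketch below uses only the numbers from the error message. The comment does not show the code that triggered the error, so the cause is not known; an array with one row per original record and one column per filtered record hints that two differently sized objects were combined somewhere, but that is an assumption.

```python
# Numbers copied from the MemoryError quoted in the comment.
rows, cols = 7_461_195, 1_398_540
bytes_per_float64 = 8

total_bytes = rows * cols * bytes_per_float64
tib = total_bytes / 2**40  # 1 TiB = 2**40 bytes

print(f"{tib:.1f} TiB")  # matches the 75.9 TiB in the error message
```

No laptop has that much memory, so the fix has to be in how the filters are combined rather than in the hardware.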

  • @minast1622
    @minast1622 2 months ago

    Great job!! I always appreciate the awesome content you make!

  • @thekendev
    @thekendev 8 months ago +1

    Hey Ryan,
    Just watching this and following along.
    I’ve got a question, please.
    At the 17:30 mark I noticed that the split you did seemed a bit overwhelming. As a novice in data science, I couldn't help but notice something interesting in the data. There were event names labeled inconsistently for the USA, some as "usaaaaA" and others as "usaaa". So I used a simple str.contains() call with case sensitivity turned off to standardize it, resulting in 1.7 million rows. Wanted to hear your thoughts on this approach.
    I know it might be labeled a lazy and easy approach, but I found it catches more rows effectively. Please give me your views (I’m still learning).

    • @thekendev
      @thekendev 8 months ago

      So my .shape shows 30120 rows, not 26090
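
One possible reason a case-insensitive substring match returns more rows than expected is that the letters "usa" also occur inside unrelated words. The sketch below illustrates this with made-up event names (they are not taken from the dataset), so it shows the mechanism only and does not claim to explain the exact row counts above.

```python
import pandas as pd

# Hypothetical event names, invented for illustration.
events = pd.Series([
    "Badwater 135 (USA)",
    "Lausanne Night Run (SUI)",
    "Usain Memorial 50k (JAM)",
])

# Case-insensitive substring match: "usa" also occurs inside other words.
loose = events.str.contains("usa", case=False)

# Stricter check: the name must end with the literal text "(USA)".
strict = events.str.endswith("(USA)")

print(events[loose].tolist())   # all three names match
print(events[strict].tolist())  # only the first name matches
```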

  • @234bellamkonda
    @234bellamkonda 4 months ago

    Awesome video, finished it in a day. Planning to do 1 project a day following videos till I get comfortable doing things on my own. Very easy to follow, thank you so much 😊

  • @emastehr
    @emastehr 1 year ago +5

    Great project. Could you develop a full project? Something that includes SQL, Python, and then a visualization tool. That would be amazing.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago

      Yes I’ll be working on one in the future. Focus atm is more models like the one I uploaded today

  • @nlnl72
    @nlnl72 9 months ago +1

    Thanks for the video! really helpful.
    Do you think you can do a Data Scientist Portfolio Project(s) series? I'm sure you'll find a lot of people interested in that (including me haha)!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  9 months ago +1

      hey I have 2 out so far! and I just published another data analyst project last week

    • @nlnl72
      @nlnl72 9 months ago

      @RyanAndMattDataScience Okay, thanks, I'll definitely check them out!

  • @jkzhakom
    @jkzhakom 8 months ago

    Fantastic video, Ryan. Thanks for sharing your knowledge with us.

  • @akshatalanjewar3056
    @akshatalanjewar3056 9 months ago

    It's simply amazing... I like the way you teach, and the video is informative.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  9 months ago

      Thank you

    • @akshatalanjewar3056
      @akshatalanjewar3056 9 months ago

      @RyanAndMattDataScience I have one question: given the job market, which Python libraries should I know for a data analyst profile?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  9 months ago

      @akshatalanjewar3056 start with pandas and scikit-learn

    • @akshatalanjewar3056
      @akshatalanjewar3056 9 months ago

      @RyanAndMattDataScience Well, I know Python libraries like pandas, NumPy, seaborn and Matplotlib, plus SQL and Power BI. Is this sufficient to get a job?

  • @shailendra_kunwar
    @shailendra_kunwar 8 months ago

    Awesome work Ryan 🔥🔥🔥🔥
    I have just watched it and I appreciate the effort that you put in for the video. I will be using this as my portfolio project.

  • @MiguelGracia-g2d
    @MiguelGracia-g2d 5 months ago

    Hi, had a quick question!
    At 17:27, would there be any downside to me using something like df[df['Event name'].str.contains('USA')] instead?
    Thanks!

  • @takashiiexe
    @takashiiexe 9 months ago

    Thanks Ryan! Great Project.

  • @alexrosen8762
    @alexrosen8762 1 year ago

    Really useful project for learning, especially since the data sample is included. Thanks a lot 🙏

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago +1

      Glad it was helpful! Currently in the initial stages of next month's project

    • @alexrosen8762
      @alexrosen8762 1 year ago

      @RyanAndMattDataScience Great! Looking forward to that👌

  • @shayanakhavan6002
    @shayanakhavan6002 8 months ago

    Great video, Ryan!

  • @maxnicolasnavarro4017
    @maxnicolasnavarro4017 4 months ago

    Thank you so much for bringing back my love for this field.
    I needed this so much...

  • @RoleJohn
    @RoleJohn 1 year ago

    Great, great content! I am subscribing only on the condition that you upload more and more in-depth analysis using Python. Keep it up.

  • @7a30adnanbin5
    @7a30adnanbin5 8 months ago +1

    Great vid, man... really helpful

  • @dj-mt1pz
    @dj-mt1pz 9 months ago +3

    My kernel keeps dying whenever I combine all the filters of the df to create df2. Does anyone know how to resolve this issue? Otherwise I can't progress :(

    • @linda_erose
      @linda_erose 5 months ago +1

      Same, did you figure it out?

  • @AmbarGharat
    @AmbarGharat 9 months ago +3

    Hi Ryan, Instead of df['Event name'].str.split('(').str.get(1).str.split(')').str.get(0) == 'USA' can we use df['Event name'].str[-5:] == '(USA)'?

    • @shailendra_kunwar
      @shailendra_kunwar 8 months ago

      Yes, this somehow gives 1408416 rows, while the method Ryan uses in the video gives 1398540 rows.

  • @athayaazaria1825
    @athayaazaria1825 9 months ago +1

    Hi, can I get the full syntax at 49:07? I can't see the continuation. I need it for my current school assignment, and this will help me a lot😊😊😊

    • @SlumpyCatMEOW
      @SlumpyCatMEOW 1 month ago

      df3.query('race_length == "50mi"').groupby('athlete_age')['athlete_average_speed'].agg(['mean', 'count']).sort_values('mean', ascending=False).query('count > 19').head(15)
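
For readers who want to see what that chain does step by step, here is a small sketch on a toy frame. The column names follow the one-liner above; the values are invented, and the count threshold is lowered from 19 to 1 only so the toy data returns something.

```python
import pandas as pd

# Toy stand-in for df3; column names follow the one-liner, values are invented.
df3 = pd.DataFrame({
    "race_length": ["50mi", "50mi", "50mi", "50km", "50mi"],
    "athlete_age": [30, 30, 41, 30, 41],
    "athlete_average_speed": [8.0, 10.0, 7.0, 12.0, 9.0],
})

result = (
    df3.query('race_length == "50mi"')          # keep only 50-mile races
    .groupby("athlete_age")["athlete_average_speed"]
    .agg(["mean", "count"])                     # average speed and sample size per age
    .sort_values("mean", ascending=False)       # fastest age groups first
    .query("count > 1")                         # the original uses count > 19
    .head(15)
)
print(result)
```

Wrapping the chain in parentheses, one method per line, makes it easier to comment out a step and inspect the intermediate result.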

  • @Nighthunterm
    @Nighthunterm 8 months ago

    Was just doing some Python learning to get some more knowledge and I just found your channel. I heard you say you ran your marathon around UCF. I'm a fellow alum from there as well, haha. Go Knights!

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  8 months ago

      Haha charge on and 25 loops around campus. I’ll never run there again lol

  • @shrushtilonare1034
    @shrushtilonare1034 2 months ago

    I am not able to find this project in your GitHub profile. Please help, Ryan.

  • @kokowin5851
    @kokowin5851 7 months ago +8

    An easier way to remove USA from the event name: df2["Event name"] = df2["Event name"].str.replace("(USA)", " ")

    • @sandydalhousie
      @sandydalhousie 4 months ago

      Yes, this is better, I agree. I also tried the split method Ryan used, but all my entries in "Event name" get replaced with "None" somehow! I don't understand.
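
A caveat on the str.replace suggestion: parentheses are special characters in regular expressions, and as far as I know the default for the `regex` argument has differed between pandas versions (it became False in pandas 2.0). Passing `regex=False` explicitly avoids depending on the default. The sketch below uses made-up event names and also strips the space left behind.

```python
import pandas as pd

# Hypothetical event names, invented for illustration.
names = pd.Series(["Badwater 135 (USA)", "Big Elk 50k (USA)"])

# regex=False treats "(USA)" as literal text rather than a regex group;
# str.strip() then removes the trailing space left behind.
cleaned = names.str.replace("(USA)", "", regex=False).str.strip()

print(cleaned.tolist())
```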

  • @ayantikaC03
    @ayantikaC03 1 year ago

    Great video Ryan!

  • @lujingyan6853
    @lujingyan6853 9 months ago +1

    Thank you for sharing. But when you use (df["Event name"].str.split("(").str.get(1).str.split(")").str.get(0) == "USA") to select all the USA races, it ignores events that contain more than one () in their name, such as Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA). It might be better to use df["Event name"].str.contains(r"\(USA\)").

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  9 months ago

      Ah didn’t realize when doing this project. Great catch and thanks for commenting
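
The point raised in this thread can be reproduced on a few rows. In the sketch below, the first event name is the one quoted in the comment; the other two are invented. It compares the split chain from the video with the escaped regex and with a plain suffix check, which may also explain why commenters elsewhere in this thread report different row counts for different methods.

```python
import pandas as pd

# The first name is quoted in the comment above; the others are invented.
events = pd.Series([
    "Palisades Ultra Trail Series (PUTS) - Big Elk 50k (USA)",
    "Badwater 135 (USA)",
    "Spartathlon (GRE)",
])

# Approach from the video: text between the first "(" and the next ")".
via_split = events.str.split("(").str.get(1).str.split(")").str.get(0) == "USA"

# Escaped regex: a literal "(USA)" anywhere in the name.
via_regex = events.str.contains(r"\(USA\)")

# Suffix check: the name must end with "(USA)".
via_suffix = events.str.endswith("(USA)")

print(via_split.tolist())   # the split chain sees "PUTS" in the first name
print(via_regex.tolist())
print(via_suffix.tolist())
```

The split chain only inspects the first pair of parentheses, so it drops the Palisades event, while the other two checks keep it.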

  • @michaelg9359
    @michaelg9359 10 months ago

    thanks for the vid -- very good - your camera view cuts off far right side of visual, though

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  10 months ago

      Thanks and sorry :/ problem with face cams

    • @michaelg9359
      @michaelg9359 10 months ago

      @RyanAndMattDataScience I subscribed man - thanks a million for the vids - very good

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  10 months ago

      @michaelg9359 no problem

  • @Al-Ahdal
    @Al-Ahdal 7 months ago

    In the event_len column there are many rows with km, mi, h, and so on. How can we check all of these to get the correct count, and how do we extract the numbers only? Should we be using regex for that?
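
A regex is one reasonable way to do this. The sketch below is a minimal example, assuming the values look like a number followed directly by a unit (the sample values are invented, and `event_len` is the column name used in the comment). It splits each value into a numeric part and a unit with `Series.str.extract`, then counts rows per unit.

```python
import pandas as pd

# Hypothetical values in the style the comment describes (km, mi, h).
event_len = pd.Series(["50km", "50mi", "24h", "100km"])

# One named capture group for the number, one for the unit.
parts = event_len.str.extract(r"^(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>km|mi|h)$")
parts["value"] = parts["value"].astype(float)

# Count rows per unit, and keep the numbers in their own column.
print(parts["unit"].value_counts())
print(parts["value"].tolist())
```

Rows that do not match the pattern come back as NaN in both columns, which makes unexpected formats easy to find before converting to float.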

  • @navid7467
    @navid7467 3 months ago

    New subscriber here! Thank you for your good work. Just a quick question. To extract events held in USA, since we know we are looking for the 3 letters between the 5th last and last as USA, couldn't we use this condition: (df['Event name'].str[-4:-1]=='USA')? I used it but my dataframe returns 26524 rows which I thought might be due to difference in the version of dataset.
    I also tried (df['Event name'].str.endswith("(USA)")) and got the same number of rows.

  • @charlieadleydog
    @charlieadleydog 7 months ago

    Hey Ryan, great video. Just wanted to ask how much RAM you suggest for these projects to be able to run quickly?

  • @michaelshepherdmunemo4414
    @michaelshepherdmunemo4414 6 months ago

    Great work. Thank you, I was following along hands-on. # Subscribed_and_Liked

  • @binarify4364
    @binarify4364 9 months ago

    Brilliant Project !

  • @aminabahloul2175
    @aminabahloul2175 2 months ago

    Thanks for the video, I enjoyed it. But I cannot download the data!

  • @tarekhusam
    @tarekhusam 1 year ago

    You are amazing, keep it up bro

  • @everywoman2774
    @everywoman2774 1 year ago

    subscribed! great video. Thank you for this

  • @RRangel7b
    @RRangel7b 5 months ago +1

    Hello
    First of all, thank you!!
    And how about:
    df = pd.DataFrame(data)
    usa_events = df[df['Event name'].str.contains('USA')]
    print(usa_events)

  • @stallonengobua8820
    @stallonengobua8820 6 months ago

    Thank you very much Ryan

  • @chalamohamed2013
    @chalamohamed2013 1 year ago

    Hello Ryan,
    Thanks for sharing your skills.
    I would like to understand why you dropped Athlete Club and Country.
    I think it would be better to drop the rows that have an empty value; then you can modify the column type.

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  1 year ago

      There’s a lot of ways you could look at a dataset. I did this a long time ago so can’t remember exactly why I did but for what I was working on I don’t believe it mattered

  • @jonathangarcia8124
    @jonathangarcia8124 7 months ago

    Is this lesson possible in VS Code, or would I need to learn to use Jupyter Notebook?

  • @mikefranko2832
    @mikefranko2832 1 year ago

    What is the reason behind cleaning up NaN values?

  • @mohamedzrirak5884
    @mohamedzrirak5884 3 months ago

    thank you👍

  • @geoffreycg5650
    @geoffreycg5650 11 months ago

    Great video!

  • @rishidixit7939
    @rishidixit7939 7 months ago

    Between Matplotlib and Seaborn, which one should be used, or should both be used?

  • @mikefranko2832
    @mikefranko2832 1 year ago

    What is the reason behind dropping columns?

  • @katehudson7405
    @katehudson7405 10 months ago

    Is it okay if I add this project to my portfolio after completing it? Great video!

  • @dominiktokarski8054
    @dominiktokarski8054 1 year ago

    Liked, subscribed and commented for stats. Keep going :)

  • @Naadiaajmal
    @Naadiaajmal 2 months ago

    NameError: name 'df' is not defined

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  2 months ago

      Ask in our discord server

    • @Naadiaajmal
      @Naadiaajmal 2 months ago

      @RyanAndMattDataScience Thank you. Please link it. I can't see it on your profile.

    • @BreakoutCards
      @BreakoutCards 2 months ago

      @Naadiaajmal pinned comment

    • @Naadiaajmal
      @Naadiaajmal 2 months ago

      @BreakoutCards thanksss

  • @dennisbunarta1190
    @dennisbunarta1190 8 months ago

    I can't find events from the year 2020. Any solution?

    • @J4vierC
      @J4vierC 5 months ago

      Same problem here. I did it with .contains() and I don't know why I can't return the 2020 rows.

  • @SriramKoyalkar
    @SriramKoyalkar 7 months ago

    Where do I find this project source code?

    • @RyanAndMattDataScience
      @RyanAndMattDataScience  7 months ago

      I plan on putting all the code from the videos on my website, but I need to scale up a bit first; I don't have the resources atm

  • @onurdatascience
    @onurdatascience 1 year ago

    Great project!

  • @aliomar9594
    @aliomar9594 1 year ago +1

    Great

  • @zanngoc
    @zanngoc 2 months ago

    30:00

  • @iniuntukutube
    @iniuntukutube 1 year ago

    Hello Ryan, can I ask something? Are there any other tools (software, applications, websites) that can be used for Python? I'm sorry for the question, please don't laugh at me, hehe. I'm a complete beginner learning data analysis. I dream of becoming a business analyst. Do you have any suggestions for me, please?

  • @aadilazeem601
    @aadilazeem601 1 month ago +1

    it has only 7M rows

  • @ammarsyrie1355
    @ammarsyrie1355 2 months ago

    you really need to expand the screen a little bit

  • @JC_333
    @JC_333 1 year ago

    Subscribed!

  • @utkarshpandey2339
    @utkarshpandey2339 1 month ago

    I want to drop many recommendations on this video.
    But for now, here's one:
    # changing column names
    dfn.columns = dfn.columns.str.replace(" ","_").str.lower()
    dfn.head(3)
    Thank me through likes, please 👁👁👁

  • @GreyHatGenX
    @GreyHatGenX 5 months ago

    comment

  • @tosinwilliams9343
    @tosinwilliams9343 11 months ago

    Thanks Ryan