Beginner Data Science Portfolio Project Walkthrough (Kaggle Titanic)

  • Published: Feb 9, 2025

Comments • 97

  • @RyanAndMattDataScience 6 months ago +1

    Hey guys I hope you enjoyed the video! If you did please subscribe to the channel!
    Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
    If you want to watch a full course on Machine Learning check out Datacamp: datacamp.pxf.io/XYD7Qg
    Want to solve Python data interview questions: stratascratch.com/?via=ryan
    I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
    *Both Datacamp and Stratascratch are affiliate links.

  • @KyaBroderick 7 months ago +9

    this needs more views. was so in depth and perfect for a beginner!

  • @collingreens7 7 months ago +3

    I loved the walkthrough, honestly the last about 35 mins I had no idea what was going on but it's really cool that people like you are giving free tutorials on such complex work. Thanks!

    • @RyanAndMattDataScience 7 months ago +1

      No problem. Everything I go over in this vid is covered in my ML and Python playlists. Check them out!

  • @RyanAndMattDataScience 1 year ago +8

    Hope you enjoyed this video, it took so long to produce. If you enjoyed it, please subscribe to the channel.
    I just uploaded the 2nd part of this video where I improve the model (linked down below)
    Below are a few links that you should check out:
    Part 2: ruclips.net/video/KzK1pifa2Vk/видео.html&ab_channel=RyanNolanData
    Kaggle Code: www.kaggle.com/code/ryannolan1/titanic-wip-9-12
    Twitter: twitter.com/RyanNolanData
    LinkedIn: www.linkedin.com/in/ryan-p-nolan/
    SciKit-Learn Tutorials: ruclips.net/video/SjOfbbfI2qY/видео.html&ab_channel=RyanNolanData
    Practice SQL & Python Interview Questions: stratascratch.com/?via=ryan

  • @mericcapar2447 1 month ago +1

    Thank you! I learned machine learning algorithms and made my first kaggle project with you! I am very grateful for that and i will watch your other videos. Thanks for great content.

    • @RyanAndMattDataScience 1 month ago +1

      Hey congrats man huge step forward! I appreciate you checking them out. Also join our discord

    • @mericcapar2447 1 month ago

      @RyanAndMattDataScience The link for this video has expired. I will check the other videos, but maybe you should check them too.

    • @mericcapar2447 1 month ago

      @RyanAndMattDataScience It could also be about my country, because Discord is banned here :D

  • @GregThatcher 1 month ago

    Thanks!

  • @AmaRan31 1 year ago +2

    It was a super useful video and I am happy to have done my first Data Science project. Thank you very much.

  • @zacharygrant1829 2 months ago

    Appreciate it man, graduated last April and this series has been a lifesaver to refresh my python skills before I begin working.

    • @RyanAndMattDataScience 2 months ago

      No problem, join our discord also! Plan on expanding out content to our website in 2025 for more details on vids and such

  • @yakosti 1 year ago +5

    Thank you for this, your videos are so helpful. Keep it up!

  • @janneskleinau6332 27 days ago +1

    Why do we write
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, stratify = y, random_state=21)
    and then never use X_valid and y_valid again? Isn't that a useless waste of data for the training?

    • @Searchingxpeace 26 days ago

      they are used for testing the model's performance
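
A minimal sketch of how a hold-out split like this is typically used, assuming synthetic data and a plain logistic regression rather than the notebook from the video: the model is fit on the training part only, and `X_valid` / `y_valid` are then used to score it on rows it has never seen.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic features and target (an assumption,
# not the data used in the video).
rng = np.random.default_rng(21)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Same call as in the comment above: 20% of the rows are held out.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=21
)

# Fit on the training rows only.
model = LogisticRegression().fit(X_train, y_train)

# The held-out rows give an accuracy estimate on unseen data.
valid_accuracy = model.score(X_valid, y_valid)
print(f"validation accuracy: {valid_accuracy:.3f}")
```

If the validation rows are never scored like this, the split does only cost training data, which is the concern raised in the question.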

  • @ritamchatterjee8785 1 year ago +1

    Yes man, your efforts are much appreciated.

    • @RyanAndMattDataScience 1 year ago +2

      Thanks for working on this project! My housing one will be out in November

  • @mdaniels6311 26 days ago

    I checked it out and those without ages were MORE likely to survive.
    This is counterintuitive, as I thought that poorer people's ages may not have been entered, and thus they would appear more among those who did not survive.
    I then wondered if perhaps those without ages were infants or babies, and perhaps, as children were first on the boats, they survived at higher rates.
    I will spend some time checking this data out to try and find an answer.
    But it also made me realise that having a good social and political understanding of the world will help a data scientist and machine learning practitioner, as this understanding may enhance the ability to explain odd findings.

  • @ChillWebDeveloper 1 year ago +1

    At least you explain in detail what you are typing after you copy each line of code. Nice video btw

  • @robertbenson8554 7 months ago

    Excellent video. So much in it, thought process, code tips etc.

  • @autiematic7224 6 months ago

    So helpful!! An ideal demonstration for my first projects going forward.

  • @kotonelm069 4 days ago

    Is it always better to cut the age groups into more, smaller groups?

  • @idontevenwanttomakea 1 year ago +1

    Hi, great video. One idea - instead of writing out so many loc statements, it might be easier to just use labels=False when using qcut.
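
A small sketch of that suggestion, assuming a toy `Age` series rather than the Titanic data: with `labels=False`, `pd.qcut` returns the integer bin index directly, so no follow-up `.loc` assignments are needed to turn intervals into numbers.

```python
import pandas as pd

# Toy ages, chosen only for illustration.
ages = pd.Series([2, 9, 18, 22, 27, 31, 40, 64], name="Age")

# Default behaviour: each value is mapped to an interval such as (15.75, 24.5].
age_interval = pd.qcut(ages, 4)

# labels=False: each value is mapped to the integer index of its quantile bin.
age_bin = pd.qcut(ages, 4, labels=False)

print(age_bin.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```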

  • @samallen598 10 months ago

    Very cool video! Would love to see some of this type of content.

  • @aviluminos8759 9 months ago

    I got 78% result using forest. Thanks for the brilliant explanation!

  • @jessedostal3256 1 month ago

    Your head is over the code that you are generating. It would help substantially if you could move the part with your camera to a different part of the screen (perhaps upper left hand corner?).

    • @RyanAndMattDataScience 1 month ago

      Hey, I should have the notebook for this vid available on my Kaggle account. Also going to make this into an article

  • @mehdismaeili3743 2 months ago

    Excellent.

  • @rakshitshukla4205 6 months ago

    Thank you so much for the video!!

  • @alihajizadeh7749 1 year ago

    It really helped a lot, thank you, keep going

  • @pasindugimhan5779 1 year ago

    Great work brother! I have subscribed and am also waiting for the next Kaggle projects.

    • @RyanAndMattDataScience 1 year ago +1

      Thanks and I just uploaded one this week!

    • @pasindugimhan5779 1 year ago

      @RyanAndMattDataScience Also, I ran into some errors at the final steps. Is there any way to contact you, please?

  • @AbelGriffen 11 months ago

    Hi @Ryan, thanks for making this amazing video. I just want to understand why you used "Plus one for yourself" at 25:05. Thank you!

  • @codingcambodia 1 year ago +1

    keep up your good work

  • @ayushijainrkt 9 months ago

    1:17:00 i don't understand the usage of .transform('count'). Can someone explain with an example?
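
A hedged example of what `.transform('count')` does, assuming a toy frame with a `Ticket` column (the exact column used at 1:17:00 is not quoted in this thread): unlike `.count()`, which returns one row per group, `transform` broadcasts each group's count back to every row of that group, so the result lines up with the original DataFrame and can be stored as a new column.

```python
import pandas as pd

# Toy data; ticket values and names are made up for illustration.
df = pd.DataFrame({
    "Ticket": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eli", "Fay"],
})

# .count() collapses to one value per group: A1 -> 2, B2 -> 1, C3 -> 3.
per_group = df.groupby("Ticket")["Ticket"].count()

# .transform("count") keeps the original length: every row receives the
# size of the group it belongs to.
df["TicketCount"] = df.groupby("Ticket")["Ticket"].transform("count")

print(df["TicketCount"].tolist())  # [2, 2, 1, 3, 3, 3]
```

Here `TicketCount` tells each passenger how many rows share the same ticket, which is the kind of per-row feature a model can use.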

  • @elfincredible9002 10 months ago

    Thanks... I really enjoyed it and you explain so well... Bless you.

  • @japyh4 1 year ago +1

    Thank you so much. Keep it up:)

  • @chamudigamage7417 1 month ago

    Can you show us the part where you get the input? I don't know if I'm doing it right

    • @RyanAndMattDataScience 1 month ago +1

      Shoot me a question on Discord with the link to your notebook and I’ll answer it in January's Q/A

  • @Al-Ahdal 9 months ago

    @Ryan Nolan Data: Excellent video.

  • @katorechaitanya 1 year ago +1

    You explained it in a fantastic way. Just one request:
    will you please provide a valid link for the notebook? It's actually not working.

    • @RyanAndMattDataScience 1 year ago

      Hey, just checked the link it's working?
      www.kaggle.com/code/ryannolan1/titanic-wip-9-12

  • @umarmusisi8853 8 months ago

    Awesomely awesome...i had to sub

  • @onurdatascience 1 year ago +1

    Amazing content!

  • @tosinwilliams9343 1 year ago

    Just a suggestion: your next video should be on using ChatGPT for this project.

  • @lecturesfromleeds614 1 month ago

    If they had not called it "un-sinkable" I doubt it would have got half the attention that it did? I thought that its maiden voyage was Liverpool to New York and it sank en route? But the Dataset has other locations. Also didn't many of the survivors commit suicide not long afterwards?

    • @RyanAndMattDataScience 1 month ago

      Not sure? It did have a sister ship also. I do collect titanic cards though. Have some from 1911 and a bit later

  • @sildistruttore 10 months ago

    What's the point in splitting the dataset into train and validation if then at the end you are using only the training to do the grid search with cross validation? doesn't the grid search directly create the validation set on the training set you give it?
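
A sketch of how the two pieces usually fit together, assuming synthetic data and a small random forest grid rather than the search from the video. `GridSearchCV` does build its own validation folds inside the training set, but those folds are used to pick the hyperparameters, so its best cross-validation score is slightly optimistic. A separate hold-out set gives a score that played no part in that selection.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the Titanic features (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=21
)

# The grid search cross-validates inside X_train only.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid={"max_depth": [2, 4]},
    cv=3,
)
search.fit(X_train, y_train)

# Score used to choose the hyperparameters (mean over the internal folds).
cv_score = search.best_score_

# Score of the refit best model on rows the search never touched.
holdout_score = search.score(X_valid, y_valid)

print(search.best_params_, round(cv_score, 3), round(holdout_score, 3))
```

If the hold-out set is never scored, the split adds nothing over the internal folds, which matches the point of the question.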

  • @ixcel87 1 year ago

    great tutorial, can't wait to check out part 2!
    question on correlation map; how did you use it to narrow down your parameters/features?

    • @RyanAndMattDataScience 1 year ago

      Part 2 is out! And I did this project a long time ago will try to take a look at the code and see the reasoning

    • @ixcel87 1 year ago

      @RyanAndMattDataScience Thanks again! I will view part 2 today. Also, definitely let me know about the correlation map and how it was used!

  • @Orokusaki1986 4 months ago

    quick question: does Kaggle give you a rating based on speed/efficiency? I'm wondering specifically about just importing the whole libraries.

  • @tosinwilliams9343 1 year ago

    This was really helpful 🥳🥳🥳🥳

  • @alanjohnstone8766 10 months ago

    The length of the name is dominated by married ladies who have their married name AND their maiden names in brackets. Here are the top few:
    'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)',
    'Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")',
    'Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")',
    'Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)',
    'Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)'
    So I think that when you note that survival is related to name length, you are actually picking up that name length is a predictor of being female, and women of course have a higher chance of surviving.
    Analysing this dataset is addictive - I must give it up!

  • @jacksun8129 7 months ago +1

    Hi Ryan, I am new to data science. I am a bit lost on what the point of analyzing the ticket number and passenger name is. What is the goal of doing that? Same with the qcuts: are we doing them to help with a decision tree model? Do we need to do any of this if we just build a regression model?

    • @mdaniels6311 26 days ago

      As richer people tend to have longer names, it is worth seeing if there are any correlations or patterns in the data.

  • @lenkapang-ek4fe 3 months ago

    I don't understand pipelines :( What can I do to learn that?

  • @md.ishraquebinshafique1968 1 month ago

    One question to the community.
    Which one of the two is better:
    1. df.describe()
    2. print(df.describe())

  • @Ананімна 1 year ago

    hello! thank you for your video, i am trying to follow you and repeat all the steps. i have found a better way to assign labels for age groups :
    df['Age_Lebel'] = pd.qcut( df['Age'], 8, labels = np.arange(8) + 1 )
    hope it can be helpful!
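
A runnable version of that one-liner, assuming a made-up `Age` column with one missing value and spelling the new column `Age_Label`: passing `labels=np.arange(8) + 1` numbers the eight quantile groups 1 to 8, and rows with a missing age stay missing rather than being forced into a group.

```python
import numpy as np
import pandas as pd

# Sixteen illustrative ages plus one missing value, as in the Titanic data
# where some ages are not recorded.
df = pd.DataFrame({
    "Age": [1, 4, 8, 15, 19, 22, 24, 27, 29, 32, 35, 40, 45, 52, 60, 71, np.nan]
})

# Eight quantile groups labelled 1..8 instead of 0..7.
df["Age_Label"] = pd.qcut(df["Age"], 8, labels=np.arange(8) + 1)

print(df["Age_Label"].value_counts().sort_index().tolist())
print(df["Age_Label"].isna().sum())  # the missing age is left unlabelled
```

The result is a categorical column, so it may need `.astype(float)` or an imputation step before being passed to a model that expects numbers.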

  • @alanjohnstone8766 10 months ago +1

    I do not think you meant to make the young French girls ‘noble’ which you did. I am just starting to learn pandas with your help but some of the complicated string editing you did would have been so much simpler and more understandable if done in an old fashioned ‘for loop’. I know it is frowned upon by ‘experts’ but the whole point of Python is that the code is readable.

    • @RyanAndMattDataScience 10 months ago

      Probably a small mistake. Curious if it did better with not marking them as noble

  • @SYCS13YashGadhave 7 months ago

    Can we add this project to our resume?

  • @ChillWebDeveloper 1 year ago

    nice video tutorial chief! What kind of extension do you use ?

  • @saadAhmed-co1si 9 months ago

    Can you make more projects every month?
    I want to learn more about projects.

    • @RyanAndMattDataScience 9 months ago +1

      Eventually. I’m so busy though atm going through a backlog of videos to create

  • @J6rms 1 year ago

    Am I the only one that can't see anything he types or clicks in the beginning of "Starting the Project" (from 9:00 for approx a minute)? :/

    • @RyanAndMattDataScience 1 year ago

      Hey there was a small editing error but all the code is in the description through the Kaggle link

  • @mn4769 9 months ago

    Why, at 40:57, is my Age output 0 in every row?

    • @DJAMOH1 3 months ago

      Ever found this?

  • @tamtam8420 7 months ago

    At 48:37 there is a shorter way of extracting Title: train_df['Title'] = train_df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
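
A quick check of that pattern on three of the names quoted earlier in this thread, as a sketch rather than the code from the video: the regex captures the first run of letters that follows a space and ends with a period, which is where the title sits in this name format. A raw string (`r'...'`) is used so that `\.` is not treated as a string escape.

```python
import pandas as pd

# Names taken from the comment above about long passenger names.
train_df = pd.DataFrame({"Name": [
    'Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)',
    'Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")',
    'Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")',
]})

# expand=False returns a Series because the pattern has a single capture group.
train_df["Title"] = train_df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

print(train_df["Title"].tolist())  # ['Mrs', 'Lady', 'Miss']
```

Only the first match per name is returned, so the later "Mrs" inside the quoted alias does not affect the result.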