Hey guys I hope you enjoyed the video! If you did please subscribe to the channel!
Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
If you want to watch a full course on Machine Learning, check out Datacamp: datacamp.pxf.io/XYD7Qg
Want to solve Python data interview questions? stratascratch.com/?via=ryan
I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
*Both Datacamp and Stratascratch are affiliate links.
This needs more views. It was so in-depth and perfect for a beginner!
I loved the walkthrough. Honestly, for roughly the last 35 minutes I had no idea what was going on, but it's really cool that people like you are giving free tutorials on such complex work. Thanks!
No problem. Everything I go over in this vid is covered in my ML and Python playlists. Check them out!
Hope you enjoyed this video; it took a long time to produce. If you enjoyed it, please subscribe to the channel.
I just uploaded the 2nd part of this video where I improve the model (linked down below).
Below are a few links that you should check out:
Part 2: ruclips.net/video/KzK1pifa2Vk/видео.html&ab_channel=RyanNolanData
Kaggle Code: www.kaggle.com/code/ryannolan1/titanic-wip-9-12
Twitter: twitter.com/RyanNolanData
LinkedIn: www.linkedin.com/in/ryan-p-nolan/
SciKit-Learn Tutorials: ruclips.net/video/SjOfbbfI2qY/видео.html&ab_channel=RyanNolanData
Practice SQL & Python Interview Questions: stratascratch.com/?via=ryan
Thank you! I learned machine learning algorithms and made my first Kaggle project with you! I am very grateful for that, and I will watch your other videos. Thanks for the great content.
Hey, congrats man, huge step forward! I appreciate you checking them out. Also, join our Discord.
@@RyanAndMattDataScience The link for this video is expired. I will check other videos, but maybe you should check the links in those too.
@@RyanAndMattDataScience Also, it could be because of my country, since Discord is banned here :D
Thanks!
Thanks Greg! Really appreciate it
It was a super useful video and I am happy to have done my first Data Science project. Thank you very much.
Congrats on completing your first project
Appreciate it man, graduated last April and this series has been a lifesaver to refresh my Python skills before I begin working.
No problem, join our Discord also! I plan on expanding our content to our website in 2025 for more details on vids and such.
Thank you for this, your videos are so helpful. Keep it up!
Np, new one tomorrow. New project this week.
Why do we write
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, stratify=y, random_state=21)
and then never use X_valid and y_valid again? Isn't that a useless waste of data for the training?
They are used for testing the model's performance.
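For anyone wondering, here is a minimal sketch of how that held-out split is typically used, assuming the split from the question above and a generic scikit-learn classifier (the model choice is just illustrative):
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

model = RandomForestClassifier(random_state=21)   # any classifier works here
model.fit(X_train, y_train)                       # learn only from the training split
preds = model.predict(X_valid)                    # predict on data the model never saw
print(accuracy_score(y_valid, preds))             # rough estimate of generalization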
Yes man, much appreciated for your efforts.
Thanks for working on this project! My housing one will be out in November
I checked it out and those without ages were MORE likely to survive.
This is counterintuitive, as I thought that poorer people's ages may not have been entered, and thus those passengers would appear more among those who did not survive.
I then wondered if perhaps those without ages were infants or babies, and perhaps as children were first on boats, they survived at higher rates.
I will spend some time checking this data out to try and find an answer.
But it also made me realise that having a good social and political understanding of the world will help a data scientist and machine learning practitioner, as these understandings may enhance the ability to explain odd findings.
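A quick way to check this in pandas, assuming the standard Kaggle Titanic training file and column names (Age, Survived):
import pandas as pd

df = pd.read_csv('train.csv')                     # Kaggle Titanic training data
missing_age = df['Age'].isna()
print(df.loc[missing_age, 'Survived'].mean())     # survival rate when Age is missing
print(df.loc[~missing_age, 'Survived'].mean())    # survival rate when Age is present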
This is quite interesting
At least you explain in detail what you are typing after you copy each line of code. Nice video btw.
Thank you!
Excellent video. So much in it: thought process, code tips, etc.
Thank you
So helpful!! Ideal demonstration for my first projects going forward.
Is it always better to cut the age column into more, smaller groups?
Hi, great video. One idea - instead of writing out so many loc statements, it might be easier to just use labels=False when using qcut.
Thank you! And I’ll look into it for the next time I use it
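For reference, a small sketch of that suggestion; the DataFrame and column names are just assumptions:
import pandas as pd

# labels=False makes qcut return integer bin codes directly,
# so no follow-up .loc assignments are needed
df['Age_Bin'] = pd.qcut(df['Age'], q=8, labels=False)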
Very cool video! Would love to see more of this type of content.
Have a lot of other vids to check out!
I got a 78% result using random forest. Thanks for the brilliant explanation!
No problem, awesome job
Your head is over the code that you are generating. It would help substantially if you could move the part with your camera to a different part of the screen (perhaps upper left hand corner?).
Hey, I should have the notebook for this vid available on my Kaggle account. Also going to make this into an article
Excellent.
Thank you so much for the video!!
It really helped a lot, thank you, keep going
Thank you
Great work, brother! I have subscribed to you and am waiting for the next Kaggle projects too.
Thanks and I just uploaded one this week!
@@RyanAndMattDataScience Also, at the final steps I've faced some errors. So, is there any way to contact you, please?
Hi @Ryan, thanks for making this amazing video. I just want to understand why you used "plus one for yourself" at 25:05? Thank you!
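For anyone else at that timestamp, the usual reasoning behind the "+1" is a family-size feature: SibSp counts siblings/spouses and Parch counts parents/children, so adding one includes the passenger themselves. A rough sketch using the standard Titanic column names:
# family size = siblings/spouses aboard + parents/children aboard + the passenger
df['Family_Size'] = df['SibSp'] + df['Parch'] + 1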
Keep up the good work.
Thank you! Working on more videos this weekend!
1:17:00 I don't understand the usage of .transform('count'). Can someone explain with an example?
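A tiny illustrative example (toy data, not from the video): transform returns one value per row, so every row gets the size of its own group.
import pandas as pd

df = pd.DataFrame({'Ticket': ['A1', 'A1', 'B2']})
df['Ticket_Count'] = df.groupby('Ticket')['Ticket'].transform('count')
print(df)
# rows 0 and 1 share ticket 'A1', so both get Ticket_Count = 2; row 2 gets 1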
Thanks... I really enjoyed it, and you explain so well... Bless you.
Thank you
Thank you so much. Keep it up:)
No problem
Can you show us the part where you get the input? I don't know if I'm doing it right
Shoot me a question on Discord with the link to your notebook and I'll answer it in January's Q&A.
@Ryan Nolan Data: Excellent video.
Much appreciated
You explained it in a fantastic way. Just one request:
will you please provide a valid link for the notebook? It's not working.
Hey, I just checked the link and it's working?
www.kaggle.com/code/ryannolan1/titanic-wip-9-12
Awesomely awesome... I had to sub.
Np
Amazing content!
Thanks man, you helped a ton with this vid
Just a suggestion: your next video should be on using ChatGPT for this project.
If they had not called it "unsinkable", I doubt it would have got half the attention that it did? I thought that its maiden voyage was Liverpool to New York and it sank en route? But the dataset has other locations. Also, didn't many of the survivors commit suicide not long afterwards?
Not sure? It did have a sister ship also. I do collect Titanic cards though. Have some from 1911 and a bit later.
What's the point in splitting the dataset into train and validation if, at the end, you are using only the training set to do the grid search with cross-validation? Doesn't the grid search directly create the validation set from the training set you give it?
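For anyone else wondering: grid search does run its own internal cross-validation on whatever data you pass it, so the separate split is usually kept as a final, untouched check. A hedged sketch, assuming scikit-learn and the earlier train/validation split (the parameter grid is just an example):
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [100, 300], 'max_depth': [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=21),
                      param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)                             # CV folds come out of X_train internally
print(search.best_params_)
print(search.best_estimator_.score(X_valid, y_valid))    # final check on untouched data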
Great tutorial, can't wait to check out part 2!
Question on the correlation map: how did you use it to narrow down your parameters/features?
Part 2 is out! And I did this project a long time ago, so I'll try to take a look at the code and see the reasoning.
@@RyanAndMattDataScience
Thanks again! I will view part 2 today. Also, definitely let me know about the correlation map and how it was used!
Quick question: does Kaggle give you a rating based on speed/efficiency? I'm wondering specifically about just importing whole libraries.
This was really helpful 🥳🥳🥳🥳
Thanks
The length of the name is dominated by married ladies who have their married name AND their maiden names in brackets. Here is the top few:
'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)',
'Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")',
'Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")',
'Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)',
'Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)'
So I think that when noting that survival is related to name length, you are actually picking up that name length is a predictor of being female, and women of course have a
higher chance of surviving.
Analysing this dataset is addictive - I must give it up!
There's so much to take a look at.
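A quick way to sanity-check that hypothesis, assuming the standard Name, Sex and Survived columns are loaded in df:
df['Name_Len'] = df['Name'].str.len()
# if name length mostly proxies for sex, the averages should differ sharply between groups
print(df.groupby('Sex')['Name_Len'].mean())
print(df.groupby('Sex')['Survived'].mean())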
Hi Ryan, I am new to data science. I am a bit lost on what the point of analyzing the ticket number and passenger name is. What is the goal of doing that? Same with qcut: are we doing these to help with a decision tree model? Do we need to do any of this if we just build a regression model?
As richer people tend to have longer names, it is worth seeing if there are any correlations or patterns in the data.
I don't understand pipelines :( How can I learn that?
We have a video on them.
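In the meantime, a minimal sketch of a scikit-learn Pipeline; the steps shown are a generic example (and assume purely numeric features), not the exact ones from the video:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# a pipeline chains preprocessing and a model so they fit and predict as one object
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print(pipe.score(X_valid, y_valid))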
One question to the community.
Which one of the two is better:
1. df.describe()
2. print(df.describe())
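Neither is wrong. In a Jupyter/Kaggle notebook, leaving df.describe() as the last expression in a cell renders a nicely formatted table, while print() gives plain text and is what you need in a regular .py script. A tiny illustration:
df.describe()          # notebook cell: rendered as a formatted table if it's the last line
print(df.describe())   # plain-text output; required when running as a script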
Hello! Thank you for your video, I am trying to follow you and repeat all the steps. I have found a better way to assign labels for age groups:
df['Age_Label'] = pd.qcut(df['Age'], 8, labels=np.arange(8) + 1)   # assumes numpy is imported as np
Hope it can be helpful!
Awesome! I may try to revisit this in the future.
I do not think you meant to make the young French girls 'noble', which you did. I am just starting to learn pandas with your help, but some of the complicated string editing you did would have been so much simpler and more understandable if done in an old-fashioned 'for' loop. I know it is frowned upon by 'experts', but the whole point of Python is that the code is readable.
Probably a small mistake. Curious whether it would do better without marking them as noble.
Can we add this project to our resume?
Sure
Nice video tutorial, chief! What kind of extension do you use?
Wdym extension?
For example, as you are typing, the pop-up autocomplete or something like that @@RyanAndMattDataScience
Can you make more projects every month?
I want to learn more through projects.
Eventually. I'm so busy atm, though, going through a backlog of videos to create.
Am I the only one that can't see anything he types or clicks in the beginning of "Starting the Project" (from 9:00 for approx a minute)? :/
Hey, there was a small editing error, but all the code is in the description through the Kaggle link.
Why, at 40:57, is my output of age 0 in every row?
Did you ever figure this out?
At 48:37 there is a shorter way of extracting Title: train_df['Title'] = train_df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
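If you're curious what that regex pulls out, a quick check (assuming train_df is already loaded):
print(train_df['Title'].value_counts())   # typically Mr, Miss, Mrs, Master, plus a handful of rare titles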