Hey guys I hope you enjoyed the video! If you did please subscribe to the channel!
Join our Data Science Discord Here: discord.com/invite/F7dxbvHUhg
If you want to watch a full course on Machine Learning check out Datacamp: datacamp.pxf.io/XYD7Qg
Want to solve Python data interview questions: stratascratch.com/?via=ryan
I'm also open to freelance data projects. Hit me up at ryannolandata@gmail.com
*Both Datacamp and Stratascratch are affiliate links.
The star indicates that the player got those runs without getting out: "his highest score is 269 not out", for example.
Timestamps?
Fantastic tutorial! Your step-by-step guide on data cleaning in Python Pandas was excellent. Clear explanations and practical examples made it easy to follow along. Looking forward to more of your uploads. Keep up the great work!
Thank you! I’ll have another Python video up this week as well as more coming soon!
Now that’s some cool content. This is exactly what I wanted. Thanks bro 🙏🏼 Keep helping poor students like us! 😌
No problem
you work super hard and put out really good content. Keep it up man, I'm looking forward to watching you grow!
Thank you! I have another video ready to go later this week, and I'm 90% done with another Python interview question video.
It was very good training. Thank you for making this video. I have implemented the project myself and I am even thinking about taking it further.
Mistakes always help me learn because it forces me to recall new/old knowledge. Depending on how common the mistake was (>3) I end up retaining it and auto check, rarely do I see that mistake again.
I guess that, watching your videos while preparing my own portfolio, I am halfway there. Thanks a lot!
No problem. My first batch of classification vids is done; I'm working on regression now.
This video really helped me, man. I was trying to learn about pandas and then it popped up in my notifications. Thanks for the video!
No problem! Check out my other pandas vids, I have a full playlist.
Brother, you're doing awesome! Upload more videos related to data analysis.
I have a full playlist of 70ish vids! Working on more though.
This video saved me tons of hours. Thanks for sharing your knowledge!
No problem! If you want to learn more, check out our Discord.
@Ryan Nolan: Excellent Video. Very clearly explained. I'm looking forward to watching you grow!
Much appreciated!
This is the best tutorial! 👍👍👍👍👍👍👍👍👍👍👍👍👍👍
Means a ton thank you
Lots of good stuff here, but I finally gave up at 31:24. If you're confused about what's happening, imagine how confused we learners are as you bounce around from cell to cell copying-pasting-deleting-trying again, trying to figure things out.
Bugs are part of programming and no one is perfect. I show how it’s solved and why it happens
I am trying to move away from R, and this is a great video. Thanks Ryan!
No problem best of luck
Why make changing the data type so long at 29:00? Can't we just use the same method we used for changing the data types of rookie year and final year?
Pretty amazing :) I'd say it's some dense content to fit into 40 minutes. I learned a lot!
Awesome
@Ryan Nolan: Your videos are great indeed. I'd like to request a comprehensive series on "Data Analytics & Visualization". Thanks
I have a full Data Analyst playlist, check it out!
@@RyanAndMattDataScience, could you please tag or link it? Thanks
@@Al-Ahdal ruclips.net/p/PLcQVY5V2UY4JrrKi2bW7DdOD08shTs4QQ
Amazing! Very good presentation
Thank you
The star on the highest score means that player was NOT OUT until the end of the match. Scores without a star mean the player was OUT right after reaching that score.
A star at the end of the score means those runs were scored NOT OUT, with the batsman still batting.
Yup, learned that from a few people. I don’t see cricket statistics often in the US.
I completed the project, but when I reopened it today all the code was still there, yet typing df showed the old, uncleaned table. How do I make sure this doesn't happen again?
I'll add my code to GitHub this weekend.
Save the cleaned data.
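A short sketch of "save the cleaned data", assuming the cleaned table lives in a DataFrame called `df`. A notebook keeps variables only in the running kernel's memory, so after reopening it the cells still show their code, but `df` is gone or stale until those cells are run again; writing the result to a file avoids redoing the cleaning. The file name `cleaned_players.csv` and the sample columns are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned DataFrame standing in for the tutorial's df
df = pd.DataFrame({"Player": ["A Smith", "B Jones"], "Runs": [1200, 850]})

# Write the cleaned result to disk so it outlives the notebook kernel
df.to_csv("cleaned_players.csv", index=False)

# After reopening the notebook, load the saved file instead of re-cleaning
df = pd.read_csv("cleaned_players.csv")
print(df)
```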
What if the country name or the player is written as a synonym or nickname? I currently want to merge index data from various countries, but the names are written differently in the DataFrames (United States vs. United States of America). How do I handle synonyms here so there is only one written name? It's for over 180 country names, so it's too big a dataset to compare manually.
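One common approach, sketched here under assumptions: build an alias dictionary that maps every known variant (lower-cased) to one canonical name, normalize both DataFrames with it, then merge. The alias entries, the `standardize_country` helper and the sample data are all hypothetical. With 180+ countries you still have to write the alias table once, but the set difference at the end shows exactly which names remain unmatched, so only those need manual attention.

```python
import pandas as pd

# Hypothetical alias table: lower-cased variant -> canonical name
COUNTRY_ALIASES = {
    "united states of america": "United States",
    "usa": "United States",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
}


def standardize_country(name: str) -> str:
    """Return the canonical name for a known alias, else the trimmed original."""
    cleaned = name.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


left = pd.DataFrame({"Country": ["United States", "UK"], "IndexA": [1.2, 0.8]})
right = pd.DataFrame({"Country": ["United States of America", "United Kingdom"], "IndexB": [3.4, 2.1]})

# Normalize the key column in both frames before merging
for frame in (left, right):
    frame["Country"] = frame["Country"].map(standardize_country)

# Names still differing after standardizing are the ones that need new aliases
unmatched = set(left["Country"]) ^ set(right["Country"])

merged = left.merge(right, on="Country", how="inner")
print(merged)
```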
Nice lecture bro, thanks for this. It's a useful video for me.
No problem
Great learning we're getting here! Everything explained precisely 👍 Just one doubt: why use axis=1 vs. axis=0?
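A small illustration of what `axis` means, using made-up numbers: for aggregations such as `sum`, `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). In `drop`, `axis=1` says the label is a column name and `axis=0` says it is a row index label.

```python
import pandas as pd

df = pd.DataFrame({"Runs": [100, 50], "Wkts": [2, 5]}, index=["A", "B"])

# axis=0 works down the rows: one result per column
col_totals = df.sum(axis=0)   # Runs -> 150, Wkts -> 7

# axis=1 works across the columns: one result per row
row_totals = df.sum(axis=1)   # A -> 102, B -> 55

# In drop, axis names what the label refers to
no_wkts = df.drop("Wkts", axis=1)  # removes the column "Wkts"
no_a = df.drop("A", axis=0)        # removes the row labelled "A"
```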
Sir, when we import data from the site into a table at 1:54, I'm not getting the option of table 0. What's the solution for that?
Thanks, I appreciate this tutorial. Just have a question on Q5: why is it already a DataFrame, while we have to use to_frame for Q4? Thanks
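We can't see exactly what Q4 and Q5 computed, but a frequent cause of this difference is how columns are selected after `groupby`: a single column name in brackets returns a Series (so `to_frame()` is needed to get a table), while a list of columns returns a DataFrame directly. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["ENG", "AUS", "ENG"],
    "Runs": [100, 80, 60],
    "Wkts": [1, 3, 2],
})

# Selecting ONE column after groupby gives a Series
one_col = df.groupby("Country")["Runs"].mean()
print(type(one_col).__name__)   # Series
one_col_df = one_col.to_frame()  # convert to a one-column DataFrame

# Selecting a LIST of columns already gives a DataFrame
two_cols = df.groupby("Country")[["Runs", "Wkts"]].mean()
print(type(two_cols).__name__)  # DataFrame
```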
Overall it was a great effort, and your hard work is much appreciated. I would like to know how to remove or drop null values from the columns.
Thanks in advance
Look up dropna.
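A short sketch of `dropna` with made-up data, showing the row-wise default, the `subset` parameter, and `axis=1` for dropping columns instead of rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Runs": [120, np.nan, 45],
    "Avg": [30.5, 22.0, np.nan],
})

# Default: drop every row that has a null in ANY column
rows_complete = df.dropna()

# Only consider specific columns when deciding which rows to drop
rows_with_runs = df.dropna(subset=["Runs"])

# Drop columns (instead of rows) that contain any null
cols_complete = df.dropna(axis=1)

# dropna returns a new DataFrame; assign it back (df = df.dropna()) to keep it
```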
Cheers man... any advice on how to remove the year from a column? For instance, if a column has numeric and year values and I want to remove only the year (in the format 2004). @@RyanAndMattDataScience
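The question leaves the exact format open, so this is only a sketch under an assumption: values are text, and a "year" is any standalone four-digit number from 1900 to 2099, optionally in parentheses. A regex removes those tokens and `pd.to_numeric(..., errors="coerce")` turns what is left into numbers, with empty results becoming NaN. The catch: a genuine value such as 2004 runs would be removed too, so the pattern has to be adapted to the real data.

```python
import pandas as pd

# Hypothetical column mixing ordinary numbers with four-digit years
s = pd.Series(["45 2004", "2011", "312", "88 (1999)"])

# Assumption: a "year" is a standalone 4-digit number from 1900-2099,
# optionally wrapped in parentheses; remove it and trim leftover spaces
no_years = s.str.replace(r"\(?\b(?:19|20)\d{2}\b\)?", "", regex=True).str.strip()

# Convert what remains to numbers; empty strings become NaN
numbers = pd.to_numeric(no_years, errors="coerce")
```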
Thanks Ryan, great tutorial. I was pleasantly surprised that you knew the name of the great WI batsman, Sir Garry Sobers. Are you from the West Indies ?
Nope, I just collect cards and have a few of Sobers.
That’s great. Card collecting can be financially rewarding !
This was so helpful. Thanks, man!
No problem
No problem
Thank you, Ryan Nolan
no problem
In the Highest Inns Score column, why didn't you use rstrip to remove the * instead of split?
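For a star that only ever appears at the end of the value, `str.rstrip("*")` and `str.split("*").str[0]` give the same result; `rstrip` states the intent more directly, while `split` would also discard anything that followed a star. A sketch with sample scores:

```python
import pandas as pd

hs = pd.Series(["269*", "175", "365*"])

# rstrip removes the listed characters from the right-hand end only
via_rstrip = hs.str.rstrip("*")

# split on "*" and keep the first piece (a 1-character pattern is treated literally)
via_split = hs.str.split(pat="*").str[0]

# Either result converts cleanly to integers
scores = via_rstrip.astype(int)
```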
Hi, what is the meaning of "hundreds, fifties, ducks (0) AVG by country"?
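Reading the question in cricket terms: a hundred is an innings of 100 or more runs, a fifty is an innings of 50 to 99, and a duck is a dismissal for zero. "AVG by country" then most likely means grouping players by country and averaging those counts. The column names below (`100`, `50`, `0`, `Country`) and the numbers are assumptions, not the tutorial's actual data.

```python
import pandas as pd

# Hypothetical per-player career counts; column names are assumptions
df = pd.DataFrame({
    "Player": ["P1", "P2", "P3", "P4"],
    "Country": ["ENG", "ENG", "AUS", "AUS"],
    "100": [10, 4, 12, 6],   # innings of 100+ runs
    "50": [20, 10, 30, 14],  # innings of 50-99 runs
    "0": [5, 7, 3, 9],       # ducks: dismissed for zero
})

# Average number of hundreds, fifties and ducks per player, for each country
per_country = df.groupby("Country")[["100", "50", "0"]].mean()
print(per_country)
```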
The * in Highest_Inns_Score means the player was not out in that inning.
Your video is very helpful! One question though: how do you remove duplicates in high-dimensional data, let's say with 500 duplicates? Thanks
df.drop_duplicates(), or if you only want to check a subset of columns for repeats, use df.drop_duplicates(subset=[...], keep="first"), where keep specifies whether to keep the first or last occurrence when dropping.
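A sketch of those options with made-up rows. The number of columns does not change the call, so it works the same way on wide (high-dimensional) data; `keep` accepts `"first"`, `"last"` or `False`, where `False` drops every copy of a duplicated row.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["A", "A", "B", "B"],
    "Span": ["2001-2010", "2001-2010", "1995-2003", "1995-2005"],
    "Runs": [5000, 5000, 3000, 3100],
})

# How many rows are exact copies of an earlier row
n_exact_dupes = int(df.duplicated().sum())

# Rows identical in every column: keep the first copy
exact = df.drop_duplicates()

# Rows count as duplicates when only "Player" matches; keep the last one seen
by_player_last = df.drop_duplicates(subset=["Player"], keep="last")

# keep=False removes every row whose "Player" appears more than once
no_repeats = df.drop_duplicates(subset=["Player"], keep=False)
```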
Bro... you should have used the replace method with regex to clean *, + etc. characters from the columns.
I used regex in my latest project and have a video coming out on it soon funny enough
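A sketch of the regex approach, assuming the symbols to remove are `*` and `+` and the affected columns hold text: `DataFrame.replace` with `regex=True` substitutes inside every string in one call. The column names and values are hypothetical. Be careful not to strip characters that carry meaning elsewhere, such as the `-` in a span like `1990-2005`.

```python
import pandas as pd

# Hypothetical raw columns; "*" and "+" stand in for the symbols seen in the data
df = pd.DataFrame({
    "HS": ["269*", "175", "365*"],
    "Inns": ["12+", "40", "7"],
})

# One regex pass over every column removes any "*" or "+" characters
cleaned = df.replace(r"[*+]", "", regex=True)

# The remaining text is purely numeric, so it can be converted
cleaned = cleaned.astype(int)
```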
The star means not out on the highest score; you don't need to remove it.
Very nice, buddy. Sometimes it feels very tough and sometimes easy; maybe that happens because of a loss of confidence.
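The star does have to go if the column is used in numeric calculations, since `"269*"` is text, but its meaning doesn't have to be lost. One option, sketched with hypothetical data: store a boolean not-out flag first, then strip the star and convert.

```python
import pandas as pd

df = pd.DataFrame({"Player": ["A", "B", "C"], "HS": ["269*", "175", "365*"]})

# Record the "not out" information before removing the star
df["HS_NotOut"] = df["HS"].str.endswith("*")

# The numeric part can now be used in calculations
df["HS"] = df["HS"].str.rstrip("*").astype(int)
```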
Just keep going man. Stuff I’ve found confusing I look back at and think it’s not that bad
Thank you!!
"FS Jackson played for Cambridge University, Yorkshire and England. He spotted the talent of Ranjitsinhji when the latter, owing to his unorthodox batting and his race, was struggling to find a place for himself in the university side, and as captain was responsible for Ranji's inclusion in the Cambridge First XI and the awarding of his Blue. According to Alan Gibson this was "a much more controversial thing to do than would seem possible to us now". He was named a Wisden Cricketer of the Year in 1894.
He captained England in five Test matches in 1905, winning two and drawing three to retain The Ashes. Captaining England for the first time, he won all five tosses and topped the batting and bowling averages for both sides, with 492 runs at 70.28 and 13 wickets at 15.46. These were the last of his 20 Test matches, all played at home as he could not spare the time to tour."
Didn’t know this, it's a really cool story. Like Branch Rickey in baseball.
Hi Ryan,
How would you have handled a file with millions or tens of millions of lines to spot those "*", "-" and "+" characters?
You spotted them here manually by eye.
Can anyone help me figure that out?
Thanks
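One way to avoid eyeballing, sketched under assumptions: for every column, delete the characters you expect (here digits and `.`) and collect whatever is left. The counts show how many rows are affected and the leftover set shows which symbols exist, without reading any rows by hand. The helper name `find_non_numeric_symbols` and the data are hypothetical. For files too large for memory, the same function could be run on each chunk from `pd.read_csv(..., chunksize=...)` and the results combined.

```python
import pandas as pd


def find_non_numeric_symbols(frame: pd.DataFrame) -> dict:
    """For each column, count values holding characters other than digits or "."
    and list which characters those are (hypothetical helper)."""
    report = {}
    for col in frame.columns:
        values = frame[col].astype(str)
        # Delete the expected characters; anything left over is unexpected
        leftovers = values.str.replace(r"[0-9.]", "", regex=True)
        report[col] = {
            "rows_affected": int((leftovers != "").sum()),
            "symbols": sorted(set("".join(leftovers))),
        }
    return report


# Hypothetical raw data, kept as text so nothing is silently converted
df = pd.DataFrame({
    "HS": ["269*", "175", "365*"],
    "Span": ["1990-2005", "2001-2010", "1999-2012"],
    "Runs": ["5000", "4000+", "3100"],
})

report = find_non_numeric_symbols(df)
print(report)
```

The report flags the `-` in `Span` too; whether a symbol is noise or meaningful (a range separator here) is still a judgment call, but it only has to be made once per symbol rather than once per row.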
Great tutorial, but I got this issue when changing the data types:
AttributeError Traceback (most recent call last)
Cell In[11], line 1
----> 1 df['Inns']= df["Inns"].str.split(pat = '*').str[0]
File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:5902, in NDFrame.__getattr__(self, name)
5895 if (
5896 name not in self._internal_names_set
5897 and name not in self._metadata
5898 and name not in self._accessors
5899 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5900 ):
5901 return self[name]
-> 5902 return object.__getattribute__(self, name)
File ~\anaconda3\Lib\site-packages\pandas\core\accessor.py:182, in CachedAccessor.__get__(self, obj, cls)
179 if obj is None:
180 # we're accessing the attribute of the class, i.e., Dataset.geo
181 return self._accessor
--> 182 accessor_obj = self._accessor(obj)
183 # Replace the property with the accessor object. Inspired by:
184 # www.pydanny.com/cached-property.html
185 # We need to use object.__setattr__ because we overwrite __setattr__ on
186 # NDFrame
187 object.__setattr__(obj, self._name, accessor_obj)
File ~\anaconda3\Lib\site-packages\pandas\core\strings\accessor.py:181, in StringMethods.__init__(self, data)
178 def __init__(self, data) -> None:
179 from pandas.core.arrays.string_ import StringDtype
--> 181 self._inferred_dtype = self._validate(data)
182 self._is_categorical = is_categorical_dtype(data.dtype)
183 self._is_string = isinstance(data.dtype, StringDtype)
File ~\anaconda3\Lib\site-packages\pandas\core\strings\accessor.py:235, in StringMethods._validate(data)
232 inferred_dtype = lib.infer_dtype(values, skipna=True)
234 if inferred_dtype not in allowed_types:
--> 235 raise AttributeError("Can only use .str accessor with string values!")
236 return inferred_dtype
AttributeError: Can only use .str accessor with string values!
Your column may contain integer data; that's why you are getting the error.
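A sketch that reproduces the error and shows two ways around it, assuming the cause suggested above: the `.str` accessor only works on text, so it fails once the column already holds numbers (for example, if the cleaning cell was run a second time after the column had been converted). The data is made up.

```python
import pandas as pd

# The column already holds integers, e.g. because the cleaning cell ran twice
df = pd.DataFrame({"Inns": [120, 98, 45]})

try:
    df["Inns"].str.split(pat="*")
except AttributeError as err:
    print("Reproduced:", err)

# Option 1: only run the string cleanup while the column is not yet numeric
if not pd.api.types.is_numeric_dtype(df["Inns"]):
    df["Inns"] = df["Inns"].str.split(pat="*").str[0]

# Option 2: cast to text first so .str always works, then convert back to int
df["Inns"] = df["Inns"].astype(str).str.split(pat="*").str[0].astype(int)
```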
* in the HS column means that the player was not out in that match.
Very helpful.
No problem
Why didn't we use SQL after the types were changed?
Good work, keep it up.
Thank you! I just uploaded a new video
How can I download this data?
Thank you, sir.
Np
The star in the Highest Score column means they were not out until the end of the match. Great tutorial, Ryan. Would it be possible for you to attach the notebook file here?
Thank you and I can look at adding the code to Github this weekend
Where can I get the notes?
Can you share the notebook used in this tutorial? @RyanNolanData
I need to make a website article on this. It’ll have the code in there
How am I supposed to know what all the abbreviated column names stand for, the way you just named them???
link
It's a dictionary, right? Not a list.
#rename multiple columns in a dictionary
19:09
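The `{old: new}` braces do make it a dictionary: in `rename(columns=...)` each key is an existing column name and each value is its replacement, and any column not listed keeps its name. (Assigning a plain list to `df.columns` is the list-based alternative, but it must name every column in order.) The names below are illustrative, not necessarily the tutorial's exact ones.

```python
import pandas as pd

df = pd.DataFrame({"Mat": [10], "Inns": [18], "NO": [3], "HS": ["269*"]})

# rename takes a dict mapping old name -> new name; unlisted columns are untouched
df = df.rename(columns={
    "Mat": "Matches",
    "Inns": "Innings",
    "NO": "Not_Outs",
    "HS": "Highest_Inns_Score",
})
```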
Star * means the batsman was not out 😊
I appreciate it. Didn’t know
@@RyanAndMattDataScience, yes, * means the batsman was not out, but it won't affect any calculations. Great work indeed.
👏👏👏❤❤
Headley @4 min mark 😂😁
Haha one day I’ll buy your dup
lol, doing it in Excel is better lol
Nope