Python Pandas Tutorial (Part 8): Grouping and Aggregating - Analyzing and Exploring Your Data

Corey Schafer

439K views

Published: 26 Dec 2024

Comments • 731

@coreyms 4 years ago ⁺²⁰⁴
I hope everyone had a great week! We've got a long video this week, but we go over a lot of important topics about how to analyze data in Pandas. We will learn how to answer very interesting questions such as "What is the most popular social media site by country?". I put timestamps together for this video so that you all can skip around if you need to go back and watch a specific section. Here are those timestamps:
Aggregate Column - 2:00
Aggregate DataFrame - 3:55
Value Counts - 7:51
Grouping - 12:30
Multiple Aggregates on Group - 26:00
People Who Know Python By Country - 27:20
Practice Question - 34:20
Concat Series - 37:27
Have a great weekend everybody!
@calebmbugua745 4 years ago ⁺⁶
Thanks so much bro, much love from Kenya
@anonymous-kl1un 4 years ago
Hey, is this series gonna continue?
@anonymous-kl1un 4 years ago ⁺¹
Can you explain all the types of joins?
@anonymous-kl1un 4 years ago ⁺¹
And if possible, please explain multi-level indexing as well.
@JoshuaDHarvey 4 years ago
Corey, is it safe to assume that if you're coming from a SQL background, you can effectively use things like 'pd.concat()' to replace the various join workflows in SQL (left, right, inner, etc.), use the SQLAlchemy or pyodbc libraries to load the data, and then do all the calculations in Python that you would normally do in whatever SQL dialect?
@anubhavtomar1384 4 years ago ⁺⁵⁵⁰
3:10 median function
5:00 describe function
7:20 count()
8:05 value_counts()
12:51 grouping the data
14:39 groupby() function
16:07 get_group(), grabbing a specific group by name
17:30 doing same by using the filters
18:40 using value_counts on filters
20:20 value_counts() for groups
21:49 using loc to find for one country
23:40 percentage by using normalize
25:00 median by country group
26:13 agg function for multiple functions
27:30 using filtering to get python users by country
30:20 error on using same approach for groups
31:40 apply method to run that on group
35:40 finding the percentage of people using python in each country(group)
37:40 using concat for combining series in a dataframe
45:30 adding percentage column
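A minimal sketch of the main calls in the timestamp list above, run on a made-up toy DataFrame. The column values here are assumptions for illustration, not the survey data used in the video:

```python
import pandas as pd

# Toy stand-in for the survey data; all values are invented for illustration.
df = pd.DataFrame({
    "Country": ["India", "India", "India", "Japan", "Japan"],
    "SocialMedia": ["WhatsApp", "YouTube", "WhatsApp", "Twitter", "Twitter"],
    "ConvertedComp": [10000.0, 20000.0, 30000.0, 50000.0, 70000.0],
})

median_salary = df["ConvertedComp"].median()       # aggregate a single column
summary = df.describe()                            # count, mean, std, ... for numeric columns
site_counts = df["SocialMedia"].value_counts()     # how often each value occurs

country_grp = df.groupby("Country")                # split the rows by country
india = country_grp.get_group("India")             # the rows of one group
top_sites = country_grp["SocialMedia"].value_counts()                 # counts per group
shares = country_grp["SocialMedia"].value_counts(normalize=True)      # fractions per group
stats = country_grp["ConvertedComp"].agg(["median", "mean"])          # several aggregates at once
```

The grouped `value_counts()` results carry a two-level index (country, then site), so a single entry can be read with a tuple, for example `top_sites.loc[("India", "WhatsApp")]`.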
@afdqwfqwqwdfqwdawdas 4 years ago ⁺²
Thanks, this is very useful. The videos are already very concise and to the point, but if I am just looking for how to do a proper groupby quickly on my own dataset....
@umutdemir2762 4 years ago
Thanks a lot.
@ravishekharprakash4172 4 years ago
@afdqwfqwqwdfqwdawdas sure
@80expertube 4 years ago ⁺³
FYI, the percentage problem can be solved alternatively as follows: country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count())
@sayarmandal1885 4 years ago
@80expertube It throws a RuntimeWarning
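The warning most likely comes from a country where nobody answered the language question, so that both the sum and `x.count()` are zero and the division is 0/0. A hedged sketch of one way to guard against that; the helper name `pct_knows_python` and the toy data are assumptions, not from the video:

```python
import pandas as pd

def pct_knows_python(langs: pd.Series) -> float:
    """Percentage of respondents who answered the question and listed Python."""
    answered = langs.count()          # non-missing answers only
    if answered == 0:
        return float("nan")           # nobody answered: no meaningful percentage
    knows = langs.str.contains("Python", na=False).sum()
    return 100 * knows / answered

# Toy data: country "B" has no answers at all.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "B"],
    "LanguageWorkedWith": ["Python;SQL", "Java", None, None],
})
result = df.groupby("Country")["LanguageWorkedWith"].apply(pct_knows_python)
```

Returning NaN for an empty group is a design choice; returning 0 would also silence the warning but would claim that 0% of the country knows Python when there is simply no data.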
@parthrawri3001 4 years ago ⁺²⁹⁹
I love the fact that there are no ads interrupting in the middle. So thoughtful. ❤️
@coreyms 4 years ago ⁺²⁰⁴
Yeah, I didn’t want them to ruin the flow of the videos. Glad you noticed :)
@parthrawri3001 4 years ago ⁺¹⁸
Corey Schafer OMG! Your reply just made my day!
@livingwithlinlin3122 4 years ago ⁺¹⁰
@coreyms Thank you so much for doing this. You are such a considerate person with a big heart.
@JoshKonoff1 3 years ago ⁺⁵
Corey, do you have a Patreon page? Thank you for your exceptional videos; a huge help for me and so many people!
@kylebeckhorn885 4 years ago ⁺²⁰⁷
Yes please, do a video on the topic of MULTIPLE INDEXING!!
@j3553hh 6 months ago ⁺¹
I would pay to see Corey's tutorial on this. Every time I encounter a multi-index, I'm on stack overflow. It just doesn't seem to stick.
@pewolo 3 years ago ⁺⁴²
Let's all admit that this dude is a hard working man and his work is just a wow!
I've been following him for quite some time now and I am always impressed by how thoughtful, tactical and clear his explanation is in every tutorial he makes.
Hat off to you, dude!
@zhenpan2048 1 year ago ⁺¹⁰
numeric_columns = df.select_dtypes(include="number")
medians = numeric_columns.median()
print(medians)
# This is a way of getting the medians of the numeric columns. When I used df.median(), it gave me a ValueError saying it could not convert string to float: "I am not a student who is learning to code". Thanks for the great work. I learn more from you than from my professors. Thank you so much for your great efforts! 😎
@heretolearndshare 1 year ago
You saved my learning session, thanks!
@giovannimantovani795 10 months ago
Thank you bro
@salehabdullahi9356 10 months ago
Thank you, you saved me a lot of time.
@mn4769 8 months ago
I found that you can shorten it by writing numeric_columns.median()
@sick-ol3jd 7 months ago
Thanks man
@jongyoonsohn8559 4 years ago ⁺⁸⁹
I'd like to share my solution to the practice question.
ctr_knows_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python', na=False).value_counts(normalize=True))
ctr_knows_python.rename({False:'Don\'t know', True:'I know'}, inplace=True)
ctr_knows_python
Hope this helps too!
@coreyms 4 years ago ⁺¹³
Nice!
@moushumitamanna 4 years ago ⁺¹
Hi, can you please explain what "na=False" means here and why we have to put it in the code? Thanks in advance.
@tplano3794 4 years ago ⁺¹
@moushumitamanna not applicable
@moushumitamanna 4 years ago
@tplano3794 Thanks. But why should we put na=False in this code?
@tplano3794 4 years ago ⁺³
@moushumitamanna In a column which is expected to have numbers, NA does not make sense, so we filter out these values. Also, if you run any functions (mean, median) you may otherwise run into errors.
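For readers with the same question: in `Series.str.contains`, the `na` argument is the value filled in for elements that are missing (NaN/None), such as respondents who skipped the question. A small sketch on made-up data:

```python
import pandas as pd

# Toy answers; None stands for a respondent who skipped the question.
langs = pd.Series(["Python;SQL", "Java", None])

# Without na=..., the missing answer typically stays missing in the result.
with_na = langs.str.contains("Python")

# With na=False, the missing answer is reported as False ("does not list Python").
no_na = langs.str.contains("Python", na=False)

# A mask without missing values is safe to use for filtering.
filtered = langs[no_na]
```

Without `na=False`, using the result as a filter on an object column can fail because the mask contains a missing value, which is presumably why the video's filters include it.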
@felipegomez3047 3 years ago ⁺⁸²
I'd like to share my solution to the practice question:
country_grp['LanguageWorkedWith'].apply( lambda x: x.str.contains('Python').sum() / len(x) * 100 )
As you can see, it's just as simple as adding " / len(x) * 100 " in the lambda function, where len(x) is the total number of users for each country.
@ironpolux 3 years ago
como se te ocurrio esto? ahem I mean, How did u come up with this? well played
@BCS_FahadAhmad 3 years ago ⁺⁵
I guess x.count() in place of len(x) makes more sense, since there can be people who did not answer the language question (I highly doubt it XD)
@nicocilia5871 3 years ago
@BCS_FahadAhmad I think x.count() will not count NaN, so I think len is better if you want to include people who skipped that question. I am assuming that was an option.
@gurjotsingh8631 2 years ago ⁺³
I was thinking the same, so I downloaded his repository and tried it, and it works. Came here to comment and saw your comment. So I just wasted 5-10 minutes of my day. Whatever. Hallelujah.
@kingler199 2 years ago
Damn well played
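The len(x) versus x.count() question in this thread comes down to which denominator is wanted. A small sketch on a made-up group of four respondents, one of whom skipped the question:

```python
import pandas as pd

# Toy group: four respondents, one skipped the language question.
langs = pd.Series(["Python;SQL", "Java", None, "Python"])

total_rows = len(langs)       # every respondent in the group, including the skipped one
answered = langs.count()      # only respondents with a non-missing answer
knows = langs.str.contains("Python", na=False).sum()

pct_of_all = 100 * knows / total_rows       # skipped answers count as "does not know"
pct_of_answered = 100 * knows / answered    # skipped answers are left out entirely
```

Neither is wrong; they answer slightly different questions, which also explains why commenters get slightly different percentages for the same country.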
@valerioharvey7289 3 months ago
other gurus are just like "here's the code for this, copy it and don't ask why" but you are the only one who shows how things work. Thank you very much
@prakhararora8981 1 year ago ⁺²⁴
Hey, if your df.median() doesn't work and you're getting a TypeError or ValueError, you can do df.median(numeric_only=True)
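Some background on why this is needed: in recent pandas versions (2.0 and later, as far as I know) `DataFrame.median()` no longer silently skips non-numeric columns, so a frame with string columns raises an error. A sketch of the two workarounds mentioned in the comments, on made-up data:

```python
import pandas as pd

# Toy frame mixing a text column with numeric columns; values are invented.
df = pd.DataFrame({
    "Country": ["India", "Japan", "Japan"],
    "Age": [25, 31, 40],
    "ConvertedComp": [10000.0, 50000.0, 70000.0],
})

# Option 1: ask median() to consider numeric columns only.
medians_a = df.median(numeric_only=True)

# Option 2: select the numeric columns first, then aggregate.
medians_b = df.select_dtypes(include="number").median()
```

Both give the same Series here; the second form is handy when the numeric subset is reused for several aggregations.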
@DilpreetSingh02 8 months ago
Thanks man
@atienograce2520 7 months ago
Thanks a bunch!
@anre3821 5 months ago
I was looking for advice on this, thanks a lot!
@greentree9751 3 months ago
thanks a lot
@amir_forooghi 4 years ago ⁺⁷
YESSSS !!! Corey's video for groupby. I press like before I watch it. Groupby is just a superpower. Thank you for this awesome series Corey. You are the best.
@elnazdehkharghani6121 4 years ago ⁺⁴
You make all your subscribers happy with just uploading your videos !!! Thanks, Corey
@coreyms 4 years ago ⁺⁴
Thank you all for watching!
@diegoalarcon6062 4 years ago ⁺⁹
I don't care if some of your videos are long, in other channels they're just redundant but that's not your case! If you start doing short videos we may be losing all that valuable information that you provide to us. So far, this is the best Python channel I've seen. Greetings from Medellín, Colombia.
@jorgetiz99 4 years ago ⁺⁵
This has to be one of the best videos on youtube about Pandas, thank you so much. Greetings from Perú.
@milrione8425 4 years ago ⁺³
I love how you are just using the same data throughout the whole series. Thank you so much, Corey!
@jiangxu3895 4 years ago ⁺⁶
I just discovered that your way of teaching is to tell not only how to do it but also why this is how to do it. Thumbs up!!
@YeekyYeeky 4 years ago ⁺⁵
One of the best things that happens to me when I wake up (I am on the opposite side of the world to Corey Schafer) is finding that Corey has just uploaded another Pandas tutorial video. Thank you!
@Schmidt3k 4 years ago ⁺³¹
For your practise question, use .mean() instead of .sum()
.mean() on a Series of bool will give you the fractions in a quick and easy way. Multiply by 100 for %.
edit: As per discussion below, .mean() ignores NA values whereas Corey's approach treats NA as '0'. An alternative is thus:
mygroups['LanguageWorkedWith'].apply(lambda x:x.str.contains('Python').fillna(0).mean())
Now, the results should be equal to Corey's.
@davidsp7949 4 years ago
It looks like a nice solution, but the numbers from Corey's video are slightly different from those with .mean() and I do not know why.
For example:
for Afghanistan PctKnowsPython is 18.181818, with mean it is 20.512821
for Albania PctKnowsPython is 26.744186, with mean it is 27.710843
Does anyone know why?
@sunramaroc 4 years ago ⁺²
@davidsp7949 Yeah, I had the same doubt. I guess that's due to the fact that mean() takes into consideration only the respondents who actually answered the question, while Corey's sum() approach takes all respondents, even the ones with NaN for the question. So Corey's solution is the percentage over all respondents, and the mean() is over only the ones who answered this question.
@jasleung2932 4 years ago ⁺²
.mean() neglects those NaN responses, while if you use x.str.contains('Python').sum()/x.size instead, it counts those NaN as "not a Python user", which is what Corey was doing
@fjramons 2 years ago ⁺⁴
Well played. For me your solution is quite elegant.
BTW, in case you wanted to treat NA as zeros (to get the same results from the video), you can simply use .mean() with its 'skipna' option disabled. This would make:
mygroups['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').mean(skipna = False))
@mahanansari6152 2 years ago ⁺³
I just changed sum to value_counts(normalize=True) and it worked
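To make the difference discussed in this thread concrete, here is a hedged sketch of the two `.mean()` variants on made-up data. It uses `na=False` and `dropna()` rather than `skipna=False`, because as far as I can tell `skipna=False` propagates NaN instead of treating it as zero:

```python
import pandas as pd

# Toy data: one country, four respondents, one skipped the language question.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "A"],
    "LanguageWorkedWith": ["Python;SQL", "Java", None, "Python"],
})
grp = df.groupby("Country")["LanguageWorkedWith"]

# Missing answers counted as "does not know Python": denominator is every respondent.
pct_all = grp.apply(lambda x: x.str.contains("Python", na=False).mean() * 100)

# Missing answers excluded: denominator is only those who answered.
pct_answered = grp.apply(lambda x: x.dropna().str.contains("Python").mean() * 100)
```

The first variant should match the sum-divided-by-respondents approach from the video; the second matches what a plain `.mean()` over answered rows reports.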
@Davidkiania 4 years ago ⁺¹⁹
Best video in the series. Loving them, and I normally can’t wait for the next.
@antonyjohne 4 years ago ⁺³⁸
Hey Corey! Thanks a million for the Pandas Series. As always, very intuitive and easy to follow.
Now that you've taught Matplotlib and Pandas, would love to see a new Numpy series in order to complete the Data Science trinity. Please consider adding a Numpy Series.
@panpan4433 3 years ago ⁺¹
For your exercise (what % knows Python), I divided the sum in the lambda function by x.count() and then multiplied by 100:
country_group['LanguageWorkedWith'].apply(lambda x: 100 * x.str.contains('Python').sum() / x.count())
Thanks for the free content, awesome
@priyavratchaudhary9211 10 months ago ⁺¹
Use len(x) instead of x.count(), because the count() function excludes respondents who did not list any language.
@merajajam425 4 years ago ⁺⁸⁹
The level of my programming in Python has been substantially improved since I have started watching your great videos. Many thanks, Corey. Would you please prepare some videos regarding the networkx module as well?
@fvdvhome 1 year ago
Mr. Schafer, I am so happy I found your teaching. I have been on a journey to become a data analyst, and after completing the Google Analytics Course , I realized that I needed to learn much more. I am currently finishing a Python Course through Coursera offered by IBM.
Not every professional, no matter how good they are, has the natural ability to teach. Your method and technique are so amazing and helped me to overcome some of the confusion I had with coding in Python. I learned so much from just this video alone.
I will definitely visit the site you referenced, and look forward to learning more from your videos.
Thank you so much!
@gregoryogunna9527 1 year ago
American?
@walternyc 3 years ago
Working on a project evaluating an employee survey and this is just what the doctor ordered. Thanks! One of the best channels on YouTube for data analysis, hands down
@codewithluq 4 years ago ⁺²
Corey Again. Very fantastic tutor. I press the like button before I watch.
@shikharsaxena9989 4 years ago ⁺³
After this lecture I started loving the complex coding of pandas and matplotlib. You really are an amazing teacher
@saiakhil4751 3 years ago
I signed up for brilliant org just for Corey Schafer. Thanks for sponsoring him.
@WrongSmth 4 years ago
Hey, Corey. I'm a network engineer and I'm learning pandas to be able to do some packet analysis and your videos really help me a bunch!
This is my solution for the coding problem from the video. Hope it helps!
know_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
total_respondents = country_grp['LanguageWorkedWith'].apply(len)
know_python / total_respondents
@vagelisilias 3 years ago
I am a GIS student and I want to thank you because I'm doing my last assignment for university using Geopandas, matplotlib, pandas, cartopy and so forth, and you helped so much with your videos. I have built a nice map and produced different tables with my data. Thank God you are out there sharing your knowledge for free
@brewtalxxx 2 years ago ⁺¹
Thank you so much for this video. I learnt way more from this than the many hours I spent sitting in class listening to a teacher who just wanted to end the lesson early or have long lunch breaks. This is really precious. And thanks for the reassurance that if I find this difficult, there's nothing wrong with me LOL.
@Prasanna_Rahavendra 4 years ago ⁺⁵
Hey Corey! For the question you gave: The percentage of people by country who use python. There is an efficient solution too (Without creating a separate dataframe).
country_grp["LanguageWorkedWith"].apply(lambda x:(x.str.contains("Python").sum()/len(x))*100)
Actually what I am doing here is, in the lambda function, at the return part, I divide the No. of python users by the length of the given series and then multiply it by 100. This gives the percent of python users in each country. This approach might be a bit code efficient but can be a bit confusing for some.
@maheshmmmec 4 years ago ⁺²
len(x) might not give you the total respondents, since it is a series on LanguageWorkedWith and people might have skipped.
@gonzalezgenaro 4 years ago ⁺¹
@maheshmmmec Correct
@stevensukenik254 3 years ago ⁺¹
You can use len(x) in the lambda, it will include the NA values in the series.
You cannot use count(x) because it skips NA. But you can use value_counts(x).
If you run the following code, it will verify that Prasanna's solution is correct:
len_total = country_grp["LanguageWorkedWith"].apply(lambda x: len(x)).sort_values(ascending=False)
us_no_answer = country_grp["LanguageWorkedWith"].apply(lambda x: x.isna().sum())
us_answer = country_grp["LanguageWorkedWith"].apply(lambda x: x.notna().sum())
df_counts = df['Country'].value_counts()
df_counts = pd.concat([df_counts,len_total,us_answer,us_no_answer],axis='columns')
df_counts.columns=['df_value_counts','lambda_len','user_respond','user_did_not_respond']
df_counts
produces:
               df_value_counts  lambda_len  user_respond  user_did_not_respond
United States            20949       20949         20769                   180
India                     9061        9061          8844                   217
@jeremine9259 1 year ago
Just want to share here my solution for the practice question (but with the survey of 2022):
---
country_group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
---
And also give thanks for your wonderful videos, Corey!
It's been 3 years and they're still among the best tutorials.
@keepcontinue 1 year ago ⁺¹
I thought the same thing
@WillMoody-crmstorm 3 months ago
Perfect timing. Thank you for all the effort put into these videos. I've done that last jibe crash so many times, with the feet just off centre
@LibardoLambrano 4 years ago ⁺⁴
Thanks Corey for sharing these videos. Pretty clear explanations. You are a great teacher.
@Blueshockful 4 years ago ⁺³
I'm browsing through some of the videos to brush up on Python, and this is the first Python video that didn't get me bored. Concise and brilliant. Love your videos! Keep up the good work :)
@mohammadghouse25 4 years ago
Best Pandas playlist on youtube. One point solution for python learners
@sayantanchakraborty75 4 years ago ⁺¹
Best videos on pandas on YouTube by Corey Sir. Loving them and normally wait for the next videos. Lots of love for you from India.
@yuewang9623 4 years ago ⁺³
Every time I see a new post, I click the 'like' button before watching :D
@tolex3 4 years ago ⁺²
I've been doing data analysis using Python & Pandas for a few years now. Still, I'm picking up new things from your videos. Very clearly explained! Thank you!
@chukwuemekamusic663 22 days ago
Thanks!
@coreyms 22 days ago
Thank you!
@qigangdeng8636 4 years ago
Hey, Corey, I see many people gave their own answers, which are wonderful. So I want to give mine here, which looks more like a beginner's answer:
#create a new column called: 'Python use'
df['Python_use'] = (
df['LanguageWorkedWith'].str.contains('Python')
)
#.value_counts() the 'Python use' column as it is a boolean type:
country_grp = df.groupby(['Country'])
Python = country_grp['Python_use'].value_counts(normalize=True)
Thanks for your great Lectures! I watched from part 1. I am your big fan now.
@markkennedy9767 4 months ago
The groupby explanation comparing it to a filter is really good.
@deniscampana8345 4 years ago
Thanks so much Corey! It's clearly impossible not to understand what you explain in all your videos: it's fluid, straightforward, crystal clear! And moreover your English: Whaoooo ... Congratulations!! I wonder if I've learned more Pandas or English!!
200% great !!
@bhargav1811 3 years ago
Every second of your Python videos is really worth it!!!
@nicholaspolino2657 4 years ago ⁺⁴
LOL @ "If I did this correctly, and it's definitely possible I made a mistake." Happy I found these videos, thanks.
@vijaybhatt4347 3 years ago
You're a genius
Love from India
You mentioned India in your tutorial
Love this gesture 🇺🇸🇮🇳
@omj7113 1 year ago
Thanks a lot for your teaching! Here is my solution to the question at the end of the video:
# group_object['column'] is a Series object, so the input of the function is a Series, and the output value of the function is a float
def percent_know_python_each_country(countrySeries):
    num_know_python = countrySeries.str.contains('Python').sum()
    num_all = len(countrySeries)
    percent = round((num_know_python / num_all * 100), 2)
    return percent
country_group['LanguageWorkedWith'].apply(percent_know_python_each_country).sort_values(ascending=False).head(30)
@finncollins5696 1 year ago
Thanks so much for this series. I started from the first video two weeks ago and am now on the 8th. This series has helped me make a lot of progress so far. Thanks so much. May God bless you. Love from Sri Lanka...
@ahmedhawater7522 4 years ago ⁺³
Man, you are one of the best teachers who ever taught me something, much love and support ♥️
@stressfreetrading1341 4 years ago ⁺¹¹
NAMASTE!! Corey Schafer.. Love From India
@ashkanfarahani6532 4 years ago ⁺²⁶
Hi Corey. I think this might be a relevant simpler approach for getting percentage. I used value_counts(normalize=True) instead of sum.
df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
This of course returns both the percentage who know Python and the percentage who don't. So if we want the value for a specific country, for instance Japan, then:
df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True)).loc['Japan'][1]
@RegularDude95 1 year ago ⁺¹
I have a similar approach too. I am happy to see that I am not the only one who always looks for the easiest way =)))
@JahidHasan-Aneek 1 year ago
Instead of using value_counts(), use sum() in the second line. Then you'll get the appropriate answer.
df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
@abdulkadirguven1173 1 year ago ⁺¹
Great approach. Thanks for sharing
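One fragile spot in the value_counts(normalize=True) approach is picking out the "knows Python" share afterwards: positional access like `[1]` depends on ordering, and a country where nobody lists Python has no `True` entry at all. A hedged sketch of a more explicit variant on made-up data; the `PythonUse` column and its two labels are hypothetical names introduced for illustration:

```python
import numpy as np
import pandas as pd

# Toy data; in "Peru" nobody lists Python.
df = pd.DataFrame({
    "Country": ["Japan", "Japan", "Japan", "Japan", "Peru"],
    "LanguageWorkedWith": ["Python;SQL", "Java", None, "Python", "C"],
})

# Label each respondent explicitly instead of relying on True/False index keys.
knows = df["LanguageWorkedWith"].str.contains("Python", na=False)
df["PythonUse"] = np.where(knows, "knows", "does_not_know")

# Fractions per country, one column per label; absent labels become 0.
shares = (
    df.groupby("Country")["PythonUse"]
      .value_counts(normalize=True)
      .unstack(fill_value=0)
)
japan_share = shares.loc["Japan", "knows"]
```

With the labels as columns, a single country is read by name, and the `fill_value=0` step plays the role of the `try`/`except KeyError` some commenters use.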
@thotarohith2060 1 year ago
Here is my approach:
filt=df['LanguageWorkedWith'].str.contains('Python',na=False)
python_count=df.loc[filt]['Country'].value_counts()
python_count.rename('p_c',inplace=True)
python_count
--
total_count=country_grp['Country'].value_counts()
total_count.rename('t_c',inplace=True)
total_count
--
result_horizontal = pd.concat([total_count, python_count], axis=1)
import numpy as np
result_horizontal.replace({'p_c':np.nan},0,inplace=True)
result_horizontal['perc']=(result_horizontal['p_c']/result_horizontal['t_c'])*100
result_horizontal
@yosephkurabachew6539 11 months ago ⁺¹
Perfect content. One flaw is that you never explained what a "lambda" function is and went straight to using it in your previous videos. You did the same thing here. Now I have to study lambda first.
@lvcas9313 11 months ago
Yeah, I had to take a look into it. For anyone curious, a lambda and a "def ... return" function do the same thing, but lambdas are throwaway functions, while "def" assigns a name so the function can be applied further on in the code. The "def ... return" syntax is easier to read and cleaner than lambda.
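To illustrate the point about lambda versus def, here is a small sketch on made-up data showing the same per-group calculation written both ways. The function name `count_python` is an assumption chosen for this example:

```python
import pandas as pd

def count_python(series):
    """Named function: how many entries in the group mention Python."""
    return series.str.contains("Python", na=False).sum()

# Toy data for illustration.
df = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "LanguageWorkedWith": ["Python;SQL", "Java", "Python"],
})
grp = df.groupby("Country")["LanguageWorkedWith"]

# Passing the named function ...
with_def = grp.apply(count_python)

# ... is equivalent to writing the same expression inline as a lambda.
with_lambda = grp.apply(lambda s: s.str.contains("Python", na=False).sum())
```

The lambda is convenient for a one-off expression; the named function is easier to reuse, test, and document.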
@sayarmandal1885 4 years ago
Thanks, Corey. This is one of the most comprehensive pandas tutorials on YouTube. Love from India.
I also noticed a subtle issue. We are counting the respondents who filled in their Country, not those who filled in LanguageWorkedWith. Someone can fill in Country and not LanguageWorkedWith.
@MatthewFoulk 4 years ago ⁺²
Really appreciate the addition of practice problems. It helps me to grasp the material
@parsahosseini4241 4 years ago
47 minutes of a pure pandas tutorial from a god in python, man you're a hero🔥🔥
@fiefiego2298 2 years ago ⁺¹
Thank you Corey!! This is a wonderful pandas series!! You make the concepts so easy that even a Python beginner (that is me) without a programming background from college can understand!!
i'd like to share my solution as well: (since i don't know concat method, i calculate the answer first then convert them into a dataframe by dictionary)
know_py = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())
answer = df['Country'].value_counts()
per_cent = know_py/ answer
result = pd.DataFrame({'answer': answer, 'Python': know_py, 'percentage': per_cent})
i find you stop uploading new tutorials for a long time, hope everything goes well with you. and strongly looking forward to hearing from you soooooon!!
thank you & greeting at 2022 sep 7th :)
@alejandropereyra438 3 years ago
This video is so useful; the way Python simplifies these problems is so helpful. It is the best programming language. And the professor in this video is really a genius!!! Thanks.
@rukhan8900 3 years ago ⁺¹
The reassurance at the end was so appreciated as a beginner. Thank you for your help !!!
@bobchannell3553 4 years ago ⁺¹
Thanks for doing this video in a detailed way, like you always do. Just under an hour is a good length for a video like this. Thanks!
@RavenEX1980 3 years ago
Best tutorials ever. I have read a lot of books, but your technique is global and works best... keep up the good work @Corey Schafer
@dennisp5302 3 years ago
I just went through Part 8 a second time. Thanks a bunch!! I learned a lot.
@Benny-g8m 1 day ago
The 'dropna' parameter of value_counts is another parameter I explored while digging into the difference between my approach and Corey's, on top of the 'na' parameter of contains.
BTW, thanks a lot, Sir Corey, for all these helpful videos!
@mggarekar 4 years ago ⁺²
Nice video :) I liked the Q&A approach at the end where you left it open.
@joncochran9647 3 years ago
I've watched quite a variety of different data analysis tutorials and this one was easily one of the most engaging for me. Having interesting data really helps.
@aborucu 3 years ago ⁺¹
Perfect explanation. Making a convoluted yet so important concept crystal clear through step-by-step explaining and also giving connections to pandas object types. Cheers!
@Boat-xs8lm 7 months ago
Thank you, I am very lucky that I found your tutorials.
@manish_chandra 2 years ago
One of the best and most easily understandable videos on Pandas. Thank you for creating this!!
@anubhavrauniyar3192 2 years ago
We Love You Corey Schafer!!!!!!!!!! Lots of Love From India❤
@Imrannaseem818 4 years ago ⁺¹
Thanks Corey.
I waited for this video the whole week.
Great explanation
@bobchannell3553 4 years ago
This was a lot to learn in one video. That's why I went back and watched it again this week. At the end, I added something I think would be useful in what I do. I added a filter to select records where the number of respondents is >= 5.
filt = python_df['NumRespondents'] >= 5
python_df.loc[filt]
@yuanchima 4 years ago ⁺⁴
My solution to the practice question:
def pctKnowsPython(x):
    try:
        return x.str.contains('Python', na=False).value_counts(normalize=True).loc[True]
    except KeyError:
        return 0
pctKnowsPython = country_grp['LanguageWorkedWith'].apply(pctKnowsPython)
@anantharjun9662 1 year ago
Heyyy coreyyy I got drops in my eyes after watching the way you taught....you made my day❣️✨love you so much corey
@AugustoGeografo 4 years ago ⁺³
Always looking forward to your videos, Corey.
@way_to_be_analyst6042 3 years ago
I'm just diving into pandas and would like to say a GREAT THANK YOU for such a nice and detailed explanation.
Great job!
@rauberhozenplotz7009 4 years ago ⁺¹
Helps me to get into my PhD. Thanks a lot for uploading this!
@federicohan1458 4 years ago
I found it amusing that you explained what a percentage is after going over the apply & lambda methods, but that's exactly the thoroughness that makes your videos so loved :)
@jiangxu3895 4 years ago ⁺²
Very interesting tutorial. Thumbs up! Particularly at 33:30 !!
@denizcicek7333 3 years ago
You are just wonderful, it is so much fun watching your tutorials. I directly find the answers I need.
God bless you brother.
@ankitranjan7670 4 years ago ⁺⁴
We can do this in the lambda function for getting the percentage :
lambda x : x.str.contains ('Python').value_counts(normalize=True)
And I would love to thank you for all these videos!
@akashshelar4388 3 years ago
"a = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))"
This is what I did. Is it correct or not? I don't know.
@Yasharvl 4 years ago ⁺¹
Thanks Corey! This is pure gold!
@Tigrex281 4 years ago
Hey Corey,
first of all thank you very much for all those fantastic videos. I also have tried to answer the question of percentage knowing python for each country. I came up with following solution:
PrctKnowPython = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python', na=False).value_counts(normalize=True)).loc[:,True]
One advantage of this approach is, that you can just remove na=False and ignore NaN values in your data.
@СергейФролов-ъ5я 3 years ago
Corey, thank you very much for your free videos!
@pcordeirobr 3 years ago
Corey, the content of your videos is amazing. This tutorial in particular is sensational.
@srich-k 4 years ago ⁺³
A much simpler way of finding the percentage of people from each country who know python is by using the mean() function
Something like this :
country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').mean())
P.S: Corey's method produces a much more detailed version of this result
@buzz.b 2 years ago
Thank you for the last example (percent that knows python). It was great to see how the different methods learnt can come together in a practical example; this really helped consolidate the knowledge gained.
@ironpolux 3 years ago
Really enjoying this series, thank you Corey!
@Am-hsb 4 years ago ⁺¹
Thanks a lot Corey! Got to learn complex syntax in simple ways. You are an amazing teacher.
@turksonmichael1236 1 year ago
Thank you for this. I have a clearer understanding of pandas than before. Wish you the very best
@gwanghyeongim768 4 years ago
Hi. Again, thank you for all the good works you've done. Please keep it up. As for making ratio of python knowers by respondents of countries, I personally make the variable first, PctKnowsPython, and then rename it.
country_grp = df.groupby(['Country'])
numpy_country = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()).sort_values(ascending=False)  # number of people who know python by country
country_resp = df['Country'].value_counts().sort_values(ascending=False)  # number of respondents by country
pynum_by_con = (numpy_country / country_resp) * 100  # my version of PctKnowPython
py_df = pd.concat([country_resp, numpy_country, pynum_by_con], axis='columns')  # showing the dataframe which has three columns. The last column name is 0.
py_df.rename(columns={'Country': 'NumRespondents' , 'LanguageWorkedWith': 'NumPythonKnower', 0: 'PctKnowsPython'})  # showing the renamed dataframe.
py_df.rename(columns={'Country': 'NumRespondents' , 'LanguageWorkedWith': 'NumPythonKnower', 0: 'PctKnowsPython'}, inplace=True)  # applying the change.
Hope this may be helpful.
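A variation on the concat-then-rename steps above: naming each Series before concatenating avoids the unnamed column `0`. It also sidesteps a version difference, since in pandas 2.x (as far as I know) `value_counts()` names its result `count` rather than after the column, so renaming `'Country'` afterwards may not match anything. The data below is made up, and the column names follow the ones used in the video:

```python
import pandas as pd

# Toy data for illustration.
df = pd.DataFrame({
    "Country": ["India", "India", "Japan", "Japan", "Japan"],
    "LanguageWorkedWith": ["Python", "Java", "Python;C", None, "Python"],
})

# Number of respondents per country, named up front.
num_respondents = df.groupby("Country").size().rename("NumRespondents")

# Number of respondents per country who list Python (missing answers count as False).
num_knows = (
    df["LanguageWorkedWith"].str.contains("Python", na=False)
      .groupby(df["Country"]).sum()
      .rename("NumKnowsPython")
)

# Percentage, aligned on the country index.
pct = (100 * num_knows / num_respondents).rename("PctKnowsPython")

# Each Series contributes one column, labeled with its own name.
python_df = pd.concat([num_respondents, num_knows, pct], axis="columns")
```

Because all three Series share the country index, `pd.concat` lines them up row by row without any extra merge step.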
@AAND8805 3 years ago
I have been following your pandas series for the last 3 days and may complete it in 1 or 2 days max. I will come back to the series to revise it. Very well made series, keep up the good work! 😀
@jasonlmay 4 years ago ⁺⁵
You could also solve the question in the video by using a pandas pivot table. here is the code:
table = pd.pivot_table(df, index='Country', values=['Hobbyist', 'LanguageWorkedWith'],
aggfunc={'Hobbyist': 'count',
'LanguageWorkedWith': lambda x: x.str.contains('Python').sum()})
table['percent'] = (table['LanguageWorkedWith'] / table['Hobbyist']) * 100
table.sort_values(by='percent', ascending=False)
@coreyms 4 years ago ⁺⁴
Nice! I’ll be covering pivot tables in a future video
@akosasuke5128 2 years ago ⁺¹
Corey Schafer deserves a YouTube Teacher award
@antoniodefalco6179 3 years ago
You're an amazing teacher, man, thank you for this free content
@sumranms 4 years ago ⁺¹
I really like the way you speak. Your language is clearly understandable and you have a great accent. :)
@shanayasingh6864 3 years ago ⁺¹
I am a huge fan of your videos. Thanks a lot for making this wonderful tutorial😇
@saadchaudhry9110 1 year ago
You teach really well, I am learning a lot... Thanks. I have also learned other topics from your videos... I was stuck on opening and reading a csv file in Python until I saw your video and learned it... I am an absolute beginner... 😅 Thanks
@belleriveblvd 4 years ago
Corey, I learn a lot from your videos. But this one has been especially helpful. Thanks.
@charimuvilla8693 2 years ago
At 35:40 since you want to get the percentage of boolean values that are 1 you can pretty much just replace .sum() with .mean(). It's sum/length.
@jiangxu3895 4 years ago
For the question, I just tested this one and it works: just use value_counts(normalize=True) instead of sum()
country_group['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True))
@Terence818 4 years ago
Yes Corey, having a future video on multi-index will be very helpful!
