I have to admit it was way beyond what I expected... Thanks...
thanks a lot!
excellent!
Thank you!
Thank You Very Much :)
Thanks Elfrid! :)
Hey, very good video, thank you. One question: isin is used for exact matches, so is there a way to match partial matches? Using your example, you put 'Health', 'Biology', 'Life Science'. Suppose the data frame contains some bad values like 'Life.Science', or two categories that contain 'Life'. How would you do that? Like the filter option in Excel. Thanks!
Hi Jonathan! Glad you found it useful. Great question!
For this purpose, you can use the .str.contains() method. You can use regex to account for capitalization, or you can convert the column to lower case. So for example, you could write:
df[df['column_name'].str.lower().str.contains('health|biology|life')]
This would search for any rows containing health, biology or life.
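Here is a minimal sketch of that approach. The sample values are made up for illustration, and `column_name` is just a placeholder for your own column. It shows both options mentioned above: lower-casing the column first, or passing `case=False` so the match ignores capitalization.

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "column_name": ["Health", "Biology", "Life Science",
                    "Life.Science", "Engineering", None],
})

# The alternatives go inside ONE regex string, separated by |
pattern = "health|biology|life"

# Option 1: lower-case the column, then match
# na=False treats missing values as "no match" instead of NaN
mask_lower = df["column_name"].str.lower().str.contains(pattern, na=False)

# Option 2: keep the column as-is and ignore case in the match
mask_nocase = df["column_name"].str.contains(pattern, case=False, na=False)

matches = df[mask_lower]
print(matches)
```

Both masks select the same rows here, so either option should work. The `na=False` argument is only needed if the column can contain missing values.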
Hope this helps!
Nik
@datagy Helps a lot. In fact I did that, but I used 'health' | 'biology' instead of putting it all in one string. That was my big mistake that I couldn't figure out, and you solved it just like that. Much appreciated, Nik!!!
Happy to help! Glad it’s working :)
God Bless u, Keep up buddy ♥
Thanks! Glad you enjoyed it!
Cool
Thanks!
Challenge answer is df[(df["Major_category"].isin("Biology and Science", "Health")) & (df["WomenShare"] > 0.5) ]. Right?
Thanks for your comment! You're very close: you'll want to pass the values to .isin() as a single list (and note the column is named ShareWomen):
df[(df["Major_category"].isin(["Biology and Science", "Health"])) & (df["ShareWomen"] > 0.5)]
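To see the difference in action, here is a small sketch using made-up rows (not the real dataset) with the same two column names. It splits the filter into named masks, which is only one way to write it.

```python
import pandas as pd

# Made-up rows for illustration; the real dataset has more columns
df = pd.DataFrame({
    "Major_category": ["Health", "Engineering", "Biology and Science", "Health"],
    "ShareWomen": [0.9, 0.2, 0.6, 0.4],
})

# .isin() takes ONE list-like argument holding all the allowed values
in_categories = df["Major_category"].isin(["Biology and Science", "Health"])
mostly_women = df["ShareWomen"] > 0.5

# Combine the two boolean masks with & (element-wise AND)
result = df[in_categories & mostly_women]
print(result)
```

Passing the values as separate arguments, as in .isin("Biology and Science", "Health"), raises a TypeError because .isin() accepts only a single argument.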
Where can we get the dataset?
Hey Rin! You can download it here: github.com/fivethirtyeight/data/raw/master/college-majors/women-stem.csv
thanks
The content is great. The vocal fry, on the other hand, just keeps me distracted...
Your videos are really helping me! - I tried to shoot you an email but nik@datagy.io returned an error, just a heads up
Ah thank you! I’ll fix that shortly. Nik@datagy.ca should work.