Man....the quality you are providing free of any penny, is amazing. Salute to your contribution to community man
Thanks! If this tutorial is half of the quality compared to the latest SQL tutorial, then it is really worth watching 👌
so excited for this. I'm going to kill my procrastination and i will finish this :) thank you Alex.
@2:02:39 I believe it is best practice to avoid using for loops, since pandas operations are built on NumPy and are vectorized. You can do something like df = df.loc[df['Do_Not_Contact'] != 'Y'] to filter out the Y's, and then set that whole column to N with df['Do_Not_Contact'] = 'N'
You're right - definitely could have done it that way
you sir are a GEM !! thank you so much
Great stuff Alex! Thanks for sharing
For merge, I use left_on and right_on when the columns that I am merging on have different names in the two tables.
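A small sketch of merging on differently named key columns. The two tables and their column names are hypothetical, chosen only to show left_on and right_on.

```python
import pandas as pd

# Hypothetical tables: the key is called customer_id in one and cust_id in the other.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3], "amount": [10.0, 5.0, 7.5]})

# left_on names the key in the left table, right_on the key in the right table.
merged = customers.merge(orders, left_on="customer_id", right_on="cust_id", how="inner")

print(merged)
```

Both key columns are kept in the result, so one of them can be dropped afterwards if it is not needed.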
Thanks for sharing the knowledge, sir. Ever since I started watching your JavaScript tutorial, it has been understandable and clear.
Thank you so much! This is a really useful lesson!
It is needed ❤
Thanks!
Another save in my coding playlist
Now do it again, but for Polars - the superior dataframe library! It's WAY faster, can handle WAY more data, uses WAY less memory, the API is MUCH cleaner (i.e. more readable) and I truly believe it's the future of dataframe libraries.
I say that after using Pandas for 2 years, and Polars for 2 months. No more abusing the index, when you really just want to do a group_by.
We need matplotlib and seaborn, the ones data analysts actually use. Can you do that?
Alex, did you add any new topics in this video, or is it the same as the one in the bootcamp? @Alex
Same all put into one video
1:44:42
3 rows which have the number in the format 0000000000 got NaN values. Why?
And how do I fix it?
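One possible cause, as an assumption about the data rather than something confirmed in the thread: if the all-digit phone numbers were read in as integers instead of strings, the .str methods return NaN for those rows. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical column mixing an integer with formatted strings.
phones = pd.Series([1234567890, "123-456-7890", "123/456/7890"], dtype=object)

# .str methods return NaN for elements that are not strings, so the integer row is lost.
broken = phones.str.replace(r"[^0-9]", "", regex=True)

# Converting every value to a string first keeps the all-digit rows.
fixed = phones.astype(str).str.replace(r"[^0-9]", "", regex=True)

print(broken)
print(fixed)
```

Note that astype(str) also turns real missing values into the text 'nan', so those rows would still need to be handled separately.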
I downloaded this zip file from GitHub, but when I copy the path into Python it returns a not found error. How do I do it?
Create a folder and keep the xlsx file and the ipynb in the same folder. Right-click the file you want to read, copy its path, and paste it into df=pd.read_csv(r"file path") (use pd.read_excel instead if the file is an .xlsx). Now read it. If that doesn't work, try copying the path from the folder and adding \file name after it. I hope this works.
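A rough sketch of checking the path before reading, which can make a not found error easier to track down. The folder and file name here are placeholders: a temporary folder stands in for the folder that holds your notebook, and a tiny CSV is written only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the folder that holds your notebook and data file.
folder = Path(tempfile.mkdtemp())

# Hypothetical file name; a tiny file is written here only so the sketch runs.
data_path = folder / "example_data.csv"
pd.DataFrame({"a": [1, 2]}).to_csv(data_path, index=False)

# Check that the file exists before reading, and print the full path if it does not.
if data_path.exists():
    df = pd.read_csv(data_path)
else:
    print(f"Not found: {data_path}")
```

Building the path with pathlib avoids typing backslashes by hand. A zip file downloaded from GitHub also has to be extracted before the files inside it can be read this way.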
hi
Does this work in a notebook in Microsoft Fabric?
Thank you for the video
Hi all, I need help with the GroupBy section. I'm getting an error while applying an aggregator on group_by_frame.
group_by_frame = df.groupby('Base Flavor') --> ran successfully
group_by_frame.mean() --> gave error
TypeError: Could not convert ChocolateRocky RoadChocolte Fudge Brownie to numeric
I also got an error, but I tried this and it worked. I am not sure how it works, though. Try placing the name of a column with integers in the brackets. So I used:
group_by_frame.mean('Flavor Rating')
'Flavor Rating' contains ints. You can use any other column which contains integers/numbers.
With this I was able to get the mean values.
@elhyjhayqulchat4049 In actuality, the argument you are passing goes into the numeric_only parameter, which takes a boolean value. Thus the "Flavor Rating" value is converted to a boolean value and the mean method is executed.
@elhyjhayqulchat4049 You can also use this if you are getting the error:
group_by_frame.mean(numeric_only=True)
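A minimal sketch of the fix, using made-up rows. Only the 'Base Flavor' and 'Flavor Rating' column names come from the thread; the values are hypothetical.

```python
import pandas as pd

# Hypothetical ice cream data with one text column and one numeric column.
df = pd.DataFrame({
    "Flavor": ["Chocolate", "Rocky Road", "Vanilla"],
    "Base Flavor": ["Chocolate", "Chocolate", "Vanilla"],
    "Flavor Rating": [8.0, 9.0, 6.0],
})

group_by_frame = df.groupby("Base Flavor")

# numeric_only=True skips text columns such as Flavor instead of trying to average them.
means = group_by_frame.mean(numeric_only=True)

# Selecting the numeric column first is another way to avoid the error.
rating_means = group_by_frame["Flavor Rating"].mean()

print(means)
```

In recent pandas versions, calling mean() without numeric_only=True raises a TypeError when the frame has text columns, which would explain why the same line runs in an older notebook.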
@1:48:55 Formatting the phone number with the lambda function does NOT work; we first have to handle phone numbers that are missing or not 10 digits...
CODE:
import numpy as np
import pandas as pd

# Define a function to format phone numbers
def format_phone_number(x):
    # x arrives as a string, so a missing value shows up as 'nan' and fails the length check
    if pd.isna(x) or len(x) != 10:
        return np.nan  # Return NaN if the phone number is missing or not 10 digits
    return x[0:3] + '-' + x[3:6] + '-' + x[6:10]

# Apply the formatting function
df3['Phone_Number'] = df3['Phone_Number'].apply(lambda x: format_phone_number(str(x)))
print(df3)
Courtesy of Copilot
Thank you @AlexTheAnalyst for the video!
Hey thank you for making the best content
Make a video where you fetch data of the Olympic 2024 medalists using web scraping and display it on the frontend using Flask or Streamlit, with a feature for filtering as well. This project will give many ideas, and there isn't a video like this on YouTube.
Hello Alex. This line is not working: **fl.groupby('Base Flavor').mean()**.
I see the error TypeError: agg function failed [how->mean,dtype->object]. But it is working in your Jupyter?
Use group_by_frame.mean(numeric_only=True)
@nerads Thank you very much
2:03:30
This will work for turning 'NNN' into 'N':
df['Do_Not_Contact']=df['Do_Not_Contact'].str.strip(' ').replace(' ', 'N')