What about in a loop scenario, e.g., preparing train and test?
data = [train, test]
for df in data:
    df.drop(columns=['BLOCKID', 'SUMLEVEL', 'primary'], inplace=True)
Thanks
The Pandas core developers have said, for several years now, that using inplace=True is deprecated, provides no advantages, and will go away in a future version.
For that reason alone, I would avoid using it.
The example you gave could be handled with tuple unpacking and a list comprehension:
train, test = [df.drop(columns=['BLOCKID', 'SUMLEVEL', 'primary']) for df in [train, test]]
There are probably other, better ways to do it, but this strikes me as a pretty good and quick alternative.
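For anyone who wants to try this, here is a minimal runnable sketch of the list-comprehension approach. The two small DataFrames are hypothetical stand-ins for train and test; only the three column names come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the train and test frames in the question
train = pd.DataFrame({'BLOCKID': [1, 2], 'SUMLEVEL': [10, 20],
                      'primary': [True, False], 'income': [50_000, 62_000]})
test = pd.DataFrame({'BLOCKID': [3], 'SUMLEVEL': [30],
                     'primary': [False], 'income': [48_000]})

cols_to_drop = ['BLOCKID', 'SUMLEVEL', 'primary']

# drop() returns a new DataFrame; unpack the two results back into the same names
train, test = [df.drop(columns=cols_to_drop) for df in [train, test]]

print(train.columns.tolist())  # ['income']
print(test.columns.tolist())   # ['income']
```

Note that train and test are rebound to brand-new objects, so a list built earlier (like data in the question) still holds the old, unmodified frames.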
I stopped using inplace=True once I started doing method chaining. The other bonus is that you can play with the data without changing the original dataframe, then once everything is good during EDA, you can then overwrite the original dataframe later.
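To illustrate that workflow, here is a sketch using a small hypothetical frame and a hypothetical helper called tidy: the chain is previewed without touching the original, and then the same chain is committed by assigning the result back to the same name.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [3, None, 7]})

def tidy(frame):
    # Each method in the chain returns a new DataFrame; the input is never modified
    return (
        frame
        .dropna(subset=['score'])
        .rename(columns={'score': 'points'})
        .sort_values('points', ascending=False)
    )

# Exploration: look at the result while df itself stays untouched
preview = tidy(df)
print(preview)
print(df.shape)  # (3, 2): the original is unchanged

# Once the chain looks right, commit it by assigning back to the same name
df = tidy(df)
```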
I totally agree!
Can you point to where the Pandas core developers have mentioned that it does not save memory? I was under the illusion that it saves memory.
I've seen it in a bunch of places, but here's a core Pandas developer saying that it'll go away in Pandas 2.0: github.com/pandas-dev/pandas/issues/16529
That clearly didn't happen, but we should expect it to be the case in the near future.
Update: I didn't really answer your question, did I? Whoops! More below...
I've definitely seen the core developers say that you cannot know what's going on behind the scenes. It feels like inplace=True will save memory, but you cannot be sure of this, in any way. Even if you don't see a new data frame, one is almost certainly being created behind the scenes.
Here are some places where it is at least mentioned that you shouldn't use it and/or that there's more memory use than meets the eye:
• www.dataschool.io/future-of-pandas/#inplace
• github.com/pandas-dev/pandas/issues/48682
So... I didn't see a direct statement from the core developers. But it has been mentioned so often that I'm sure there is a source for this that I just didn't find.
@@ReuvenLerner Got it. There is an intent to remove inplace, expressed in various issues on GitHub.
Thanks for the detailed response.
Awesome advice. I'm working in Pandas to subset database records. I was considering inplace=True for memory efficiency. I'll assign back to the original variable for the reasons mentioned. Thanks for the tip.
Glad it helped!
That's not intuitively what a new Pandas user might expect inplace=True to do. System memory will definitely be a major concern for my work project that others will use on aged government computers. I've prototyped on a small scale but need to scale it up. Definitely helpful advice. Thank you!
I think it's pretty important to lead off with the fact that the Pandas development team is working to remove "inplace" support. Then mention chaining and no performance/memory improvement.
Sorry if I didn't make it clear enough -- inplace=True is a bad idea for a whole lot of reasons, and the core Pandas developers have made it clear that it's going to go away at some point. They've been saying this for a long time, so it's easy to say that we can ignore this statement, but one of these days, people will get an unpleasant surprise.
I keep seeing people use inplace=True in my Pandas courses, and they're often convinced that it's the right thing to do... so I decided to give them a bit of a video scolding.
Interesting, and I'll bear that in mind. But didn't you once say that setting a dataframe back to itself (i.e., df = df.reset_index(), or some such) was problematic? I can't remember why. Don't you have to force it to make a copy, or something like that? That's why I've been using inplace=True up to now.
P.S. I really enjoy these bite-sized insights.
I'm glad you're enjoying the videos!
As for whether inplace=True is a good idea, I might have said, years ago, that this is the case. But for a while now, it has been known that inplace=True doesn't save any memory, and that the odds are good that it'll be going away in a future version of Pandas.
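A quick sketch that may help with the reset_index question: on a small hypothetical frame, mutating with inplace=True and reassigning with df = df.reset_index() produce equal results, and the reassignment version needs no explicit copy. (The .copy() call below exists only so that the two styles start from independent objects.)

```python
import pandas as pd

# Two identical hypothetical frames with a non-default index
df_inplace = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])
df_assign = df_inplace.copy()

# Style 1: mutate the existing object (the method returns None)
df_inplace.reset_index(inplace=True)

# Style 2: build a new object and rebind the name to it
df_assign = df_assign.reset_index()

print(df_assign)
print(df_inplace.equals(df_assign))  # True
```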
I had the None problem today and could not figure out why. Once I removed inplace=True, it worked.
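The None problem is most likely the following. This is a sketch on a hypothetical frame; the comment doesn't say which method was involved, so drop() is just an example. With inplace=True, the method modifies the frame and returns None, so assigning its result to a variable replaces the DataFrame with None.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# With inplace=True, drop() modifies df and returns None
result = df.drop(columns=['b'], inplace=True)
print(result)  # None

# The classic bug: combining inplace=True with assignment replaces the frame with None
df2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = df2.drop(columns=['b'], inplace=True)
print(df2)  # None

# Without inplace, drop() returns the new DataFrame, so assignment works
df3 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df3 = df3.drop(columns=['b'])
print(df3.columns.tolist())  # ['a']
```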
Excellent!
Thanks. Basically it says to apply method chaining on the fly.
Let's say you have a list of pandas dataframes, and you want to rename some of the columns of each frame using a dictionary. If you loop through the dataframes and apply the rename method, the column names will not change unless you use inplace = True.
Prove me wrong.
You're right... but you're also wrong.
You're right, in that if you iterate over a list of data frames and use the rename method on each data frame without inplace=True, then each data frame will remain unchanged.
But there's an easy solution to this, namely using a list comprehension and then assigning the list of data frames back to the original list variable. That is, you would say something like:
all_dfs = [one_df.rename(columns={'a': 'b'}) for one_df in all_dfs]
This seems wasteful, and like it should be slower and use more memory. But the core Pandas developers have said for years that this is not the case, and that we shouldn't be using inplace=True.
Moreover, inplace=True is going away in the next major version of Pandas. So even if you're right, and you love it, you won't be able to use it for much longer.
The core developers are pushing us to use method chaining, and this is part of that push.
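A sketch of both halves of that answer, using rename(columns=...) since the original question was about renaming columns. The list of frames and the mapping are hypothetical.

```python
import pandas as pd

# Hypothetical list of frames that share a column name to be renamed
all_dfs = [
    pd.DataFrame({'a': [1, 2]}),
    pd.DataFrame({'a': [3], 'c': [4]}),
]
mapping = {'a': 'b'}  # old name -> new name

# A plain loop has no effect: rename() returns a new frame, and assigning it
# to the loop variable only rebinds that local name, not the list element
for one_df in all_dfs:
    one_df = one_df.rename(columns=mapping)
print([df.columns.tolist() for df in all_dfs])  # [['a'], ['a', 'c']]

# The list comprehension builds a new list of renamed frames and rebinds all_dfs
all_dfs = [one_df.rename(columns=mapping) for one_df in all_dfs]
print([df.columns.tolist() for df in all_dfs])  # [['b'], ['b', 'c']]
```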
@@ReuvenLerner I don't like inplace = True at all. The idea to use a list comprehension came to me at the time you responded, so seeing that you have the same method as a solution is a positive. I'm not able to test it for another few hours, though.
Update:
The list comprehension worked and all inplace = True statements are gone!
@@cybern9ne Amazing!
I also thought that I would save memory by using "inplace=True"; I will stop using it from now on. Thanks for the hint.
You're not alone in thinking this! Glad I could help.
Thanks for the tip! Will implement that from now on 😊
Not the most convincing video. The memory issue aside, the rest of this was just filler: basic play-around stuff that has nothing to do with why someone would use "inplace" in the first place, which is that they are happy with their exploratory edits and now explicitly want to change the df for good.
Sorry it didn't work for you; I hadn't thought about that use case, where people don't use inplace=True to explore, and then do use it as a "final" way of doing things.
The core Pandas developers have warned us for several years now that inplace=True is going away, and that we should avoid using it, regardless of our motivations. Method chaining is the preferred way to do things, both when exploring and when we finish.
@@ReuvenLerner Roger that. Thanks for the warning :)
Very good and helpful.
Glad to know it helped!
Actually since I started following your channel, I received my Higher diploma in Data analysis. Now I'm going for my MSc.😆
@@quadropheniaguy9811 That's fantastic! Keep it up!
nice
Glad it helped!
Never thought of this. Thanks!