Stop using inplace=True in Pandas!

  • Published: Jan 7, 2025

Comments • 35

  • @johnbainbridge1931 · 1 year ago · +1

    What about in a loop scenario, e.g., preparing train and test?
    data = [train, test]
    for df in data:
        df.drop(columns=['BLOCKID', 'SUMLEVEL', 'primary'], inplace=True)
    Thanks

    • @ReuvenLerner · 1 year ago · +2

      The Pandas core developers have said, for several years now, that inplace=True should be avoided, that it provides no advantages, and that it will go away in a future version.
      For that reason alone, I would avoid using it.
      The example you gave could be handled with tuple unpacking and a list comprehension:
      train, test = [df.drop(columns=['BLOCKID', 'SUMLEVEL', 'primary']) for df in [train, test]]
      There are probably other, better ways to do it, but this strikes me as a pretty good and quick alternative.
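
A minimal runnable sketch of the list-comprehension approach, assuming two small made-up train/test frames (only the three dropped column names come from the comment; the "target" column is hypothetical):

```python
import pandas as pd

# Hypothetical frames; only BLOCKID, SUMLEVEL and primary come from the question.
train = pd.DataFrame({'BLOCKID': [1, 2], 'SUMLEVEL': [10, 20],
                      'primary': [True, False], 'target': [0.5, 0.7]})
test = pd.DataFrame({'BLOCKID': [3], 'SUMLEVEL': [30],
                     'primary': [True], 'target': [0.9]})

cols_to_drop = ['BLOCKID', 'SUMLEVEL', 'primary']

# drop() returns a new DataFrame; the comprehension collects both results,
# and tuple unpacking rebinds the names train and test to them.
train, test = [df.drop(columns=cols_to_drop) for df in [train, test]]

print(list(train.columns))  # ['target']
print(list(test.columns))   # ['target']
```

Because the names are rebound rather than mutated, any other variable still pointing at the old frames keeps seeing the original columns.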

  • @christophertyler1882 · 1 year ago · +5

    I stopped using inplace=True once I started doing method chaining. The other bonus is that you can play with the data without changing the original dataframe, and once everything looks good during EDA, you can overwrite the original dataframe.
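
A minimal sketch of the method-chaining style described above, using a made-up DataFrame (the column names and steps are illustrative assumptions): every step returns a new DataFrame, so the chain can be edited and re-run during EDA while the original stays untouched, and the original is overwritten only when you decide to.

```python
import pandas as pd

# Hypothetical raw data with some missing values.
raw = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Oslo', None],
    'temp_c': [3.0, 19.5, None, 12.0],
})

# Each method returns a new DataFrame; nothing here modifies raw.
result = (
    raw
    .dropna()                                    # drop rows with any missing value
    .rename(columns={'temp_c': 'temperature'})
    .assign(temp_f=lambda d: d['temperature'] * 9 / 5 + 32)
    .sort_values('temperature')
)

print(raw.shape)  # (4, 2): the original is unchanged
print(result)

# Once the result looks right, overwrite explicitly:
# raw = result
```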

  • @tushartiwari7929 · 1 year ago · +1

    Can you point to where the Pandas core developers have mentioned that it does not save memory? I was under the illusion that it saves memory.

    • @ReuvenLerner · 1 year ago

      I've seen it in a bunch of places, but here's a core Pandas developer saying that it'll go away in Pandas 2.0: github.com/pandas-dev/pandas/issues/16529
      That clearly didn't happen, but we should expect it to be the case in the near future.
      Update: I didn't really answer your question, did I? Whoops! More below...
      I've definitely seen the core developers say that you cannot know what's going on behind the scenes. It feels like inplace=True will save memory, but you cannot be sure of this at all. Even if you don't see a new data frame, one is almost certainly being created behind the scenes.
      Here are some places where it is at least mentioned that you shouldn't use it and/or that there's more memory use than meets the eye:
      • www.dataschool.io/future-of-pandas/#inplace
      • github.com/pandas-dev/pandas/issues/48682
      So... I didn't see a direct statement from the core developers. But it has been mentioned so often that I'm sure there is a source for this that I just didn't find.

    • @tushartiwari7929 · 1 year ago · +1

      @@ReuvenLerner Got it. The intent to remove inplace comes up in various issues on GitHub.
      Thanks for the detailed response.

  • @BrownStain_Silver · 8 months ago · +1

    Awesome advice. I'm working in Pandas to subset database records. I was considering inplace=True for memory efficiency. I'll assign back to the original variable for the reasons mentioned. Thanks for the tip.

    • @ReuvenLerner · 8 months ago · +1

      Glad it helped!

    • @BrownStain_Silver · 8 months ago · +1

      That's not intuitively what a new Pandas user might expect inplace=True to do. System memory will definitely be a major concern for my work project that others will use on aged government computers. I've prototyped on a small scale but need to scale it up. Definitely helpful advice. Thank you!

  • @sloanlance · 1 year ago · +3

    I think it's pretty important to lead off with the fact that the Pandas development team is working to remove "inplace" support. Then mention chaining and no performance/memory improvement.

    • @ReuvenLerner · 1 year ago

      Sorry if I didn't make it clear enough -- inplace=True is a bad idea for a whole lot of reasons, and the core Pandas developers have made it clear that it's going to go away at some point. They've been saying this for a long time, so it's easy to say that we can ignore this statement, but one of these days, people will get an unpleasant surprise.
      I keep seeing people use inplace=True in my Pandas courses, and they're often convinced that it's the right thing to do... so I decided to give them a bit of a video scolding.

  • @WildRover1964 · 1 year ago · +2

    Interesting, and I'll bear that in mind. But didn't you once say that assigning a dataframe back to itself (i.e., df = df.reset_index() or some such) was problematic? I can't remember why. Don't you have to force it to make a copy or something like that? That's why I've been using inplace=True up to now.
    P.S. I really enjoy these bite-sized insights.

    • @ReuvenLerner · 1 year ago

      I'm glad you're enjoying the videos!
      As for whether inplace=True is a good idea, I might have said, years ago, that this is the case. But for a while now, it has been known that inplace=True doesn't save any memory, and that the odds are good that it'll be going away in a future version of Pandas.
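
On the reset_index question: rebinding with df = df.reset_index() is ordinary Python assignment and needs no special copy. Where an explicit .copy() commonly comes up (this is a guess at what the earlier advice referred to; the thread doesn't confirm it) is when you take a filtered subset of a DataFrame and then modify it, which in pandas 2.x may otherwise emit SettingWithCopyWarning. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data with a string index.
df = pd.DataFrame({'score': [7, 9, 4]}, index=['x', 'y', 'z'])

# reset_index() returns a new DataFrame; rebinding the name is all that's needed.
df = df.reset_index()
print(list(df.columns))  # ['index', 'score']

# Filtering and then modifying the subset: .copy() makes it an independent
# DataFrame, so the assignment below clearly affects only the subset.
high = df[df['score'] > 5].copy()
high['passed'] = True
print('passed' in df.columns)  # False: df is untouched
```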

  • @method341 · 1 year ago · +1

    I had the None problem today and could not figure out why. Once I removed inplace=True, it worked.
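
The comment doesn't show the code, so this is an assumption about the usual cause: with inplace=True, methods such as drop() modify the DataFrame and return None, so writing df = df.drop(..., inplace=True) replaces the DataFrame with None. A small demonstration with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# With inplace=True, drop() mutates df and returns None.
returned = df.drop(columns=['b'], inplace=True)
print(returned)          # None
print(list(df.columns))  # ['a']

# Combining inplace=True with assignment throws the DataFrame away.
df2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = df2.drop(columns=['b'], inplace=True)
print(df2)               # None

# Without inplace, drop() returns the new DataFrame, which you assign back.
df3 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df3 = df3.drop(columns=['b'])
print(list(df3.columns))  # ['a']
```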

  • @elu1 · 1 year ago · +1

    Thanks. Basically it says to apply method chaining on the fly.

  • @cybern9ne · 4 months ago · +1

    Let's say you have a list of pandas dataframes, and you want to rename some of the columns of each frame using a dictionary. If you loop through the dataframes and apply the rename method, the column names will not change unless you use inplace = True.
    Prove me wrong.

    • @ReuvenLerner · 4 months ago · +1

      You're right... but you're also wrong.
      You're right, in that if you iterate over a list of data frames and call the rename method on each data frame without inplace=True, the data frames in the list will remain unchanged.
      But there's an easy solution: use a list comprehension and assign the new list of data frames back to the original variable. That is, you would say something like:
      all_dfs = [one_df.rename(columns={'a': 'b'}) for one_df in all_dfs]
      This seems wasteful, and like it should be slower and use more memory. But the core Pandas developers have said for years that this is not the case, and that we shouldn't be using inplace=True.
      Moreover, inplace=True is expected to go away in a future major version of Pandas. So even if you're right and you love it, you may not be able to use it for much longer.
      The core developers are pushing us to use method chaining, and this is part of that push.
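
A runnable sketch of both halves of this exchange, with two tiny made-up frames and a hypothetical mapping: rebinding the loop variable leaves the list untouched, while the list comprehension produces renamed frames that are assigned back to all_dfs.

```python
import pandas as pd

# Hypothetical frames and old-name -> new-name mapping.
all_dfs = [
    pd.DataFrame({'a': [1], 'x': [2]}),
    pd.DataFrame({'a': [3], 'y': [4]}),
]
mapping = {'a': 'b'}

# Rebinding the loop variable only changes the local name one_df;
# the list still holds the original, unrenamed frames.
for one_df in all_dfs:
    one_df = one_df.rename(columns=mapping)
print([list(d.columns) for d in all_dfs])  # [['a', 'x'], ['a', 'y']]

# Build a new list of renamed frames and assign it back to the same variable.
all_dfs = [one_df.rename(columns=mapping) for one_df in all_dfs]
print([list(d.columns) for d in all_dfs])  # [['b', 'x'], ['b', 'y']]
```

By default rename() ignores mapping keys a frame doesn't have (errors='ignore'), so one dictionary can serve frames whose columns differ.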

    • @cybern9ne · 4 months ago · +1

      @@ReuvenLerner I don't like inplace=True at all. The idea to use a list comprehension came to me around the time you responded, so seeing the same method as your solution is a positive. I won't be able to test it for another few hours, though.

    • @cybern9ne · 4 months ago · +1

      Update:
      The list comprehension worked and all inplace = True statements are gone!

    • @ReuvenLerner · 4 months ago · +1

      @@cybern9ne Amazing!

  • @franky12 · 1 year ago · +1

    I also thought that I would save memory by using "inplace=True"; I will stop using it from now on. Thanks for the hint.

    • @ReuvenLerner · 1 year ago

      You're not alone in thinking this! Glad I could help.

  • @Arne_Boeses · 1 year ago · +1

    Thanks for the tip! Will implement that from now on 😊

  • @lade_edal · 1 year ago · +2

    Not the most convincing video. The memory issue aside, the rest of it was just filler, basic playing around that has nothing to do with why someone would use "inplace" in the first place: they are happy with their exploratory edits and now explicitly want to change the df for good.

    • @ReuvenLerner · 1 year ago

      Sorry it didn't work for you; I hadn't thought about that use case, where people explore without inplace=True and then use it as a "final" way of committing their changes.
      The core Pandas developers have warned us for several years now that inplace=True is going away, and that we should avoid using it, regardless of our motivations. Method chaining is the preferred way to do things, both when exploring and when we finish.
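
For the "explore first, then commit for good" use case raised above, one way to do it without inplace=True is to keep the transformation in a function, preview it with DataFrame.pipe, and make the commit an explicit assignment. The sales data and the summarize function are hypothetical:

```python
import pandas as pd

# Hypothetical data.
sales = pd.DataFrame({'region': ['N', 'S', 'N'], 'amount': [100, 250, 175]})

def summarize(df):
    # Hypothetical summary step; returns a new DataFrame and never mutates df.
    return (
        df
        .groupby('region', as_index=False)['amount']
        .sum()
        .sort_values('amount', ascending=False)
    )

# Exploring: look at the result without changing sales.
print(sales.pipe(summarize))

# Committing: once happy, assign the result back explicitly.
sales = sales.pipe(summarize)
print(sales)
```

The assignment makes the "change it for good" moment visible in the code, which is the intent inplace=True was being used to express.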

    • @lade_edal · 1 year ago · +1

      @@ReuvenLerner Roger that. Thanks for the warning :)

  • @quadropheniaguy9811 · 1 year ago · +1

    Very good and helpful.

    • @ReuvenLerner · 1 year ago

      Glad to know it helped!

    • @quadropheniaguy9811 · 1 year ago · +1

      Actually, since I started following your channel, I've received my Higher Diploma in Data Analysis. Now I'm going for my MSc. 😆

    • @ReuvenLerner · 1 year ago · +1

      @@quadropheniaguy9811 That's fantastic! Keep it up!

  • @horus4862 · 1 year ago · +1

    nice

  • @mrityunjaypathak8792 · 1 year ago · +1

    Never thought of this. Thanks