Boolean indexing in Pandas made simple

  • Published: Aug 8, 2023
  • Boolean indexing (aka mask indexing) is the main way that we retrieve values in Pandas. But to many people, it looks and feels both weird and unintuitive. In this video, I show you how it works and how to think about it, both with series and data frames. You'll finally understand boolean indexing, and be able to use it in your work!
  • Science
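
As a rough illustration of the idea in the description, here is a minimal sketch of boolean (mask) indexing on a series and on a data frame. The data and the variable names (`s`, `df`, `mask`, `row_mask`) are made up for this example and do not come from the video.

```python
import pandas as pd

# A small series with made-up values
s = pd.Series([10, 20, 30, 40], index=list("abcd"))

# Comparing a series with a scalar gives a boolean series (the mask)
mask = s > 20

# Passing the mask in square brackets keeps only the positions where it is True
selected = s[mask]

# A data frame accepts a mask too; a plain list of booleans, one per row, also works
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
row_mask = [True, False, True]
rows = df[row_mask]  # df.loc[row_mask] selects the same rows

print(selected)
print(rows)
```

The mask has to be as long as the object it filters: one boolean per element of the series, or one per row of the data frame.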

Comments • 14

  • @RAJACENA1996
    @RAJACENA1996 5 months ago +1

    I have seen multiple videos on pandas filtering and indexing, but none of them explained how giving a list of booleans actually works on a DataFrame. Thank you.

  • @arjunpsk
    @arjunpsk 9 months ago +2

    You did a really good job explaining this.

  • @KingScrounger
    @KingScrounger 7 months ago +1

    wow, very helpful

  • @khanhnguyenle9959
    @khanhnguyenle9959 8 months ago +1

    This is a really good explanation 🎉🎉🎉

  • @ahmedabukar7341
    @ahmedabukar7341 10 months ago +1

    Say something about the Polars library

    • @ReuvenLerner
      @ReuvenLerner  10 months ago

      I'm planning to spend some time learning Polars, and will do some videos about it in the coming months!

  • @speedyg2295
    @speedyg2295 10 months ago +1

    Great video. Here's a scenario for you. I will try to describe how I get there; the question is how to get the indexes of the rows that are all True. I have an Excel file, and I create a DataFrame of certain channel numbers from it. This one has 8 columns and 48 rows. Not every row has a number value, and every row does have a NaN. I write ```isTrue = df.loc[0:48].isnull(); print(isTrue.head)``` which gives me what I am looking for: a DataFrame of True/False values in 48 rows. Some rows have 8 Trues, telling me that there was no number value in that row. I am trying to figure out a mask that will tell me the index of each row with 8 Trues. (8 just happens to be the number this time, but that's beyond what I am after here.) Is there any guidance you can give me on getting what I am after? Thank you for your time.

    • @ReuvenLerner
      @ReuvenLerner  10 months ago

      I can't give a complete answer here. But you can use isna().sum() and get the number of NaN values in a given column. Or you can use axis='rows' and get it the other way around. Then you can compare that with the full number, and select the rows with a sum of 8. Another option would be to use dropna, but with the "subset" keyword argument, specifying the column(s) that are of interest to you.
      I hope that this helps!
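
The suggestion above could be sketched roughly as follows, on a small made-up data frame (the column names `ch1` to `ch3` and the values are assumptions, not the commenter's data). One detail worth checking against the pandas documentation: `axis='rows'` is an alias for axis 0, which is already the default for `sum`, so the per-row count shown here uses `axis='columns'` (axis 1) instead.

```python
import numpy as np
import pandas as pd

# Made-up data: rows 1 and 3 contain only NaN
df = pd.DataFrame({
    "ch1": [1.0, np.nan, 3.0, np.nan],
    "ch2": [np.nan, np.nan, 6.0, np.nan],
    "ch3": [7.0, np.nan, np.nan, np.nan],
})

# Default axis (0, also spelled 'index' or 'rows'): one NaN count per column
nan_per_column = df.isna().sum()

# axis='columns' (1): one NaN count per row
nan_per_row = df.isna().sum(axis="columns")

# Keep the rows whose NaN count equals the number of columns
all_nan_rows = df[nan_per_row == df.shape[1]]

# dropna with subset: drop rows that have NaN in the listed column(s)
kept = df.dropna(subset=["ch1"])

print(nan_per_row)
print(all_nan_rows.index.tolist())
```

Comparing against `df.shape[1]` rather than a literal 8 keeps the check valid when the number of columns changes.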

    • @speedyg2295
      @speedyg2295 10 months ago

      @ReuvenLerner Thanks for the insight. I'm having a little trouble trying the things you mentioned above: (1) getting a sum, and (2) I'm not quite sure where to put the axis='rows'. I have tried a few different ways, and I come up with 0 in the sum: trueCount = isTrue.isna().sum(); print(trueCount). I tried putting axis='rows' after the ) within another set of (), with a ".", and with a ",". I looked up some of its uses but haven't found a good example of where the bugger goes. isTrue is the DataFrame from the .isnull() call from before. I get some counts when I go to the df, but they don't add up to the NaNs in the original df. Leaving it as isna() gives me the same result as isnull(), which is the DataFrame I want. I'm just not getting what I guess the result should be: a DataFrame of just the rows with all True in them. Still toying around. I tried something I found and got a DataFrame with True where NaN was and NaN where a value was, but I am still trying to get the index of the row so that I can pull it into a variable and do something with it. After the original isnull() I tried print(isTrue[isTrue[0:47] == True]), which gives me the reverse, and the lines I want are there with True in every column. I'm just having a hard time grabbing those rows. So, as a long workaround, I have taken the original df and created a new df of wattages, which will create a new df of amps, from which I will have a 0 in the index where I need the 0 to be for the xlwings part of this to work. But I am sure there has to be a way to get those lines in pandas.
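
For what it's worth, here is a minimal sketch of one way to get those row labels, using made-up data (the names `df`, `is_true`, and `row_is_empty` are illustrative only). Two points it tries to show: a keyword argument such as `axis` goes inside the parentheses of the method it belongs to, and calling `isna()` a second time on the True/False frame counts nothing, because that frame contains no NaN.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the channel data: rows 1 and 3 are entirely NaN
df = pd.DataFrame({
    "ch1": [1.0, np.nan, 3.0, np.nan],
    "ch2": [np.nan, np.nan, 6.0, np.nan],
})

# True where the original value is NaN
is_true = df.isna()

# The mask itself holds only True/False, so counting NaN in it gives 0
count_on_mask = is_true.isna().sum().sum()

# all(axis="columns") is True for a row only when every column in it is True
row_is_empty = is_true.all(axis="columns")

# Index labels of the all-NaN rows, as a plain Python list
empty_rows = df.index[row_is_empty].tolist()

print(empty_rows)
```

The resulting list can be stored in a variable and passed on to other code, and `df[row_is_empty]` returns the matching rows themselves.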