How do I use the "axis" parameter in pandas?

  • Published: 19 Jan 2025

Comments • 231

  • @dataschool
    @dataschool 7 months ago +9

    Is the mean() method not working for you? You need to include the argument numeric_only=True, for example: drinks.mean(numeric_only=True).
    This is a new requirement in pandas for cases in which you want to calculate the mean of numeric rows or columns and the DataFrame contains non-numeric data. Hope that helps!

    • @raneshmitra8156
      @raneshmitra8156 6 months ago +1

      Thank you for your update...... Your explanation is truly awesome.......
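
The numeric_only=True fix described in the pinned comment above can be tried on a small stand-in table. This is only a sketch: the DataFrame below is a hypothetical miniature of the drinks dataset, with made-up values and shortened to two numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for the drinks dataset (values invented for illustration)
drinks = pd.DataFrame({
    "country": ["A", "B", "C"],
    "beer_servings": [0, 89, 25],
    "wine_servings": [0, 54, 15],
    "continent": ["Asia", "Europe", "Africa"],
})

# Mean of each numeric column; the string columns are skipped instead of raising an error
column_means = drinks.mean(numeric_only=True)
print(column_means)
```

The string columns stay in the DataFrame; they are only left out of the calculation.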

  • @ArijitBiswasdotcom
    @ArijitBiswasdotcom 6 years ago +38

    Your explanations are so accurate and so eloquent! I feel I can't thank you enough man! Thank you very much, I appreciate your efforts so much and wish all the best for the future!

    • @dataschool
      @dataschool 6 years ago +3

      Wow, thank you so much for your kind comment! You are very welcome!

  • @AS-ws2se
    @AS-ws2se 4 years ago +4

    Your explanations are wonderful. You speak slowly and in a concise manner, that makes it easy to follow and understand. Thank you!
    Also, your bonus tips at the end of videos are always so useful!

  • @Philippe.C.A-R
    @Philippe.C.A-R 5 years ago +4

    When people like what they do it shows. Thanks for your explanations, their clarity and the quality of your enunciation!

  • @monotonous_0
    @monotonous_0 7 months ago +2

    If mean is not working for you:
    We first have to drop 'country' and 'continent' columns, these columns contain strings so we can't do mean with them.
    drinks = drinks.drop(['continent','country'],axis = 1)

    • @ujan_saheli
      @ujan_saheli 7 months ago

      Thanks

    • @dataschool
      @dataschool 7 months ago +2

      Alternatively, you can include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). That way, you can still perform the mean operation without dropping data that you might want to keep. Hope that helps!

  • @RavikiranS
    @RavikiranS 5 years ago +8

    Fantastic explanations Kevin! Really enjoy learning from you

  • @vaishalisah7061
    @vaishalisah7061 2 years ago +3

    2022: a 6-year-old video, yet it is more accurate than others. Please upload more content. Thank you 😊

  • @farhaannishtar8090
    @farhaannishtar8090 5 years ago +2

    Dude I love you, I was confused and a little frustrated with this topic and you explained it perfectly

  • @MS-tz1ml
    @MS-tz1ml 4 years ago

    I'm taking an online course but I'm actually learning all the stuff from these Data School videos cause they are better.

  • @Chanchothecorgi
    @Chanchothecorgi 5 years ago +17

    Python Newb here (about one month), so please take it easy....
    In the first example, we wanted to get rid of 'continent', which is a column, so we had axis=1.
    but when we started getting into the mean, and we wanted to get the mean of a column, it became axis=0...
    Obviously, I am missing something here, but it looks to me like it flipped... What am I missing.. I am having some difficulty with axis.

    • @dataschool
      @dataschool 5 years ago +25

      That's a great question! When you are removing something, you are specifying the axis from which you want to remove something. To remove a column, the axis from which you want to remove something is axis 1.
      When you are aggregating something, you are specifying the axis along which you want the aggregation to occur. Thus if you want to aggregate all rows with the mean function, the axis along which you are aggregating is axis 0. The result is that you have a mean of each column, but the key point is that you aggregated all rows, and the row axis is 0.
      Does that help?

    • @psingh2463
      @psingh2463 3 years ago

      @@dataschool absolutely it helped..
      Thanks
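
The distinction drawn in the reply above (drop names the axis a label is removed from, mean names the axis that gets aggregated) can be seen side by side in a minimal sketch. The small DataFrame and its column names are assumptions made for illustration, not the drinks data.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame(
    {"beer": [10, 20, 30], "wine": [1, 2, 3], "continent": ["EU", "EU", "AS"]}
)

# drop: 'continent' is a label on the column axis, so it is removed from axis=1
numeric = df.drop("continent", axis=1)

# mean: the axis is the direction that gets collapsed
col_means = numeric.mean(axis=0)  # aggregate over all rows -> one value per column
row_means = numeric.mean(axis=1)  # aggregate over all columns -> one value per row

print(col_means)
print(row_means)
```

Looking at the index of each result helps: col_means is labelled by column names, row_means by row labels.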

  • @taifurrahman8281
    @taifurrahman8281 5 years ago +1

    The way you explain things is amazing. I subscribed to your channel immediately after watching this video. Thanks for your kind support of data enthusiasts.

    • @dataschool
      @dataschool 5 years ago

      Awesome! Thanks for subscribing and for your kind words :)

  • @ramakanthrayanchi8888
    @ramakanthrayanchi8888 8 years ago

    The hand movement trick to remember about axis is really helpful. Thanks . Excellent video !!

    • @dataschool
      @dataschool 8 years ago

      Great to hear! It's hard for me to know if those visual tricks are useful to people, so I'm glad to hear that it works for you!

  • @ItsWithinYou
    @ItsWithinYou 3 years ago +1

    Excellent explanation in a concise way...very helpful.

  • @sahbiatia3570
    @sahbiatia3570 3 years ago +1

    you are an artist,I appreciate your efforts so much

  • @a1x45h
    @a1x45h 4 years ago

    The reason I find "index" and "column" confusing is, in a dataframe, the index of the training example is actually a (m,1) vector, where m is the total number of training examples.
    On the other hand, features is a (n,1) vector where n is the total number of features in the given dataset.
    So, basically we are calling the features as "index" here, which is confusing.
    I will stick to 0 and 1. Thank you for the amazing explanation. :)

  • @jaikishank
    @jaikishank 4 years ago

    That was a great explanation of the axis parameter, which I had been eagerly looking for, since the documentation does not explain it clearly. Thanks for your very informative videos, which support beginner-level people.

  • @atishayshukla1117
    @atishayshukla1117 5 years ago +1

    Can't get better than this. You are a great teacher. One question: if the data set has a lot of NaN values scattered at random, so that some values in a Series are NaN for some index labels and others are filled, how can we get a DataFrame without NaN and then save it as a new sheet?

    • @dataschool
      @dataschool 5 years ago

      I don't completely understand your question, I'm sorry!

  • @viniciusguimaraessantana5455
    @viniciusguimaraessantana5455 3 years ago

    Best classes about Pandas. Thank you!

  • @Leonardo-jv1ls
    @Leonardo-jv1ls 5 years ago +1

    axis = 0 could be seen as 'show the results for this axis', in the case 'x'. And for the other axis is the same idea. Amazing videos.

    • @dataschool
      @dataschool 5 years ago

      Thanks for sharing! Glad you like the videos :)

  • @khangnguyendac7184
    @khangnguyendac7184 1 year ago +1

    Thank you so much for your explanations. It's so easy to understand and very helpful to me!

  • @ASNPersonal
    @ASNPersonal 4 years ago

    Very useful video. Nicely explained axis=0, axis=1 and mean. If you could also explain inplace=True, I would be very grateful.
    Thanks for sharing.

  • @cristian.nitoiu
    @cristian.nitoiu 2 years ago

    Great explanation.
    It's not the case anymore as pandas dropped the Panel but I feel like adding another dimension, say using numpy, clarifies it even further.
    So when having 3 dimensions, axis=0 refers to the list of matrices, axis=1 to the rows of the matrices and axis=2 to the columns of the matrices.
    Then when using the axis for a reduction like 'sum' or 'mean' you will basically have a result of the remaining 2 dimensions, except the one you specified in the axis parameter.
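
The three-dimensional case described above can be checked with a short NumPy sketch. The array shape (2 matrices of 3 rows by 4 columns) is an arbitrary choice for illustration.

```python
import numpy as np

# 2 matrices, each with 3 rows and 4 columns
arr = np.arange(24).reshape(2, 3, 4)

# A reduction removes the axis it is given; the other two dimensions remain
sum_over_matrices = arr.sum(axis=0)  # shape (3, 4)
sum_over_rows = arr.sum(axis=1)      # shape (2, 4)
sum_over_columns = arr.sum(axis=2)   # shape (2, 3)

print(sum_over_matrices.shape, sum_over_rows.shape, sum_over_columns.shape)
```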

  • @mmmmmmm12828
    @mmmmmmm12828 1 year ago +1

    drinks.mean(axis=1) or drinks.mean(axis=0) both give the same error. TypeError: can only concatenate str (not "int") to str. How can I solve it?

    • @dataschool
      @dataschool 1 year ago +1

      Thanks for the question! In the current version of pandas, if a DataFrame contains non-numeric data and you want to calculate the mean of numeric rows or columns, you have to include the argument numeric_only=True. Hope that helps!

  • @finalpurez
    @finalpurez 2 years ago +2

    Thanks for the video. You actually made it feel so easy to learn coding hahaha

    • @dataschool
      @dataschool 2 years ago

      That's awesome to hear!

    • @dr.kingschultz
      @dr.kingschultz 2 years ago

      @@dataschool that is true, but unfortunately it is not true... LOL

  • @didierleprince6106
    @didierleprince6106 5 years ago +4

    un grand Merci for your knowledge and your diction...

  • @saraths9044
    @saraths9044 4 years ago

    The best explanation for axis parameter that I have ever gotten, 0 for moving down and 1 for moving right . But for dropping a column, The axis =1 right? how is that possible?

  • @maheshsgour3904
    @maheshsgour3904 5 years ago

    thanks a lot bro. you are doing social work by educating people in the current trend.

  • @fet1612
    @fet1612 6 years ago +1

    Q&A Series: Video #11 _3(TC01:36)
    drinks.drop()

  • @fet1612
    @fet1612 6 years ago

    01:12
    Each row represents a country and their reported alcohol consumption per adult. To REMOVE a COLUMN (eg continent column), we'd use the DROP method (DataFrame method)
    drinks.drop('continent')

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: Video #11 _8(TC04:00)
    drinks.mean()
    .mean()
    DOT MEAN()

  • @khurshidkhan3984
    @khurshidkhan3984 8 years ago

    axis parameter had always confused me, Now i understand how it actually works.Thanks buddy.
    If possible please create a playlist for matplotlib.

    • @dataschool
      @dataschool 8 years ago +1

      Glad I could be of help!
      Thanks for your video suggestion - I'll consider it for the future.

    • @brentskoumal2986
      @brentskoumal2986 8 years ago

      he's right, I would love to see your explanation of matplotlib...

  • @fet1612
    @fet1612 6 years ago +1

    Q&A Series: Video #11 _5(TC01:56)
    drinks.drop('continent', axis=1).head()

  • @nataliaagudelo8635
    @nataliaagudelo8635 5 years ago +1

    As always, thank you for such a valuable video!

  • @bowenliu807
    @bowenliu807 5 years ago

    Thanks a lot Kevin for this great video! The visual explanation works better than Stack Overflow answers. What got me confused with the pandas axis is the pandas.concat function. I couldn't figure out why axis = 0 is vertical concat and axis = 1 horizontal. With the logic illustrated in this video, I guess I should consider concat as a kind of operation where axis = 0 is moving along the index? Thanks again.

    • @dataschool
      @dataschool 4 years ago

      axis=0 is the "rows" or "index" axis, meaning concatenate (stack) rows. axis=1 is the "columns" axis, meaning concatenate columns. Hope that helps!
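
The concat behaviour described in the reply above can be confirmed from the resulting shapes. A minimal sketch with two hypothetical two-row DataFrames:

```python
import pandas as pd

top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# axis=0: grow along the row (index) axis, so rows are stacked
stacked = pd.concat([top, bottom], axis=0)

# axis=1: grow along the column axis, so columns are placed side by side
side_by_side = pd.concat([top, bottom], axis=1)

print(stacked.shape)       # 4 rows, 2 columns
print(side_by_side.shape)  # 2 rows, 4 columns
```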

  • @fet1612
    @fet1612 6 years ago +1

    Q&A Series: Video #11 _12(TC05:24)
    drinks.mean(axis=0)
    ================
    Kevin says, "The way I'd like to IMAGINE is THESE FOUR NUMERIC COLUMNS are BEING COLLAPSED DOWN into a SINGLE SET of FOUR NUMBERS that represents the MEAN of EACH COLUMN."

  • @fet1612
    @fet1612 6 years ago +1

    Q&A Series: Video #11 _7(TC02:40)
    drinks.drop(2, axis=0).head()
    ============================
    .drop(2, axis = 0)
    =======================

  • @jackbotman
    @jackbotman 7 years ago +2

    Thanks, this was a great video, going to come in handy

  • @kopilkaiser8991
    @kopilkaiser8991 1 year ago

    Thank you for teaching us the two different methods and how axis operates. I have no idea why they stuck with two different meanings for the axis in two areas: DataFrame operations and mathematical operations. This honestly complicates things and creates confusion between the two. But again, it is the fault of the founders of pandas for not sticking with the axis meaning used by mean, which I think they should have done. 😊

  • @anandsandhu
    @anandsandhu 3 years ago

    How do you set the DataFrame display to look like a table/Excel, the way your DataFrame output is showing?

  • @vigneshm8450
    @vigneshm8450 4 years ago

    Hi, the axis parameter behaves differently in the dropna() method. When axis=1, the NA values in each column are dropped, whereas with axis=0, the NA values in each row are dropped. Kindly explain this. Thanks
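
For the dropna() question above, a small sketch may help. It suggests that dropna follows the same convention as drop: the axis names the kind of label that gets removed. The DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# axis=0: remove row labels whose row contains a missing value
rows_dropped = df.dropna(axis=0)

# axis=1: remove column labels whose column contains a missing value
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```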

  • @zoid44
    @zoid44 3 years ago

    Great video! In the specific case of the mean method, I think that a DataFrame of student grades in a semester would be more meaningful in both axis directions. You could get the average class grade for each exam or the average grade for each student.
    Nevertheless, I am finding very useful to study pandas through your content! Thank you!

  • @julieye2260
    @julieye2260 4 years ago +1

    Excellent explanation!!! Thank you so much!

  • @michaeldufton2298
    @michaeldufton2298 7 years ago

    A really useful video series. Thanks so much.

  • @gabqcm
    @gabqcm 7 years ago

    Thanks for the series. I have a question about some code that's not quite working as I expected.
    I am using your ufo dataset, and I am trying to drop all rows where "Shape_Reported" == "OTHER".
    But for some reason it drops the first row, which is not an "OTHER".
    Here is the ufo.head(3)
              City  Shape_Reported  State             Time
    0       Ithaca        TRIANGLE     NY   6/1/1930 22:00
    1  Willingboro           OTHER     NJ  6/30/1930 20:00
    2      Holyoke            OVAL     CO  2/15/1931 14:00
    So if I drop rows with the "OTHER" Shape_Reported I should get indexes 0, 2 and so on, but 0 is not appearing in the list:
    in[]: ufo.drop(ufo['Shape_Reported']=='OTHER',axis=0).head()
    out[]:
                       City  Shape_Reported  State             Time
    2               Holyoke            OVAL     CO  2/15/1931 14:00
    3               Abilene            DISK     KS   6/1/1931 13:00
    4  New York Worlds Fair           LIGHT     NY  4/18/1933 19:00
    5           Valley City            DISK     ND  9/15/1934 15:30
    As you can see, Ithaca NY does not appear in the table. Does anyone know why?
    I am using the Jupyter console, not the notebook.

    • @dataschool
      @dataschool 7 years ago

      I think you want to be filtering, not using the drop method. I suggest checking out this video: ruclips.net/video/2AFGPdNn4FM/видео.html
      Let me know if that helps!

    • @gabqcm
      @gabqcm 7 years ago

      I watched that video before, but it never occurred to me to do:
      ufo[ufo["Shape Reported"] != "OTHER"]
      Which works nicely, thanks for the help!

    • @dataschool
      @dataschool 7 years ago

      You're very welcome!
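
The filtering approach from this thread can be sketched as follows. The point is that drop expects index labels rather than a boolean mask, so either filter with the mask directly or turn the mask into labels first. The three-row DataFrame is a hypothetical stand-in for the ufo dataset.

```python
import pandas as pd

# Hypothetical miniature of the ufo dataset
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL"],
})

# Option 1: boolean filtering keeps the rows where the condition is True
kept = ufo[ufo["Shape Reported"] != "OTHER"]

# Option 2: look up the index labels of the unwanted rows, then drop those labels
other_labels = ufo[ufo["Shape Reported"] == "OTHER"].index
kept_via_drop = ufo.drop(other_labels, axis=0)

print(kept)
```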

  • @sharathnandalike8108
    @sharathnandalike8108 5 years ago

    Dear Sir, in the 'axis parameter' video (at 2:20) you mention that "I DID NOT use the inplace parameter, so it did not remove the column, this is temporary". What does 'inplace' actually do?
    Thanks.

    • @dataschool
      @dataschool 5 years ago

      See this video: ruclips.net/video/XaCSdr7pPmY/видео.html

  • @Vishal-kk6dr
    @Vishal-kk6dr 8 months ago +1

    .mean() is not working for me

    • @monotonous_0
      @monotonous_0 7 months ago +1

      Drop the country and continent columns first. You can't compute a sum or mean with strings.

    • @dataschool
      @dataschool 7 months ago +1

      Alternatively, you can include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). That way, you can still perform the mean operation without dropping data that you might want to keep. Hope that helps!

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: VIDEO #11 _1 (TC01:05)
    import pandas as pd

  • @IncredibleGrim
    @IncredibleGrim 4 years ago +1

    after dropping a row, how to re-index the table?

    • @DywanJohnson
      @DywanJohnson 4 years ago

      i think its DataFrame.reset_index

    • @karangupta725
      @karangupta725 4 years ago

      @@DywanJohnson it didnt work
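
A minimal sketch of re-indexing after a drop, using the reset_index method suggested above. One possible reason it can appear not to work (an assumption, since the thread gives no details) is that reset_index returns a new DataFrame, so the result has to be assigned back.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]})

# Dropping row label 1 leaves the index as [0, 2]
df = df.drop(1, axis=0)

# reset_index returns a new DataFrame; drop=True discards the old index
# instead of keeping it as an extra column
df = df.reset_index(drop=True)

print(df)
```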

  • @igornovichenko8677
    @igornovichenko8677 3 years ago

    Confusing topic.
    When I think of dataframe as the "collections of Series" that share the same "index",
    and not like "rows and columns", things become more clear.
    The main question for me is:
    "Why do we use 'axis=1' when dropping column?"
    From my pondering I may only conclude that dropping is vectorized operation (not atomic).

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: Video #11 _4(TC01:50)
    DataFrame.drop()

  • @fet1612
    @fet1612 6 years ago

    00:28
    DROPPING ROWS and COLUMNS: RECAP
    > parameter
    import pandas as pd
    drinks = pd.read_csv('http://bit.ly/drinksbycountry') 01:01

  • @Astute_
    @Astute_ 7 months ago

    While performing the mean operation, it shows an error saying that it could not convert the country names to numeric. What should I do?

    • @dataschool
      @dataschool 7 months ago

      You need to include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). This is a new requirement in pandas for cases in which you want to calculate the mean of numeric rows or columns and the DataFrame contains non-numeric data. Hope that helps!

  • @akhilsoni729
    @akhilsoni729 4 years ago

    I tried to recap the "the dropping the column concept" and wrote "drinks.drop(columns=['beer_servings'],axis=1,inplace=True) " and it still worked, how is it so?

  • @satnamprajapati9024
    @satnamprajapati9024 4 years ago

    If some words repeat in a column, how can we count their occurrences and how can we filter on them?
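
One possible way to approach the question above is value_counts for counting and a boolean condition for filtering. This is a sketch under assumed data: the column name and values are invented.

```python
import pandas as pd

df = pd.DataFrame({"continent": ["Asia", "Europe", "Asia", "Africa", "Asia"]})

# Count how often each value occurs in the column
counts = df["continent"].value_counts()

# Keep only the rows that contain one particular value
asia_rows = df[df["continent"] == "Asia"]

print(counts)
print(asia_rows)
```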

  • @nadavkedem4649
    @nadavkedem4649 5 years ago +1

    Isn't it a bit a contradiction? for drop, axis =1 means a column, while for mean (of a column) we use axis = 0 ?

    • @dataschool
      @dataschool 5 years ago

      When you are removing something, you are specifying the axis from which you want to remove something. To remove a column, the axis from which you want to remove something is axis 1. When you are aggregating something, you are specifying the axis along which you want the aggregation to occur. Thus if you want to aggregate all rows with the mean function, the axis along which you are aggregating is axis 0. The result is that you have a mean of each column, but the key point is that you aggregated all rows, and the row axis is 0. Hope that helps!

    • @nadavkedem4649
      @nadavkedem4649 5 years ago +1

      @@dataschool It is still somewhat counter-intuitive, but this is the way it is. Anyhow, thank you for your quick answer and for this entire great RUclips channel. You are doing a great work!

    • @dataschool
      @dataschool 5 years ago

      Thanks so much for your kind words!

  • @sumanshrestha6585
    @sumanshrestha6585 7 years ago

    One of the column of Dataframe contains integer, float and missing value(i.e empty) and its dtype shows 'object' . How do i iterate to get part of the column to analyse ? Please help

    • @dataschool
      @dataschool 7 years ago

      When you say "get part of the column to analyse", what exactly do you mean? Could you give a specific example? Thanks!

  • @paresdas1530
    @paresdas1530 4 years ago

    Your presentation is fantastic .....how can I see you all presentation on Pandas in series at one sitting?

    • @dataschool
      @dataschool 4 years ago

      Is this what you're looking for? ruclips.net/p/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y

  • @HaiderAli-lr9fw
    @HaiderAli-lr9fw 4 years ago +1

    Thanks for the wonderful explanation. I just have a question: "Does the behaviour of the axis parameter change according to the method?"

    • @NeelSandellISAWESOME
      @NeelSandellISAWESOME 4 years ago

      Yeah it does. In the beginning when we wanted to drop, there was no "motion", but when we did the mathematical calculations, the axis was used as a "collapsor"

  • @dstraining2409
    @dstraining2409 7 years ago +2

    Awesome...Awesome...Superb and thanks for this videos

  • @susmithaankireddy3375
    @susmithaankireddy3375 6 years ago

    for example, sometimes we have country names as indexes and how we use boolean masking on data frame. how we get that index value.

    • @dataschool
      @dataschool 6 years ago

      I'm sorry, I don't understand your question. Could you clarify? Thanks!

    • @susmithaankireddy3375
      @susmithaankireddy3375 6 years ago

      what is Boolean Masking?

    • @dataschool
      @dataschool 6 years ago

      A boolean mask is basically how you filter by condition. This video should help: ruclips.net/video/2AFGPdNn4FM/видео.html

  • @barbaradilucchio4251
    @barbaradilucchio4251 6 years ago +1

    I really enjoy your videos and instruction. you are a really great instructor and I am always looking for your videos first because they make the most sense. Do you teach any courses? That would be great I would take them for sure. Thanks very much, Barbara Dilucchio

    • @dataschool
      @dataschool 6 years ago

      Thanks! Here is the one course that I currently teach: www.dataschool.io/learn/
      You can hear about future courses by subscribing to my newsletter: www.dataschool.io/subscribe/

  • @vasileioskolias4728
    @vasileioskolias4728 8 years ago +1

    Thanks again for another great video.
    My question is: are there any axes greater than 1, and if so what are they used for? Regards!

    • @dataschool
      @dataschool 8 years ago +1

      You're welcome! Regarding your question, the only time (I can think of) when axis is greater than 1 is when using the Panel data structure, a container for 3-dimensional data: pandas.pydata.org/pandas-docs/stable/dsintro.html#panel

  • @esissthlm
    @esissthlm 7 years ago +1

    Awesome tutorial, thank you!

    • @dataschool
      @dataschool 7 years ago

      Glad it was helpful to you!

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: Video #11 _10(TC04:36)
    drinks.mean()
    ===========
    Kevin says, "Why did it give us the mean of each pandas SERIES?" He goes on to say, "The Default behavior of the mean is axis=0."
    Cf. (compare)
    drinks.mean() drinks.mean(axis=0)
    ===================================
    Both produce the same result. Think why?

  • @subramanianramajayam2467
    @subramanianramajayam2467 4 years ago

    how do we add columns'rows ?

  • @LonglongFeng
    @LonglongFeng 7 years ago

    In sklearn.preprocessing, I used normalize(), which has an 'axis=' parameter. I experimented with both axis=0 and axis=1, and the model scores are quite different. axis=1 is the default. In practical work, should I try both axis 0 and 1 to capture the best model score?

    • @dataschool
      @dataschool 7 years ago

      The axis parameter defines the direction along which the normalization should take place. Depending on your reason for normalization, there is a correct axis to use. So, I would not recommend trying both and picking which one works better. Rather, I would recommend figuring out whether you are trying to normalize samples or features, and then using the axis which does that. Hope that helps!
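
To make the samples-versus-features distinction in the reply above concrete, here is a NumPy sketch of L2 normalization along each axis. It illustrates the idea only and is not scikit-learn's implementation; the array values are made up.

```python
import numpy as np

# Rows are samples, columns are features (hypothetical values)
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

# axis=1: scale each sample (row) to unit L2 norm
rows_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# axis=0: scale each feature (column) to unit L2 norm
cols_unit = X / np.linalg.norm(X, axis=0, keepdims=True)

print(rows_unit)
print(cols_unit)
```

The two results answer different questions, which is why picking the axis by model score alone is not recommended in the reply.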

  • @rahulagrawal5074
    @rahulagrawal5074 4 years ago

    Sir... Love your series.... 👌
    Hats off to you
    I just have a question..
    See, .drop(axis=1) removes a column,
    whereas .sum(axis=1) gives the sum of a row. I'm a bit confused. Can you help??

  • @shobharoy2033
    @shobharoy2033 7 years ago

    I face a problem when I try to create conditional loops(if,elif,else statements) while using pandas. The error says "Truth value of a series is ambiguous". Any ideas how to fix it?

    • @dataschool
      @dataschool 7 years ago

      I think if you search Stack Overflow with that error message, you will find some answers that explain what you are doing wrong. Good luck!
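
A common cause of the "Truth value of a Series is ambiguous" error is using a whole Series as the condition of an if statement. The sketch below shows one element-wise alternative with numpy.where. The column name, values and threshold are assumptions for illustration, and this may not match the commenter's exact situation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"beer_servings": [0, 89, 245]})

# `if df["beer_servings"] > 100:` would raise ValueError, because the comparison
# produces one True/False per row rather than a single boolean.

# Element-wise alternative: choose a label per row based on the condition
df["level"] = np.where(df["beer_servings"] > 100, "high", "low")

print(df)
```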

  • @exili
    @exili 6 years ago

    Your tutorials are great! I am just learning Jupyter/pandas and looking forward to learning more. I have a quick question on how to drop multiple rows? For example, I have a csv file in which every other row is blank, this reads as NaN, and just clutters the DataFrame, I have not been successful in removing/deleting them.

    • @dataschool
      @dataschool 6 years ago

      One of these videos might help:
      ruclips.net/video/fCMrO_VzeL8/видео.html
      ruclips.net/video/2AFGPdNn4FM/видео.html
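
For the case described in the question (every other row of the CSV is blank and is read as NaN), one option is dropna with how="all". A minimal sketch with a hypothetical DataFrame standing in for the CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which rows 1 and 3 are entirely empty
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", np.nan, "z", np.nan],
})

# how="all" removes only the rows in which every value is missing
cleaned = df.dropna(how="all")

print(cleaned)
```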

  • @BobBelford
    @BobBelford 3 months ago

    OK, I am confused
    print(drinks.mean(axis=1, numeric_only = True).head())
    print(drinks.drop(["country","continent"], axis=1).mean())
    In the first line I have axis=1 and it gives the mean of each row
    In the second line I have axis=1 and it gives the mean of each column
    I am assuming pandas axis comes from 2D Numpy array, and axis 0=row or x value and axis 1=column or y value
    My confused guess is somehow in the first command we are going down the column and taking the mean of every row, so the one refers to columns we are somehow iterating through
    While in the second, we removed two columns that had strings, and then took the mean of each columns, which makes more sense to me.
    Does someone have a better way to explain this?
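
One way to read the two lines in the comment above: in the first, axis=1 is passed to mean; in the second, axis=1 is passed to drop, and mean then runs with its default axis=0. The sketch below uses a hypothetical miniature of the drinks data and checks the labels of each result.

```python
import pandas as pd

# Hypothetical stand-in for the drinks dataset
drinks = pd.DataFrame({
    "country": ["A", "B", "C"],
    "beer_servings": [0, 89, 25],
    "wine_servings": [0, 54, 15],
    "continent": ["Asia", "Europe", "Africa"],
})

# axis=1 belongs to mean: aggregate across the columns, one value per row
row_means = drinks.mean(axis=1, numeric_only=True)

# axis=1 belongs to drop: remove two column labels; mean() then uses axis=0
col_means = drinks.drop(["country", "continent"], axis=1).mean()

print(row_means.index.tolist())
print(col_means.index.tolist())
```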

  • @apAKALIptoos
    @apAKALIptoos 1 year ago

    how can we drop rows 20 to 30?
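
A possible answer to the question above, assuming the DataFrame has the default integer index so that the row labels are 20 through 30: pass a list of those labels to drop. The 40-row DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": range(40)})

# Row labels 20 through 30 inclusive (range excludes its end point, hence 31)
labels_to_drop = list(range(20, 31))
trimmed = df.drop(labels_to_drop, axis=0)

print(len(trimmed))
```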

  • @mazkaibil9108
    @mazkaibil9108 6 years ago

    Hello, how do i create a web scraping application in python? Thank you!

    • @dataschool
      @dataschool 6 years ago

      This should help: ruclips.net/p/PL5-da3qGB5IDbOi0g5WFh1YPDNzXw4LNL

  • @sundeepradhakrishnan8187
    @sundeepradhakrishnan8187 7 years ago

    HI Kevin ,thank you for the video.
    How can we retrieve the column or row dropped with the inplace=True parameter.

    • @dataschool
      @dataschool 7 years ago

      There is no way to retrieve a column or row that has already been dropped. Sorry!

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: Video #11 _6(TC02:10)
    drinks.drop('continent', axis=1).head()
    --------------------------------------------------------------
    Kevin says: "I did not use the inplace parameter, so it did NOT actually REMOVE the COLUMN."

  • @suchitraoak1815
    @suchitraoak1815 6 years ago

    Is there a concept of nested dataframes? for example, if a dataframe had 3 columns, can the 3rd column be a dataframe for each row?

    • @dataschool
      @dataschool 6 years ago

      I'm not quite sure... you can definitely have a Series that contains Python objects like lists or dictionaries, however. And you can use multi-level indexing.

  • @rryann088
    @rryann088 3 years ago

    so clear and so smooth

  • @Renan-st1zb
    @Renan-st1zb 8 years ago

    Hi Kevin! Thank you for the class, it was great!
    Interesting... You have dropped row 2 (drinks.drop(2, axis=0)). However, when you use the mean by row, you can still see the mean of this row. I thought that when you drop the row you would not be able to return any values regarding this row. Apparently, I was wrong and the drop does not invalidate other functions like ".mean()".

    • @Renan-st1zb
      @Renan-st1zb 8 years ago

      Kevin, I realized that you did not make an assignment. Something like: "drinks = drinks.drop(2, axis=0)" instead of the operation "drinks.drop(2, axis=0)". Thanks again!!

    • @dataschool
      @dataschool 8 years ago +1

      That's correct, I didn't overwrite the original drinks object, or perform the operation "inplace". Thus, the DataFrame didn't change.
      Glad you like the video!
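
The point made in this thread, that drop returns a new DataFrame and leaves the original untouched unless the result is assigned, can be checked with a short sketch on hypothetical data.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# The result is discarded here, so df still has all three rows
df.drop(2, axis=0)
rows_after_unassigned_drop = len(df)

# Assigning the result keeps the smaller DataFrame
smaller = df.drop(2, axis=0)

print(rows_after_unassigned_drop, len(smaller))
```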

  • @asneogy
    @asneogy 8 years ago

    The df.mean() you illustrated was a great example of applying a single function to multiple columns at once. This is quite handy in many operations. On the other hand, is there an efficient way to apply multiple functions to multiple columns? Say, I wanted to do mean of beer_servings, sum of wine_servings, sd of spirit_servings and median of total alcohol?

    • @dataschool
      @dataschool 8 years ago

      I think it would be best to apply each of those functions in separate steps.
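
As an alternative to separate steps, DataFrame.agg accepts a dictionary that maps each column to a function. The sketch below is only an illustration: the column names follow the question, and the values are made up.

```python
import pandas as pd

# Hypothetical values
df = pd.DataFrame({
    "beer_servings": [0, 89, 25],
    "wine_servings": [0, 54, 14],
    "spirit_servings": [0, 132, 0],
    "total_litres": [0.0, 4.9, 0.7],
})

# One aggregation per column; the result is a Series labelled by column name
summary = df.agg({
    "beer_servings": "mean",
    "wine_servings": "sum",
    "spirit_servings": "std",
    "total_litres": "median",
})

print(summary)
```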

  • @susmithaankireddy3375
    @susmithaankireddy3375 6 years ago

    what is Boolean masking ?
    and How get index which a type of String.

    • @dataschool
      @dataschool 6 years ago

      I'm sorry, I don't understand your question. Could you clarify? Thanks!

  • @RichardGreco
    @RichardGreco 4 years ago

    Can there be more than 2 axis (0 and 1) in a Pandas dataframe? I guess more fundamentally can there be more than 2 dimensional datasets and can axis be used to point to it? Maybe I'm talking about a hierarchical dataframe.

    • @dataschool
      @dataschool 4 years ago

      Yes, you can have more than 2 axes. Yes, that is a hierarchical index aka MultiIndex. Hope that helps!

  • @RalphNgOfficial
    @RalphNgOfficial 4 years ago

    Great video. By the way, do you have a course about Joining CSV (different structure but common key) ? I'm keen to learn about this

  • @beaunorgeot5427
    @beaunorgeot5427 8 years ago

    Hi Kevin, thanks for the video. As others have noted: The axis coordinates for drop() and mean() appear to have opposite behaviors.
    For drop('name', axis=1) the method scans in row-wise direction until it finds a column named 'name' and then drops all of values in a column-wise direction.
    For mean(axis =1) the method scans in a column-wise direction and then calculates the mean of all the values in a row-wise direction.
    Why would anyone think it would be a good idea to write methods that have opposite behaviors for referencing and operating? I'm sure there's a reason, but I can't see it.

    • @dataschool
      @dataschool 8 years ago +2

      I don't believe they have opposite behaviors, but I certainly understand that viewpoint. Here is how I think about it:
      0 is the row axis, and 1 is the column axis. When you drop with axis=1, that means drop a column. When you take the mean with axis=1, that means the operation should "move across" the column axis, which produces row means.
      In other words, I think of a mean as an operation that "scans", and I think of a drop as an operation that "selects". However, if you think of both operations as scanning, then I agree that the behaviors would seem to be opposite.
      Hope that helps!

  • @xiaoyuhuang536
    @xiaoyuhuang536 7 years ago

    Hey Kevin, thanks for the video. I have a question, when applying a function to a df,
    pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html,
    it says:
    "axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    0 or ‘index’: apply function to each column
    1 or ‘columns’: apply function to each row",
    How should I understand this? When axis = 0, shouldn't the function be applied to each row instead of each column?

    • @dataschool
      @dataschool 7 years ago

      I know it's confusing! Basically, when you use apply with axis=0, you are saying you want to apply a function along axis 0, which means on each column. I talk about it more in this video: ruclips.net/video/P_q0tkYqvSk/видео.html
      Hope that helps!
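
The apply behaviour discussed above can be seen on a tiny example. With axis=0 the function receives each column in turn, and with axis=1 it receives each row. The DataFrame and the range function are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

def value_range(s):
    # s is one column when axis=0 and one row when axis=1
    return s.max() - s.min()

per_column = df.apply(value_range, axis=0)  # function applied to each column
per_row = df.apply(value_range, axis=1)     # function applied to each row

print(per_column)
print(per_row)
```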

  • @Zoronoa01
    @Zoronoa01 2 years ago

    Finally! Thank you so much

  • @jaikishank
    @jaikishank 4 years ago

    When I tried to import the CSV with
    drinks=pd.read_csv('bit.ly/drinksbycountry')
    I am unable to load it into the DataFrame, and it throws an error, shown below in the last line of a very long message.
    Any help, please, so I can use the DataFrame to follow along with the video?

  • @diogodias2634
    @diogodias2634 6 years ago +1

    Thank you, you are the best!!!

  • @chidibede2417
    @chidibede2417 7 years ago +4

    You are the best

  • @shreddersengupta7384
    @shreddersengupta7384 4 years ago

    Great video. There is always an issue understanding axis when it comes to 3D arrays :(

  • @sriharim1596
    @sriharim1596 3 years ago

    In some cases we use axis=0 for rows... in other areas axis=1 for rows... it's confusing....

    • @ItsWithinYou
      @ItsWithinYou 3 years ago +1

      0 is always for rows and 1 is always for columns. If you are confused, you can use the alternative like he suggested...use 'index' for rows and 'columns' for columns...

  • @guruprakashsoma9143
    @guruprakashsoma9143 7 months ago

    sir the mean function is not working for me

    • @dataschool
      @dataschool 7 months ago +1

      You need to include the argument numeric_only=True, for example: drinks.mean(numeric_only=True). This is a new requirement in pandas for cases in which you want to calculate the mean of numeric rows or columns and the DataFrame contains non-numeric data. Hope that helps!

  • @Omprakash-nb3mk
    @Omprakash-nb3mk 7 years ago

    Can I drop both columns and rows at the same time?

    • @dataschool
      @dataschool  7 years ago

      I don't think that is possible, sorry!
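For readers on newer pandas: since version 0.21, `drop` also accepts `index=` and `columns=` keywords, so rows and columns can be removed in a single call. The sketch below uses a made-up DataFrame with hypothetical labels; on older versions, chaining two `drop` calls gives the same result.

```python
import pandas as pd

# Hypothetical toy data; labels are illustrative only.
df = pd.DataFrame(
    {"beer": [10, 20, 30], "wine": [1, 2, 3]},
    index=["A", "B", "C"],
)

# One call: remove row "A" and column "wine" together (pandas 0.21+).
trimmed = df.drop(index=["A"], columns=["wine"])

# Equivalent for older versions: chain two drops, one per axis.
chained = df.drop("A", axis=0).drop("wine", axis=1)

print(trimmed)
```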

  • @JayHandles-J8S
    @JayHandles-J8S 3 years ago

    Great explanation, thank you

  • @bainadeashish
    @bainadeashish 8 years ago

    Hey man, great videos. Thanks a lot! I have a question :) How can I compare two DataFrames and highlight the differences?

    • @dataschool
      @dataschool  8 years ago

      I can't think of an efficient way to do this, without knowing more specifics about the DataFrames. Do they have the same number of rows? Columns? Same index? Same column names? And so on. The solution would depend on those factors.

    • @bainadeashish
      @bainadeashish 8 years ago

      The DataFrames have the same columns, but the rows might differ.
      I used the code below, which gives me the differences but takes a lot of time. If you can suggest a more efficient way than this, it would really help me a lot.
      import pandas as pd
      def report_diff(x):
          # Keep the value if it is unchanged, otherwise show "old ---> new"
          return x[0] if x[0] == x[1] else '{} ---> {}'.format(*x)
      # We want to be able to easily tell which rows have changes
      def has_change(row):
          if "--->" in row.to_string():
              return "Y"
          else:
              return "N"
      # Read in both CSV files
      df1 = pd.read_csv(r'FL_insurance_first.csv')#,index_col='policyID', parse_dates=True)
      df2 = pd.read_csv(r'FL_insurance_second.csv')
      # Make sure we order by policy ID so the comparisons work
      df1 = df1.sort_values(by="policyID").reset_index(drop=True)
      df2 = df2.sort_values(by="policyID").reset_index(drop=True)
      # Create a panel of the two dataframes
      # (pd.Panel was removed in pandas 1.0, so this step needs an older pandas version)
      diff_panel = pd.Panel(dict(df1=df1,df2=df2))
      # Apply the diff function
      diff_output = diff_panel.apply(report_diff, axis=0)
      diff_output.to_csv(r'my-diff-1.csv',index=False)

    • @dataschool
      @dataschool  8 years ago

      How about just:
      (df1 != df2).sum()
      That will tell you how many differences exist in each column, as long as df1 and df2 have the same number of rows and columns with matching labels.
      If that works, you should be able to use filtering to reveal the rows that are different (I'd have to experiment to figure out the exact code).
      Hope that helps!
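A rough sketch of the counting and filtering idea, assuming both DataFrames have identical row and column labels (pandas raises an error for element-wise comparison otherwise). Note that counting differences needs `!=`; `==` would count the matching cells instead. The `policyID` column comes from the question above, while `premium` and all the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: same shape and labels, one changed value in "premium".
df1 = pd.DataFrame({"policyID": [1, 2, 3], "premium": [100, 200, 300]})
df2 = pd.DataFrame({"policyID": [1, 2, 3], "premium": [100, 250, 300]})

# True wherever the two DataFrames disagree.
# Caveat: NaN != NaN is True, so missing values show up as differences.
diff_mask = df1 != df2

# Number of differing cells in each column.
diffs_per_column = diff_mask.sum()

# Rows of df1 that have at least one differing cell.
changed_rows = df1[diff_mask.any(axis=1)]

print(diffs_per_column)
print(changed_rows)
```

Because the comparison is vectorized, this is usually much faster than applying a Python function cell by cell. If the rows can differ between the files, they would first need to be aligned (for example by sorting on `policyID` and resetting the index) before this comparison makes sense.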

  • @vaibhavkathale5527
    @vaibhavkathale5527 4 years ago

    Sir, how do I convert an Excel file into a CSV?

    • @vaibhavkathale5527
      @vaibhavkathale5527 4 years ago

      Thanks a lot for your videos, they are helping me a lot during lockdown

  • @JainmiahSk
    @JainmiahSk 6 years ago

    Thanks for the great videos. I executed the code df.mean(axis='index'), but the result I got is: Series([], dtype: float64). Why did this occur?

    • @dataschool
      @dataschool  6 years ago

      That depends on what is contained in your DataFrame. Sorry, that's all I can say!

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: Video #11 _9(TC04:25)
    drinks.mean()
    ===========
    type(drinks.mean())
    pandas.core.series.Series
    ==========================
    1) beer_servings: 106.16...
    pandas.Series
    2) spirit_servings: 80.99...
    pandas.Series
    3) wine_servings: 49.45...
    pandas.Series
    4) total_liters_of_pure_alcohol: 4.71...
    pandas.Series

  • @luqmanhakim7781
    @luqmanhakim7781 4 years ago

    How did you import the CSV file using that bit.ly link? Doesn't it need to be a CSV file on the local computer? I tried opening the link in the browser and it took me to your GitHub file, but the link changed to a longer one than what you had written. How did you do that using bit.ly? It's really awesome :) Can anyone teach me?

    • @dataschool
      @dataschool  4 years ago +1

      read_csv can read from a URL, not just a local file! The bit.ly link simply points to a CSV file that is hosted on GitHub. Does that help?

    • @luqmanhakim7781
      @luqmanhakim7781 4 years ago

      @dataschool I think I understand. So basically bit.ly is just a shorter version of the link to the file that is hosted on GitHub, right? Thank you for the reply :) I really appreciate it

    • @dataschool
      @dataschool  4 years ago +1

      Exactly!

  • @nazmussalehin7512
    @nazmussalehin7512 8 years ago

    Awesome tutorial :) Thanks :)

  • @fet1612
    @fet1612 6 years ago

    Q&A Series: Video #11 _11(TC05:19)
    drinks.mean(axis=0)
    ================
    Kevin says, "The direction I want the operation to occur is DOWN."
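The "down" direction can be checked on a small example. This is only a sketch with invented numbers and hypothetical column names: `mean(axis=0)` walks down each column and returns a Series indexed by column name, which is also what `mean()` does by default, while `mean(axis=1)` walks across each row.

```python
import pandas as pd

# Hypothetical toy data; the real drinks DataFrame has more columns.
df = pd.DataFrame({"beer": [10, 20, 30], "wine": [1, 2, 6]})

# Moving DOWN the rows: one mean per column.
down = df.mean(axis=0)

# Moving ACROSS the columns: one mean per row.
across = df.mean(axis=1)

print(type(down))  # a pandas Series, indexed by column name
print(down)
print(across)
```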

  • @sammcgee5434
    @sammcgee5434 8 years ago

    You are the best. Thanks