Matplotlib Boxplots | Creating Single and Multiple Boxplots in Python

  • Published: 23 Oct 2024

Comments • 44

  • @mohammadkeshtkar9655
    @mohammadkeshtkar9655 3 years ago +4

    We are very lucky to be able to see these useful videos. Thank you Andy🙏🙏

  • @johnowusukonduah2305
    @johnowusukonduah2305 1 year ago +1

    I always know my answer is certain with Andy! Thank you for your great videos, I've learnt a lot from you. You're a genius

  • @sabrinakadirova7084
    @sabrinakadirova7084 2 years ago +1

    I liked it so much! Please, keep doing such videos, you're saving my nerves..

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago +2

      Thanks. I have plenty more to come 😁

  • @alirezarahnama2096
    @alirezarahnama2096 8 months ago

    Hi Andy! I have been trying to make a box plot with a simple break in y-axis and have not been able to. any tips?

  • @annadomas2484
    @annadomas2484 1 year ago +1

    Thank you! I am learning and your videos help a lot! I tried to use your code on my dataset but ran into an error and do not understand where the problem is:
    TypeError Traceback (most recent call last)
    in
    4
    5 for i, ax in enumerate(axes.flat):
    ----> 6 ax.boxplot(data1.iloc[:,i])
    7 ax.set_title(data1.columns[i], fontsize=20, fontweight='bold')
    8 ax.tick_params(axis='y', labelsize=14)
    TypeError: unsupported operand type(s) for +: 'method' and 'float'

  • @vito135c
    @vito135c 1 year ago +1

    Thanks.

  • @chisoo6903
    @chisoo6903 3 years ago +1

    After identifying the outliers in the boxplot, what Python command could we use to remove them from our analysis?

    • @AndyMcDonald42
      @AndyMcDonald42  3 years ago +1

      Hi Chi, you can use a small piece of code, like the one below, to remove the outliers identified by the boxplot.
      # Calculate the quartiles
      Q1 = df.quantile(0.25)
      Q3 = df.quantile(0.75)
      # Calculate the IQR
      IQR = Q3 - Q1
      # Remove the outliers
      df_clean = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]
      Source: stackoverflow.com/questions/50461349/how-to-remove-outlier-from-dataframe-using-iqr
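
For anyone who wants to try the IQR filter above end to end, here is a minimal runnable sketch. The DataFrame, its column names (GR, RHOB) and its values are made up purely for illustration; only the quantile and 1.5 × IQR logic comes from the reply above.

```python
import pandas as pd

# Hypothetical sample data: the value 300 in "GR" is an obvious outlier.
df = pd.DataFrame({
    "GR": [50.0, 55.0, 60.0, 52.0, 58.0, 300.0],
    "RHOB": [2.40, 2.45, 2.50, 2.42, 2.48, 2.46],
})

# Quartiles and interquartile range, computed per column.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# The same 1.5 * IQR fences that a boxplot uses for its whiskers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# A row is dropped if ANY of its columns falls outside the fences.
is_outlier = (df < lower) | (df > upper)
df_clean = df[~is_outlier.any(axis=1)]

print(df_clean)
```

Note that `any(axis=1)` removes the whole row when a single column is flagged, so a bad GR reading also discards the RHOB value at that depth. Whether that is desirable depends on the analysis.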

  • @19neetish
    @19neetish 1 year ago

    Hi, could it be that using a box plot and interquartile range may not always be a good idea? For example, the formation can have any number of combinations, and fluid properties may vary too. That may result in a very wide data spread. Could a point outside the range be real and represent a unique rock type? Shouldn't we confirm that from the mud log?

    • @AndyMcDonald42
      @AndyMcDonald42  1 year ago +1

      Yes. That is very possible. Any outliers detected by these methods should always be checked to confirm that they are real outliers. When applying boxplots to petrophysical data I often do it by filtering for specific formations/rock types.
      The key is not to use one method in isolation. Same principle as not trying to do an analysis based on a single curve.

    • @19neetish
      @19neetish 1 year ago

      @AndyMcDonald42 In the case of this field, would you suggest doing the outlier analysis based on the geological age of the rock? This data is present in the dataset.
      Also, is it possible to figure out whether the log data is processed or not? I mean whether all the necessary corrections have been applied by the logging company. Just looking at the PEF data, I can see the mud has barite, and the PEF readings are off the chart. It makes me think resistivity and other data might not have been corrected for the borehole environment either. That would definitely mess up the model training.
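
The idea discussed in this thread, applying the IQR rule within each formation or rock type rather than across the whole well, can be sketched roughly as follows. The column names FORMATION and GR and all values are hypothetical, and the grouping column could just as well be geological age. This is one possible way to do it, not necessarily how it is done in the video.

```python
import pandas as pd

# Hypothetical log data: two formations with very different GR ranges.
df = pd.DataFrame({
    "FORMATION": ["A"] * 6 + ["B"] * 6,
    "GR": [20.0, 22.0, 25.0, 23.0, 21.0, 24.0,
           120.0, 118.0, 125.0, 122.0, 119.0, 60.0],
})

# Per-formation quartiles, broadcast back to every row of the group.
grouped = df.groupby("FORMATION")["GR"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Flag rather than delete, so each point can still be checked by hand.
df["GR_OUTLIER"] = (df["GR"] < q1 - 1.5 * iqr) | (df["GR"] > q3 + 1.5 * iqr)

print(df[df["GR_OUTLIER"]])
```

In this made-up example the reading of 60 is flagged only because it is compared against formation B alone. With both formations pooled, the fences are so wide that nothing is flagged. Flagged points are candidates to check against other data such as the mud log, not confirmed errors.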

  • @slee3083
    @slee3083 2 years ago

    Hi Andy, looking at the last exercise using subplots, would this still work if the columns had a different number of data points from each other? I've tried similar to this video except with reading a simple csv file containing a few columns of data, with some columns having more data points than others, and the box plots with less data points (NaN) just don't show up at the end. Is there a way around this? If I plot the data separately or on the same graph (same axis) it has no problem, but only some of the subplots with fewer data points just wouldn't plot at all. Thanks

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      Hi S Lee. I am not 100% certain on this and would have to try. But some plots don’t handle NaN values and you unfortunately have to remove them by dropping them.
      This seems to be the case with this stackoverflow question which sounds similar to what you are experiencing
      stackoverflow.com/questions/44305873/how-to-deal-with-nan-value-when-plot-boxplot-using-python
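
A minimal sketch of the workaround suggested above: drop the NaN values column by column just before plotting, so that columns of different lengths each get their own box. The DataFrame here is invented for illustration, and the `matplotlib.use("Agg")` line is only there so the snippet runs without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: column "B" has fewer valid points, padded with NaN,
# as happens when a CSV has columns of unequal length.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 3.0, 4.0, np.nan, np.nan, np.nan],
})

fig, axes = plt.subplots(1, len(df.columns), figsize=(8, 4))

for ax, column in zip(axes.flat, df.columns):
    # Drop NaN for this column only, so other columns keep all their rows.
    values = df[column].dropna().to_numpy()
    ax.boxplot(values)
    ax.set_title(column)

fig.tight_layout()
```

Dropping per column, rather than calling `df.dropna()` on the whole frame, avoids throwing away valid rows from the longer columns.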

  • @mjones410
    @mjones410 2 years ago

    super helpful thank you Andy

  • @espanolaturitmoint
    @espanolaturitmoint 2 years ago

    Hi, thanks a lot for the content! I need help with a boxplot... Could you tell me how to show the points inside the boxplot and annotate a number for each point? I have a dataset of only 49 points.

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      No problem. One way to do that is add a jitter plot on top of the box plot. I’m not so sure annotating each point would be a good idea as it may become too cluttered.
      You can see an example here
      www.python-graph-gallery.com/36-add-jitter-over-boxplot-seaborn
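
The linked example uses seaborn. As a rough alternative sketch using only matplotlib and NumPy, the jittered points can be drawn with `scatter` on top of the box, and each point numbered with `annotate`. The 49 values are randomly generated stand-ins, and the jitter width and font size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical dataset of 49 points.
rng = np.random.default_rng(seed=42)
values = rng.normal(loc=50, scale=10, size=49)

fig, ax = plt.subplots(figsize=(4, 6))

# Hide the boxplot's own outlier markers so points are not drawn twice.
ax.boxplot(values, showfliers=False)

# A single boxplot sits at x = 1; spread the points slightly around it.
x_jitter = 1 + rng.uniform(-0.08, 0.08, size=values.size)
ax.scatter(x_jitter, values, s=15, alpha=0.6, zorder=3)

# Label each point with its index, offset a few pixels to the right.
for index, (x, y) in enumerate(zip(x_jitter, values)):
    ax.annotate(str(index), (x, y), xytext=(3, 0),
                textcoords="offset points", fontsize=6)
```

As noted in the reply, labelling every point gets cluttered quickly; annotating only the points outside the whiskers may read better.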

  • @iliusmondal2098
    @iliusmondal2098 2 years ago

    Hi Andy, Is there any way to remove the outliers?

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      Yes, there is. You can apply the boxplot equations to a dataframe and remove points that way: datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot

  • @balajig8522
    @balajig8522 2 years ago

    Really nice video! Please share the original DataFrame you used.

  • @yippiyee1
    @yippiyee1 2 years ago

    Thanks for the informative video.

  • @anamalbulushi5332
    @anamalbulushi5332 3 years ago

    Thank you Andy 👍🏻

  • @josedavidbastoaguirre2099
    @josedavidbastoaguirre2099 3 years ago

    Really nice video! Thanks.
    It would be great if you could also explain how to interpret the plots. For instance, what does it mean to have a lot of outliers in the GR log?
    Again, thank you very much.

    • @josedavidbastoaguirre2099
      @josedavidbastoaguirre2099 3 years ago +1

      I mean... probably some of them are just wrong data, but maybe some outliers represent a particular lithology.

    • @AndyMcDonald42
      @AndyMcDonald42  3 years ago +1

      Thanks Jose. I am planning to cover that in a small series on outlier detection in the near future. These initial videos are focusing on how to create the plots with Python.
      I also covered this topic very briefly at this year's SPWLA conference and in more detail in my Data Quality paper, which you can find at the link below.
      www.researchgate.net/publication/351607547_Data_Quality_Considerations_for_Petrophysical_Machine_Learning_Models
      You are correct that some of the outliers could be incorrectly measured data, which could be a result of tool/sensor issues, borehole washout, system issues...etc. But they could potentially reflect a particular lithology, for example a spike in the GR data may be caused by a hot sand/hot shale. That is why we need to treat some of these outlier detection methods with caution and also use our domain expertise to make the final decision.

  • @timut1830
    @timut1830 2 years ago

    Thank you so much for your video!

  • @kararshah6056
    @kararshah6056 2 years ago

    man u explained sooooooooooo good

  • @coldtea9755
    @coldtea9755 2 years ago

    Thank you really helpful

  • @victorjohnlaobena7099
    @victorjohnlaobena7099 6 months ago

    Helped me out a lot, thank you! 😀😀😀

  • @nzambabignoumba445
    @nzambabignoumba445 3 years ago

    Thank you!!

  • @cypherecon5989
    @cypherecon5989 2 years ago

    I used data["income"].plot(kind="box"), but it doesn't show me the x and y axes. Does anybody know why that is?

    • @cypherecon5989
      @cypherecon5989 2 years ago

      At 5:01, even with the plt command, the boxplot gets plotted but without the x and y axes...

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      I’m not sure. Have you checked over your data to make sure it’s OK and that you are calling the correct column? I believe anything like NaNs should be handled by the plotting.
      If you are still having trouble, Stack Overflow is a great place to get help, and it allows you to share your code and data, which you can’t really do here.

    • @cypherecon5989
      @cypherecon5989 2 years ago

      @AndyMcDonald42 It was my dark theme. I had to do plt.figure(facecolor="white"). :D

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago +1

      @cypherecon5989 Glad you got it sorted. It's always the small things that catch us out. 😁
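
For anyone hitting the same dark-theme issue from this thread, where the box is drawn but the axis labels seem to be missing, a minimal sketch of the fix is below. The income values are made up, and the explicit `ax=ax` is just one way to make sure the plot lands on the white figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the commenter's "income" column.
data = pd.DataFrame({"income": [32000, 41000, 38500, 52000, 47250, 120000]})

# With a dark editor theme the figure background can end up dark, which
# hides the dark tick labels. Forcing a white figure makes them visible.
fig = plt.figure(facecolor="white")
ax = fig.add_subplot(111)

# Draw the boxplot on the axes that belongs to the white figure.
data["income"].plot(kind="box", ax=ax)
```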

  • @gamuchiraindawana2827
    @gamuchiraindawana2827 7 months ago

    Lovely

  • @sanisalisu4929
    @sanisalisu4929 5 months ago

    I can send you the data and the type of boxplot I'm talking about.

  • @GreyHatGenX
    @GreyHatGenX 1 year ago

    comment