Merge, Join, Append, Concat - Pandas

  • Published: Sep 18, 2024
  • “There should be one-- and preferably only one --obvious way to do it.” - Zen of Python. I certainly wish that were the case with pandas. Reading the docs, it feels like there are a thousand ways to do each operation, and it is hard to tell whether they do exactly the same thing or which one you should use. That's why I made An Opinionated Guide to pandas: to present one consistent (and a bit opinionated) way of doing data science with pandas and cut out all the confusion and cruft.
    I'll talk about which methods I use, why I use them and most importantly tell you the stuff that I've never touched in my years of data science practice. If this sounds helpful to you then please watch and provide feedback in your comments.
    This series is beginner-friendly but aimed most directly at intermediate users.
    “Opinionated Guide-Combining DataFrames” GitHub repo:
    github.com/kna...
    Helpful links:
    pandas.DataFrame.merge(), concat(): pandas.pydata....
    SQL joins: www.w3schools....
    Link to GitHub repo including environment setup for tutorials: github.com/kna...
    PEP 20 - The Zen of Python link: www.python.org...

Comments • 31

  • @luisabriful 4 years ago +8

    Thank you so much for your video! I've just started learning Python for data science as a self-taught learner and found your videos by chance. I've watched several Python tutorials so far, but yours are particularly good. Clear explanation, good quality video, easy to follow, not too long, not too short, and you go straight to the point. Very well done and thanks for sharing! 👏

  • @Moiez101 2 years ago +2

    Budding data analyst here, currently learning pandas on Udemy (just finished up with NumPy). Love your opinionated take on pandas! I was confused about when and why I should use join, merge, concat, etc.

  • @arhataria 4 years ago +10

    Lecture notes - Merge, Join, Append, Concat
    1. Merge (= join)
    - df.reset_index() before or after merging can turn the result into a more useful format
    - specify on=['a', 'b'] and pandas will merge on those columns
    - how='inner' / how='outer'
    - indicator=True -> shows which side each row came from (left_only, right_only, both)
    - columns with the same name -> suffixes=('_left', '_right')
    2. Concat (= append)
    - takes multiple dataframes and stacks their rows together
    - keys=['from1', 'from2'] adds an extra index level (useful when I'd like to know which source each row came from)
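
The notes above can be sketched roughly as follows. The frames `left` and `right`, their column names and their values are made up for illustration and are not taken from the video; `names=` on `pd.concat` is added here only to label the extra index level.

```python
import pandas as pd

# Hypothetical toy frames (not the data used in the video).
left = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "value": [10, 20, 30]})
right = pd.DataFrame({"a": [1, 2, 4], "b": ["x", "y", "w"], "value": [100, 200, 400]})

# 1. Merge on two key columns, keep unmatched rows from both sides,
#    record where each row came from, and disambiguate the shared "value" column.
merged = pd.merge(
    left,
    right,
    on=["a", "b"],
    how="outer",
    indicator=True,                # adds a "_merge" column
    suffixes=("_left", "_right"),  # value_left / value_right
)

# 2. Concat: stack rows and tag each source with an extra index level.
stacked = pd.concat([left, right], keys=["from1", "from2"], names=["source", "row"])

# reset_index turns the "source" level back into an ordinary column.
by_source = stacked.reset_index(level="source")

print(merged)
print(by_source)
```

With `how="inner"` instead of `how="outer"`, only the key pairs present in both frames would be kept.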

  • @aaronbaldwin2845 3 years ago

    Thanks for your video and examples. Very clear.

  • @annainsf2561 3 years ago +1

    Would like to see your video about "combine_first, merge_ordered, merge_asof".

  • @TheWhyNotSeries 2 years ago

    Thanks super clear

  • @DisturbeD802 1 year ago +1

    3:42 I don't understand what "index" means. Sometimes people refer to the index as something like an auto-increment primary key in SQL, but sometimes as something completely different. I'm not a native English speaker, and that is what trips me up. To my eye, at 3:42 there are 3 columns (sex, smoker and tip), and when you do the merge you specify right_index=True. How am I supposed to know what that refers to? Let's assume there are more columns than just 3: how would you join the tables with merge in that case? I just started learning pandas, by the way.
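
One possible reading of the situation described in this question, as a sketch only: the frame at 3:42 may be the result of a groupby, in which case sex and smoker are index levels (row labels) and tip is the only real column. The data below is invented, and `tips`, `avg_tip`, `with_avg` and `alt` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for the tips data; values are made up.
tips = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "M"],
    "smoker": ["no", "yes", "no", "no", "yes"],
    "tip": [2.0, 3.0, 1.0, 3.0, 4.0],
})

# After a groupby, the grouping keys become the index, not columns.
avg_tip = tips.groupby(["sex", "smoker"])[["tip"]].mean()
print(avg_tip.index.names)  # the two index levels: sex, smoker

# right_index=True: match the left columns against the right frame's index levels.
with_avg = pd.merge(
    tips,
    avg_tip,
    left_on=["sex", "smoker"],
    right_index=True,
    suffixes=("", "_avg"),
)

# Alternative: move the index back into columns first, then merge on column names.
alt = pd.merge(tips, avg_tip.reset_index(), on=["sex", "smoker"], suffixes=("", "_avg"))

print(with_avg)
```

Extra columns on either side do not change the call: only the keys named in `left_on` / `on` (or the index levels) are used for matching, and every other column is carried along.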

  • @andresmc210 4 years ago +1

    Cool videos, thanks!

  • @fabio336ful 2 years ago

    It was really helpful, thank you.

  • @barbaralucia4990 3 years ago

    Great video!

  • @divyanshrana4257 2 years ago

    Please clarify: if we use merge on the same category and keys, will the values assigned to them also be added up or not?
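
A small sketch of what happens in the case asked about here, using invented frames `q1` and `q2`: merge places matching rows side by side and does not sum anything. If totals are wanted, one option (among several) is to stack the frames with concat and aggregate with groupby.

```python
import pandas as pd

# Hypothetical frames sharing the same key and the same value column name.
q1 = pd.DataFrame({"category": ["a", "b"], "sales": [10, 20]})
q2 = pd.DataFrame({"category": ["a", "b"], "sales": [1, 2]})

# merge: values are kept in separate, suffixed columns; nothing is added up.
side_by_side = pd.merge(q1, q2, on="category", suffixes=("_q1", "_q2"))

# To get sums, stack the rows and aggregate per category.
totals = pd.concat([q1, q2]).groupby("category", as_index=False)["sales"].sum()

print(side_by_side)
print(totals)
```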

  • @haneulkim4902 4 years ago +3

    Great video! Do you know the reason pandas created both concat and append if they are the same thing?

    • @DataTalks 4 years ago +6

      Great question!
      First off - definitely no shade on pandas. They are an awesome library and having different ways to do the same thing can be helpful!
      Ultimately I think there are two reasons:
      1) Pandas generally likes to have an "Object Oriented" and a "Functional" way to do things (loosely defined). This means there is a way that will look like: df.something and another that will look like pd.something. For example: pd.merge and df.join. Or pd.concat and df.append.
      2) Append came before concat (see here: pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#concatenating-using-append). Instead of changing append to be more flexible or overloading the method, they decided to create a new one.
      All that being said - if you like append instead of concat, go for it. My preference is concat because it is more expressive (and I'm a functional type of person) - but really, to each their own.
      My best friend still uses df.join instead of pd.merge ¯\_(ツ)_/¯
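
For readers on newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the function-style pd.concat is the spelling that works across versions. The sketch below, with made-up frames, shows that and the method/function pairing of df.join and pd.merge mentioned in the reply.

```python
import pandas as pd

# Hypothetical frames for illustration.
a = pd.DataFrame({"key": ["k1", "k2"], "x": [1, 2]})
b = pd.DataFrame({"key": ["k3"], "x": [3]})

# Row-wise stacking with the function-style API.
stacked = pd.concat([a, b], ignore_index=True)

# df.join (method style) and pd.merge (function style) for an index-on-index join.
left = a.set_index("key")
right = pd.DataFrame({"y": [10, 20]}, index=pd.Index(["k1", "k2"], name="key"))

via_join = left.join(right)  # join is a left join on the index by default
via_merge = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(stacked)
print(via_join.equals(via_merge))
```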

  • @raml24 3 years ago

    Hey! Thanks for the video. I am trying to append a new sheet to an Excel file using pandas. Running the code should create the new sheet while keeping the data in the old sheets. Can you guide me?

  • @omerfarukelma9320 3 years ago

    Hello, how can I add a df column to a GeoDataFrame?

  • @TopicalAuthority 3 years ago +1

    Skip the first 2 minutes before watching.

  • @niteshmaruthi8784 4 years ago

    Great video - thank you. You can maybe add 3 way joins as well?

  • @wesleymutalekapolyo3909 3 years ago +1

    Thank you very much, you have saved me. I have a question: how do you merge two data frames with different date formats?

    • @DataTalks 3 years ago

      You'll need to extract the parts of the date that you want to merge on and add them to the index! That'll be the best way to merge.
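
One way this advice could look in practice, as a sketch under assumptions: the two date formats, the frame names `orders` and `visits`, and the values are all invented. Here both date columns are parsed into real datetimes with an explicit format and then moved into the index before joining.

```python
import pandas as pd

# Hypothetical frames whose dates are strings in two different formats.
orders = pd.DataFrame({"date": ["2021-03-01", "2021-03-02"], "orders": [5, 7]})
visits = pd.DataFrame({"date": ["01/03/2021", "02/03/2021"], "visits": [50, 70]})

# Parse each column with its own format so both become datetime64 values.
orders["date"] = pd.to_datetime(orders["date"], format="%Y-%m-%d")
visits["date"] = pd.to_datetime(visits["date"], format="%d/%m/%Y")

# Put the parsed date in the index and merge index-on-index.
merged = pd.merge(
    orders.set_index("date"),
    visits.set_index("date"),
    left_index=True,
    right_index=True,
)

print(merged)
```

If only part of the date should match (say, year and month), the same pattern applies with columns derived via `.dt.year` and `.dt.month` used as the keys.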

  • @akzork 2 years ago +1

    I think I am gonna jump on the merging bandwagon.

  • @ManoharNathGuptaMann 4 years ago

    Can we write a SAS macro in Python?

  • @miraraudhatuljannah 4 years ago

    merge done

  • @李爽-s4h 4 years ago

    is it possible to fully replace concat with merge?

    • @DataTalks 4 years ago

      Unfortunately not. You can use join and merge interchangeably. You can also use append and concat interchangeably. But you can't replace concat with merge :(

    • @李爽-s4h 4 years ago

      Data Talks I am not sure if I understand this correctly: merge/join requires at least one overlapping column or row so that two matrices can be joined by matching values in the overlapping columns or rows, while concat/append has no such requirement and simply places two matrices into a larger matrix, even if there are no overlapping columns or rows. So, in terms of output, merge actually blends two matrices of data (by aligning matched values), making the total number of elements in the output equal to or less than the sum of the two inputs, whereas concat outputs exactly the same number of elements. Am I right?

    • @DataTalks 4 years ago +2

      @李爽-s4h Yeah, great question! There are two ways that you can combine data frames (matrices) in pandas: row-wise or column-wise. Row-wise means that if A has x rows and B has y rows, the result will be a data frame with x + y rows. Column-wise means that if A and B have x and y non-overlapping columns, the result will have x + y columns.
      The diffs between concat and merge are:
      1) merge is for (complex or simple) column wise combinations of two data frames
      2) concat is for simple column and row wise combinations of 2 or more data frames
      Hope that helps!
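
The row-wise / column-wise distinction in this reply can be sketched as follows. The frames A, B, C and D and their contents are invented for illustration.

```python
import pandas as pd

# Hypothetical frames.
A = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
B = pd.DataFrame({"id": [3, 4, 5], "x": [30, 40, 50]})
C = pd.DataFrame({"y": ["a", "b"]})
D = pd.DataFrame({"id": [2, 1], "y": ["b", "a"]})

# concat, row-wise: 2 rows + 3 rows -> 5 rows.
rows = pd.concat([A, B], axis=0, ignore_index=True)

# concat, column-wise: 2 columns + 1 column -> 3 columns, lined up by index position.
cols = pd.concat([A, C], axis=1)

# merge, column-wise: rows are lined up by matching "id" values, not by position.
matched = pd.merge(A, D, on="id")

print(rows.shape, cols.shape)
print(matched)
```

Note that D lists its ids in a different order than A; merge still pairs each id with the right y value, which a positional column-wise concat would not do.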

    • @李爽-s4h 4 years ago

      Data Talks thx a lot, been puzzled for quite a time. Thanks again

  • @juandiegogomezbolivar6738 2 years ago

    Why wasn't I born speaking English 😭😭😭