How do I merge DataFrames in pandas?

  • Published: 27 Jan 2025

Comments • 294

  • @milindbebarta2226
    @milindbebarta2226 3 years ago +14

    You are really good at explaining things. One of the better teachers on youtube. Thanks a ton for this video and I hope there's more coming.

  • @NoName-qx9zc
    @NoName-qx9zc 4 years ago +16

    I'd like to thank the author. You really do a great job. Everything is structured, decomposed and coherent. Some guys just jump in complex coding without really explaining what's going on there.

  • @LuisRivera-ce9lm
    @LuisRivera-ce9lm 2 years ago +3

    I just wanted to thank you for such a great explanation of joins. I did not have it explained to me and struggled for the longest time to understand them. It takes a good teacher and someone who can understand it simply for one to understand it. Seriously, you are amazing!!

  • @amrita301157
    @amrita301157 2 years ago +2

    This is one of the best ever videos on pandas functions that I have watched. Well done Data School. I will look forward to more such videos.

  • @mschuer100
    @mschuer100 4 years ago +7

    This, by far, is the best explanation of these concepts. Thanks for sharing.

    • @dataschool
      @dataschool  4 years ago

      Wow, thank you so much for your kind words! 🙏

  • @lolkids7833
    @lolkids7833 4 years ago +9

    Thanks, Kevin.. this is the clearest explanation of the merge I have seen.

  • @lenny95sehs
    @lenny95sehs 2 years ago

    Your videos are fantastic. I really appreciate the simple "common sense" approach to the teaching. It is quite easy for instructors to dive right into python lingo.

    • @dataschool
      @dataschool  2 years ago

      Thank you so much for your kind words!

  • @gardnmi
    @gardnmi 5 years ago +4

    The best new feature with merge is the validate option to make sure your join is 1:1, 1:M, etc. This is very useful for machine learning projects or end user reports that rely on upstream data that is updated regularly. It's saved me headaches a few times.

    • @dataschool
      @dataschool  5 years ago +3

      The "validate" option is great, I agree! I also like "indicator", which I explained here: twitter.com/justmarkham/status/1153653794829418496
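
A minimal sketch of the `validate` option discussed above, using two small made-up tables (the `orders` and `customers` names and values are illustrative assumptions, not data from the video). With `validate="many_to_one"`, pandas checks that the merge keys are unique in the right-hand table and raises `MergeError` if they are not.

```python
import pandas as pd

# Hypothetical example data: many orders per customer, one row per customer.
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})

# Passes: each customer_id appears only once in `customers`.
merged = pd.merge(orders, customers, on="customer_id", validate="many_to_one")

# Simulate upstream data that accidentally gained a duplicate key.
bad_customers = pd.concat([customers, customers.iloc[[0]]], ignore_index=True)

try:
    pd.merge(orders, bad_customers, on="customer_id", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError:
    # The duplicate key is caught before it can silently multiply rows.
    validation_failed = True

print(len(merged), validation_failed)
```

Failing loudly like this is what makes the option useful for reports that depend on regularly refreshed upstream data.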

  • @pogoclub8495
    @pogoclub8495 4 years ago

    This is the first time I have come across your videos, and I am very impressed by your explanation; your English speaking pace is perfect. Loved your content. Thanks a lot. :)

    • @dataschool
      @dataschool  4 years ago

      Thanks so much for your kind words!

  • @summerzhang9484
    @summerzhang9484 4 years ago +12

    Thanks for the videos Kevin! I love your teaching style and how you make each concept so crystal clear. Please keep making these videos! Just signed up to become a patron of yours and am taking your course on Data Camp (I wish you taught more courses on there!) Once I master Pandas will try out your machine learning course too :) ps your son is so adorable

    • @dataschool
      @dataschool  4 years ago

      You are too kind, Summer! Thank you SO much for your kind words AND for becoming a patron! 🙌

  • @JesperHolmPedersen
    @JesperHolmPedersen 4 years ago +1

    Supercool. Very impressive how you manage to explain the pretty complicated functionality of merge. Thanks.

  • @danielchacreton2401
    @danielchacreton2401 4 years ago

    Your videos are always amazing. You are a national treasure in my book. Don't change a thing, but for viewers, 1.75x is the speed to watch these at.

  • @vipinamar8323
    @vipinamar8323 4 years ago +1

    Nice teaching method. Precision over pace.

  • @CristianBittel
    @CristianBittel 4 years ago +2

    Great as teacher, calm, taking your time to clearly explain fundamentals!

    • @dataschool
      @dataschool  4 years ago

      Thanks so much for your kind words, I truly appreciate it!

  • @DJoeyJordi5on
    @DJoeyJordi5on 3 years ago +1

    This was a well-paced, clear and complete explanation of the topic, thank you very much! It helped me a lot

    • @dataschool
      @dataschool  3 years ago +1

      That's awesome to hear!

  • @sanjay123644
    @sanjay123644 3 years ago +1

    Excellent way of teaching. Thanks Kevin

    • @dataschool
      @dataschool  3 years ago

      Glad it was helpful! 🙌

  • @kickassbass
    @kickassbass 4 years ago

    Kevin you are a super hero of Data science, best videos on tube...

  • @jonass7456
    @jonass7456 3 years ago +3

    Dude! Let me tell you, you saved me a lot of time and work! Thank you so much!

  • @vitoroliveira6363
    @vitoroliveira6363 4 years ago +8

    Wonderful, loved your slow-paced English; that helped me a lot.

  • @shashi_kamal_chakraborty
    @shashi_kamal_chakraborty 2 years ago +1

    Thanks! very nicely explained. Now, I can perform joins using Pandas, quite effortlessly.

  • @autonish
    @autonish 4 years ago +4

    Brilliant Stuff, All videos are awesome. Clearly explained all fundamentals...Thanks for making this stuff easy.
    On a different line, you remind me of "Sheldon" from the TV series The Big bang theory and this is a compliment. :)

    • @dataschool
      @dataschool  4 years ago

      Ha! So many people have said that 😄

  • @rawanfouda2291
    @rawanfouda2291 4 years ago +11

    That was honestly really good! thank you so much for your work

  • @brunoreighner1780
    @brunoreighner1780 4 years ago +2

    You're an amazing teacher. Thanks a lot for these.

  • @citizen_deb
    @citizen_deb 4 years ago +1

    Thank you so much Kevin, your neat explanation along with the file you share makes it so clear, was really needing it!

  • @mr.stemedutv5514
    @mr.stemedutv5514 4 years ago +1

    Very easy to follow, and thanks for making a very useful video!

  • @jcbcorner8464
    @jcbcorner8464 4 years ago

    Finally, a clear explanation of the merge function!! Thanks, subscribed.

  • @da_ta
    @da_ta 5 years ago +6

    Thanks Kevin, I have been looking for this for a long time!

    • @dataschool
      @dataschool  4 years ago +1

      Awesome! I'm so glad to hear this is the video you needed! 🙌

  • @BC-gc7bv
    @BC-gc7bv 4 years ago +1

    You are an excellent teacher!!! I'm a fan. TY.

  • @saraghafelehbashi5808
    @saraghafelehbashi5808 3 years ago +1

    Please keep making these videos! You are awesome!

  • @tommonks2490
    @tommonks2490 4 years ago +1

    Excellently explained as always. Keep up the great work!!

  • @avelinoamado4568
    @avelinoamado4568 3 years ago +1

    This video was very helpful and clear. Thank you for this content.

  • @bongi_nkuna
    @bongi_nkuna 3 years ago

    This video is pure GOLD, absolutely wonderful, loved the clear explanations , thank you...

    • @dataschool
      @dataschool  3 years ago

      So glad to hear it was helpful to you! 🙌

  • @hieungotrung5411
    @hieungotrung5411 5 years ago

    Great to see you again as well as your high-quality content in your video

    • @dataschool
      @dataschool  5 years ago

      Thanks so much for your kind words! 😄

  • @NiireNolweva
    @NiireNolweva 3 years ago +1

    Very clear and informative. Thank you very much.

  • @TheNobody04
    @TheNobody04 3 years ago

    Wow, I've seen some of your videos and I just can say THANK YOU. It's so easy to understand you :3

    • @dataschool
      @dataschool  3 years ago

      Thanks for your kind words! Glad you like my videos!

  • @SamSam-mh5jt
    @SamSam-mh5jt 3 years ago

    Thank you so much for the clear and concise explanation

  • @joseluisbeltramone599
    @joseluisbeltramone599 3 years ago +1

    Thank you very much for the precise explanation, just what I needed to know!

    • @dataschool
      @dataschool  3 years ago

      You're very welcome! 🙏

  • @JustJoelTV
    @JustJoelTV 2 years ago +1

    Great video, informative and clear. Thanks

  • @cutestbear3327
    @cutestbear3327 1 year ago

    Thanks for the video, it's awesome, particularly the parts explaining joins. Clear and concise.

  • @gregf9160
    @gregf9160 4 years ago +2

    Thank you so much for the concise clear explanation. Much appreciated.

  • @feroncia
    @feroncia 4 years ago

    Thank you so much for explaining it clearly. Now I understand merging DataFrames better. TQVM

  • @svengunther7653
    @svengunther7653 4 years ago +3

    You are doing a really great job with this. Thank you so much! :)

  • @nowyouknow2249
    @nowyouknow2249 5 years ago +1

    Thanks a lot Kevin
    We have missed you.

  • @JainmiahSk
    @JainmiahSk 5 years ago +2

    Good to see you. I love the logic you teach.

    • @dataschool
      @dataschool  5 years ago +1

      Thank you! Glad my videos are helpful to you 👍

  • @jalego800
    @jalego800 1 year ago

    Hi Kevin, thanks to your tutoring, I have learned a lot from your channel, it's amazing! Since I have only just started learning pandas, I'm a little confused about concat(), melt(), merge(), pivot(), stack()... They're really annoying to me >< I really hope there is a one-for-all guide on how to use these functions XD Thank you!

    • @dataschool
      @dataschool  1 year ago

      I agree, it's tricky to separate out when you should use each one of those!

  • @zapy422
    @zapy422 4 years ago +1

    Thank you for this video.
    I have been struggling with merge and concat today :)

    • @dataschool
      @dataschool  4 years ago +1

      You're very welcome! Glad it's helpful to you!

  • @g.jignacio
    @g.jignacio 4 years ago

    Excellent video! Keep sharing content like this. Greetings from Argentina.

  • @alankarshukla4385
    @alankarshukla4385 5 years ago +1

    Can't wait to watch this.

    • @dataschool
      @dataschool  5 years ago +1

      I hope the video is helpful to you!

  • @maxvinella941
    @maxvinella941 5 years ago +1

    Missing your pandas tutorials.. thanks

    • @dataschool
      @dataschool  5 years ago

      It's nice to be missed! You can find all of my pandas tutorials here: ruclips.net/p/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y

  • @themustknowfacts510
    @themustknowfacts510 4 years ago +3

    I'm not able to read the file "u.item". I copied the same code from GitHub, but pandas wasn't able to read it and showed me a Unicode error. How do I solve that issue?

    • @ChrisMao_708
      @ChrisMao_708 3 years ago

      Add encoding='latin-1' to the read call and you will be fine.
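
The encoding fix above can be sketched as follows. Because the real `u.item` file is not bundled here, the example writes a tiny pipe-separated stand-in file containing a Latin-1 character; the file name, the two sample rows, and the `movie_cols` names are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Write a tiny pipe-separated stand-in for u.item containing a Latin-1 byte (é).
path = os.path.join(tempfile.mkdtemp(), "sample.item")
with open(path, "wb") as f:
    f.write("1|Toy Story (1995)\n2|Misérables, Les (1995)\n".encode("latin-1"))

movie_cols = ["movie_id", "title"]

# The default UTF-8 decoding fails on the Latin-1 byte.
try:
    pd.read_csv(path, sep="|", header=None, names=movie_cols)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Telling pandas the file is Latin-1 encoded lets it read the same bytes.
movies = pd.read_csv(path, sep="|", header=None, names=movie_cols, encoding="latin-1")
print(default_failed)
print(movies)
```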

  • @mochammadirfanbaihaqi279
    @mochammadirfanbaihaqi279 3 years ago +1

    Love the way you explain it, thanks for your vids. Keep it up (thumbs)

  • @cmovilidad1
    @cmovilidad1 4 years ago +1

    Master! He's back! Awesome.

  • @osmanhussein3893
    @osmanhussein3893 3 years ago +1

    This is very helpful. Thank you so much.

  • @omidadib5052
    @omidadib5052 3 years ago +1

    Awesome tutorial, Thank you very much man!

  • @akinsikuelizabeth5780
    @akinsikuelizabeth5780 4 years ago +1

    Superb!!!
    I got every explanation, thanks.

  • @vijayreddy1730
    @vijayreddy1730 4 years ago

    Hi Kevin, first of all, thanks for the wonderful lecture. I am having trouble merging the two DataFrames shown below.
    Data frame 1:
    BackupServer | BackupDay | StartDate | ClientName | BackupStatus | Backup re-run(Y/N) | Incident | Reason for the Backup Failures | Backup Final Outcome
    RGSIBAK004 | 01-05-2020 | 2020-04-30 06:40:29 | RGBPLNM110 | Completed | NaN | NaN | NaN | NaN
    RGSIBAK004 | 01-05-2020 | 2020-04-30 06:53:07 | RGPIAPP037 | Completed | NaN | NaN | NaN | NaN
    RGSIBAK004 | 01-05-2020 | 2020-04-30 15:32:38 | RGPIISD001 | Failed | Yes | IN893523 | VM disconnected | Failed
    RGSIBAK004 | 01-05-2020 | 2020-04-30 18:00:08 | RGPPFTP005 | Completed | NaN | NaN | NaN | NaN
    RGSIBAK004 | 01-05-2020 | 2020-04-30 18:00:02 | RGPQWEB069 | Completed | NaN | NaN | NaN | NaN
    Data frame 2:
    BackupServer | BackupDay | StartDate | Client Name | Backup Status | Backup Rerun (Y/N) | Incident | Failures | Backup Final Result
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpqbda112.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc051.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc050.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc011.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpdbda105.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    Although the two DataFrames share three column names ("BackupServer", "BackupDay" and "StartDate"), the content in those columns is different, and I am not able to merge the two DataFrames into one. Can you help me with this?

  • @sch0ll1
    @sch0ll1 3 years ago +1

    Thanks man! You saved my weekend :*

  • @Isabel-ec2sq
    @Isabel-ec2sq 4 years ago +1

    Thank you!! I finally got the dataframe I wanted!

  • @AsMa-eg
    @AsMa-eg 3 years ago +1

    thank u so much. very clear and to the point.

  • @zezodiaa1025
    @zezodiaa1025 2 years ago

    Great video. My question is: when I'm working on a project, when exactly do I have to combine DataFrames?

  • @juliakristavilladiego245
    @juliakristavilladiego245 4 years ago

    Thank you! Crystal clear explanation.

  • @Zahraa_005
    @Zahraa_005 4 years ago

    This is the best explanation
    Thanks so much!

  • @bilalahmad9177
    @bilalahmad9177 7 months ago

    You are a great instructor. I have learned a lot from you about pandas.
    The video titled "How do I merge DataFrames in pandas?" has left me with some questions. I would be grateful if you could clear those up too.
    What type of join is used in movie_ratings = pd.merge(movies, ratings)?
    If it is an inner join, it should result in 1682 rows in total in the movie_ratings DataFrame, as the movies DataFrame has 1682 rows. But in the video I observed that movie_ratings has 100,000 rows.
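
On the row-count question above: `pd.merge` defaults to `how="inner"`, and an inner join returns one row for every matching pair of keys, not one row per row of the left table. A small sketch with made-up data (the values below are illustrative assumptions, not the MovieLens files) shows a one-to-many merge producing as many rows as there are ratings:

```python
import pandas as pd

# Hypothetical data: 3 movies, 5 ratings; movie 3 has no ratings at all.
movies = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({
    "user_id": [10, 11, 12, 10, 11],
    "movie_id": [1, 1, 1, 2, 2],
    "rating": [5, 4, 3, 2, 1],
})

# how="inner" is the default; the shared column movie_id is used as the key.
movie_ratings = pd.merge(movies, ratings)

# One output row per (movie, rating) match: 3 rows for movie 1, 2 for movie 2.
print(movie_ratings.shape)
```

By the same logic, if every one of the 100,000 ratings refers to a movie that exists in `movies`, the inner join keeps all 100,000 rows.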

  • @SR-lf3ic
    @SR-lf3ic 2 years ago +1

    Hi, when I used pd.concat([df1,df2]), I got a tuple object instead of a DataFrame object. I am using a Python 3.9 environment. What should I do to get a DataFrame object rather than a tuple?

  • @fschmidkonz
    @fschmidkonz 4 years ago +1

    You're a great teacher! I see that despite the large 100K-row file, the number of rows does not grow after the merge. They beautifully stay the same, and the merge just adds the movie titles to the reviews. Can you comment on why this is not always the case? I have tried it, and my output file grows by a few rows (17 out of 1000), and I have not been able to figure out why. I have checked multiple videos, and some offer absurd, impractical solutions (like requiring the files to be the same size) or arbitrarily eliminate any duplicates (even though some may be valid rows), but none explain the reason or how to identify the rows that could be duplicates. Your comments are appreciated.
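
One common reason a merge adds rows is that a key value appears more than once in the lookup table, so each matching row on the other side is repeated once per duplicate. This is a general property of joins rather than a diagnosis of the commenter's files. A hedged sketch with made-up data (all names and values are assumptions) shows the effect and one way to list the offending keys:

```python
import pandas as pd

# Hypothetical data: movie_id 3 appears twice in the lookup table.
reviews = pd.DataFrame({"movie_id": [1, 2, 2, 3], "rating": [5, 4, 3, 2]})
titles = pd.DataFrame({
    "movie_id": [1, 2, 3, 3],
    "title": ["A", "B", "C", "C (reissue)"],
})

merged = pd.merge(reviews, titles, on="movie_id", how="left")
extra_rows = len(merged) - len(reviews)

# Keys that appear more than once in the lookup table are the ones that multiply rows.
dup_keys = titles[titles.duplicated("movie_id", keep=False)]

print(extra_rows)
print(dup_keys)
```

Inspecting `dup_keys` before deciding what to drop avoids discarding rows that are legitimately distinct.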

  • @adedolapoogungbire7088
    @adedolapoogungbire7088 3 years ago

    Just what I needed.

  • @dannylockett9445
    @dannylockett9445 2 years ago +1

    I really enjoy your tutorials, thanks so much! I have 5 CSV files that come out daily, each containing a date column. I want to merge them all using the date as the merge field. I tried a basic merge with 2 of the CSV files, and date was used as the merge-on field by default, so it worked. Ultimately I just need one date column in my master file with all the other column data merged. Should I continue to do this, or is it better to set the date column as the index, or something else?
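
Both approaches raised in the question above can work. The sketch below uses five small in-memory DataFrames in place of the CSV files (the `value_1` to `value_5` column names and the dates are assumptions) and shows a chained merge on `date` next to the index-based alternative:

```python
from functools import reduce

import pandas as pd

# Five hypothetical daily tables, each with a shared `date` column.
dates = pd.to_datetime(["2024-01-01", "2024-01-02"])
frames = [
    pd.DataFrame({"date": dates, f"value_{i}": [i, i * 10]})
    for i in range(1, 6)
]

# Option 1: merge pairwise on `date`, keeping dates found in any file.
master = reduce(
    lambda left, right: pd.merge(left, right, on="date", how="outer"),
    frames,
)

# Option 2: make `date` the index and align all tables side by side.
by_index = pd.concat([f.set_index("date") for f in frames], axis=1)

print(master.shape, by_index.shape)
```

Naming the key explicitly with `on="date"` is slightly safer than relying on the default, which merges on every column name the tables happen to share.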

  • @Moc2Talk
    @Moc2Talk 4 years ago

    Your slow speaking pace is very helpful to me. I have 2 questions. First: what if I want to merge only one particular column (rating) from the ratings DataFrame into the movies DataFrame? Second: what if I want to sum the ratings for each movie_id? Thank you so much; looking forward to your answer.
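
A minimal sketch for the two questions above, using small made-up tables (the column names follow the comment, the values are assumptions): select the key plus the one wanted column before merging, and use `groupby` for the per-movie total.

```python
import pandas as pd

movies = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})
ratings = pd.DataFrame({
    "user_id": [10, 11, 12],
    "movie_id": [1, 1, 2],
    "rating": [5, 4, 3],
    "timestamp": [100, 200, 300],
})

# Bring across only `rating`: keep the key and the wanted column before merging.
with_rating = pd.merge(movies, ratings[["movie_id", "rating"]], on="movie_id")

# Total rating per movie.
rating_sum = ratings.groupby("movie_id")["rating"].sum()

print(with_rating)
print(rating_sum)
```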

  • @alndr4u
    @alndr4u 2 years ago +1

    How do I merge two DataFrames based on 4 common columns with repeated values?
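
For a merge on several shared columns, `on` accepts a list of column names, and rows match only when all listed columns agree. The sketch below uses hypothetical `sales` and `costs` tables in which individual values repeat but each four-column combination is unique:

```python
import pandas as pd

keys = ["region", "store", "year", "month"]

# Hypothetical tables sharing the four key columns.
sales = pd.DataFrame({
    "region": ["N", "N", "S"], "store": [1, 1, 2],
    "year": [2020, 2020, 2020], "month": [1, 2, 1],
    "sales": [100, 110, 90],
})
costs = pd.DataFrame({
    "region": ["N", "N", "S"], "store": [1, 1, 2],
    "year": [2020, 2020, 2020], "month": [1, 2, 1],
    "cost": [60, 70, 50],
})

# Rows are paired only when all four key columns match.
combined = pd.merge(sales, costs, on=keys)
print(combined)
```

If a full four-column combination is itself repeated in both tables, the result contains every pairing of those rows, so checking for duplicate keys first is worthwhile.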

  • @jaysoni7812
    @jaysoni7812 4 years ago +1

    Where is the link to the dataset used in this video?
    I want to practice with your dataset; can you please send me the link?

  • @hectoralvarorojas1918
    @hectoralvarorojas1918 5 years ago +2

    Great work as always.
    Very useful.
    Thanks for sharing it!
    By the way, any chance you could do a video about PySpark? It would be very useful to cover this from the beginning, with examples based on a local connection (one computer) first and then a couple of examples emulating a cluster connection.

    • @dataschool
      @dataschool  5 years ago +1

      Thanks for your kind words as always, Hector! Sorry, I don't have any videos about PySpark, but I appreciate the suggestion! 👍

    • @hectoralvarorojas1918
      @hectoralvarorojas1918 5 years ago +1

      @dataschool I would love for you to do that. I am positive that you will get a lot of interested people, me among them of course.
      My best regards!

  • @jaydhanwant4072
    @jaydhanwant4072 4 years ago +1

    I wish we had 3x on youtube, great video!

  • @ayodejiakinfenwa
    @ayodejiakinfenwa 2 years ago

    Please, I am trying to merge two datasets as you explained, but it gives an error saying that I should check for duplicates.

  • @eliasaudi2877
    @eliasaudi2877 3 years ago

    What would we use to show ONLY the values that do not match, i.e. anything other than an inner join?
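
One way to keep only the non-matching rows is an outer join with `indicator=True`, which adds a `_merge` column recording where each row came from. This is a sketch with made-up tables (the `left` and `right` names and values are assumptions):

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": ["p", "q", "r"]})

# The outer join keeps every key; _merge is 'left_only', 'right_only' or 'both'.
both_sides = pd.merge(left, right, on="key", how="outer", indicator=True)

# Drop the rows an inner join would have returned.
unmatched = both_sides[both_sides["_merge"] != "both"]
print(unmatched)
```

Filtering on `"left_only"` instead gives the rows present in the first table but missing from the second.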

  • @ramachalprajapati1176
    @ramachalprajapati1176 4 years ago

    How do I get the common mobile numbers from two different CSV files that have different column names?

  • @cgpmth6449
    @cgpmth6449 2 years ago

    How do I merge multiple large DataFrames quickly? I joined them with the usual merge(), but it seems too slow. I found a hint about using pandas.Index() with the merge method, but I don't know how to use it.

  • @Octaphea
    @Octaphea 2 years ago +1

    Great video. However, I have a small issue. I have 3 DataFrames that I am trying to merge. The first is a fairly long database with columns (cust_id, gained_on gained_from_supplier, lost_to_supplier, sales_channel_id); the second is the supplier DataFrame (supplier_name, supplier_id). I am trying to merge the supplier ID and name from the second DataFrame into the database DataFrame, matching the supplier ID to the number using left_on/right_on, but instead it returns both columns: the supplier ID and name of both DataFrames. Then I want to do the same with the channel DataFrame (sales_channel_name, sales_channel_id): merge it with sales_channel_id in the database DataFrame and show the name instead. Any help would be appreciated, thank you!
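
When `left_on` and `right_on` name different columns, pandas keeps both key columns in the result, which matches the behavior described above. A hedged sketch, using made-up rows and only some of the columns named in the comment, drops the redundant key after the first merge and then merges the channel table on its shared column name:

```python
import pandas as pd

# Hypothetical stand-ins for the three tables described in the comment.
customers = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "gained_from_supplier": [10, 20, 10],
    "sales_channel_id": [100, 200, 100],
})
suppliers = pd.DataFrame({"supplier_name": ["Acme", "Bolt"], "supplier_id": [10, 20]})
channels = pd.DataFrame({"sales_channel_name": ["Web", "Phone"], "sales_channel_id": [100, 200]})

result = (
    customers
    # Different key names on each side: both key columns survive the merge...
    .merge(suppliers, left_on="gained_from_supplier", right_on="supplier_id", how="left")
    # ...so drop the duplicate key that came from the right-hand table.
    .drop(columns="supplier_id")
    # Same key name on both sides: only one sales_channel_id column is kept.
    .merge(channels, on="sales_channel_id", how="left")
)
print(result)
```

Using `how="left"` keeps every customer row even when a supplier or channel ID has no match in the lookup table.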

  • @ericmindyc
    @ericmindyc 3 years ago

    Hello. Great vid. But how do I follow along? Other videos had the bitly link. I can’t find the dataset for this exercise.

    • @dataschool
      @dataschool  3 years ago

      Datasets are here: github.com/justmarkham/pandas-videos/tree/master/data

  • @MIH20788
    @MIH20788 11 months ago

    Thanks Kevin, but where is the concat video?

  • @zhalie12345
    @zhalie12345 4 years ago

    Thanks for the vid data school !

  • @shivamsaway6803
    @shivamsaway6803 4 years ago

    Can it happen that, when merging two DataFrames, only the column headers are merged and no data ends up in the new DataFrame?

  • @job2k6
    @job2k6 3 years ago

    Very helpful, thank you.

  • @sivababu2753
    @sivababu2753 2 years ago

    Thanks for the video.
    I have a question, sir.
    Suppose I have table 1 with features (order Id) and (product Id), and table 2 with features (order Id) and (product Id). How do I fetch the observations that are present in table 1 but not present in table 2?

    • @dataschool
      @dataschool  2 years ago

      Great question! See trick 16 in this video: ruclips.net/video/tWFQqaRtSQA/видео.html

  • @christleiroezi8878
    @christleiroezi8878 4 years ago

    I have a DataFrame, a list, and a tuple, and I want to merge all three together. I am aware that merge can only handle two tables at a time, but do you have any helpful hints on how to go about merging them? I want the result to be a new DataFrame.

  • @vinayakchikkorde8151
    @vinayakchikkorde8151 3 years ago

    I have a source file and a target file, and I have to compare 140 columns and show whether each one matches or not. For example, there is a column named Country1 in the source and Country2 in the target. To compare them I would use something like: if source['country1'] == target['country2'], return True, else return False. Comparing 140+ columns this way will take a long time, and the columns are not in the same order in the two files. How can I solve this?

  • @nandineeuma6659
    @nandineeuma6659 3 years ago

    How do I concatenate multiple rows into a single row separated by commas?
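
The question above does not say how the rows are grouped, so the sketch below assumes a hypothetical `order_id` column that identifies which rows belong together and a text column `product` whose values should be joined with commas:

```python
import pandas as pd

# Hypothetical data: several product rows per order.
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "product": ["apple", "pear", "kiwi"],
})

# Collapse each group's rows into one comma-separated string.
joined = df.groupby("order_id")["product"].agg(", ".join).reset_index()
print(joined)
```

If the column holds numbers rather than text, converting it first with `astype(str)` lets the same join work.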

  • @ДмитрийИгнатьев-з5т

    Hello, many thanks for your tutorial. It's great!!! But I'm stuck: is there any technique to join two DataFrames if one of them is stacked and the other is not?

  • @donalike1206
    @donalike1206 3 years ago

    thank you so much! it really helped me

  • @mehnazjabeen
    @mehnazjabeen 3 years ago

    How to verify if all the columns are incorporated in the merged DataFrame by using simple comparison Operator in Python after merging two DataFrame?

  • @AnoNymous-dh2sv
    @AnoNymous-dh2sv 2 years ago

    What's the concat video? You say there is one, but I can't find it with search.

    • @dataschool
      @dataschool  2 years ago +1

      It's at the end of this video: ruclips.net/video/15q-is8P_H4/видео.html
      Hope that helps!

  • @ruthliganad8274
    @ruthliganad8274 4 years ago

    how about not a specific file? for example all .csv or all .tsv file? how to concatenate a header to that file? Thanks

  • @dhirajp4677
    @dhirajp4677 4 years ago

    Hello Data School, I need to convert the column below to a datetime dtype:
    period
    0 28.02.2020 10:32:17:640
    1 28.02.2020 10:32:18:656
    2 28.02.2020 10:32:19:656
    3 28.02.2020 10:32:20:671
    4 28.02.2020 10:32:21:687
    5 28.02.2020 10:32:22:687
    6 28.02.2020 10:32:23:703
    df['period'] = pd.to_datetime(df['period'])
    I used the code above, but it throws the error ValueError: ('Unknown string format:', '28.02.2020 10:32:17:640').
    How do I go ahead?
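
The strings above use dots in the date and a colon before the milliseconds, which the automatic parser does not recognize. One way forward is to pass an explicit `format` to `pd.to_datetime`; the sketch below rebuilds two of the rows from the comment and assumes the last field is a fraction of a second:

```python
import pandas as pd

df = pd.DataFrame({"period": ["28.02.2020 10:32:17:640", "28.02.2020 10:32:18:656"]})

# Day.month.year, then hours:minutes:seconds, then the fractional part after a colon.
df["period"] = pd.to_datetime(df["period"], format="%d.%m.%Y %H:%M:%S:%f")

print(df.dtypes)
print(df)
```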

  • @saikiranhr
    @saikiranhr 2 years ago

    Thanks for the amazing video. One simple question. How to join tables on multiple indices (like 4 or 5)?

  • @comparethis-p1g
    @comparethis-p1g 2 years ago

    The resulting dataset I got has null values. What do I do?

  • @hardikvegad3508
    @hardikvegad3508 4 years ago

    Sir, what if we have hundreds of columns without names? How can we name them using pandas and a for loop or lambda function? Naming them with names=[] would be a very time-consuming process. The column names can be col1, col2, col3, etc.
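
A short sketch of one way to generate such names, assuming the file has no header row: read it with `header=None` and build the list of names with a comprehension. The three-column in-memory CSV below stands in for the real file.

```python
import io

import pandas as pd

# Stand-in for a header-less file; a real path can be passed to read_csv instead.
raw = io.StringIO("1,2,3\n4,5,6\n")
df = pd.read_csv(raw, header=None)

# Generate col1, col2, ... for however many columns were read.
df.columns = [f"col{i}" for i in range(1, df.shape[1] + 1)]
print(df.columns.tolist())
```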

  • @wilsonmupfururirwa6523
    @wilsonmupfururirwa6523 4 years ago

    Hi, I wanted to ask how you check for data consistency in columns, like checking for integers in a string column, or finding values like 2A in a column of double-letter values, e.g. AA, BB, etc.

    • @dataschool
      @dataschool  4 years ago

      Great question, though there's no "one way" to catch all of these issues! Here are some tricks that might be helpful, though: ruclips.net/video/RlIiVeig3hc/видео.html

  • @АлексейДуховный-ф1г
    @АлексейДуховный-ф1г 3 years ago +1

    The only English-speaking person you can understand without knowing English.

  • @BrotoBhattacharjee
    @BrotoBhattacharjee 4 years ago

    Awesome video

  • @philongtran9455
    @philongtran9455 3 years ago

    Hello, I can't retrieve the merged df in another cell. How can I fix that?

  • @ОлександрГорбатюк-и7ж

    That's great! Thank you so much!