How do I merge DataFrames in pandas?

  • Published: Sep 11, 2024

Comments • 295

  • @milindbebarta2226
    2 years ago +13

    You are really good at explaining things. One of the better teachers on YouTube. Thanks a ton for this video and I hope there's more coming.

  • @NoName-qx9zc
    4 years ago +15

    I'd like to thank the author. You really do a great job. Everything is structured, decomposed and coherent. Some guys just jump into complex coding without really explaining what's going on.

  • @LuisRivera-ce9lm
    2 years ago +3

    I just wanted to thank you for such a great explanation of joins. Nobody had explained them to me, and I struggled for the longest time to understand them. It takes a good teacher, one who understands it simply, for someone else to understand it. Seriously, you are amazing!!

    • @dataschool
      2 years ago +1

      Thank you so much! 🙏

  • @mschuer100
    4 years ago +6

    This, by far, is the best explanation of these concepts. Thanks for sharing.

    • @dataschool
      4 years ago

      Wow, thank you so much for your kind words! 🙏

  • @lolkids7833
    4 years ago +8

    Thanks, Kevin. This is the clearest explanation of merge I have seen.

  • @gardnmi
    4 years ago +4

    The best new feature with merge is the validate option to make sure your join is 1:1, 1:M, etc. This is very useful for machine learning projects or end user reports that rely on upstream data that is updated regularly. It's saved me headaches a few times.

    • @dataschool
      4 years ago +3

      The "validate" option is great, I agree! I also like "indicator", which I explained here: twitter.com/justmarkham/status/1153653794829418496
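A minimal sketch of the two options mentioned above, using made-up movies/ratings tables (names and values are illustrative only): validate makes pandas raise pandas.errors.MergeError when the key relationship is not the one you declared, and indicator=True adds a _merge column saying whether each row came from the left table, the right table, or both.

```python
import pandas as pd

# Hypothetical toy tables: movie_id is unique in movies but repeats in ratings
movies = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"movie_id": [1, 1, 2, 4], "rating": [5, 3, 4, 2]})

# validate="one_to_many" passes because movie_id is unique on the left;
# indicator=True labels each row "both", "left_only" or "right_only"
merged = pd.merge(movies, ratings, on="movie_id", how="outer",
                  validate="one_to_many", indicator=True)
print(merged)

# Declaring one_to_one fails, because movie_id 1 appears twice in ratings
raised = False
try:
    pd.merge(movies, ratings, on="movie_id", validate="one_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```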

  • @amrita301157
    2 years ago +2

    This is one of the best ever videos on pandas functions that I have watched. Well done Data School. I will look forward to more such videos.

  • @summerzhang9484
    4 years ago +12

    Thanks for the videos Kevin! I love your teaching style and how you make each concept so crystal clear. Please keep making these videos! I just signed up to become a patron of yours and am taking your course on DataCamp (I wish you taught more courses there!). Once I master pandas I will try out your machine learning course too :) P.S. your son is so adorable

    • @dataschool
      4 years ago

      You are too kind, Summer! Thank you SO much for your kind words AND for becoming a patron! 🙌

  • @CristianBittel
    4 years ago +2

    Great as a teacher: calm, taking your time to clearly explain the fundamentals!

    • @dataschool
      4 years ago

      Thanks so much for your kind words, I truly appreciate it!

  • @rawanfouda2291
    4 years ago +11

    That was honestly really good! Thank you so much for your work

  • @lenny95sehs
    1 year ago

    Your videos are fantastic. I really appreciate the simple "common sense" approach to the teaching. It is quite easy for instructors to dive right into Python lingo.

    • @dataschool
      1 year ago

      Thank you so much for your kind words!

  • @UrBigSisKey
    2 years ago +1

    Really appreciate the work you put into this video :) I've been working with DataFrames for a few months, and merge/join/concat was always a bit confusing for me. This video was perfect for clearing up some of my doubts, and I'll definitely be back to reference it again soon ;)

    • @dataschool
      2 years ago

      Thanks so much for your kind words! 🙏

  • @vitoroliveira6363
    4 years ago +8

    Wonderful, loved your slow-paced English, that helped me a lot

  • @JesperHolmPedersen
    4 years ago +1

    Supercool. Very impressive how you manage to explain the pretty complicated functionality of merge. Thanks.

  • @jonass7456
    3 years ago +3

    Dude! Let me tell you, you saved me a lot of time and work! Thank you so much!

  • @sanjay123644
    3 years ago +1

    Excellent way of teaching. Thanks Kevin

    • @dataschool
      3 years ago

      Glad it was helpful! 🙌

  • @vipinamar8323
    3 years ago +1

    Nice teaching method: precision over pace.

  • @autonish
    4 years ago +4

    Brilliant stuff, all videos are awesome. You clearly explained all the fundamentals... Thanks for making this stuff easy.
    On a different note, you remind me of "Sheldon" from the TV series The Big Bang Theory, and this is a compliment. :)

    • @dataschool
      4 years ago

      Ha! So many people have said that 😄

  • @DJoeyJordi5on
    3 years ago +1

    This was a well-paced, clear and complete explanation of the topic, thank you very much! It helped me a lot

    • @dataschool
      3 years ago +1

      That's awesome to hear!

  • @brunoreighner1780
    4 years ago +2

    You're an amazing teacher. Thanks a lot for these.

  • @da_ta
    4 years ago +6

    Thanks Kevin, I have been looking for this for a long time!

    • @dataschool
      4 years ago +1

      Awesome! I'm so glad to hear this is the video you needed! 🙌

  • @danielchacreton2401
    4 years ago

    Your videos are always amazing. You are a national treasure in my book. Don't change a thing, but for viewers, 1.75x is the speed to watch these at.

  • @pogoclub8495
    3 years ago

    This is the first time I've come across your videos, and I am very impressed by your explanation; your English speaking pace is perfect. Loved your content. Thanks a lot. :)

    • @dataschool
      3 years ago

      Thanks so much for your kind words!

  • @mileguitar
    4 years ago

    Kevin, you are a superhero of data science, best videos on YouTube...

  • @shashi_kamal_chakraborty
    2 years ago +1

    Thanks! Very nicely explained. Now I can perform joins using pandas quite effortlessly.

  • @BC-gc7bv
    4 years ago +1

    You are an excellent teacher!!! I'm a fan. TY.

  • @mr.stemedutv5514
    4 years ago +1

    Very easy to follow, and thanks for making a very useful video!

  • @citizen_deb
    4 years ago +1

    Thank you so much Kevin, your neat explanation along with the file you shared makes it so clear. I really needed it!

  • @themustknowfacts510
    3 years ago +3

    I'm not able to read the file "u.item". I copied the same code from GitHub, but pandas wasn't able to read it. It showed me a Unicode error... How do I solve that issue?

    • @ChrisMao_708
      3 years ago

      Add encoding='latin-1' to the read call and you will be fine.
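A self-contained sketch of that fix. Instead of the real u.item file it builds a few made-up bytes encoded as Latin-1, and it assumes a pipe (|) delimiter as in the MovieLens files used in the video; only two columns are simulated, and the column names are illustrative.

```python
import io
import pandas as pd

# Made-up pipe-delimited rows standing in for u.item; the accented title is
# encoded as Latin-1, so the byte 0xE9 is not valid UTF-8
raw = "1|Toy Story (1995)\n2|Café Example (1995)\n".encode("latin-1")

# With the default UTF-8 codec this read fails with a UnicodeDecodeError;
# telling pandas the real encoding fixes it
movies = pd.read_csv(io.BytesIO(raw), sep="|", header=None,
                     names=["movie_id", "title"], encoding="latin-1")
print(movies)
```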

  • @jcbcorner8464
    4 years ago

    Finally a clear explanation of the merge function!! Thanks, subscribed

  • @saraghafelehbashi5808
    2 years ago +1

    Please keep making these videos! You are awesome!

  • @alankarshukla4385
    4 years ago +1

    Can't wait to watch this.

    • @dataschool
      4 years ago +1

      I hope the video is helpful to you!

  • @bongi_nkuna
    3 years ago

    This video is pure GOLD, absolutely wonderful, loved the clear explanations, thank you...

    • @dataschool
      3 years ago

      So glad to hear it was helpful to you! 🙌

  • @NiireNolweva
    3 years ago +1

    Very clear and informative. Thank you very much.

  • @tommonks2490
    4 years ago +1

    Excellently explained as always. Keep up the great work!!

  • @jaysoni7812
    4 years ago +1

    Where is the link to the dataset used in this video?
    I want to practice with your dataset. Can you please send me the link?

  • @nowyouknow2249
    4 years ago +1

    Thanks a lot Kevin
    We have missed you.

  • @JustJoelTV
    2 years ago +1

    Great video, informative and clear. Thanks

  • @svengunther7653
    4 years ago +3

    You are doing a really great job with this. Thank you so much! :)

  • @gregf9160
    4 years ago +2

    Thank you so much for the concise clear explanation. Much appreciated.

  • @avelinoamado4568
    2 years ago +1

    This video was very helpful and clear. Thank you for this content.

  • @joseluisbeltramone599
    3 years ago +1

    Thank you very much for the precise explanation, just what I needed to know!

    • @dataschool
      3 years ago

      You're very welcome! 🙏

  • @bilalahmad9177
    3 months ago

    You are a great instructor. I have learned a lot from you about pandas.
    The video titled "How do I merge DataFrames in pandas?" has left some questions in my mind. I would be thankful if you could clear those up too.
    What type of join is used here: movie_ratings = pd.merge(movies, ratings)?
    If it is an inner join, it should result in 1682 rows in total in the movie_ratings DataFrame, as the movies DataFrame has 1682 rows. But in the video I observed that movie_ratings has 100,000 rows.
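A short answer, shown on tiny made-up tables: pd.merge(movies, ratings) with no other arguments is an inner join on the columns the two tables share (here movie_id). An inner join returns one row per matching pair, and each rating matches exactly one movie, so the result has as many rows as there are matching ratings, not as many as there are movies. If every rating in the video's data refers to a known movie, that would explain the 100,000 rows.

```python
import pandas as pd

# Hypothetical miniature versions of the video's two tables
movies = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"user_id": [10, 11, 12, 10, 13],
                        "movie_id": [1, 1, 2, 2, 1],
                        "rating": [4, 5, 3, 2, 4]})

# pd.merge defaults to how="inner" and joins on all shared column names
movie_ratings = pd.merge(movies, ratings)

# One output row per (movie, rating) match; movie 3 has no ratings and drops out
print(len(movies), len(ratings), len(movie_ratings))  # 3 5 5
```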

  • @SR-lf3ic
    2 years ago +1

    Hi, when I used pd.concat([df1, df2]) I got a tuple object instead of a DataFrame object. I am using a Python 3.9 environment. What should I do to get a DataFrame rather than a tuple?
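pd.concat on a list of DataFrames returns a DataFrame, so a tuple usually means something else wrapped the result. The original code isn't shown, so this is only a guess at one common culprit: a stray trailing comma after the call, which Python reads as a one-element tuple.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3]})

combined = pd.concat([df1, df2])   # a DataFrame
oops = pd.concat([df1, df2]),      # trailing comma: a tuple holding a DataFrame

print(type(combined).__name__)     # DataFrame
print(type(oops).__name__)         # tuple

# Removing the comma is the real fix; indexing the tuple also recovers the frame
fixed = oops[0]
```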

  • @JainmiahSk
    4 years ago +2

    Good to see you. I love the logic you teach.

    • @dataschool
      4 years ago +1

      Thank you! Glad my videos are helpful to you 👍

  • @mochammadirfanbaihaqi279
    3 years ago +1

    Love the way you explain it, thanks for your vids. Keep it up (thumbs)

  • @hieungotrung5411
    4 years ago

    Great to see you again, along with the high-quality content in your videos

    • @dataschool
      4 years ago

      Thanks so much for your kind words! 😄

  • @maxvinella941
    4 years ago +1

    Missing your pandas tutorials... thanks

    • @dataschool
      4 years ago

      It's nice to be missed! You can find all of my pandas tutorials here: ruclips.net/p/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y

  • @SamSam-mh5jt
    3 years ago

    Thank you so much for the clear and concise explanation

  • @omidadib5052
    2 years ago +1

    Awesome tutorial, Thank you very much man!

  • @zapy422
    4 years ago +1

    Thank you for this video.
    I have been struggling with merge and concat today :)

    • @dataschool
      4 years ago +1

      You're very welcome! Glad it's helpful to you!

  • @cutestbear3327
    1 year ago

    Thanks for the video, that's awesome, particularly the parts explaining joins. Clear and concise.

  • @osmanhussein3893
    3 years ago +1

    This is very helpful. Thank you so much.

  • @TheNobody04
    3 years ago

    Wow, I've seen some of your videos and I just can say THANK YOU. It's so easy to understand you :3

    • @dataschool
      3 years ago

      Thanks for your kind words! Glad you like my videos!

  • @jalego800
    1 year ago

    Hi Kevin, thanks to your tutoring I've learned a lot from your channel, it's amazing! Since I've only just started learning pandas, I'm a little bit confused about concat(), melt(), merge(), pivot(), stack()... They're really annoying to me >< I really hope there's an all-in-one guide to using these functions XD Thank you!

    • @dataschool
      1 year ago

      I agree, it's tricky to separate out when you should use each one of those!
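One rough way to keep these apart, sketched on made-up sales data: concat stacks tables, merge adds columns by matching key values, pivot reshapes long to wide, melt reshapes wide to long, and stack moves column labels into the row index. This is a simplification, not a complete guide to each function.

```python
import pandas as pd

# Hypothetical small tables
jan = pd.DataFrame({"store": ["x", "y"], "sales": [10, 20]})
feb = pd.DataFrame({"store": ["x", "y"], "sales": [30, 40]})
cities = pd.DataFrame({"store": ["x", "y"], "city": ["Oslo", "Lima"]})

# concat: stack tables that share columns (more rows)
both = pd.concat([jan.assign(month="jan"), feb.assign(month="feb")],
                 ignore_index=True)

# merge: add columns from another table by matching key values
with_city = both.merge(cities, on="store")

# pivot: long -> wide (one row per store, one column per month)
wide = both.pivot(index="store", columns="month", values="sales")

# melt: wide -> long (undoes the pivot)
long_again = wide.reset_index().melt(id_vars="store", var_name="month",
                                     value_name="sales")

# stack: move the column labels into the row index, giving a Series
stacked = wide.stack()
print(both, with_city, wide, long_again, stacked, sep="\n\n")
```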

  • @g.jignacio
    3 years ago

    Excellent video! Keep sharing content like this. Greetings from Argentina

  • @sch0ll1
    3 years ago +1

    Thanks man! You saved my weekend :*

  • @feroncia
    3 years ago

    Thank you so much for explaining it clearly. Now I understand merging DataFrames better. TQVM

  • @akinsikuelizabeth5780
    4 years ago +1

    Superb!!!
    I understood every explanation, thanks

  • @AsMa-eg
    2 years ago +1

    Thank you so much. Very clear and to the point.

  • @Isabel-ec2sq
    4 years ago +1

    Thank you!! I finally got the dataframe I wanted!

  • @jaydhanwant4072
    4 years ago +1

    I wish we had 3x speed on YouTube, great video!

  • @dannylockett9445
    1 year ago +1

    I really enjoy your tutorials, thanks so much! I have 5 CSV files that come out daily, each containing a date column. I want to merge them all using the date as the merge field. I tried a basic merge with 2 of the CSV files and the date was used as the merge key by default, so it worked. Ultimately I just need one date column in my master file with all the other columns merged in. Should I continue doing this, or is it better to set the date column as the index, or something else?
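A minimal sketch of both approaches, assuming the five files have already been read into DataFrames that share a "date" column (three made-up frames here). Chained merges keep "date" as an ordinary column; setting it as the index lets a single pd.concat call align everything, which assumes each file has at most one row per date.

```python
from functools import reduce

import pandas as pd

# Hypothetical daily files, already loaded, each with a "date" column
frames = [
    pd.DataFrame({"date": ["2024-01-01", "2024-01-02"], "a": [1, 2]}),
    pd.DataFrame({"date": ["2024-01-01", "2024-01-02"], "b": [3, 4]}),
    pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "c": [5, 6]}),
]

# Option 1: chain merges on "date"; how="outer" keeps dates missing from
# some files (their other cells become NaN)
master = reduce(lambda left, right: pd.merge(left, right, on="date", how="outer"),
                frames)

# Option 2: make "date" the index and align all frames side by side
master_idx = pd.concat([f.set_index("date") for f in frames], axis=1)
print(master)
print(master_idx)
```

With how="inner" instead, only dates present in every file would survive.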

  • @hectoralvarorojas1918
    4 years ago +2

    Great work as always.
    Very useful.
    Thanks for sharing it!
    By the way, any chance you could make a video about PySpark? It would be very useful to cover it from the beginning, with examples based on a local connection (one computer) first and then a couple of examples emulating a cluster connection.

    • @dataschool
      4 years ago +1

      Thanks for your kind words as always, Hector! Sorry, I don't have any videos about PySpark, but I appreciate the suggestion! 👍

    • @hectoralvarorojas1918
      4 years ago +1

      @@dataschool I would love for you to do that. I am positive that you would get a lot of interested people, including me of course.
      My best regards!

  • @adedolapoogungbire7088
    3 years ago

    Just what I needed.

  • @Octaphea
    2 years ago +1

    Great video. However, I have a little issue. I have 3 DataFrames that I am trying to merge together. The first is a pretty long database with columns (cust_id, gained_on, gained_from_supplier, lost_to_supplier, sales_channel_id). The second is the supplier DataFrame (supplier_name, supplier_id). I am trying to merge the supplier ID and name from the second DataFrame into the database DataFrame, which has the ID, matching the supplier ID to that number with left_on/right_on, but instead it returns both columns: the supplier ID and name from both DataFrames. Then I want to do the same with the channel DataFrame (sales_channel_name, sales_channel_id): merge it on sales_channel_id in the database DataFrame and show the name instead. Any help would be appreciated, thank you!
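A minimal sketch with made-up stand-ins for the three tables, assuming gained_from_supplier holds supplier IDs. With left_on/right_on, pandas keeps both key columns, so the redundant right-hand key is dropped afterwards; when the key has the same name on both sides, on= yields a single key column.

```python
import pandas as pd

# Hypothetical stand-ins for the tables described in the comment
db = pd.DataFrame({"cust_id": [1, 2],
                   "gained_from_supplier": [100, 200],
                   "sales_channel_id": [7, 8]})
suppliers = pd.DataFrame({"supplier_id": [100, 200],
                          "supplier_name": ["Acme", "Globex"]})
channels = pd.DataFrame({"sales_channel_id": [7, 8],
                         "sales_channel_name": ["web", "phone"]})

# Different key names: merge, then drop the duplicate right-hand key column
out = (db.merge(suppliers, left_on="gained_from_supplier",
                right_on="supplier_id", how="left")
         .drop(columns="supplier_id"))

# Same key name on both sides: on= keeps a single sales_channel_id column
out = out.merge(channels, on="sales_channel_id", how="left")
print(out)
```

For lost_to_supplier, the same supplier merge can be repeated with a renamed copy of suppliers so the two name columns don't collide.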

  • @user-xq3qy4qv5z
    3 years ago +1

    The only English speaker you can understand without knowing English.

  • @juliakristavilladiego245
    4 years ago

    Thank you! Crystal clear explanation.

  • @cmovilidad1
    4 years ago +1

    Master! He's back! Cool.

  • @fschmidkonz
    3 years ago +1

    You're a great teacher! I see that despite having a large 100K-row file, the number of rows does not expand after the merge. The rows beautifully stay the same and just get the movie titles added to the reviews. Can you comment on why this is not always the case? I have tried, and my output file expands by a few rows (17 out of 1000), and I have not been able to figure out why. I have checked multiple videos; some offer absurd, impractical solutions (like making the files the same size) or arbitrarily eliminate any duplicates (even though some may be valid rows), but none explain the reason or how to identify the rows that could be duplicates. Your comments are appreciated.
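A likely explanation, shown on made-up data: a merge produces one row per matching pair, so if the lookup table repeats a key, every review with that key is duplicated once per repeat. The sketch below reproduces the growth, finds the offending keys with duplicated(keep=False), and shows validate="many_to_one" turning the silent growth into an error.

```python
import pandas as pd

reviews = pd.DataFrame({"movie_id": [1, 2, 2, 3], "rating": [5, 4, 3, 2]})
# Hypothetical lookup table in which movie_id 2 accidentally appears twice
titles = pd.DataFrame({"movie_id": [1, 2, 2, 3],
                       "title": ["A", "B", "B (alt)", "C"]})

merged = reviews.merge(titles, on="movie_id", how="left")
print(len(reviews), len(merged))  # 4 6: each movie_id 2 review matches 2 titles

# Show every lookup row whose key is duplicated; these cause the extra rows
dupes = titles[titles.duplicated("movie_id", keep=False)]
print(dupes)

# validate="many_to_one" raises instead of silently adding rows
raised = False
try:
    reviews.merge(titles, on="movie_id", how="left", validate="many_to_one")
except pd.errors.MergeError:
    raised = True
```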

  • @alndr4u
    1 year ago +1

    How do I merge two DataFrames based on 4 common columns with repetitive elements?
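A minimal sketch, with made-up column names k1 to k4: passing a list to on= matches rows only when all four key columns are equal at the same time, so values that repeat within a single column are not a problem by themselves. If a full four-column combination repeats on both sides, the result holds one row per matching pair.

```python
import pandas as pd

# Hypothetical tables sharing four key columns whose values repeat individually
left = pd.DataFrame({"k1": ["a", "a", "b"], "k2": [1, 1, 1],
                     "k3": ["x", "y", "x"], "k4": [0, 0, 0],
                     "left_val": [10, 20, 30]})
right = pd.DataFrame({"k1": ["a", "b", "a"], "k2": [1, 1, 1],
                      "k3": ["y", "x", "x"], "k4": [0, 0, 0],
                      "right_val": [200, 300, 100]})

# A row matches only when k1, k2, k3 and k4 are all equal
merged = pd.merge(left, right, on=["k1", "k2", "k3", "k4"], how="inner")
print(merged)
```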

  • @Zahraa_005
    4 years ago

    This is the best explanation
    Thanks so much!

  • @vijayreddy1730
    4 years ago

    Hi Kevin, first of all thanks for the wonderful lecture. I am facing a problem merging two DataFrames, which I have shown below:
    DataFrame 1:
    BackupServer | BackupDay | StartDate | ClientName | BackupStatus | Backup re-run(Y/N) | Incident | Reason for the Backup Failures | Backup Final Outcome
    RGSIBAK004 | 01-05-2020 | 2020-04-30 06:40:29 | RGBPLNM110 | Completed | NaN | NaN | NaN | NaN
    RGSIBAK004 | 01-05-2020 | 2020-04-30 06:53:07 | RGPIAPP037 | Completed | NaN | NaN | NaN | NaN
    RGSIBAK004 | 01-05-2020 | 2020-04-30 15:32:38 | RGPIISD001 | Failed | Yes | IN893523 | VM disconnected | Failed
    RGSIBAK004 | 01-05-2020 | 2020-04-30 18:00:08 | RGPPFTP005 | Completed | NaN | NaN | NaN | NaN
    RGSIBAK004 | 01-05-2020 | 2020-04-30 18:00:02 | RGPQWEB069 | Completed | NaN | NaN | NaN | NaN
    DataFrame 2:
    BackupServer | BackupDay | StartDate | Client Name | Backup Status | Backup Rerun (Y/N) | Incident | Failures | Backup Final Result
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpqbda112.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc051.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc050.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc011.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpdbda105.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
    Although the two DataFrames have three column names in common ("BackupServer", "BackupDay" and "StartDate"), the contents of those columns are different, and I am not able to merge the two DataFrames into one. Can you help me with this?
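Judging from the sample rows, the two reports share no server or client values, so a key-based merge has nothing to match; if the goal is one combined report, stacking them with pd.concat may be closer to what's wanted. The sketch below assumes that goal, trims each table to one row and five columns, and assumes BackupDay is day-first (dd-mm-yyyy) in the first report, which fits StartDate being the day before.

```python
import pandas as pd

# One row from each table above, trimmed to five columns
df1 = pd.DataFrame({"BackupServer": ["RGSIBAK004"],
                    "BackupDay": ["01-05-2020"],
                    "StartDate": ["2020-04-30 06:40:29"],
                    "ClientName": ["RGBPLNM110"],
                    "BackupStatus": ["Completed"]})
df2 = pd.DataFrame({"BackupServer": ["RGPIAUN003.FDNET.COM"],
                    "BackupDay": ["2020-05-01"],
                    "StartDate": ["Thu Apr 30 21:00:03 EDT 2020"],
                    "Client Name": ["rgpqbda112.fdnet.com"],
                    "Backup Status": ["Activity completed successfully."]})

# 1. Give both tables identical column names
df2 = df2.rename(columns={"Client Name": "ClientName",
                          "Backup Status": "BackupStatus"})

# 2. Parse each table's date format explicitly (formats are assumptions)
df1["BackupDay"] = pd.to_datetime(df1["BackupDay"], format="%d-%m-%Y")
df2["BackupDay"] = pd.to_datetime(df2["BackupDay"], format="%Y-%m-%d")

# 3. Stack the reports vertically instead of joining them
report = pd.concat([df1, df2], ignore_index=True)
print(report)
```

StartDate is left as text here; its two formats (one with a time-zone abbreviation) would need separate parsing.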

  • @mohammadj.shamim9342
    4 years ago

    Dear Kevin, I have some difficulties fine-tuning PLSRegression (sklearn.cross_decomposition.PLSRegression). Can you please cover this one day?

    • @dataschool
      4 years ago

      Thanks for your suggestion!

  • @shivamsaway6803
    4 years ago

    Can it happen that, when merging two DataFrames, only the headers get merged and no data ends up in the new DataFrame?
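Yes: an inner merge keeps only keys found in both tables, so when no key matches exactly you get the combined columns and zero rows. A common cause is keys that look alike but differ in spacing or case; this sketch uses made-up codes to show that, uses indicator=True to see which side each key came from, and normalizes the keys before merging again.

```python
import pandas as pd

# Hypothetical keys that look the same but differ by a trailing space and case
left = pd.DataFrame({"code": ["A1 ", "b2"], "x": [1, 2]})
right = pd.DataFrame({"code": ["A1", "B2"], "y": [3, 4]})

empty = pd.merge(left, right, on="code")      # inner join: nothing matches
print(empty.shape)                            # (0, 3): columns only, no rows

# indicator=True on an outer join shows which side each key came from
print(pd.merge(left, right, on="code", how="outer", indicator=True))

# Normalizing the keys before merging restores the matches
left["code"] = left["code"].str.strip().str.upper()
fixed = pd.merge(left, right, on="code")
print(fixed.shape)                            # (2, 3)
```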

  • @cgpmth6449
    2 years ago

    How do I merge multiple large DataFrames quickly? I joined them with the usual merge() but it seems too slow. I found a hint about using pandas.Index() with the merge method, but I don't know how to use it.
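The index hint probably refers to joining on the index: set the shared key as the index once, then align all frames with a single DataFrame.join call instead of many chained merges. This is a sketch on made-up frames, assuming the key is unique in each frame; whether it is actually faster than repeated merge depends on your data, so it's worth timing both.

```python
import pandas as pd

# Hypothetical frames sharing a unique "key" column
frames = [pd.DataFrame({"key": range(5), f"v{i}": range(i, i + 5)})
          for i in range(3)]

# Move the key into the index once per frame
indexed = [f.set_index("key") for f in frames]

# DataFrame.join accepts a list of frames and aligns them all on the index
combined = indexed[0].join(indexed[1:], how="outer")
print(combined)
```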

  • @JunaidInHenan
    4 years ago

    The above logic is beautifully explained. Hi Kevin, I have a question, if you could please reply:
    I have three CSV files: csv1 (20000 rows), csv2 (20000 rows) and csv3 (20000 rows). I want to merge these files into a single DataFrame without losing a single record; ideally the DataFrame should have 60000 rows.
    P.S.: All the files have the same columns (PostID, time, tweetURL, Content, RetweetNum, LikeNum, CommentsNum, Verified, Following, Follower). In the resulting DataFrame I want all these columns as headings and all 60000 rows. Is it possible? Kevin, I will wait for your reply. I know this post is old, but maybe you will read my question. THANK YOU
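Because the files share the same columns, this is a concat (stacking rows) rather than a merge. A minimal sketch, using small in-memory strings in place of csv1/csv2/csv3 and only two of the columns; with real files you would pass their paths to pd.read_csv instead.

```python
import io

import pandas as pd

# Small in-memory stand-ins for csv1, csv2, csv3 (same columns, trimmed here)
csv_texts = [
    "PostID,LikeNum\n1,10\n2,20\n",
    "PostID,LikeNum\n3,30\n",
    "PostID,LikeNum\n4,40\n5,50\n",
]
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# concat stacks rows; ignore_index renumbers 0..n-1 instead of repeating labels
tweets = pd.concat(frames, ignore_index=True)

# Every input row is kept, so the totals must agree
assert len(tweets) == sum(len(f) for f in frames)
print(tweets)
```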

  • @zhalie12345
    3 years ago

    Thanks for the vid, Data School!

  • @hardikvegad3508
    4 years ago

    Sir, if we have hundreds of columns without names, how can we name them using pandas and a for loop or lambda function? Typing them all into names=[] would be very time-consuming. The column names can be col1, col2, col3, etc.
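A sketch of three equivalent ways to generate the names instead of typing them, on a made-up headerless file with three columns; a list comprehension builds any number of names.

```python
import io

import pandas as pd

data = "1,2,3\n4,5,6\n"  # hypothetical headerless file

# Option 1: generate the names and pass them while reading
n_cols = 3
df = pd.read_csv(io.StringIO(data), header=None,
                 names=[f"col{i}" for i in range(1, n_cols + 1)])

# Option 2: read first, then replace all column labels in one assignment
df2 = pd.read_csv(io.StringIO(data), header=None)
df2.columns = [f"col{i}" for i in range(1, df2.shape[1] + 1)]

# Option 3: header=None gives integer labels 0, 1, 2..., which a lambda can map
df3 = pd.read_csv(io.StringIO(data), header=None).rename(
    columns=lambda c: f"col{c + 1}")
print(df.columns.tolist())
```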

  • @ayodejiakinfenwa
    1 year ago

    Please, I am trying to merge two datasets as you explained, but it gives an error saying I should check for duplicates.

  • @borispelichekbueno2408
    3 years ago

    Thank you very much for that amazing and clear explanation. By the way, is there a simple way to merge two DataFrames on a partial string match between rows?
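pandas has no built-in "partial match" join, but one simple approach is a cross join (every row paired with every row, via how="cross", available since pandas 1.2) followed by a filter on a substring test. The tables and the "keyword contained in name" rule below are made-up assumptions. The cross join has len(a) × len(b) rows, so this only suits modest table sizes.

```python
import pandas as pd

# Hypothetical tables: short keywords on one side, longer names on the other
keywords = pd.DataFrame({"kw": ["apple", "pear"],
                         "category": ["fruit A", "fruit B"]})
products = pd.DataFrame({"name": ["green apple juice", "pear cider", "plum jam"]})

# Pair every product with every keyword
pairs = products.merge(keywords, how="cross")

# Keep the pairs where the keyword appears inside the product name
matched = pairs[[kw in name for kw, name in zip(pairs["kw"], pairs["name"])]]
print(matched)
```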

  • @robertc2121
    4 years ago +1

    Love your videos!! Excellent tutorial. By chance, does pandas have a facility to do a semi_join() like dplyr's function?

    • @dataschool
      4 years ago

      Thanks for your kind words! I'm not familiar with semi_join, sorry...
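As far as I know there is no dedicated semi_join in pandas, but a semi join (rows of x that have a match in y, keeping only x's columns and never duplicating rows) can be written as a boolean filter with isin; negating it gives dplyr's anti_join. This sketch covers a single key column on made-up data; for several key columns, a merge on the distinct keys is one alternative.

```python
import pandas as pd

x = pd.DataFrame({"key": [1, 2, 3, 3], "val": ["a", "b", "c", "d"]})
y = pd.DataFrame({"key": [3, 3, 1], "other": [0, 0, 0]})

# Semi join: rows of x whose key appears in y (no columns from y, no duplication)
semi = x[x["key"].isin(y["key"])]

# Anti join: rows of x whose key does NOT appear in y
anti = x[~x["key"].isin(y["key"])]
print(semi)
print(anti)
```

An inner merge here would return 5 rows, because key 3 appears twice in both tables; the semi join keeps x's 3 matching rows as they are.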

  • @zezodiaa1025
    2 years ago

    Great video. My question is: when I'm working on a project, when exactly do I need to combine DataFrames?

  • @vinayakchikkorde8151
    3 years ago

    I have a source file and a target file, and I have to compare 140 columns and show whether they match. For example, there is a column Country1 in the source and Country2 in the target. To compare them I would use: if source['country1'] == target['country2'] return True, else return False. Comparing 140+ columns this way will take a long time, and in both files the columns are not in the same order. How can I solve this?
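One way to avoid a per-column if statement, sketched on made-up data: rename the target's columns to the source's names with a mapping dict, reorder them to match, and compare every cell at once with DataFrame.eq. The name_map here is hypothetical and would have to be built for the real 140 columns. This assumes the rows are already in the same order; otherwise, merge on a key column first. Note that NaN never compares equal to NaN.

```python
import pandas as pd

# Hypothetical source/target with renamed and reordered columns
source = pd.DataFrame({"Country1": ["NO", "PE"], "City1": ["Oslo", "Lima"]})
target = pd.DataFrame({"City2": ["Oslo", "Cusco"], "Country2": ["NO", "PE"]})

# Mapping from target names to source names (built once for all columns)
name_map = {"Country2": "Country1", "City2": "City1"}

# Rename, then select columns in the source's order so they line up
aligned = target.rename(columns=name_map)[source.columns]

# Cell-by-cell comparison of every column at once, no loop needed
matches = source.eq(aligned)
print(matches)
print(matches.all())  # True for a column only if every row matches
```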

  • @KhadijahMoustafaFlowers
    2 years ago

    At timestamp 8:22, how did you open the signature box that showed the methods?

  • @eliasaudi2877
    2 years ago

    What would we use to show ONLY the values that do not match, i.e. anything other than an inner join?
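A minimal sketch on made-up tables: do a full outer join with indicator=True, then keep the rows whose _merge label is not "both", which leaves only the keys that appear on just one side.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3], "a_val": ["x", "y", "z"]})
b = pd.DataFrame({"id": [2, 3, 4], "b_val": ["p", "q", "r"]})

# Full outer join, labelling each row's origin in the _merge column
both = pd.merge(a, b, on="id", how="outer", indicator=True)

# Keep only the rows that failed to match: left_only or right_only
unmatched = both[both["_merge"] != "both"]
print(unmatched)
```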

  • @christleiroezi8878
    3 years ago

    I have a DataFrame, a list and a tuple, and I want to merge all three together. I am aware that merge can only handle two tables at a time, but do you have any helpful hints on how to combine the list, the tuple and the DataFrame? I want the result to be a new DataFrame.
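The question doesn't say how the list and tuple relate to the DataFrame, so this sketch assumes they hold one value per row and should become new columns; assign then returns a new DataFrame without needing merge at all. The data and column names are made up. If the lengths don't match the number of rows, pandas raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"]})
scores = [90, 85, 77]  # hypothetical list, one value per row
ranks = (1, 2, 3)      # hypothetical tuple, one value per row

# assign returns a new DataFrame with the sequences added as columns
result = df.assign(score=scores, rank=list(ranks))
print(result)
```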

  • @anonymousm4328
    3 years ago

    Talking slowly is very helpful to me. I have 2 questions. First: what if I want to merge only one column (rating) from the ratings DataFrame into the movies DataFrame? Second: what if I want to sum the ratings for each Movie_Id? Thank you so much, I look forward to your answer.
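A sketch of both steps on made-up tables: select only the key and the wanted column before merging, then group by the movie ID and sum. The column names follow the video's lowercase movie_id convention and are assumptions here.

```python
import pandas as pd

movies = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})
ratings = pd.DataFrame({"user_id": [7, 8, 9], "movie_id": [1, 1, 2],
                        "rating": [4, 5, 3], "timestamp": [0, 0, 0]})

# 1. Bring over just the rating column (plus the key needed to match)
movie_ratings = pd.merge(movies, ratings[["movie_id", "rating"]], on="movie_id")

# 2. Sum the ratings per movie
totals = movie_ratings.groupby("movie_id")["rating"].sum()
print(totals)
```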

  • @bommubhavana8794
    3 years ago

    I am a beginner in Python, and I am not sure which join is best to use in different scenarios. Can you help me through it?
    I genuinely learned a lot from your videos. I would really appreciate your help. Thank you in advance
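A quick way to see the four how= options side by side on made-up tables. Roughly: inner keeps only keys present in both; left keeps every left row (the usual choice for adding lookup information to a main table); right is the mirror image; outer keeps every key from either side, filling gaps with NaN.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "l": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "r": [4, 5, 6]})

# Record which keys survive each join type
keys_by_how = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(left, right, on="key", how=how)
    keys_by_how[how] = sorted(result["key"])
    print(how, keys_by_how[how])
```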

  • @jeevakumara5599
    2 years ago

    Hi bro, I am currently working on a project. The mentors say to use foreign keys and primary keys in pandas and create a table with the keys. So my question is: can foreign and primary keys be used in pandas, and if not, what should I do to merge two tables that contain the same column, as we do in MySQL? Thank you.
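pandas does not enforce primary-key or foreign-key constraints the way MySQL does, but the same guarantees can be checked explicitly. A minimal sketch on made-up customers/orders tables: set_index(..., verify_integrity=True) rejects duplicate "primary key" values, isin checks that every "foreign key" exists, and merge with validate="many_to_one" performs the join while confirming the relationship.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 3, 3]})

# "Primary key": raises ValueError if customer_id has duplicates
customers_pk = customers.set_index("customer_id", verify_integrity=True)

# "Foreign key": every order must refer to an existing customer
assert orders["customer_id"].isin(customers["customer_id"]).all()

# The join itself: many orders per customer, checked by validate
joined = orders.merge(customers, on="customer_id", how="left",
                      validate="many_to_one")
print(joined)
```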

  • @michael3226
    2 years ago

    The resulting dataset I got has null values. What do I do?

  • @tirtha9
    3 years ago

    Let's say a pandas DataFrame and a MySQL table have columns A, B, C with the same schema, and column A is the primary key in SQL.
    How do I upsert the pandas DataFrame into the MySQL table?
    When the primary key conflicts, update the remaining columns; when it doesn't conflict or exist, do an INSERT INTO.
    What's the most efficient way to do this?
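A runnable sketch of the pattern, with one loudly stated swap: it uses an in-memory SQLite database instead of MySQL, because SQLite ships with Python. SQLite (3.24 and later) spells an upsert INSERT ... ON CONFLICT(...) DO UPDATE; MySQL's counterpart is INSERT ... ON DUPLICATE KEY UPDATE, so the SQL string would need adapting. The rows are sent in one executemany call rather than one statement per row.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for MySQL here
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (A INTEGER PRIMARY KEY, B TEXT, C REAL)")
con.execute("INSERT INTO t VALUES (1, 'old', 1.0)")

df = pd.DataFrame({"A": [1, 2], "B": ["new", "fresh"], "C": [9.5, 2.0]})

# Insert new keys; on a primary-key conflict, update the other columns instead
sql = """INSERT INTO t (A, B, C) VALUES (?, ?, ?)
         ON CONFLICT(A) DO UPDATE SET B = excluded.B, C = excluded.C"""
con.executemany(sql, df.itertuples(index=False, name=None))
con.commit()

result = pd.read_sql("SELECT * FROM t ORDER BY A", con)
print(result)
```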

  • @saikiranhr
    2 years ago

    Thanks for the amazing video. One simple question: how do I join tables on multiple indices (like 4 or 5)?
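A minimal sketch with made-up column names: put the same key columns into a MultiIndex on both tables with set_index, then call join, which aligns on all index levels at once. pd.merge(sales, prices, on=keys) would give the same matches while keeping the keys as ordinary columns.

```python
import pandas as pd

keys = ["year", "region", "store", "product"]  # hypothetical key columns
sales = pd.DataFrame({"year": [2023, 2023], "region": ["N", "S"],
                      "store": [1, 2], "product": ["x", "y"], "units": [5, 7]})
prices = pd.DataFrame({"year": [2023, 2023], "region": ["S", "N"],
                       "store": [2, 1], "product": ["y", "x"], "price": [3.0, 2.0]})

# Same four columns become a MultiIndex on both sides; join aligns on all of them
joined = sales.set_index(keys).join(prices.set_index(keys))
print(joined)
```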

  • @MIH20788
    7 months ago

    Thanks Kevin, but where is the concat video?

  • @tooljerk666
    4 years ago

    Say I want to merge 2 DataFrames with the same columns because someone else was working on an Excel file. I want to keep everything in both DataFrames, but I want, say, the "category" column from DataFrame 2 to override the "category" column text in DataFrame 1. Would that be a left or right merge?
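Since everything from both frames should be kept, an outer merge fits better than left or right; the override is a separate step. A sketch on made-up data, assuming an "id" column identifies matching rows: suffixes distinguish the two category columns, and combine_first prefers DataFrame 2's value, falling back to DataFrame 1's where DataFrame 2 has none.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "category": ["old", "old", "old"],
                    "qty": [5, 6, 7]})
df2 = pd.DataFrame({"id": [2, 3, 4], "category": ["new", None, "new"]})

# Outer merge keeps every id from both frames
m = df1.merge(df2, on="id", how="outer", suffixes=("_1", "_2"))

# Prefer df2's category where present, otherwise keep df1's
m["category"] = m["category_2"].combine_first(m["category_1"])
result = m.drop(columns=["category_1", "category_2"])
print(result)
```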

  • @ramachalprajapati1176
    3 years ago

    How do I get the common mobile numbers from two different CSV files that have different column names?
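Two sketches on made-up numbers: an inner merge with left_on/right_on handles the different column names, and a set intersection works if only the values matter. When reading the real files, passing dtype=str to pd.read_csv keeps phone numbers as text so leading zeros are not lost.

```python
import pandas as pd

a = pd.DataFrame({"mobile": ["555-0101", "555-0102", "555-0103"]})
b = pd.DataFrame({"phone_no": ["555-0103", "555-0101", "555-0199"]})

# Inner merge matching differently named key columns
common = pd.merge(a, b, left_on="mobile", right_on="phone_no")["mobile"]

# Or, if only the values matter, a plain set intersection
common_set = set(a["mobile"]) & set(b["phone_no"])
print(sorted(common), sorted(common_set))
```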

  • @mehnazjabeen
    2 years ago

    How do I verify, using a simple comparison operator in Python, that all the columns were carried into the merged DataFrame after merging two DataFrames?
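Set operators on column names do this, with one catch shown on made-up data: non-key columns that exist in both inputs are renamed with the default suffixes _x and _y, so they appear "missing" under their original name even though nothing was lost.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2], "a": [3, 4], "shared": [0, 0]})
df2 = pd.DataFrame({"key": [1, 2], "b": [5, 6], "shared": [1, 1]})
merged = pd.merge(df1, df2, on="key")

# Names from either input that are absent from the result
expected = set(df1.columns) | set(df2.columns)
missing = expected - set(merged.columns)
print(missing)  # {'shared'}: renamed with suffixes, not lost

# <= on sets means "is a subset of"
print({"shared_x", "shared_y"} <= set(merged.columns))  # True
```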

  • @job2k6
    3 years ago

    Very helpful, thank you.

  • @felicytatomaszewska2934
    4 years ago

    As usual, a nice tutorial, so thank you for making and sharing it. Can you please elaborate on the pandas assign function? Also, when will you share the code? I want to practice :)

    • @dataschool
      4 years ago

      Here is the code for the merge video: nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas_merge.ipynb As for the pandas assign function, this might help you to understand its usage: twitter.com/justmarkham/status/1173995596631478272
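A small sketch of assign on made-up data: it returns a new DataFrame with the added columns and leaves the original unchanged, and a lambda receives the DataFrame being built, so it can refer to a column created earlier in the same call.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# The lambda sees "total" because it was created earlier in the same call
out = df.assign(total=df["price"] * df["qty"],
                discounted=lambda d: d["total"] * 0.9)
print(out)
```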

  • @ruthliganad8274
    3 years ago

    What about not a specific file, for example all .csv or all .tsv files? How do I add a header to those files? Thanks
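A self-contained sketch: it first writes two made-up headerless CSV files into a temporary folder, then uses glob to pick up every *.csv file, supplies the missing header with header=None plus names=, and stacks the results with concat. The column names are hypothetical; for tab-separated files, the pattern would be "*.tsv" with sep="\t".

```python
import glob
import os
import tempfile

import pandas as pd

# Create a temporary folder with two headerless CSV files for the demo
folder = tempfile.mkdtemp()
for name, body in [("a.csv", "1,x\n2,y\n"), ("b.csv", "3,z\n")]:
    with open(os.path.join(folder, name), "w") as fh:
        fh.write(body)

# Match every .csv file in the folder, in a predictable order
paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

# header=None + names= supplies the header the files are missing
combined = pd.concat(
    [pd.read_csv(p, header=None, names=["id", "label"]) for p in paths],
    ignore_index=True,
)
print(combined)
```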