How to remove Duplicate Data in SQL | SQL Query to remove duplicate

Поделиться
HTML-код
  • Опубликовано: 7 июн 2024
  • In this video, we see 10 different ways to remove duplicate records in SQL. We look at 2 different scenario for duplicate records in a table and then come up with 10 SQL queries to remove these duplicate data from the database.
    Data can be consider as duplicate if all column values are duplicated or if only some of the column values are duplicated. In this video, you will get solution to remove duplicate data for both these scenarios.
    The dataset, scripts and SQL Queries used in this video can be downloaded from below link:
    techtfq.com/blog/how-to-remov...
    Have an SQL Query? Would you want me to make a video on it? Then do email me your query to techtfq@gmail.com;
    Timestamp:
    00:00 Intro about duplicate data in SQL
    01:22 Explaining Scenario 1 of Duplicate Data
    03:02 SOLUTION 1 - Delete using Unique Identifier.
    07:19 SOLUTION 2 - Using SELF join.
    12:21 SOLUTION 3 - Using Window function.
    15:32 SOLUTION 4 - Using MIN function. This delete even multiple duplicate records.
    19:34 SOLUTION 5 - Using backup table.
    24:10 SOLUTION 6 - Using backup table without dropping the original table.
    26:06 Explaining Scenario 2 of Duplicate Data
    27:02 SOLUTION 7 - Delete using CTID.
    30:25 SOLUTION 8 - By creating a temporary unique id column.
    33:38 SOLUTION 9 - By creating a backup table.
    35:30 SOLUTION 10 - By creating a backup table without dropping the original table.
    🔴 My Recommended Courses:
    ✅ Learn SQL: learnsql.com/?ref=thoufiqmoha...
    ✅ Practice SQL Queries: www.stratascratch.com/?via=te...
    ✅ Learn PowerBI: codebasics.io/courses/power-b...
    ✅ Learn Python: codebasics.io/courses/python-...
    🔴 WATCH MORE VIDEOS HERE 👇
    ✅ SQL Tutorial - Basic concepts:
    • SQL Tutorial - Basic c...
    ✅ SQL Tutorial - Intermediate concepts:
    • SQL Tutorial - Interme...
    ✅ SQL Tutorial - Advance concepts:
    • SQL Tutorial - Advance...
    ✅ Practice Solving Basic SQL Queries:
    • Practice Solving BASIC...
    ✅ Practice Solving Intermediate SQL Queries:
    • Practice Solving INTER...
    ✅ Practice Solving Complex SQL Queries:
    • Practice Solving COMPL...
    ✅ Data Analytics Career guidance:
    • Data Analytics career ...
    ✅ SQL Course, SQL Training Platform Recommendations:
    • SQL Course / Training
    ✅ Python Tutorial:
    • Python Tutorial
    ✅ Git and GitHub Tutorial:
    • Git and GitHub
    ✅ Data Analytics Projects:
    • Data Analytics Projects
    THANK YOU,
    Thoufiq

Комментарии • 301

  • @bollaalekyadevi6795
    @bollaalekyadevi6795 Год назад +11

    It's blessing that i found your channel.. your way of teaching SQL concepts is really awesome. I am learning all SQL concepts from your videos which is really giving me confidence in queries. Thank you so much sir.

  • @NawazKhan-go3fj
    @NawazKhan-go3fj Год назад +10

    Man you've explained exactly what i needed to know. You've no idea how relieved i feel after finally understanding the concept of joins. I needed to understand how sql works with each record of each table and you've explained that really well. Thanks a lot brother for this video. I'll definitely subscribe to your channel.

  • @dhruvshahvlogs
    @dhruvshahvlogs Год назад +2

    This video is very practical and helpful when it comes to the real-time database. Thank you so much 👌👌👍👍❤❤

  • @MASC_RAMBO4Fuggi
    @MASC_RAMBO4Fuggi Год назад +2

    You took this class to heaven, hats off.

  • @swamivivekananda-cyclonicm8781

    Superb, best part of the video is distinct method when only few values are duplicate and when entire row is duplicate. Thank you so much.

  • @abdulsamadibrahim4929
    @abdulsamadibrahim4929 Год назад

    Thanks TFQ, I've been battling on how to remove duplicates from a project I am working on currently. But thanks to this video for coming to my rescue. Personally, I prefer the one of creating a temporary unique id column because we are used to having unique IDs in our dataset. The dataset I am working doesn't have a unique id, that's why it has been a bit difficult for me. But thanks once again.

  • @sudeep2580
    @sudeep2580 4 месяца назад +1

    This is the best video "How to remove the duplicate records from the table."

  • @Mayank-jw9yy
    @Mayank-jw9yy Месяц назад

    One of the best lecture ever I have seen in my life over SQL topic, so much precise clarity, in-depth knowledge & thoroughly discussion over a topic.....amazing...just amazing buddy.

  • @sivaramakrishnareddy
    @sivaramakrishnareddy Год назад

    Excellent teaching bro... very simple methods you have collected and explained 👏👍👌

  • @vishalsonawane.8905
    @vishalsonawane.8905 Год назад +1

    Your SQL content likes God gifted❣️ its same question asks in interviewer.🏆

  • @ArifRaza
    @ArifRaza 8 месяцев назад

    wonderful explanation! I got overwhelmed!

  • @NB20079
    @NB20079 Год назад

    I'm from Philippines and your new subscriber, I appreciate your lesson how you to discuss this in nice way and simple ,so that the viewer like me can understand clearly, thank you sir!

  • @Unknowing0011
    @Unknowing0011 Год назад

    Thankyou so much for this awesome tutorial. It was soo effective and to the point.

  • @RandomlyWisdom
    @RandomlyWisdom Год назад +1

    Solution 3 can also work for Scenario 2 just need to give order by clause in over( ).
    Thanks for the other solutions : )

  • @nagarapusudheeksha5750
    @nagarapusudheeksha5750 Год назад +3

    Your video is really help full for a data analyst who is struggling thank you so much ❤️👍🇮🇳

    • @techTFQ
      @techTFQ  Год назад

      Glad to hear that

  • @saritaxsingh
    @saritaxsingh Год назад

    Thanks alot. You are helping me understand my current role problem.

  • @somasekharkunka8820
    @somasekharkunka8820 Год назад +2

    I completely learning SQL tips from your channel sir. Thanks for your efforts and it is amazing..

    • @techTFQ
      @techTFQ  Год назад

      Glad to hear that

  • @souradeep.chatterjee
    @souradeep.chatterjee Год назад

    Thanks bro 🥳 need more interview questions like this in SQL.

  • @F_A_R_man
    @F_A_R_man Год назад

    Oh, thank you so much. Your videos are really helpful. Thanks for your efforts.

  • @suyash7450
    @suyash7450 Год назад

    You are doing a great Job and your way of explaining is also very good, Thanks for helping us out

  • @viveknanhe1786
    @viveknanhe1786 Год назад

    Thank you so much sir, it helps me a lot in my data engineering interviews.

  • @faizmujawar3504
    @faizmujawar3504 Год назад

    Very helpful. Thanks a lot for creating such beneficial videos.

  • @user-tr9ot4yo6i
    @user-tr9ot4yo6i 4 месяца назад

    Everyone can understand your video from school boy to graduate guy
    Thanks a lot

  • @vk-jk2gz
    @vk-jk2gz Год назад

    Ur teaching by using examples is very easy to understand me thank u so much sir

  • @codingfox
    @codingfox Год назад +2

    Solution 4 is the best, thank you! :)

  • @tomjoe3380
    @tomjoe3380 19 дней назад

    Very informative. Thank you so much for this video.

  • @jamesopoku8049
    @jamesopoku8049 Год назад +3

    Wonderful presentation, I really like your presentation, thank you very much sir, infact u are making a positive impact in my life , keep it up

    • @techTFQ
      @techTFQ  Год назад +1

      I am so glad this helps.. thank you James 🙏🏼

  • @dhanrajpatil218
    @dhanrajpatil218 Год назад +1

    Thank You Sir!!!! Your all explanations are just awesome.

    • @techTFQ
      @techTFQ  Год назад

      Glad you liked it Dhanraj 🙏🏼

  • @nc193
    @nc193 Год назад

    You explained that very well... Great help.

  • @lkkota41
    @lkkota41 3 дня назад

    Thank you.. You videos are helpful to crack interviews.. You explain very well too.. appreciate

  • @RavikaUniverse.
    @RavikaUniverse. Год назад

    Splendid and most Interesting solutions given hatsoff

  • @ABaumstumpf
    @ABaumstumpf Год назад +2

    I had the harder version of that recently:
    Keep a configurable number of history-entries for each unique car-model - and also only keep those entries for a configurable amount of time.
    The time-constraint is easy, keeping the newest N-entries is still easy, the problem is if that has to scale for performance.
    The primary is a combination of sequence (foreign key to a table that contains timestamp-information) + item-ID (in your example car-model). And for each car i want to keep the newest 5 Entries where newest is the highest sequence.
    Right now we are keeping ~10 000 sequence entries and 5 entries per item-ID - and it can happen that we add as many as 10 000 entries every 10 seconds.
    I have tried nearly every one of the solutions you showed here already, the combination of rank+row-id is currently the fastest, but still takes ~600 ms for 2000 rows. it kinda feels like that should be possible an order of magnitude faster.

  • @goedzo4361
    @goedzo4361 Год назад

    this content is so good and helpful!

  • @user-de8gj5xi7f
    @user-de8gj5xi7f 10 месяцев назад

    Thank you for your wonderful video it helped me a lot!!!

  • @user-ny8rd8eg2v
    @user-ny8rd8eg2v 8 месяцев назад

    Thanks a lot brother. Its gonna help me in so many ways. Subscribed to your channel and ill definitely going to learn lot more things from you without any doubt. Thanks again

  • @nageshmungikar5386
    @nageshmungikar5386 Год назад +1

    Nice KT Transfer. Big Fan Of ur Channel Please Keep Posting Good work 💯💯

    • @techTFQ
      @techTFQ  Год назад

      Thank you so much :)

  • @singh9661
    @singh9661 Год назад

    Hey
    techTFQ,Thank you so for your all videos, This is very helpful not helpful i think this is awesome for me, not me i think everyone , god will blessing you always.😍

  • @abhishekgowda1776
    @abhishekgowda1776 Год назад

    Thanks for making video on another helpful concept 👍please make videos in PLSQL concepts

  • @adeebanaaz5855
    @adeebanaaz5855 8 месяцев назад

    thank you so much..This helped me a lot.

  • @shahrukhsultan8727
    @shahrukhsultan8727 Год назад

    Solution 3 and 4 are the best.

  • @mohanakrishnanr1648
    @mohanakrishnanr1648 Год назад

    Great Content bro Though fast very useful ..

  • @rajatsethi6000
    @rajatsethi6000 Год назад +7

    Hey Thanks for your video
    We can solve scenario two with below 2approaches also
    Select * from cars group by 1,2,3,4,5
    It will give you unique records
    And second approach is using row_number by passing all variables in partition by
    The above 2 are mostly use approachs
    May be you find it better.
    By the way I have learnt about window functions from your channel only.
    Keep up the great work

    • @techTFQ
      @techTFQ  Год назад +4

      thank you and yes there are be atleast 20 different ways to delete dup records. these were just the 10 i could think off at the moment

    • @ManiKandan-kg5ky
      @ManiKandan-kg5ky Год назад

      Hi how are you doing?

    • @avinash8535
      @avinash8535 Год назад

      ​@@techTFQ Sir, Could you please let us share your thoughts -
      Along with SQL, What else can we learn like Power BI, SSIS and etc to get more opportunities in future.
      BTW, UNION would also be preferrable as best option to remove duplicates

  • @gowtham7819
    @gowtham7819 Год назад

    Thanks bro for the excellent explanation.kindly do video on index in sql

  • @krantikumar894
    @krantikumar894 Год назад

    Superb bro ur explanation is awsome

  • @nitinmishra5027
    @nitinmishra5027 10 месяцев назад

    Perfect explanition!!

  • @leoprabhakar4166
    @leoprabhakar4166 9 месяцев назад

    Very Useful toufiq

  • @michaely5375
    @michaely5375 10 месяцев назад

    Thank you so much. Your lesson is so help full! I wonder do you have any lesson for preventing to have duplicate data in tables ?

  • @arunrajv3080
    @arunrajv3080 Год назад

    Bro, you are the best 🙌🤝

  • @hilarylomotey7051
    @hilarylomotey7051 Год назад +2

    Hey my favorite friend on the net. Awesome tutorial as always. When I come to India Pizza is on me for a whole week. Great video excellent explanation

    • @techTFQ
      @techTFQ  Год назад

      Haha thank you Hilary ☺️
      Glad you liked it

  • @mlevvy96
    @mlevvy96 8 месяцев назад +1

    I believe solution 8 has a "bug", if you would have more than 1 duplicate - that is 3 or more records with the same values you would just delete one duplicate with proposed query which has biggest id value. So with simplified schema = (id, model, brand, row_num) and records = [(1, X5, BMW, 1), (1, X5, BMW, 2), (1, X5, BMW, 3)] you would just delete last record and still have [(1, X5, BMW, 1), (1, X5, BMW, 2)] left. It can be fixed by changing the query to `DELETE FROM prod WHERE row_num NOT IN (SELECT MIN(row_num) min_id FROM prod GROUP BY model, brand HAVING COUNT(*) > 1)`
    This way you would delete every possible duplicate. Anyway, great video✌

  • @KetakiGadgil_15
    @KetakiGadgil_15 5 месяцев назад +3

    For the first solution, instead of deleting records from table and repopulating table again with those records, we can use:
    select * from car
    where id not in
    (
    select max(id) from car
    group by model,brand
    having count(*)>1
    )

  • @netlifetoday5193
    @netlifetoday5193 6 месяцев назад

    Amazing instruction! Thank you.

    • @techTFQ
      @techTFQ  5 месяцев назад

      Glad it was helpful

  • @prithiviraj4424
    @prithiviraj4424 Год назад

    Thank you so much Thoufiq!!! ☺☺

  • @mirageman2
    @mirageman2 Год назад

    Great video, thanks a lot :)

  • @dhemanth7437
    @dhemanth7437 Месяц назад

    Hi tofique thanks for the video and also in scenario 2 we can also use "select * from cars union select * from cars" this also removes duplicates.

  • @vebhs9769
    @vebhs9769 Год назад

    Woww...amazing video 💕

  • @jeffrey6124
    @jeffrey6124 3 месяца назад

    Great video! 😍You were able to touch based on most of the common scenario related to duplicate data in sql. Do you also have a playlist regarding Data Quality Concepts/Issues/Resolutions in SQL Server? If not that would also be a great video to work on and will be looking forward it 🙂

  • @RamsEarner
    @RamsEarner Год назад

    Thank you Tfq..., And I mean it.

  • @1112GT
    @1112GT Год назад

    Hi Thanks for the contents. One request, can you please provide the query to highlight duplicates instead of deleting?

  • @chellas6784
    @chellas6784 Год назад

    Very nice explanation

  • @reddaiahreddymallu
    @reddaiahreddymallu Год назад

    Thank you for the content.

  • @s.s.vinodhvinu9698
    @s.s.vinodhvinu9698 9 месяцев назад

    Very nice video and explanation and would like to share the backup table creation as (Select *
    into cars_bkp
    From cars
    where 1=2;) please check and share your feedback it works for me. Thanks for your video sir,

  • @jesuszelayac
    @jesuszelayac Год назад

    Great tutorial !!

  • @ishratalikhan7528
    @ishratalikhan7528 Год назад +1

    really very nice and very informative.

    • @techTFQ
      @techTFQ  Год назад

      Thank you 🙏🏼

  • @snehalvaidya5843
    @snehalvaidya5843 Год назад +27

    Happy Teachers Day Sir🎇 (5th Sep 22)

    • @techTFQ
      @techTFQ  Год назад +1

      Thank you so much Snehal 🙏🏼☺️

  • @user-hk4sv1wx6z
    @user-hk4sv1wx6z 11 месяцев назад

    super explaination bro

  • @Imranbof
    @Imranbof 11 месяцев назад

    Very helpful

  • @cj7334
    @cj7334 Год назад +3

    Howdy, great work overall, however Solution 8 is not possible with Microsoft TSQL server as the Identity property can only be applied when the table is created and cannot be added in later through using an Alter statement. If you want unique identifiers on the data set you have to use Row_number windows function. Try this CTE query:
    ;with row_delete as
    (
    select id, model, brand, (row_number() over(partition by model, brand order by id asc) ) as row_num
    FROM cars
    )
    delete from row_delete where row_num > 1

  • @Aubd417
    @Aubd417 Год назад

    your videos are really helpful opening mind to approach problem with different and effective way,,
    share some session on hierarchical sql quires.
    if its already available, share the link.

  • @beks5755
    @beks5755 11 месяцев назад

    @techTFQ
    Your first method 'Delete using Unique Identifier' works if we have just one duplicate but what if we have 55 rows with Tesla Model S?

  • @gecarter53
    @gecarter53 Год назад +2

    Great solutions, easy to follow. BTW: If I had a table with duplicates, where the duplicates are records with numerous multiple column values (let's say records are duplicates if they matching values across 20 columns). Is it possible to write a query without list all 20 columns?

    • @techTFQ
      @techTFQ  Год назад +1

      thank you and I think you need to mention all column names.

  • @hasanmougharbel8030
    @hasanmougharbel8030 Год назад

    Hey there, god bless your efforts.
    Glad to meet again with a general enquiry.
    Is "where" and "having" clauses similarly applied in sql?
    Thanks for taking care of this.

    • @prem486
      @prem486 Год назад +1

      The basic difference which you should consider Hasan is that Having is used with agg functions such as count, max, min, etc. while “Where” cannot be used with agg functions and thats the main purpose of why Having was included in later parts of SQL.

  • @bhaveshverma8629
    @bhaveshverma8629 Год назад +1

    Very very awesome 👍👍👍👍👍👍👍👍

    • @techTFQ
      @techTFQ  Год назад

      Thank you so much 😀

  • @Alltec2233
    @Alltec2233 Год назад

    Good info

  • @matthewwalton450
    @matthewwalton450 10 месяцев назад

    If you are using SQLite you can use a NOT IN clause with built in rowid.
    DELETE FROM dataset
    WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM dataset
    GROUP BY col1, col2, col3 etc etc.
    );
    This will assign all duplicates a rowid, grouped by the columns you want to scrutinize, then it will delete all but the MIN rowid in that set of duplicates.
    Hope this helps with SQLite users.

  • @NitinKumar-ow5yr
    @NitinKumar-ow5yr 2 месяца назад

    Your video is really help full for a data analyst who is struggling thank you so much ❤❤. how to delete in MYSQL server which code we can use in MYSQL.

  • @akshayhalande8763
    @akshayhalande8763 Год назад

    thank you 🙌

  • @girdhar3224
    @girdhar3224 Год назад

    explanation op💙❤

  • @nadinfridburg7088
    @nadinfridburg7088 Год назад

    Thank you!!!!

  • @anmolsonthalia7371
    @anmolsonthalia7371 8 месяцев назад

    May I know which tool you use this SQL to analyze or interpret it?

  • @kavithabalasubramaniyan3390
    @kavithabalasubramaniyan3390 10 месяцев назад

    Thank you❤

  • @kachvejathin
    @kachvejathin 11 месяцев назад

    Hi sir, instead of using generated always as identity we could have used ROWNUNBER or other window functions right..?

  • @MuhammadSalmanKhan-lp4mf
    @MuhammadSalmanKhan-lp4mf Год назад +9

    Assalam O Alaikum Toufeeq!
    On 7:01 I do not think that it will remove all duplicates from table. In your case duplicates were maximum of 2 but Lets say that In some table we have duplicates more than 2 rows than i.e 4 rows are duplicates, then It will remove the row with maximum id and we will be left with 3 rows. Maybe I am wrong, so could you guide me about it.
    Btw, I have learned a lot from your channel and I have watched almost all basics and intermediate playlist. Thank you so much.

    • @neerajgupta6858
      @neerajgupta6858 Год назад

      You just have to use "not in" in delete query instead of "in"

  • @krishnamangalapally2314
    @krishnamangalapally2314 10 месяцев назад

    Thanks😊

  • @akiemcameron5303
    @akiemcameron5303 5 месяцев назад

    The window function is definetly the best one for me. Is there a rule that governs which method you choose based on the problem?

  • @YogeshKumar-me9es
    @YogeshKumar-me9es Год назад

    I think in self join we can also use this condition c1.idc2.id.

  • @ebj5805
    @ebj5805 10 месяцев назад

    For the 2nd solution would this where clause work the same? I'm using data without unique ids:
    WHERE c1.color c2>color

  • @Genial_Gouthi
    @Genial_Gouthi Год назад +1

    I started watching your videos and very clear explanation,planning to watch all the videos. Please make a video on join using multiple tables and how the data is populated from multiple table.

  • @kanurutejeswararao7757
    @kanurutejeswararao7757 7 месяцев назад

    Can u please tell me the query to fetch the duplicate records for scenario 2

  • @abhijitcaps
    @abhijitcaps Год назад

    We can also use row_number() and dense_rank() for all column(s) duplicate row.

  • @vanmathiraj2617
    @vanmathiraj2617 Год назад +1

    I really fell in love with channel to build my career

    • @techTFQ
      @techTFQ  Год назад

      Thank you so much Vanmathi 😀

  • @fathimafarahna2633
    @fathimafarahna2633 Год назад +2

    Amazingggg👏👏

    • @techTFQ
      @techTFQ  Год назад

      Thank you 🙏🏼

  • @dvdsampath
    @dvdsampath Год назад +1

    Awesome solutions and could you post video about cricket match scheduling

    • @techTFQ
      @techTFQ  Год назад +1

      Thank you :)
      I had made a video on "3 tricky sql interview queries" where I had solved a query for IPL matches schedule. u can check that

  • @DE-Py-Sq-Az-Db
    @DE-Py-Sq-Az-Db Год назад +1

    Thank You!!!

  • @mohammedkarimuddin8472
    @mohammedkarimuddin8472 Год назад

    Hi Thoufiq,
    For SQL PLSQL jobs, Linux and Shell scripting is required.
    can you please tell me ?

  • @MAMAKIJAANGUGU
    @MAMAKIJAANGUGU Год назад

    Great solutions. Im not able to delete from the target table. I get an error message so I have to create a copy of the table.

  • @prajwalchauhan3010
    @prajwalchauhan3010 Год назад +1

    Thanks sir 😀

    • @techTFQ
      @techTFQ  Год назад

      Your welcome buddy ☺️

  • @dhemanth7437
    @dhemanth7437 Месяц назад

    Hi Tofique a big fan of your lectures and just wanna mention that for solution 1 if have have more than 2 duplicates and we are deleting only the max one we will still be left with duplicates right.

  • @richagurav
    @richagurav 4 месяца назад

    which tool you use for this sql query? best tool?

  • @rohitraghwa170
    @rohitraghwa170 10 месяцев назад

    Please make a video on advance sql questions.
    Please.