Great explanation on different approaches of deleting duplicates. How to delete duplicates without creating backup table when there are duplicates with exact same rows. Below would give an error saying cte does not exist. with cte as (select *, row_number() over(partition by emp_id order by emp_id) rwn from employee) delete from cte where rwn > 1
cte is a temporary table, that you have created in this query, it doesnt have existence unlike employee table, thats why you cant delete records from it
Not sure if its bcoz of Mssms it worked ..but conceptually CTE is a temporary table to view right? How would delete work on a temp table.. may be @ankit can respond on this..@pr-rb6jl
Hello sir, Can we use this approach also to remove multiple duplicates with cte as( select *, row_number() over(partition by emp_id order by timestamp desc) as rn from employee) delete from cte where rn>1
@@mayankgupta7548 No Bro i was thinking same but it gets deleted from table only ..i have verified it by doing myself on table data....the rows gets deleted from table as well .....
@@mayankgupta7548 CTEs are also temporary results just like subqueries but stored separately on top of your main query. Using CTE to delete duplicates will work perfectly and will delete duplicate records from the table.
Great video though 1 question and a possible video request: What if we want to remove pure duplicates without creating backup table? Deleting from cte is not reliable across DBMS as per other's comments it is working in few DBMS and not in other. Please confirm 1 robust method of deleting duplicates by creating a back and not creating a back up which can work across most of the popoular DBMS like MySQL,SQL server,Big Query etc Thanks
So, each null is treated as a diff value, how can we remove duplicate null from a table? Because when we use the row_number() logic it won't keep the duplicate nulls in single partition, right?
MERGE INTO your_table tgt USING ( SELECT ROWID AS rid, name, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY ROWNUM ) As rn FROM your_table ) src ON (src.rid = tgt.ROWID AND src.rn > 1) WHEN MATCHED THEN UPDATE SET name = name DELETE WHERE 1 = 1; Unfortunatlly this will not work in Sql Server or at last me i couldn t make it work but in Oracle work like a charm. Happy Holydays.
Thank You So So Much Ankit for putting in such great efforts....💯💯 But I have one Question How to delete pure duplicates without taking a backup of the table? Is It Possible?
The course is in a structured manner covers from basic to advanced all the concepts along with assignments. Also covers 2 projects and premium subscriptions to practice SQL problems.
Please give tips on not going blank during intrvws. The intrvwer acts hurry hurry and i end up saying 'i dont know' even though later i realise i knew the answer. So many opportunities missed due to this fear.😓😔😔😔
Maybe u need to practice more questions before Interview ...For hurry , You can tell interviewer to give u 1 minute to think ... Ask him " May i think for a minute " and try to recall the answer in ur mind... and then answer accordingly ....This worked for me in 6-7 interviews of Data Analyst
i thought I am the only one who is facing this problem..i jumbled even after knowing the answers which make bad impression to interviewers which lead to rejection unnecessarily
Ankit, your videos are just fabulous my friend. I am using SQL Server, While deleting duplicate records with multiple columns I am getting an error ("An expression of non-boolean type specified in a context where a condition is expected, near ','). Is it because that ,SQL Server doesn't support multiple column subqueries? Please confirm. Thanks in advance..
With cte as (Select*, row_number() over(partition by emp_id order by create_timestamp asc) as rank from employee a) Select emp_id, max(create_timestamp) From cte Group by emp_id;
I am getting this error "An expression of non-boolean type specified in a context where a condition is expected, near ','." after running query delete from EMP2 where (Emp_id,create_date) in (select Emp_id, min(create_Date) as time_c from EMP2 group by Emp_ID having count(1)>1) Please comment what needs to be update in this
@@paragkelkar8538 ok so in SQL server you can not filter on 2 columns as tuple. Use this syntax using inner join DELETE w FROM WorkRecord2 w INNER JOIN Employee e ON EmployeeRun=EmployeeNo Do join on the 2 columns
Hi Sir, thanks again for wonderful video, I tried deleting in workbench platform but it is not allowing me to delete. I have uncheck the safe mode as well. Error: You can't specify target table 'employee_new' for update in FROM clause. Same query I am using it: delete from employee_new where (emp_id, create_timestamp) = (select emp_id, min(create_timestamp) as create_timestamp from employee_new group by 1 having count(1) >1);
INSERT INTO duplicate_entries (emp_id, emp_name, salary, create_timestamp) VALUES (1, 'Ankit', 10000, NOW()), (2, 'Rahul', 20000, NOW()), (3, 'Agam', 30000, NOW()), (4, 'Ankit', 15000, NOW()); WITH RankedSalaries AS ( SELECT emp_id, DENSE_RANK() OVER(PARTITION BY emp_name ORDER BY salary DESC) AS salary_rank FROM duplicate_entries ) DELETE FROM duplicate_entries WHERE emp_id IN (SELECT emp_id FROM RankedSalaries WHERE salary_rank > 1); select * from duplicate_entries; This is one of the method which i used
watching this lec exactly after 2 years in December,helpful concept
Great video sir💯
Most of the time I have faced duplication issues with entire row
So I used row_number to take care of that
Thanks for sharing
Great explanation on different approaches of deleting duplicates.
How to delete duplicates without creating backup table when there are duplicates with exact same rows. Below would give an error saying cte does not exist.
with cte as (select *, row_number() over(partition by emp_id order by emp_id) rwn from employee)
delete from cte where rwn > 1
cte is a temporary table, that you have created in this query, it doesnt have existence unlike employee table, thats why you cant delete records from it
@@Rekha_m_angadi But it worked for me with CTE ...i was using MSSMS.
Not sure if its bcoz of Mssms it worked ..but conceptually CTE is a temporary table to view right? How would delete work on a temp table.. may be @ankit can respond on this..@pr-rb6jl
Very well explained 😊
Thanks a lot 😊
Hello sir,
Can we use this approach also to remove multiple duplicates
with cte as(
select *,
row_number() over(partition by emp_id order by timestamp desc) as rn
from employee)
delete from cte where rn>1
No you deleting from cte
Cte are views only
@@mayankgupta7548 No Bro i was thinking same but it gets deleted from table only ..i have verified it by doing myself on table data....the rows gets deleted from table as well .....
yes you can use this approach ...this will work..
@@mayankgupta7548 CTEs are also temporary results just like subqueries but stored separately on top of your main query. Using CTE to delete duplicates will work perfectly and will delete duplicate records from the table.
Great Video Sir !!
Also, for SQL Server, the best approach would be Row_Number(), that will solve both the problems.
Yes, you are right
Great video though 1 question and a possible video request:
What if we want to remove pure duplicates without creating backup table? Deleting from cte is not reliable across DBMS as per other's comments it is working in few DBMS and not in other. Please confirm 1 robust method of deleting duplicates by creating a back and not creating a back up which can work across most of the popoular DBMS like MySQL,SQL server,Big Query etc Thanks
Great video sir❤
Keep watching
Thanks a lot greatly explained...
Glad it was helpful!
Nice explanation. May i know which IDE you are using?
Thank you. It's datagrip.
Thank you for making such insightful videos on sql. Please advise some do's and don'ts in an interview or how to approach a problem statement.
Noted
@@ankitbansal6 Instead of giving links of so many videos try to give the code script
Great approach!
Glad it was helpful!
Please provide optimised query for to detect duplicates when we will having million rows and want query to run fast
So, each null is treated as a diff value, how can we remove duplicate null from a table? Because when we use the row_number() logic it won't keep the duplicate nulls in single partition, right?
Nice explnation
Keep watching
MERGE INTO your_table tgt
USING (
SELECT ROWID AS rid, name,
ROW_NUMBER() OVER (
PARTITION BY name
ORDER BY ROWNUM
) As rn
FROM your_table
) src
ON (src.rid = tgt.ROWID AND src.rn > 1)
WHEN MATCHED THEN
UPDATE
SET name = name
DELETE WHERE 1 = 1;
Unfortunatlly this will not work in Sql Server or at last me i couldn t make it work but in Oracle work like a charm.
Happy Holydays.
Thank You So So Much Ankit for putting in such great efforts....💯💯 But I have one Question How to delete pure duplicates without taking a backup of the table? Is It Possible?
There is no standard way of doing it. Depending on databases it can be done.
Hey, what about if i have duplicate record but its id is different. In that case how would we delete that record?
Please provide create table and insert statement like you used to provide in your odl videos
There is nothing in the table. You can create table and data as per the video
Hello Ankit
Just wanted a little help . How is the course different from all the videos you have put . What additional knowledge will we get ?
The course is in a structured manner covers from basic to advanced all the concepts along with assignments. Also covers 2 projects and premium subscriptions to practice SQL problems.
@@ankitbansal6 when will the next batch start and is live batch any different from the recorded one you currently have on the website
Mind blowing🤯🤯🤯..!
Why using the alias A when you havnt used it anywhere like A.empid or A.salary??
This doubt follows me evrywhere..😓😔😔
In SQL server it's mandatory to give aliases to a sub query.
Please give tips on not going blank during intrvws. The intrvwer acts hurry hurry and i end up saying 'i dont know' even though later i realise i knew the answer. So many opportunities missed due to this fear.😓😔😔😔
Maybe u need to practice more questions before Interview ...For hurry , You can tell interviewer to give u 1 minute to think ... Ask him " May i think for a minute " and try to recall the answer in ur mind... and then answer accordingly ....This worked for me in 6-7 interviews of Data Analyst
@@rishav144 hi rishav what are the Major skills required for data analyst and how to apply for companies ?
i thought I am the only one who is facing this problem..i jumbled even after knowing the answers which make bad impression to interviewers which lead to rejection unnecessarily
It's same with me also. knewing the right answer ,still can't recall at that moment, mind runs in hurry to impress or something else
Ankit, your videos are just fabulous my friend. I am using SQL Server, While deleting duplicate records with multiple columns I am getting an error ("An expression of non-boolean type specified in a context where a condition is expected, near ','). Is it because that ,SQL Server doesn't support multiple column subqueries? Please confirm.
Thanks in advance..
Yes it doesn't support multiple columns. You can concatenate the 2 columns and then use in filter
With cte as
(Select*, row_number() over(partition by emp_id order by create_timestamp asc) as rank from employee a)
Select emp_id, max(create_timestamp)
From cte
Group by emp_id;
Hello Ankit, is there a way to delete pure duplicate record without taking backup of original table. Please help me with this. Thanks in advance.
Which database ?
@@ankitbansal6 for MySql
@@ankitbansal6 for MySQL
Hi can u provide the database you use so that it will be useful for practicing
You can practice on any database. Just create the table and insert records
I am getting this error "An expression of non-boolean type specified in a context where a condition is expected, near ','." after running query
delete from EMP2 where (Emp_id,create_date) in (select Emp_id, min(create_Date) as time_c from EMP2 group by Emp_ID having count(1)>1)
Please comment what needs to be update in this
Is it SQL server?
@@ankitbansal6 Yes
@@paragkelkar8538 ok so in SQL server you can not filter on 2 columns as tuple. Use this syntax using inner join
DELETE w
FROM WorkRecord2 w INNER JOIN Employee e ON EmployeeRun=EmployeeNo
Do join on the 2 columns
Hi Sir, thanks again for wonderful video, I tried deleting in workbench platform but it is not allowing me to delete. I have uncheck the safe mode as well. Error: You can't specify target table 'employee_new' for update in FROM clause. Same query I am using it:
delete from employee_new where (emp_id, create_timestamp) = (select emp_id, min(create_timestamp) as create_timestamp from employee_new
group by 1
having count(1) >1);
same issue with me
Hi..Can you share the name of the application you are using..This is not MS SQL Server
Datagrip
thankyou
You're welcome!
Please sir strat SQL courses from basic to Advance.
Join here www.namastesql.com
@@ankitbansal6 i checked it out but you don't list Normalization topic in your courses
Please continue the tutorials in SSMS, this editor does not seem to go well
Ok. Thanks for the feedback 😊
Not the best video in terms of video production but well explained a few things.
Agree with you. Was trying a new IDE but it did not go well.
Why your used this tool Ankit....It was bit late for execution process
Just for a change
Accent too heavy , thanks for content tho
somewhat confusing...
Good SQL problems but explanations not that good. Will love to watch if you speak in language you are comfartable to explain
create table employee (
emp_id int,
emp_name varchar(20),
salary int,
create_timestamp time);
INsert into employee values
(1, 'Ankit', 20000, GETDATE()),
(2, 'Rahul', 20000, GETDATE()),
(3, 'Agam', 30000, GETDATE()),
(1, 'Ankit', 15000, GETDATE())
INSERT INTO duplicate_entries (emp_id, emp_name, salary, create_timestamp) VALUES
(1, 'Ankit', 10000, NOW()),
(2, 'Rahul', 20000, NOW()),
(3, 'Agam', 30000, NOW()),
(4, 'Ankit', 15000, NOW());
WITH RankedSalaries AS (
SELECT emp_id,
DENSE_RANK() OVER(PARTITION BY emp_name ORDER BY salary DESC) AS salary_rank
FROM duplicate_entries
)
DELETE FROM duplicate_entries
WHERE emp_id IN (SELECT emp_id
FROM RankedSalaries
WHERE salary_rank > 1);
select * from duplicate_entries;
This is one of the method which i used