What if you have two paths to the same destination from 1 city? Here is my alternate solution: use append to combine start and destination cities, then use append to combine destination and start cities. Now you have a pair of "unique" (for a given pair of cities) identifiers. Using the same self join and row number, check if start-destination in destination-start before current row.
Hi Sir, please let me know whether this will work:
with cte as
(select *, least(source,destination) as lst, greatest(source,destination) as gr from distance)
select source,destination,distance
from (select *, row_number() over(partition by lst,gr) as rnk from cte) sal
where rnk = 1;
You should use a union, which automatically removes duplicates; just swap the destination and source while unioning. Far lighter, easier, and above all much faster. I love the way you bring content and training to people, but I noticed you are really record-focused. Try to solve issues in sets of data; this will eliminate a lot of headaches with growing data.
with cte as (
select source,destination,distance,row_number() over(order by distance) as row from src_dest_distance)
select source,destination,distance from cte where row in (1,3,5)
This query also gave the same result. Is that correct?
No, as in the example the distances are all the same, the order is more or less random. With real distances it would be still a problem, because it is possible that two different pairs of cities have the same distance.
MySQL solution for freshers:
With CTE as
(Select Source,Destination,Distances,
Case When Source>Destination then concat(Source,Destination) else concat(Destination,Source) end as Batch
from src_dest_distances)
Select Source,Destination,Distances
from (Select *,row_number() over (Partition by Batch) as Rn from CTE) N
Where Rn=1;
if you can use row numbering and sub-clauses, just do: select * from ( select *, row_number() as rid from table) where modulo(rid,2) = 0 And if modulo not allowed just do a diff even check, as wonky as: round(rid/2) = rid/2
Hi, could you please give a solution for SQL Server as well? I'm getting this error: The function 'row_number' must have an OVER clause with ORDER BY.
Also, I don't think this solution will work for all combinations, such as the data below. Can you please check?
insert into src_dest_distance values ('Bangalore', 'Hyderbad', 400);
insert into src_dest_distance values ('Hyderbad', 'Bangalore', 400);
insert into src_dest_distance values ('Bangalore', 'Kolkata', 500);
insert into src_dest_distance values ('Kolkata', 'Bangalore', 500);
insert into src_dest_distance values ('Mumbai', 'Delhi', 400);
insert into src_dest_distance values ('Kasmir', 'Mumbai', 1000);
insert into src_dest_distance values ('Delhi', 'Mumbai', 400);
insert into src_dest_distance values ('Mumbai', 'Kasmir', 1000);
insert into src_dest_distance values ('Chennai', 'Pune', 400);
insert into src_dest_distance values ('Chennai', 'HYD', 100);
insert into src_dest_distance values ('Pune', 'Chennai', 400);
insert into src_dest_distance values ('HYD', 'Chennai', 100);
Hi Toufik, Can you make a video on "case-insensitive pattern matching in PostgreSQL". I recently faced this issue when using wildcard, unlike mySQL Postgre isn't case-insensitive. Thanks for the resources & study and practice materials you provide, they are very helpful.
Suppose there is additional data, such as Bangalore to Mumbai and Mumbai to Bangalore. How will the join help then, since Bangalore will be mapped to two rows?
There doesn't seem to be any order to the table and you did not provide any order to your query. You can end up with Hyderbad in row one and Bangalore in row two. Then your query would return the wrong row. Seems completely arbitrary. Maybe something was not provided to you. Otherwise just do this: select * from table where source in ('Bangalore', 'Mumbai', 'Chennai')
Hi Toufiq, it would fail for this scenario:
CREATE TABLE DATA1(
SOURCE VARCHAR(50),
DESTINATION VARCHAR(50),
DISTANCE INTEGER
);
INSERT INTO DATA1 VALUES
('Ban', 'Hyd', 400),
('Hyd', 'Ban', 400),
('Mum', 'Del', 400),
('Del', 'Mum', 400),
('Che', 'Pun', 400),
('Pun', 'Che', 400),
('Ban', 'Del', 400),
('Del', 'Ban', 400);
The output should be 4 records, but we get 6 records as output.
There are some huge mistakes. First of all, it will work (but not every time) only with this set of data, because you don't have a second join condition: T1.Destination = T2.Source. Next, you don't have "order by" in "over()", so SQL Server can rearrange the data on its own, and again you won't get the same result in the id column every time. And last: we don't know which distance we should show if the a-b and b-a distances are different (which is common if it's the road distance between two cities). Here too, there is no stated condition for why these cities were chosen as source and destination.
Hi sir, I am trying to solve this in SQL Server but I'm getting the following error. Which column should I use for the order by clause? Msg 4112, Level 15, State 1, Line 147, The function 'row_number' must have an OVER clause with ORDER BY.
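For anyone who wants to check the 6-versus-4 claim, here is a minimal sketch. It assumes SQLite (run through Python's sqlite3 module) instead of the engine used in the video, and an explicit id column standing in for the row_number() id. It compares the source-only join quoted elsewhere in this thread with the two-condition join several commenters propose.

```python
import sqlite3

# In-memory SQLite database. The id column is an assumption standing in
# for the row_number() id from the video; it follows insertion order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data1 (id INTEGER PRIMARY KEY, source TEXT,"
    " destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO data1 (source, destination, distance) VALUES (?, ?, ?)",
    [
        ("Ban", "Hyd", 400), ("Hyd", "Ban", 400),
        ("Mum", "Del", 400), ("Del", "Mum", 400),
        ("Che", "Pun", 400), ("Pun", "Che", 400),
        ("Ban", "Del", 400), ("Del", "Ban", 400),
    ],
)

# Join on source only: a city that appears in two pairs also matches
# rows from the other pair, so extra rows come back.
loose_rows = conn.execute(
    """
    SELECT t1.source, t1.destination, t1.distance
    FROM data1 t1
    JOIN data1 t2
      ON t1.source = t2.destination
     AND t1.id < t2.id
    ORDER BY t1.id
    """
).fetchall()

# Join on both columns: a row only matches its exact reverse trip.
strict_rows = conn.execute(
    """
    SELECT t1.source, t1.destination, t1.distance
    FROM data1 t1
    JOIN data1 t2
      ON t1.source = t2.destination
     AND t1.destination = t2.source
     AND t1.id < t2.id
    ORDER BY t1.id
    """
).fetchall()

print(len(loose_rows))  # 6
print(strict_rows)
```

With the second condition in place the query returns one row per pair (4 rows) for this data set.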
And now I have watched the full video: so you don't assume that one city can participate in 2 different pairs? Why not? Where was that stated? Also, over() without order by: why on earth do you expect that the data will be fed in some specific order? It might be mixed however the DB engine and/or storage decides. Really, this solution demonstrates only that you didn't even try to solve the task.
Suppose we insert two more rows: 7. Bangalore to Hyderabad 400 and 8. Hyderabad to Bangalore 400. We again need to fetch just one row, Bangalore to Hyderabad 400. Which condition should we apply in that situation?
Does this query work if we later, add another row with source Bangalore but a different destination? We are only checking that Bangalore is present in both columns, but not that they have the same corresponding source.
It will not work if Bangalore appears in more than one pair: multiple joins will happen for Bangalore, so it only works if you have distinct sources and distinct destinations.
I think the below one also works.
with cte as
(select *,
row_number() over(partition by buckets) as rn
from(
select *,
ntile(3) over(partition by distance) as buckets
from src_dest_distance))
select source,destination,distance
from cte
where rn
But why do we use buckets here?
@@kartikeytyagi9119 In the given query, each pair of state names is repeated in reverse, and I need to fetch the row which was inserted first. If I create buckets, I get three buckets for the three pairs of states; each bucket has two rows with the state names reversed. If I apply a row number within each bucket, I can get the first inserted rows (by using rn
Since distance is same how can you make buckets by partitioning the distance?
I think we can simply write the query using the LEAST and GREATEST functions (in Oracle Database)
SELECT DISTINCT
least(SOURCE, DESTINATION) SOURCE,
greatest(SOURCE, DESTINATION) DESTINATION,
DISTANCE
FROM table1;
It will give the desired results
Thanks for sharing these functions. I'm trying to think through the logic of the code to see what the output would be. I'm not sure but it looks like your code will perform 6 sets of comparisons returning 6 rows of data. How do you reduce it down to the 3 rows desired?
@@olu0mg
Least function will always select the first word based on alphabetical order
You may watch some videos on these functions, it is quite easy.
And As we are using 'Distinct' in the query above, it will remove the duplicate records and return only 3 rows
@@Anonymous-le2zr Thank you! Somehow missed DISTINCT when I first read your post.
desired output is hyd, delhi, pune in destination column
above query gives hyd, mumbai, Pune as output
That was also the first thing that I thought about
A self anti semi join sounds easier to me:
select *
from src_dest_distance a
where not exists
(select 1 from src_dest_distance b where a.source=b.destination and a.destination=b.source and b.source
It's great to see tutorials like this posted. However, it should be noted that the solution presented only works for a very specific dataset. Two potential failure cases come to mind:
1) Adding the city pair Bangalore Chennai will fail. As others have noted, the fix for this is to make the where clause check both T1.Source = T2.Dest and T1.Dest = T2.Source
2) It assumes that every city pair shows in both orders. If a city pair is only listed in one direction, it won't match on the self join.
Others have mentioned using least() and greatest(). Not just an alternative, but using such functions solves both of the problems noted above:
SELECT DISTINCT GREATEST(source, destination) AS source, LEAST(source, destination) AS destination, distance FROM cities;
Note: tested with MariaDB, other engines may vary.
This also works well with SQL server and Databricks SQL
Second theamigo42. This query will not work in those 2 scenarios
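To see the two failure cases from this thread handled by the normalize-then-DISTINCT idea, here is a minimal sketch. It assumes SQLite via Python's sqlite3, where the two-argument scalar MIN() and MAX() stand in for LEAST() and GREATEST(). The one-way Bangalore to Chennai row and its distance of 350 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_dest_distance (source TEXT, destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO src_dest_distance VALUES (?, ?, ?)",
    [
        ("Bangalore", "Hyderabad", 400), ("Hyderabad", "Bangalore", 400),
        ("Mumbai", "Delhi", 400), ("Delhi", "Mumbai", 400),
        ("Chennai", "Pune", 400), ("Pune", "Chennai", 400),
        # Hypothetical extra row: shares cities with other pairs and has
        # no return trip. The distance is invented for this sketch.
        ("Bangalore", "Chennai", 350),
    ],
)

# SQLite has no LEAST/GREATEST; its scalar MIN(a, b) / MAX(a, b) behave
# the same way for two non-NULL strings.
unique_pairs = conn.execute(
    """
    SELECT DISTINCT
           MIN(source, destination) AS city_a,
           MAX(source, destination) AS city_b,
           distance
    FROM src_dest_distance
    ORDER BY city_a, city_b
    """
).fetchall()

for row in unique_pairs:
    print(row)
```

The shared city and the unpaired trip both survive as one row each. As noted in the replies, the endpoints come back in normalized order, so the output will not necessarily show the direction that was inserted first.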
I guess this can be also one more approach with greatest and least function:
select distinct greatest(source, destination) as source, least(source, destination) as destination, distance from src_dest_distance;
Appreciate it. Could you tell me, is it MySQL or SQL server or others ? Thank you
PostgreSQL
Thank you for this alternative. I was thinking about something similar. This approach to me is more clever and avoids using the row number, which I feel is a bit hacky (even if it works)
I just learned JOIN today in SQL and this was a great lesson to add to my earlier lesson. I also liked the addition of the ID column. Thank you!
Actually @techTFQ, it would be better if you add t1.dest = t2.source, because the sample data doesn't repeat any source, but repeats are quite possible. To uniquely identify a pair we could add this condition.
Also creation of ID is not necessary as text can also be compared directly, here source or destination
@@yaminurrahmantopiwala6207 it is required, query isn’t giving required output without id column instead it’s giving whole data
@@aakriti_100 I still disagree. You share your sample data and I will give you perfect query without redundancies
For this data set it's fine but it would be cool to incorporate the possibility of the same city showing up in a different pair.
I guess you could just also add the condition that t2.source = t1.destination
This has to be one of the best IT channels out there.
Again an awesome video with great explanation. Just one suggestion to the join condition. Should we include T1.source = T2.destination and T1.destination = T2.source and T.id < T2.id so that we match only the required rows. The condition T1.source = T2.destination and T.id < T2.id may join non required rows as well which is not present in your data example. Like Bangalore -> Hyderabad and Chennai -> Bangalore. Let me know your thoughts. Thanks
Yes you need that condition T1.destination=t2.source and also you can replace t1.id
@@mudassirasaipillai6584 it is not working
agree, the best answer sees beyond the data given to the broader nature of the data and request, and provides a solution that also is robust for handling future edge cases. This is "find all the unique trips" not just "return these 3 specific rows"...
I'd add on that doing an inner join is less ideal for this same reason. It assumes that a "return trip" record will always be present to pair against.
Instead, using a left join with the filter condition placed in the WHERE clause (WHERE T2.id is null) would better handle the potential situation of an unpaired entry down the road. Retain a record when no "return trip" match is found is more robust, assuming that "find all the unique trips" is the mission.
@@mudassirasaipillai6584 That does not avoid duplicates. That will give the same duplicate values.
For any city pair, you will have 2 rows, they will have ids of id1 and id2 which will be different, so one will be larger than the other; and they will have a different source.
Then for the join in one case you will have T1.id < T2.id, and in the other case you have T1.id > T2.id.
By having one of those as the condition you will only get one of the pair.
But with t1.source <> t2.source, the condition will be true in both cases and both directions of the journey will be returned.
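The left join idea raised in this thread (keep a row when no earlier reverse trip exists) could look roughly like the sketch below. Assumptions: SQLite via Python's sqlite3, an explicit id column reflecting insertion order, a table name trips chosen for this example, and one invented one-way row (Bangalore to Chennai, 350) to show the unpaired case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, source TEXT,"
    " destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO trips (source, destination, distance) VALUES (?, ?, ?)",
    [
        ("Bangalore", "Hyderabad", 400), ("Hyderabad", "Bangalore", 400),
        ("Mumbai", "Delhi", 400), ("Delhi", "Mumbai", 400),
        ("Chennai", "Pune", 400), ("Pune", "Chennai", 400),
        ("Bangalore", "Chennai", 350),  # hypothetical trip with no return row
    ],
)

# Inner join: a row is returned only if a later reverse trip exists,
# so the unpaired trip disappears.
inner_rows = conn.execute(
    """
    SELECT t1.source, t1.destination, t1.distance
    FROM trips t1
    JOIN trips t2
      ON t1.source = t2.destination
     AND t1.destination = t2.source
     AND t1.id < t2.id
    ORDER BY t1.id
    """
).fetchall()

# Left join used as an anti-join: keep a row when no EARLIER reverse trip
# exists. Unpaired trips have no match at all, so they are kept too.
anti_rows = conn.execute(
    """
    SELECT t1.source, t1.destination, t1.distance
    FROM trips t1
    LEFT JOIN trips t2
      ON t1.source = t2.destination
     AND t1.destination = t2.source
     AND t2.id < t1.id
    WHERE t2.id IS NULL
    ORDER BY t1.id
    """
).fetchall()

print(inner_rows)
print(anti_rows)
```

Both versions keep the first-inserted direction of each pair; only the anti-join also retains the trip that has no return record.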
Hey
My solution for the same is
;with cte as (
select *,
case when source destination then source else destination end as destination1
from src_dest_distance)
select source1,destination1,max(distance)
from cte
group by source1,destination1
Instead of the group, DISTINCT can be used too.
Great solution, if the source and destination are not strict, we can do simple trick-
select * from src_dest_distance where source
U can't use comparison operator in character column
@@madavsaravanan8844 why?
@@protapnandi9729 Because how can you compare two names?
@@jacksparrow3595 You can certainly compare strings in SQL. Protap’s suggestion is the best solution to the stated problem. It’s extremely simple, gives correct results, and even has the benefit that the selected rows will always have the two endpoints in alphabetical order, which, if the actual source were quite long, would make the output easier to use.
I think my solution is much simpler and therefore more understandable:
select source c1, dest c2, dist from input where sourcedest
approach for specific dataset
select * from src_dest_distance
where source
Looking at the output, it is every alternate row of the input table. In this instance, I believe we can take the row number modulo 2. Wherever the value is not equal to zero, the row belongs in the output table.
Select *
From (Select *, row_number() over () as row
from input_table) t1
where row % 2 <> 0
This would do the job
SELECT DISTINCT IIF(source > destination, source, destination) AS source, IIF(source < destination, source, destination) AS destination, distance
FROM src_dest_distance
with cte as (
select (case when sourcedestination then source else destination end) as destination,distance
from src_dest_distance ) ,
cte2 as (
select source,destination,distance, row_number()
over(partition by source order by distance) as rnk from cte )
select * from cte2
where rnk = 1
select distinct s,d,distance from
(select *, case when source >destination then source else destination end s,
case when source < destination then source else destination end d
from src_dest_distance)x
we can solve it through not exists operator also right.
with cte as
(select *, row_number() over () as id from src_dest_distance)
select * from cte AS t1 where not exists (select * from cte AS t2 where t1.source = t2.destination and t1.destination = t2.source and t1.id>t2.id);
Thanks Taufiq, I think visualization of output is necessary in such scenarios
Another Simpler way to solve this :
with xyz as (
select * , lead(destination) over() as LD
from src_dest_distance )
Select source , destination , distance from xyz where source = LD ;
great explanation ..
We can also run a simple query
select * from src_dest_distance
where source > destination
I would prefer source < destination, that way the source comes first alphabetically.
But neither will return the desired data, where it seems to want to use the first trip in the table.
The problem with MS SQL is that it won't let you use row_number() without an order by in the over() clause. I solved it like this:
with CTE as
(
select t.*, row_number() over( partition by distance order by distance) as [id]
from travel_routes t
)
select source, destination, distance from CTE where id=1
Maybe the CTE is overkill, but I'm posting it for anyone who might find this useful.
If the distance is the same for all records, then this will not work.
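A quick way to see the objection above, plus one possible alternative: partition by the normalized city pair instead of by distance. This is a sketch, assuming SQLite 3.25+ (window functions) via Python's sqlite3, an explicit id column for insertion order, and scalar MIN()/MAX() in place of LEAST()/GREATEST(). SQL Server syntax would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE travel_routes (id INTEGER PRIMARY KEY, source TEXT,"
    " destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO travel_routes (source, destination, distance) VALUES (?, ?, ?)",
    [
        ("Bangalore", "Hyderabad", 400), ("Hyderabad", "Bangalore", 400),
        ("Mumbai", "Delhi", 400), ("Delhi", "Mumbai", 400),
        ("Chennai", "Pune", 400), ("Pune", "Chennai", 400),
    ],
)

# Partitioning by distance: every row has distance 400, so there is a
# single partition and only one row gets row number 1.
by_distance = conn.execute(
    """
    WITH numbered AS (
        SELECT id, source, destination, distance,
               ROW_NUMBER() OVER (PARTITION BY distance ORDER BY id) AS rn
        FROM travel_routes
    )
    SELECT source, destination, distance FROM numbered WHERE rn = 1 ORDER BY id
    """
).fetchall()

# Partitioning by the normalized pair: one partition per pair of cities,
# and ORDER BY id keeps the first-inserted direction.
by_pair = conn.execute(
    """
    WITH numbered AS (
        SELECT id, source, destination, distance,
               ROW_NUMBER() OVER (
                   PARTITION BY MIN(source, destination), MAX(source, destination)
                   ORDER BY id
               ) AS rn
        FROM travel_routes
    )
    SELECT source, destination, distance FROM numbered WHERE rn = 1 ORDER BY id
    """
).fetchall()

print(by_distance)
print(by_pair)
```

Ordering by an insertion id (if the table has one) also gives row_number() the ORDER BY that SQL Server insists on.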
Why not filter the original input for SOURCE < DESTINATION? That would eliminate one of the 2 records, and would be much more efficient than a self join...
Yep, although it wouldn't change the results here, make sure to use the lower function on both source and destination to avoid headaches down the line, when someone introduces "delhi", someone's script breaks, and nobody knows why.
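A small sketch of the casing problem, assuming SQLite via Python's sqlite3 with its default case-sensitive (binary) text comparison. Engines with case-insensitive default collations may behave differently, and the two mixed-case rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_dest_distance (source TEXT, destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO src_dest_distance VALUES (?, ?, ?)",
    [
        # Same pair of cities, entered with inconsistent capitalisation.
        ("Mumbai", "delhi", 400),
        ("Delhi", "mumbai", 400),
    ],
)

# Case-sensitive comparison: uppercase letters sort before lowercase ones,
# so BOTH rows satisfy source < destination and the duplicate survives.
raw_rows = conn.execute(
    "SELECT source, destination, distance FROM src_dest_distance"
    " WHERE source < destination ORDER BY source"
).fetchall()

# Comparing lower-cased values restores the intended alphabetical test.
lowered_rows = conn.execute(
    "SELECT source, destination, distance FROM src_dest_distance"
    " WHERE LOWER(source) < LOWER(destination) ORDER BY source"
).fetchall()

print(raw_rows)      # both rows
print(lowered_rows)  # one row
```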
This might sound daft, but given the problem/solution set out at 2:20 why not just SELECT * FROM INPUT WHERE SOURCE IN ('Bangalore','Mumbai','Chennai');
you have one of the best channels for practicing SQL queries...great going
with cte as (
select * , lead(destination) over(order by distance) as Lead_dest
from src_dest_distance )
Select source , destination , distance from cte where source = Lead_dest ;
select * from src_dest_distance where source < destination;
select s.source,s.destination,s.distance from src_dest_distance as s
join
src_dest_distance as s1
on s.source=s1.source and ascii(s1.source)>ascii(s.destination)
Thank you TFQ....
This question is asked by me.
Thank you for your reply 🙏 it helps a lot to others as well.
This is an easier solution @techTFQ
with cte as (select *, lead(destination,1,destination) over() as sc1 from src_dest_distance)
select source, destination, distance from cte where source = sc1;
excellent solution
Can't we write source > destination instead of the complex join and row_number?
Taufiq, what if the entry is like:
bangalore hyderabad 400
delhi bangalore 1200
will this condition not treat these as duplicate records ?
Can we use the condition: T1.source=T2.destination and T1.destination=T2.source ?
Then this record delhi bangalore will be filtered out
We can also use the lead function on the table and then compare it with the first column. That will give the unique output.
I wouldn't have done it like that but it is a quite neat solution. Thanks.
Select hash(destination) + hash(source) unique, first(destination), first(source), max(distance)
From table
Group by hash(destination) + hash(source);
Easier:
select
source as destination,
destination as source,
distance
from input
union
select
destination,
source,
distance
from input
Explanation: union will automatically remove duplicate rows ;)
it is not working
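Why the union does not help here can be checked with a short sketch (SQLite via Python's sqlite3 is an assumption; the table name input comes from the comment above). Since every pair already exists in both directions, the swapped copy contains exactly the same rows as the original, and UNION has nothing to collapse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "input" (source TEXT, destination TEXT, distance INTEGER)'
)
conn.executemany(
    'INSERT INTO "input" VALUES (?, ?, ?)',
    [
        ("Bangalore", "Hyderabad", 400), ("Hyderabad", "Bangalore", 400),
        ("Mumbai", "Delhi", 400), ("Delhi", "Mumbai", 400),
        ("Chennai", "Pune", 400), ("Pune", "Chennai", 400),
    ],
)

# UNION removes rows that are identical column by column. A -> B and
# B -> A are different rows, so both directions remain in the result.
union_rows = conn.execute(
    """
    SELECT source, destination, distance FROM "input"
    UNION
    SELECT destination, source, distance FROM "input"
    """
).fetchall()

print(len(union_rows))  # 6, not 3
```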
I think we simply do like below.
Select * from tablename
Where source > destination
yes..much easier and better.
Hi,
My approach to this Question
With cte as
(Select *,ROW_NUMBER() over (order by Distance) as RN from Distance)
Select Source,Destination,Distance from cte where RN %2 =1
I got it output....but is the approach right??
But it won't work if the input table is scrambled, e.g. if Delhi to Mumbai is not directly below Mumbai to Delhi.
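To illustrate the scrambled-input objection, here is a minimal sketch, assuming SQLite via Python's sqlite3 and an explicit id column that records insertion order (a stand-in for row_number(); the table name travel is made up). The same six rows are inserted in a different order and the odd-row filter is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE travel (id INTEGER PRIMARY KEY, source TEXT,"
    " destination TEXT, distance INTEGER)"
)
# Same six rows as the sample data, but the return trips are no longer
# directly below their outbound trips.
conn.executemany(
    "INSERT INTO travel (source, destination, distance) VALUES (?, ?, ?)",
    [
        ("Bangalore", "Hyderabad", 400),
        ("Mumbai", "Delhi", 400),
        ("Hyderabad", "Bangalore", 400),
        ("Delhi", "Mumbai", 400),
        ("Chennai", "Pune", 400),
        ("Pune", "Chennai", 400),
    ],
)

# "Take the odd rows" applied to the scrambled table.
odd_rows = conn.execute(
    "SELECT source, destination, distance FROM travel"
    " WHERE id % 2 = 1 ORDER BY id"
).fetchall()

print(odd_rows)
```

The result contains both directions of Bangalore/Hyderabad and no Mumbai/Delhi row at all, so the odd-row trick depends entirely on the input order.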
with cte as
(select row_number()over (order by distance ) as rowid,SOURCE,DISTINATION,DISTANCE from TRAVEL)
select SOURCE,DISTINATION,DISTANCE from cte where (rowid%2)<>0
I really enjoyed how you explain the win func and make easy for us to understand
Great work and delivery 🎉❤
select distinct greatest(source,destination),least(source,destination),distance from src_dest_distance
Explanation in Excel did the whole trick with focus on ID column, thanks a lot sir
Thank you for the Video. This SQL only works if there are always two sets of records (A->B, B->A). If there is only a single source - destination pair (S->D) it will not be part of the result.
select source,destination,distance From (select *,
LEAD(Source,1,Source) OVER() as Source1,
LEAD(Destination,1,Destination) OVER() as Destination1
FROM src_dest_distance) as a
WHERE source=destination1 AND destination=source1;
Is that correct please check?
Hi I think this can be done by using row number function and selecting the odd rows will give the desired output?
I think this will only solve this particular data.
You're assuming you're getting sorted pairs, can you do it with unsorted source and no ID?
This is what I came up with before I watched your solution:
select src, dest, dist
from cities c1
where (
select count(*)
from cities c2
where c2.src = c1.dest
and c2.src < c1.src) = 0;
Select source , destination, distance from table
Union
Select destination as source, source as destination, distance from table
? Is it possible ?
I don't understand why we cannot just do:
SELECT * FROM table1 WHERE Source IN('Bangalore', 'Mumbai', 'Chenai')
??? Maybe the problem definition needs to be a bit more precise? Cool video either way :-)
The solution has to be dynamic in case there are other cities added to the table. It shouldn't be hardcoded.
with cte as(
select src_dest_distance.*,ROW_NUMBER()OVER() as "x" FROM src_dest_distance
)
select source,destination,distance from cte where x%2!=0;
Write a SQL query where the input table looks like:
Item, no_items
Apple, 8
Potato, 4
Banana, 6
Tomato, 2
and the output table should look like:
Item, sum_item
Vegetables, 8
Fruits, 14
Hi, everything is excellent, but you mentioned the concept used in the title, so anyone who saw this video didn't get the opportunity to think on their own, because the concept was already given away. Still, you are doing a great job for the data community.
Very valid point. I have just renamed the title to remove it, so it helps future viewers.
Table name assumed - Location
WITH cte as
(SELECT * , LAG(Source,1,0) OVER() as comp
FROM Location)
SELECT Source,Destination,Distance
FROM cte
WHERE Source NOT IN (SELECT Source
FROM Location
WHERE Destination=comp)
Very nicely explained
So clear ! Thank you Thoufiq. Keep it coming man. More to watch and learn.
What if we just take the odd rows.
Select * from (Select *,row_number() over() as id from table) where (id&1)=1;
I would not hire you if you gave that answer. It solves the problem one time only, based on assumption about the input data. What if the order of input rows was randomized?
@@PhilAndOr lol
Best of the Best Techtfq, Awesome delivery consistency. great job ciao
My senior @jinal used to explain the data in the same way how you do here...loved your presentation.
select
t1.source, t1.destination, t1.distance
from src_dest_distance t1 left join src_dest_distance t2 on t1.destination = t2.source and t1.distance = t2.distance
where t1.source
with cte as(
Select *,row_number() over(order by distance ) as rn
from src_dest_distance),
cte2 as (Select *,row_number() over(order by distance ) as rn
from src_dest_distance)
select c1.source,c1.destination,c1.distance,c1.rn
from cte c1,cte2 c2
where c1.rn
Will the query below work?
select * from src_dest_distance
where source > destination
I think an easy way to remove the duplicate records in the table is to use the UNION set operator.
Assuming that your table1 always has both directions, SELECT * FROM TABLE1 WHERE source < destination will suffice. Otherwise, SELECT DISTINCT CASE WHEN source > destination THEN destination ELSE source END source, CASE WHEN source > destination THEN source ELSE destination END destination, distance FROM table1 would work; it would be slower, but should still be significantly faster than a self-join.
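A rough sketch of the difference between those two queries, run through Python's built-in sqlite3 module (my choice of engine for illustration). The six paired rows are assumed to mirror the sample data, and the one-directional Pune to Goa row is hypothetical, added only to show where the simple filter falls short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (source TEXT, destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("Bangalore", "Hyderbad", 400),
        ("Hyderbad", "Bangalore", 400),
        ("Mumbai", "Delhi", 400),
        ("Delhi", "Mumbai", 400),
        ("Chennai", "Pune", 400),
        ("Pune", "Chennai", 400),
        ("Pune", "Goa", 450),  # hypothetical pair listed in one direction only
    ],
)

# Approach 1: keep only the row whose source sorts before its destination.
filtered = conn.execute(
    "SELECT source, destination, distance FROM table1 "
    "WHERE source < destination"
).fetchall()

# Approach 2: put every pair into alphabetical order, then de-duplicate.
normalized = conn.execute(
    """
    SELECT DISTINCT
        CASE WHEN source > destination THEN destination ELSE source END AS source,
        CASE WHEN source > destination THEN source ELSE destination END AS destination,
        distance
    FROM table1
    """
).fetchall()

print(len(filtered), filtered)
print(len(normalized), normalized)
```

The filter silently drops the one-directional pair because 'Pune' sorts after 'Goa', while the CASE version keeps it as Goa, Pune. Note that the CASE version may swap source and destination relative to what was inserted.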
What if you have two paths to the same destination from 1 city?
Here is my alternate solution: use append to combine start and destination cities, then use append to combine destination and start cities. Now you have a pair of "unique" (for a given pair of cities) identifiers. Using the same self join and row number, check if start-destination in destination-start before current row.
thanks for your efforts Toufiq! Jazakallohu hairon.
Sir, I have one question. How should we write the syntax? I mean, the basics teach one thing, but here something different is written.
Hi Sir ,
Please let me know whether this will work or not:
with cte as (select *,least(source,destination) as lst,greatest(source,destination) gr from distance)
select source,destination,distance from (select *,row_number() over(partition by lst,gr) rnk from cte) sal where rnk =1;
Very clean and crisp explanation..Thanks Taufiq
You should use a union, which automatically removes duplicates; just swap the destination and source while unioning. Far lighter, easier and, above all, much much faster.
I love the way you bring content and training to people, but I noticed you are really record-focused. Try to solve issues in sets of data; this will eliminate a lot of headaches with growing data.
with cte as(
select source,destination,distance,row_number() over(order by distance) as row from src_dest_distance)
select source,destination,distance from cte where row in (1,3,5)
This Query also gave the same result. Is that correct?
No, as in the example the distances are all the same, the order is more or less random. With real distances it would be still a problem, because it is possible that two different pairs of cities have the same distance.
well laid out..thanks
Fabulous
We can concatenate both the columns, and on that column we can get the duplicates, right?
MYSQL Solution for Freshers
With CTE as
(Select Source,Destination,Distances,
Case When Source>Destination then concat(Source,Destination) else concat(Destination,Source) end as Batch
from src_dest_distances)
Select Source,Destination,Distances from
(Select *,row_number() over (Partition by Batch) as Rn from CTE) N
Where Rn=1;
Thank you bro. Can you do more videos from an interview point of view?
If you can use row numbering and subqueries, just do:
select *
from
( select *, row_number() over () as rid
from table) t
where mod(rid, 2) = 0
And if modulo is not allowed, just do a different even check, as wonky as:
round(rid/2) = rid/2
Hi sir,
Can you please make a video on how to update data of one table from another table using the merge concept.
It will be very helpful for me.
You are a true gem for data community ❤️
Could you not in this case also do a bitwise OR of the two fields and select distinct?
Hi, could you please give a solution for SQL Server as well?
I'm getting this issue:
The function 'row_number' must have an OVER clause with ORDER BY.
Also, I don't think this solution will work for every type of combination, such as the one below.
Could you please check once?
insert into src_dest_distance values ('Bangalore', 'Hyderbad', 400);
insert into src_dest_distance values ('Hyderbad', 'Bangalore', 400);
insert into src_dest_distance values ('Bangalore', 'Kolkata', 500);
insert into src_dest_distance values ('Kolkata', 'Bangalore', 500);
insert into src_dest_distance values ('Mumbai', 'Delhi', 400);
insert into src_dest_distance values ('Kasmir', 'Mumbai', 1000);
insert into src_dest_distance values ('Delhi', 'Mumbai', 400);
insert into src_dest_distance values ('Mumbai', 'Kasmir', 1000);
insert into src_dest_distance values ('Chennai', 'Pune', 400);
insert into src_dest_distance values ('Chennai', 'HYD', 100);
insert into src_dest_distance values ('Pune', 'Chennai', 400);
insert into src_dest_distance values ('HYD', 'Chennai', 100);
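For what it's worth, here is a rough sketch of how an order-independent approach handles these twelve inserts. It runs in Python's built-in sqlite3 module rather than SQL Server (my assumption, for convenience), and SQLite's two-argument min() and max() stand in for the LEAST and GREATEST functions mentioned in other comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_dest_distance "
    "(source TEXT, destination TEXT, distance INTEGER)"
)
# Same rows, in the same interleaved order, as the inserts above.
conn.executemany(
    "INSERT INTO src_dest_distance VALUES (?, ?, ?)",
    [
        ("Bangalore", "Hyderbad", 400),
        ("Hyderbad", "Bangalore", 400),
        ("Bangalore", "Kolkata", 500),
        ("Kolkata", "Bangalore", 500),
        ("Mumbai", "Delhi", 400),
        ("Kasmir", "Mumbai", 1000),
        ("Delhi", "Mumbai", 400),
        ("Mumbai", "Kasmir", 1000),
        ("Chennai", "Pune", 400),
        ("Chennai", "HYD", 100),
        ("Pune", "Chennai", 400),
        ("HYD", "Chennai", 100),
    ],
)

# Two-argument min()/max() are scalar functions in SQLite; they put each
# pair into alphabetical order so that DISTINCT can collapse both directions.
pairs = conn.execute(
    """
    SELECT DISTINCT
        min(source, destination) AS source,
        max(source, destination) AS destination,
        distance
    FROM src_dest_distance
    """
).fetchall()

for row in sorted(pairs):
    print(row)
```

This gives six rows, one per city pair, regardless of how the inserts are interleaved and without needing row_number() at all. The trade-off is that source and destination may come out swapped compared with the first inserted row.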
We can write
Select * from table
Where source in ('Bangalore', 'Mumbai', 'Chennai');
it's a 1-time solution, it doesn't solve the problem in the general case.
@@PhilAndOr yes bro
Nice explanation Taufiq 👌 👍 👏
Very Nice explanation 👌 thank you 😊
Hi Toufik,
Can you make a video on "case-insensitive pattern matching in PostgreSQL"? I recently faced this issue when using wildcards; unlike MySQL, PostgreSQL isn't case-insensitive.
Thanks for the resources & study and practice materials you provide, they are very helpful.
In Postgres, you can use 'ilike' for a case-insensitive version of 'like'.
Amazing 👏 keep it up 😇😇
Suppose there is other data as well, such as Bangalore to Mumbai and Mumbai to Bangalore. Then how will the join be helpful, since Bangalore will be mapped to two rows?
Helpful.
Nice Presentation.. Keep it up..
There doesn't seem to be any order to the table and you did not provide any order to your query. You can end up with Hyderbad in row one and Bangalore in row two. Then your query would return the wrong row. Seems completely arbitrary. Maybe something was not provided to you. Otherwise just do this:
select * from table
where source in ('Bangalore', 'Mumbai', 'Chennai')
Hi Toufiq
it would fail for this scenario
CREATE TABLE DATA1(
SOURCE VARCHAR(50),
DESTINATION VARCHAR(50),
DISTANCE INTEGER
);
INSERT INTO DATA1
VALUES ('Ban', 'Hyd', 400),
('Hyd', 'Ban', 400),
('Mum', 'Del', 400),
('Del', 'Mum', 400),
('Che', 'Pun', 400),
('Pun', 'Che', 400),
('Ban', 'Del', 400),
('Del', 'Ban', 400);
The output should be 4 records, but we get 6 records as output
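A rough sketch that reproduces the 6 versus 4 count, using Python's built-in sqlite3 module (my choice of engine). It assumes the video's join is roughly t1.source = t2.destination AND t1.id < t2.id, which is my reconstruction from the comments and not the exact query; the explicit id column is a hypothetical stand-in for the generated row number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data1 ("
    "id INTEGER PRIMARY KEY, source TEXT, destination TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO data1 (source, destination, distance) VALUES (?, ?, ?)",
    [
        ("Ban", "Hyd", 400),
        ("Hyd", "Ban", 400),
        ("Mum", "Del", 400),
        ("Del", "Mum", 400),
        ("Che", "Pun", 400),
        ("Pun", "Che", 400),
        ("Ban", "Del", 400),
        ("Del", "Ban", 400),
    ],
)

# Assumed shape of the original join: only one column is matched.
one_condition = conn.execute(
    """
    SELECT t1.source, t1.destination, t1.distance
    FROM data1 t1
    JOIN data1 t2
      ON t1.source = t2.destination
     AND t1.id < t2.id
    """
).fetchall()

# Matching both columns pairs each row only with its true reverse.
both_conditions = conn.execute(
    """
    SELECT t1.source, t1.destination, t1.distance
    FROM data1 t1
    JOIN data1 t2
      ON t1.source = t2.destination
     AND t1.destination = t2.source
     AND t1.id < t2.id
    """
).fetchall()

print(len(one_condition), len(both_conditions))
```

With one condition, a city that appears in two pairs (Ban and Del here) joins to rows from the other pair as well, which is where the extra records come from. The second query still relies on the id reflecting insertion order, so it does not address the ordering concern.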
There are some huge mistakes
First of all, it will work (but not every time) only with this set of data, because you don't have a second join condition: T1.Destination = T2.Source.
Next, you don't have "order by" in "over()", so SQL Server can rearrange the data on its own, and again you won't get the same result in the id column every time.
And last: we don't know which distance we should show if the a-b and b-a distances are different (this is common if it's the road distance between two cities). And here too: there is no rule for why these cities were chosen as source and destination.
What if there is another row, such as chennai, bangalore, 400? Will it work?
Hi sir,
I am trying to solve it in SQL Server but I'm getting the following error. Which column should I use for the ORDER BY clause?
Msg 4112, Level 15, State 1, Line 147, The function 'row_number' must have an OVER clause with ORDER BY.
You can order by any column, the solution would still work
@@techTFQ Thankyou sir🙏
I am still facing that error even after I added the ORDER BY clause.
And now I've watched the full video: so you don't allow for one city participating in 2 different pairs? Why not? Where was that stated?
Also, over() without order by: why on earth do you expect that the data will be fed in some specific order? It might be mixed however the DB engine and/or storage decides.
Really, this solution only demonstrates that you didn't even try to solve the task.
Suppose we insert two more rows:
7. Bangalore to Hyderabad 400
8. Hyderabad to Bangalore 400
We again have to fetch just one row, Bangalore to Hyderabad 400. Which condition do we apply in that situation?
In that case, you can first select the distinct rows and use that as the table, so you won't run into any issue.
Does this query work if we later add another row with source Bangalore but a different destination? We are only checking that Bangalore is present in both columns, but not that they have the same corresponding source.
It will not work if Bangalore appears with a different source; multiple joins will happen for Bangalore, so it only works if you have distinct sources and distinct destinations.
IT IS NOT WORKING IN ORACLE WITH OUT OVER() CLAUSE
SELECT *
FROM src_dest_distance
WHERE LOWER(Source) < LOWER(Destination)
please share another way of doing that?