Tiger Analytics Set of 2 SQL Interview Problems | SQL For Beginners
HTML-код
- Опубликовано: 10 фев 2025
- In this video we will discuss 2 SQL interview problems asked in Tiger Analytics data analyst Interview. Here is the script to try yourself
00:00 Problem 1
04:48 Problem 2
problem 1:
CREATE TABLE flights
(
cid VARCHAR(512),
fid VARCHAR(512),
origin VARCHAR(512),
Destination VARCHAR(512)
);
INSERT INTO flights (cid, fid, origin, Destination) VALUES ('1', 'f1', 'Del', 'Hyd');
INSERT INTO flights (cid, fid, origin, Destination) VALUES ('1', 'f2', 'Hyd', 'Blr');
INSERT INTO flights (cid, fid, origin, Destination) VALUES ('2', 'f3', 'Mum', 'Agra');
INSERT INTO flights (cid, fid, origin, Destination) VALUES ('2', 'f4', 'Agra', 'Kol');
Problem 2:
CREATE TABLE sales
(
order_date date,
customer VARCHAR(512),
qty INT
);
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-01-01', 'C1', '20');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-01-01', 'C2', '30');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-02-01', 'C1', '10');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-02-01', 'C3', '15');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-03-01', 'C5', '19');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-03-01', 'C4', '10');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-04-01', 'C3', '13');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-04-01', 'C5', '15');
INSERT INTO sales (order_date, customer, qty) VALUES ('2021-04-01', 'C6', '10');
Zero to hero(Advance) SQL Aggregation:
• All About SQL Aggregat...
Most Asked Join Based Interview Question:
• Most Asked SQL JOIN ba...
Solving 4 Trick SQL problems:
• Solving 4 Tricky SQL P...
Data Analyst Spotify Case Study:
• Data Analyst Spotify C...
Top 10 SQL interview Questions:
• Top 10 SQL interview Q...
Interview Question based on FULL OUTER JOIN:
• SQL Interview Question...
Playlist to master SQL :
• Complex SQL Questions ...
Rank, Dense_Rank and Row_Number:
• RANK, DENSE_RANK, ROW_...
#sql #tigeranalytics #datanalytics
My solution to the first one :
select distinct cid, first_value(origin) over(partition by cid order by fid ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as actual_origin,
last_value(Destination) over(partition by cid order by fid ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as final_destination
from flights
select distinct cid, first_value(origin) over(partition by cid order by fid ) as actual_origin,
first_value(Destination) over(partition by cid order by fid desc) as final_destination
from flights
this only will get u the same result
@@rahulmehla2014 nice, this is more scalable
Please do like the video for more interview questions.
The best
Hi Ankit, my solution for 1 question is :
WITH cte1 AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY cid ORDER BY fid) as rk_origin,
ROW_NUMBER() OVER (PARTITION BY cid ORDER BY fid DESC) as rk_dest
FROM flights
)
SELECT
c1.cid,
c1.origin AS first_origin,
c2.destination AS last_destination
FROM cte1 c1
JOIN cte1 c2 ON c1.cid = c2.cid
WHERE c1.rk_origin = 1 AND c2.rk_dest = 1;
This approach ensures that regardless of how many entries each cid has, you always get the starting point and the final destination in the sequence based on fid ordering.
Great explanation. one more command to add in the first question in join condition-
o.cid = d.cid you missed. so for any new records having similar origin/destination records will be duplicated.
thanks for sharing
Hey Ankit!
Forever grateful to you for the ultimate content and for making SQL easier for us.
Here is my solution to the 1st question.
with cte as (
select *,
rank() over (partition by cid order by fid) as rnk
from flights
)
select cid,
max(case when rnk=1 then origin end) as origin,
max(case when rnk=2 then Destination end) as destination
from cte
group by cid
Hi Ankit,
You're truly a gem for people like me. god bless you. I tried 2nd question using this approach. please check and share your feedback
with cte as(
select min(order_date) as date,customer from sales group by customer
)
select date,count(*) as no_of_new_cus from cte group by date
Hi @Ankit
Your content has helped me a lot in Job switch
Please keep uploading videos, the way you explains makes a difficult question look like a easy peasy.
You have helped me in building confidence in MySql.
Thanks a lot Sir.
Glad to know 😊
Hi Ankit Sir,
Here is an additional alternative:
SELECT A.First_visit_date,
Count(DISTINCT customer) AS New_customer
FROM (SELECT customer,
Min(order_date) AS First_visit_date
FROM sales
GROUP BY customer) A
GROUP BY A.First_visit_date
Ques1: select o.cid, o.origin, d.destination
from flights o
inner join flights d
on o.destination = d.origin
and o.cid = d.cid;
Ques2: with cte as (
select customer, min(order_date) as order_date
from sales group by customer)
select order_date, count(distinct(customer)) from cte group by order_date ;
2nd solution alternative approach
with cte as (
select
order_date, customer,
row_number() over (partition by customer order by order_date) as rn
from Sales)
select order_date, count(rn) as new_customer
from cte
where rn =1
group by order_date
Thanks Ankit for the problem . Here's my take:
Problem 1 Sol:
select cid, max(or_rnk) as source, max(det_rnk) as destination from
(select *, first_value(origin) over (partition by cid order by fid) as or_rnk,
first_value(destination) over (partition by cid order by fid desc) as det_rnk
from flights) a
group by 1
order by 1;
Problem 2 Sol:
WITH CTE AS
(select *, DENSE_RANK() OVER (ORDER BY ORDER_dATE) AS DATE_RANK
from sales)
SELECT A.ORDER_DATE, COUNT(*) AS NEW_CUSTOMER_COUNT
FROM CTE A LEFT JOIN CTE B
ON A.DATE_RANK > B.DATE_RANK AND A.CUSTOMER = B.CUSTOMER
WHERE B.ORDER_DATE IS NULL
GROUP BY 1;
Hi @Ankit,
My approach for second solution would be like below. We can extract date also from order_date rather than using substr too.
select substr(a.order_date,6,2), count(distinct customer)
from customers a
where customer not in (select customer from customers where substr(a.order_date,6,2) > substr(order_date,6,2))
group by 1
order by 1
It's better to use date functions on date columns . Substring is best suited for string data types
with cte as(
select cid ,origin as loc ,'start_loc' as column_name from flights
union all
select cid ,destination as loc ,'end_loc' as column_name from flights ),
cte2 as(
select *,count(*) over(partition by cid,loc) as cnt
from cte )
select cid ,
max(case when column_name = 'start_loc' then loc end )as origin,
max(case when column_name = 'end_loc ' then loc end) as destination
from cte2
where cnt=1
group by cid
1.Logic fail when flight stop more than one stop like del,hyd,kol,blr (two stop hyd,kol before blr) thats why i write this way
;with cte1 as
(
select cid,origin from flights
except
select cid,destination as origin from flights
),cte2 as
(select cid,destination from flights
except
select cid,origin as destination from flights
) select c1.cid,c1.origin,c2.destination from cte1 c1
inner join cte2 c2
on c1.cid = c2.cid
Hi Ankit Sir, I have tried a solution like this :
Problem 1 :
with cte as (select *, row_number() over (partition by cid) as rn from flights )
select cid, destination as final_destination from cte where rn in (select max(rn) from cte group by cid) ;
Problem 2:
with cte as (
select customer, order_date,
dense_rank() over (partition by customer order by customer, order_date asc) as drnk
from sales
)
select month(order_date), count(customer) as new_customer_count from cte where drnk = 1
group by month(order_date);
solution 1 is wrong
This query works well:
select f1.cid, f1.origin, f2.destination from flights f1 JOIN flights f2 ON f1.cid = f2.cid and f1.fid < f2.fid;
Great explanation. But I think we can have an easier solution as well. I think in this solution of mine I am utilizing a loophole of the group by clause.
WITH cte AS (SELECT order_date, customer FROM sold
GROUP BY customer)
SELECT order_date, COUNT(customer) AS new_customer FROM cte
GROUP BY MONTH(order_date);
basically group by will automatically deliver us first row if no aggregate function is used in the CTE. Thus, we automatically get the first visit of the customer without using the row_number() window function.
For 1st question, What in case if there are 3 or more records for some id's. Let's say for id = 1, we have BNG -> HYD, then HYD -> DEL, then DEL -> PTN. How to solve this scenario.
Thank you for your all the work and community help you are doing in world of SQL.
I had also same question.
Solution:
WITH RankedData AS (
SELECT
cid,
origin,
destination,
FIRST_VALUE(origin) OVER (PARTITION BY cid ORDER BY fid) AS FirstOrigin,
LAST_VALUE(destination) OVER (PARTITION BY cid ORDER BY fid ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastDestination
FROM
flights
)
SELECT DISTINCT
cid,
FirstOrigin AS Origin,
LastDestination AS Destination
FROM
RankedData;
Awesome video, great explanation for each query and associated logic !
first solution another approach especially if there are multiple flights by 1 cust
with cte as (
select *, row_number() over(partition by cust_id order by flight_id) as rn
from flight order by flight_id asc
)
select cust_id,
first(origin) as start,
last(destination) as end
from cte
group by cust_id
My approach for the 1st one:
select distinct cid, first_value(origin) over(partition by cid order by fid ) as actual_origin,
first_value(Destination) over(partition by cid order by fid desc) as final_destination
from flights
For first solution , while joining we have put cid comparison also otherwise we will get wrong result if 2 cid has same connecting destination
--my solution(its dynamic)
with flight as(
select *,count(*) over(partition by cid) totalflights, row_number() over(partition by cid)as flightNumber from flights
)
select cid,MAX(origin) as origin,max(destination) as destination from(
select cid, case when flightNumber=1 then origin end as origin,
case when flightNumber=totalflights then destination end as destination
from flight)
group by cid
Great video sir💯
Thankyou for bringing such wonderful interview questions 🙏
ques 1-
with cte as
(
select *,
lead(Destination) over(partition by cid) as final_destination
from flights
)
select cid,origin, final_destination
from cte
where (case when final_destination is not null then 1 else 0 end) = 1 ;
ques 2-
with cte as
(
select customer,min(order_date) as order_date
from sale
group by customer
)
select monthname(order_date) as month,count(customer) as new_customer_count
from cte
group by monthname(order_date);
For first questions, doing a lead helps
with NextDesCTE as (
select *, lead(Destination) over (partition by cid order by cid, fid) as next_destination
from flights
)
select cid, fid, origin, next_destination as final_destination
from nextDesCTe
where next_destination is not null
;
with cte as
(select cid,origin,destination,lead(origin,1) over(order by cid) as origin_1,
lag(origin,1) over(order by cid) as origin_2 from flights)
select cid,max(case when destination=origin_1 then origin end) as origin,
max(case when destination!=origin_2 then destination end) as destination from cte
group by cid
if we have more than two flights of each customers then below will be more dynamic solution:
with cte as (
select cid, origin, Destination, Rank() over (partition by cid order by fid) as rnk
from flights),
cte1 as (
select cid, origin, Destination, Rank() over (partition by cid order by fid Desc) as rnk
from flights),
cte2 as (
select c1.cid, c1.origin, c2.Destination
from cte c1 inner join cte1 c2 on c1.cid=c2.cid
where c1.rnk=1 and c2.rnk=1)
select * from cte2;
u have done best among all for first solution
with cte as(select cid,origin as place,'origin'
as mode from flights
union all
select cid,destination,'destination' from flights)
,cte2 as(select cid,place,count(1) as cnt from cte
group by cid,place
having count(1)=1)
select b.cid
,max(case when mode='origin' then b.place end) as origin
,max(case when mode='destination' then b.place end) as destination
from cte2 a inner join cte b on a.cid=b.cid and a.place=b.place
group by b.cid
with cte as (
select * ,lead(origin) over(partition by cid order by cid) as ro ,
lead(destination) over(partition by cid order by cid) as rd from flights), ct2 as (
select * , case when destination =ro then 1
else 0 end as rk from cte
)
select cid,origin,rd as destination from ct2 where rk =1;
1.
select cid, origin, final_dest
from (
Select *,
lead(origin) over(partition by cid order by cid) as transit_source,
lead(destination) over(partition by cid order by cid) as final_dest
from flights)i
where destination=transit_source
more generalised sol for 1st problem: (to handle more than 2 rows for a flight)
with tab as (select *,lag(origin) over (partition by cid order by (select null)) prev_origin,
lead(Destination) over (partition by cid order by (select null)) nxt_des from flights)
select cid,max(case when prev_origin is null then origin end) as ori,max(case when nxt_des is null then Destination end) as Dest from tab
group by cid
2nd solu:
with first_date as (select customer,min(order_date) as first_date from sales group by customer)
select s.order_date,sum(case when order_date=first_date then 1 end) as new_cust
from sales s inner join first_date fd on s.customer=fd.customer
group by s.order_date
Approach for second question:
with cte as (select *, dense_rank() over(partition by customer order by order_date asc) as rnk from sales)
select order_date, sum(case when rnk = 1 then 1 else 0 end) as new_cust_cnt from cte group by 1;
Hi @ankitbansal6 . I am loving your content ❤.lets make the first question as more generic .when the customer has to board only one flight to reach his destination or have to board more than two flight to reach the destination, then we need to have the code like this.
With flight_order as (select cid,origin, destination,row_number() over (partition by cid order by fid asc) as flight_order, lead(destination) over (partition by cod order by fid asc) as next_flight from flights),
First_flight as (select * ,case when flight_order=1 then origin end as first_flight from flight_order),
Last_flight as (select *,case when next_flight is null then destination end as last_flight from flight_order),
Select o.cid, o.first_flight, d.last_flight from first_flight o inner join last_flight d on o.cid=d.cid where o.fist_flight is not null and d.last_flight is not null ;
I believe we can also try:
WITH CFE AS(
SELECT *,
FIRST_VALUE(ORIGIN) OVER(PARTITION BY CID ORDER BY FID) AS ORIGIN_DESTINATION,
LAST_VALUE(DESTINATION) OVER(PARTITION BY CID ORDER BY FID) AS FINAL_DESTINATION
FROM
FLIGHT)
SELECT DISTINCT CID, ORIGINAL_DESTINATION, FINAL_DESTINATION FROM CFE;
for 2nd question:
with cte as(select customer,order_date,
row_number() over(partition by customer order by order_date) as rn
from sales)
select order_date, sum(rn) as new_customer_count
from cte
where rn = 1
group by order_date
Hey Ankit
My Approach towards the problem
with cte as (
select *
,ROW_NUMBER()over(partition by cid order by fid) as rn
from flights )
select cid,max(case when rn=originf then origin else null end )as Origination
,max(case when rn=destinationf then Destination else null end )as Desination
from (
select *,min(rn)over() as originf,
max(rn)over() as destinationf
from cte)a
group by cid
with cte as (
select * ,
min(order_date)over(partition by customer) as first_purchase
from sales)
select order_date,SUM(flag) as new_customers from (
select *, case when order_date=first_purchase then 1 else 0 end as flag
from cte )a
group by order_date
with cte as (
select *,row_number() over(partition by cid) as rnk from flights
),
cte2 as
(select cid,min(rnk) as min_rnk,max(rnk) as max_rnk from cte group by 1)
select c2.cid,
max(case when c1.rnk=min_rnk then c1.origin end) as origin,
max(case when c1.rnk=max_rnk then c1.destination end) as destination
from cte2 c2
inner join cte c1
on c2.cid = c1.cid
group by 1;
@ankit bansal
Can we implement the Problem 1 in this way ?
SELECT DISTINCT cid,FIRST_VALUE(origin) over(partition by cid order by fid ASC) as origin ,
FIRST_VALUE(destination) over(partition by cid order by fid desc) as Dest
from flights
Great video..! Also bought your course, SQL zero to hero 🎉.
Awesome! Thank you!
@@ankitbansal6How can I purchase that course please guide me🙏
@@poonamwaghmare7205 here you go
www.namastesql.com/courses/SQL-For-Analytics-6301f405e4b0238f71788354
with base as (select date_format(order_date,'MMMM' ) as order_month,customer, rank() over(partition by customer order by order_date) as rnk
from sales order by order_date)
select order_month,
sum(case when rnk=1 then 1 else 0 end) as count
from base
group by order_month
Query 1
with cte1 as (select *,
lead(origin,1) over (partition by cid order by fid) as transit,
lead(destination,1) over (partition by cid order by fid) as finaldest
from flights)
select cid,origin,finaldest as destinationf from cte1
where destination=transit
and finaldest is not null;
Nice question
Plj thoda thoda hindi me bhi explain krdia kro ise thoda understand krne me easy rehta h
Thank You Man
with cte as(select t1.cid,t2.origin,t1.destination from flights as t1 join flights as t2 on
t1.origin=t2.destination)
select * from cte
BY USING SELF JOIN
with cte as(
select *,ROW_NUMBER() over(partition by cid order by fid) as rn
, case when ROW_NUMBER() over(partition by cid order by fid) = 1 then origin end as origin1
, case when ROW_NUMBER() over(partition by cid order by fid) = 2 then destination end as destination1
from flights)
select cid, max(origin1) as origin, max(destination1) as destination from cte
group by cid;
1) select f1.cid,f1.origin , f2.destination
from flights f1 join flights f2 on
f1.cid=f2.cid
where f1.destination=f2.origin
2) with cte as(
select order_date, customer ,row_number() over(partition by customer) as rn
from sales )
select order_date, count(distinct customer) as no_of_vis from(
select order_date, customer,rn from cte where rn=1)
group by order_date
finished watching
My approach
select cid,max(case when (cid,origin) not in (select cid,destination from flight group by cid,destination) then origin end) as origin,
max(case when (cid,destination) not in (select cid,origin from flight group by cid,origin) then destination end) as destination from flight group by cid;
sol 2:-
with cte as (
select month(order_date) as month, customer from sales
) select a.month,count( distinct a.customer) from cte as a left join cte as b on a.month>b.month and a.customer=b.customer
where b.month is null group by a.month;
select cid,origin,dd as destination from(
select*,lead(destination) over(partition by cid order by cid)as dd from flights) a
where dd is not null
I used a cross join here in !st problem
Select f1.cid,f1.origin, f2.destination from flights f1
cross join
flights f2
where f1.cid = f2.cid and
f1.destination = f2.origin
Second problem solution i feel this is better no need to use window
WITH CTE AS(
select customer,
EXTRACT(YEAR_MONTH FROM min(order_date)) as first_visit
from sales group by customer)
SELECT first_visit,COUNT(CUSTOMER) AS NEW_CUSTOMER_COUNT FROM CTE GROUP BY first_visit
with sample as (
select customer,min(order_date) as min_date from sales
group by customer)
select distinct to_char(min_date,'month'),count(customer) from sample group by min_date
select s.order_date,sum(case when order_date= minorder then 1 else 0 end)
from sales s
outer apply(select customer,min(order_date) minorder
from sales
where customer=s.customer
group by customer)a
group by s.order_date
With CTE1 as(
select *,
ROW_NUMBER() over(partition by customer order by order_date) as rn
from sales)
select order_date, count(distinct customer) as count_new_cust from CTE1
where rn = 1
group by order_date;
with cte as (
select order_date,customer, month(order_date) as month,
rank() over(partition by customer order by month(order_date)) as rn
from sales)
select order_date , count(customer) as new_customer
from cte
where rn = 1
group by order_date
my solution :
select order_date, count(*) from (
select *,row_number() over(partition by customer order by order_date) as rn from sales ) as tb where rn=1 group by order_date;
with temp as (
select
cid,
origin, destination,
RANK() OVER(PARTITION BY cid order by fid) rn1,
RANK() OVER(PARTITION BY cid order by fid desc) rn2
from
flights f
)
select
cid,
max(case when rn1 = 1 then origin end) as origin,
max(case when rn2 = 1 then destination end) as destination
from temp
group by cid
with cte as(
select *, row_number () over (partition by customer order by order_date) as cnt from sales)
select datepart(month,order_date) as mnt ,count(distinct customer) as new_cust_cnt from cte
where cnt=1
group by datepart(month,order_date)
order by mnt;
select f1.cid, f1.origin, f2.destination from flights f1 JOIN flights f2 ON f1.cid = f2.cid and f1.fid < f2.fid;
My Solution to the problem:-
with cte as (
select *, lead(destination) over() as EndDestination, row_number() over(partition by cid) as row_num from flights)
select cid, origin, EndDestination from cte where row_num =1
1)
select f1.cid, f1.origin, f2.destination
from flights f1
join flights f2
on f1.cid = f2.cid
and f1.destination = f2.origin
;
2)
select substr(odate,1,7) as Month, count(customer) New_customer_count
from
(select customer, min(order_date) as odate
from sales
group by customer) a
group by month
;
with cte as(
select order_date,Customer,
row_number() over(partition by customer order by order_date) as rn from sales )
select order_date,sum(rn) as total_new_customer from cte where rn=1 group by order_date
For the first problem, I think we have to add "o.cid=d.cid" as well in the join condition.
I too think same here there is no match for other flights
correct me ,If I am wrong . As we are using row_number , Distinct is not necessary for dealing with the duplicate data of same customer visiting in same month as , fnc gives different value to duplicates.
with cte as
(
select *, row_number()over(partition by customer order by month(order_date)) as rnk
from sales
)
select month(order_date) , count(*)
from cte
where rnk =1
group by month(order_date);
You are right
Thanks much for videos. I have given interview and this helped a lot
Wonderful!
Hi Sir,
Below is another alternative one
with cte as(
Select *,lead(cid) over(partition by cid order by fid) as cid_nxt,lead(origin) over(partition by cid order by fid) as origin_nxt,
lead(Destination) over(partition by cid order by fid) as Destination_nxt from flights)
select cid,origin,Destination_nxt as Destination from cte where cid_nxt is not null;
select o.cid,o.origin,d.destination as fd from flights o
join flights d on o.destination = d.origin
problem 1:
with cte as
(select cid,origin,
lead(destination) over(partition by cid order by cid asc) f_destination from flights)
select cid,origin,f_destination from cte
where f_destination is not null
with cte as(
select *from(
select *, row_number() over(partition by customer order by MONTH(order_date))as dd from sales)a
where dd=1)
select order_date,count(customer) as new_customer_count from cte group by order_date
FOR PROBLEM 1 - with cte1 as (SELECT * ,LEAD(DESTINATION)OVER(Partition by cid) FinalDestination from flights),
cte2 as
(select *, row_number () over(partition by cid) from cte1)
select cid ,origin , FinalDestination from cte2 where row_number = 1 ;
For the first question would it work for a generic case where there can be more than 1 connecting airport for example. Delhi -> Mumbai -> Bengaluru -> Chennai?
My Solution:
select origin,final_destination from (
select *, case when LEAD(origin) over (PARTITION by cid order by fid) = Destination
then
LEAD(Destination) over (PARTITION by cid order by fid)
end as final_destination
from flights) q where final_destination is not NULL
;with cte as (
select *
,min(order_date)over(partition by customer order by order_date) as first_order_date
from sales)
select month(order_date),
sum(case when order_date=first_order_date then 1 else 0 end ) as new_customer_count,
STRING_AGG(customer,',') as cust
from cte
group by month(order_date)
with cte1 as
(select *,
lead(Destination,1) over(partition by cid order by cid) as vv
from flights)
select cid,origin,vv
from cte1
where vv is not null;
This is temporary solution as we have only two sets of row for each CID.
problem 2:
with cte as
(select order_date,customer,qty, row_number() over(partition by customer order by order_date) rnk
from sales)
select order_date,count(rnk) from cte
where rnk=1
group by order_date
simple Solution:
select to_char(mnth,'MMMM-YY') as MonthN, count(customer) as cnt from
(select customer, min(order_date) as mnth from sales
group by customer) A
group by mnth
order by 1
With A as
(select customer, MIN(order_date) as first_visit from sales
group by customer)
select first_visit as monthh,count(customer) as cnt
from A
group by first_visit
with cte as
(
select cid,origin,
case when lead(origin) over(order by cid)=Destination then lead(Destination) over(order by cid) end as Destination
from [ops-qaprodcarlsbad-5003].[flights]
)
select * from cte where Destination IS NOT NULL
@AnkitBansal And for multiple entries like if there are n no of flights for one cid,then this would not work.SO i have coded that as:INSERT INTO [ops-qaprodcarlsbad-5003].[flights] (cid, fid, origin, Destination) VALUES ('1', 'f1', 'Del', 'Hyd');
INSERT INTO [ops-qaprodcarlsbad-5003].[flights] (cid, fid, origin, Destination) VALUES ('1', 'f2', 'Hyd', 'Blr');
INSERT INTO [ops-qaprodcarlsbad-5003].[flights] (cid, fid, origin, Destination) VALUES ('1', 'f3', 'Blr', 'Chennai');
INSERT INTO [ops-qaprodcarlsbad-5003].[flights] (cid, fid, origin, Destination) VALUES ('2', 'f4', 'Agra', 'Kol');
INSERT INTO [ops-qaprodcarlsbad-5003].[flights] (cid, fid, origin, Destination) VALUES ('2', 'f5', 'Kol', 'Kashmir');
WITH cte AS (
SELECT t1.cid, t1.origin, t2.destination,row_number() over(order by t1.cid) as rn_join
FROM [ops-qaprodcarlsbad-5003].[flights] t1
INNER JOIN [ops-qaprodcarlsbad-5003].[flights] t2 ON t1.Destination = t2.origin
),
cte2 as
(SELECT *,
LAST_VALUE(destination) OVER (PARTITION BY cid order by rn_join ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS new_Destination,
ROW_NUMBER() OVER (PARTITION BY cid ORDER BY rn_join) AS rn
FROM cte)
SELECT cid,origin,new_Destination FROM cte2 where rn=1
Hi @ankit
for the first problem what if we had the input table like the below?
1 f1 Del Hyd
1 f2 Hyd Blr
2 f3 Mum Agra
2 f4 Agra Kol
1 f3 Blr Koc
@ankitbansal6 pls answer this
Hi Ankit
My solution for 2 problem
with cte as (
select Month, Distinct customer as customer_new from sales)
select *,
count(customer) over(partition by month order by customer_new) as cnk
from cte
select first_order_date, count(customer) from
(
select customer, min(order_date) as first_order_date
from sales
group by customer
) A group by first_order_date
my solution:
select cid,origin,last_destination from (
select *,lead(destination) over(partition by cid order by destination) as last_destination from flights) as r1
where last_destination is not null
with cte as(
select f1.cid,f1.origin
from flights_bar f1
left join flights_bar f2
on f1.origin = f2.Destination
where f2.fid is null
),cte2 as(
select f1.cid,f1.Destination
from flights_bar f1
left join flights_bar f2
on f1.Destination = f2.origin
where f2.fid is null
)
select c1.cid,c1.origin,c2.destination
from cte c1
join cte2 c2
on c1.cid = c2.cid
@ankit bansal
select cid,origin,final_dest from (
select *,
lead(destination,1) over(partition by cid order by fid) final_dest
from flights)
where final_dest is not null
is this correct? for first solution , i think this is easy
my PySpark solution for Q1:
result_df = flights_df.groupBy("cid").agg(
first("origin").alias("origin"),
last("Destination").alias("Destination")
)
Hi Ankit,
For 1st Problem i tried with this approach
with cte as
(select cid,min(fid) min_f,max(fid) max_f from flights group by cid order by cid)
select c.cid,f.origin,f1.destination from cte c
join flights f on c.min_f=f.fid
join flights f1 on c.max_f=f1.fid;
2nd problem approach
with cte as
(select customer,min(order_date) min_order_date from sales
group by customer)
select min_order_date,count(distinct customer) new_customer from cte group by min_order_date;
let me know your thoughts on this?
What if the Destination and Origin are same but, the `cid` is different? Why are we not checking if the cid in first table is equal to the cid in the second table?
We are doing that in join condition
with cte as (select * ,
row_number() over(partition by customer order by order_date) as rn
from sales)
select monthname(order_date) as month,sum(rn) as new_c
from cte
where rn
FOR PROBLEM 2 - WITH CTE1 AS
(select TO_CHAR (ORDER_DATE,'MON') MONTHS, CUSTOMER FROM SALES)
, cte2 as (SELECT *,ROW_NUMBER()OVER(partition by customer ) from cte1)
,cte3 as (select * from cte2 where row_number=1)
select months ,count(customer) from cte3 group by months order by months DESC
Hello Ankit ji - For problem 2, if there is no new customer for a particular month then that month and it's count 0 won't be populated in output as we are filtering based on rn =1
Right 👍
I love your content Ankit, i have one doubt here in problem 1, what about the customers who have direct flight, no changes in between, then the query doesn't give the ouput. Here is my solution and your feedback will be appreciated
with cte as (
select *,
lead(destination) over(partition by cid) as final_destination,
count(fid) over(partition by cid) as flight_change
from flights
)
select cid, origin, case when final_destination is not null then final_destination
when flight_change =1 then destination end as destination
from cte
where final_destination is not null or flight_change = 1;
Can we do it in sql server -
SELECT DISTINCT cid,
FIRST_VALUE(origin) OVER (PARTITION BY cid order by cid ) AS origin,
LAST_VALUE(Destination) OVER (PARTITION BY cid order by cid ) AS d_estination
FROM flights;
with cte as (select customer, min(order_date) as month from sales
group by customer)
select count(customer) as new_cust_count, to_char(month, 'Month') from cte group by month;
PROBLEM 2 - Simple solution WITHOUT using window functions !🔥🔥🔥
select first_purchase as date,count(distinct customer) as new_cutomer_count
from(
select customer,min(order_date) as first_purchase
from sales
group by customer) a
group by first_purchase;
for second solution -----------
select order_date, count(distinct customer) count_new_customer
from
(select customer, min(order_date) as order_date
from sales
group by customer) a
group by order_date
order by order_date;
But second solution fail if we add any new customer on another like suppose we add C7 on 2021-01-02 then it will fail as the question says we need to find new customer added in a month.
Hello Ankit Sir, Your Solution for problem 1 as per my understanding is not correct if we have three different use cases like Del -> Hyd , Hyd -> Blr, Blr -> Vns, please correct me If I am wrong.
below is my sol for problem 1:
with cte as (Select cid,origin,destination, lead(origin) over (partition by cid order by cid)next_origin,
lag(origin) over (partition by cid order by cid)prev_origin
from
flights)
select cid,
max(case when prev_origin is null then origin end)origin,
max(case when next_origin is null then destination end)destination
from cte
group by cid
solution for 2nd:
select order_date,count(distinct customer) from
(
select customer,order_date,row_number() over (partition by customer order by order_date) as rn from sales) where rn =1
group by order_date;
😂ye different solution hai kya