tiger analytics interview questions and answers in pyspark |
HTML-код
- Опубликовано: 5 фев 2025
- tiger analytics interview questions and answers in pyspark | #interview | #dataengineers | #datascience | #dataanalytics
"tiger analytics interview questions and answers in pyspark "
Create DataFrame Code :
======================
flights_data = [(1,'Flight1' , 'Delhi' , 'Hyderabad'),
(1,'Flight2' , 'Hyderabad' , 'Kochi'),
(1,'Flight3' , 'Kochi' , 'Mangalore'),
(2,'Flight1' , 'Mumbai' , 'Ayodhya'),
(2,'Flight2' , 'Ayodhya' , 'Gorakhpur')
]
_schema = "cust_id int, flight_id string , origin string , destination string"
df_flight = spark.createDataFrame(data = flights_data , schema= _schema)
df_flight.show()
Need Help ? Connect With me 1:1 - topmate.io/dew...
Let's connect on LinkedIn : / dhirajgupta141
top interview question and answer in pyspark :
• top interview question...
PySpark Installation and Setup : • Spark Installation | P...
DSA In Python Interview Series : • dsa for data engineer ...
PySpark Interview Series : • pyspark interview ques...
Pandas Interview Series : • pandas interview quest...
SQL Interview Series : • sql interview question...
#fang #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion
SQL Version : Ankit Bansal ( Have done little Twist )
ruclips.net/video/eMQDHHfUJtU/видео.html
ans=df_flight.groupBy("cust_id").agg(first("origin"),last("destination"))
display(ans)
directly we can use this
LoL you just made this question trivial,now a days interviewer may ask you to solve other way.
Thank you so much for sharing the knowledge, like and subscribed
Do subscribe and
Pls share in your network 😊
thanks alot
keep going brother
Do share in your network 😊
Well this question is good one. i solved it before using recursive cte in SQL
ms sql server query
with cte as(
select *,
ROW_NUMBER() over(partition by id order by fly) as r
from flight
)
select distinct id,
first_value(origin) over(partition by id order by r ) as start,
LAST_VALUE(dest) over(partition by id order by r
range between unbounded preceding and unbounded following) as endp
from cte
Below is my query, which works for any number of flights:
with cte1 as (
select *, row_number() over(partition by cid order by fid asc) as rn from flights),
cte2 as(
select cid,min(rn) as mi, max(rn) as mx from cte1 GROUP by cid)
select ori.cid,ori.origin, des.destination from
(
select c1.* from cte1 c1
inner join cte2 c2 on c1.cid=c2.cid and c1.rn=c2.mi) ori
join (
select c1.* from cte1 c1
inner join cte2 c2 on c1.cid=c2.cid and c1.rn=c2.mx) des
on ori.cid=des.cid;
Great i am just curious to know how much time usually they give to solve it. More you pracitce faster you will solve but depends how good you are. Secondly what is the use of aggregate and max in the final result if i am grouping by cust_id and comparing when start == rn = 1 as origin and end == rn == 3 as destination in both origin and dest col.
Best explanation 👌
Thanks a lot 😊
@@DEwithDhairy😊
It can be solve by lead, lag function...
I'm little new to these type of questions, I have used another method to solve the same , can you tell me why this approach is not suited ?
Select a.cust_id, a.origin, b.destination from (Select o.cust_id, o.origin from travelling_details o
where origin not In ( Select t.destination from travelling_details t
where t.cust_id = o.cust_id)) a join
(Select o.cust_id, o.destination from travelling_details o
where o.destination not In ( Select t.origin from travelling_details t
where t.cust_id = o.cust_id)) b
on a.cust_id = b.cust_id;
with cte as (
select *, row_number() over(partition by cust_id order by flight_id) as rn
from flight order by flight_id asc
)
select cust_id,
first(origin) as start,
last(destination) as end
from cte
group by cust_id
one doubt: how we ensure flight id describe the order of connectivity
That's the assumption in this problem.
Your solution will not work if flight10, flight11...flight15 will be there
We can use isdigit(flightid)..this will extract only the numbers
Bro.. Wrong approach
WITH RankedFlights AS (
SELECT
customer_id,
origin,
destination,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY flight) AS row_num,
MAX(ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY flight)) AS max_row_num
FROM
flights_data
)
SELECT
customer_id,
MAX(CASE WHEN row_num = 1 THEN origin END) AS origin,
MAX(CASE WHEN row_num = max_row_num THEN destination END) AS destination
FROM
RankedFlights
GROUP BY
customer_id;