tiger analytics interview questions and answers in pyspark |

Поделиться
HTML-код
  • Опубликовано: 5 фев 2025
  • tiger analytics interview questions and answers in pyspark | #interview | #dataengineers | #datascience | #dataanalytics
    "tiger analytics interview questions and answers in pyspark "
    Create DataFrame Code :
    ======================
    flights_data = [(1,'Flight1' , 'Delhi' , 'Hyderabad'),
    (1,'Flight2' , 'Hyderabad' , 'Kochi'),
    (1,'Flight3' , 'Kochi' , 'Mangalore'),
    (2,'Flight1' , 'Mumbai' , 'Ayodhya'),
    (2,'Flight2' , 'Ayodhya' , 'Gorakhpur')
    ]
    _schema = "cust_id int, flight_id string , origin string , destination string"
    df_flight = spark.createDataFrame(data = flights_data , schema= _schema)
    df_flight.show()
    Need Help ? Connect With me 1:1 - topmate.io/dew...
    Let's connect on LinkedIn : / dhirajgupta141
    top interview question and answer in pyspark :
    • top interview question...
    PySpark Installation and Setup : • Spark Installation | P...
    DSA In Python Interview Series : • dsa for data engineer ...
    PySpark Interview Series : • pyspark interview ques...
    Pandas Interview Series : • pandas interview quest...
    SQL Interview Series : • sql interview question...
    #fang #pyspark #sql #interview #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
    #dsafordataguy #dewithdhairy #DEwithDhairy #dhiarjgupta #leetcode #topinterviewquestion

Комментарии • 27

  • @DEwithDhairy
    @DEwithDhairy  Год назад +1

    SQL Version : Ankit Bansal ( Have done little Twist )
    ruclips.net/video/eMQDHHfUJtU/видео.html

  • @siddharthchoudhary103
    @siddharthchoudhary103 Год назад +14

    ans=df_flight.groupBy("cust_id").agg(first("origin"),last("destination"))
    display(ans)
    directly we can use this

    • @jhonsen9842
      @jhonsen9842 9 месяцев назад

      LoL you just made this question trivial,now a days interviewer may ask you to solve other way.

  • @Mouni_MahendraNath
    @Mouni_MahendraNath Год назад

    Thank you so much for sharing the knowledge, like and subscribed

    • @DEwithDhairy
      @DEwithDhairy  Год назад

      Do subscribe and
      Pls share in your network 😊

  • @JAVEDKHAN-nh2tw
    @JAVEDKHAN-nh2tw Год назад +1

    thanks alot
    keep going brother

  • @jhonsen9842
    @jhonsen9842 9 месяцев назад

    Well this question is good one. i solved it before using recursive cte in SQL

  • @gowrikm-n5u
    @gowrikm-n5u 11 месяцев назад +1

    ms sql server query
    with cte as(
    select *,
    ROW_NUMBER() over(partition by id order by fly) as r
    from flight
    )
    select distinct id,
    first_value(origin) over(partition by id order by r ) as start,
    LAST_VALUE(dest) over(partition by id order by r
    range between unbounded preceding and unbounded following) as endp
    from cte

  • @saheedahmed1474
    @saheedahmed1474 10 месяцев назад +2

    Below is my query, which works for any number of flights:
    with cte1 as (
    select *, row_number() over(partition by cid order by fid asc) as rn from flights),
    cte2 as(
    select cid,min(rn) as mi, max(rn) as mx from cte1 GROUP by cid)
    select ori.cid,ori.origin, des.destination from
    (
    select c1.* from cte1 c1
    inner join cte2 c2 on c1.cid=c2.cid and c1.rn=c2.mi) ori
    join (
    select c1.* from cte1 c1
    inner join cte2 c2 on c1.cid=c2.cid and c1.rn=c2.mx) des
    on ori.cid=des.cid;

  • @jhonsen9842
    @jhonsen9842 9 месяцев назад

    Great i am just curious to know how much time usually they give to solve it. More you pracitce faster you will solve but depends how good you are. Secondly what is the use of aggregate and max in the final result if i am grouping by cust_id and comparing when start == rn = 1 as origin and end == rn == 3 as destination in both origin and dest col.

  • @tanushreenagar3116
    @tanushreenagar3116 Год назад

    Best explanation 👌

  • @sudiptabhandari9691
    @sudiptabhandari9691 2 месяца назад

    It can be solve by lead, lag function...

  • @hariprasad3820
    @hariprasad3820 7 месяцев назад

    I'm little new to these type of questions, I have used another method to solve the same , can you tell me why this approach is not suited ?
    Select a.cust_id, a.origin, b.destination from (Select o.cust_id, o.origin from travelling_details o
    where origin not In ( Select t.destination from travelling_details t
    where t.cust_id = o.cust_id)) a join
    (Select o.cust_id, o.destination from travelling_details o
    where o.destination not In ( Select t.origin from travelling_details t
    where t.cust_id = o.cust_id)) b
    on a.cust_id = b.cust_id;

  • @mohammeddilshad1945
    @mohammeddilshad1945 14 дней назад

    with cte as (
    select *, row_number() over(partition by cust_id order by flight_id) as rn
    from flight order by flight_id asc
    )
    select cust_id,
    first(origin) as start,
    last(destination) as end
    from cte
    group by cust_id

  • @ashish.barmer1
    @ashish.barmer1 Год назад

    one doubt: how we ensure flight id describe the order of connectivity

    • @DEwithDhairy
      @DEwithDhairy  Год назад

      That's the assumption in this problem.

  • @AbhishekKumar_02
    @AbhishekKumar_02 Год назад +1

    Your solution will not work if flight10, flight11...flight15 will be there

    • @jegadheesj2688
      @jegadheesj2688 17 дней назад

      We can use isdigit(flightid)..this will extract only the numbers

  • @hhhj-iu4ss
    @hhhj-iu4ss 9 месяцев назад

    Bro.. Wrong approach

  • @Maheshwaripremierleague
    @Maheshwaripremierleague 8 месяцев назад +2

    WITH RankedFlights AS (
    SELECT
    customer_id,
    origin,
    destination,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY flight) AS row_num,
    MAX(ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY flight)) AS max_row_num
    FROM
    flights_data
    )
    SELECT
    customer_id,
    MAX(CASE WHEN row_num = 1 THEN origin END) AS origin,
    MAX(CASE WHEN row_num = max_row_num THEN destination END) AS destination
    FROM
    RankedFlights
    GROUP BY
    customer_id;