Azure Cloud Data Engineer Mock Interview | Important Questions Asked in Big Data Interviews | PySpark

Comments • 7

  • @hdr-tech4350
    6 months ago +7

    Source type, project discussion
    Handling duplicates
    Delta Lake features
    Spark vs Databricks (dbx)
    Connecting Power BI to Synapse
    Spark architecture
    DAG
    Client mode vs cluster mode
    DataFrame vs Dataset
    Normalisation
    2nd highest salary in each department
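Two of the listed questions, handling duplicates and the 2nd highest salary in each department, are usually answered with window functions. A minimal sketch, runnable with Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer); the `employees` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (emp_id, name, dept, salary); emp_id 3 appears twice.
ROWS = [
    (1, "Asha", "Data", 120),
    (2, "Ravi", "Data", 110),
    (3, "Meera", "Data", 110),
    (3, "Meera", "Data", 110),
    (4, "John", "Data", 90),
    (5, "Li", "BI", 100),
    (6, "Sara", "BI", 80),
]

def run(rows):
    """Load rows into an in-memory table, dedupe them, then find the
    2nd highest salary in each department."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (emp_id INTEGER, name TEXT, dept TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

        # Handling duplicates: number the rows within each emp_id and keep only the first.
        conn.execute("""
            CREATE TABLE emp_clean AS
            SELECT emp_id, name, dept, salary FROM (
                SELECT *, ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY salary DESC) AS rn
                FROM employees
            )
            WHERE rn = 1
        """)
        deduped = conn.execute("SELECT * FROM emp_clean ORDER BY emp_id").fetchall()

        # 2nd highest salary per department: DENSE_RANK gives tied salaries the same rank.
        second = conn.execute("""
            SELECT dept, name, salary FROM (
                SELECT *, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
                FROM emp_clean
            )
            WHERE rnk = 2
            ORDER BY dept, name
        """).fetchall()
    finally:
        conn.close()
    return deduped, second

if __name__ == "__main__":
    deduped, second = run(ROWS)
    print(deduped)
    print(second)
```

Using DENSE_RANK rather than ROW_NUMBER means employees tied at the second-highest salary (Meera and Ravi here) are all returned. In PySpark the same ideas are typically written with `df.dropDuplicates(["emp_id"])` and `F.dense_rank().over(Window.partitionBy("dept").orderBy(F.desc("salary")))`.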

  • @gudiatoka
    7 months ago +7

    When someone says they are optimizing code in Databricks, they're all faking it 😂😂.
    Spark itself optimizes your code using the Catalyst optimizer / Spark SQL engine, and since Adaptive Query Execution (AQE) was introduced in Spark 3.0, it also optimizes joins at runtime. We can also alter the broadcast threshold, which the admin team handles during Databricks cluster creation.
    The only things not covered by these two are what lives in user memory, like UDFs, and low-level RDD operations, which nowadays no one does in Databricks. The last one is manual caching.
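To make the broadcast-threshold point concrete: when one side of a join is estimated to be below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default), Spark can send a full copy of it to every executor and hash-join locally, so the large side is never shuffled. The pure-Python sketch below only models that idea; the function names, the size estimate, and the sample data are hypothetical and are not Spark APIs. In Spark the threshold is a SQL configuration that can be set in the cluster's Spark config or per session with `spark.conf.set`, and `pyspark.sql.functions.broadcast(df)` gives an explicit hint.

```python
import sys

# Spark's default for spark.sql.autoBroadcastJoinThreshold is 10 MB.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024

# Hypothetical sample data: a "large" fact table and a small dimension table.
orders = [(1, "c1", 50), (2, "c2", 20), (3, "c1", 70), (4, "c9", 5)]  # (order_id, cust_id, amount)
customers = [("c1", "Asha"), ("c2", "Ravi")]                          # (cust_id, name)

def estimated_size(rows):
    """Crude stand-in for Spark's size statistics: sum of sys.getsizeof per field."""
    return sum(sys.getsizeof(v) for row in rows for v in row)

def choose_strategy(small_rows, threshold=BROADCAST_THRESHOLD_BYTES):
    """Broadcast when the small side is at or below the threshold; otherwise
    fall back to a shuffle-based join (sort-merge is Spark's usual default)."""
    return "broadcast_hash_join" if estimated_size(small_rows) <= threshold else "sort_merge_join"

def broadcast_hash_join(large_rows, small_rows, large_key, small_key):
    """Inner join: build a hash table from the small side, probe it with the large side."""
    table = {}
    for row in small_rows:  # build phase: this is the copy each executor would receive
        table.setdefault(row[small_key], []).append(row)
    joined = []
    for row in large_rows:  # probe phase: the large side is streamed, not shuffled
        for match in table.get(row[large_key], []):
            joined.append(row + match)
    return joined

if __name__ == "__main__":
    print(choose_strategy(customers))
    print(broadcast_hash_join(orders, customers, large_key=1, small_key=0))
```

Order 4 has no matching customer, so the inner join drops it.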

    • @LearnifyTvKannada-ue6op
      6 months ago

      @SrihariSrinivasDhanakshirur Exactly, there are a lot of other optimisations.

    • @NoobForReason
      4 months ago

      @SrihariSrinivasDhanakshirur But these are code-level optimisations, like the bucketing and partitioning you mentioned.

    • @gauravgaikwad2939
      3 months ago +2

      Yes, you are 100% correct. But there are still some optimizations we can perform, like:
      - Using predicate and projection pushdown to cut down the data read.
      - Caching or persisting data if it's reused frequently.
      - Choosing built-in functions over UDFs, since they're more optimized.
      - Picking efficient file formats like Parquet for better compression and speed.
      Please correct me if needed.
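Predicate and projection pushdown pay off because columnar formats like Parquet store data in row groups with per-column min/max statistics, so a reader can skip whole row groups and read only the requested columns. The toy model below is a hypothetical pure-Python illustration of that idea, not the Parquet or Spark API. In PySpark you get the effect by selecting only the needed columns and filtering early, e.g. `spark.read.parquet(path).select("dept", "salary").filter("salary > 100")`, and the pushed filters appear under `PushedFilters` in `df.explain()`.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    columns: dict  # column name -> list of values (columnar layout)
    stats: dict    # column name -> (min, max), like Parquet footer statistics

def make_row_group(rows, names):
    cols = {n: [r[i] for r in rows] for i, n in enumerate(names)}
    stats = {n: (min(v), max(v)) for n, v in cols.items()}
    return RowGroup(cols, stats)

def scan(row_groups, select, filter_col, min_exclusive):
    """Return `select` columns for rows where filter_col > min_exclusive,
    plus how many row groups were actually read."""
    out, groups_read = [], 0
    for rg in row_groups:
        _, hi = rg.stats[filter_col]
        if hi <= min_exclusive:
            continue  # predicate pushdown: no row in this group can match, skip it
        groups_read += 1
        needed = [rg.columns[c] for c in select]  # projection: touch only these columns
        for i, v in enumerate(rg.columns[filter_col]):
            if v > min_exclusive:
                out.append(tuple(col[i] for col in needed))
    return out, groups_read

# Hypothetical data split into three row groups of (emp_id, dept, salary).
NAMES = ("emp_id", "dept", "salary")
GROUPS = [
    make_row_group([(1, "Data", 60), (2, "BI", 80)], NAMES),
    make_row_group([(3, "Data", 120), (4, "BI", 95)], NAMES),
    make_row_group([(5, "Data", 150), (6, "BI", 101)], NAMES),
]

if __name__ == "__main__":
    print(scan(GROUPS, ["dept", "salary"], "salary", 100))
```

The first row group's maximum salary is 80, so it is skipped without reading any of its values.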

    • @maninderbhambra3796
      1 month ago

      Correct, brother, and the moment you select Photon acceleration during cluster creation, Databricks takes care of it by itself.

  • @rainbowhappy6968
    2 months ago

    Seems like he memorised everything by rote before coming, bro :)... Anyway, he answered all the questions very well.