CognitiveCoders
  • Videos: 157
  • Views: 44,410
How to add new column with source file name using ADF | Azure Data Factory | Real Time Scenario
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of adding a new column with the source file name using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions.
Why Use Azure Data Factory?
Azure Data Factory (ADF) is a powerful cloud-based data integration service that allows you to create data-driven workflows for orchestrating and automating data movement and data transformation. It's a key component for any data engineer working with big data, ETL processes, and data lakes in the Azure environment.
🚀 Key Topics Covered:
Real...
Views: 67
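In ADF itself this is typically done with the Copy activity's "Additional columns" option (its reserved $$FILEPATH value) or a file-name column in a data flow. As a rough plain-Python sketch of the same idea (the file name, CSV content, and helper name below are made up for illustration), each row is tagged with the file it came from:

```python
import csv
import io
import os

def rows_with_source(path, text):
    """Read CSV text and append the source file name to every row,
    mirroring what an extra 'file_name' column in a copy step gives you."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader) + ["file_name"]
    return [header] + [row + [os.path.basename(path)] for row in reader]

rows = rows_with_source("sales_2024.csv", "id,amount\n1,100\n2,250\n")
```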

Videos

Top Data Engineering Interview Question | KANTAR Group | Pyspark Interview Question
124 views · 1 day ago
If you like this video, please like, share, and subscribe to my channel.
data = [(1, "Alice", 1500, "2023-05-15"), (2, "Bob", 500, "2023-02-20"), (2, "Bob", 700, "2023-04-22"),
        (3, "Charlie", 1200, "2022-12-10"), (4, "Donald", 1200, "2024-12-10"), (5, "Tom", 800, "2023-12-08")]
schema = ['id', 'name', 'purchase_amount', 'purchase_date']
PySpark playlist : ruclips.net/p/PL7DrGo85HcssOo4q5ihH3PqRRXw...
How to get count of files in a folder using ADF | Azure Data Factory | Real Time Scenario
83 views · 14 days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of getting the file count of a directory using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful clou...
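For reference, the usual ADF pattern is a Get Metadata activity returning the childItems field, with a length() expression over its output. The counting logic itself, sketched in plain Python over a temporary folder (the file names here are illustrative):

```python
import os
import tempfile

def file_count(folder):
    """Count regular files in a folder, skipping subdirectories."""
    return sum(1 for name in os.listdir(folder)
               if os.path.isfile(os.path.join(folder, name)))

with tempfile.TemporaryDirectory() as d:
    # create three empty sample files
    for name in ("a.csv", "b.csv", "c.csv"):
        open(os.path.join(d, name), "w").close()
    n = file_count(d)
```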
Slowly Changing Dimension (SCD) Type 1 using Data Flow in ADF | Azure Data Factory | Real Time Scenario
100 views · 21 days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of implementing Slowly Changing Dimension (SCD) Type 1 in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a p...
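Conceptually, SCD Type 1 overwrites matched dimension rows in place and inserts unmatched ones, keeping no history. A minimal plain-Python sketch of that merge (the sample rows and the scd_type1_upsert helper are made up for illustration):

```python
def scd_type1_upsert(dimension, incoming, key="id"):
    """SCD Type 1: update matched rows, insert new ones, keep no history."""
    by_key = {row[key]: dict(row) for row in dimension}
    for row in incoming:
        by_key[row[key]] = dict(row)  # update if present, insert if not
    return sorted(by_key.values(), key=lambda r: r[key])

dim = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]
new = [{"id": 2, "city": "Mumbai"}, {"id": 3, "city": "Chennai"}]
merged = scd_type1_upsert(dim, new)
```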
Top Data Engineering Interview Questions From Impetus | Pyspark | SQL | Interview Question
220 views · 1 month ago
If you like this video, please like, share, and subscribe to my channel.
data = [(1, 'Indigo', 7000, '2023-01-01'), (1, 'Indigo', 7500, '2023-01-05'), (1, 'Indigo', 7100, '2023-02-10'),
        (1, 'Indigo', 8200, '2023-02-15'), (1, 'Indigo', 8500, '2023-03-04'), (2, 'Vistara', 8000, '2023-01-02'),
        (2, 'Vistara', 8500, '2023-01-06'), (2, 'Vistara', 8200, '2023-02-12'), (2, 'Vistara', 9200, '2023-02-18'),
        (2, 'Vistara', 9500, '2023-03-...
How to create running total using Data Flow in ADF | Azure Data Factory | Real Time Scenario
86 views · 1 month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of creating a running quantity total using the Window transformation in a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is...
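A running total is just a cumulative sum in row order; in a data flow it is typically a Window transformation summing over all preceding rows. The same arithmetic in plain Python (the quantities are made-up sample data):

```python
from itertools import accumulate

quantities = [10, 5, 8, 2]
# each element is the sum of itself and everything before it
running = list(accumulate(quantities))
```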
Top 4 Data Engineering Interview Questions | Accenture | Pyspark | Technical Round Question
202 views · 1 month ago
data = [(1, 1, '2023-01-01', 'Coverage_A', 5000), (2, 2, '2023-01-05', 'Coverage_B', 7000),
        (1, 3, '2023-02-10', 'Coverage_A', 3000), (3, 4, '2023-02-15', 'Coverage_C', 4500),
        (2, 5, '2023-03-03', 'Coverage_B', 6000), (1, 6, '2023-03-20', 'Coverage_A', 8000),
        (1, 7, '2023-04-02', 'Coverage_A', 5500), (3, 8, '2023-04-10', 'Coverage_C', 7000),
        (2, 9, '2023-05-05', 'Coverage_B', 3500), (1, 10, '2023-05-15', 'Coverage_A', 9000), (3, 11, '2...
How to create incremental key using Data Flow in ADF | Azure Data Factory | Real Time Scenario
99 views · 1 month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of generating a surrogate key using a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful cl...
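The data flow's Surrogate Key transformation assigns an incrementing integer starting from a chosen seed. The equivalent idea in plain Python (the sample rows and the seed value of 100 are illustrative):

```python
rows = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Cara"}]
start_at = 100  # e.g. the max key already in the target, plus one
# assign a sequential surrogate key to each row
keyed = [{"sk": start_at + i, **row} for i, row in enumerate(rows)]
```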
How to remove duplicate rows using dataflow in ADF | Azure Data Factory | Real Time Scenario
85 views · 1 month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of removing duplicate data using a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful clou...
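The underlying logic is "keep one row per key combination". A plain-Python sketch of that dedupe (sample data and the dedupe helper are made up for illustration):

```python
def dedupe(rows, keys):
    """Keep the first row seen for each combination of key columns."""
    seen, out = set(), []
    for row in rows:
        k = tuple(row[c] for c in keys)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out

data = [{"id": 1, "name": "A"}, {"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
unique = dedupe(data, ["id", "name"])
```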
Latest Data Engineering Interview Question from PWC | BigData | SQL | Azure Data Engineer
204 views · 1 month ago
If you like this video, please like, share, and subscribe to my channel.
PySpark playlist : ruclips.net/p/PL7DrGo85HcssOo4q5ihH3PqRRXwupRe65
PySpark RealTime Scenarios playlist : ruclips.net/p/PL7DrGo85HcstBR0D4881RTIzqpwyae1Tl
Azure Datafactory playlist : ruclips.net/p/PL7DrGo85HcsueO7qbe3-W9kGa00ifeQMn
Azure Data Factory RealTime Scenarios playlist : ruclips.net/p/PL7DrGo85HcsulFTAXy2cgcS6bWRlIs...
How to process fixed length text file using ADF DataFlow | Azure Data Factory | Real Time Scenario
54 views · 2 months ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of handling a fixed-length text file using an ADF data flow. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful cloud-based da...
How to copy last n days data incrementally from ADLS Gen2 | Azure Data Factory | Real Time Scenario
143 views · 2 months ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of getting the last n days of data from a source ADLS Gen2 account using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powe...
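The typical ADF approach filters the copy source by last-modified time, using a date expression for the window start. The filtering logic itself, sketched in plain Python over made-up file names and modification dates:

```python
from datetime import date, timedelta

# hypothetical files and their last-modified dates
files = {
    "sales_0701.csv": date(2024, 7, 1),
    "sales_0714.csv": date(2024, 7, 14),
    "sales_0715.csv": date(2024, 7, 15),
}
today = date(2024, 7, 15)
n = 2
cutoff = today - timedelta(days=n)
# keep only files modified within the last n days
recent = sorted(f for f, modified in files.items() if modified >= cutoff)
```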
Delta Lake : Slowly Changing Dimension (SCD Type2) | Pyspark RealTime Scenario | Data Engineering
248 views · 2 months ago
How to copy latest or last modified file from ADLS Gen2 | Azure Data Factory | Real Time Scenario
172 views · 2 months ago
Latest Tiger Analytics coding Interview Questions & Answers | Data Engineer Prep 2024
2.6K views · 2 months ago
How to get source file name dynamically in ADF | Azure Data Factory | Real Time Scenario
115 views · 2 months ago
Top LTIMindtree SQL Interview Questions | Data Engineering Career Guide 2024 | Data Engineering
2.7K views · 3 months ago
How to upsert data into delta table using PySpark | Pyspark RealTime Scenario | Data Engineering
176 views · 3 months ago
Write a pyspark code to get the given output | Data Engineering Interview Question | DealShare
162 views · 3 months ago
11. Write spark code to find the employee count under each manager | Pyspark | SQL Solution
507 views · 3 months ago
10. Find the employees with their primary department | Pyspark | SQL Solution
135 views · 3 months ago
Find the top three high earner employees from each department | Data Engineer Interview | Michaels
165 views · 4 months ago
Find the popularity percentage for each user on Meta | Data Engineering Interview Question | Meta
140 views · 4 months ago
Find the house that has won max no of battles for each region | Data Engineering Interview | fractal
212 views · 4 months ago
How to parameterize Linked Services in ADF | Azure Data Factory Tutorial for Beginners
81 views · 4 months ago
Calculate the percentage difference of total sales Q1 & Q2 | Data Engineering Interview | Prologis
120 views · 4 months ago
Integration Runtime in ADF | Azure Data Factory Tutorial For Beginners
57 views · 4 months ago
BandhanBank SQL Interview Questions and Answers | Data Engineering | SQL Interview Question
440 views · 5 months ago
How to Install Microsoft SQL Server & SSMS 20.1 on Windows | Complete guide
80 views · 5 months ago
Write a query to find out third highest salary | SQL Interview Question | HCLTech
1.3K views · 5 months ago
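The usual SQL answer ranks distinct salaries with dense_rank() and picks rank 3. The same logic in plain Python (the helper name and sample salaries are illustrative):

```python
def nth_highest_salary(salaries, n=3):
    """Nth highest distinct salary; the SQL analogue is
    dense_rank() over (order by salary desc) = n."""
    distinct = sorted(set(salaries), reverse=True)
    return distinct[n - 1] if len(distinct) >= n else None

third = nth_highest_salary([90, 80, 80, 70, 60])
```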

Comments

  • @shivkumaryadav1652
    @shivkumaryadav1652 7 hours ago

    Can you please make a similar video using the pytest framework for testing Databricks notebooks?

  • @YRaghul-C
    @YRaghul-C 7 days ago

    Nice content, brother; useful for all the aspiring Data Engineers.

    • @CognitiveCoders
      @CognitiveCoders 6 days ago

      It means a lot to us. Please stay with us.

  • @saikrishna-q6d
    @saikrishna-q6d 1 month ago

    Hello... I am looking for Data Engineer roles... most openings are in Accenture, Mindtree, Deloitte... but how do I find these kinds of product-based openings?

  • @shlokagarwalentertainment9747
    @shlokagarwalentertainment9747 1 month ago

    Very muddled explanation... seems due to linguistic oddities.

  • @CharanSaiAnnam
    @CharanSaiAnnam 1 month ago

    1.
    from pyspark.sql.window import Window
    from pyspark.sql.functions import lead, lag

    window_spec = Window.partitionBy("product").orderBy("sale_date")
    wdf = (df.withColumn("2nd day pre sales", lead("amount", 2).over(window_spec))
             .withColumn("3rd day pre sales", lag("amount", 2).over(window_spec)))

  • @AkhilShaik-p4k
    @AkhilShaik-p4k 1 month ago

    Thanks for sharing🎉

  • @BeingHanumanLife
    @BeingHanumanLife 1 month ago

    Can you please create a video on DLT streaming tables? I'm facing issues while using SCD1. My bronze notebook is separate and my silver notebook is separate. I'm facing issues while reading the bronze table as a stream and loading it into silver.

    • @CognitiveCoders
      @CognitiveCoders 29 days ago

      We'll create and upload it. Stay tuned with us.

  • @siddhantmishra6581
    @siddhantmishra6581 1 month ago

    Thanks for sharing. Keep up the good work!!

  • @rahuldave6699
    @rahuldave6699 1 month ago

    query = spark.sql("""with cte as (
        select dept_id, emp_name, salary,
               row_number() over (partition by dept_id order by salary desc, emp_name) as rn,
               count(dept_id) over (partition by dept_id) as dept_count
        from emp)
    select dept_id,
           max(case when rn = 1 then emp_name else Null end) as max_salary,
           min(case when rn = dept_count then emp_name else Null end) as min_salary
    from cte
    group by dept_id""")

  • @rahuldave6699
    @rahuldave6699 1 month ago

    from pyspark.sql.functions import to_date, col

    product_data = [(1, 'Laptop', 'Electronics'), (2, 'Jeans', 'Clothing'), (3, 'Chairs', 'Home Appliances')]
    product_schema = ['product_id', 'product_name', 'category']
    product_df = spark.createDataFrame(product_data, product_schema)
    product_df.show()
    sales_data = [(1, 2019, 1000.00), (1, 2020, 1200.00), (1, 2021, 1100.00),
                  (2, 2019, 500.00), (2, 2020, 600.00), (2, 2021, 900.00),
                  (3, 2019, 300.00), (3, 2020, 450.00), (3, 2021, 400.00)]
    sales_schema = ['product_id', 'year', 'total_sales_revenue']
    sales_df = spark.createDataFrame(sales_data, sales_schema)
    sales_df = sales_df.withColumn("year", to_date(col("year").cast("string"), 'yyyy'))
    sales_df.show()

  • @sibanandaroutray9721
    @sibanandaroutray9721 1 month ago

    This question is for candidates with how many years of experience?

  • @harshavardhansaimachineni587
    @harshavardhansaimachineni587 1 month ago

    Can we write: CTE2 AS (SELECT DISTINCT company FROM cte1 WHERE rnk = 1)?

  • @gauravgaikwad2939
    @gauravgaikwad2939 2 months ago

    Why is your audio echoing?

    • @CognitiveCoders
      @CognitiveCoders 2 months ago

      We've resolved the issue. From the next video onward you won't face it.

  • @prajju8114
    @prajju8114 2 months ago

    from pyspark.sql.functions import expr, split, col, explode, count, coalesce, lit

    match_df1 = match_df.withColumn('team', expr("concat((team_A), ',', team_B)"))
    match_df1 = match_df1.drop('team_A', 'team_B')
    match_df1.show()
    match_df1 = match_df1.withColumn('team', split(col('team'), ','))
    match_df1 = match_df1.withColumn('team', explode(col('team')))
    match_df1 = match_df1.select('team', 'win')
    match_df1.show()
    match_df2 = match_df1.groupBy('team').agg(count('*').alias('played'))
    match_df3 = match_df1.groupBy('win').agg((count('*') / 2).cast('int').alias('total_win'))
    final_df = match_df2.join(match_df3, col('team') == col('win'), 'left').orderBy(col('total_win').desc())
    final_df = final_df.select('team', 'played', 'total_win', coalesce(col('total_win'), lit(0)).alias('total_wins'))
    final_df = final_df.drop('total_win')
    final_df.show()

    This is my alternative approach, and it works well.

  • @siddu1036
    @siddu1036 2 months ago

    For duplicates:
    WITH CTE AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY dept, name, salary ORDER BY salary) AS rnk
        FROM emp)
    SELECT * FROM CTE WHERE rnk = 1
    This will return only the records without duplicates.

  • @prajju8114
    @prajju8114 2 months ago

    Here is my approach; this also works.
    from pyspark.sql.functions import *
    from pyspark.sql.window import Window

    win = Window.partitionBy('dept_id').orderBy(col('salary').desc())
    df1 = df.withColumn('highest_salary', dense_rank().over(win))
    df1.show()
    df2 = df1.groupBy('dept_id').agg(
        min(when(col('highest_salary') == 2, col('emp_name'))).alias('min_salaried_emp'),
        max(when(col('highest_salary') == 1, col('emp_name'))).alias('max_salaried_emp'))
    df2.display()

  • @prajju8114
    @prajju8114 2 months ago

    I think we should use row_number, not dense_rank; please clarify.

  • @prajju8114
    @prajju8114 2 months ago

    Can't we use any other command to populate the * character instead of repeat?

  • @prajju8114
    @prajju8114 2 months ago

    This was a straightforward question. From a glimpse of the dataset, I could tell we have to create an array and explode it to get the room types.

  • @june17you
    @june17you 2 months ago

    Small suggestion: your voice is echoing, so it's a little hard to hear you properly. I appreciate you creating these kinds of videos.

    • @CognitiveCoders
      @CognitiveCoders 2 months ago

      Thanks for the feedback. We'll try to improve the sound quality.

  • @prajwalreddy2882
    @prajwalreddy2882 2 months ago

    select distinct * from employee
    order by salary desc
    offset 2 rows fetch next 1 row only;

  • @prajwalreddy2882
    @prajwalreddy2882 2 months ago

    with cte as (
        select * from employee e1
        where not exists (select 1 from employee e2 where e1.emp_id = e2.manager_id)),
    cte2 as (
        select department, emp_id,
               dense_rank() over (partition by department order by salary desc) as dnk
        from cte)
    select department, emp_id, dnk from cte2 where dnk = 1;

  • @KaiwalyaSevekar
    @KaiwalyaSevekar 2 months ago

    Bhai, be original. Trying to talk in a different accent.

  • @vighneshbuddhivant8353
    @vighneshbuddhivant8353 2 months ago

    from pyspark.sql.functions import count

    df = spark.read.csv("/content/jobs.csv", header=True)
    df.show()
    result = df.groupBy('job').agg(count('name').alias('total_count'))
    result.show()
    rows = result.rdd.collect()
    result_dict = dict((row['job'], row['total_count']) for row in rows)
    print(result_dict)

  • @vighneshbuddhivant8353
    @vighneshbuddhivant8353 2 months ago

    customer_df = spark.createDataFrame(customer_data, customer_schema)
    order_df = spark.createDataFrame(order_data, order_schema)
    customer_df.show()
    order_df.show()
    group_df = customer_df.join(order_df, 'customer_id', 'left_anti')
    group_df.show()

  • @bulluhemanth9149
    @bulluhemanth9149 2 months ago

    Haven't you already filtered out the nulls in the second-to-last step? Coalesce was unnecessary. Also, 'cdf' was a spelling mistake; it should be cDf.

    • @CognitiveCoders
      @CognitiveCoders 2 months ago

      Please share your solution for all the community members.

  • @arsalanansari5066
    @arsalanansari5066 2 months ago

    To solve Question 1, use the following query:
    select name, dept, salary
    from emp e
    where salary = (select max(salary) from emp where dept = e.dept)
    order by dept

  • @vijaybandi5417
    @vijaybandi5417 3 months ago

    Nice session. Thanks a lot.

  • @ramswaroop1520
    @ramswaroop1520 3 months ago

    Can you share your resume please... I've been applying on Naukri for a month and haven't received even a single call or reply. Please guide me.