- Videos: 157
- Views: 44,410
CognitiveCoders
India
Joined 4 Jul 2021
Hello Coders,
This channel provides knowledge of programming languages, SQL, Azure Data Engineering, and other Azure services.
Please follow me on LinkedIn for more Information.
1 Subscriber, 1👍🏻, 1Comment = 100 Motivation 🙏🏼
🙏🏻Please Subscribe 🙏🏼
How to add new column with source file name using ADF | Azure Data Factory | Real Time Scenario
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of adding a new column with the source file name using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions.
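In ADF itself this is typically done with the "Additional columns" option on the Copy activity (e.g. the $$FILEPATH reserved value). As a minimal pure-Python sketch of the same idea — file names and rows below are hypothetical:

```python
def tag_rows_with_source(files):
    """Given a mapping of file name -> parsed rows, return all rows with an
    added 'source_file' column recording which file each row came from."""
    tagged = []
    for name, rows in files.items():
        for row in rows:
            tagged.append({**row, "source_file": name})
    return tagged

# Hypothetical sample: two small CSV-like files already parsed into dicts.
combined = tag_rows_with_source({
    "sales_jan.csv": [{"id": 1, "amount": 100}],
    "sales_feb.csv": [{"id": 2, "amount": 250}],
})
```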
Why Use Azure Data Factory?
Azure Data Factory (ADF) is a powerful cloud-based data integration service that allows you to create data-driven workflows for orchestrating and automating data movement and data transformation. It's a key component for any data engineer working with big data, ETL processes, and data lakes in the Azure environment.
🚀 Key Topics Covered:
Real...
Views: 67
Videos
Top Data Engineering Interview Question | KANTAR Group | Pyspark Interview Question
124 views, 1 day ago
If you like this video please do like, share and subscribe to my channel.
data = [(1, "Alice", 1500, "2023-05-15"), (2, "Bob", 500, "2023-02-20"),
        (2, "Bob", 700, "2023-04-22"), (3, "Charlie", 1200, "2022-12-10"),
        (4, "Donald", 1200, "2024-12-10"), (5, "Tom", 800, "2023-12-08")]
schema = ['id', 'name', 'purchase_amount', 'purchase_date']
PySpark playlist : ruclips.net/p/PL7DrGo85HcssOo4q5ihH3PqRRXw...
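The description doesn't state the actual interview question, but with the sample data above a typical exercise is totalling purchase_amount per customer. A plain-Python sketch of that aggregation (the task itself is an assumption):

```python
from collections import defaultdict

# Sample data from the video description: (id, name, purchase_amount, purchase_date)
data = [(1, "Alice", 1500, "2023-05-15"), (2, "Bob", 500, "2023-02-20"),
        (2, "Bob", 700, "2023-04-22"), (3, "Charlie", 1200, "2022-12-10"),
        (4, "Donald", 1200, "2024-12-10"), (5, "Tom", 800, "2023-12-08")]

totals = defaultdict(int)
for _id, name, amount, _date in data:
    totals[name] += amount  # e.g. Bob: 500 + 700 = 1200
```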
How to get count of files in a folder using ADF | Azure Data Factory | Real Time Scenario
83 views, 14 days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of getting the file count of a directory using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful clou...
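In ADF this is usually a Get Metadata activity with the childItems field, followed by length() over the returned array. The equivalent check in plain Python — the folder contents below are created only for illustration:

```python
from pathlib import Path

def count_files(folder):
    """Count files directly inside `folder`, ignoring sub-folders,
    mirroring length() over Get Metadata's childItems filtered to files."""
    return sum(1 for p in Path(folder).iterdir() if p.is_file())
```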
Slowly Changing Dimension (SCD) Type 1 using Data Flow in ADF | Azure Data Factory | Real Time Scenario
100 views, 21 days ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of implementing Slowly Changing Dimension (SCD) Type 1 in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a p...
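SCD Type 1 simply overwrites the dimension row in place, keeping no history. A minimal dictionary-based sketch of that merge logic (column names are hypothetical):

```python
def scd_type1_upsert(dim_rows, updates, key="id"):
    """SCD Type 1: update matching keys in place, insert new keys; no history kept."""
    by_key = {row[key]: dict(row) for row in dim_rows}
    for row in updates:
        by_key[row[key]] = dict(row)  # overwrite if present, insert otherwise
    return list(by_key.values())
```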
Top Data Engineering Interview Questions From Impetus | Pyspark | SQL | Interview Question
220 views, 1 month ago
If you like this video please do like, share and subscribe to my channel.
data = [(1,'Indigo',7000,'2023-01-01'), (1,'Indigo',7500,'2023-01-05'), (1,'Indigo',7100,'2023-02-10'),
        (1,'Indigo',8200,'2023-02-15'), (1,'Indigo',8500,'2023-03-04'), (2,'Vistara',8000,'2023-01-02'),
        (2,'Vistara',8500,'2023-01-06'), (2,'Vistara',8200,'2023-02-12'), (2,'Vistara',9200,'2023-02-18'),
        (2,'Vistara',9500,'2023-03-...
How to create running total using Data Flow in ADF | Azure Data Factory | Real Time Scenario
86 views, 1 month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of creating a running quantity total using a window transformation in a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is...
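A running total is a window aggregate: partition by the grouping column, order within each partition, and accumulate. The same logic in plain Python — sample rows below are hypothetical:

```python
from collections import defaultdict

# (product, sale_date, quantity) - hypothetical sample rows
rows = [("pen", "2023-01-01", 10), ("pen", "2023-01-02", 5), ("book", "2023-01-01", 7)]
rows.sort(key=lambda r: (r[0], r[1]))  # partition key first, then order-by key

running, seen = [], defaultdict(int)
for product, day, qty in rows:
    seen[product] += qty               # accumulate within the partition
    running.append((product, day, seen[product]))
```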
Top 4 Data Engineering Interview Questions | Accenture | Pyspark | Technical Round Question
202 views, 1 month ago
data = [(1,1,'2023-01-01','Coverage_A',5000), (2,2,'2023-01-05','Coverage_B',7000),
        (1,3,'2023-02-10','Coverage_A',3000), (3,4,'2023-02-15','Coverage_C',4500),
        (2,5,'2023-03-03','Coverage_B',6000), (1,6,'2023-03-20','Coverage_A',8000),
        (1,7,'2023-04-02','Coverage_A',5500), (3,8,'2023-04-10','Coverage_C',7000),
        (2,9,'2023-05-05','Coverage_B',3500), (1,10,'2023-05-15','Coverage_A',9000), (3,11,'2...
How to create incremental key using Data Flow in ADF | Azure Data Factory | Real Time Scenario
99 views, 1 month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of generating a surrogate key using a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful cl...
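In a data flow this is the Surrogate Key transformation, which numbers rows from a starting value. The core idea in plain Python (sample rows are hypothetical):

```python
rows = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Charlie"}]  # hypothetical input
# Attach an incremental key: sk = 1, 2, 3, ... in row order.
keyed = [{**row, "sk": i} for i, row in enumerate(rows, start=1)]
```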
How to remove duplicate rows using dataflow in ADF | Azure Data Factory | Real Time Scenario
85 views, 1 month ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of removing duplicate data using a data flow in Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful clou...
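Deduplication keeps one representative per distinct row (in a data flow this is often done with an Aggregate grouped on all columns). In plain Python, order-preserving dedupe is a one-liner:

```python
rows = [(1, "A"), (2, "B"), (1, "A"), (3, "C")]  # hypothetical input with a repeated row
deduped = list(dict.fromkeys(rows))              # keeps the first occurrence of each row
```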
Latest Data Engineering Interview Question from PWC | BigData | SQL | Azure Data Engineer
204 views, 1 month ago
If you like this video please do like, share and subscribe to my channel. PySpark playlist : ruclips.net/p/PL7DrGo85HcssOo4q5ihH3PqRRXwupRe65 PySpark RealTime Scenarios playlist : ruclips.net/p/PL7DrGo85HcstBR0D4881RTIzqpwyae1Tl Azure Datafactory playlist : ruclips.net/p/PL7DrGo85HcsueO7qbe3-W9kGa00ifeQMn Azure Data Factory RealTime Scenarios playlist : ruclips.net/p/PL7DrGo85HcsulFTAXy2cgcS6bWRlIs...
How to process fixed length text file using ADF Data Flow | Azure Data Factory | Real Time Scenario
54 views, 2 months ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of handling fixed length text files using an ADF data flow. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powerful cloud-based da...
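A fixed-length file has no delimiter; each column occupies a fixed character range, so parsing is just slicing. A sketch with a hypothetical three-column layout:

```python
# (column, start, end) character positions - hypothetical widths for illustration
LAYOUT = [("id", 0, 4), ("name", 4, 14), ("amount", 14, 20)]

def parse_fixed(line):
    """Slice one fixed-width record into named, trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}
```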
How to copy last n days data incrementally from ADLS Gen2 | Azure Data Factory | Real Time Scenario
143 views, 2 months ago
Welcome to our comprehensive Azure Data Factory RealTime scenarios, where we'll take you through the process of getting the last n days of data from a source ADLS Gen2 account using Azure Data Factory. Whether you're a beginner or looking to expand your Azure skills, this video is designed to help you master ADF with practical, step-by-step instructions. Why Use Azure Data Factory? Azure Data Factory (ADF) is a powe...
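The incremental part boils down to comparing each file's modified date against a cutoff of today minus n days. The filter logic in plain Python — the file list below is hypothetical:

```python
from datetime import timedelta

def last_n_days(files, n, today):
    """files: list of (name, modified_date); keep files modified in the last n days."""
    cutoff = today - timedelta(days=n)
    return [name for name, modified in files if modified >= cutoff]
```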
Delta Lake : Slowly Changing Dimension (SCD Type2) | Pyspark RealTime Scenario | Data Engineering
248 views, 2 months ago
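Unlike Type 1, SCD Type 2 keeps history: the current row for a changed key is expired (end-dated) and a new current row is appended. A minimal dictionary-based sketch of that logic; the flag and date column names are hypothetical:

```python
def scd2_upsert(dim_rows, updates, key, today):
    """SCD Type 2: expire the current row for each updated key, append a new
    current row, and keep the full history."""
    out = [dict(r) for r in dim_rows]          # copy so history rows aren't shared
    for upd in updates:
        for row in out:
            if row[key] == upd[key] and row["is_current"]:
                row["is_current"] = False      # close out the old version
                row["end_date"] = today
        out.append({**upd, "is_current": True, "start_date": today, "end_date": None})
    return out
```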
How to copy latest or last modified file from ADLS Gen2| Azure Data Factory | Real Time Scenario
172 views, 2 months ago
Latest Tiger Analytics coding Interview Questions & Answers | Data Engineer Prep 2024
2.6K views, 2 months ago
How to get source file name dynamically in ADF | Azure Data Factory | Real Time Scenario
115 views, 2 months ago
Top LTIMindtree SQL Interview Questions | Data Engineering Career Guide 2024 | Data Engineering
2.7K views, 3 months ago
How to upsert data into delta table using PySpark | Pyspark RealTime Scenario | Data Engineering
176 views, 3 months ago
Write a pyspark code to get the given output | Data Engineering Interview Question | DealShare
162 views, 3 months ago
11. Write spark code to find the employee count under each manager | Pyspark | SQL Solution
507 views, 3 months ago
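Counting employees under each manager is a group-by count on the manager column; in plain Python it is a Counter over the manager ids (pairs below are hypothetical):

```python
from collections import Counter

# (employee, manager) pairs - hypothetical sample
employees = [("e1", "m1"), ("e2", "m1"), ("e3", "m2"), ("e4", "m1")]
per_manager = Counter(mgr for _emp, mgr in employees)
```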
10. Find the employees with their primary department | Pyspark | SQL Solution
135 views, 3 months ago
Find the top three high earner employees from each department | Data Engineer Interview | Michaels
165 views, 4 months ago
Find the popularity percentage for each user on Meta | Data Engineering Interview Question | Meta
140 views, 4 months ago
Find the house that has won max no of battles for each region | Data Engineering Interview | fractal
212 views, 4 months ago
How to parameterize Linked Services in ADF | Azure Data Factory Tutorial for Beginners
81 views, 4 months ago
Calculate the percentage difference of total sales Q1 & Q2 | Data Engineering Interview | Prologis
120 views, 4 months ago
Integration Runtime in ADF | Azure Data Factory Tutorial For Beginners
57 views, 4 months ago
BandhanBank SQL Interview Questions and Answers | Data Engineering | SQL Interview Question
440 views, 5 months ago
How to Install Microsoft SQL Server & SSMS 20.1 on Windows | Complete guide
80 views, 5 months ago
Write a query to find out third highest salary | SQL Interview Question | HCLTech
1.3K views, 5 months ago
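The classic SQL answer ranks distinct salaries with DENSE_RANK (or uses OFFSET/FETCH, as one of the comments below shows); the same idea in plain Python:

```python
salaries = [100, 200, 300, 300, 400]                    # hypothetical salary column
# Distinct values sorted descending, like DENSE_RANK; index 2 = third highest.
third_highest = sorted(set(salaries), reverse=True)[2]
```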
Can you please make a similar video on using the pytest framework for testing Databricks notebooks?
Will do that
Nice content brother, useful for all the aspiring Data Engineers.
It means a lot to us. Please stay with us.
Hello... I am looking for Data Engineer roles. Most openings are in Accenture, Mindtree, Deloitte... but how do I find these kinds of product-based openings?
very disturbed explanation.. seems due to linguistic oddities
1. window_spec = Window.partitionBy("product").orderBy("sale_date")
   wdf = df.withColumn("2nd day pre sales", lead("amount", 2).over(window_spec))\
           .withColumn("3rd day pre sales", lag("amount", 2).over(window_spec))
Thanks for sharing🎉
Can you please create a video on DLT streaming tables. I'm facing issues while using SCD1. My bronze notebook is separate and my Silver notebook is separate. I'm facing issues while calling the bronze table as a stream and loading it into silver.
We'll create and upload it. Stay tuned with us.
thanks for sharing. keep up the good work!!
query = spark.sql("""
with cte as (
    select dept_id, emp_name, salary,
           row_number() over (partition by dept_id order by salary desc, emp_name) as rn,
           count(dept_id) over (partition by dept_id order by dept_id) as dept_count
    from emp
)
select dept_id,
       max(case when rn = 1 then emp_name else Null end) as max_salary,
       min(case when rn = dept_count then emp_name else Null end) as min_salary
from cte
group by dept_id""")
product_data = [(1, 'Laptop', 'Electronics'), (2, 'Jeans', 'Clothing'), (3, 'Chairs', 'Home Appliances')]
product_schema = ['product_id', 'product_name', 'category']
product_df = spark.createDataFrame(product_data, product_schema)
product_df.show()
sales_data = [(1, 2019, 1000.00), (1, 2020, 1200.00), (1, 2021, 1100.00),
              (2, 2019, 500.00), (2, 2020, 600.00), (2, 2021, 900.00),
              (3, 2019, 300.00), (3, 2020, 450.00), (3, 2021, 400.00)]
sales_schema = ['product_id', 'year', 'total_sales_revenue']
sales_df = spark.createDataFrame(sales_data, sales_schema)
sales_df = sales_df.withColumn("year", to_date("year", 'yyyy').cast(DateType()))
sales_df.show()
This question is for how many years of experience candidates ?
Can we Write In CTE2 AS ( SELECT DISTINCT company from cte1 where rnk =1) ?
Why your audio is echoing?
We've resolved the issue. From next video onward you'll not face the issue.
match_df1 = match_df.withColumn('team', expr("concat((team_A),',',team_B)"))
match_df1 = match_df1.drop('team_A', 'team_B')
match_df1.show()
match_df1 = match_df1.withColumn('team', split(col('team'), ','))
match_df1 = match_df1.withColumn('team', explode(col('team')))
match_df1 = match_df1.select('team', 'win')
match_df1.show()
match_df2 = match_df1.groupBy('team').agg(count('*').alias('played'))
match_df3 = match_df1.groupBy('win').agg((count('*') / 2).cast('int').alias('total_win'))
final_df = match_df2.join(match_df3, col('team') == col('win'), 'left').orderBy(col('total_win').desc())
final_df = final_df.select('team', 'played', 'total_win', coalesce(col('total_win'), lit(0)).alias('total_wins'))
final_df = final_df.drop('total_win')
final_df.show()
This is my alternative approach and it's working well.
For duplicates:
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY dept, name, salary ORDER BY salary) AS rnk
    FROM emp
)
SELECT * FROM CTE WHERE rnk = 1
This will get only the records without duplicates.
Here is my approach, this also works.
from pyspark.sql.functions import *
from pyspark.sql.window import Window

win = Window.partitionBy('dept_id').orderBy(col('salary').desc())
df1 = df.withColumn('highest_salary', dense_rank().over(win))
df1.show()
df2 = df1.groupBy('dept_id').agg(
    min(when(col('highest_salary') == 2, col('emp_name'))).alias('min_salaried_emp'),
    max(when(col('highest_salary') == 1, col('emp_name'))).alias('max_salaried_emp'))
df2.display()
I think we should use row_number, not dense_rank. Please clarify.
Can't we use any other command for populating the character * instead of repeat?
This was a straightforward question. From a glimpse of the dataset, I could see that we have to create an array and explode it to get the room types.
Small suggestion: your voice is echoing, so it's a little hard to hear you properly. Appreciate you for creating these kinds of videos.
Thanks for the feedback. We'll try to improve the sound quality
select distinct * from employee order by salary desc offset 2 rows fetch next 1 row only;
with cte as (
    select * from employee e1
    where not exists (select 1 from employee e2 where e1.emp_id = e2.manager_id)
),
cte2 as (
    select department, emp_id,
           dense_rank() over (partition by department order by salary desc) as dnk
    from cte
)
select department, emp_id, dnk from cte2 where dnk = 1;
Brother, be original. You're trying to talk in a different accent.
df = spark.read.csv("/content/jobs.csv", header=True)
df.show()
result = df.groupBy('job').agg(count('name').alias('total_count'))
result.show()
rows = result.rdd.collect()
result_dict = dict((row['job'], row['total_count']) for row in rows)
print(result_dict)
customer_df = spark.createDataFrame(customer_data, customer_schema)
order_df = spark.createDataFrame(order_data, order_schema)
customer_df.show()
order_df.show()
group_df = customer_df.join(order_df, 'customer_id', 'left_anti')
group_df.show()
Haven't you already filtered out nulls in the second-to-last step? Coalesce was unnecessary. Also, there was a spelling mistake: cdf should be cDf.
Please share your solution for all the community members
To solve Question 1 use the following query:
select name, dept, salary
from emp e
where salary = (select max(salary) from emp where dept = e.dept)
order by dept
Nice session. Thanks a lot.
Thanks 😌
Can you share your resume please? I've been applying on Naukri for a month and haven't received a single call or reply. Please guide me.