SQL Portfolio Project - Part 1 - Faaso's | Analytics | Ashutosh Kumar
HTML-код
- Опубликовано: 26 ноя 2022
- The best technique of learning a skill is completing a project which is based on real life scenario and when it comes to analytics , SQL is the one of the most asked skillset in all the interviews, so friends in this video i have created an end to end project on data analysis using SQL, thi will be relevant to all those people who want to make a career into data analytics, business analytics ,data science , in this video i have analysis on Faaso's dataset . You can find the code attached below -
DATASET CODE LINK - github.com/ashutoshkr103/Micr...
----------------------------------------------------------------------------------------------------------------------
Check out some more relevant content here
👉 SQL Portfolio Project Indian Census- Part 2
• SQL Portfolio Project ...
👉 SQL Portfolio Project - Shark Tank India
• SQL Portfolio Project ...
👉 How to Learn SQL
• How to learn sql for b...
👉 Complete playlist on Sql Interview questions and answers
• HackerRank SQL problem...
👉 Top free and best resources to learn analytics -
• Free and Best resource...
👉 How to become a data analyst complete roadmap-
• Data Analyst Complete ...
👉 Data analytics internships and job stipend-
• 70,000 + per month dat...
👉 Top 3 you tube channels to learn sql for free for beginners
• Video
👉 How to get data analytics internships-
• How to apply for data ...
👉Check out the roadmap to become a business analyst - • Business Analyst Compl...
👉All about analytics playlist-
• All about analytics
👉 Top 3 you tube channels to learn excel for free for beginners
• Top 3 you tube channel...
____________________________________________________________________
Fill the form below to subscribe yourself to the analytics jobs mailing list to receive regular job opening updates - docs.google.com/forms/d/e/1FA...
Why you should definitely fill the analytics job updates google form - • Job Openings into busi...
_______________________________________________________________________
Connect with me
📸Instagram - / ashutoszh
💻Linkedin- / ashutosh.analytics
_____________________________________________________________________
Comment down if you have any doubts
Please leave a LIKE 👍 and SUBSCRIBE ❤️ to my channel to receive more amazing content in data analytics and data science.
_____________________________________________________________________
🏷️ Tags
sql,
sql for analytics,
sql tutorials for beginners,
portfolio project sql,
sql portfolio project,
portfolio project on sql for resume,
sql interview portfolio project,
sql interview questions and answers,
sql for data analytics,
sql data science,
interview questions on sql,
sql hard questions,
sql problem solving,
ashutosh,
ashutosh kumar,
ashutosh kumar analytics
🏷️ HashTags
#sql #portfolio #project
wooooooow!!! should have discovered this earlier
Very helpfull ..keep uploading new projects on sql projects
Thanks a lot
Bohot sahi bhai keep it up great session
thanks a lot
can we use sub queries regularly like you did in video for interviews, is it good practice regarding performance?
For 3rd question
Select driver_id,count(distinct order_id) as successful_orders _delivered from driver_order where pickup_time not like "null" group by driver_id;
Answer should be 4,3,1 for driver 1,2,3 respectively
Hi ashutosh. Your sql advanced playlist really helped me for prepping for my interviews. Like out of many resources I found your explanation using excel to be amazing.
Just one thing, interviewers are rejecting me with a reason that I haven't worked with sql on scale/ on industry level data.
Can you tell me how can I build a sql project that involves real world data/ cloud architecture
Sure you can reach me on instagram
Keep uploading this...and just one advice if u can make a long video on medium and hard question especially on joins it will really be great.
Sure, btw did you watch this or not
ruclips.net/video/Bmj5ej62XjM/видео.html
@@AshutoshKumaryt sure bro i will watch this video but as I seen ur project videos itz very much interesting.. And if u can make full fledge videos espicially on joins questions it will really be great... And just how to approach complex queries how to break it down
pls do one project on Ecommerce
Great video.Please make more such videos if possible.Are you planning to start a series on Data Science?
Soon in near future
For 4th question
select *
from driver_order
where cancellation is NULL OR cancellation NOT IN ('Cancellation', 'Customer Cancellation')
As you have not taken the rows with 'null' values from cancellation column
sir i have one problem in you dataset after downloading a dataset from site i am import data in my sql so we have give error in data data function incorrect datatime value so what would we do because i am not change all date data accordig to my sql like example i am explain it example=you give me this date data 01-11-2020 according to your date data monthin first position,day is second position and year is third position in your dataset what if iam import this data so my sql gives error because my sql date data types is year is first position,month in second position and day in third position so, i have expect that you gave me a conclusion for this because sir i am not converted all date data of your datasets to convert type of mysql datetype dataa so,sir can you give me a conclusion for this .
at 21:05 u havent considered for null constraint so count is 3 less than actual count, coz 3 values are in null
Hi Ashutosh. Thank you very much for the project. Could you please provide the right code for q3, 19:35? It has 8 successful orders. But your total successful order count is 5 only.
SELECT A. driver_id, COUNT(ORDER_ID) SUCCESSFUL_ORDERS FROM
(SELECT *,CASE WHEN cancellation IN ('CANCELLATION','CUSTOMER CANCELLATION') THEN 'CANCEL' ELSE 'NOT CANCEL' END AS CANCELLATION_STATUS
FROM DRIVER_ORDER) A WHERE CANCELLATION_STATUS = 'NOT CANCEL'
GROUP BY driver_id
seems like there was a typo in resolving problem no4 ...the result which im getting is 8 since null and blank values are considered as successfull
select driver_id,sum(case when cancellation ='Cancellation' or cancellation='Customer Cancellation' then 0 else
1 end) as counting from [dbo].[driver_order] group by driver_id;
Hii ashutosh your videos are really helpful but can you make 1 video on how to present this project????
Thanks sure
This content is very helpful, Thaks
from where we could download this excel file or if you plz share it with us to practice.
The code can be found in the description box
Hi Ashutosh. Not able to find the excel sheet attached in the description. Not even in Github. So that I could practice it myself. Would be of great help..
Sir there is no excel sheet you can find the SQL code for the table in the description box of the video just copy the code paste in your SQL DBMS and you can start the project
I don't think we need to use a ranking window function in the last question, the reason why that is we are asked to find out the maximum number of rolls delivered and not which order had the maximum number of rolls ordered. We just need to find the maximum rolls, in which case a simple order by clause with a limit 1 clause is more than enough to retrieve it.
in the first question that's roll id not how many rolls have been ordered
alternative solution for problem 4 is :
with cte1
as(select *,case when cancellation in ('Cancellation','Customer Cancellation') then 'C' else 'NC' end as cancellation_status from driver_order),
cte2 as(select * from cte1 where cancellation_status = 'NC')
select c.order_id,c.roll_id,cancellation_status from customer_orders c
join cte2
on c.order_id=cte2.order_id
Ashutosh sir,
In first problem, should we not exclude the cancelled orders by joining the customer_order and driver_order table?
No , didi hindi meh samjata hu , the question is " Kitna order Company ko aaya" answer hai 14 , Ur point is valid when question is " kitna order deliver hua" then answer is 12 here u remove cancellation.
could you please tell how you imported this data in sql so that we can also run these query.
You can find the code in the description box of this video , just copy paste that's it
Bro plz make visualisation of this on tableau....a full project
Yes will be covering everything in phases
For 4 th question answer should be 9,,,,bcz on same order_id more than 1 orders are placed
I had a question. if I put this project in my resume and the interviewer asks from where I have got this data and the questions so what should I answer? Should I say that its a dummy dataset and the question are made by myself? Plz ANS!!!!!
Where's the dataset on your GitHub??
SQL is showcasing Null or Blank values when I used this Query , "select * from customer_orders c join driver_order d on c.order_id=d.order_id where d.cancellation not like '%cancel%'".
Any shortcut command to directly convert the date format from mm-dd-yyyy to yyyy-dd-mm in mysql?
Bro tell me also if u found it
Bro tell me also if u found it
ur content is much appreciated but mike sound is very noisy and loud
Thanks and sorry
Sir i learned data analyst from your channel sir but where to find the jobs sir pls make a video where to find jobs , how to find jobs tips and tricks sir 😭🙏🙏🙏❤️❤️❤️
Sure
HI Ashutosh , Could you pls check your question No 3 : output is wrong As initially before filtering out cancellation & customer cancellation the total number of successful order was 10 which is right & after applying NOT in (cancellation,Customer cancellation) the total number of successful orders were 5 only but ideally there should be 8 🙂as we only have 2 unsuccessful orders in table , I hope u get my point
Yes, you are right
Bro, Can you please write queries in a proper sql query format, it will be easy to read query
Sure
@@AshutoshKumaryt Bhaiyaa aap SQL ke classes lete ho kya ? want to learn SQL from youuuu.
Hi sir I am struggling with a problem as I copy paste the dataset but error comes in date format. I have to manually change the format to yyyy-mm-dd, is there any other way by which I can change the format at once, it takes a lot of time to do this manually ,please help.
I also facing the same problem
paste the code in note pad and using CTRL + H just replace which you wanted
i would recommend pasting the data in chatgpt to rectify all the error u r facing i did the same its working now... ig its showing error because im using VScode as editor but dont know for sure as im new to sql
Bro on quesion 4 : select c.roll_id , count(d.order_id) from customer_orders c join driver_order d on c.order_id=d.order_id where cancellation like 'No' group by roll_id; , this is the right answer instead of using subqueries. Subquires shouldn't be used that much as u r making a lot of table and tha reduces performance.
I have updated null and empty values with "No".
hey bro....plz share the excel file..... not able to find the file on github
Check link in description sir
can't find the excel data set.
Check link in the description box
@@AshutoshKumaryt i am also not able to find.....only txt file is in description box
@@shivamsingla7596 to make it more easy this time i have not included any sheets because in my previous video people were not able to import the sheets and many people were facing issues so this time i have just written the code you can copy paste the entire code in ssms and that exactly is the dataset , the amount of data is small, but not small enough to gain lot of knowledge, so just copy paste the same code in the text file in your ssms, that's it
6 the question me jab sab order k count ko rank diye, to usme rank 1,2,4 kaise aya? Why not 1,2,3
Diff between rank, row number ,dense rank
For 3rd Question : Shouldn''t be like the below code ?
select driver_id, sum(New_Cancellation)cnt from
(select driver_id,case when cancellation in ('cancellation','customer cancellation') then 0 else 1 end as New_Cancellation
from driver_order)a
group by driver_id
because the previous code :
SELECT driver_id,COUNT (distinct order_id)cnt
FROM driver_order
WHERE cancellation NOT IN ('cancellation','customer cancellation')
GROUP BY driver_id;
not taking 3 NULL values which was successfully delivered.
Yes good observation
for question 3
SELECT driver_id, COUNT(*) AS successful_orders
FROM driver_order
WHERE cancellation IS NULL OR cancellation = '' OR cancellation NOT IN ('cancellation', 'customer cancellation')
GROUP BY driver_id;
Hi . Third question answer is wrong . U forgot to include null in cancellation column
Exactly
6 the question me jab sab order k count ko rank diye, to usme rank 1,2,4 kaise aya? Why not 1,2,3
@@rinalzankar2812 dense_rank use kiye hai
it should be
WHERE cancellation IS NULL AND cancellation NOT IN ('Cancellation', 'Customer Cancellation')
Thanks a lot everyone
Hi Ashutosh,
you have done a slight mistake in question number 3 on finding successful orders by each driver.
there are just 2 cancellations in 10 orders which means a total of 8 successful orders but your answers have just 5 successful orders.
I think the correct query will be as follows:
with cte as
(
SELECT driver_id,
CASE
WHEN cancellation LIKE '%cancel%' THEN 0
ELSE 1
END AS successful_orders
FROM driver_order
)
SELECT driver_id,SUM(successful_orders)
FROM cte
GROUP BY driver_id;
I hope this helps. Thank you!!
Query looks fine Adarsh , thanks for noticing appreciate this
Hi Adarsh. Thanks for the answer. When I am running your code it still shows me 10 orders. It is 8 successful orders, right? Do you think where clause is missing?
@@satviksatvik3033 where clause is not required because case conditions are counted correctly
Got the correct answer running this query, SELECT driver_id, COUNT(duration) AS orders_delivered FROM driver_order
GROUP BY driver_id
Alternate solution to problem 4:
Select a.roll_id , count(*) from
customer_orders a join driver_order b
on a.order_id=b.order_id
where b.cancellation not like '%Cancellation%' or b.cancellation is Null
group by a.roll_id
This is wrong
@@shubhamkapoor85 SELECT B.roll_id, COUNT(ORDER_ID) FROM
(SELECT A.*,CASE WHEN cancellation IN ('CANCELLATION','CUSTOMER CANCELLATION') THEN 'CANCEL' ELSE 'NOT CANCEL' END AS CANCELLATION_STATUS FROM
(SELECT C.*, D.cancellation FROM customer_orders C
INNER JOIN
driver_order D ON C.order_id = D.order_id) A ) B WHERE CANCELLATION_STATUS = 'NOT CANCEL' GROUP BY B.roll_id
ROLL ID 1 HAS 9 ORDER DELIVERED
2 HAS 3 ORDERS
Hi friend,
for question - how many veg and non-veg rolls were ordered by each customer?
the solution is wrongly executed friend , please check once and update please
actual solution was
customer_id roll_id cnt_roll_id roll_name
101 1 6 Non Veg Roll
101 2 3 Veg Roll
102 1 6 Non Veg Roll
102 2 3 Veg Roll
103 1 9 Non Veg Roll
103 2 3 Veg Roll
104 1 9 Non Veg Roll
105 2 3 Veg Roll
English English English always English
Senior agar aapko Hindi aati hai toh Hindi mien b bol liya karo ussi mein achha lagta hai
my answer for
---what was the maximum number of rolls delivered in a single order
select top 1 d.order_id,count(d.roll_id) as cnt_of_orders from
(select * from
(select *,case when cancellation like ('%cancel%') then 'c' else 'nc' end as cancellation_details
from driver_order) a
where cancellation_details != 'c') b join customer_orders d on b.order_id=d.order_id
group by d.order_id
order by cnt_of_orders desc
For those who want dataset--
drop table if exists driver;
CREATE TABLE driver(driver_id integer,reg_date date);
INSERT INTO driver(driver_id,reg_date)
VALUES (1,'01-01-2021'),
(2,'01-03-2021'),
(3,'01-08-2021'),
(4,'01-15-2021');
drop table if exists ingredients;
CREATE TABLE ingredients(ingredients_id integer,ingredients_name varchar(60));
INSERT INTO ingredients(ingredients_id ,ingredients_name)
VALUES (1,'BBQ Chicken'),
(2,'Chilli Sauce'),
(3,'Chicken'),
(4,'Cheese'),
(5,'Kebab'),
(6,'Mushrooms'),
(7,'Onions'),
(8,'Egg'),
(9,'Peppers'),
(10,'schezwan sauce'),
(11,'Tomatoes'),
(12,'Tomato Sauce');
drop table if exists rolls;
CREATE TABLE rolls(roll_id integer,roll_name varchar(30));
INSERT INTO rolls(roll_id ,roll_name)
VALUES (1 ,'Non Veg Roll'),
(2 ,'Veg Roll');
drop table if exists rolls_recipes;
CREATE TABLE rolls_recipes(roll_id integer,ingredients varchar(24));
INSERT INTO rolls_recipes(roll_id ,ingredients)
VALUES (1,'1,2,3,4,5,6,8,10'),
(2,'4,6,7,9,11,12');
drop table if exists driver_order;
CREATE TABLE driver_order(order_id integer,driver_id integer,pickup_time datetime,distance VARCHAR(7),duration VARCHAR(10),cancellation VARCHAR(23));
INSERT INTO driver_order(order_id,driver_id,pickup_time,distance,duration,cancellation)
VALUES(1,1,'01-01-2021 18:15:34','20km','32 minutes',''),
(2,1,'01-01-2021 19:10:54','20km','27 minutes',''),
(3,1,'01-03-2021 00:12:37','13.4km','20 mins','NaN'),
(4,2,'01-04-2021 13:53:03','23.4','40','NaN'),
(5,3,'01-08-2021 21:10:57','10','15','NaN'),
(6,3,null,null,null,'Cancellation'),
(7,2,'01-08-2020 21:30:45','25km','25mins',null),
(8,2,'01-10-2020 00:15:02','23.4 km','15 minute',null),
(9,2,null,null,null,'Customer Cancellation'),
(10,1,'01-11-2020 18:50:20','10km','10minutes',null);
drop table if exists customer_orders;
CREATE TABLE customer_orders(order_id integer,customer_id integer,roll_id integer,not_include_items VARCHAR(4),extra_items_included VARCHAR(4),order_date datetime);
INSERT INTO customer_orders(order_id,customer_id,roll_id,not_include_items,extra_items_included,order_date)
values (1,101,1,'','','01-01-2021 18:05:02'),
(2,101,1,'','','01-01-2021 19:00:52'),
(3,102,1,'','','01-02-2021 23:51:23'),
(3,102,2,'','NaN','01-02-2021 23:51:23'),
(4,103,1,'4','','01-04-2021 13:23:46'),
(4,103,1,'4','','01-04-2021 13:23:46'),
(4,103,2,'4','','01-04-2021 13:23:46'),
(5,104,1,null,'1','01-08-2021 21:00:29'),
(6,101,2,null,null,'01-08-2021 21:03:13'),
(7,105,2,null,'1','01-08-2021 21:20:29'),
(8,102,1,null,null,'01-09-2021 23:54:33'),
(9,103,1,'4','1,5','01-10-2021 11:22:59'),
(10,104,1,null,null,'01-11-2021 18:34:49'),
(10,104,1,'2,6','1,4','01-11-2021 18:34:49');
select * from customer_orders;
select * from driver_order;
select * from ingredients;
select * from driver;
select * from rolls;
select * from rolls_recipes;
Thanks bro
@@AshutoshKumaryt thanks a ton bhiaya !!!
---Hi bro,
--I solved your question in my way.
--If you have time just take a look on that
--1. How many rolls were ordered ?
select count(roll_id) Total_rolls_ordered from customer_orders;
--2. How many unique customers are made ?
select count(distinct(customer_id)) Total_customer from customer_orders
--3. How many successful order delivered by each driver ?
select COUNT(cancellation) Successfull_order from driver_order
where cancellation NOT IN ('Cancellation','Customer Cancellation')
--OR
select driver_id, COUNT(distinct(order_id)) Successfull_order from driver_order
where cancellation not in ('Cancellation', 'Customer Cancellation')
group by driver_id
--4.How many of each type of roll was delivered ?
select c.roll_id,
count(c.roll_id) Successfull_delivered_roll
--case when d.cancellation in ('Cancellation', 'Customer Cancellation') then 'c' else 'nc' end
from customer_orders c inner join driver_order d on c.order_id=d.order_id
where (case when d.cancellation in ('Cancellation', 'Customer Cancellation') then 'c' else 'nc' end) = 'nc'
group by roll_id;
--5. How many Veg & Nonveg rolls were ordered by each customer ?
select c.customer_id,
r.roll_id,
count(r.roll_id) cnt,
r.roll_name
from customer_orders c inner join rolls r on c.roll_id=r.roll_id
group by customer_id, r.roll_id,r.roll_name;
--6. What was the maximum number of rolls delivered in a single order ?
select TOP 1
c.order_id,
count(roll_id) cnt
from customer_orders c inner join driver_order d on c.order_id=d.order_id
where (case when d.cancellation in ('Cancellation', 'Customer Cancellation') then 'c' else 'nc' end) = 'nc'
group by c.order_id
order by (count(roll_id)) desc;
Great
there is no reason for use distinct