Very complex approach in the video. Here is the most simplest approach (MySQL) : - SELECT customer_id, customer_name, ROUND(SUM(billed_amount) / (3 - COUNT(DISTINCT YEAR(billing_creation_date)) + COUNT(*)),1) AS avg_billed_amount FROM billing WHERE YEAR(billing_creation_date) BETWEEN 2019 and 2021 GROUP BY 1, 2
@shubhamagrawal7068 I have written the below code for this but not getting the desired output. Can you help me find the mistake please. select customer_id,customer_name,sum(billing_amount)/(count(*)+3-count(distinct(year(x.billing_year)))) from( select *,year(billing_creation_date) as billing_year from billing where year(billing_creation_date) between 2019 and 2021) x group by customer_id,customer_name;
@@amadei2 No, it won't. Try to grasp the concept behind it. Suppose you have given 3 years to evaluate for, as in our case - 2019 to 2021. As for records- consider 1st id , it has 3 records, for 2020 twice and 2021. No. of years to evaluate for (2019 to 2021) = 3 No. of years (in 1st record )= 3 (i.e 2020, 2020, 2021) No. of distinct years(in 1st record) = 2 (i.e 2020, 2021) so the formula is - { No. of years (to evaluate for) - No.of distinct years } + No. of years = { 3 - Count(Distinct(years)) } + Count * = {3-2} + 3 = 4 for 1st record i.e id = 1, name =A Sum = 350 Count = 4 Avg = 350/4 = 87.5 . Hoping, now it is cleared.
A little browsing, we can use recursive cte to generate date rows between start and end, And we just do right/left join from main table and the generated rows with year(tbl1.date) and year(tbl2.date) From that join, we will get result null if there's no trx for that user in that year Last, we just group by year(date) and customer_id, and select sum(trx amount)/count(trx amount)
Nice question and a great explanation. Thanks a lot. I also tried my own and used string functions to come up with the solution. with cte1 as ( select customer_id, customer_name, count(*) as ct , sum(billed_amount) as amount, string_agg(cast(extract (year from billing_creation_date) as varchar), ',' ) as all_year from billing where extract(year from billing_creation_date) in (2019,2020,2021) group by 1,2 ) , cte2 as ( select *, case when all_year not like '%2019%' then 1 else 0 end as is_2019, case when all_year not like '%2020%' then 1 else 0 end as is_2020, case when all_year not like '%2021%' then 1 else 0 end as is_2021 from cte1 )
select customer_id, customer_name, round(amount*1.0/(ct+is_2019 + is_2020+is_2021),2)as avg from cte2
@@KavishSrivastava thanks for the comment , I got a chance to revisist this question again, but now when i solved it again,i used different approach using the joins. sharing the recent solution below: with cte_year as ( SELECT 2019 AS year UNION ALL SELECT 2020 UNION ALL SELECT 2021 ) , cte_customer as ( select distinct customer_id from billing where year(billing_creation_date) in(2019,2020,2021) ) , cte_customer_with_year as ( select * from cte_customer cross join cte_year ) select cte_customer_with_year.customer_id , round(sum(billed_amount)/count(*),2) as average_billing_amount from cte_customer_with_year left join billing on cte_customer_with_year.customer_id = billing.customer_id and cte_customer_with_year.year = year(billing.billing_creation_date ) group by cte_customer_with_year.customer_id
@@muditmishra9908 again a good approach which filters extra steps compare to your previous one, but the current query's output is incomplete which is missing customer_name in the result, which will eventually requires some changes in the code.
here a try I gave : with c as ( SELECT f.customer_id, f.customer_name ,year, billed_amount FROM (SELECT DISTINCT customer_id, customer_name FROM billing) f CROSS JOIN (SELECT 2019 AS year UNION SELECT 2020 AS year UNION SELECT 2021 AS year) y LEFT JOIN billing b ON b.customer_id = f.customer_id AND YEAR(b.billing_creation_date) = y.year ORDER BY customer_id, year, billed_amount) SELECT customer_id, customer_name , ROUND(AVG(COALESCE(billed_amount, 0)),2) AS avg_billed_amount from c GROUP BY customer_id,customer_name ;
Great video explanation and walk through. At the very end, I recommend doing a final step by step summary of the entire formula just to reiterate exactly what is going on and then pausing. Thanks for everything that you do!
Yet another gem from you Sir. Hats off man. Maybe just comment each step directly in the query so that the walk-through makes even more sense. Even though it is already pretty clear. Thanks again for everything you’re doing for the Data Community. This is priceless PS: have you ever considered putting together a comprehensive series of tutorials for UDEMY based on proficiency levels? SQL and Pyhton for Data Analysis. I’d buy them for sure if you did 😊
Hi Taufiq, First of all thank you for your service. You are doing a wonderful job for citizen data analyst like me. Hence I am sharing my code for review. SQL flavor - PostgreSQL. with year as ( Select extract(year from billing_creation_date) as year from billing where extract(year from billing_creation_date) is between '2019' and '2021' ) Select b.customer_id as customer_id, b.customer_name as cust_name, avg(coalesce(b.billing_amount,0)) as avg_bill_amt from year y left join billing b on y.year = extract(year from b.billing_creation_date) group by y.year
Nice!! I’ve never seen someone explain SQL problems for interviews and gotta admit I love the format and the way you explain it. Insta-subscribed🎉. I’ve been practicing in codewars but I always feel like they’re either too simple tasks asking for a JOIN or something I just have never thought before. This really helped me improving my problem solving skills and the way I tackle SQL problems
Thank to your lessons I was able to solve this query with joins in MySQL with cte as (select distinct b1.customer_id,b1.customer_name, year from billing b1 cross join (select 2019 year union select 2020 union select 2021) k) select cte.customer_id, cte.customer_name,concat(round(avg(coalesce(b.billed_amount,0)),2),'$') av_billing_amount from cte left join billing b on b.customer_id=cte.customer_id and year(b.billing_creation_date)=cte.year group by cte.customer_id,cte.customer_name;
Why everyone is over complicating this? Check this simple solution : It could be done with where clause to filter years instead of writing all these rows for each year, also the divisor amount could considered when we turn null values into 0: Select customer_id, customer_name, Avg(case when billing is null then 0 else billing) as billing, From table Where Extract ( year from date_column ) in (“2019”, “2020”) Group by customer_id, customer_name I wrote this from my phone so iam sorry if it is not clear, i use this code in plSql could be there are some differences such as extract year but iam sure there is an equivalent in sql Thanks
Bhai aapke approach me wo year ko consider nhi kiya ja rha jiska record exist nhi kr rha table me between 19 and 21. Suppose if we consider the customer A then according to table we have two records for year 20 and one record for year 21 but no record for year 19. If there would have been a record for year 19 with billingamt as null then your query may work well. But for now its of no use. That's what makes this question tricky and lengthy. According to your query the first output record would be: 1 A 116.666 but the expected output mentioned in ques is: 1 A 87.5
@@50_saifalikhan33hmmm i see that if we have no record of year 2019 we should add a record with 0 value. I understand now my answer could be not correct
Great video! Thank you for the insightful explanation. I applied a Nested CASE WHEN approach to solve this problem, and it produced the same result as shown in the video. Here's the query I used: SELECT customer_id, customer_name, (SUM(CASE WHEN strftime('%Y', billing_creation_date) IN ('2019','2020','2021') THEN billing_amount END) / (CASE WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2019' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2019' THEN 1 ELSE 0 END) END + CASE WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2020' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2020' THEN 1 ELSE 0 END) END + CASE WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2021' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2021' THEN 1 ELSE 0 END) END )) AS avg_billing_amount FROM billing GROUP BY customer_id, customer_name;
Great Explanation ! but if date range increases in will be difficult using case . Here I have dynamic Query WITH integer_sequence(n) AS ( SELECT 2019 -- starting value UNION ALL SELECT n+1 FROM integer_sequence WHERE n < 2021 -- ending value )
Select customer_id,customer_name,AVG(Amount) from ( Select A.n as bill_Year,A.customer_id ,A.customer_name,ISNULL(Amount,0) AS Amount from ( SELECT * FROM integer_sequence A cross join (Select distinct customer_id,customer_name from Test_SQL)B ) A left outer join Test_SQL B on A.n=DATEPART(YEAR,B.ddate) and A.customer_id=B.customer_id
I always learn so much from you. You have a way of explaining things to so that it makes sense. Great job. As a person who is read only access and cannot create tables, these CTE statements are always so helpful!
An alternative solution to this query without using case statement with cte as (select customer_name,EXTRACT(Year from billing_creation_date) as billing_year,billed_amount from billing where EXTRACT(Year from billing_creation_date)>=2019), table1 as (select customer_name,billing_year,billed_amount,(select count(distinct billing_year) as cnt2 from cte t1 where t1.customer_name=t2.customer_name group by customer_name ),count(*) over(partition by customer_name) as cnt1 from cte t2 order by customer_name,billing_year), table2 as (select customer_name,billing_year,billed_amount,(cnt1+(3-cnt2)) as cnt3 from table1), table3 as (select distinct customer_name,sum(billed_amount) over(partition by customer_name) as s1,cnt3 from table2) (select customer_name,round(s1::decimal/cnt3,2) as avg_val from table3)
Hi Toufiq , good problem statement and video. My Soluition for the problemt . %sql WITH CTE AS( select distinct customer_id,customer_name,Year from Billing join( select '2019' as Year UNION select '2020' as Year UNION select '2021' as Year ) Y ) select C.customer_id,C.customer_name, Round(AVG(coalesce(B.billed_amount,0)),2) as avg_billing_amount from CTE C left join billing B ON C.customer_id=B.customer_id AND C.customer_name=B.customer_name AND C.Year=EXTRACT(year from B.billing_creation_date) group by C.customer_id,C.customer_name order by 1,2
I was able to solve before watching solution. Used a different method using row_number() to count additional payments in one year. Assumed every groups sum should be divided by 3 plus any additional payments. Basically just get the sum of all payments for group and then decide what to divide by. -- my solution select customer_id,customer_name, concat('$',cast(total_sales / cast(3 + additional_payments as decimal(5,2)) as decimal(5,2))) as avg_billing_amount from ( select distinct customer_id, customer_name, total_sales, max(additional_payments) over (partition by customer_name) - 1 as additional_payments from ( select distinct customer_id,customer_name, sum(billed_amount) over (partition by customer_name ) as total_sales, max(num_billing) over (partition by customer_name, Y order by y) as additional_payments from ( select customer_id,customer_name, billed_amount, y, ROW_NUMBER() over (partition by customer_name, Y order by y) as num_billing from ( select *, year(billing_creation_date) as y from billing where billing_creation_date > '2019-01-01') X) z) q) n
The answer and approach is nice and this is good solution, I admire that. However, had I been interviewer then would have asked now can you write the same for between year 1990 - 2022? Would you write case statement for those 32 years? The approach should have been that the query will work for dynamic input..... PS : I am also trying to think of a dynamic solution.
For average total: select id, name, sum(amount)/sum(counter) as avg_billed from ( select customer_id as id, customer_name as name, billing_amount as amount, if amount>0 then 1 else 0 end if as counter where billing_creation_date between '1990-01-01' and '2021-12-31' ) A group by id,name; For average per year goup by column year(billing_creation_date) as year.
very bad solution, outer join on sub select years (can be replace by args values) and groupe by average. More more more simple. Can be also write by using "with as ..." for more clarity
Create a sequence of years in a cte based on year start and end. Create another cte that includes year as a column calculated from the original table. Right outer join 1st cte on to 2nd cte on year. Create a subsequent cte that groups on year and other columns of interest, aggregating measures of interest. In this case, avg(billingAmount). Boom, now you can aggregate over any year range including years with no billing.
Good narration dear. I tried this approach and it seems to be fine. select distinct year(#billing_creation_date) as yr into #yearmaster from #billing where year(#billing_creation_date) between 2019 and 2021; select distinct customer_name as customer_name into #custmaster from #billing where year(#billing_creation_date) between 2019 and 2021; select * into #masterrecord from #yearmaster ym cross join #custmaster cm order by ym.yr,cm.customer_name select ym.customer_name,AVG(convert(money,isnull(bl.billed_amount,0))) as average from #masterrecord ym left join #billing bl on (year(#billing_creation_date) = ym.yr AND ym.customer_name = bl.customer_name) group by ym.customer_name order by ym.customer_name
I create rows not columns. Not as sophisticated as some below but here is my version with a cross join and each step in a subquery. Yeah too long, but fun :) with bill_qry as ( select customer_id, customer_name, year(billing_creation_date) as bill_year, convert( decimal (10,2) , sum(billing_amount) ) as bill_sum , convert( decimal (10,2) , count(billing_id) ) as bill_cnt from billing where year(billing_creation_date) > 2018 -- should be a prompt group by customer_id, customer_name, year(billing_creation_date) ), d_year as ( select min(bill_year) as min_year, max(bill_year) as max_year -- one row from bill_qry ), y_range as ( select n = 0, (min_year) as t_year --year 1 from d_year union all select n+1, (min_year) as t_year --year 1 from y_range, --recursive query d_year -- one row where n < max_year - min_year -- can't be over 100 rows ), year_table as ( select t_year + n as t_year from y_range --use this table to cross join the data table ), bill_qry2 as ( select t_year , customer_id, customer_name ,case when t_year = bill_year then bill_sum else 0 end as bill_sum2 --create rows instead of columns ,case when t_year = bill_year then bill_cnt else 0 end as bill_cnt2 --create rows instead of columns from year_table cross join bill_qry ), bill_qry3 as ( select t_year, customer_id, customer_name, sum(bill_sum2) as bill_sum3, --years with no entry reduced to one row, zero amount, zero instance sum(bill_cnt2) as bill_cnt3 --years with no entry reduced to one row, zero amount, zero instance from bill_qry2 group by t_year, customer_id, customer_name ), bill_qry4 as ( select customer_id, customer_name, bill_sum3 as bill_sum4, case when bill_cnt3 = 0 then 1 else bill_cnt3 end as bill_cnt4 from bill_qry3 ) --final query select customer_id, customer_name, convert ( decimal (10,2), ( sum(bill_sum4) / sum(bill_cnt4) ) ) as avg_pay_final from bill_qry4 group by customer_id, customer_name
with prasad as ( select *, extract (year from billing_creation_date) as years from billing where billing_creation_date between '2019-01-01' and '2021-12-31' ) select customer_id, sum(billed_amount)/ (case when count(distinct years)=1 then count(years)+2 when count(distinct years)=2 then count(years)+1 else count(years) end) as billed_amount from prasad group by 1;
For the first condition A they have mentioned billing amount should be taken for a year then the total billing amount for 2020 would be 100+150$=250$ and then we have to take the average which would be 0+250+100=350/3=116.66 rather than dividing it by 4. I think this is also a typo error.
Yes, that is clearly an error. The average is per year therefore the division must be done always by 3 years, not four. Unless the average is per billing event, then we need to ignore the cases with zero.
I think Below Query Is more Dynamic with smaller output and less character But I was Learn From Your video. declare @f_y int = 2019,@l_y int = 2021 ;with ctc as( select @f_y as y union all select y+1 as y from ctc where y
I create rows not columns. 😃 I used SQLLITE drop table billing; create table billing ( customer_id int , customer_name varchar(1) , billing_id varchar(5) , billing_creation_date DATE , billed_amount int ); insert into billing values (1, 'A', 'id1', DATE('2020-10-10'), 100); insert into billing values (1, 'A', 'id2', DATE('2020-11-11'), 150); insert into billing values (1, 'A', 'id3', DATE('2021-11-12'), 100); insert into billing values (2, 'B', 'id4', DATE('2019-11-10'), 150); insert into billing values (2, 'B', 'id5', DATE('2020-11-11'), 200); insert into billing values (2, 'B', 'id6', DATE('2021-11-12'), 250); insert into billing values (3, 'C', 'id7', DATE('2018-01-01'), 100); insert into billing values (3, 'C', 'id8', DATE('2019-01-05'), 250); insert into billing values (3, 'C', 'id9', DATE('2021-01-06'), 300); select * from billing; -- Soltion ❤ with billingTable as ( select customer_id, customer_name, strftime('%Y',billing_creation_date) as year, billed_amount from billing where year in ('2019', '2020', '2021') ), missingYears as ( select customer_id, customer_name, 3 - count(distinct year) as missingYears from billingTable group by customer_id, customer_name ), sumBilling as ( select customer_id, customer_name, sum(billed_amount) as amount_sum, count(year) as countYears from billingTable group by customer_id, customer_name ), result as ( select missingYears.customer_id, missingYears.customer_name, sumBilling.amount_sum, (sumBilling.countYears + missingYears.missingYears) as countYears from missingYears inner join sumBilling on missingYears.customer_id = sumBilling.customer_id and missingYears.customer_name = sumBilling.customer_name ) select customer_id, customer_name, ROUND(amount_sum*1.0 / countYears, 2)||'$' as avg_billed_amount from result
this works for me: SELECT customer_id, customer_name, sum(billing_amount) AS total_billing_amt, count(billing_id) as billing_cnt, 3-count(distinct year) as missing_billing_cnt, sum(billing_amount)/(count(billing_id)+3-count(distinct year)) as avg
FROM (select * from df where year >= 2019 and year
Why do we divide by 4 for 'A'? In the data for 'A' we see only 3 bill_id (2 in 2020 and 1 in 2021). For 2019 we set bill_amount as 0 but there is no bill_id. How it could be 4th bill_id if there was nothing to sell?
It is wrong. The average for customer A and B are incorrect in the problem. I feel that this problem was never used in an interview, or if it was it was done very poorly.
We can create a cte with their customer name and number of time it comes in between 2019 to 2021 and then we can find the sum of the values for the years between 2019 and 2021 and then divide it by the count values which we get from cte and group by the customer names
How about this? select customer_id, customer_name, concat(round((sum(billing_amount)/(count(distinct billing_id) - count(distinct year(billing_creation_date)) + 3)),2),"$") as avg_bill_amt from data where bill_date between '2019-01-01' and '2021-12-31' group by 1,2;
Simplified solution in mssql select customer_id, sum(billed_amount)/ (case when count(distinct year(billing_creation_date))=1 then count(year(billing_creation_date))+2 when count(distinct year(billing_creation_date))=2 then count(year(billing_creation_date))+1 else count(year(billing_creation_date)) end) as billed_amount from billing group by customer_id;
I would resolve it like that: with all_year as (select 2019 as year from dual union select 2020 as year from dual union select 2021 as year from dual), all_ids_sal as( select distinct id, b.year, case when b.year in (select year from billing c where a.id=c.id) then (select salary from billing d where d.id=a.id and d.year=b.year) Else 0 end as salary from billing a, all_year b) select id, round(avg(salary)) from all_ids_sal group by id
Following is my dynamic solution to the problem: With c_tot as { Select customer_id ,customer_name ,SUM(billing_amount) over( partition by customer_id) as sm ,COUNT(billing_creation_date) over (partition by customer_id) as nbr ,COUNT(DISTINCT YEAR(billing_creation_date)) over ( partition by customer_id) as discnbr From Where YEAR(billing_creation_date) ≥2019 and YEAR(billing_creation_date) ≤2021 } Select customer_id ,customer_name. sm/(nbr+(3-discnbr)) as average_billing From c_tot P.s: (end_year - start_year)+1 =3
Interesting: classic case of overfitting with the test sample. The averages should be calculated by 3(considering 3 years), unless this is a made up scenario where they are looking for per year average and the denominator is the count of bill amounts
Sir actually the output for B is 150 as given in question and not 200 as you said. In 2021 there is no transaction for B so it will be considered as 0. Final avg will be 600/4 = 150
I see there is a mistake in the output..What toufiq said is right. We have 2021 data of $250 for Customer B . We have data for 3 years. Hence should be divided by 3. However, what I feel is for Customer A we have data for 2 years ideally and 2019 data isn't available. Which I ideally feel should be averaged for 3 years rather 4 years by clubbing 2020 data. So the result should be 116.67 . What are your thoughts?
@@arjundev4908 Here, we need to find 'avg billing amount' for each transaction, not for each year...so it has to be divided by total number of transactions ... i hope you got it
@@arjundev4908 Actually your thinking is also correct but what Toufiq has done is also correct. In the interviewer's question it's not clearly mention that if we have to do the average by year or overall average..
I find the solution you provided to be quite complex, inefficient and a bit too specific. Here's my solution: Table Definition: Billing ( cid int, cname varchar(10), bid varchar(10) primary key, bdate date, bamount float ) Query: SELECT b.cid, b.cname, SUM(b.bamount) / (COUNT(*) + 3 - COUNT(DISTINCT(YEAR(bdate)) AS average FROM billing b WHERE YEAR(b.bdate) BETWEEN 2019 AND 2021 GROUP BY b.cid; Explanation: Instead of having so many cases, we can add together the total no of billings per customer + the no of years that don't have billings (3 - COUNT(DISTINCT(YEAR(bdate)).Moreover, before grouping by customer, we filter using "WHERE YEAR(b.bdate) BETWEEN 2019 AND 2021". Hope it's useful! :))
We can use CTE to create a temp table carrying distinct years from the input data and then join it back with the data again to compute the sum n count values without running case for each year !!
Good solution as per the problem statement. But while calculating Average billing amount for three years, we should divide the total billing amount by 3 rather than by number of bills being considered. May be the problem statement example is not accurate.
my approach: with c as (select * from (select *,row_number() over(partition by y,customer_id order by customer_id) rn from (select *, year(billing_creation_date) y from billing) t) t1 where rn=1 and y >= 2019 and y
select customer_id,round(sum(billing_amount)/3) as avg_bills from customer_demo where customer_id = 1 or customer_id = 2 or customer_id = 3 group by customer_id; Is this querry correct?
A lot of code. You can do select customer_name, customer_id, sum(billed_amount)/3 where year(billing_creation_date) between 2019 and 2021 group by customer_name, customer_id
This won't work. If there are multiple transactions in a year. Avg() won't work as years with no data should be treated as 0 with a count of 1. This code is good as it is.
Hi Thoufiq - Great explanation. I was thinking, what if the problem statement and data set was expanded to 10 or 20 years and how would that impact the solution.
I think this solution works nicely for a small range like the one he shown, but if we were to expand it, I'd rather use nested selects in order for it to be scalable
Hi @techTFQ. In the above video,I have a doubt. Here the date range is only from 2019 to 2021. Suppose we have a data range from 2009 to 2020. I would like to know if this solution works if we have the above range? Please let me know.
WHERE CAST(STRFTIME('%Y", billing_creation_date) AS INT) BETWEEN (2009 AND 2020) . . . . In this case get the year by using strftime we will get for example: "2010" in string so change the string into integer by using CAST function then we will get 2010 in INTEGER then use BETWEEN operator (2009 AND 2020) it will give in between years like 2009, 2010, .....2020)
@techTFQ Please Help!!! at 15:11 it is when i execute this query with cte as it is giving me error ERROR: syntax error at end of input LINE 22: group by customer_id, customer_name) I ma stuck here i have writtten exactly same code
I would be asking the interviewer if they wanted the average transaction billing for a customer for all transactions between 2019 to 2021 or if they wanted the average for each customer for each year (2019,2020,2021). Throwing in a transaction for 0 just because the customer did not have any other transactions that year makes no sense.
Either the author of the post copied the problem's details wrongly or the problem was wrongly stated in the first place. In any case what average is wanted by the interviewer should be more clearly stated. If only the given info were available I'd assume that you should make an average over the single year for a single customer ( E_Y = Sum[transactions_yearY]/#transactions_yearY] ) and then average over the years ( E = Sum[ E_Y1, E_Y2, ..., E_YN]/N]. With this formula customer A in the table would have an average of 75, not 87.5.
hello Thoufiq, assume you're working on a huge data, is it gonna be the perfect solution to provide? second why didn't you use the avg function with a where clause testing on the extracted year from billing_ date which should be between 2019 and 2021 ? I mean this way : with billing_2019 as ( select customer_id,customer_name, extract(year from billing_creation_date) as billing_year, billed_amount as billing_amount from p_work.billing) select customer_id,customer_name,round(avg(billing_amount),2) as billed_amount from billing_2019 where billing_year between 2019 and 2021 group by 1,2 order by 1 asc ;
Did you try running the query? That doesn't meet the answer/criteria. You need to account for years customers were not billed, which changes the averages. Yes we are trying to find the average billing amount, but you need to include 0 for the years the customer wasn't billed which is broken down in this video.
select customer_name, sum(billing_amount)/(count(*) - count(distinct year(billing_creation_date)) +3) AS AVG_Billed_amount from table1 where year(billing_creation_date) BETWEEN '2019' and '2021' group by customer_name;
Hi Thoufiq! Great way to explain. Thank you. Please let me know if you can help with solving any particular interview query that is related to Day 1 retention rate for users. Really appreciate it.
This is a generic one... you can pass the year range as parameters. with y as ( select yr from generate_series(2019, 2021, 1) yr ), c as ( select distinct(cust_id) from bill ) select x.cust_id, sum(coalesce(y.sum_amt, 0)), avg(coalesce(y.sum_amt, 0)) from ( select yr, cust_id from c cross join y ) x left join ( select cust_id, date_part('year', bill_date) as billyr, sum(bill_amt) as sum_amt from bill group by cust_id, date_part('year', bill_date) ) y on x.cust_id = y.cust_id and x.yr = y.billyr group by x.cust_id order by x.cust_id;
Please let me know this will or not with cte as (select *,year(billing_creation_date) as yr from billing where year(billing_creation_date) in (2019,2020,2021)), cte2 as (select customer_id,customer_name,sum(billed_amount) as total_amount ,case when count(distinct yr)=1 then count(*)+2 when count(distinct yr) = 2 then count(*)+1 when count(distinct yr)=3 then count(*) end as all_yr from cte group by customer_id,customer_name) select customer_id,customer_name, round(total_amount/all_yr,0) as average_bill_amount from cte2;
Fun challenge! My take on it: WITH cte AS ( SELECT DISTINCT YEAR(b1.billing_creation_date) AS 'year' , b2.customer_id , b2.customer_name , COALESCE(b3.billed_amount,0) AS billing_amount FROM billing b1 LEFT JOIN billing b2 ON 1=1 LEFT JOIN billing b3 ON b2.customer_id = b3.customer_id AND YEAR(b1.billing_creation_date) = YEAR(b3.billing_creation_date) WHERE YEAR(b1.billing_creation_date) IN (2019,2020,2021) ) SELECT customer_id , customer_name , AVG(billing_amount) AS avg_billing_amount FROM cte GROUP BY 1,2 ORDER BY 1,2
My Query with bill_summary as (select customer_id, customer_name, sum(billed_amount) as total_bill, count(customer_name) as bill_count, count(distinct(extract(year from billing_creation_date))) as distinct_years from billing where extract(year from billing_creation_date)>='2019' group by customer_id, customer_name) select customer_id, customer_name, round(total_bill/(bill_count+3-distinct_years),2) as average from bill_summary
hi I need one more example of SQL YOY contribution% and rolling average if you have year from 2017-2022...and random sales numbers... Pls can u make one more video as your explanation is quite simple to understand This was a interview question
Easy. You might need to add if else statements in case count(value) = 0…. Any ways you can do select year(date_column) as “year”, sum(value_column)/count(value_column) as “Average Sale” group by year(date_column) order by year(date_column)
easier solution- select customername,customerid, case when max(dr) = 3 then round(sum(billing_amount)/count(customerid),2) when max(dr) = 2 then round(sum(billing_Amount)/(count(customerid)+1),2) when max(dr) = 1 then round(sum(billing_Amount)/(count(customerid)+2),2) end as dlsf from(select *,dense_rank() over(partition by customerid,customername order by billing_cd) as dr from averagee where year(billing_cd) in('2019','2020','2021')) group by customerid,customername
Cant we use avg function directly? And also we can give filter like Year(creation date) between 2019 and 2021 in the where clause before group by clause.
No, that will not give the correct denominator in the average for example for A this give denominator as 3 instead of 4 as we have only 3 entries for A and do not have 2019 entry.
Very complex approach in the video. Here is the most simplest approach (MySQL) : -
SELECT
customer_id,
customer_name,
ROUND(SUM(billed_amount) / (3 - COUNT(DISTINCT YEAR(billing_creation_date)) + COUNT(*)),1) AS avg_billed_amount
FROM billing
WHERE YEAR(billing_creation_date) BETWEEN 2019 and 2021
GROUP BY 1, 2
Im confused by the second part of the AVG_billed_amount ( after the / ), wouldn't a simple AVG statement work since it's grouped ?
@shubhamagrawal7068 I have written the below code for this but not getting the desired output. Can you help me find the mistake please.
select customer_id,customer_name,sum(billing_amount)/(count(*)+3-count(distinct(year(x.billing_year))))
from(
select *,year(billing_creation_date) as billing_year
from billing
where year(billing_creation_date) between 2019 and 2021) x
group by customer_id,customer_name;
this is a banger answer
Perfect!
@@amadei2 No, it won't. Try to grasp the concept behind it.
Suppose you have given 3 years to evaluate for, as in our case - 2019 to 2021.
As for records- consider 1st id , it has 3 records, for 2020 twice and 2021.
No. of years to evaluate for (2019 to 2021) = 3
No. of years (in 1st record )= 3 (i.e 2020, 2020, 2021)
No. of distinct years(in 1st record) = 2 (i.e 2020, 2021)
so the formula is -
{ No. of years (to evaluate for) - No.of distinct years } + No. of years
= { 3 - Count(Distinct(years)) } + Count *
= {3-2} + 3 = 4
for 1st record i.e id = 1, name =A
Sum = 350 Count = 4
Avg = 350/4 = 87.5 .
Hoping, now it is cleared.
A little browsing, we can use recursive cte to generate date rows between start and end,
And we just do right/left join from main table and the generated rows with year(tbl1.date) and year(tbl2.date)
From that join, we will get result null if there's no trx for that user in that year
Last, we just group by year(date) and customer_id, and select sum(trx amount)/count(trx amount)
Nice question and a great explanation. Thanks a lot.
I also tried my own and used string functions to come up with the solution.
with cte1 as
(
select
customer_id, customer_name, count(*) as ct , sum(billed_amount) as amount,
string_agg(cast(extract (year from billing_creation_date) as varchar), ',' ) as all_year
from billing
where extract(year from billing_creation_date) in (2019,2020,2021)
group by 1,2
)
,
cte2 as
(
select
*,
case
when all_year not like '%2019%' then 1 else 0
end as is_2019,
case
when all_year not like '%2020%' then 1 else 0
end as is_2020,
case
when all_year not like '%2021%' then 1 else 0
end as is_2021
from cte1
)
select
customer_id, customer_name, round(amount*1.0/(ct+is_2019 + is_2020+is_2021),2)as avg
from cte2
well written query, quite simple and well concise query.
@@KavishSrivastava thanks for the comment , I got a chance to revisist this question again, but now when i solved it again,i used different approach using the joins. sharing the recent solution below:
with cte_year as
(
SELECT 2019 AS year
UNION ALL
SELECT 2020
UNION ALL
SELECT 2021
)
,
cte_customer as
(
select
distinct customer_id
from billing
where year(billing_creation_date) in(2019,2020,2021)
)
,
cte_customer_with_year as
(
select * from cte_customer cross join cte_year
)
select
cte_customer_with_year.customer_id , round(sum(billed_amount)/count(*),2) as average_billing_amount
from cte_customer_with_year left join billing on
cte_customer_with_year.customer_id = billing.customer_id and cte_customer_with_year.year = year(billing.billing_creation_date )
group by cte_customer_with_year.customer_id
@@muditmishra9908 again a good approach which filters extra steps compare to your previous one, but the current query's output is incomplete which is missing customer_name in the result, which will eventually requires some changes in the code.
here a try I gave :
with
c as ( SELECT f.customer_id, f.customer_name ,year, billed_amount
FROM (SELECT DISTINCT customer_id, customer_name FROM billing) f
CROSS JOIN (SELECT 2019 AS year UNION SELECT 2020 AS year UNION SELECT 2021 AS year) y
LEFT JOIN billing b ON b.customer_id = f.customer_id AND YEAR(b.billing_creation_date) = y.year
ORDER BY customer_id, year, billed_amount)
SELECT customer_id, customer_name , ROUND(AVG(COALESCE(billed_amount, 0)),2) AS avg_billed_amount
from c GROUP BY customer_id,customer_name ;
Great video explanation and walk through. At the very end, I recommend doing a final step by step summary of the entire formula just to reiterate exactly what is going on and then pausing. Thanks for everything that you do!
good suggestion Sean, let me consider it for future
Yet another gem from you Sir. Hats off man. Maybe just comment each step directly in the query so that the walk-through makes even more sense. Even though it is already pretty clear. Thanks again for everything you’re doing for the Data Community.
This is priceless
PS: have you ever considered putting together a comprehensive series of tutorials for UDEMY based on proficiency levels? SQL and Pyhton for Data Analysis. I’d buy them for sure if you did 😊
Hi Taufiq,
First of all thank you for your service.
You are doing a wonderful job for citizen data analyst like me.
Hence I am sharing my code for review. SQL flavor - PostgreSQL.
with year as (
Select
extract(year from billing_creation_date) as year
from
billing
where
extract(year from billing_creation_date) is between '2019' and '2021'
)
Select
b.customer_id as customer_id,
b.customer_name as cust_name,
avg(coalesce(b.billing_amount,0)) as avg_bill_amt
from year y left join billing b
on y.year = extract(year from b.billing_creation_date)
group by y.year
Nice!! I’ve never seen someone explain SQL problems for interviews and gotta admit I love the format and the way you explain it. Insta-subscribed🎉.
I’ve been practicing in codewars but I always feel like they’re either too simple tasks asking for a JOIN or something I just have never thought before.
This really helped me improving my problem solving skills and the way I tackle SQL problems
Superb explanation...even a guy who doesnt even hear about SQL can understand from your video....great
Glad it helped bro
Thank to your lessons I was able to solve this query with joins in MySQL
with cte as (select distinct b1.customer_id,b1.customer_name, year from billing b1
cross join (select 2019 year union select 2020 union select 2021) k)
select cte.customer_id, cte.customer_name,concat(round(avg(coalesce(b.billed_amount,0)),2),'$') av_billing_amount
from cte left join billing b
on b.customer_id=cte.customer_id and year(b.billing_creation_date)=cte.year
group by cte.customer_id,cte.customer_name;
Why everyone is over complicating this?
Check this simple solution :
It could be done with where clause to filter years instead of writing all these rows for each year, also the divisor amount could considered when we turn null values into 0:
Select
customer_id,
customer_name,
Avg(case when billing is null then 0 else billing) as billing,
From table
Where Extract ( year from date_column ) in (“2019”, “2020”)
Group by customer_id, customer_name
I wrote this from my phone so iam sorry if it is not clear, i use this code in plSql could be there are some differences such as extract year but iam sure there is an equivalent in sql
Thanks
Bhai aapke approach me wo year ko consider nhi kiya ja rha jiska record exist nhi kr rha table me between 19 and 21.
Suppose if we consider the customer A then according to table we have two records for year 20 and one record for year 21 but no record for year 19. If there would have been a record for year 19 with billingamt as null then your query may work well. But for now its of no use.
That's what makes this question tricky and lengthy.
According to your query the first output record would be:
1 A 116.666
but the expected output mentioned in ques is:
1 A 87.5
@@50_saifalikhan33hmmm i see that if we have no record of year 2019 we should add a record with 0 value. I understand now my answer could be not correct
Great video! Thank you for the insightful explanation.
I applied a Nested CASE WHEN approach to solve this problem, and it produced the same result as shown in the video. Here's the query I used:
SELECT
customer_id,
customer_name,
(SUM(CASE WHEN strftime('%Y', billing_creation_date) IN ('2019','2020','2021') THEN billing_amount END) /
(CASE
WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2019' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2019' THEN 1 ELSE 0 END) END +
CASE
WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2020' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2020' THEN 1 ELSE 0 END) END +
CASE
WHEN SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2021' THEN 1 ELSE 0 END) = 0 THEN 1 ELSE SUM(CASE WHEN strftime('%Y', billing_creation_date) = '2021' THEN 1 ELSE 0 END) END
)) AS avg_billing_amount
FROM billing
GROUP BY customer_id, customer_name;
That was wild. A bunch of lights came on as I have been studying simple SQL terminology and basic query s cool thx
The way u explain is simply awesome.. initially I thought it must be very complex but after ur explanation it looks simple…
Great Explanation ! but if date range increases in will be difficult using case . Here I have dynamic Query
WITH integer_sequence(n) AS (
SELECT 2019 -- starting value
UNION ALL
SELECT n+1 FROM integer_sequence WHERE n < 2021 -- ending value
)
Select customer_id,customer_name,AVG(Amount) from (
Select A.n as bill_Year,A.customer_id ,A.customer_name,ISNULL(Amount,0) AS Amount from (
SELECT * FROM integer_sequence A cross join (Select distinct customer_id,customer_name from Test_SQL)B
) A
left outer join Test_SQL B on A.n=DATEPART(YEAR,B.ddate) and A.customer_id=B.customer_id
) B
group by customer_id,customer_name
I always learn so much from you. You have a way of explaining things to so that it makes sense. Great job. As a person who is read only access and cannot create tables, these CTE statements are always so helpful!
Sir your the dictionary for my SQL practice, Thanks for the video..!!😀😀
Glad to be helpful bro
An alternative solution to this query without using case statement
with cte as
(select customer_name,EXTRACT(Year from billing_creation_date) as billing_year,billed_amount
from billing
where EXTRACT(Year from billing_creation_date)>=2019),
table1 as
(select customer_name,billing_year,billed_amount,(select count(distinct billing_year) as cnt2
from cte t1 where t1.customer_name=t2.customer_name group by customer_name
),count(*) over(partition by customer_name) as cnt1
from cte t2
order by customer_name,billing_year),
table2 as
(select customer_name,billing_year,billed_amount,(cnt1+(3-cnt2)) as cnt3
from table1),
table3 as
(select distinct customer_name,sum(billed_amount) over(partition by customer_name) as s1,cnt3
from table2)
(select customer_name,round(s1::decimal/cnt3,2) as avg_val
from table3)
Hi Toufiq , good problem statement and video.
My Soluition for the problemt .
%sql WITH CTE AS(
select distinct customer_id,customer_name,Year from Billing
join(
select '2019' as Year
UNION
select '2020' as Year
UNION
select '2021' as Year
) Y
)
select C.customer_id,C.customer_name,
Round(AVG(coalesce(B.billed_amount,0)),2) as avg_billing_amount from CTE C
left join
billing B ON
C.customer_id=B.customer_id
AND C.customer_name=B.customer_name
AND C.Year=EXTRACT(year from B.billing_creation_date)
group by C.customer_id,C.customer_name
order by 1,2
I learned something new .the way you explained it cleared my concept thank you so much .
I was able to solve before watching solution. Used a different method using row_number() to count additional payments in one year. Assumed every groups sum should be divided by 3 plus any additional payments. Basically just get the sum of all payments for group and then decide what to divide by.
-- my solution
select customer_id,customer_name,
concat('$',cast(total_sales / cast(3 + additional_payments as decimal(5,2)) as decimal(5,2))) as avg_billing_amount
from (
select distinct customer_id, customer_name, total_sales,
max(additional_payments) over (partition by customer_name) - 1 as additional_payments
from (
select distinct customer_id,customer_name,
sum(billed_amount) over (partition by customer_name ) as total_sales,
max(num_billing) over (partition by customer_name, Y order by y) as additional_payments
from (
select customer_id,customer_name, billed_amount, y,
ROW_NUMBER() over (partition by customer_name, Y order by y) as num_billing
from (
select *,
year(billing_creation_date) as y
from billing
where billing_creation_date > '2019-01-01') X) z) q) n
Wonderful explanation Toufiq 👌👌
Thank you for this! More concept understanding of every step you are taking and thinking
The answer and approach is nice and this is good solution, I admire that. However, had I been interviewer then would have asked now can you write the same for between year 1990 - 2022? Would you write case statement for those 32 years? The approach should have been that the query will work for dynamic input..... PS : I am also trying to think of a dynamic solution.
if u get the dynamic solution pls paste here, it will be helpful
For average total: select id, name, sum(amount)/sum(counter) as avg_billed from ( select customer_id as id, customer_name as name, billing_amount as amount, if amount>0 then 1 else 0 end if as counter where billing_creation_date between '1990-01-01' and '2021-12-31' ) A group by id,name;
For average per year goup by column year(billing_creation_date) as year.
For a dynamic solution you would use a date scaffold
very bad solution, outer join on sub select years (can be replace by args values) and groupe by average. More more more simple.
Can be also write by using "with as ..." for more clarity
Create a sequence of years in a cte based on year start and end. Create another cte that includes year as a column calculated from the original table. Right outer join 1st cte on to 2nd cte on year. Create a subsequent cte that groups on year and other columns of interest, aggregating measures of interest. In this case, avg(billingAmount). Boom, now you can aggregate over any year range including years with no billing.
Good narration dear. I tried this approach and it seems to be fine.
select distinct year(#billing_creation_date) as yr into #yearmaster from #billing where year(#billing_creation_date) between 2019 and 2021;
select distinct customer_name as customer_name into #custmaster from #billing where year(#billing_creation_date) between 2019 and 2021;
select * into #masterrecord from #yearmaster ym cross join #custmaster cm order by ym.yr,cm.customer_name
select ym.customer_name,AVG(convert(money,isnull(bl.billed_amount,0))) as average from #masterrecord ym
left join #billing bl on (year(#billing_creation_date) = ym.yr AND ym.customer_name = bl.customer_name)
group by ym.customer_name
order by ym.customer_name
man i love how do you explain, thanks from argentina genio!
Your welcome buddy ☺️
Big thanks bhai, you are doing amazing job
thanks for the constant support Ahmed ❤
Great one , enjoyed every bit of it , thanks for sharing this 🙂
glad to hear that
Great Vid !!!! I really enjoy your way of explaining complex things so easily !!! Keep going !!
Thank you Allan ☺️
Hi TFQ, thanks for sharing this problem and all your efforts!
such a wonderful video.. thanks for sharing.
Thank you thoufiq for the question.
Your welcome bro
I create rows not columns. Not as sophisticated as some below but here is my version with a cross join and each step in a subquery. Yeah too long, but fun :)
with
bill_qry as
(
select customer_id, customer_name,
year(billing_creation_date) as bill_year,
convert( decimal (10,2) , sum(billing_amount) ) as bill_sum ,
convert( decimal (10,2) , count(billing_id) ) as bill_cnt
from billing
where year(billing_creation_date) > 2018 -- should be a prompt
group by customer_id, customer_name, year(billing_creation_date)
),
d_year as
(
select min(bill_year) as min_year, max(bill_year) as max_year -- one row
from bill_qry
),
y_range as
(
select n = 0, (min_year) as t_year --year 1
from d_year
union all
select n+1, (min_year) as t_year --year 1
from y_range, --recursive query
d_year -- one row
where n < max_year - min_year -- can't be over 100 rows
),
year_table as
(
select t_year + n as t_year
from y_range --use this table to cross join the data table
),
bill_qry2 as
(
select t_year , customer_id, customer_name
,case when t_year = bill_year then bill_sum else 0 end as bill_sum2 --create rows instead of columns
,case when t_year = bill_year then bill_cnt else 0 end as bill_cnt2 --create rows instead of columns
from year_table cross join bill_qry
),
bill_qry3 as
(
select t_year, customer_id, customer_name,
sum(bill_sum2) as bill_sum3, --years with no entry reduced to one row, zero amount, zero instance
sum(bill_cnt2) as bill_cnt3 --years with no entry reduced to one row, zero amount, zero instance
from bill_qry2
group by t_year, customer_id, customer_name
),
bill_qry4 as
(
select
customer_id, customer_name,
bill_sum3 as bill_sum4,
case when bill_cnt3 = 0 then 1 else bill_cnt3 end as bill_cnt4
from bill_qry3
)
--final query
select customer_id, customer_name,
convert ( decimal (10,2), ( sum(bill_sum4) / sum(bill_cnt4) ) ) as avg_pay_final
from bill_qry4
group by customer_id, customer_name
Wonderful thoufique . It was really helpful
Super brooo...u rocked it maaaaaannnnn really u are SQL god
Nice video. Very clear explanation. Keep it up.
with prasad as
(
select
*,
extract (year from billing_creation_date) as years
from billing
where billing_creation_date between '2019-01-01' and '2021-12-31'
)
select
customer_id,
sum(billed_amount)/
(case
when count(distinct years)=1 then count(years)+2
when count(distinct years)=2 then count(years)+1
else count(years)
end) as billed_amount
from prasad
group by 1;
Great work. Lots of fun. 👍🏽
Superb explanation 👌 👏 👍
🙂Very Nice Explanation.
Thank you 🙏🏼
For the first condition A they have mentioned billing amount should be taken for a year then the total billing amount for 2020 would be 100+150$=250$ and then we have to take the average which would be 0+250+100=350/3=116.66 rather than dividing it by 4. I think this is also a typo error.
I think so too, it confusing me at first. What about it @techFTQ?
Yes, that is clearly an error. The average is per year therefore the division must be done always by 3 years, not four.
Unless the average is per billing event, then we need to ignore the cases with zero.
@@mikatu Wrong, learn english
Thank You, been looking through the comments to see if anyone else had the same query. So the amounts should be :
1 A 116.67
2 B 200.00
3 C 183.33
I think Below Query Is more Dynamic with smaller output and less character But I was Learn From Your video.
declare @f_y int = 2019,@l_y int = 2021
;with ctc as(
select @f_y as y
union all
select y+1 as y
from ctc
where y
Excellent explanation bro...
Hi bro, I learnt a lot after watching your videos, please make more videos
You nailed it.
Great explanation as usual, thanks for great walkthrough 💯
Good Explanation ❤
I create rows not columns. 😃
I used SQLLITE
drop table billing;
create table billing
(
customer_id int
, customer_name varchar(1)
, billing_id varchar(5)
, billing_creation_date DATE
, billed_amount int
);
insert into billing values (1, 'A', 'id1', DATE('2020-10-10'), 100);
insert into billing values (1, 'A', 'id2', DATE('2020-11-11'), 150);
insert into billing values (1, 'A', 'id3', DATE('2021-11-12'), 100);
insert into billing values (2, 'B', 'id4', DATE('2019-11-10'), 150);
insert into billing values (2, 'B', 'id5', DATE('2020-11-11'), 200);
insert into billing values (2, 'B', 'id6', DATE('2021-11-12'), 250);
insert into billing values (3, 'C', 'id7', DATE('2018-01-01'), 100);
insert into billing values (3, 'C', 'id8', DATE('2019-01-05'), 250);
insert into billing values (3, 'C', 'id9', DATE('2021-01-06'), 300);
select * from billing;
-- Soltion ❤
with billingTable as (
select customer_id, customer_name, strftime('%Y',billing_creation_date) as year, billed_amount from billing where year in ('2019', '2020', '2021')
), missingYears as (
select customer_id, customer_name, 3 - count(distinct year) as missingYears from billingTable group by customer_id, customer_name
), sumBilling as (
select customer_id, customer_name, sum(billed_amount) as amount_sum, count(year) as countYears from billingTable
group by customer_id, customer_name
), result as (
select missingYears.customer_id, missingYears.customer_name, sumBilling.amount_sum, (sumBilling.countYears + missingYears.missingYears) as countYears
from missingYears inner join sumBilling on missingYears.customer_id = sumBilling.customer_id and missingYears.customer_name = sumBilling.customer_name
)
select customer_id, customer_name, ROUND(amount_sum*1.0 / countYears, 2)||'$' as avg_billed_amount from result
Thanks for explanation ☺
your welcome
this works for me:
SELECT customer_id,
customer_name,
sum(billing_amount) AS total_billing_amt,
count(billing_id) as billing_cnt,
3-count(distinct year) as missing_billing_cnt,
sum(billing_amount)/(count(billing_id)+3-count(distinct year)) as avg
FROM (select *
from df
where year >= 2019 and year
Why do we divide by 4 for 'A'?
In the data for 'A' we see only 3 bill_id (2 in 2020 and 1 in 2021). For 2019 we set bill_amount as 0 but there is no bill_id.
How it could be 4th bill_id if there was nothing to sell?
It is wrong. The average for customer A and B are incorrect in the problem.
I feel that this problem was never used in an interview, or if it was it was done very poorly.
We can create a cte with their customer name and number of time it comes in between 2019 to 2021 and then we can find the sum of the values for the years between 2019 and 2021 and then divide it by the count values which we get from cte and group by the customer names
How about this?
select customer_id,
customer_name,
concat(round((sum(billing_amount)/(count(distinct billing_id) - count(distinct year(billing_creation_date)) + 3)),2),"$") as avg_bill_amt
from data
where bill_date between '2019-01-01' and '2021-12-31'
group by 1,2;
::decimal (only this part) isn't working in microsoft sql server..whats the other way of changing this avg_bill_amount into decimal?
Simplified solution in mssql
select
customer_id,
sum(billed_amount)/
(case
when count(distinct year(billing_creation_date))=1 then count(year(billing_creation_date))+2
when count(distinct year(billing_creation_date))=2 then count(year(billing_creation_date))+1
else count(year(billing_creation_date))
end) as billed_amount
from billing
group by customer_id;
Great practice
Keep it up 😃 I just love the way you explain minute things about sql ....thank you for not making me think that sql is complex 😇
Thank you 🙏🏼 glad you liked it
Please solve more interview questions like this
I would resolve it like that:
with all_year as
(select 2019 as year from dual
union
select 2020 as year from dual
union
select 2021 as year from dual),
all_ids_sal as(
select distinct id, b.year,
case when b.year in (select year from billing c where a.id=c.id) then (select salary from billing d where d.id=a.id and d.year=b.year) Else 0 end as salary
from billing a, all_year b)
select id, round(avg(salary)) from all_ids_sal group by id
can we use select *, avg(billing amount) from table name where date between 2019 and 2021 group by custmer name ;
Which one u r using for execution
Following is my dynamic solution to the problem:
With c_tot as {
Select
customer_id
,customer_name
,SUM(billing_amount) over( partition by customer_id) as sm
,COUNT(billing_creation_date) over (partition by customer_id) as nbr
,COUNT(DISTINCT YEAR(billing_creation_date)) over ( partition by customer_id) as discnbr
From
Where YEAR(billing_creation_date) ≥2019 and YEAR(billing_creation_date) ≤2021
}
Select
customer_id
,customer_name.
sm/(nbr+(3-discnbr)) as average_billing
From c_tot
P.s: (end_year - start_year)+1 =3
Interesting: classic case of overfitting with the test sample. The averages should be calculated by 3(considering 3 years), unless this is a made up scenario where they are looking for per year average and the denominator is the count of bill amounts
PLEASE PROVIDE THE TABLE CREATION AND DATA INSERTAION QUERIES WHEN U DO SUCH KIND OF VIDEOS.THAT WILL HELP SAVE TIME
I couldn't download the dataset just the SQL script available for download.
Sir actually the output for B is 150 as given in question and not 200 as you said. In 2021 there is no transaction for B so it will be considered as 0. Final avg will be 600/4 = 150
I see there is a mistake in the output..What toufiq said is right. We have 2021 data of $250 for Customer B . We have data for 3 years. Hence should be divided by 3. However, what I feel is for Customer A we have data for 2 years ideally and 2019 data isn't available. Which I ideally feel should be averaged for 3 years rather 4 years by clubbing 2020 data. So the result should be 116.67 . What are your thoughts?
@@arjundev4908 Here, we need to find 'avg billing amount' for each transaction, not for each year...so it has to be divided by total number of transactions ... i hope you got it
@@Siddharth_Matada wrong! otherwise you don't divide the years with zero since there was no transactions.... duh!
@@arjundev4908 Actually your thinking is also correct but what Toufiq has done is also correct. In the interviewer's question it's not clearly mention that if we have to do the average by year or overall average..
am I tripping or you blind
its clearly an error
is it not possible to setup a scan 2019 to 2021 instead of individual listings?
I find the solution you provided to be quite complex, inefficient and a bit too specific. Here's my solution:
Table Definition:
Billing (
cid int,
cname varchar(10),
bid varchar(10) primary key,
bdate date,
bamount float
)
Query:
SELECT b.cid, b.cname, SUM(b.bamount) / (COUNT(*) + 3 - COUNT(DISTINCT(YEAR(bdate)) AS average
FROM billing b
WHERE YEAR(b.bdate) BETWEEN 2019 AND 2021
GROUP BY b.cid;
Explanation:
Instead of having so many cases, we can add together the total no of billings per customer + the no of years that don't have billings (3 - COUNT(DISTINCT(YEAR(bdate)).Moreover, before grouping by customer, we filter using "WHERE YEAR(b.bdate) BETWEEN 2019 AND 2021".
Hope it's useful! :))
Thank you sir. How do we approach it if we have about 50 years worth of data and obviously we don't want to write CASE statement for each year.
we can solve it without case statement too. there is always multiple ways to solve a sql problem. just need to spend some time to think alternate ways
@@techTFQ That's True. Thank you for all your efforts in uploading these videos.
We can use CTE to create a temp table carrying distinct years from the input data and then join it back with the data again to compute the sum n count values without running case for each year !!
@@naash9137 With cross join? That worked for me. But if there are millions of customers not sure if cross join is efficient.
Can we do this without case statement by just using.."between date1 and date2" along with where condition, group by ??
Would a better, more dynamic solution be a date scaffold? Reduce the need for lots of case statements
Good solution as per the problem statement.
But while calculating Average billing amount for three years, we should divide the total billing amount by 3 rather than by number of bills being considered.
May be the problem statement example is not accurate.
my approach:
with c as
(select *
from
(select *,row_number() over(partition by y,customer_id order by customer_id) rn
from
(select *, year(billing_creation_date) y
from billing) t) t1
where rn=1 and y >= 2019 and y
select customer_id,round(sum(billing_amount)/3) as avg_bills
from customer_demo where
customer_id = 1 or customer_id = 2 or customer_id = 3 group by customer_id;
Is this querry correct?
A lot of code. You can do select customer_name, customer_id, sum(billed_amount)/3 where year(billing_creation_date) between 2019 and 2021 group by customer_name, customer_id
Exactly what I thought a lot of code for a simple solution. I think this is where understanding basic principles of math really help simplify code.
This is a smart and simpler way to do this. Everyone else is just out here writing unnecessarily compounded and lengthy codes.
Wouldn't work
We can also use avg() if using groupby
This won't work. If there are multiple transactions in a year.
Avg() won't work as years with no data should be treated as 0 with a count of 1. This code is good as it is.
Why we weren't change null to 1?,it can be show 1 instead of null value?
How many case statements we would write if we r asked to find average from 2001 to 2022 or even larger no of year....any other solution please ?
I think an easier solution is to aggregate the sum amount divided by the number of years and use a where clause to limit the billing year.
Hi Thoufiq - Great explanation. I was thinking, what if the problem statement and data set was expanded to 10 or 20 years and how would that impact the solution.
I feel the solution would work fine without much performance issue but still need to test it out with large data to be sure..
I think this solution works nicely for a small range like the one he shown, but if we were to expand it, I'd rather use nested selects in order for it to be scalable
More videos please🙂
noted, will try
very nice
Great explanation sir....👍
I have doubt
Can't we use aggregate function to sum_2019,20,21 to print average of them
Definitely. This does not require any if statements
Select customer_id, avg(Billing_amount) from table where date between 2019 and 2020
PS: PSUEDO Code
Hi @techTFQ.
In the above video,I have a doubt.
Here the date range is only from 2019 to 2021.
Suppose we have a data range from 2009 to 2020.
I would like to know if this solution works if we have the above range?
Please let me know.
WHERE CAST(STRFTIME('%Y", billing_creation_date) AS INT) BETWEEN (2009 AND 2020) .
.
.
.
In this case get the year by using strftime we will get for example: "2010" in string
so change the string into integer by using CAST function then we will get 2010 in INTEGER
then use BETWEEN operator (2009 AND 2020) it will give in between years like 2009, 2010, .....2020)
@@thisiskarthik4348 Thank you for the solution.
How are we getting count as 3 for all the cells
Which function will use in sql server instead of to_char ( there is no in-built function in sql server)
Check year() once
@techTFQ Please Help!!! at 15:11 it is when i execute this query with cte as it is giving me error
ERROR: syntax error at end of input
LINE 22: group by customer_id, customer_name)
I ma stuck here i have writtten exactly same code
I would be asking the interviewer if they wanted the average transaction billing for a customer for all transactions between 2019 to 2021 or if they wanted the average for each customer for each year (2019,2020,2021).
Throwing in a transaction for 0 just because the customer did not have any other transactions that year makes no sense.
Either the author of the post copied the problem's details wrongly or the problem was wrongly stated in the first place.
In any case what average is wanted by the interviewer should be more clearly stated.
If only the given info were available I'd assume that you should make an average over the single year for a single customer ( E_Y = Sum[transactions_yearY]/#transactions_yearY] ) and then average over the years ( E = Sum[ E_Y1, E_Y2, ..., E_YN]/N]. With this formula customer A in the table would have an average of 75, not 87.5.
Journalist here. I would do that with the Help of knime software, using joiner node and math formula node, also, a little bit or rule engine node.
amazing.
hello Thoufiq, assume you're working on a huge data, is it gonna be the perfect solution to provide? second why didn't you use the avg function with a where clause testing on the extracted year from billing_ date which should be between 2019 and 2021 ?
I mean this way :
with billing_2019 as (
select customer_id,customer_name, extract(year from billing_creation_date) as billing_year,
billed_amount as billing_amount
from p_work.billing)
select customer_id,customer_name,round(avg(billing_amount),2) as billed_amount from billing_2019
where billing_year between 2019 and 2021 group by 1,2 order by 1 asc ;
Did you try running the query? That doesn't meet the answer/criteria. You need to account for years customers were not billed, which changes the averages. Yes we are trying to find the average billing amount, but you need to include 0 for the years the customer wasn't billed which is broken down in this video.
select customer_name,
sum(billing_amount)/(count(*) - count(distinct year(billing_creation_date)) +3) AS AVG_Billed_amount
from table1
where year(billing_creation_date) BETWEEN '2019' and '2021'
group by customer_name;
does this works
Hi Thoufiq! Great way to explain. Thank you. Please let me know if you can help with solving any particular interview query that is related to Day 1 retention rate for users. Really appreciate it.
what if we are asked to cacluate last 20 years like this ? is there any recursive way of doing this ?
This is a generic one... you can pass the year range as parameters.
with y as
(
select yr from generate_series(2019, 2021, 1) yr
),
c as
(
select distinct(cust_id) from bill
)
select x.cust_id,
sum(coalesce(y.sum_amt, 0)),
avg(coalesce(y.sum_amt, 0))
from (
select yr, cust_id
from c cross join y
) x
left join
(
select cust_id,
date_part('year', bill_date) as billyr,
sum(bill_amt) as sum_amt
from bill
group by cust_id, date_part('year', bill_date)
) y
on x.cust_id = y.cust_id and x.yr = y.billyr
group by x.cust_id
order by x.cust_id;
Hi, I an interview I got a question, alternative commands for full outer join
Please let me know this will or not
with cte as (select *,year(billing_creation_date) as yr from billing where year(billing_creation_date) in (2019,2020,2021)),
cte2 as (select customer_id,customer_name,sum(billed_amount) as total_amount ,case when count(distinct yr)=1 then count(*)+2 when
count(distinct yr) = 2 then count(*)+1 when count(distinct yr)=3 then count(*) end as all_yr from cte group by customer_id,customer_name)
select customer_id,customer_name, round(total_amount/all_yr,0) as average_bill_amount from cte2;
Do we have playlist for this
Fun challenge! My take on it:
WITH cte AS (
SELECT
DISTINCT YEAR(b1.billing_creation_date) AS 'year'
, b2.customer_id
, b2.customer_name
, COALESCE(b3.billed_amount,0) AS billing_amount
FROM billing b1
LEFT JOIN billing b2
ON 1=1
LEFT JOIN billing b3
ON b2.customer_id = b3.customer_id
AND YEAR(b1.billing_creation_date) = YEAR(b3.billing_creation_date)
WHERE YEAR(b1.billing_creation_date) IN (2019,2020,2021)
)
SELECT
customer_id
, customer_name
, AVG(billing_amount) AS avg_billing_amount
FROM cte
GROUP BY 1,2
ORDER BY 1,2
wow an easy question
My Query
with bill_summary as
(select customer_id,
customer_name,
sum(billed_amount) as total_bill,
count(customer_name) as bill_count,
count(distinct(extract(year from billing_creation_date))) as distinct_years
from billing
where extract(year from billing_creation_date)>='2019'
group by customer_id,
customer_name)
select customer_id,
customer_name,
round(total_bill/(bill_count+3-distinct_years),2) as average
from bill_summary
hi I need one more example of SQL YOY contribution% and rolling average if you have year from 2017-2022...and random sales numbers... Pls can u make one more video as your explanation is quite simple to understand
This was a interview question
Easy. You might need to add if else statements in case count(value) = 0…. Any ways you can do select year(date_column) as “year”, sum(value_column)/count(value_column) as “Average Sale” group by year(date_column) order by year(date_column)
Anything like item_id that you want to add in… just add to group by and select statement.
easier solution-
select customername,customerid,
case when max(dr) = 3 then round(sum(billing_amount)/count(customerid),2)
when max(dr) = 2 then round(sum(billing_Amount)/(count(customerid)+1),2)
when max(dr) = 1 then round(sum(billing_Amount)/(count(customerid)+2),2)
end as dlsf
from(select *,dense_rank() over(partition by customerid,customername order by billing_cd) as dr from averagee where year(billing_cd) in('2019','2020','2021')) group by customerid,customername
Cant we use avg function directly? And also we can give filter like Year(creation date) between 2019 and 2021 in the where clause before group by clause.
No, that will not give the correct denominator in the average for example for A this give denominator as 3 instead of 4 as we have only 3 entries for A and do not have 2019 entry.