I never had stopped to understand this topic but I had to use it last week and I was confused. Thanks for the clear explanation!
Happy to help. 😀
Thanks for this sir..
Glad you liked it 👍
I am happy to say that I learned a new thing today. The code you explained is much easier to write compared to the traditional approach. Very well explained. Thank you very much!
Glad you liked it and it helped.
That was one close to perfect tutorial for me.
Please let me know where I missed the perfect score.
Ha Ha
Glad you liked it and happy to help 😊
Thanks for explaining an advanced topic in such an easy way
Happy to help.
Any other topic would you like me to cover in future videos ?
Thanks brother you are life saver. May god bless you.
Glad you liked it
Gone through many videos and articles but your explanation cleared all doubts
Glad it helped.
Very nice explanation,thanks
Glad you liked it 👍
Really amazing all in 1 video very crisp and covered all the combinations. Really Thanks!
Glad you liked it 👍
You make it easy 👍👍
Glad you liked it 👍
You keep saying "current row" - how do you decide which row is the current row, and in which real-time scenarios do we use it?
Hi Priyanshu
I will try a different approach to explain it.
Say you have 100 apples in a bucket and you want to move all of them to another bucket, one apple at a time. You also want to give each apple a serial number based on weight, such that the heaviest apple gets number 1 and the lightest gets number 100. So you first sort all the apples by weight, then pick the heaviest one, mark it as 1, and put it in the new bucket. This apple is the current apple for now. Once you pick the next apple for labelling, it becomes the current apple, and the one already in the new bucket is the preceding one. So serial 1 is preceding, 2 is current, and 3 is following.
Similarly, when Teradata scans a table and retrieves the rows in the order given by the ORDER BY you specified in the OLAP function, it picks the first row and applies the function to it, then the next row, and so on.
There are many use cases for this. For example, if you want to compare today's sales amount with yesterday's to see whether today was profitable and by how much, you can compare the current row with the previous row and simply subtract the sales amounts.
Example
Date | Sale_Amount
01-Jan | 2000
02-Jan | 3000
03-Jan | 2500
For 01-Jan there is no previous row, so the profit is NA.
For 02-Jan, you do current row (3000) - previous row (2000) and get a profit of Rs 1000.
For 03-Jan, you do current row (2500) - previous row (3000) and get Rs -500, a loss.
Hope this helps.
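The day-over-day comparison described above can be sketched with the LAG window function. The snippet below uses Python's built-in SQLite for illustration only (the thread is about Teradata, where the same `LAG(...) OVER (ORDER BY ...)` syntax applies); the table and column names are made up:

```python
import sqlite3

# Day-over-day profit via LAG: each row is compared to the previous row
# in sale_date order, as in the example above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_date TEXT, sale_amount INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("01-Jan", 2000), ("02-Jan", 3000), ("03-Jan", 2500)],
)

rows = con.execute("""
    SELECT sale_date,
           sale_amount,
           sale_amount - LAG(sale_amount) OVER (ORDER BY sale_date) AS profit
    FROM sales
""").fetchall()

for r in rows:
    print(r)
# ('01-Jan', 2000, None)   -- no previous row, so profit is NULL/NA
# ('02-Jan', 3000, 1000)
# ('03-Jan', 2500, -500)
```

Note that for the first row LAG returns NULL, which matches the "NA" case for 01-Jan.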
@@ETLSQL so it's like a random selection followed by a sorting step?
thanks
Yes. The selection looks random also because Teradata distributes data across AMPs. Each AMP returns the rows it has, and then the sorting and the corresponding analytical function are applied.
@@ETLSQL A good example. But be careful that Apple doesn't sue you ;-)
Thank you very much! Well explained!
Glad you liked it.
Better explanation than chatgpt
Ha Ha.
Very well explained, I have been looking for a video like this. Thank you !
Glad it was helpful!
Very well explained, thank you so much!
Glad it was helpful!
Great explanations, thanks so much
Glad you liked it.
awesome explanation. super cool.
Glad you liked it!
Excellent, Thank you.
Glad you liked it 👍
Great video, thank you!
Glad you liked it
What if you want the frame to be based on a date? Instead of looking back 7 rows, look back 7 days, which could be 3 rows or 15 rows?
Interesting question.
Let me know if you find the efficient solution for this.
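One way a date-based lookback is often handled is to drop the fixed window frame and self-join on a date range instead, so each day sums everything within the prior 7 calendar days regardless of how many rows each day has. A sketch using SQLite for illustration (table and column names `txn`, `dt`, `amount` are made up, not from the video):

```python
import sqlite3

# 7-calendar-day rolling sum via a range self-join: for each distinct date,
# sum all rows whose date falls in [that date - 6 days, that date].
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txn (dt TEXT, amount INTEGER)")
con.executemany("INSERT INTO txn VALUES (?, ?)", [
    ("2023-01-01", 100), ("2023-01-01", 50),  # a day can have any number of rows
    ("2023-01-05", 200),
    ("2023-01-10", 300),
])

rows = con.execute("""
    SELECT a.dt, SUM(b.amount) AS last_7_days
    FROM (SELECT DISTINCT dt FROM txn) a
    JOIN txn b
      ON b.dt BETWEEN date(a.dt, '-6 days') AND a.dt
    GROUP BY a.dt
    ORDER BY a.dt
""").fetchall()
print(rows)
```

On large tables this join can be expensive; databases that support `RANGE ... PRECEDING` with an interval offset can express the same thing as a window frame.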
This is really well explained.
Glad you liked it.
thank you
Happy to help ☺️
Great explanation, very useful 👍
Glad it was helpful!
Very useful 👍
Glad to hear that
Hey this was awesome! Great explanations, thanks so much
Glad it was helpful!
THANK YOU SO MUCH!
Happy to help
Thanks for this.
Happy to help.
Will you please explain the difference between range and rows?
It depends on the distribution of the data you have: if there are semantic gaps (sparsity), you use ROWS; otherwise (dense distribution), you use RANGE. For a better understanding, look up the difference between logical and physical grouping within the computation window.
I hope this is clear to you now
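The logical-vs-physical grouping point above can be made concrete: with duplicate ORDER BY values, ROWS counts physical rows one by one, while RANGE treats all peers (rows with equal ordering values) as one logical group. A minimal sketch in SQLite (illustrative table names, not from the video):

```python
import sqlite3

# Running sums over the same data with ROWS vs RANGE frames.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 30), (3, 40)])

# Physical frame: each row extends the running sum individually, so the two
# k=2 rows get different sums (their relative order within the tie is arbitrary).
rows_sum = con.execute("""
    SELECT k, SUM(v) OVER (ORDER BY k
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM t ORDER BY k, v
""").fetchall()

# Logical frame: peers are included together, so both k=2 rows see 10+20+30 = 60.
range_sum = con.execute("""
    SELECT k, SUM(v) OVER (ORDER BY k
                           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM t ORDER BY k, v
""").fetchall()

print(rows_sum)
print(range_sum)   # [(1, 10), (2, 60), (2, 60), (3, 100)]
```

This is also why a ROWS frame over a non-unique ORDER BY can give nondeterministic results, while RANGE stays stable.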
Hello, this is really an amazing video! All those row_number & CTE workarounds are gone.
I have 2 questions :
1. What if I want to fill my current row with data from the previous row and the value is a string, not an integer?
2. What if I want to fill the current row from previous row data but don't know the exact row position where the data is present (i.e., not null), say for a large dataset? In this case, how do I identify which previous row has a value?
Thank you so much for explaining this concept Sir.😊
Subscribed ❤
Thank you. From the 2 questions, I can clearly see that you have built a next-level understanding of SQL concepts, so kudos to you.
1) When we say previous row, current row, or next row, there is a sort order tagged to it; otherwise, how could we determine which row is previous or next? So it does not matter whether the column is string or numeric, as long as the ORDER BY clause has the proper columns for the requirement.
Beware: 2000 > 30, but '2000' < '30', so use the ORDER BY column carefully with string values.
Generally, we use numeric or date columns.
2) You can use UNBOUNDED PRECEDING with the MAX/MIN function (depending on the sorting) to get the value of the last NOT NULL row.
Let me know if you need more input on this. I can create a quick video for you with explanation :)
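Suggestion 2) can be sketched as follows. MAX over an UNBOUNDED PRECEDING frame ignores NULLs, so it carries the last non-NULL value forward; the caveat (which is what "depending on the sorting" hints at) is that this is only a true last-value fill when the values are non-decreasing in the sort order. Illustrated with SQLite and made-up table names:

```python
import sqlite3

# Fill NULLs from the last non-NULL value using a cumulative MAX.
# Works for strings too, since MAX compares them lexicographically.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (id INTEGER, val TEXT)")
con.executemany("INSERT INTO obs VALUES (?, ?)",
                [(1, "A"), (2, None), (3, None), (4, "B"), (5, None)])

rows = con.execute("""
    SELECT id,
           MAX(val) OVER (ORDER BY id
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS filled
    FROM obs ORDER BY id
""").fetchall()
print(rows)   # [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'B')]
```

If the values are not monotonic, databases that support `LAST_VALUE(... IGNORE NULLS)` give an exact fill instead of this MAX/MIN trick.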
@@ETLSQL Thanks for your quick reply! Yes, I got it and understand the logic behind both questions.
You can make a video on this scenario :
We have a table with 3 columns - cust_id, status, captured_dt.
Example:
Cust_id | Status | captured_dt
11 | Opt-In | 2021-01-01
11 | Opt-out | 2021-05-01
11 | Opt-In | 2022-03-01
Now I have to expand this for each month up to May 2023, so the data looks like this:
Cust_id | Status | captured_dt
11 | Opt-In | 2021-01
11 | Opt-In | 2021-02
11 | Opt-In | 2021-03
11 | Opt-In | 2021-04
11 | Opt-out | 2021-05
11 | Opt-out | 2021-06
....
Like this till last month (may-2023).
PS: avoid using a CTE.
Sorry for the long explanation! 😬 Just encountered this problem somewhere.
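One possible CTE-free sketch of this month-expansion: join a pre-built month calendar to the status table, and for each month pick the latest status captured on or before that month via a correlated subquery. Shown in SQLite for illustration, with a hand-built `months` table standing in for a warehouse calendar/date dimension (the `status_hist` name is made up; data follows the example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE status_hist (cust_id INTEGER, status TEXT, captured_dt TEXT)")
con.executemany("INSERT INTO status_hist VALUES (?, ?, ?)", [
    (11, "Opt-In",  "2021-01-01"),
    (11, "Opt-out", "2021-05-01"),
    (11, "Opt-In",  "2022-03-01"),
])

# In a real warehouse this would be a calendar/date dimension table.
con.execute("CREATE TABLE months (month TEXT)")
con.executemany("INSERT INTO months VALUES (?)",
                [("2021-0%d" % m,) for m in range(1, 7)])  # 2021-01 .. 2021-06

rows = con.execute("""
    SELECT m.month,
           (SELECT s.status
            FROM status_hist s
            WHERE s.cust_id = 11
              AND strftime('%Y-%m', s.captured_dt) <= m.month
            ORDER BY s.captured_dt DESC
            LIMIT 1) AS status
    FROM months m ORDER BY m.month
""").fetchall()
for r in rows:
    print(r)
```

Each month picks up the most recent status change at or before it, so 2021-01 through 2021-04 show Opt-In and 2021-05 onward show Opt-out, matching the expected output above.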
This is superb!!
I have 2 Questions:
1. How is the performance of this query compared to our usual group by / join / window functions? (I have data of about 2 billion rows)
2. Also, I have a use case that involves going back 7 days based on a 'date' column (since one day can have a random number of rows, I cannot use 7 PRECEDING) and doing a SUM(amount). Is there a way I can achieve this using these techniques?
Hi Bhuvanesh,
Glad you liked it.
1) Performance-wise, in most cases it should be better than a combination of GROUP BY + JOIN. But you may run both SQLs and check.
2) In this case, I think it is better to approach the problem by creating a histogram of 7-day buckets. As the number of rows is dynamic, implementing it with a straightforward window function could be challenging here.
Best,
Nitin
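The "histogram of 7-day buckets" idea above can be sketched like this: assign each row to a bucket by integer-dividing its day offset from a start date by 7, then aggregate per bucket. SQLite is used for illustration and the names (`txn`, `dt`, `amount`, the start date) are assumptions:

```python
import sqlite3

# Fixed 7-day buckets: bucket 0 covers days 0-6 from the start date,
# bucket 1 covers days 7-13, and so on.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txn (dt TEXT, amount INTEGER)")
con.executemany("INSERT INTO txn VALUES (?, ?)", [
    ("2023-01-01", 100), ("2023-01-03", 50),
    ("2023-01-08", 200), ("2023-01-15", 300),
])

rows = con.execute("""
    SELECT CAST((julianday(dt) - julianday('2023-01-01')) / 7 AS INTEGER) AS bucket,
           SUM(amount)
    FROM txn
    GROUP BY bucket ORDER BY bucket
""").fetchall()
print(rows)   # [(0, 150), (1, 200), (2, 300)]
```

Unlike a rolling 7-day window, these buckets are non-overlapping, which is what makes them cheap to compute on very large tables.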