Thank you so so much!!! I took notes, pictures and wrote down every single word you said and I really am starting to understand proc sql! Thank you so much for taking the time to do this!
Thank you! I do have a question though: at minute 47:05, I don't understand why were you able to access yearhired in a group by? Group by was supposed to be the third in the priority order, so yearhired should not be accessible? What am I missing? Thanks a lot for considering answering. I'm a newbie in SQL, so I'm often confused :)
See reply from the presenter below: I will break it down to 2 statements where Yearhired appears. First we see Yearhired on the SELECT. The SELECT is simply a request to SQL to list columns you wish to see and in what order. Here we specify 2 non summarized columns: country, year(hiredate) which is our YearHired column. And one summary column - count(*) as numemp. Next we see Yearhired on the GROUP BY as the 2nd column by which to group. This is of interest to us and will likely answer your question. In the GROUP BY we let SQL know that we wish to group by country and then yearhired. Thus, the SELECT in conjunction with the GROUP BY, gives us rolled up rows and summary counts for each combination of Country and yearhired ( thanks to our summarized column numemp on the SELECT).
Hi charu, Both where and having is been executed before select statement but why where doesn't take a new variable created where as having takes. I cannot understand why is it so. For example count (*) as count in select statement. This count column can be accessed by having where as not by where. Pls explain.
Thanks for the question Asha. The WHERE only has access to existing data, meaning original, unprocessed data from tables on the FROM. the minute data gets processed as in the COUNT(*) and has no resemblance to the original, the WHERE can no longer filter that data. I find a good way to remember the distinction is : The WHERE acts as a filter on individual rows before they are grouped. The HAVING acts as a filter on grouped data. Hope this helps understand the distinction.
From the syntax logical order the "groupby" and "having" come before the "select". And so the calculated column can't be used in the "where". But why are you able to use the calculated column in the "groupby" which also comes before the "select" without a syntax error? This is at the 48min of the above recording.
Please listen again to the whole section that starts at time stamp 41:46 with the step-by-step breakdown of SQL processing. In this section, the instructor explains that even though the SELECT clause appears first in SYNTAX coding order, the WHERE clause is actually processed by SQL before the SELECT clause. So, at the point in time when SQL is executing the SELECT, the new column name has not been created yet. This is why the approach shown in the code example (to use a date constant) or the alternate approach of moving the YEAR function to the WHERE would work. Another possibility would have been to use the "calculated" keyword with the new column name. The "calculated" keyword helps SQL understand that this is a calculation that must be pulled from the SELECT. And in fact, if the instructor had used the calculated keyword on the WHERE clause (SQL would perform the calculation on the WHERE). Both the WHERE & The GROUP BY can access newly built columns on the SELECT that don’t already exist on the table listed on the FROM. In this example that you mentioned, the instructor didn’t use the calculated keyword on the GROUP BY, and SQL accepts that syntax without explicitly using "calculated", but SQL is still going to perform the calculation on the GROUP BY even though it’s the 3rd logical step SQL executes before the SELECT. It is important to remember that the coding order of the clauses is NOT the same as the execution order. That is one of the important points of this last section in the video. For more information about SQL processing, we have 2 classes you might want to look at: SQL 1: Essentials 2.sas.com/6055HOmmX SQL Methods and More 2.sas.com/6056HOmmk And, here’s a Community Forum link that also talks about this. 2.sas.com/6057HOmmZ
As shown at time stamp 44:39 - 45:30, you can see when the presenter runs the code that, it is a mistake to use YEARHIRED in the WHERE clause because she explains in the video that the FROM is executed first and then the WHERE is executed next. At the point in time when the WHERE is executed, YEARHIRED has not been calculated yet so it is not available to the WHERE. However, as shown at time stamp 47:09 when she runs the GROUP BY code that lists YEARHIRED, you can see that it works to have YEARHIRED in the GROUP BY clause. At the point in time when the GROUP BY is being executed, the YEARHIRED variable has already been created. To learn more about SQL processing with SAS, we recommend our SQL 1 class, as described here: 2.sas.com/6053HZeRH .
Hello again, Shenwaski, and thanks for your question about dictionary tables in SAS! If you scroll up to the information section below this video and click on "SHOW MORE", you will see, under "Additional Resources", a link for "PROC SQL DICTIONARY Tables PDF", which is a conference paper by our own Charu Shankar: "Know Thy Data: Techniques for Data Exploration" 2.sas.com/6054GASwp This paper will help you "learn to employ powerful PROC SQL’s dictionary tables to easily explore aspects of your metadata." It also explains what dictionary tables are, and gives several examples of how and why you might want to use them. "Dictionary tables contain a wealth of information about your SAS session. They are special read-only PROC SQL tables or views. They are created upon SAS invocation, updated automatically by SAS and are available throughout a SAS session. They provide information about SAS libraries, SAS data sets, SAS system options, and external files that are associated with the current SAS session and much, much more." Here is another paper that might help: "How Do I Look it Up If I Cannot Spell It: An Introduction to SAS Dictionary Tables" 2.sas.com/6055GASwV Our SAS documentation on the topic may also be helpful: 1) DICTIONARY Tables (from "SAS 9.4 Language Reference: Concepts, Sixth Edition") 2.sas.com/6056GASwn 2) DICTIONARY Tables (from "SAS 9.4 FedSQL Language Reference, Fifth Edition") 2.sas.com/6057GASwX Please let us know if we can help you with anything else!
Sharing more feedback: SQL is the acronym for for Structured Query Language. SQL is an ANSI standard language that is used for querying and reporting on data that is stored in relational databases. Most software vendors will have some implementation of SQL that you can use with their databases. The vendors usually all support the ANSI standard for SQL and then add their own features to further enhance their implementation of SQL. So, for example, Oracle SQL and Microsoft SQL Server SQL will all share features of the ANSI standard, but they might have features that are unique to their database and relational database manager software. For example this article explains in a lot of detail about the differences between these 2 SQL "flavors": 2.sas.com/6055JhSDZ
Gopi, SAS SQL does not have a PDV, therefore you do not have the ability to retain variable values like you do with the DATA step. SAS SQL has a powerful subroutine, the SQL Optimizer, that decides how the SQL query should be executed in order to minimize run time. The Optimizer examines submitted SQL code and characteristics of the SAS system and then creates efficient executable statements for the submitted query. There was a SUGI paper written about the SQL Optimizer years ago, but not sure if you want to reference it. For beginners, it may be over their head, but adding it here for your reference: The SQL Optimizer Project: _Method and _Tree in SAS®9.1 in case you want to look over it and see if you wish to add this as a reference.
Users who are continuously improving their analytical processing will benefit from this video. Follow along as Charu demonstrates the following elements to master PROC SQL: 1. Understand the syntax order in which to submit queries to PROC SQL. 2. Internalize the logical order in which PROC SQL processes queries. 3. Manage metadata using dictionary tables. 4. Join tables using join conditions like inner join and reflexive join. 5. Summarize data using Boolean operations.
We'll check on this for you and get back to you. In the meantime, you may wish to search for an answer or post your question in our SAS Support Communities. For example, this post may be helpful: How to create variables in PROC SQL? 2.sas.com/6051MUCDZ If you need to post your question, please read this: How to get fast, helpful answers 2.sas.com/6052MUCDw
Hello again, Sreeja! This video is approximately one hour of SQL topics. By comparison, our SAS SQL 1: Essentials class 2.sas.com/6050MqRfC is two days long with approximately 14 hours of lecture, demo, and hands-on practice. We cover how to create variables in the class in more depth. To create new variables, when working with an existing table or when creating a new table, you'll need to use the CASE expression. While we can't post code or screen shots here, our CASE Expression documentation (in our SAS SQL Procedure User's Guide) has two examples of how to create a new variable REGION from the existing variable CONTINENT here 2.sas.com/6051MqRfh We hope that helps!
Thank you so so much!!! I took notes, pictures and wrote down every single word you said and I really am starting to understand proc sql! Thank you so much for taking the time to do this!
We're glad you enjoyed it!
SAS aside this is a solid SQL walkthru.
Fantastic presentation, Charu!
thanks so much Kirk. Heading over to watch yours now.
Thank you for sharing your PROC SQL tips....very wonderful teaching!
Glad it was helpful!
Full SAS Sessions :-
ruclips.net/p/PLCmzhzAgjyBwafMtp7Fm7ZJj1oiwn1U5V
Clear, relevant, useful...just excellent!!
Glad it was helpful!
Fantastic tutorial. Keep going Charu 😎👍
Thank you Charu. Very useful information
Haresh, thank you so much! Glad you found it helpful!
Really nice course this course also motivated me thanks for video
Glad you found it helpful and thanks for tuning in!
Detailed explanation.. Easy to understand. Thank you 😊
Absolutely! Thanks for sharing!
Thank you! I do have a question though: at minute 47:05, I don't understand why were you able to access yearhired in a group by? Group by was supposed to be the third in the priority order, so yearhired should not be accessible? What am I missing?
Thanks a lot for considering answering. I'm a newbie in SQL, so I'm often confused :)
Thank you for your inquiry! We are checking on this for you!
See reply from the presenter below:
I will break it down to 2 statements where Yearhired appears. First we see Yearhired on the SELECT. The SELECT is simply a request to SQL to list columns you wish to see and in what order. Here we specify 2 non summarized columns: country, year(hiredate) which is our YearHired column. And one summary column - count(*) as numemp.
Next we see Yearhired on the GROUP BY as the 2nd column by which to group. This is of interest to us and will likely answer your question. In the GROUP BY we let SQL know that we wish to group by country and then yearhired. Thus, the SELECT in conjunction with the GROUP BY, gives us rolled up rows and summary counts for each combination of Country and yearhired ( thanks to our summarized column numemp on the SELECT).
Very good tutorial, I found it very easy to follow and very clear. Thanks!
Great! Emanuele, thanks for the feedback and glad you found it helpful!
Thanks this was really helpful
You're so welcome, Ravi! We're glad it helped, should you need any further assistance please don't hesitate to give us a shout! And, good luck!
lovely, great
Hi charu,
Both where and having is been executed before select statement but why where doesn't take a new variable created where as having takes.
I cannot understand why is it so.
For example count (*) as count in select statement. This count column can be accessed by having where as not by where.
Pls explain.
Thanks for the question Asha.
The WHERE only has access to existing data, meaning original, unprocessed data from tables on the FROM.
the minute data gets processed as in the COUNT(*) and has no resemblance to the original, the WHERE can no longer filter that data.
I find a good way to remember the distinction is :
The WHERE acts as a filter on individual rows before they are grouped.
The HAVING acts as a filter on grouped data.
Hope this helps understand the distinction.
Checking on this for you!
Muito bom! Excelente!
Thank you for your feedback!
From the syntax logical order the "groupby" and "having" come before the "select". And so the calculated column can't be used in the "where". But why are you able to use the calculated column in the "groupby" which also comes before the "select" without a syntax error? This is at the 48min of the above recording.
We are looking into this for you.
Please listen again to the whole section that starts at time stamp 41:46 with the step-by-step breakdown of SQL processing. In this section, the instructor explains that even though the SELECT clause appears first in SYNTAX coding order, the WHERE clause is actually processed by SQL before the SELECT clause. So, at the point in time when SQL is executing the SELECT, the new column name has not been created yet. This is why the approach shown in the code example (to use a date constant) or the alternate approach of moving the YEAR function to the WHERE would work. Another possibility would have been to use the "calculated" keyword with the new column name.
The "calculated" keyword helps SQL understand that this is a calculation that must be pulled from the SELECT. And in fact, if the instructor had used the calculated keyword on the WHERE clause (SQL would perform the calculation on the WHERE). Both the WHERE & The GROUP BY can access newly built columns on the SELECT that don’t already exist on the table listed on the FROM. In this example that you mentioned, the instructor didn’t use the calculated keyword on the GROUP BY, and SQL accepts that syntax without explicitly using "calculated", but SQL is still going to perform the calculation on the GROUP BY even though it’s the 3rd logical step SQL executes before the SELECT.
It is important to remember that the coding order of the clauses is NOT the same as the execution order. That is one of the important points of this last section in the video. For more information about SQL processing, we have 2 classes you might want to look at:
SQL 1: Essentials 2.sas.com/6055HOmmX
SQL Methods and More 2.sas.com/6056HOmmk
And, here’s a Community Forum link that also talks about this. 2.sas.com/6057HOmmZ
I had the same question.. Perhaps, she made a mistake by adding "yearhired" in the GROUP BY command
As shown at time stamp 44:39 - 45:30, you can see when the presenter runs the code that, it is a mistake to use YEARHIRED in the WHERE clause because she explains in the video that the FROM is executed first and then the WHERE is executed next. At the point in time when the WHERE is executed, YEARHIRED has not been calculated yet so it is not available to the WHERE. However, as shown at time stamp 47:09 when she runs the GROUP BY code that lists YEARHIRED, you can see that it works to have YEARHIRED in the GROUP BY clause. At the point in time when the GROUP BY is being executed, the YEARHIRED variable has already been created. To learn more about SQL processing with SAS, we recommend our SQL 1 class, as described here: 2.sas.com/6053HZeRH .
Hello charu..I want to learn SAS..do u also give classes separately too?
Vivek, tell us more how you would be using SAS and your role. We do offer e-learnin and live web courses as well.
Thanks , it was really helpful
Thanks for sharing!
sorry but I still don't know what's the function of the dictionary table or in which situation we should use it when managing data?
We're checking on this for you, Shenwaski, and will get back to you soon!
Hello again, Shenwaski, and thanks for your question about dictionary tables in SAS!
If you scroll up to the information section below this video and click on "SHOW MORE", you will see, under "Additional Resources", a link for "PROC SQL DICTIONARY Tables PDF", which is a conference paper by our own Charu Shankar: "Know Thy Data: Techniques for Data Exploration" 2.sas.com/6054GASwp
This paper will help you "learn to employ powerful PROC SQL’s dictionary tables to easily explore aspects of your metadata." It also explains what dictionary tables are, and gives several examples of how and why you might want to use them. "Dictionary tables contain a wealth of information about your SAS session. They are special read-only PROC SQL tables or views. They are created upon SAS invocation, updated automatically by SAS and are available throughout a SAS session. They provide information about SAS libraries, SAS data sets, SAS system options, and external files that are associated with the current SAS session and much, much more."
Here is another paper that might help: "How Do I Look it Up If I Cannot Spell It: An Introduction to SAS Dictionary Tables" 2.sas.com/6055GASwV
Our SAS documentation on the topic may also be helpful:
1) DICTIONARY Tables (from "SAS 9.4 Language Reference: Concepts, Sixth Edition") 2.sas.com/6056GASwn
2) DICTIONARY Tables (from "SAS 9.4 FedSQL Language Reference, Fifth Edition") 2.sas.com/6057GASwX
Please let us know if we can help you with anything else!
Amazing .!
Glad you enjoyed it!
So is SAS and SQL two seperate things... Like SAS and python??
Thank you for you inquiry! Yes, SAS and SQL are both programming languages.
Sharing more feedback: SQL is the acronym for for Structured Query Language. SQL is an ANSI standard language that is used for querying and reporting on data that is stored in relational databases. Most software vendors will have some implementation of SQL that you can use with their databases. The vendors usually all support the ANSI standard for SQL and then add their own features to further enhance their implementation of SQL. So, for example, Oracle SQL and Microsoft SQL Server SQL will all share features of the ANSI standard, but they might have features that are unique to their database and relational database manager software. For example this article explains in a lot of detail about the differences between these 2 SQL "flavors": 2.sas.com/6055JhSDZ
is it possible to retain charecter variables in pdv
Gopi, thank you for your inquiry! We are checking on this for you!
Gopi, SAS SQL does not have a PDV, therefore you do not have the ability to retain variable values like you do with the DATA step. SAS SQL has a powerful subroutine, the SQL Optimizer, that decides how the SQL query should be executed in order to minimize run time. The Optimizer examines submitted SQL code and characteristics of the SAS system and then creates efficient executable statements for the submitted query.
There was a SUGI paper written about the SQL Optimizer years ago, but not sure if you want to reference it. For beginners, it may be over their head, but adding it here for your reference: The SQL Optimizer Project: _Method and _Tree in SAS®9.1 in case you want to look over it and see if you wish to add this as a reference.
after learning this tutorial what i will be capable of?
Users who are continuously improving their analytical processing will benefit from this video. Follow along as Charu demonstrates the following elements to master PROC SQL:
1. Understand the syntax order in which to submit queries to PROC SQL.
2. Internalize the logical order in which PROC SQL processes queries.
3. Manage metadata using dictionary tables.
4. Join tables using join conditions like inner join and reflexive join.
5. Summarize data using Boolean operations.
What if you have a date to dichotomize like you did for the mangers and employees?
Thank you for your inquiry! We are checking on this for you!
You would list your date variable in the GROUP BY clause.
which dataset are you using
You can find the data used for this tutorial on GitHub 2.sas.com/6051GeNyJ.
Also, the associated paper in resources details where to download the data sets.
How to define sex variable as f = female and m= male in proc SQL program what is the code for that
We'll check on this for you and get back to you. In the meantime, you may wish to search for an answer or post your question in our SAS Support Communities.
For example, this post may be helpful: How to create variables in PROC SQL? 2.sas.com/6051MUCDZ
If you need to post your question, please read this: How to get fast, helpful answers 2.sas.com/6052MUCDw
Hello again, Sreeja!
This video is approximately one hour of SQL topics. By comparison, our SAS SQL 1: Essentials class 2.sas.com/6050MqRfC is two days long with approximately 14 hours of lecture, demo, and hands-on practice. We cover how to create variables in the class in more depth.
To create new variables, when working with an existing table or when creating a new table, you'll need to use the CASE expression. While we can't post code or screen shots here, our CASE Expression documentation (in our SAS SQL Procedure User's Guide) has two examples of how to create a new variable REGION from the existing variable CONTINENT here 2.sas.com/6051MqRfh
We hope that helps!