Probably the only person who tells you facts and reality in the data science community.
Very well done.. Very helpful for learning Apache Spark with a real end-to-end business case.
Thank you for the wonderful video... I have a question: you mentioned we should use sortWithinPartitions to avoid expensive transformations when we know the particular data is in one partition, but how would we know that? I am assuming that is only possible when you partition the data based on the values of that particular column.
Yes, you would have to custom-partition the data in that case
Thank you, Sir. My learning curve with regard to Spark has taken an exponential trend after watching your videos. It has been a rich learning experience, and I have been trying to practice this in parallel. I have a question regarding dataframes in PySpark. When I tried to create the variable "bad_loan" using withColumn and when (for the various cases of loan_status), the variable doesn't get created in the table, though I can see it in the dataframe. When I try to access this column using a select statement, I get an error. Can you please throw some light on this?
Thanks Ranjani.. did you assign it to a dataframe and use that dataframe to save? In my video I think I saved the old dataframe object and not the one I assigned the new columns to. Can you please validate it?
@@AIEngineeringLife Thank you for the response, Sir. I was able to resolve this issue. It was related to the way the when function was to be used.
Very elaborate and well-explained! Can you please share the code and notebook?
Sir you are an Inspiration.
Thanks Srinivasan for the wonderful explanation
Thanks Kishore
Thanks a ton..! You made Spark easy. Please make a video on how to optimize Spark code and handle data skew..
Videos are all very informative.
Is there any way we can sort based on more than one attribute? e.g.: Country ascending and Date descending
Ans: orderBy(col("Country").asc(), col("Date").desc())
@@nagarajuch2412 .. You got the answer :) .. It is there in one of my data engineering video as well
Thanks Srivatsan...Nice explanation!
Excellent presentation sir!!
Thank you bro thanks for this wonderful content video
@AIEngineering - Thanks a lot for your video. May I kindly check all your spark video codes are based on python? You don't use scala/java? Whatever we do in scala/java can also be done using python?
All of my videos use PySpark, so Python is what I have used, but the same can easily be done in Scala as well
@@AIEngineeringLife - do you think CCA175 cloudera certification for Apache spark and hadoop developer is good one to attempt for someone who is working as Data Engineer? Do you recommend any other certifications? And can the certification be done using Pyspark as well? Your help is highly appreciated
Amazing video thank you!!!!
Amazing video sir.
Do you have this Databricks page somewhere in git?
Best video 👍
Thanks a lot
Very nice explanation, sir... could you please upload the code?
Deepak.. Spark videos are not yet in my git repo.. It will take time to get there. Below is my repo that has other video code at this time
github.com/srivatsan88/RUclipsLI
Grateful for this series
Day 3 : colab.research.google.com/drive/1yTDcFFcUAynSXqZxjmu6UJ8bFAkEgnqV?usp=sharing&authuser=1#scrollTo=O9naSW-WLWR5
Please provide your GitHub link and also provide the corona data and twitter data
You can find all codes here - github.com/srivatsan88/Mastering-Apache-Spark