That smile on your face when the driver OOM error came up 😂 shows how much you enjoy teaching us 🤗
awesome tutorials brother! Really really simplified things! Absolutely brilliant!
I have recently joined an MNC, and I must say this content is very accurate. Thank you so much for sharing your knowledge and experience.
Brother, nobody else explains things in such an easy and detailed way. Great work!
Brother, you taught this really well, thanks a lot for that, but I don't understand why you made separate playlists for practical and theory. It's not clear which one to watch when.
What do we mean by container here? Does it have any other name that we studied in earlier videos?
This video is absolutely great, brother... keep going
Can you please make videos in English? It would be easier to understand.
Awesome, this is a most-asked interview question. This channel should get more subscribers and views.
Everyone is busy watching reels, brother.
What are the objects here that cause the driver overhead OOM error? Can you please explain?
How would we know which file is small when we do a broadcast join? Please tell me.
Hi Sir, I have come across some conflicting information regarding the show() method. Kindly confirm the following: the show() method does not bring all the data from a single partition to the driver, but rather collects a sample of rows from each partition and displays them.
As per my understanding, show() brings all the data from a single partition of a single executor to the driver, but displays only a sample of rows from that partition.
Awesome content please continue the series
Great video, brother
perfect video sir
hats off bro loving your channel
My driver is stopping while writing a DataFrame. I didn't use any action other than write. We cannot increase the cluster configuration. How can I resolve the issue? Any suggestions?
This information is not sufficient to provide a suggestion.
Thanks sir for your great video ❤
Please continue the series
OK, so does garbage collection come under overhead or JVM heap memory?
JVM heap memory
5:16 literally first guy in my life who is happy to get error.😆
It's not the first time I was happy with an error. Whenever our code gives wrong output without any error, I want it to give me an error so that I know exactly where to fix it 😀
Hi Manish, any good resource to learn Scala?
nice video
day 6 done 👍
Hi Manish, will the driver program run on the master node or a worker node, since you have written Application Master (worker node)?
The Application Master/Driver runs on the driver node, not a worker node.
The master node is where your resource manager is, though DB now gives the option to have the RM and AM/Driver on the same machine.
Let's say we read a CSV file of 10.1 GB stored in a data lake and have to do some filtering of the data. How many tasks will run?
Is there a possibility of an out-of-memory error in the above scenario?
10100 MB / 128 MB ≈ 79 partitions, so 79 tasks will be created in total. Filtering does not depend on data from other partitions, so it is a narrow-dependency transformation and every partition can do its own filtering. As long as you have more than 500 MB of executor memory, you will not face any OOM. But make sure that you are not calling collect(), otherwise you may face a driver OOM.
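The arithmetic in the reply above can be sketched in plain Python. This is only a rough estimate: it assumes the 128 MB split size used in the reply (which matches the default of `spark.sql.files.maxPartitionBytes`) and ignores other factors Spark considers when planning file splits. The function name `estimate_partitions` is hypothetical, chosen for illustration.

```python
import math

def estimate_partitions(file_size_mb: float, max_partition_mb: float = 128.0) -> int:
    """Rough partition (and task) count for a splittable file.

    Assumption: one partition per max_partition_mb chunk, rounded up.
    """
    return math.ceil(file_size_mb / max_partition_mb)

# The reply treats 10.1 GB as 10100 MB.
print(estimate_partitions(10_100))

# Treating 1 GB as 1024 MB gives a slightly larger estimate.
print(estimate_partitions(10.1 * 1024))
```

A filter is a narrow transformation, so one task per partition is a reasonable first guess for the read-and-filter stage.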
Why 500 MB?
@manish_kumar_1
Hi Manish
Could you please upload the same videos in English?
Hi Manish, when will you arrange a live chat?
As soon as possible
I saw a few videos that mention max(0.07 * memory, 384 MB)
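The formula in the comment above can be written out as a small sketch. The 0.07 factor is what the comment reports; current Spark documentation lists 0.10 as the default overhead factor with a 384 MB minimum, so the factor is left as a parameter. The function name `memory_overhead_mb` is hypothetical, and truncating to whole megabytes is an assumption made for illustration.

```python
def memory_overhead_mb(memory_mb: int, factor: float = 0.10, minimum_mb: int = 384) -> int:
    """Overhead estimate: max(factor * memory, minimum), in whole MB.

    Assumption: the product is truncated to an integer number of MB.
    """
    return max(int(memory_mb * factor), minimum_mb)

# Small containers fall back to the 384 MB floor.
print(memory_overhead_mb(1024))

# Larger containers scale with the factor; compare 0.10 and the 0.07 from the comment.
print(memory_overhead_mb(8192))
print(memory_overhead_mb(8192, factor=0.07))
```

With either factor, a 1 GB container ends up at the 384 MB floor, which may be where figures close to 500 MB of required memory come from once some heap is added on top.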
Great video!
Why does the description say "definitely don't buy the S22 Ultra"? 😂
👍
Bro, please make videos in English
I think this is Lec-15, but it is mistakenly labeled Lec-18, sir