Excellent!!!! Hats off to your teaching. Don't stop teaching. Each and every point got cleared. Keep it up.
Bro, I have a question.
In this video, data was distributed to the two nodes even before the executors were created (before the program logic starts, right?).
But in my program, if I read data from another system (like S3), data is loaded into the cluster only during program execution.
So program execution (creation of the executors) should start first, right, before the data is distributed?
Fantastic teaching with amazing clarity, point by point explanation. Thank you.
Your way of teaching is excellent Gowtham
This is the best explanation I have ever seen.
Very in-depth explanation.
It's crystal clear. I like your way of teaching.
Brother, are the master and worker here daemons or physical servers?
Is it fine to use Spark Standalone for a POC project, or should I use it with Hadoop itself? The requirement is basically to migrate Django cron jobs that we currently run on Celery (with RabbitMQ) to Spark.
Bro, when I run in standalone mode with 1 master and 2 workers (on my laptop, I have made the host the master and first worker, and the 2nd worker runs in a VM), it gives me the correct output only when I put the same input file on both worker nodes. For example, I need to put the same 1 GB input file on both workers.
If I partition it manually into two 512 MB halves (the file location on both workers is the same), with the first half on one worker and the other half on the second worker, it gives incorrect results. Why is it so?
When manually partitioning data in a Spark standalone cluster, ensure that each partition resides on a separate worker node and has unique file paths to avoid data duplication and incorrect results
Spark typically requires a distributed file system like HDFS, I think; otherwise you need to make each file available to all the executors manually.
Excellent video, very well explained.
In standalone mode, if there is no requirement to split the file across multiple nodes, what is the purpose of having worker nodes (multiple nodes)?
In Spark's standalone cluster mode, worker nodes provide resource isolation, fault tolerance, scalability, parallelism, and efficient resource management, even if data splitting is not required.
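To make the daemon-vs-server distinction above concrete: master and worker are JVM daemon processes, not separate physical servers, and are started with scripts that ship in Spark's `sbin/` directory. A minimal sketch (the host name `master-host` and `$SPARK_HOME` location are assumptions):

```shell
# On the master machine: start the master daemon
# (its web UI defaults to http://<master-host>:8080)
$SPARK_HOME/sbin/start-master.sh

# On each worker machine (which can be the same physical host as the master):
# start a worker daemon pointing at the spark:// URL shown in the master's UI.
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077
```

Running a master and a worker daemon on the same laptop, plus another worker in a VM, is exactly the setup described in the earlier comment.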
Excellent
Please post the current Spark architecture again.
Great Explanation!!
Hi ji, can you upload Spark architecture with YARN?
Fantastic, brother, but in real time most companies use Spark with the YARN deployment mode, and after all these months you still didn't upload that video!
Hi bro
Thanks
Please find the video for YARN deployment in Spark:
ruclips.net/video/3c62-F6bu5k/видео.html
Thalaiva❤❤❤❤
Thank you !
In general, how will you read data line by line in Python?
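For the line-by-line question above, a minimal sketch in plain Python (the sample file path and contents are assumptions for illustration). Iterating over a file object yields one line at a time, so the whole file is never loaded into memory:

```python
import os
import tempfile

# Create a small sample file for illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

lines = []
with open(path) as f:            # file object is a lazy iterator over lines
    for line in f:               # reads one line per iteration
        lines.append(line.rstrip("\n"))  # strip the trailing newline

print(lines)  # ['first line', 'second line', 'third line']
```

This streaming style is also roughly what Spark's `textFile` does per partition, just in parallel across executors.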
Really, it's a crystal-clear explanation. I like your videos' way of explaining. Thank you!
Thank you !!!