Hadoop Tutorial - High Availability, Fault Tolerance & Secondary Name Node
- Published: 18 Sep 2024
- Spark Programming and Azure Databricks ILT Master Class by Prashant Kumar Pandey - Fill out the google form for Course inquiry.
forms.gle/Nxk8...
-------------------------------------------------------------------
Data engineering is one of the highest-paid jobs today.
It is going to remain in the top IT skills forever.
Are you in database development, data warehousing, ETL tools, data analysis, SQL, or PL/SQL development?
I have a well-crafted success path for you.
I will help you get prepared for the data engineer and solution architect role depending on your profile and experience.
We created a course that takes you deep into core data engineering technology and helps you master it.
If you are a working professional looking to:
1. Become a data engineer.
2. Change your career to data engineering.
3. Grow your data engineering career.
4. Get the Databricks Spark Certification.
5. Crack Spark data engineering interviews.
ScholarNest is offering a one-stop integrated Learning Path.
The course is open for registration.
The course delivers an example-driven approach and project-based learning.
You will be practicing the skills using MCQ, Coding Exercises, and Capstone Projects.
The course comes with the following integrated services.
1. Technical support and Doubt Clarification
2. Live Project Discussion
3. Resume Building
4. Interview Preparation
5. Mock Interviews
Course Duration: 6 Months
Course Prerequisite: Programming and SQL Knowledge
Target Audience: Working Professionals
Batch start: Registration Started
--------------------------------------------------------------------------
Learn more at www.scholarnes...
Best place to learn Data engineering, Bigdata, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - Self-paced, Instructor-led, Certification courses, and practice tests.
Want to learn more Big Data technology? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes.
www.learningjournal.guru/courses/
Nothing is tough when you have a good teacher. Kudos for your work sir.
No one, I repeat, no one has explained Hadoop with this perfection. A million thanks.
The best explanation of the standby node on the Internet!!
Crisp, simple, and visual: that is what I call the best teaching. You are the best tutor.
I have gone through so many tutorials, but the way you explained it, sir, makes Hadoop so easy to understand. Thanks a lot, sir!!
I have become a fan of your style of teaching. Thank you, sir. 😊
Thank you so much for these topics. I spent a lot of time reading books, but I couldn't understand anything until I watched your tutorials. Big thanks.
As you described, the role of the Secondary NameNode (SNN) is to take a checkpoint at the configured interval and update the on-disk fsimage by applying the edit log entries captured since the last checkpoint. To further reduce its restart time, the primary NameNode runs the same checkpoint process on restart: it reads the on-disk fsimage stored by the SNN, applies the edit log entries to create the latest fsimage, and stores it in memory. A few questions about this:
1. Where does the SNN store the fsimage? Is it on disk in the local file system?
2. How does the primary NameNode get access to that Secondary NameNode?
1. ZooKeeper election
2. Split-brain concepts
3. Hadoop 3, erasure coding, and storage policies
Could you please explain all of the above?
You make things very simple to understand..... Hats off to your effort !!
You are the best teacher.. Thanks a lot
Thanks. It's very clear. Piece of advice for viewers: These tutorials can easily be watched in 2x speed.
I have been learning HDFS for the last 7 days, but my concepts were still not clear. Today I watched your video and everything is clear. Thank you.
Great explanation, thanks for your efforts :)
Excellent explanation, sir. Hats off.
Awesome, sir. Thank you.
Your explanation is very clear, thank you. Kindly keep uploading new videos.
What an Explanation 🙏🙏🙏🙏🙏🙏❤️❤️❤️❤️❤️❤️❤️
The explanation was clear.
I have a few questions:
1) When setting up a cluster using Hadoop 2, how does ZooKeeper initially elect the leader among the NameNodes?
2) Can you explain the functionality of the NameNode's failover controllers?
Bro, I would love to answer.
When you set up a new cluster, the active NN will be the one you selected to be the NN.
And
later, if it fails, the ZKFC (ZooKeeper Failover Controller) is responsible for making the standby node the active node.
Hope this helps you.
When you set up a new cluster, the active NameNode will be the one you selected. If that NN goes down, ZooKeeper comes into play through the ZKFC (ZooKeeper Failover Controller), which is responsible for making the standby NameNode the active NameNode.
Great work, sir. Thanks for the video.
This was beautiful! Thank you.
Very good tutorial. The only thing I want to say is that the fsimage is not only in memory but also stored on disk. Please excuse me if I am not correct on this point.
Really nice explanation. If you could start a practical implementation of one end-to-end POC project, it would be very useful for all of us. Thanks for your efforts and time.
Presentation and explanation was excellent..
It was very good information.
Highly recommended for anyone who wishes to learn about how fault tolerance is managed in HDFS.
In addition, I have a question: are block recovery, lease recovery, and pipeline recovery done in addition to the fault-tolerance methods described in the video, or do they happen at a deeper level within those methods?
Very good tutorial. Easy to understand.
Very useful explanation.
Simple and superbly explained.
Thanks for the detailed explanation.
Thanks for your clear explanation. Awesome!
Awesome sir...great explanation👌👌
Very nice. I could not understand much about the Secondary NameNode, but I will try to understand it.
Why? Is it because the explanation is not clear? You can ask if you have any doubts.
No, the explanation is very nice, but the fsimage and edit log are not clear to me.
And thank you very much.
Very nicely explained.
Very nice explanation!
great and clear explanation thanks.
Great Tutorial..thanks for sharing
Very informative. Thanks
Thank you sir
Awesome tutorial
very well explained
Great explanation
Hello! Is there a PPT version of this video? I need to explain this to students. The presentation is superb.
Excellent !
Good.
Good lecture.
Superb...Thank you so much
Why can't we dump the fsimage directly to disk when restarting the NameNode? After restarting, it could read the fsimage and load it into memory, which would be faster.
nice tutorial sir
very good video
Great work. I have 2 questions.
- Regarding the checkpoint activity, does the Secondary NN keep the on-disk fsimage on its local hard disk, or is it on the active NN's hard disk?
- Is the one-hour interval between checkpoints configurable?
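On the second question above: the checkpoint trigger can be sketched roughly as follows. This is a toy model, not Hadoop code. The class and function names are hypothetical, and the default values are assumed to mirror the HDFS settings `dfs.namenode.checkpoint.period` (seconds) and `dfs.namenode.checkpoint.txns` (uncheckpointed transactions) as I recall them, so please verify them against the hdfs-default.xml of your Hadoop version.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    # Hypothetical container. The defaults are assumed to mirror
    # dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
    period_seconds: int = 3600
    max_txns: int = 1_000_000


def should_checkpoint(policy: CheckpointPolicy,
                      seconds_since_last: int,
                      txns_since_last: int) -> bool:
    """Return True when either the time limit or the transaction limit is reached."""
    return (seconds_since_last >= policy.period_seconds
            or txns_since_last >= policy.max_txns)


if __name__ == "__main__":
    default_policy = CheckpointPolicy()
    # One hour elapsed, few edits: a checkpoint is due.
    print(should_checkpoint(default_policy, 3600, 10))
    # Ten minutes elapsed, few edits: not yet.
    print(should_checkpoint(default_policy, 600, 10))
    # A shorter, user-configured interval of ten minutes.
    print(should_checkpoint(CheckpointPolicy(period_seconds=600), 600, 10))
```

In this sketch the interval is just a parameter, which matches the idea that the one-hour gap is a configurable default rather than a fixed value.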
nicely explained
Thank you for this excellent tutorial. I am new to this topic, and none of the tutorials or blogs I went through gave a clear picture of what happens in the checkpoint process of the SNN and of the NN. So, can you please confirm my understanding of this topic (for non-HA mode)?
1) After every checkpoint run, does the SNN clear the edit log on the NameNode as well? So at any time, the edit log on the NN has data only since the last checkpoint run on the SNN.
2) The fsimage of the NN gets updated automatically in real time (i.e., as and when changes are made to the file system). This means the NameNode always has the latest fsimage in its memory.
3) At any given time, the fsimage on the Secondary NameNode holds the file system image as of the last checkpoint run.
4) After a reboot, the NameNode picks up the fsimage from the Secondary NameNode and the edit log from the NN's local disk, and merges them to create a new fsimage file that is up to date with all changes as of then.
Thank you so much ...
Can you make a video on why an RDD is immutable and what would have happened had it not been immutable?
Great explanation, sir. I have a doubt: 1) How do I install Cloudera without internet access, and what are the parcel method and the packages method?
The topic was clear, thank you so much. Can you show it with an example?
Nice One
Why is there an odd number of JNs, 3 or 5?
What's the reason behind that?
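A small arithmetic sketch may help with this question. It assumes the JournalNodes accept an edit only when a strict majority of them acknowledge it, which is how quorum-based journaling is generally described. The function names are hypothetical and this is not Hadoop code.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"JNs={n} majority={majority(n)} tolerated failures={tolerated_failures(n)}")
```

Under this assumption, 3 and 4 JournalNodes both tolerate one failure, and 5 and 6 both tolerate two, so the extra node in an even-sized group adds cost without adding fault tolerance.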
Thanks. Could you please explain how to create a Cloudera cluster, as nowadays many clients prefer Cloudera to Hortonworks?
Awesome
Sir, what will happen if DN-1 is slow and does not send its heartbeat as fast as the other nodes? Suppose the NN decides that DN-1 is down and starts replicating its data to a different node, say DN-2, and DN-1's heartbeat reaches the NN while the replication is in progress. Will it stop replicating the data to DN-2?
+Pranav Wagde, I think it is a hypothetical question. Either I get the heartbeat within the expected interval or I don't. There is no concept of a slow heartbeat. If the NN realizes that a block is under-replicated, it will make more replicas to fix it. There is no concept of stopping in between. Later, when the NN realizes that the block is over-replicated, it will fix that too by discarding some replicas.
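The behaviour described in the reply above can be sketched as a simple reconciliation rule. This is a toy illustration with a hypothetical function name and an assumed replication factor of 3, not the NameNode's actual logic.

```python
def replication_action(live_replicas: int, target: int = 3) -> tuple[str, int]:
    """Decide what to do for one block given its current live replica count.

    target is an assumed replication factor for this illustration.
    """
    if live_replicas < target:
        return ("replicate", target - live_replicas)
    if live_replicas > target:
        return ("delete", live_replicas - target)
    return ("none", 0)


if __name__ == "__main__":
    # DN-1 is considered dead: the block is under-replicated.
    print(replication_action(2))
    # The new copy lands on DN-2 and DN-1 comes back: over-replicated.
    print(replication_action(4))
    # After one extra replica is discarded, nothing is left to do.
    print(replication_action(3))
```

The point of the sketch is that the decision depends only on the current replica count, so a late heartbeat does not cancel a copy that is in progress. It simply leads to a later cleanup.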
Thanks for the explanation. Understood the concept.
helpful
Please upload some Hive and Pig related videos.
Sure, maybe in a month.
Can we have multiple replication factor for multiple tenants?
You can have it at the topic level, and I guess the tenants of the cluster are not all going to share the same topics. So the answer is yes.
Can we use a single node as both the NameNode and the Secondary NameNode?
Yes, we can. However, we don't do it in production.
OK, thanks.
How do the fsimage file and the edit log file communicate with each other?
The fsimage does not communicate with the edit log, but during the checkpointing process a new fsimage is created by merging the old fsimage with the new edit log.
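The merge described in the reply above can be sketched as replaying a log over a snapshot. This is a deliberately simplified model: the real fsimage and edit log are binary files with many more operation types. Here the namespace is a plain dict from path to metadata, and the three operations and all names are hypothetical.

```python
def apply_edit(namespace: dict, edit: tuple) -> None:
    """Apply one logged operation to the in-memory namespace (toy model)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]                  # edit[2] is the file's metadata
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)   # edit[2] is the new path
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(old_fsimage: dict, edit_log: list) -> dict:
    """Build a new fsimage by replaying the edit log over a copy of the old one."""
    new_fsimage = dict(old_fsimage)
    for edit in edit_log:
        apply_edit(new_fsimage, edit)
    return new_fsimage


if __name__ == "__main__":
    old_image = {"/data/a.txt": {"blocks": 2}}
    edits = [
        ("create", "/data/b.txt", {"blocks": 1}),
        ("rename", "/data/a.txt", "/data/c.txt"),
        ("delete", "/data/b.txt"),
    ]
    print(checkpoint(old_image, edits))
```

After a checkpoint like this, the edits that were replayed are no longer needed for recovery, which is why the edit log can start afresh and a restart has less to replay.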
1.75 x
Hadoop is very fault tolerant. The only point of failure can be Maharashtra State Electricity Board.
Lol! You can keep a backup on inverters. It's not costly.
Great explanation