Kafka Tutorial - Custom Partitioner

  • Published: 10 Dec 2016
  • Spark Programming and Azure Databricks ILT Master Class by Prashant Kumar Pandey - Fill out the google form for Course inquiry.
    forms.gle/Nxk8dQUPq4o4XsA47
    -------------------------------------------------------------------
    Data Engineering is one of the highest-paid jobs of today.
    It is going to remain among the top IT skills for years to come.
    Are you in database development, data warehousing, ETL tools, data analysis, SQL, or PL/SQL development?
    I have a well-crafted success path for you.
    I will help you get prepared for the data engineer and solution architect role depending on your profile and experience.
    We created a course that takes you deep into core data engineering technology and helps you master it.
    If you are a working professional:
    1. Aspiring to become a data engineer.
    2. Looking to change your career to data engineering.
    3. Wanting to grow your data engineering career.
    4. Aiming for the Databricks Spark Certification.
    5. Preparing to crack Spark data engineering interviews.
    ScholarNest is offering a one-stop integrated Learning Path.
    The course is open for registration.
    The course delivers an example-driven approach and project-based learning.
    You will be practicing the skills using MCQ, Coding Exercises, and Capstone Projects.
    The course comes with the following integrated services.
    1. Technical support and Doubt Clarification
    2. Live Project Discussion
    3. Resume Building
    4. Interview Preparation
    5. Mock Interviews
    Course Duration: 6 Months
    Course Prerequisite: Programming and SQL Knowledge
    Target Audience: Working Professionals
    Batch start: Registration Started
    Fill out the below form for more details and course inquiries.
    forms.gle/Nxk8dQUPq4o4XsA47
    --------------------------------------------------------------------------
    Learn more at www.scholarnest.com/
    Best place to learn Data engineering, Bigdata, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - Self-paced, Instructor-led, Certification courses, and practice tests.
    ========================================================
    SPARK COURSES
    -----------------------------
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/d...
    KAFKA COURSES
    --------------------------------
    www.scholarnest.com/courses/a...
    www.scholarnest.com/courses/k...
    www.scholarnest.com/courses/s...
    AWS CLOUD
    ------------------------
    www.scholarnest.com/courses/a...
    www.scholarnest.com/courses/a...
    PYTHON
    ------------------
    www.scholarnest.com/courses/p...
    ========================================
    We are also available on the Udemy Platform
    Check out the below link for our Courses on Udemy
    www.learningjournal.guru/cour...
    =======================================
    You can also find us on Oreilly Learning
    www.oreilly.com/library/view/...
    www.oreilly.com/videos/apache...
    www.oreilly.com/videos/kafka-...
    www.oreilly.com/videos/spark-...
    www.oreilly.com/videos/spark-...
    www.oreilly.com/videos/apache...
    www.oreilly.com/videos/real-t...
    www.oreilly.com/videos/real-t...
    =========================================
    Follow us on Social Media
    / scholarnest
    / scholarnesttechnologies
    / scholarnest
    / scholarnest
    github.com/ScholarNest
    github.com/learningJournal/
    ========================================

Comments • 52

  • @ScholarNest
    @ScholarNest  3 years ago

    Want to learn more Big Data technology courses? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes.
    www.learningjournal.guru/courses/

  • @user-rr3cc3uv2h
    @user-rr3cc3uv2h 7 years ago +3

    Thanks for explaining such important detail!

  • @glpathy
    @glpathy 4 years ago

    Excellent, Sir. Thank you for giving detailed insights and showing how message-key partitioning can get into trouble. Very practical. Thank you, sir.

  • @filat239
    @filat239 6 years ago +2

    Nice explanation! You saved me time!

  • @rickydebroy
    @rickydebroy 2 years ago

    Just to add, "cluster.partitionCountForTopic(topic);" will also give you the number of partitions for a topic. I am using kafka_2.13-2.8.1, so this should be available in later versions as well. Thank you for this awesome series on Apache Kafka.

  • @ckudalkar
    @ckudalkar 4 years ago

    Fantastic!!!!!

  • @varunm.v3009
    @varunm.v3009 4 years ago

    Awesome, sir. Good learning for me.

  • @roshanbisht9304
    @roshanbisht9304 6 years ago

    Kafka security tutorials would be helpful as well.

  • @veeranjikomatineni2269
    @veeranjikomatineni2269 6 years ago

    In Kafka we write the code only in Java; we don't need to write the code in Spark. If it is possible in Spark, please send some links for learning Spark producer code, etc.
    Thanks for sharing such valuable knowledge!

  • @sahilgarg7171
    @sahilgarg7171 6 years ago

    Sir, how can we create a topic while reading a CSV file? For example, there are 10-15 fields in the CSV and we want to create a topic for every unique value present in column 6. Is this possible?

  • @TheNayanava
    @TheNayanava 7 years ago

    I think there is a little mistake in the explanation where you say that SSP3 and SSP8 have been allotted the same partition because the hash doesn't guarantee unique values. It could very well be because we do a % (numberOfPartitions - sp), which in this case spreads keys over partitions 3-9, so there is a probability that roughly one in seven keys gets assigned to the same partition. Correct me if I am wrong, please.
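The arithmetic discussed in this thread can be sketched in plain Java. This is an illustration only: String.hashCode() stands in for Kafka's murmur2 hash, and the partition count (10) and reserved-partition count (sp = 3) follow the example in the video.

```java
// Illustration of the partitioning arithmetic discussed above.
// Assumptions: 10 partitions, sp = 3 reserved for the high-volume sensor,
// and String.hashCode() as a stand-in for Kafka's murmur2 hash.
public class PartitionMath {
    static int partitionForKey(String key, int numPartitions, int sp) {
        int positiveHash = key.hashCode() & 0x7fffffff; // clear the sign bit
        // Spreads all "other" keys over partitions sp .. numPartitions-1,
        // i.e. 3..9 here, so about one key in seven lands on any given one.
        return sp + (positiveHash % (numPartitions - sp));
    }

    public static void main(String[] args) {
        for (String key : new String[]{"SSP1", "SSP3", "SSP8", "TSS4"}) {
            System.out.println(key + " -> partition "
                    + partitionForKey(key, 10, 3));
        }
    }
}
```

Whether two keys such as SSP3 and SSP8 collide depends on both the hashes and the modulus: with only seven target partitions, occasional collisions are expected even when the hash values themselves differ.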

  • @ragavkb2597
    @ragavkb2597 6 years ago +1

    Thanks for the explanation. Is there a benefit to implementing such a custom partitioner to determine the partition, instead of using a different topic altogether for such cases? I might be missing something here; I'd appreciate it if someone could help me with an answer. Thanks again.

    • @abdul3rauf
      @abdul3rauf 4 years ago

      I have the same doubt

    • @LivenLove
      @LivenLove 4 years ago

      I think the partitioner will help in evenly distributing the messages across the cluster of machines. It would be bad if most of our messages ended up getting stored on one server while the others were not utilised.

    • @balaramganesh2970
      @balaramganesh2970 3 years ago

      It depends on your use case. If you feel the need to create an entirely new topic to accommodate a completely different set of events, then you should go ahead and do so, since topics are intended for that purpose. However, if there isn't a need for a new topic, then you add partitions to your existing topic and scale horizontally - this is a perfect use of Kafka's amazing scalability. You can continue to use the same brokers, but you'd be compromising on your fault tolerance. So, when you increase partitions, it is better to increase the number of brokers as well.

  • @ANUKARTHIM
    @ANUKARTHIM 6 years ago

    Hi sir,
    Thanks for the playlist; everything is well explained.
    Will you please show me an example of how to write the same in Scala - how to write a custom partitioner in Scala?
    I tried, but the implemented methods are not shown in Scala, because Scala doesn't support interfaces.
    Is there an alternate way to write the same logic in Scala?
    Thanks,
    Venkatesh

    • @ScholarNest
      @ScholarNest  6 years ago

      I don't think there is an official Kafka client for Scala.

  • @amitranjan6998
    @amitranjan6998 3 years ago

    How can we print the partition along with the message on Windows, like you printed at @11:44? I am using a Windows machine and am not able to find any command that tells me which partition each message resides in. We can read through the topic, but I want to check through the partitions - ideally read all the partitions and their messages for the topic. It would be a great help to me.
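For reference, newer Kafka console consumers can print the partition for each message. Whether the exact properties below are available depends on your Kafka version, so treat this as a sketch; my-topic is a placeholder.

```shell
: '
Print the partition (and key) alongside each message; these properties are
handled by the DefaultMessageFormatter in recent Kafka versions:

  bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 ^
    --topic my-topic --from-beginning ^
    --property print.partition=true --property print.key=true

Read a single partition of the topic (partition 0 here):

  bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 ^
    --topic my-topic --partition 0 --from-beginning
'
```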

  • @saurabhsaxena3327
    @saurabhsaxena3327 4 years ago

    Thanks for this wonderful video.
    I have a doubt here: in the custom partitioner section, you are hashing the message value in step 3 and hashing the key in step 4.
    So in step 3, do you mean the message value is the actual message sent by the producer? Are you hashing the actual message?

    • @balaramganesh2970
      @balaramganesh2970 3 years ago

      Yup, he is hashing the entire message value to ensure its cardinality, versus using the same key for all these messages.
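A plain-Java sketch of why hashing the message value spreads the load even though every message carries the same key. String.hashCode() again stands in for murmur2, and the value format "SSP1:reading-N" is made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hashing the (distinct) message values spreads messages over partitions
// even when the key is identical for every message.
public class ValueHashSpread {
    static int partitionForValue(String value, int numPartitions) {
        return (value.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            // Same sensor (same key), 100 distinct readings (values).
            used.add(partitionForValue("SSP1:reading-" + i, 7));
        }
        // Distinct values land on several of the 7 partitions.
        System.out.println("distinct partitions used: " + used.size());
    }
}
```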

  • @mkvjaipur
    @mkvjaipur 7 years ago

    It would be great if you could tell us something about how you used the SBT tool :)

    • @ScholarNest
      @ScholarNest  7 years ago

      Sure, will add it soon. Probably in Scala+Spark Tutorials.

    • @dedipyabolisetty8666
      @dedipyabolisetty8666 7 years ago +1

      Thanks for the detailed explanation. Did you create the Scala + Spark tutorials?

    • @_deepuprem629
      @_deepuprem629 7 years ago +1

      Running with SBT can be skipped altogether; you could use an IDE directly if you're not really familiar with SBT yet. I used NetBeans/Maven to run it and it just worked great.

  • @venkateswarlukomirisetti1006
    @venkateswarlukomirisetti1006 7 years ago

    Hi sir, I have some queries; can you please clarify them?
    In the above program we are passing the broker configuration through the "bootstrap.servers" property. Is it mandatory, or is there any possibility of achieving the same through ZooKeeper?
    How does the producer interact with ZooKeeper to get the broker (cluster) information for a topic?

    • @ScholarNest
      @ScholarNest  7 years ago

      It is a required property. The new Kafka API removes the client application's dependency on ZooKeeper.

  • @rameshhawks9709
    @rameshhawks9709 5 years ago

    Where are you calling the partitioner?

  • @pramodsripada2380
    @pramodsripada2380 7 years ago

    Sir, the custom partitioner that you have implemented starts returning different partitions when the number of partitions increases, right? It was because of this dependency on the number of partitions that we moved away from key-based partitioning. Am I missing anything?

    • @ScholarNest
      @ScholarNest  7 years ago +1

      You can use the key for partitioning. But be aware that data for more than one key can come to the same partition. If that's not your requirement, implement your own partitioner.

  • @shrinivasashetty846
    @shrinivasashetty846 4 years ago

    Hi Sir, I am using a Java IDE and I get the below error message:
    Exception in thread "main" org.apache.kafka.common.config.ConfigException: Invalid value SensorPartitioner for configuration partitioner.class: Class SensorPartitioner could not be found.
    SensorPartitioner has been defined under the same package.

  • @LivenLove
    @LivenLove 4 years ago

    If different keys can lead to the same hash, isn't it possible for different values to give the same hash?

    • @ScholarNest
      @ScholarNest  4 years ago

      Yes, it is. And they all get routed to the same partition. But that's not a problem, because grouping happens on the key and the hash is only there to distribute the workload.

  • @hadoopworld35
    @hadoopworld35 7 years ago

    Sir, can I get the list of partitions from the consumer class?

    • @ScholarNest
      @ScholarNest  7 years ago +1

      Yes, you can get it from the Cluster object's availablePartitionsForTopic method.

    • @hadoopworld35
      @hadoopworld35 7 years ago

      Learning Journal, thanks, I will try this.

  • @GowthamS-zb3gq
    @GowthamS-zb3gq 6 years ago

    Can we run Kafka without SBT?

    • @ScholarNest
      @ScholarNest  6 years ago

      You don't need SBT for Kafka. SBT is a build tool to compile and package your code.

  • @vivekkatakam9024
    @vivekkatakam9024 7 years ago

    When I run the program, I am getting the below error; please help me:
    Exception in thread "main" org.apache.kafka.common.config.ConfigException: Invalid value SensorPartitioner for configuration partitioner.class: Class SensorPartitioner could not be found.

    • @ScholarNest
      @ScholarNest  7 years ago

      How are you compiling the code? If you are using SBT, you need to have the code for SensorPartitioner in the same directory. If you are using some other tool, make sure the class for SensorPartitioner is in your classpath.

    • @vivekkatakam9024
      @vivekkatakam9024 7 years ago

      I have SensorPartitioner in the same directory only. I am not using any build tool.

    • @ScholarNest
      @ScholarNest  7 years ago

      Send me the steps that you are following. Don't paste them in the comments; send me a private message. I will try to resolve your error.

    • @vivekkatakam9024
      @vivekkatakam9024 7 years ago

      Thanks!!

    • @MohammadRasoolShaik
      @MohammadRasoolShaik 7 years ago

      Please mention the fully qualified name (with the package name) of the class in the config.
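In other words, if the class lives in a package, the configured value must include it. A minimal sketch, assuming a hypothetical package com.example.kafka:

```properties
# Hypothetical package - replace with your class's actual fully qualified name.
partitioner.class=com.example.kafka.SensorPartitioner
```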

  • @ayyappa428
    @ayyappa428 7 years ago

    p = Utils.toPositive(Utils.murmur2(valueBytes)) % sp;
    I am getting an error at .toPositive.
    I am unable to find the toPositive method in SensorPartitioner.
    Please help me!

    • @ScholarNest
      @ScholarNest  7 years ago

      Do you have import org.apache.kafka.common.utils.*; in your code?
      This function is in Utils.
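If the import still cannot be resolved, the method itself is tiny, so as a fallback you can define an equivalent locally. A self-contained sketch (the bit mask matches how Kafka's Utils.toPositive works, but verify against your Kafka version):

```java
public class ToPositiveDemo {
    // Equivalent of org.apache.kafka.common.utils.Utils.toPositive:
    // masks off the sign bit so the result is always >= 0. Note this is
    // not Math.abs, which overflows for Integer.MIN_VALUE.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    public static void main(String[] args) {
        System.out.println(toPositive(-123));              // 2147483525
        System.out.println(toPositive(Integer.MIN_VALUE)); // 0
        System.out.println(toPositive(42));                // 42
    }
}
```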

    • @ayyappa428
      @ayyappa428 7 years ago

      Yes, I have imported it, but it is still not working!

    • @hadoopworld35
      @hadoopworld35 7 years ago

      I am also getting the same error. Please let me know if you have resolved it.

    • @rakeshpattanayak8283
      @rakeshpattanayak8283 7 years ago

      Sumit Kumar: can you share your code for that line, along with your imports from the top?

    • @rakeshpattanayak8283
      @rakeshpattanayak8283 7 years ago

      Ayyappa Eswar, can you share the line where you are getting the error, along with your imports? Also, please share the Kafka version.

  • @umerayazbaig2025
    @umerayazbaig2025 5 years ago

    This is difficult. Please describe it visually along with the code.