Kafka Tutorial - Exactly once processing

  • Published: 25 Jul 2024
  • Spark Programming and Azure Databricks ILT Master Class by Prashant Kumar Pandey - fill out the Google form for course inquiries.
    forms.gle/Nxk8dQUPq4o4XsA47
    -------------------------------------------------------------------
    Data Engineering is one of the highest-paid jobs today.
    It is going to remain among the top IT skills.
    Are you in database development, data warehousing, ETL tools, data analysis, SQL, or PL/SQL development?
    I have a well-crafted success path for you.
    I will help you get prepared for the data engineer and solution architect role depending on your profile and experience.
    We created a course that takes you deep into core data engineering technology and helps you master it.
    If you are a working professional aspiring to:
    1. Become a data engineer.
    2. Change your career to data engineering.
    3. Grow your data engineering career.
    4. Get the Databricks Spark certification.
    5. Crack Spark data engineering interviews.
    ScholarNest is offering a one-stop integrated Learning Path.
    The course is open for registration.
    The course delivers an example-driven approach and project-based learning.
    You will practice the skills using MCQs, coding exercises, and capstone projects.
    The course comes with the following integrated services.
    1. Technical support and Doubt Clarification
    2. Live Project Discussion
    3. Resume Building
    4. Interview Preparation
    5. Mock Interviews
    Course Duration: 6 Months
    Course Prerequisite: Programming and SQL Knowledge
    Target Audience: Working Professionals
    Batch start: Registration Started
    Fill out the below form for more details and course inquiries.
    forms.gle/Nxk8dQUPq4o4XsA47
    --------------------------------------------------------------------------
    Learn more at www.scholarnest.com/
    Best place to learn Data Engineering, Big Data, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - self-paced, instructor-led, certification courses, and practice tests.
    ========================================================
    SPARK COURSES
    -----------------------------
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/s...
    www.scholarnest.com/courses/d...
    KAFKA COURSES
    --------------------------------
    www.scholarnest.com/courses/a...
    www.scholarnest.com/courses/k...
    www.scholarnest.com/courses/s...
    AWS CLOUD
    ------------------------
    www.scholarnest.com/courses/a...
    www.scholarnest.com/courses/a...
    PYTHON
    ------------------
    www.scholarnest.com/courses/p...
    ========================================
    We are also available on the Udemy Platform
    Check out the below link for our Courses on Udemy
    www.learningjournal.guru/cour...
    =======================================
    You can also find us on Oreilly Learning
    www.oreilly.com/library/view/...
    www.oreilly.com/videos/apache...
    www.oreilly.com/videos/kafka-...
    www.oreilly.com/videos/spark-...
    www.oreilly.com/videos/spark-...
    www.oreilly.com/videos/apache...
    www.oreilly.com/videos/real-t...
    www.oreilly.com/videos/real-t...
    =========================================
    Follow us on Social Media
    / scholarnest
    / scholarnesttechnologies
    / scholarnest
    / scholarnest
    github.com/ScholarNest
    github.com/learningJournal/
    ========================================

Comments • 56

  • @ScholarNest
    @ScholarNest  3 years ago

    Want to learn more Big Data technology courses? You can get lifetime access to our courses on the Udemy platform. Visit the below link for discounts and coupon codes.
    www.learningjournal.guru/courses/

  • @nawaz4321
    @nawaz4321 5 years ago

    Very nicely explained, big thank you.

  • @DineshKumar-by4sk
    @DineshKumar-by4sk 7 years ago

    Excellent and crisp explanation.

  • @MaheshSingh-ev8yh
    @MaheshSingh-ev8yh 4 years ago

    Hi Sir,
    I have really become a big fan of yours. The way you explain each concept is up to the mark, 5/5. The videos are short and well categorized - excellent. I was not expecting this when I got your link. I was looking for Kafka with C# for microservices, but your videos have given me a much clearer idea about it.

  • @praveenkumar-oy5zt
    @praveenkumar-oy5zt 5 years ago

    Your way of teaching is awesome.

  • @VaibhavPatil-rx7pc
    @VaibhavPatil-rx7pc 3 years ago

    Excellently explained! Thank you!

  • @yog2915
    @yog2915 4 years ago

    Very nice, cleared up a lot of things.

  • @vitinho0610
    @vitinho0610 4 years ago

    Hey sir,
    Thank you once again for your excellent tutorials!
    I have one doubt:
    1 - If this consumer dies, will Kafka redistribute the TSS partitions to other consumers? If so, how will the other consumers know where the committed offset stands?

  • @lytung1532
    @lytung1532 a year ago

    Thanks for this tutorial. I am a fan of yours on Udemy.

  • @gopinathGopiRebel
    @gopinathGopiRebel 7 years ago +1

    How do we know how many partitions to assign to a particular topic?
    What is the default number of partitions in Kafka?

  • @akhilanandbenkalvenkanna5057
    @akhilanandbenkalvenkanna5057 7 years ago

    Do we use a MySQL DB in real-time projects as well? Are there any performance issues with using a relational DB?

  • @gauravluthra7959
    @gauravluthra7959 6 years ago

    Great explanation. One doubt: suppose I want exactly-once processing and the consumer is of the same type as in this example, where I write data and offset to the database with a single commit, but I want to use a group of consumers instead of only one. How will it do exactly-once processing then? (My doubt: if we have three consumers, and C0 is reading from P0, C1 from P1, and C2 from P2, and C0 gets killed and never runs again, then data from P0 will never get read. Can we solve this problem with exactly once?)

  • @SauravOjha94
    @SauravOjha94 4 years ago

    Hi Sir. Excellent explanation. Just one doubt: since this is not a case of auto commit, don't you think we have forgotten to commit the offset to Kafka?

  • @neil3507
    @neil3507 6 years ago +2

    Is this a way to achieve exactly-once semantics in Kafka?

  • @cellisisimo
    @cellisisimo 7 years ago

    Excellent video!! What if, after updating the first table with data, the consumer fails before updating the table with offsets? In that case, the same data will be processed twice, won't it?

    • @ScholarNest
      @ScholarNest  7 years ago +2

      No. The data in the table is not permanent until we execute the commit. The commit is the last statement, after both the insert and the update.
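
      A minimal sketch of that single-commit pattern, assuming hypothetical table names tss_data and tss_offsets (illustrative, not from the video):

      // Insert the record and advance the stored offset in ONE database
      // transaction, so either both survive a crash or neither does.
      import java.sql.Connection;
      import java.sql.PreparedStatement;

      public class AtomicStore {
          static void storeRecordAndOffset(Connection conn, String value,
                                           String topic, int partition,
                                           long offset) throws Exception {
              conn.setAutoCommit(false); // we commit manually, as in the video
              try (PreparedStatement insert = conn.prepareStatement(
                       "INSERT INTO tss_data (payload) VALUES (?)");
                   PreparedStatement update = conn.prepareStatement(
                       "UPDATE tss_offsets SET next_offset = ? WHERE topic = ? AND part = ?")) {
                  insert.setString(1, value);
                  insert.executeUpdate();
                  update.setLong(1, offset + 1); // next offset to read
                  update.setString(2, topic);
                  update.setInt(3, partition);
                  update.executeUpdate();
                  conn.commit();   // the last statement, after both insert and update
              } catch (Exception e) {
                  conn.rollback(); // crash before commit => neither change persists
                  throw e;
              }
          }
      }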

  • @max9260712
    @max9260712 3 years ago

    Thank you for your detailed videos. I am new to the channel and hope to come here more often. I have a bit of difficulty understanding the problem statement; please help if you can.
    At 5:11 you explain that storing into the DB and adding the offset to the RebalanceListener are not atomic, and that this is the problem. If the consumer crashes, say just after storing into the database, then even if the RebalanceListener is triggered, it is unable to commit that particular offset (for the record just stored in the DB) to Kafka, because our call to .addOffset did not occur. Is my understanding correct?

  • @Prabhatkumardiwaker
    @Prabhatkumardiwaker 5 years ago

    Hi, I have one question. Why did the consumer application consume 10 records in 2 different polls, i.e., 6 records in the 1st poll and 4 records in the 2nd? It could have got all 10 records in 1 poll, as the messages were already available in the topic.
    Thanks in advance.

  • @theashwin007
    @theashwin007 7 years ago +1

    Hi, I have one doubt. Consider two different groups of consumers, and say both groups subscribe to the same topic. How does Kafka store these offsets (committed offset and read offset)? I mean, does it store them per consumer group?

    • @ScholarNest
      @ScholarNest  7 years ago +1

      Kafka maintains the current offset and the committed offset per consumer. However, rebalance happens at the consumer group level.

  • @reachmurugeshanm7750
    @reachmurugeshanm7750 3 years ago

    Hi Sir, I have one doubt. In this video you explained one consumer with multiple custom partitions, but if my requirement is multiple consumers with multiple custom partitions, what would the code snippet be in that case? And if one consumer crashes while processing a message, how is the partition taken away from consumer 1 and assigned to consumer 2?
    Do we need to handle any exception when a consumer crashes?

  • @madhuthakur2523
    @madhuthakur2523 5 years ago +2

    This will make consumption super slow

    • @ScholarNest
      @ScholarNest  5 years ago +2

      This method is obsolete. Kafka Streams now has better options.

    • @humanGenAI
      @humanGenAI 2 years ago

      @@ScholarNest any link?

  • @rbsood
    @rbsood 4 years ago

    Hi Learning Journal - I have a question. I have a Kafka log retention policy based on size, so if the size reaches 1 GB, Kafka will delete the log. How can I make sure that Kafka does not delete the log if the consumer has not finished reading all the messages? In other words, Kafka should delete the log only when the consumer's current offset equals the latest offset in the log. Does Kafka do this automatically, or is some manipulation needed?

  • @lonelybard19
    @lonelybard19 7 years ago

    Hi. In this example you didn't have parallel processing, because one single consumer assigned all 3 partitions to itself. How would I achieve "exactly once" processing in a scenario with multiple consumers? I could give each consumer an ID and have a table in the external database to store which partitions should be assigned to each consumer, but then I would have to perform the rebalance myself, which could be some hard work :(

    • @AmitITpartner
      @AmitITpartner 6 years ago

      The answer to your question, "How would I achieve 'exactly once' processing in a scenario with multiple consumers?", is to implement multiple consumers within a consumer group. The advantage of this is that each consumer fetches a unique slice of the data. A sketch of this follows below. Hope this helps.
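
      A rough sketch of how that can look with offsets kept in an external DB: subscribe with a group and a ConsumerRebalanceListener that saves positions on revocation and seeks to the DB-stored offset on assignment. saveOffsetsInDb/readOffsetFromDb are assumed helpers you would implement against your own tables:

      import java.util.Collection;
      import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
      import org.apache.kafka.clients.consumer.KafkaConsumer;
      import org.apache.kafka.common.TopicPartition;

      public class DbOffsetRebalanceListener implements ConsumerRebalanceListener {
          private final KafkaConsumer<String, String> consumer;

          public DbOffsetRebalanceListener(KafkaConsumer<String, String> consumer) {
              this.consumer = consumer;
          }

          @Override
          public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
              saveOffsetsInDb(partitions); // persist positions before losing ownership
          }

          @Override
          public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
              // A consumer taking over a partition resumes exactly where the
              // last committed DB transaction left off.
              for (TopicPartition tp : partitions) {
                  consumer.seek(tp, readOffsetFromDb(tp));
              }
          }

          private void saveOffsetsInDb(Collection<TopicPartition> partitions) { /* ... */ }
          private long readOffsetFromDb(TopicPartition tp) { /* ... */ return 0L; }
      }

      Registered via consumer.subscribe(Collections.singletonList("TSS"), new DbOffsetRebalanceListener(consumer)), so a crashed consumer's partitions move to a survivor that picks up from the DB offsets.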

  • @HollyJollyTolly
    @HollyJollyTolly 7 years ago

    Hi sir,
    What is the difference between a high-level consumer and a low-level consumer?

    • @ScholarNest
      @ScholarNest  7 years ago

      That's an outdated concept. The old Kafka API used to have a high-level consumer, but the new Kafka API doesn't have such a concept. I cover the new API since the old one is not supported now.

  • @JoaoGomes-ff2pz
    @JoaoGomes-ff2pz 7 years ago

    There is no rebalance listener in this example.
    What happens if you have more than one consumer, e.g., via subscribe, and one of them receives 100 records, and after processing and saving 50 records a rebalancing is initiated? Will the offsets in Kafka be stored as the actual committed offsets, and will the next consumer assigned to that partition receive the data from the beginning?

    • @ScholarNest
      @ScholarNest  7 years ago +1

      Good question. When we are not using automatic group management (like in this example), there is no rebalance activity. Kafka can't rebalance because there is no group in this case.
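
      Concretely, the example uses manual assignment rather than group subscription; something like this sketch (topic name TSS as in the video, listener as illustrated earlier in this thread):

      // Manual assignment: no group coordination, hence no rebalance.
      consumer.assign(Arrays.asList(
              new TopicPartition("TSS", 0),
              new TopicPartition("TSS", 1),
              new TopicPartition("TSS", 2)));

      // Group subscription, by contrast, lets Kafka assign and rebalance:
      // consumer.subscribe(Arrays.asList("TSS"), rebalanceListener);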

    • @JoaoGomes-ff2pz
      @JoaoGomes-ff2pz 7 years ago

      Oh cool! I didn't notice that you aren't using any group. Thank you!

  • @kumarvairakkannu360
    @kumarvairakkannu360 7 years ago

    On poll(), the first time 6 records, the second time 5 records, etc. Curious how Kafka decides how many records to pull. The default is max.poll.records=2147483647; is it random below the max poll limit?

    • @ScholarNest
      @ScholarNest  7 years ago +2

      The poll method will try to give you as many records as it can within the various limits you specify. max.poll.records is one of them (default 500). The timeout parameter passed to the poll method is another such limit.
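
      For instance, the cap can be lowered explicitly; a small sketch using the standard consumer configs (values are illustrative):

      // Consumer configs that bound what a single poll() can return.
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");
      props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
      props.put("max.poll.records", "10");     // never more than 10 per poll
      props.put("fetch.max.bytes", "1048576"); // byte cap per fetch request
      KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);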

  • @hugodeiro
    @hugodeiro 5 years ago

    Very good. But it would be nice if you provided the code somewhere like GitHub...

    • @ScholarNest
      @ScholarNest  5 years ago +1

      It is already there on GitHub:
      github.com/LearningJournal/ApacheKafkaTutorials

  • @4ukcs2004
    @4ukcs2004 6 years ago

    Great video. Sir, I need a reply. I have a Kafka topic which contains a jobname field. When I read the topic with a consumer, the jobs named in it should get triggered and start running. It looks like event triggering or event-driven processing. Any link or snippet would help. How do I take care of this part? Please help.

  • @somethingbig8072
    @somethingbig8072 7 years ago

    How do I send different data to different consumers from a single topic?

    • @ScholarNest
      @ScholarNest  7 years ago

      The answer to your question is in the videos. Watch the full playlist.

  • @singhsankar
    @singhsankar 6 years ago

    Where do we commit the Kafka processed message? We only commit the MySQL (DB) connection.

    • @ScholarNest
      @ScholarNest  6 years ago

      The idea is to make a single transaction that commits both the processed message and the offset number.
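
      On restart, the consumer then seeks back to the DB-recorded offset instead of relying on offsets committed to Kafka; a rough sketch (readOffsetFromDb and storeRecordAndOffset are the assumed helpers from earlier in this thread, topic TSS from the video):

      // No offsets are committed to Kafka at all. On startup, assign the
      // partition manually and seek to the offset recorded by the last
      // successful DB transaction.
      TopicPartition p0 = new TopicPartition("TSS", 0);
      consumer.assign(Arrays.asList(p0));
      consumer.seek(p0, readOffsetFromDb(p0));
      while (true) {
          ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
          for (ConsumerRecord<String, String> record : records) {
              // insert data + update offset + commit, all in one DB transaction
              storeRecordAndOffset(conn, record.value(), record.topic(),
                                   record.partition(), record.offset());
          }
      }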

  • @glt123
    @glt123 7 years ago

    Can the producer send messages while a rebalance is happening? Or will the Kafka producer get an exception during the rebalancing process?

    • @ScholarNest
      @ScholarNest  7 years ago

      Rebalance is an activity for the consumer group. It has nothing to do with the producer.

    • @glt123
      @glt123 7 years ago

      Okay... When a new partition is added to a topic, how does the producer start sending messages to the new partition?

    •  7 years ago

      I don't think you can add a partition in "real-time". You have to specify them when you create the topic.

  • @KajalSingh-og7fk
    @KajalSingh-og7fk 3 years ago

    Why is setAutoCommit set to false? It should be true, right? Am I missing something?

  • @robind999
    @robind999 7 years ago

    Hi LJ,
    I struggled with the kafka-mongodb-sink connector setup:
    github.com/startappdev/kafka-connect-mongodb
    It seems it needs curl to convert the MongoDB configuration file (JSON file) to XML (need to add a header too). ... I need to modify the httpd.config file to open the port, and I still could not upload the file through curl on localhost, etc.
    Watching your demo, the process is fully monitored; if I use this Kafka connector, I just don't know how to monitor my process, especially the partition part.
    So, my question to you: instead of using the kafka-mongodb-sink connector, can I use code similar to yours to sink Kafka to MongoDB?
    Please advise; yours is the most advanced and detailed Kafka demo so far.
    Thanks,
    Robin

    • @ScholarNest
      @ScholarNest  7 years ago

      You can always write your own code to sink. However, it may be more convenient to use a connector. Unfortunately, there is no certified connector for MongoDB yet. Check this link: www.confluent.io/product/connectors/
      There are 4 MongoDB sinks listed. I have never tried any of them, but you can give them a try. One of them should be mature enough.

    • @robind999
      @robind999 7 years ago

      Thank you so much for your quick feedback.
      I just found some Spark code to sink data to MongoDB. Since you told me there is no certified connector for MongoDB yet, I will give the following a try:
      rklicksolutions.wordpress.com/2017/04/04/read-data-from-kafka-stream-and-store-it-in-to-mongodb/
      What do you think about this link?
      Confluent involves another tool installation, and I still haven't found a use case for this; I only found one for pulling data out of MongoDB into Kafka.
      Thank you so much, my mentor.
      Robin

  • @sujeeshsvalath
    @sujeeshsvalath 6 years ago

    "Exactly once" processing has now been built in, starting from Kafka version 0.11. The concept is the same as explained in this video. Please refer to www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ to enable "exactly once" processing in Kafka.

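    For reference, the 0.11+ transactional API for a consume-process-produce loop looks roughly like this (a sketch of the standard producer transaction calls, not the video's code; currentOffsets() is a hypothetical helper that maps each polled partition to its next offset, and "my-group"/"output-topic" are illustrative names):

    // Requires producer config transactional.id=<unique id>, and
    // isolation.level=read_committed on downstream readers.
    producer.initTransactions();
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        if (records.isEmpty()) continue;
        producer.beginTransaction();
        try {
            for (ConsumerRecord<String, String> record : records) {
                producer.send(new ProducerRecord<>("output-topic",
                        record.key(), record.value()));
            }
            // Consumed offsets commit atomically with the produced records.
            producer.sendOffsetsToTransaction(currentOffsets(records), "my-group");
            producer.commitTransaction();
        } catch (Exception e) {
            producer.abortTransaction(); // the whole batch is discarded
        }
    }
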
  • @reachmurugeshanm7750
    @reachmurugeshanm7750 3 years ago

    I am a big fan of yours, Sir; your way of explaining is awesome.
    Could you please share your mail ID so I can reach you and clarify my doubts?

  • @learn9475
    @learn9475 a year ago

    Please check whether Kafka Streams and Kafka transactions solve your issues,
    since they have been released since Nov 2017.
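
    In Kafka Streams, exactly-once is a single config; a minimal sketch (application id and servers are illustrative; on pre-2.5 brokers use StreamsConfig.EXACTLY_ONCE instead of EXACTLY_ONCE_V2):

    // Kafka Streams: exactly-once processing is enabled by configuration.
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-eos-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
              StreamsConfig.EXACTLY_ONCE_V2); // "exactly_once_v2"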