Stream your PostgreSQL changes into Kafka with Debezium

Поделиться
HTML-код
  • Опубликовано: 18 сен 2024
  • Watch your Postgres changes stream into Kafka in realtime using Debezium! End to end example of CDC from Postgres all the way into Kafka in realtime.
    In this video we go over a tutorial where we stream PostgreSQL changes into Kafka using the Debezium Connector. We go over the docker containers necessary and we demonstrate end to end how the whole thing works.
    This is a real life example of how CDC works with Postgres & Kafka.
    Code: github.com/irt...
    Theory of CDC: • Change Data Capture (C...
    #kafka #postgres #streaming #realtime #debezium #cdc #systemDesign #tutorial #programming
    Visit me at: irtizahafiz.com?
    Contact me at: irtizahafiz9@gmail.com

Комментарии • 137

  • @manojjoshi4321
    @manojjoshi4321 Месяц назад

    This is an extremely rich and high quality video. I stumbled upon this by chance when I was searching for some info about Debezium for some project we might be interested in. Interestingly enough, the files provided in the video description did not quite work for MAC and it took about 2 days to make the required adjustments including getting the correct images and port configuration, network configuration etc. but in the end things worked. My experience was that I learned a lot as I went through the process, not just about Debezium but a great deal of other things on what docker-compose structure, network dependency, port issues and all that. So it was worth. Kudos to Irtiza for taking all efforts to put together such as amazing tutorial....!!

  • @intruderstube
    @intruderstube 2 года назад +7

    Thanks, it is helpful. "Weird" thing with psql shell is just because of missing semicolon at the end.

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Ahh I see! That’s a good callout. I swear it used to work before without the semicolon :/
      Glad you found it helpful regardless : )

  • @bnssoftware3292
    @bnssoftware3292 9 месяцев назад +2

    Really like your teaching style. Thank you.

    • @irtizahafiz
      @irtizahafiz  9 месяцев назад +1

      Thanks for watching!

  • @vjnt1star
    @vjnt1star 3 месяца назад +1

    Very good presentation short and to the point. I got the information I needed. Thank you

  • @chrisschuck3316
    @chrisschuck3316 Год назад +2

    Awesome concise explanation. Appreciate your work!

    • @irtizahafiz
      @irtizahafiz  11 месяцев назад

      Thank you! I will start posting again soon, so please let me know what type of content interests you the most.

  • @morph-87
    @morph-87 4 месяца назад

    Thanks for the comprehensive explanation.

  • @SpiritOfIndiaaa
    @SpiritOfIndiaaa Год назад

    Thanks a lot , simple and straight forward and clear

  • @martingonzalez9298
    @martingonzalez9298 2 года назад +4

    I have a question, when you did the update the event on the kafka topic had a '{"before": null, ' why is that null if the row already had information before ?

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! That's a very good observation. I was actually running into that intermittently. Like 8/10 updates were working properly, but for 2/10 the before was NULL for some reason. Resetting the WAL of the database fixed it every time, but I am not sure why it happens in the first place.

    • @iuriivdovin731
      @iuriivdovin731 2 года назад +2

      @@irtizahafiz Hi! Before is null because you missed semicolon ';' after ALTER TABLE REPLICA IDENTITY FULL

  • @wilfredomartel7781
    @wilfredomartel7781 Год назад

    Just what i was searching for👏

  • @geneschmidt9337
    @geneschmidt9337 Год назад

    THIS WAS GREAT! Thank you!

  • @r-rtz
    @r-rtz Год назад +1

    Your 'Trevor' update is showing up in Kafka not as an update, but as a new record ("before": null)

  • @CuriouslyContent
    @CuriouslyContent Год назад +3

    Any idea why the last update had a 'before' value of 'null'? Since the last thing was an update, shouldn't the 'before' have the values as they were before the update (id:2, name:mike)?

    • @bulgakovwork2022
      @bulgakovwork2022 Год назад

      same question

    • @irtizahafiz
      @irtizahafiz  Год назад

      Hi! Thank you for bringing this up. I think I answered this in another comment.
      I was running into this bug intermittently. I think it had to do with some misconfiguration of the postgres cluster.

    • @manojjoshi4321
      @manojjoshi4321 Месяц назад

      I had the same issue but resolved it quickly. Make sure you use - alter table replica identity full.

  • @rajan-390
    @rajan-390 Год назад

    Explain really well

    • @irtizahafiz
      @irtizahafiz  11 месяцев назад

      Thank you! I will start posting again soon, so please let me know what type of content interests you the most.

  • @38vbharat
    @38vbharat 2 года назад

    Nice explanation ! Thanks

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Glad you found it helpful : )

  • @sushantsinha4038
    @sushantsinha4038 2 года назад +1

    Thanks ! Clear Explanation in this video . Do you have PostgreSQL Kafka Sink connector tutorial similar to this ?

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! Unfortunately, I don't! I do plan on getting to it at some point though.
      Meanwhile, I am doing most of these data pipeline stuff using AWS. So if you are interested, please check out those videos.

  • @sanjaymadhavan1893
    @sanjaymadhavan1893 Год назад

    How to deal with the concept of foreign key in debezium?
    Great video BTW! Life saviour

  • @devamzanzmera2231
    @devamzanzmera2231 10 месяцев назад

    It's very informative

  • @alphonsetaylor2788
    @alphonsetaylor2788 Год назад

    Really helpful ! thanks

  • @mmfStudent
    @mmfStudent 2 месяца назад

    Why after the sql update command, event has `before: null` in Kafka?

  • @udaynj
    @udaynj 8 месяцев назад

    Nice video. Like your teaching style. Quick question - How stable is Debezium if you want to run this at scale in production?

    • @irtizahafiz
      @irtizahafiz  8 месяцев назад +1

      Many companies use it at scale.

  • @AkshitBansal-fk9gt
    @AkshitBansal-fk9gt 5 месяцев назад

    nice tutorial. you missed ; in some commands, that's why the output was not visible.

  • @PS-ff1tq
    @PS-ff1tq 2 года назад +1

    Great introduction 👍,I have a question ,how to consume the data into a target database or consumer the data from topic using kafka python or any other api ?

    • @irtizahafiz
      @irtizahafiz  2 года назад +1

      For consuming data into a target database, you should be able to do it with a Sink Connector of your respective database. Almost all Sink Connectors should be able to read from a Kafka topic.
      As for consuming Kafka using Python, I have an end to end example of that here in this video. I think you will find it helpful: ruclips.net/video/qi7uR3ItaOY/видео.html

  • @mariemhajjem
    @mariemhajjem 2 года назад +1

    Thanks a lot!

    • @irtizahafiz
      @irtizahafiz  2 года назад +2

      Glad it was helpful. I added the idea to my “future videos” list. The next few weeks I will be focusing on the System Design course, so unfortunately it will take some time :(

  • @KshitijKohli-h8k
    @KshitijKohli-h8k Год назад

    Awesome tutorial! I wanted to understand 1 piece more. There are 2 patterns which I am now aware of of streaming CDC to Kafka. 1 via debezium connector that you talked about, other is via the Outbox pattern where the application service commits to an Outbox table in the same commit it writes to other application tables, post which tools like Ceres can stream the data to Kafka. What are the core differences in these 2 approaches and is there a recommendation of one over the other?

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      I am actually not familiar with Outbox :(

    • @JoseGuerra-qc5hq
      @JoseGuerra-qc5hq 9 месяцев назад

      Typically the outbox pattern is to guarantee once delivery of the message, it's not to perform cdc (imagine a critical service that you must guarantee message delivery). Your application does not send messages to a kafka topic but instead send to a db table. The outbox service then is responsible to see which messages in your table (outbox) have been successfully published to kafka and retries for the ones who haven't.There is a chance that the outbox publishes a event more than once, so your consumers must be idempotent.
      From the consumers side you may have what would be called an inbox pattern, which blocks the same message to be consumer more than once, or they are okay with receiving the same message more than once.
      What you may have is CDC with Debezium but instead of publishing directly to kafka it publishes the events to a DB table via outbox.

  • @mukeshkumar-il3ct
    @mukeshkumar-il3ct Год назад

    please help me I'm getting the error '% ERROR: Invalid token 'k' in key pack-format, see -s usage."

  • @koustavghosh802
    @koustavghosh802 Год назад

    Thanks for the explanation. Can you please tell how can we store the result of kafka connector which we are getting from the data base?

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      I don't think I follow the question.
      If you mean the CDC data from kafka, you can route it to whatever you want to - a different DB, application logic, cache, etc.

  • @vijeandran
    @vijeandran 2 года назад

    Hi, Thanks for the video... Whenever I insert or update the data in postgres, those changes are not showing in the docker run command

    • @irtizahafiz
      @irtizahafiz  2 года назад

      I would recommend restarting Postgres after you change its configuration.

    • @vijeandran
      @vijeandran 2 года назад

      @@irtizahafiz Thanks you Sir... Now it is working as expected....

  • @kurshy
    @kurshy 5 месяцев назад

    when i run the command to fire up dbz, i get "empty reply from server". kindly assist

  • @edisonngizwenayo5752
    @edisonngizwenayo5752 Год назад

    Great explanations, just have few questions. The first one is how the configuration should be made in debezium.json file if the the source is an API other than Database. The last one, how to allow debezium to listen to change from multiple database tables. Thanks

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      I can consider making future videos on these. Thank you!

  • @uma_mataji
    @uma_mataji 2 года назад

    sorry when I ran the docker-compose command it just keeps on spinning forever ...it doesnot work.

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! Sorry it's difficult for me to help without some more context.

  • @SoumilShah
    @SoumilShah Год назад

    Hello i need some help i was able to implement everything you showed in video
    i cannot consume messages from my local kafka-python can you please help ?

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Hi! It's difficult to help without more details.

  • @RJ-wj7lc
    @RJ-wj7lc Год назад

    Nice video, do you have an example for Spring boot with Apache kafka and Debezium connector(MySQL, MS SQL Server)?

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Unfortunately, I do not. Maybe something in the future.

  • @RakantaRifky24
    @RakantaRifky24 7 месяцев назад

    If I want Yugabytedb postgresql for doing this migration, is it using same way like postgresql?

    • @irtizahafiz
      @irtizahafiz  6 месяцев назад

      It should be similar as long as there's a connector available.

  • @吳信宏-z7i
    @吳信宏-z7i 2 года назад

    Great introduction. Could you share your scripts in your video?

    • @irtizahafiz
      @irtizahafiz  2 года назад +2

      Hi! I can share as much as my iterm command history allows me. If you need anything in particular, let me know.
      curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" 127.0.0.1:8083/connectors/ --data "@debezium.json"
      docker run --tty --network postgres_debezium_cdc_default confluentinc/cp-kafkacat kafkacat -b kafka:9092 \
      -C -s key=s -s value=avro -r schema-registry:8081 -t postgres.public.student

  • @pallavi1388
    @pallavi1388 2 года назад

    Hi I am trying to add skipped.operations = "c" in debezium.json but its still showing create/update/delete?

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! Unfortunately, I don't think I can be too helpful when it comes to debezium specifics.

  • @pyramidofmerlinii4368
    @pyramidofmerlinii4368 Год назад

    If we have airflow in docker-compose, we don't need Debezium, right?

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      I think you can do a lot with Airflow, but not sure about the specifics.

  • @christianangelomsulit3759
    @christianangelomsulit3759 Год назад

    table.include.list: can this accept array or list separated by ,?

  • @shubhamkapoor1123
    @shubhamkapoor1123 Год назад

    Hi,
    I am getting this error when running kafkacat command
    ERROR: Failed to format message in postgres.public.student [0] at offset 0: Avro/Schema-registry message deserialization: REST request failed (code -1): HTTP request failed: URL using bad/illegal format or missing URL : terminating
    Please tell the next step how to solve this.
    thanks!

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Not sure without more details. Can you provide with examples of your payload?

  • @bnssoftware3292
    @bnssoftware3292 9 месяцев назад

    How come on the update it doesn't show the before data?

    • @irtizahafiz
      @irtizahafiz  9 месяцев назад

      That was happening intermittently. I believe there are some discussions about this in the comments section.

  • @AnimeZone247
    @AnimeZone247 4 месяца назад

    Is it possible to listen to join queries?

  • @AnkitKumar25
    @AnkitKumar25 Год назад

    I'm getting this error after running the final kafkacat command. I'm using windows to run this. I checked in both cmd and powershell.
    Error: file/topic list only allowed in producer(-P)/kafkaconsumer(-G) mode

    • @AnkitKumar25
      @AnkitKumar25 Год назад

      docker run --tty --network bin_default confluentinc/cp-kafkacat kafkacat -b kafka:9092 -C -s key=s -s value=avro -r schema-registry:8081 -t postgres.public.student
      this command works for me now.

  • @fkaanoz
    @fkaanoz Год назад

    I guess "ALTER TABLE ..." part didn't work, due to lack of semi column at the end. @4.20

  • @vilius518
    @vilius518 Год назад

    Thats weird, messages that end up in kafka topic are not in json, do you have clue why? Im using kafdrop for inspecting topics. Could it be that your kafkacat command also parses messages to json?

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Potentially. I know that the Kafkacat command has lots of "quality of life improvements" baked in.

  • @Bhavesh_Bhuvad
    @Bhavesh_Bhuvad Год назад

    docker: Error response from daemon: network postgres_debezium_cdc_master_default not found. i have getting this error on docker run -tty .....command.plz help

    • @irtizahafiz
      @irtizahafiz  Год назад

      Make sure you define the network in the docker-compose file.

  • @HikarusVibrator
    @HikarusVibrator 2 года назад

    Is there any way at all to do a major version DB upgrade without manually stopping writes, then killing the Debezium connector, etc. ? For a microservice architecture it really is quite costly to have such prolonged downtimes whenever a major upgrade is done. I don’t see a solution anywhere

    • @irtizahafiz
      @irtizahafiz  2 года назад

      I don’t have any experience with it :( So unfortunately I won’t be of much help here sorry.

    • @arnavdman
      @arnavdman Год назад

      I dont think so, the process of doing a major version upgrade of a database typically requires some level of downtime. But if you can setup a replica and divert the traffic there for the time being it might be a solution

  • @nurulasyikin1895
    @nurulasyikin1895 2 года назад

    Is it possible to implement this without using docker environment? If it's possible can you demonstrate how to perform that?

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! Yes it is possible! I might be able to demonstrate that some time in the future, but it’s currently not in my plans unfortunately.

  • @prasadkintali6560
    @prasadkintali6560 2 года назад

    Good one brother. Do you have a video to write to a source database from this Kafka source topic?

    • @irtizahafiz
      @irtizahafiz  2 года назад +1

      I don’t have a video yet, but I do plan on making one soon. : )

    • @neeraj2626
      @neeraj2626 8 месяцев назад

      @@irtizahafiz Did you happen to create one?

  • @inphosoftindonesia3694
    @inphosoftindonesia3694 2 года назад

    hi, i wonder, how about delete, does debezium also support delete?

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Yes. It supports delete too.

  • @brucewen9326
    @brucewen9326 11 месяцев назад

    In PostgreSQL, we often forgot to append ";" at the end of the SQL statement 🤪

    • @irtizahafiz
      @irtizahafiz  11 месяцев назад

      Thank you for the correction! I will start posting again soon, so please let me know what type of content interests you the most.

  • @ansarhayat6276
    @ansarhayat6276 2 года назад

    Hi Irtiza,
    any stuff related SQL server source and sink connectors

    • @irtizahafiz
      @irtizahafiz  2 года назад +1

      I do have a few SQL videos coming up soon.

  • @vaibhavpandey5615
    @vaibhavpandey5615 2 года назад

    Great man ,could you please give me docker commond to start consumer on topic!

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Unfortunately I don't have the code for this anymore :(

  • @pammyarw
    @pammyarw Год назад

    is there any way I can provide a topic name to which it should be published?

  • @vj_fantasy4298
    @vj_fantasy4298 Год назад

    Hello actually I'm doing this currently and I'm stuck at 8:20 so can anyone please tell me what is --network and also what is postgres_debezium_default it's saying it doesn't exist

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Depends on the app you are building. I tried keeping things generic here. The ranking can be based on some score generated by a machine learning model, etc.

  • @parteeks9012
    @parteeks9012 Год назад

    Thanks, it helped me a lot but the weird thing is you updated the data in the database in the end, and still Kafka "before key" is null. Anyone has any thoughts on this?😁

    • @irtizahafiz
      @irtizahafiz  Год назад

      That has to do with some kind of Postgres configuration. I remember fixing it immediately after the video, but can't remember :(

    • @rssaini138
      @rssaini138 Год назад

      @@irtizahafiz It is working after below command, because in video it was not executed because you've missed semicolon.
      ALTER TABLE public.student REPLICA IDENTITY FULL;

  • @rahadianpramono4050
    @rahadianpramono4050 2 года назад

    hi need your help, i try to run docker run --tty --network postgres_debezium_cdc_default but it showing ERROR: Failed to query metadata for topic postgres.public.student: Local: Broker transport failure, please help, thank you

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Are you sure the Kafka topic was created successfully? I would recommend listing the available topics in the broker first.

    • @rahadianpramono4050
      @rahadianpramono4050 2 года назад

      @@irtizahafiz I'm not sure that kafka topic created successfully because I followed all the instructions in your video :D, how to listing the available topics in the broker? thank you

    • @irtizahafiz
      @irtizahafiz  2 года назад

      You can run something like this in your CLI to list the Kafka topics:
      ./usr/bin/kafka-topics --zookeeper zookeeper:2181 --list

    • @valentinadevanna9448
      @valentinadevanna9448 2 года назад

      @@irtizahafiz i have the same problem and the topic postgres.public.student exists

    • @valentinadevanna9448
      @valentinadevanna9448 2 года назад

      it works!! It is sufficient runs the command two or threee times

  • @dempile
    @dempile 2 года назад +1

    Even with the update , you get null ! ! ! !

    • @irtizahafiz
      @irtizahafiz  2 года назад

      I know :( I was getting null every now and then. I believe it was because my Postgres docker container wasn't retaining the WAL setting every time I exited out of it.

  • @ranisari3110
    @ranisari3110 2 года назад

    please make video postgresql -> debezium -> pubsub, thank you

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! I do plan on doing that in the near future. After the system design videos that is.

  • @darshankadiya2824
    @darshankadiya2824 2 года назад

    can we use same using airflow

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Yup for sure! Products like AirFlow make data engineering super easy by abstracting away all these.

  • @GordonShamway1984
    @GordonShamway1984 Год назад

    delete would be nice to see

  • @bjugdbjk
    @bjugdbjk 2 года назад

    I am trying to build a project on top of this tutoral , got stuck below, could you help :)
    I ran the compose file and i can see from docker desktop all containers are up.
    From the go code i was trying to communicate with kafka to create a topic, throws me below error, could you help
    "panic: failed to dial: failed to open connection to kafka:9092: dial tcp: lookup kafka: no such host"
    Go code :
    conn, err := kafka.DialLeader(context.Background(), "tcp", "localhost:9092", "my-topic", 0)
    if err != nil {
    panic(err.Error())
    }
    conn.SetReadDeadline(time.Now().Add(10 * time.Second))

    • @irtizahafiz
      @irtizahafiz  Год назад +1

      Hi! I am not really familiar with the project.
      Just make sure both the Go project and your kafka containers are running on the same network.

  • @zaidahmed4254
    @zaidahmed4254 2 года назад

    hi need your help, i try to run docker run --tty --network debezium_postgres_cdc_default(my app name start with debezium_postgres_cdc_default) but it showing ERROR: file/topic list only allowed in producer(-P)/kafkaconsumer(-G)

    • @prasadkintali6560
      @prasadkintali6560 2 года назад

      I have faced the same issue in windows command prompt, but it works in powershell.

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Are you trying it out in Windows or Mac or Linux?

    • @robinthomas7350
      @robinthomas7350 Год назад

      @@prasadkintali6560 i am having same issue in both command prompt and powershell.i am using windows machine. Can anyone help

  • @atharvadeokar2814
    @atharvadeokar2814 Год назад

    Hi Irtiaz
    I am facing issue - ERROR: Failed to query metadata for topic postgres.public.student: Local: Broker transport failure
    ./usr/bin/kafka-topics --zookeeper zookeeper:2181 --list
    This gives ->
    -bash: ./usr/bin/kafka-topics: No such file or directory
    I will really appreciate you if you could help me this issue
    Thankyou!

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Have you created the Kafka topics successfully? You have to register the topic first, before you can consume from it.

  • @srini1431
    @srini1431 2 года назад

    Hi, I'm getting unable to find image 'confluentinc/cp-kafka:latest locally. How to resolve this please help

    • @srini1431
      @srini1431 2 года назад

      Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "-b": executable file not found in $PATH: unknown

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Hi! I can look at this later on in the week when I have some time.

  • @venkateshajjs5014
    @venkateshajjs5014 2 года назад +1

    docker: Error response from daemon: network postgres_debezium_cdc_default not found. having this error while run this command : docker run --tty --network postgres_debezium_cdc_default confluentinc/cp-kafkacat -b kafka:9092 -C -s key=s value=avro -r schema-registry:8001 -t postgres.public.student ...

    • @venkateshajjs5014
      @venkateshajjs5014 2 года назад

      please help me .. --network postgres_debezium_cdc_default denotes what?

    • @venkateshajjs5014
      @venkateshajjs5014 2 года назад

      please i need help...

    • @iuriivdovin731
      @iuriivdovin731 2 года назад

      That's because the network has another name. Print in the command line "docker network ls" and find the network in the list. In my case it was "debezium-default".

    • @venkateshajjs5014
      @venkateshajjs5014 2 года назад

      @@iuriivdovin731 thanks bro

    • @irtizahafiz
      @irtizahafiz  2 года назад

      Thanks for helping out! : )

  • @christopherma5445
    @christopherma5445 Год назад

    I got the following error :% ERROR: Topic postgres.public.student error: Broker: Leader not available
    after running docker run --tty --network postgres_debezium_cdc_default confluentinc/cp-kafkacat kafkacat -b kafka:9092 -C -s key=s -s value=avro -r schema-registry:8081 -t postgres.public.student

    • @irtizahafiz
      @irtizahafiz  10 месяцев назад

      Maybe try resetting Docker if it's running already?