How to do CDC using Debezium, Kafka and Postgres

  • Published: 19 Jan 2025

Comments • 41

  • @startdataengineering
    @startdataengineering  4 years ago +1

    Edit for 10:06: a Kafka broker is one server within the Kafka cluster; there can be multiple brokers within the cluster.

  • @chuckinator0
    @chuckinator0 2 years ago +3

    Nice example! My main feedback would be to use Docker Compose as opposed to a bunch of docker run statements.

    • @uchepowers
      @uchepowers 1 year ago

      Does this comment make you feel smarter?
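
For reference, a minimal docker-compose sketch of the setup the comment above suggests (image tags, credentials, and topic names are illustrative; the password matches the connector config quoted later in this thread):

    version: "3"
    services:
      zookeeper:
        image: debezium/zookeeper:1.3
        ports: ["2181:2181"]
      kafka:
        image: debezium/kafka:1.3
        ports: ["9092:9092"]
        environment:
          ZOOKEEPER_CONNECT: zookeeper:2181
      postgres:
        image: debezium/postgres:12
        ports: ["5432:5432"]
        environment:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: admin
      connect:
        image: debezium/connect:1.3
        ports: ["8083:8083"]
        environment:
          BOOTSTRAP_SERVERS: kafka:9092
          GROUP_ID: "1"
          CONFIG_STORAGE_TOPIC: connect_configs
          OFFSET_STORAGE_TOPIC: connect_offsets
          STATUS_STORAGE_TOPIC: connect_statuses
        depends_on: [zookeeper, kafka, postgres]

With a file like this, a single docker compose up stands in for the individual docker run commands from the video.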

  • @MrMikomi
    @MrMikomi 3 years ago +4

    This is very useful. Many thanks.

  • @RJ-wj7lc
    @RJ-wj7lc 1 year ago +1

    Nice video. Do you have an example for Spring Boot with Apache Kafka and the Debezium connector (MySQL, MS SQL Server)?

  • @mohamednabil982
    @mohamednabil982 8 months ago

    Thank you for the tutorial. However, could you please add the article link used in this video? The link attached in the description seems to redirect to another blog.

  • @letme4u
    @letme4u 3 years ago +1

    Awesome learning

  • @projavadevelopers
    @projavadevelopers 2 years ago

    I'm facing an issue while starting the consumer: "WARNING: using default BROKER_ID=1, which is valid only for non-clustered installations. The ZOOKEEPER_CONNECT variable must be set, or the container must be linked to one that runs Zookeeper."
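
That warning means the Kafka container cannot find Zookeeper. With the debezium/kafka image you can either link a running Zookeeper container or set the variable explicitly; a sketch, assuming a Zookeeper container named zookeeper:

    # option 1: link the container named "zookeeper"
    docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka:1.3

    # option 2: point at Zookeeper directly via the environment variable
    docker run -it --rm --name kafka -p 9092:9092 -e ZOOKEEPER_CONNECT=zookeeper-host:2181 debezium/kafka:1.3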

  • @AhmedMkhinini
    @AhmedMkhinini 2 years ago

    Is there a way of using CDC with RabbitMQ instead of Kafka?

  • @SasukeUchiha-nt1yw
    @SasukeUchiha-nt1yw 2 years ago

    I am not able to fetch any messages using a consumer. It keeps waiting for a while, but there is no response from the consumer. Can anyone help?

  • @HemanthYTP
    @HemanthYTP 1 year ago

    Quick question here:
    Do I need to follow the same pattern if I want to replicate data from a Postgres database to a Snowflake data warehouse?
    I would like to remove the Kafka component and just write the changes into a log text file using Debezium, and then have a custom Python process load those tiny text files to S3 and then into Snowflake.
    Is this possible, or is Kafka integration mandatory?
    I am very new to Debezium and Postgres integration; please help me with the above question if you can.
    THANKS!

    • @AppLattice
      @AppLattice 1 year ago

      Yes, you need Kafka. It goes like this: the Postgres DB gets updated -> Debezium captures that change and sends all the information to the Kafka message bus -> your Python application listens to Kafka (consuming the Kafka topic) for these messages, then takes their contents and does whatever you want with them; in your case, putting them into Snowflake.
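
A minimal sketch of such a consumer using the kafka-python package (the topic name follows Debezium's <server>.<schema>.<table> convention from this tutorial; the S3/Snowflake step is only indicated in comments):

    # pip install kafka-python
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "bankserver1.bank.holding",          # topic Debezium publishes the table's changes to
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")) if m else None,
    )

    for message in consumer:
        event = message.value
        if event is None:                    # skip tombstone records (null value)
            continue
        payload = event["payload"]           # Debezium envelope: "op" is c/u/d, "after" is the new row
        print(payload["op"], payload.get("after"))
        # from here: append events to a local file, batch-upload to S3, then COPY into Snowflake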

  • @MrMikomi
    @MrMikomi 3 years ago

    I'm coming back to this now and wondering, why are you using pgcli and not psql?

    • @startdataengineering
      @startdataengineering  3 years ago

      You can use psql as well. I used pgcli for its autocomplete and syntax highlighting.

    • @MrMikomi
      @MrMikomi 3 years ago

      @@startdataengineering okay thanks 👍

  • @pnkumar8141
    @pnkumar8141 4 years ago +1

    curl -H "Accept:application/json" localhost:8083/connectors/ curl: (7) Failed to connect to localhost port 8083: Connection refused
    getting above error in the docker, what is need to be done?

    • @startdataengineering
      @startdataengineering  4 years ago

      One of two things: 1. wait a few minutes before retrying; 2. check whether the Docker container is running using docker ps and look for a container named connect. If it's not there, you have not started the container.
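
For example, a quick check using the tutorial's container name:

    docker ps --filter "name=connect"    # the connect container should be listed as Up
    # once it is up, retry:
    curl -H "Accept:application/json" localhost:8083/connectors/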

  • @mitchbregs
    @mitchbregs 4 years ago +1

    What is your deployment strategy for a system such as this?

    • @startdataengineering
      @startdataengineering  4 years ago +2

      1. Postgres: this will be your application database, so the standard DB patterns apply.
      2. Apache Kafka: this will be a cluster holding data, so it must always be up. You can do a rolling upgrade for Kafka.
      3. Debezium: this is a Kafka connector which runs natively on Kafka, so once registered, the cluster will store the connector settings and metadata; hence these workers will be stateless.
      Hope this helps :)
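
Point 3 works because a distributed Kafka Connect worker keeps connector state in Kafka topics rather than on local disk; a sketch of the relevant worker settings (topic names are illustrative):

    # connect-distributed.properties (excerpt)
    bootstrap.servers=kafka:9092
    group.id=connect-cluster-1
    config.storage.topic=connect_configs
    offset.storage.topic=connect_offsets
    status.storage.topic=connect_statuses
    # any new worker joining this group picks up the registered connectors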

  • @CrashLaker
    @CrashLaker 3 years ago

    Is there a reliable Kafka consumer from Debezium itself to consume from the topic and write to a new DB?

    • @startdataengineering
      @startdataengineering  3 years ago

      You can use Kafka sink connectors to write to your DB. The DB has to support a JDBC connection; ref: docs.confluent.io/kafka-connect-jdbc/current/sink-connector/index.html
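
A hedged sketch of registering such a sink (the connector class and options follow the Confluent JDBC sink docs; the target URL and topic are assumptions, and the JDBC sink plugin must be installed on the Connect worker first):

    curl -i -X POST localhost:8083/connectors/ \
      -H "Accept:application/json" -H "Content-Type:application/json" \
      -d '{
        "name": "jdbc-sink-connector",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
          "topics": "bankserver1.bank.holding",
          "connection.url": "jdbc:postgresql://target-db:5432/target_db?user=postgres&password=admin",
          "insert.mode": "upsert",
          "pk.mode": "record_key",
          "auto.create": "true"
        }
      }'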

  • @haseebperspective
    @haseebperspective 2 years ago

    I am getting the errors below when I try to execute curl -H "Accept:application/json" localhost:8083/connectors/
    curl: (7) Failed to connect to localhost port 8083: Connection refused
    curl: (3) [globbing] bad range in column 2

    • @colosus2811
      @colosus2811 2 years ago

      Bro, I'm not sure, but you should try using your wlo or eth IP address.
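
The globbing error usually means curl saw a stray [ or ] (often from a mangled copy-paste); quoting the URL and disabling globbing is a safer pattern:

    # -g turns off curl's URL globbing; quoting keeps the shell from touching the URL
    curl -g -H "Accept:application/json" "http://localhost:8083/connectors/"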

  • @johnleonardonwuzuruigbo6213
    @johnleonardonwuzuruigbo6213 4 years ago

    Is it possible to capture ALTER TABLE changes with Debezium in Postgres? If yes, how can I set it up? Right now I can only stream DML changes and no DDL.

    • @startdataengineering
      @startdataengineering  4 years ago

      AFAIK it is not currently possible in pg: debezium.io/documentation/reference/1.3/connectors/postgresql.html
      There are some pg DDL log parsers which you will need to modify to capture the DDL changes, e.g. debezium.io/blog/2016/04/15/parsing-ddl/
      Another option is to just use a third-party tool like Fivetran that does this automatically for you, but as with everything you will need to weigh the tradeoffs. Hope this helps :)

  • @MrNiceseb
    @MrNiceseb 4 years ago

    What is the difference between this CDC and Kafka Streams? Aren't they both fundamentally logging every single change?

    • @startdataengineering
      @startdataengineering  4 years ago +2

      CDC is the name of the software design pattern, where we create an event for every single change. Kafka is the tool used in CDC through which we transmit these changes to another system.

    • @leparrain4374
      @leparrain4374 6 months ago

      @@startdataengineering How can I transmit these changes to other pg instances?

  • @radjaputra7110
    @radjaputra7110 2 years ago

    Hello, when I start stream.py writing to holding.txt,
    I get "permission denied, broken pipe". How can I fix that?

  • @shivasoumya
    @shivasoumya 4 years ago

    The startdataengineering website is down; can you please check?

    • @startdataengineering
      @startdataengineering  4 years ago

      hmm weird, must have been a temporary outage. www.startdataengineering.com/ is currently running.

  • @richa6695
    @richa6695 3 years ago

    Can you select specific columns in Debezium that you would like to push to the Kafka topic, instead of all the columns from the table? Also, what's the alternative for doing this if you have to do the same thing with AWS services?

    • @ShivamSingh-sm2oy
      @ShivamSingh-sm2oy 3 years ago +1

      Yes, you can, by defining a custom config for the table's columns; for example, blacklisting columns or masking some of the PII details with custom logic.
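
For example, with the Debezium 1.3-era Postgres connector this is done in the connector config (the column names here are illustrative, and the database.* connection settings are omitted):

    {
      "name": "sde-connector",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "table.whitelist": "bank.holding",
        "column.blacklist": "bank.holding.internal_note",
        "column.mask.with.12.chars": "bank.holding.ssn"
      }
    }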

  • @blessingeorgevarghese5992
    @blessingeorgevarghese5992 3 years ago

    Deletes aren't working. If I perform a DELETE in MySQL, the change isn't reflected in Elasticsearch. Why could this be?

    • @CrashLaker
      @CrashLaker 3 years ago +1

      Maybe it's related to that "REPLICA IDENTITY FULL" mentioned at 9:20.
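
For the Postgres setup from the video, that setting is applied on the source table so that delete events carry the old row values:

    -- run against the source database; bank.holding is the tutorial's example table
    ALTER TABLE bank.holding REPLICA IDENTITY FULL;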

  • @lhxperimental
    @lhxperimental 2 years ago

    The fact that you have to use Debezium's build is off-putting. One should be able to apply changes to a standard Postgres container instead of using a Debezium-provided blob.
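
For what it's worth, a stock postgres image can work too once logical decoding is enabled; a sketch of the relevant postgresql.conf settings (values are illustrative, and with PostgreSQL 10+ the built-in pgoutput plugin avoids installing any extension):

    # postgresql.conf
    wal_level = logical          # required for logical decoding
    max_wal_senders = 4          # at least one per replication client
    max_replication_slots = 4    # Debezium claims one replication slot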

  • @bluex217
    @bluex217 2 years ago

    I followed the article precisely, but I always get:
    "Error while fetching metadata with correlation id 10 : {bankserver1.bank.holding=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)"

  • @priyantynurulfatimah6408
    @priyantynurulfatimah6408 1 month ago

    Hi, here's the final curl that works for PowerShell, but now the error has changed:
    curl.exe -i -X POST localhost/connectors/ -H "Accept:application/json" -H "Content-Type:application/json" -d '{\"name\": \"sde-connector\", \"config\": {\"connector.class\": \"io.debezium.connector.postgresql.PostgresConnector\", \"database.dbname\": \"start_data_engineer\", \"database.hostname\": \"postgres\", \"database.password\": \"admin\", \"database.port\": \"5432\", \"database.server.name\": \"bankserver1\", \"database.user\": \"postgres\", \"table.whitelist\": \"bank.holding\"}}'

    • @priyantynurulfatimah6408
      @priyantynurulfatimah6408 1 month ago

      My current error is {"error_code":500,"message":"Could not create PG connection"}.

    • @priyantynurulfatimah6408
      @priyantynurulfatimah6408 1 month ago

      It turned out the real error is:
      Caused by: org.postgresql.util.PSQLException: FATAL: database "start_data_engineer" does not exist
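
That FATAL error means the connector reached Postgres but the database named in database.dbname does not exist; creating it (or pointing database.dbname at an existing database) resolves it:

    -- e.g. inside the postgres container via: psql -U postgres
    CREATE DATABASE start_data_engineer;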