Thank you so much.
I have seen a lot of videos & books already, and this is the first time I really understand and see all the strength and ease of Kafka.
Great work!!!
Thanks, glad it helped!
This is really an awesome walkthrough .. Thank you!
Glad it was helpful!
Thanks for all the videos!!
Glad you like them :)
Wonderful session, thanks friend.
Thanks :D
Thanks for sharing this.
Thanks a lot. It's crystal clear
Glad it helped :)
Thank you so much.
You're welcome!
Great talk and walk-through. I am very new to the Confluent platform, and ksqlDB seems to be a great thing. I have one question about Kafka Connect (following your example of looking up user details from the MySQL store): how big can that remote MySQL table be, and does it matter at all whether the join happens on the PK? My sense is that it does not matter that much, am I correct?
You can join on other fields, but you would need to re-key the data in the Kafka topic first for the join to succeed - you can do this easily enough with ksqlDB though. In terms of size, in general ksqlDB can scale horizontally, but for specifics I'd recommend testing it yourself. Also head to the #ksqldb channel on cnfl.io/slack to discuss further.
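For anyone reading along, here is a minimal sketch of what that re-keying looks like, assuming the ratings/users example from the video; the stream, table, and column names are illustrative and may not match your actual schema:

```sql
-- Re-key the ratings stream onto USER_ID by writing it to a new topic
-- partitioned on that column (names here are hypothetical).
CREATE STREAM RATINGS_KEYED AS
  SELECT *
  FROM RATINGS
  PARTITION BY USER_ID;

-- Stream-table join: enrich each rating with the user's name from the USERS table.
CREATE STREAM RATINGS_ENRICHED AS
  SELECT R.USER_ID, R.RATING, U.FULL_NAME
  FROM RATINGS_KEYED R
  JOIN USERS U ON R.USER_ID = U.USER_ID;
```

The PARTITION BY writes the stream to a new topic keyed on USER_ID, which is what allows the subsequent stream-table join to look up the matching user row.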
@rmoff A follow-up question regarding joining: when you create a table in ksqlDB (which is just another topic from Kafka's perspective), does ksqlDB handle indexes somehow? Is it a fully fledged DB engine behind the scenes? And how does it fetch data from the real topic when a stream with a join is flowing and, for each rating, it needs to enrich the data with the user's name fetched from somewhere (my guess is it doesn't do random access on the real Kafka topic)? Or does the table create a completely new structure which is handled only by ksqlDB?
Why is everyone using Confluent Kafka for this and that? I wanted to do it in production, and Confluent Kafka is not open source.
Can anyone suggest an article or video to refer to? I want to load a CSV or JSON file into Kafka as a table.
What happens to the streams or tables if the ksqlDB or Kafka Connect cluster crashes?
If I restart the Docker container where I'm running the ksqlDB streams or Kafka Connect, will the streams start from where they left off?
Are there any instances where you had too many streams and half of them crashed? How do you recover?
Please post this over at forum.confluent.io/ and I will try to answer it there
I am using JDBC connectors and receive `Key format: ¯\_(ツ)_/¯ - no data processed` although I have set `"key.converter": "org.apache.kafka.connect.storage.StringConverter"` in my connector. I do see the full stream, but with a null key: `rowtime: 2021/05/31 08:33:38.411 Z, key: , value: {"id": 10, ...`. Do you have an idea what could have gone wrong / what typical issues come up at that point?
Hi, the best place to ask this is forum.confluent.io/ :)
I do have some questions:
Kafka uses key-value storage for messages and has some great features for data persistence. But usually data streams should not be stored forever, right? I guess Kafka has an internal cleanup policy for removing old streams, especially if topics reach their maximum physical size.
How does ksqlDB handle that kind of cleanup policy (if it exists)? Since we are using it for database purposes, the data should be available for its whole lifetime.
So my question: what is ksqlDB? Is it a Kafka topic always consumed from offset earliest? Is it comparable to a Redis key-value store, to a MongoDB document store, to SQL databases, ...?
Great question. Kafka stores data based on the retention policy which can be configured per topic. You can retain data based on size, duration - or indeed forever (which is totally valid, see www.confluent.io/blog/publishing-apache-kafka-new-york-times/ and www.confluent.io/blog/okay-store-data-apache-kafka/). It's basically down to the use case for the data. You say "data streams should not be stored forever", but it depends on what that data is - there are plenty of examples where you *would* keep that data forever.
There are also compacted topics, in which the latest value of a key is retained forever, whilst earlier values of the key are removed.
In terms of ksqlDB itself, it is built on Kafka topics, so the same principles above apply.
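To make that concrete, here is a minimal sketch (topic and column names are hypothetical) of how a ksqlDB TABLE sits directly on top of a Kafka topic and models the latest value per key, much like the compacted topic described above:

```sql
-- Hypothetical topic and columns: register a TABLE over the 'users' topic.
-- ksqlDB treats the topic as a changelog and keeps the latest value per USER_ID,
-- mirroring compacted-topic semantics.
CREATE TABLE USERS (USER_ID VARCHAR PRIMARY KEY, FULL_NAME VARCHAR)
  WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON');
```

The retention/cleanup policy of the underlying 'users' topic still applies exactly as described above; ksqlDB builds on the topic rather than replacing it.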
If you have more questions head over to forum.confluent.io/
@rmoff Thanks a lot for your support!