Finally I found the partition logic in this video. Thanks a lot for the crisp video, Tim.
Man, your style of explanation is just awesome. I mean, how can you explain things so easily, that too without much animation or anything? One of the best instructors, I must say.
It's simply a great explanation. Thanks Man
Thanks Tim. This is the best video on the internet for those who have just jumped into Kafka.
Yet again Tim, rock solid short snappy overview.
Nice vid!
Best explanation for partition.
I like this series of videos, thanks a lot🤩
Very well explained, thank you.
incredible visualized video. Thank you so much
What if we want to store information for each day in the NASDAQ stock market, with ~3000 symbols and 1 billion trades per day? Should we use one topic for each symbol or just one topic with hundreds of partitions?
I want to understand a real case with a huge amount of data!
Great video guys, very helpful!
Thanks so much, it is so clear.
What is the relation between events and messages?
Do messages have key-value pairs, or do events have key-value pairs?
What exactly is stored in partitions with key-value pairs, events or messages?
this was a helpful video
Take the key ------> HASH function ------> output mod (total no. of partitions) ------> the resulting number is the partition number where the message is going to be stored.
I was unaware of this concept.
Thank You Tim.
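For anyone who wants to see that flow as code, here is a minimal sketch of the arithmetic, assuming the kafka-clients jar is on the classpath. The key "user123" and the partition count of 5 are made-up examples; this roughly mirrors what the default partitioner does for a non-null key.

import org.apache.kafka.common.utils.Utils;
import java.nio.charset.StandardCharsets;

public class PartitionForKey {
    public static void main(String[] args) {
        byte[] keyBytes = "user123".getBytes(StandardCharsets.UTF_8); // example key
        int numPartitions = 5;                                        // example partition count
        // murmur2 hash of the key, made non-negative, then mod the partition count
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("key user123 -> partition " + partition);
    }
}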
In Kafka, events are real-world occurrences (e.g., a user purchasing an item), and they are represented as messages in Kafka. A message can have a key-value pair, where the key is used for partitioning, and the value is the data (like JSON) representing the event.
Messages are stored in partitions of a Kafka topic. If you provide a key, Kafka uses a hash function on the key, then uses mod (total number of partitions) to determine which partition to store the message in. This ensures that messages with the same key (e.g., "user123") always end up in the same partition, while distributing other messages across partitions.
So, it's the messages (representing events) that are stored in partitions, and they can have key-value pairs.
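To make that concrete, here is a minimal producer sketch in Java. The broker address, the topic name "purchases", and the JSON payloads are made-up examples; both records share the key "user123", so they hash to the same partition.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KeyedProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records use the key "user123", so both land in the same
            // partition of the "purchases" topic (topic name is just an example).
            producer.send(new ProducerRecord<>("purchases", "user123", "{\"item\":\"book\"}"));
            producer.send(new ProducerRecord<>("purchases", "user123", "{\"item\":\"pen\"}"));
        }
    }
}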
Very well done Tim - Thank you!
In the Spark producer, is it literally as simple as adding a column as the "key" before the write? Is there anything else that needs to be considered? Let us assume the data is being read from a table that is already partitioned by a date and then by the column you will be using as the key.
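Not an official answer, but essentially yes: Spark's Kafka sink only looks at the key and value columns of the DataFrame you write. A minimal sketch, assuming the spark-sql-kafka connector is on the classpath and using a hypothetical "trades" table, "customer_id" key column, broker address, and topic name:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkKafkaSink {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("kafka-sink-sketch").getOrCreate();

        // "trades" and "customer_id" are hypothetical names; the Kafka sink
        // only reads the "key" and "value" columns of the DataFrame.
        Dataset<Row> df = spark.table("trades");
        df.selectExpr(
                "CAST(customer_id AS STRING) AS key",
                "to_json(struct(*)) AS value")
          .write()
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
          .option("topic", "trades")                           // assumed topic name
          .save();
    }
}

Note that the source table's date partitioning does not decide which Kafka partition a record lands in; the producer hashes the key column for that.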
Thanks to the animation in this video, I now understand partitions better.
Amazing explanation!!
Excellent explanation, but one thing that I see just about all videos lack is explaining WHY partitioning is useful and when it is not.
When a consumer reads the topic, how does it know which partition to read the message from?
good explanation, thanks!
Thank you
Thank you, crystal clear!
Hi, thank you so much for this nice course.
Might I ask one further question: does the ordering and key mechanism also work across different topics?
Suppose that I have 2 topics, A and B. There are 2 partitions with 2 consumers for these 2 partitions. Each consumer consumes both topic A and B, but in different partitions.
When I produce message 1 to A and 111 to B with the same key AAA, will they be stored in the same partition, [0] or [1]?
When I produce message 2 to A and 222 to B with the same key BBB, will they be stored in the same partition, [0] or [1]?
If messages 1 and 111 are not consumed by the same consumer, then it is a problem, since one of them could be consumed by the other consumer earlier than it should be.
In this case we expect message 1 to be consumed first and then 111. But 1 and 111 can be consumed at the same time by 2 consumers if the ordering does not work across different topics.
Thanks a lot!
clear explanation, thanks
So if you want a FIFO queue, you are limited to only one partition?
Awesome.
So is it generally correct, if not always, to say that messages in different partitions within the same topic are mutually exclusive?
When I subscribe to a partitioned Topic, I still get all the messages eventually, just not necessarily in the correct order, right?
Hi - what do the Fire logo and wheel logo mean? Are these logos of old Kafka versions? I only could find a slideshow "Kafka - Past, present and future" :)
If the number of partitions changes, does Kafka re-hash and redistribute events? If not, then events with the same ID could end up in different partitions?
It would be great to have a crisp explanation of that as well.
Why would events have the same ID?
What if one partition node goes down? Could the order get messed up then?
Is it possible for one message to belong to multiple partitions?
Pessimistically, because fund transfers always go without a hitch, right?
Do we need to mention the number of partitions while creating the topics?
For example, if I create the key with CustomerID and create the Kafka topic with 5 partitions, then when a 6th customer comes with a different key, in which partition will it be stored?
Yes, you have to mention the number of partitions.
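And the 6th customer does not need a 6th partition: the new key is hashed and taken mod 5, so it simply lands in one of the existing 5 partitions. A minimal sketch of creating such a topic with the Java AdminClient, using a made-up "customers" topic name and a local broker address:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // "customers" is a made-up topic name: 5 partitions, replication factor 1.
            NewTopic topic = new NewTopic("customers", 5, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
        // A 6th customer's key is hashed and taken mod 5, so it lands in one of
        // the existing partitions; no new partition is created per key.
    }
}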
What's the name of this song? ps: Kafka is amazing
Why not use consistent hashing instead of hash % num_partitions?
Hey Tim, can you tell me another advantage of using partitions?
In my humble opinion it is the case in the real world to have a customer acting as a noisy neighbour.
Since Kafka 2.4 the Default Partitioner is set to Sticky Partitioner, and not Round Robin. Hence, if you wonder why all the messages with the null key land in the same partition this is the reason. If you want to set it to Round Robin to how it was before version 2.4 then set it in the props passed to ProducerFactory: props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class);
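For context, here is what that override looks like in a plain Java producer config. The broker address is a made-up example and the rest is standard kafka-clients setup:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class RoundRobinProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // Override the default (sticky) partitioner so null-keyed records are
        // spread round-robin across partitions, as before Kafka 2.4.
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "events" is a made-up topic; with a null key these records
            // rotate across partitions instead of sticking to one.
            producer.send(new ProducerRecord<>("events", null, "payload-1"));
            producer.send(new ProducerRecord<>("events", null, "payload-2"));
        }
    }
}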