Very good explanation of the problem statement and the solution aspects.
Very nice, self-contained talk. Covers the problem and its solution quite clearly. Thanks.
What if we only need the data for 5 minutes of a day? Does this approach still work?
Awesome talk... covers all the failure scenarios.
Can we just have all the data for a given campaign_id go to the same partition, so we don't have to keep track of offsets on a per-partition basis?
Multiple campaign_ids will still end up in the same partition, right? So we should always track offsets per partition, not per campaign.
In one word: no. There are two issues: 1) one campaign may create more load than one node can handle and 2) there are many more campaigns than nodes or partitions. Because of #1, we will need to spread the campaign onto several partitions. Because of #2, we will map multiple campaigns into fewer partitions.
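The two issues in the reply above can be sketched roughly like this. This is a minimal illustration assuming a Kafka-style topic with a fixed partition count; `NUM_PARTITIONS`, `HOT_CAMPAIGNS`, `partition_for` and `latest_offsets` are hypothetical names, and "salting" hot keys into sub-keys is one common technique, not necessarily the exact scheme from the talk. Hot campaigns are split across several partitions (#1), many campaigns are hashed into few partitions (#2), and because of #2 the resume offsets are kept per partition.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical partition count

# Hypothetical: campaigns too heavy for one node, and how many
# partitions each one should be spread across.
HOT_CAMPAIGNS = {"campaign_big": 4}


def _stable_hash(key: str) -> int:
    # Built-in hash() is randomized per process, so use a stable digest.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def partition_for(campaign_id: str, event_id: str) -> int:
    """Pick a partition for one click event."""
    fanout = HOT_CAMPAIGNS.get(campaign_id, 1)
    # Issue #1: a hot campaign is split into `fanout` sub-keys by event id.
    sub_key = _stable_hash(event_id) % fanout
    # Issue #2: many (campaign, sub_key) pairs share a few partitions.
    return _stable_hash(f"{campaign_id}#{sub_key}") % NUM_PARTITIONS


def latest_offsets(consumed):
    """consumed: iterable of (partition, offset) pairs already processed.

    Returns the next offset to resume from for each partition. Offsets are
    keyed by partition, since one partition mixes many campaigns.
    """
    next_offset = {}
    for partition, offset in consumed:
        next_offset[partition] = max(next_offset.get(partition, 0), offset + 1)
    return next_offset
```

With this scheme a normal campaign always lands on a single partition, while `campaign_big` is spread over at most four. A downstream aggregator would then need to merge the partial counts for a hot campaign coming from those partitions.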
Fantastic talk
thanks! great talk :)