Why would one need to use Flink when there is already Kafka Streams and KSQLDB?
Same question here.
The answer starts at 3:10
Flink is a separate compute engine that is much more scalable and efficient than Kafka Streams / KSQL (because it does not rely directly on Kafka topics). Other benefits: support for multiple APIs (SQL, Java, Python), a unified API for streaming and batch, support for CEP (complex event processing, i.e. pattern matching), connectivity to multiple Kafka clusters in one query, etc.
Kafka Streams, on the other hand, is a very lightweight library that can be embedded into microservices (e.g., Spring Boot applications, each operated in its own Docker container). It has a very different sweet spot than Flink.
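To make "CEP, i.e. pattern matching" a bit more concrete, here is a minimal sketch in plain Python. It does not use Flink's CEP library; it only illustrates the idea of matching a sequence of events per key. The function name `detect_suspicious_logins`, the `(user, outcome)` event shape, and the "three failures then a success" pattern are all assumptions made up for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def detect_suspicious_logins(
    events: Iterable[Tuple[str, str]], failures_needed: int = 3
) -> List[str]:
    """Return users whose successful login directly follows
    at least `failures_needed` consecutive failed logins.

    Hypothetical helper for illustration only, not a Flink API.
    """
    consecutive_failures: Dict[str, int] = defaultdict(int)
    matches: List[str] = []
    for user, outcome in events:
        if outcome == "fail":
            # Extend the partial match for this user.
            consecutive_failures[user] += 1
        else:
            # A success completes the pattern only after enough failures.
            if consecutive_failures[user] >= failures_needed:
                matches.append(user)
            # Either way, the partial match is reset.
            consecutive_failures[user] = 0
    return matches


# Interleaved events from two users, in arrival order.
events = [
    ("alice", "fail"),
    ("bob", "fail"),
    ("alice", "fail"),
    ("alice", "fail"),
    ("bob", "ok"),
    ("alice", "ok"),
    ("alice", "ok"),
]
matches = detect_suspicious_logins(events)
```

A stream processor keeps this kind of per-key state for you and adds time windows and fault tolerance on top; the sketch only shows the matching logic itself.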
Kafka Streams and KSQLDB don't support analytical jobs like Flink does.
@RecaAtoz What exactly do you mean by "analytical job"?
Well done. Love the SL Flink offering, now it makes sense how Kafka and Flink can coexist :)
I am a beginner, and I understood the concepts within 10 minutes. Very good explanation.
clearly explained. thanks for this amazing video.
Thanks for the video, it's really insightful. Could you explain, in a video or here, how Kafka and Flink fit together in a real-time scenario and what each one is responsible for? That would make it clearer.
can we use it with RabbitMQ instead?
One of the big differences between a message broker (like RabbitMQ) and Apache Kafka is that Kafka provides a persistence layer for the events. Hence, Flink applications can process events at the right pace depending on the use case (real-time, batch, travel back in time for historical analysis, etc). This is not possible with a push-based message broker.
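The persistence point can be sketched with a toy in-memory log in plain Python. This is not the Kafka client API; `EventLog`, `Consumer`, `poll`, and `seek` are hypothetical names chosen for the illustration. The idea it shows: events stay in the log after being read, each consumer tracks its own offset, and a consumer can rewind to reprocess history.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class EventLog:
    """Toy append-only log: reading never removes events."""
    events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event

    def read(self, offset: int, max_events: int) -> List[Any]:
        return self.events[offset:offset + max_events]


@dataclass
class Consumer:
    """Pull-based reader that owns its position in the log."""
    log: EventLog
    offset: int = 0

    def poll(self, max_events: int = 2) -> List[Any]:
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Move back (or forward) to reprocess from a chosen position.
        self.offset = offset


log = EventLog()
for reading in [20.5, 21.0, 35.2, 22.1]:
    log.append(reading)

# One consumer reads in small batches at its own pace.
realtime = Consumer(log)
first_batch = realtime.poll()
second_batch = realtime.poll()

# A consumer that starts later still sees the full history.
historical = Consumer(log)
full_history = historical.poll(max_events=100)

# Rewinding lets the first consumer replay everything.
realtime.seek(0)
replayed = realtime.poll(max_events=100)
```

With a push-based broker, a message is typically gone once it has been delivered and acknowledged, so the "start later" and "rewind" steps have nothing to read.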
More lightboard videos of Flink would be so helpful :D
Very engaging video, with just the right amount of information. Top effort!
Very informative. Thanks!
Apache Flink, a real-time stream processing framework, can be conceptually compared to the blackboard model of consciousness in the sense that both involve dynamic interactions and collaboration. In the blackboard model, different components (agents or processes) contribute to a shared memory or workspace (the "blackboard") to solve complex problems. Similarly, in Flink, multiple tasks or operators process and transform streams of data in a distributed environment, constantly interacting with the data streams.
Drafted by AI
Why not Apache Spark Streaming from Kafka?
You can also use Spark Streaming together with Kafka. The fundamental difference is that Spark was built for batch and added streaming capabilities, while Flink was designed for streaming from the beginning. This fact, in combination with other benefits such as mature features for transactional workloads, complex event processing (CEP) capabilities, and much better open source community adoption and growth (for streaming data, not for batch data), makes Flink the better choice for most data streaming projects.
Thanks for your answers. I have a couple of questions:
- Is it possible to use it with RabbitMQ instead of Kafka (and if not, why not)?
- What would be a "Hello, World" project for this field (data streaming projects)?
@GreatTaiwan One of the big differences between a message broker (like RabbitMQ) and Apache Kafka is that Kafka provides a persistence layer for the events. Hence, Flink applications can process events at the right pace depending on the use case (real-time, batch, travel back in time for historical analysis, etc). This is not possible with a push-based message broker.
@GreatTaiwan "Hello World" projects can either be a (relatively simple) integration data pipeline for streaming ETL or a simple business application / business logic such as alerting if a threshold (e.g., of a sensor temperature measurement) is reached.
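The threshold-alerting idea could look roughly like this in plain Python. It is a sketch of the business logic only, with a list standing in for the event stream; in a real project the readings would come from a Kafka topic and the filter would run in Flink or Kafka Streams. `Reading`, `alerts`, the sensor IDs, and the 75.0 threshold are assumptions for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    sensor_id: str
    temperature: float


def alerts(readings: Iterable[Reading], threshold: float) -> Iterator[str]:
    """Yield one alert message per reading that reaches the threshold."""
    for r in readings:
        if r.temperature >= threshold:
            yield f"ALERT {r.sensor_id}: {r.temperature:.1f} >= {threshold:.1f}"


# A finite list stands in for an unbounded stream of sensor events.
stream = [
    Reading("s1", 21.5),
    Reading("s2", 80.0),
    Reading("s1", 75.0),
    Reading("s2", 60.2),
]
triggered = list(alerts(stream, threshold=75.0))
```

Because `alerts` is a generator, it handles one event at a time and never needs the whole input in memory, which is the same shape a streaming filter has.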
@kaiwaehner5702 Thanks a lot. I've been reading about message vs. event brokers, and now the differences between RMQ and Kafka are more obvious.
Thanks once again for your answers.
Awesome video. Just one suggestion: look straight at the screen so it feels like you are explaining to the viewers.
Yes, agreed. The lightboard setup will be improved for future videos.
Brilliant...
Missing Scala as language
Scala is supported as a JVM language. Even though there is no native support for Scala directly, I have seen teams using Scala with Flink without issues.
❤
Are we going back to overly complex "application servers" like the ones that got such a bad rap in the EJB days? I see a lot of love for Flink in other videos when all I can think is how overly complicated they've managed to make things. When I hear about snapshots being stored in the cloud, for instance, my over-complication radar sounds its alarm. I guess there are a lot of good use cases for it though, since all those big companies are using it. But Flink should not be the first thing that comes to mind for aggregating some Kafka events into some state (in my opinion, obviously!). Better to just write a streaming application(? - it is very easy). Scaling that is a simple matter of upping stream threads + partitions, and maybe the number of pods. Totally flexible. No need for snapshots or replays after downtime since Kafka stores state when you do joins etc. In the right scenario, use Flink. But think before you act! :) That said, I am very intrigued and am currently looking at maybe introducing Flink at work.
Indeed, Flink is complex to operate. That's why a fully managed SaaS cloud service is the best choice. You don't have to worry about operations, scalability, etc. You just pay as you go with consumption-based pricing, elastic scale, etc. And yes, not every use case requires Flink. But in most cases (especially with a SaaS Flink), you are just a SQL query or Python script away from doing stream processing. If you need data consistency, low latency or reliable SLAs, then "just writing another streaming application" is definitely not easier.
I don't think writing an application is easier. It's not, right? Why would you say it is?
@MUSHIN_888 It depends on who you are talking to and what type of application we are talking about. Everyone has different preconditions. For instance, if I am very Flink savvy and we are already using Flink (e.g. it is set up and running), then it would most probably be a lot simpler to use Flink. Another example: if I know nothing about Flink and it is not set up, but I do find it simple to create a new microservice (assuming that is the current setup), then that is easier. TL;DR: I am not you, and you are not everyone else.
@MUSHIN_888 Obviously, you need to understand the principles of data streaming and event-driven architectures (compared to databases and APIs) to get the best (and cost-efficient!) results. With that, using Flink in the cloud is just a SQL query. That's it. This is not comparable to the complexity of the Java EE ecosystem with EJBs etc., and all the challenges around the application servers. If you self-manage Flink, someone also needs to take over the operations. That's indeed pretty hard if you don't know what you are doing.