Data Architecture 101: Kappa (Real-Time Data)

  • Published: Sep 7, 2024

Comments • 2

  • @KahanDataSolutions · 10 months ago

    ►► The Starter Guide for Modern Data (Free PDF) → www.kahandatasolutions.com/startermds

  • @gatorpika · 10 months ago

    Good summary, but having gone through this fight recently, I'd say it can get really complex. First, it should be use-case driven: lean toward this type of architecture only if there is a real need in your company for real-time data, which tends to be more operational than analytical in nature. For those use cases, how much of your data does it represent? If you need 5% of your data immediately and from one source, you don't need to kludge the other 95% into Kappa architecture just because you feel you have to pick one pattern.

    You also need to look at the sources and how data is presented to you. In my case, the Kappa advocates get an hourly file with millions of records. They wanted to break it down into individual messages and run it through their event broker as if it were a stream, because at some point in the future the source might change.

    For real-time data you need to look at latency requirements, and often you should stream directly to the consuming application, because running everything through your whole lake --> transform --> model process adds lag. This was shown in the Lambda model, but it applies to any case where you need to react in seconds or minutes.

    Finally, if you are in a cloud environment, you need to consider the cost of running your ingest servers 24/7 rather than just spinning up compute to run your ETL once a day. So you end up with some significant optimization problems trying to minimize unutilized capacity to reduce costs.

    In summary, don't fall in love with any specific pattern. Break down your processing plan based on needs and challenges, and use whatever pattern is appropriate (considering maintainability as well).
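To make the "stream directly to the consuming application" point above concrete, here is a minimal sketch of a Kappa-style consumer that handles only the low-latency slice of the data, assuming Apache Kafka and the kafka-python client. The topic name, broker address, and the handle_failed_order handler are hypothetical stand-ins for whatever the operational use case actually requires.

    # Minimal sketch: consume the low-latency slice of the data directly
    # from the event broker, skipping the lake --> transform --> model hop
    # that adds lag. Assumes Apache Kafka and the kafka-python package;
    # topic name, broker address, and the handler below are hypothetical.
    import json
    from kafka import KafkaConsumer

    def handle_failed_order(order):
        # Hypothetical operational reaction: alert, update a live
        # dashboard, trigger a retry, etc.
        print(f"alerting on failed order {order.get('id')}")

    consumer = KafkaConsumer(
        "orders-realtime",                  # hypothetical topic
        bootstrap_servers="broker:9092",    # hypothetical broker
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    for message in consumer:
        order = message.value
        if order.get("status") == "failed":
            handle_failed_order(order)      # react in seconds, not hours

The rest of the data (the "other 95%") can stay on a cheaper daily batch ETL; nothing forces both paths through the same pattern.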