How Apache Pinot Achieves 200,000 Queries per Second (with Tim Berglund)

  • Published: Sep 12, 2024
  • The likes of LinkedIn and Uber use Pinot to power some astonishingly high-scale queries against realtime data. The numbers alone would make an impressive case study, but behind the headline lies a fascinating set of architectural decisions and constraints. So how does Pinot work? How does it process queries? How are the various roles split across a cluster? And, equally important, what does it not try to achieve?
    Joining me to go through the nuts and bolts of how Pinot handles SQL queries is Tim Berglund, veteran technology explainer of the realtime-data world. He takes us through Pinot step-by-step, covering the roles of brokers, servers, controllers and minions as we build up the picture of a query engine that's interesting in theory and massively performant in practice.
    -
    Apache Pinot: pinot.apache.org/
    Apache Pinot Docs: docs.pinot.apa...
    StarTree: startree.ai/
    Event Driven Design episode with Bobby Calderwood: • Can Event-Driven Archi...
    Tim on Twitter: / tlberglund
    Kris on Mastodon: mastodon.social...
    Kris on LinkedIn: / krisjenkins
    Kris on Twitter: / krisajenkins
    -
    #podcast #softwaredevelopment #apachepinot #database #dataengineering #sql

Comments • 6

  • @vicsteiner
    5 months ago +2

    Interesting first question. My perspective is that, from the point of view of the data model, it does not matter at all whether you have a transactional or an analytical database. A data model is a logical formal system, and the transactional vs. analytical distinction derives not from limitations of the data model but from the challenges of its physical implementation. It's nice that you start with history. I have also been doing a lot of reading on history, especially the history of the relational data model, and my opinion at this point is that we need to take great care when we say that a SQL database is relational, that is, that it implements the relational data model. There are many technical and historical reasons to disagree with this.

    • @PeterCorless
      5 months ago +1

      I come from the NoSQL world, where the data model is everything. Is it a key-value store? Wide column? Graph? Even within SQL (which is a *query language*, not a data model language) you have issues like: is the data normalized [star schema] or denormalized [prejoined / precubed]? The kinds of queries you run, and that the system will be optimized for, depend greatly on your data model. Also, row-store vs. column-store matters hugely in terms of data storage and query patterns. Thoughts?

    • @vicsteiner
      5 months ago +1

      Hey @PeterCorless, something that is everything can't be anything. If a data model were everything, there would be no purpose for the term to exist. Yes, SQL is a query language, one that was born in the context of the initial attempts to implement Codd's relational data model. You ask for thoughts. I do not want to lecture you, but I can share the readings that I consider improved my understanding, and you can decide whether you want to read them. I consider it fundamental to have a robust definition of data and its relation to information (they are not the same); for this, my main reference is Luciano Floridi (search for "data-based definition of information"). For understanding the relational data model, I would say read E. F. Codd, C. J. Date, D. McGoveran and Fabian Pascal. I would start with the last one. In a way, a data model is a formal system, that is, a logical system applied to data collection. The relational model is first-order predicate logic applied to databases. In any case, this is only if you are curious about the topic from an intellectual point of view, not necessarily a practical one. If you take Codd's definition of a data model, he describes a combination of three components: (1) a collection of data structure types; (2) a collection of operators or inferencing rules, which can be applied to any valid instances of the data types listed in (1), to retrieve or derive data from any parts of those structures in any combinations desired; (3) a collection of general integrity rules, which implicitly or explicitly define the set of consistent database states or changes of state or both; these rules may sometimes be expressed as insert-update-delete rules. NoSQL (crazy how something defines itself as not being another thing) would not, to me, be considered a data model in this context (maybe I am completely wrong; I am open to your view on it), and it does not have the properties of a formal system.

      But sure, you can define "data model" as you like and say that NoSQL databases have a data model backing them. For me, when I first saw Codd's definition, it was interesting to learn that data structures are not data models. You can implement a relational database and physically store all your data in a NoSQL database; it just might not be the best way of doing it. A nice book you can also check is Applied Mathematics for Database Professionals by Lex de Haan and Toon Koppelaars (and before someone says that this is maths, not practical stuff: I am suggesting the reading for the pleasure of the mind, not for any immediate practical reason; for practical purposes you can consider a data model to be everything and follow your gut). By the way, the first time I came across the definition of a data model was while reading Pascal's blog, dbdebunk. Sure, NoSQL databases can be put to good use and solve practical problems, but in my current understanding the NoSQL movement is heuristic-based (largely focused on performance rather than modelling, and on the context of software development rather than on data as carrying information, i.e. semantic content), while the relational model was looking at more fundamental questions (in the sense of applied logic and mathematics). Both have their place; I interact with SQL and NoSQL databases on a daily basis too.

  • @mohamedr1164
    10 days ago

    Is Apache Pinot ACID-compliant?

  • @vram288
    5 months ago

    at 13 good