Column Oriented Storage (with Parquet!) | Systems Design Interview: 0 to 1 with Ex-Google SWE

  • Published: Nov 24, 2024

Comments • 31

  • @tarun4705 · 11 months ago · +10

    This whole playlist is a valuable treasure.

  • @mohanakumaar · 3 months ago · +2

    The way you synthesize the information in DDIA into 15 minutes for a topic is amazing :)

  • @abhimanyushekhawat2626 · 2 months ago · +1

    The great part about your videos is how you build the concepts from the ground up with first principles & fundamental intuitions.
    I am watching your videos after reading DDIA and I am able to appreciate them so much more.
    Thanks, Jordan, for your efforts.

  • @dwivedys · 5 months ago · +3

    Not sure if anyone (or maybe all of you) noticed the phrase he very smoothly utters at 6:58: “at least that’s what she said…” Amazing, man!!

    • @pavansrinivas4388 · 3 months ago

      Haha, didn't notice it. Such humor in the middle of explaining technical stuff... awesome :)

  • @PavanBommana19-25 · 9 months ago · +5

    Love this series! Thank you, Jordan. Can you please post PDFs of the iPad notes that you write during the videos? They would help a lot in quickly refreshing the topics.

    • @jordanhasnolife5163 · 9 months ago

      Hey Pavan! I will once my current series is complete, so that I can do it all in bulk.

  • @SomjitNag · 1 month ago · +1

    The humour makes these videos even better!

  • @ZichengLiu-k4c · 1 year ago · +4

    Love it, Jordan! Thanks for the video. Looks like you are confident regardless of the rejections, as you just want to perform data analysis. One thing though: I think the LSM tree itself is not an in-memory binary tree. It consists of both an in-memory component, the memtable (a binary tree), and an on-disk component, the SSTables.
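
A toy sketch of the structure described in the comment above: writes go to an in-memory memtable, which is flushed as an immutable sorted run (an SSTable) once it reaches a size threshold. Assumptions: a plain dict stands in for the balanced tree (it is sorted at flush time), SSTables are Python lists rather than files, and memtable_limit is a made-up parameter.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted runs (SSTables)."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}                    # in-memory component (stand-in for a balanced tree)
        self.sstables = []                    # "on-disk" sorted runs, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as one sorted, immutable run, then start a fresh one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # The memtable holds the newest writes, so check it first.
        if key in self.memtable:
            return self.memtable[key]
        # Then binary-search each SSTable, newest first, so newer values win.
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None
```

Real LSM engines add a write-ahead log, background compaction that merges runs, and usually Bloom filters to skip SSTables that cannot contain a key; none of that is modeled here.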

  • @Spreadlove5683 · 1 year ago · +3

    Additionally, if you just need one column, I assume you don't have to load a bunch of extra data from disk into memory only to filter it all out. So in addition to the locality benefits, you also read less data from disk. Although if latency dominates bandwidth because you aren't reading much data, it might not matter.
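
To make the point about reading less data concrete, here is a toy sketch (not how any particular engine lays out bytes) that stores three hypothetical rows of four 8-byte integers both ways and counts how many bytes a scan of the age column touches. It assumes the row-oriented scan has to pull the whole file through, which roughly matches reality because disks are read in whole pages rather than single fields.

```python
import struct

# Hypothetical table: (id, age, salary, dept), every field an 8-byte integer.
ROWS = [(1, 25, 50_000, 3), (2, 31, 62_000, 1), (3, 22, 48_000, 2)]
NUM_COLS = 4
AGE_COL = 1
INT = struct.Struct("<q")  # 8-byte little-endian signed integer


def row_layout(rows):
    # Row-oriented: every field of row 0, then every field of row 1, and so on.
    return b"".join(INT.pack(v) for row in rows for v in row)


def column_layout(rows):
    # Column-oriented: one contiguous byte string per column.
    return [b"".join(INT.pack(row[c]) for row in rows) for c in range(NUM_COLS)]


def scan_ages_row(blob):
    # Ages are interleaved with the other fields, so we assume the whole file is read.
    row_size = INT.size * NUM_COLS
    ages = [INT.unpack_from(blob, off)[0]
            for off in range(AGE_COL * INT.size, len(blob), row_size)]
    return ages, len(blob)


def scan_ages_column(columns):
    # Only the age column's bytes are touched.
    blob = columns[AGE_COL]
    return [v for (v,) in INT.iter_unpack(blob)], len(blob)
```

Both scans return the same ages, but the row layout reads all 96 bytes while the column layout reads only the 24 bytes of the age column. As the comment notes, for tiny reads like this one, per-request latency rather than bytes transferred may dominate.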

  • @timavilla · 8 months ago · +6

    Hey Jordan, thank you for the series, complicated concepts stick really well thanks to the silly examples!
    As I understood it, both compression schemes are used only when there is a small number of possible values. But why would I ever use dictionary compression, which doesn't reduce the number of values stored but just makes each value smaller, if regular compression greatly reduces the number of values stored? Won't the result of regular compression always take less space?

    • @jordanhasnolife5163 · 8 months ago · +2

      You can't always do regular compression! It requires similar values to be next to one another on disk.
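
Jordan's point can be shown with a small sketch, assuming "regular compression" here means run-length encoding. Dictionary encoding shrinks every value no matter where it sits, while run-length encoding only pays off when equal values are adjacent, for example after the column is sorted. The function names and sample data are illustrative.

```python
from itertools import groupby


def dictionary_encode(values):
    # Give each distinct value a small integer code, in order of first appearance.
    dictionary = {}
    codes = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return dictionary, codes


def run_length_encode(values):
    # Collapse each run of consecutive equal values into a (value, run_length) pair.
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


# Hypothetical column with only two distinct values.
unsorted = ["NY", "CA", "NY", "CA", "NY", "CA"]
clustered = sorted(unsorted)  # ["CA", "CA", "CA", "NY", "NY", "NY"]
```

Here dictionary_encode(unsorted) still turns six strings into six small integer codes, while run_length_encode(unsorted) produces six runs of length 1, which saves nothing; run_length_encode(clustered) collapses to [("CA", 3), ("NY", 3)]. The two are often combined: dictionary-encode first, then run-length encode the codes.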

  • @HarshitKumar-zx4dj · 6 months ago · +1

    To summarise:
    Column storage: the data of a column is stored together on disk. Selecting a few columns for analysis is faster, because each of those columns sits in one contiguous location on disk and can be read without touching the others.
    Column-oriented storage also enables per-column compression.

  • @cricket4671 · 1 year ago · +2

    Btw, I like your content, and thank you for making these videos.
    Some unsolicited skin advice: I heard you mention bad skin/acne in one of your earlier videos. I had something similar and would suggest cutting out all sugar, beer, wine, and simple carbs from your diet, increasing fiber intake, adding a probiotic, and taking all of your multivitamins. Healing a leaky gut takes a while. All the best! 😊

    • @jordanhasnolife5163 · 1 year ago

      Thanks, man! I'd love to cut alcohol, but alas, I love it. I think the main issue for me personally is eating tons of dairy for lifting, which doesn't help me.

  • @TheSdl79 · 8 months ago · +4

    "The more that we can fit in there, the better..." That's what she said... I'm dying))))

  • @dibll · 1 year ago · +1

    Could you please do a video on different types of indexes: clustered, multi-dimensional, etc.?

  • @radosawmul8963 · 8 months ago · +1

    What if there are more than 9 rows during the conversion from bitmap to run-length encoding? :D How would one know whether 13 denotes "13" or "1" and "3"?

    • @jordanhasnolife5163 · 8 months ago · +1

      Keep in mind that these are really numbers in binary, so assuming we use a 32-bit int to represent each run length, we just read the next 32 bits.
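
A sketch of Jordan's answer, assuming the bitmap is encoded as alternating run lengths that start with a (possibly empty) run of 0s, and that each run length is stored as a fixed-width unsigned 32-bit integer. Because the decoder always consumes exactly 4 bytes per run, a run of 13 is one value, never "1" followed by "3". The function names are hypothetical.

```python
import struct

RUN = struct.Struct("<I")  # every run length is a fixed-width unsigned 32-bit int


def bitmap_to_runs(bits):
    # Alternating run lengths, starting with a run of 0s (which may be empty).
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs


def encode_runs(runs):
    return b"".join(RUN.pack(r) for r in runs)


def decode_runs(blob):
    # Read exactly 4 bytes per run, so run boundaries are never ambiguous.
    return [r for (r,) in RUN.iter_unpack(blob)]


def runs_to_bitmap(runs):
    bits, value = [], 0
    for r in runs:
        bits.extend([value] * r)
        value ^= 1  # runs alternate between 0s and 1s
    return bits
```

For a bitmap of thirteen 1s followed by three 0s, bitmap_to_runs gives [0, 13, 3], which encodes to 12 bytes and decodes back to the same three run lengths.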

  • @dibll · 1 year ago · +1

    Is the compressed data stored alongside the actual table on the hard disk? How does the client know that compressed data like 011000110 (for example) denotes a real value of, let's say, 1?

    • @jordanhasnolife5163 · 1 year ago · +1

      There would be a little bit of additional metadata saying that the column is in compressed form. It would defeat the purpose of compression if we stored the original column beside it :)
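
A sketch of what that "little bit of additional metadata" could look like. The ColumnChunk class and its fields are hypothetical (real formats such as Parquet record encodings in their own metadata structures, with different names); the point is that the reader consults the header, not the raw bits, to decide how to turn stored codes back into values.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnChunk:
    # Hypothetical layout: a small metadata header plus the encoded payload.
    encoding: str                                    # "plain" or "dictionary"
    payload: list                                    # raw values, or integer codes
    dictionary: list = field(default_factory=list)  # code -> original value


def write_column(values, encoding):
    if encoding == "plain":
        return ColumnChunk("plain", list(values))
    if encoding == "dictionary":
        dictionary = sorted(set(values))
        code_of = {v: i for i, v in enumerate(dictionary)}
        return ColumnChunk("dictionary", [code_of[v] for v in values], dictionary)
    raise ValueError(f"unknown encoding: {encoding}")


def read_column(chunk):
    # The metadata, not the stored bytes themselves, says how to decode.
    if chunk.encoding == "plain":
        return list(chunk.payload)
    if chunk.encoding == "dictionary":
        return [chunk.dictionary[code] for code in chunk.payload]
    raise ValueError(f"unknown encoding: {chunk.encoding}")
```

Only the codes and a dictionary with one entry per distinct value are stored; the original column is never kept beside them.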

  • @murike · 7 months ago · +1

    If columns are stored separately, how do I get data from a column that is not in the WHERE clause? For example, in "select name where age < 23", how is the name column matched up with the age column?

    • @jordanhasnolife5163 · 7 months ago

      Sort them in the same order.

    • @amlanbiswas4526 · 7 months ago

      The row_id can be stored along with the data, like:
      Name: Jordan:1, Trump:2
      Age: 25:1, 102:2
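
Both answers in this thread can be sketched in a few lines. The first relies on every column being stored in the same row order, so position i in one column matches position i in another; the second stores an explicit row_id next to each value, as in the reply above. The third row ("Alice", 19) is hypothetical data added for illustration.

```python
# Same row order in every column: position i is the same row everywhere.
# ("Alice", 19) is a hypothetical third row.
names = ["Jordan", "Trump", "Alice"]
ages = [25, 102, 19]


def select_names_where_age_lt(names, ages, limit):
    # Scan only the age column and remember which positions match...
    matching = [i for i, age in enumerate(ages) if age < limit]
    # ...then fetch the names at those same positions.
    return [names[i] for i in matching]


# Alternative from the reply above: store an explicit row_id with each value.
name_col = [("Jordan", 1), ("Trump", 2), ("Alice", 3)]
age_col = [(25, 1), (102, 2), (19, 3)]


def select_with_row_ids(name_col, age_col, limit):
    wanted = {row_id for age, row_id in age_col if age < limit}
    return [name for name, row_id in name_col if row_id in wanted]
```

Both return ["Alice"] for a limit of 23. The positional approach stores nothing extra, which is why it tends to be preferred; explicit row ids cost space per value but still work if columns are stored in different orders.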

  • @art4eigen93 · 1 year ago · +1

    Nomad filmmakers we know about... but a nomad coder? Wow!

  • @recursion. · 1 year ago · +2

    The audio is completely out of sync, though 😪

  • @satadhi · 8 months ago · +1

    Did you choose 322 on purpose? Are you a Dota 2 fan?

    • @jordanhasnolife5163 · 8 months ago

      Haha, I did not. Didn't know there was a reference there.