Thanks for this! I've always found the semistructured stuff hard to understand. I just want to point out, though, that the example in the referenced paper for shredding has different values in the columnar decomposition. In particular, for value 'en' in Name.Language.Code, the repetition level is 2, because it is a repetition of the 2nd repeated field (according to the paper).
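For anyone who wants to check the repetition level mentioned above, here is a minimal sketch. It assumes the paper's first sample document (r1) written as nested Python dicts, with Name and Language as repeated fields and Code as required. The `shred` helper and the `FIELDS` list are hypothetical names made up for this illustration, not taken from the paper or any library.

```python
def shred(node, fields, r=0, d=0, rep_depth=0):
    """Yield (value, repetition_level, definition_level) for one column.

    fields is a list of (name, kind) pairs along the column path, where
    kind is "required", "optional", or "repeated" (assumed schema).
    """
    if not fields:
        yield (node, r, d)
        return
    (name, kind), rest = fields[0], fields[1:]
    if kind == "repeated":
        items = node.get(name, [])
        level = rep_depth + 1  # position of this field among repeated fields in the path
        if not items:
            # Missing repeated field: emit a NULL at the levels reached so far.
            yield (None, r, d)
            return
        for i, item in enumerate(items):
            # The first item inherits r from the caller; later items repeat at this level.
            yield from shred(item, rest, r if i == 0 else level, d + 1, level)
    else:
        if name not in node:
            yield (None, r, d)
            return
        # Only optional fields raise the definition level; required ones do not.
        yield from shred(node[name], rest, r, d + (kind == "optional"), rep_depth)


# Assumed schema for the Name.Language.Code column.
FIELDS = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]

# The Name portion of the paper's sample document r1, as nested dicts.
r1 = {
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ]
}

for value, rep, defn in shred(r1, FIELDS):
    print(value, rep, defn)
```

Under these assumptions, 'en' comes out with repetition level 2, because Language is the second repeated field in the path and 'en' is a second Language entry inside the same Name.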
Great lecture! What would it take for a new file format to become mainstream? Parquet and ORC are so popular; is it possible for a new format to rise?
Thank you.
The link to the notes is missing.