22 Comments
John Ryan's avatar

I've been blogging for years at articles.Analytics.Today, and it's terrific to find a fellow tech enthusiast with a real passion and skill for writing. Superb post. I'll include a link to this in my next article - on how Snowflake works with Apache Iceberg over Parquet. Well done!

Chandu's avatar

Truly awesome post, explaining Parquet in detail. Thanks a million for sharing.

Ruslan's avatar

Good Parquet overview!

Hau Suresh's avatar

Very good article

Bruno Jander Santos Lima's avatar

Great article (as always)!

L.B.'s avatar

Thank you for this great article!

Junaid Effendi's avatar

Very detailed.

Karthik Subramanian's avatar

Very detailed overview for someone who works with Parquet and Spark SQL at work.

Bookmarking this

Chris's avatar

Great article!

I think there is a tiny mistake in the last picture:

For row group 2 the system must read the yellow column chunk C, otherwise it wouldn't know if the condition C<10 is met.

Cheers Chris
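
The point above can be illustrated with a small sketch of row-group pruning. It assumes the reader only has per-row-group min/max statistics for column C available before reading any data; the `RowGroup` class, the `plan_scan` helper, and the sample values are hypothetical and are not taken from the article's picture.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with stats on column C."""
    name: str
    c_min: int
    c_max: int
    c_values: list  # the actual column chunk; only touched when it must be read


def plan_scan(row_groups, threshold):
    """Decide, from min/max statistics alone, what to do for `C < threshold`."""
    plan = {}
    for rg in row_groups:
        if rg.c_min >= threshold:
            # No value in this chunk can satisfy C < threshold: skip the row group.
            plan[rg.name] = "skip"
        else:
            # Statistics cannot rule the row group out, so column chunk C
            # has to be read to find which rows actually match.
            plan[rg.name] = "read C"
    return plan


def matching_rows(rg, threshold):
    """Row indexes inside the row group where C < threshold (requires reading C)."""
    return [i for i, value in enumerate(rg.c_values) if value < threshold]


# Made-up data for illustration only.
row_groups = [
    RowGroup("row group 1", c_min=20, c_max=35, c_values=[20, 35, 27]),
    RowGroup("row group 2", c_min=4, c_max=18, c_values=[4, 18, 9]),
]
plan = plan_scan(row_groups, threshold=10)
rows = matching_rows(row_groups[1], threshold=10)
```

In this made-up data, row group 1 is skipped from statistics alone, while row group 2 overlaps the predicate range, so its C chunk is read before any other column is fetched for the matching rows.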

Hoang Tran's avatar

hey Vu, great illustrations!

i do wanna add to the `Encoding` part a bit

after RLE, Parquet also applies bit-packing to save even more space, see https://parquet.apache.org/docs/file-format/data-pages/encodings/#run-length-encoding--bit-packing-hybrid-rle--3

it's not just solely RLE, but a combination of both encoding techniques
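
As a rough illustration of the two techniques mentioned above, here is a simplified sketch. It is not a byte-exact implementation of Parquet's hybrid encoding (it omits the run headers and the rules for choosing between run types); the helper names `rle_runs`, `bit_pack`, and `bit_unpack` are hypothetical. It only shows the two ideas separately: collapsing repeats into runs, and packing small integers into fewer bits, least significant bit first.

```python
def rle_runs(values):
    """Collapse consecutive repeated values into (value, count) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


def bit_pack(values, bit_width):
    """Pack non-negative integers using `bit_width` bits each, LSB first."""
    buffer = 0
    bits_in_buffer = 0
    out = bytearray()
    for v in values:
        if v < 0 or v >= (1 << bit_width):
            raise ValueError(f"{v} does not fit in {bit_width} bits")
        buffer |= v << bits_in_buffer
        bits_in_buffer += bit_width
        while bits_in_buffer >= 8:
            # Emit every full byte as soon as it is available.
            out.append(buffer & 0xFF)
            buffer >>= 8
            bits_in_buffer -= 8
    if bits_in_buffer:
        # Flush the final partial byte, padded with zero bits.
        out.append(buffer & 0xFF)
    return bytes(out)


def bit_unpack(data, bit_width, count):
    """Inverse of bit_pack: recover `count` integers of `bit_width` bits each."""
    as_int = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(as_int >> (i * bit_width)) & mask for i in range(count)]


runs = rle_runs([5, 5, 5, 5, 2, 2, 7])
packed = bit_pack([0, 1, 2, 3, 4, 5, 6, 7], bit_width=3)
unpacked = bit_unpack(packed, bit_width=3, count=8)
```

Here eight values that each fit in 3 bits occupy 3 bytes instead of 8, which is the extra saving bit-packing adds when the data has few long runs.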

Aliyu Aziz's avatar

Great 👍

Davit Bostoghanashvili's avatar

🙌🙌🙌

Ngoc Tan Dang's avatar

Nice work

Vamsi's avatar

Well-written article!

Mitch's avatar

You are my benchmark for quality Excalidraw illustrations, they are always so clear and visually pleasing! Thanks for the article.

Vijayakumar Z's avatar

Good article.
