I spent the weekend learning about one of the most iconic database storage engines.
The one once common in the OLTP world, now gaining more and more adoption in OLAP systems.
I will publish a paid article every Tuesday. I wrote these with one goal in mind: to offer my readers, whether they are feeling overwhelmed when beginning the journey or seeking a deeper understanding of the field, 15 minutes of practical lessons and insights on nearly everything related to data engineering.
Intro
If you work with OLTP databases, a high chance that you’ve heard or even worked with B-Tree, the database storage structure for optimizing query performance in databases like MySQL or PostgreSQL.
It’s excellent at speeding up reads, but the system must perform more work at write time, as it has to locate the data in the B-Tree before making any in-place changes to the entire data page in memory and writing it back to disk, even if the change volume is 1/1000 of the page.
Some workload requires higher throughput data ingestion, such as key-value stores or NoSQL databases. A storage engine, which was first introduced in the 1990s, can offer that.
In this week’s article, we will delve into the Log-Structured Merge-Tree (LSM-tree), the storage engine that supports many popular OLTP databases, such as RocksDB or Cassandra. The cool thing about this engine is that it isn’t exclusively implemented in OLTP databases; many OLAP systems, such as Clickhouse, BigQuery, RisingWave, Apache Hudi, or Apache Doris, have implemented or inspired the idea of the LSM tree for their storage engine.
We will first explore the LSM architecture (in the simplest way), then break down its pros and cons, and finally try to answer why it is getting more and more preferred in OLAP systems (however, not all OLAP systems will implement it).
LSM architecture
Essentially, the LSM-tree organizes data into hierarchical levels; from volatile state in memory to its final, immutable state on disk. In practice, data on disk itself is composed of multiple levels of on-disk files.
The structure that holds data in memory is usually referred to as the Memtable, and the one on disk is called SSTables. When you write to disk, it can be considered safe; if your machine is down, it won’t affect the data written to disk (unless you burn the hard drive).
However, we can’t claim the same thing for the data that lives in memory. Your machine being down also means that whatever is stored in your memory will be lost too. That said, an LSM implementation typically includes a third component called the Write-Ahead Log, which helps with data durability.





