I think most schemas have become a bit less necessary since the introduction of SSDs and Python in data analysis.
They are still needed in some situations, but with cloud computing, where compute is fast and quite cheap, a schema is mainly relevant in big teams and corporations where you need good data governance, auditing, permission control, history tracking ... For pure data analysis, a star schema may not be as useful.
Every time I read about the Star Schema I struggle with one thing: what is the process for taking the data from the OLTP system and putting it into the OLAP system? I'm not talking about a DMS like Data Factory, but about how to store the data.
Now that I'm writing this, another question comes up: in a lakehouse (I use Databricks + Azure DL), can we just extract everything from the OLTP system, put it into a staging or raw layer and then, in the bronze layer, model our facts and dimensions?
I'd appreciate it if you could clarify these questions for me!
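To make the modeling step in that question concrete, here is a rough sketch in plain Python of splitting raw extracted rows into one dimension and one fact table. Everything in it is hypothetical: the `raw_orders` rows, the column names, and the `build_star` helper are made up for illustration and are not a Databricks or Data Factory API. In a real lakehouse the same idea would normally be expressed with Spark or SQL over tables in the raw layer.

```python
# Hypothetical raw rows, as they might land in a staging/raw layer
# after a plain extract from an OLTP system.
raw_orders = [
    {"order_id": 1, "customer_email": "ann@example.com", "customer_name": "Ann", "amount": 20.0},
    {"order_id": 2, "customer_email": "bob@example.com", "customer_name": "Bob", "amount": 35.5},
    {"order_id": 3, "customer_email": "ann@example.com", "customer_name": "Ann", "amount": 12.0},
]


def build_star(rows):
    """Split denormalized rows into a customer dimension and a sales fact table."""
    dim_by_email = {}  # natural key (email) -> dimension row
    fact_sales = []
    for row in rows:
        email = row["customer_email"]
        if email not in dim_by_email:
            # Assign a surrogate key the first time a customer is seen.
            dim_by_email[email] = {
                "customer_sk": len(dim_by_email) + 1,
                "email": email,
                "name": row["customer_name"],
            }
        # The fact row keeps the measure and points at the dimension by surrogate key.
        fact_sales.append({
            "order_id": row["order_id"],
            "customer_sk": dim_by_email[email]["customer_sk"],
            "amount": row["amount"],
        })
    return list(dim_by_email.values()), fact_sales


dim_customer, fact_sales = build_star(raw_orders)
```

The descriptive attributes (name, email) end up once per customer in `dim_customer`, while `fact_sales` holds only keys and measures, which is the basic shape a star schema aims for.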
Such a good consolidation of knowledge!
A good refresher on the basics.
BTW, Start Schema should be Star Schema
Oh, thanks a lot, Jove! I edited it.