At the beginning of a BI project an important question arises about the architecture of the data warehouse. Which model should you choose, star or snowflake? Why?

The choice between schemas will have an impact on performance, maintenance and readability, so keep this in mind before starting a data warehouse project.

Here are some key points to remember:

Physical differences

Star Schema:  

The data in the dimension tables are denormalized, which means that the data belonging to a dimension can be repeated.

The dimensions are connected directly to the fact table with a 1:n relationship.

modèle en étoile

Snowflake Schema: 

The data in the dimension tables are normalized, which means that the data are stored in separate dimension tables without redundancy, which means that the data that belong to a dimension are not repeated.

The dimensions are connected to the lookup dimensions and then connected to the fact table.

1. Normalization and Redundancy

As mentioned, the dimensions on the star schema are denormalized and the dimensions on the flake schema are normalized.

Due to the denormalization, the star schemas tend to have better query performance compared to the flake schema.

 

2. Query complexity

Star schemas require less complex queries because the fact table is linked to each dimension by a single join, which also speeds up query performance.

In snowflake schemas, multiple joins are required between the fact table and the dimension tables, which makes the queries more complex and increases the query execution time.

 

3. Implementation and Maintenance

Data warehouse architecture and implementation is generally easier for star schemas due to its simplicity, however, maintenance is highly related to the volume of redundant data. Fewer maintenance points are required on flake schemas because the dimensions do not have redundant data.

 

4. Disk Storage and Data Integrity

Because the dimensions in the snowflake schema do not have redundant data, this architecture requires less disk storage than star schemas.

Another advantage of non-redundant data is that it is easier to maintain data integrity because inserts and updates only affect one data entry instead of several in star schemas.

It’s important to mention that with the growth of cloud computing, disk storage is becoming less expensive and scalable.

 

Which scheme to choose?

It’s up to you to decide which architecture is best for your business case… We hope this article has helped you understand the main differences between the architectures and the advantages and disadvantages of each model

 

References :

https://www.keboola.com/blog/star-schema-vs-snowflake-schema#:~:text=Star%20schema%20stores%20redundant%20data,tables%20and%20avoids%20data%20redundancy.

https://blog.exsilio.com/all/snowflake-vs-star-schema/

Kimball, Ralph – The Data Warehouse Toolkit.