Data Storages

Architecture	Total cost of solution	Flexibility of scenarios	Complexity of development	Maturity of ecosystem	Organizational maturity required
Cloud data warehouse	High - cloud data warehouses rely on proprietary data formats and bundle an end-to-end solution, which drives up cost	Low - cloud data warehouses are optimized for BI/SQL-based scenarios; support for data science/exploratory scenarios exists but is restricted by format constraints	Low - there are fewer moving parts, and you can get started almost immediately with an end-to-end solution	High - for SQL/BI scenarios, Low - for other scenarios	Low - the tools and ecosystem are largely well understood and ready to be consumed by organizations of any shape or size
Modern data warehouse	Medium - data preparation and historical data can be moved to the data lake at lower cost, but a cloud data warehouse is still needed, which is expensive	Medium - a diverse ecosystem of tools and more exploratory scenarios are supported in the data lake, but correlating data in the warehouse and the data lake requires data copies	Medium - the data engineering team needs to ensure that the data lake design is efficient and scalable; plenty of guidance and considerations are available, including this book	Medium - the data preparation and data engineering ecosystem, such as Spark/Hadoop, has a higher maturity but needs tuning for performance and scale, High - for consumption via the data warehouse	Medium - the data platform team needs to be skilled up to understand the needs of the organization and make the right design choices to at least support those needs
Data lakehouse	Low - the data lake storage acts as the unified repository with no data movement required; compute engines are largely stateless and can be spun up and down on demand	High - flexibility to run more scenarios with a diverse ecosystem, enabling more exploratory analysis such as data science, and easy sharing of data between BI and data science teams	Medium to High - requires a careful choice of the right datasets and of the open data format needed to support the lakehouse architecture	Medium to High - while technologies such as Delta Lake, Apache Iceberg, and Apache Hudi are gaining maturity and adoption, this architecture still requires thoughtful design today	Medium to High - the data platform team needs to be skilled up to understand the needs of the organization and technology choices that are still new
Data mesh	Medium - while the distributed design keeps cost lower, a lot of investment is required in automation, blueprint, and data governance solutions	High - flexibility in supporting different architectures and solutions in the same organization, and no bottlenecks on a lean central organization	High - this relies on an end-to-end automated solution and an architecture that scales to 10x growth and sharing across architectures/cloud solutions	Low - relatively nascent in guidance and available toolsets	High - the data platform team and product/domain teams need to be skilled up in data lakes

Cost versus complexity of cloud data lake architectures
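
The trade-offs in the table can also be read as a simple decision matrix. The following is a minimal sketch, not part of the original text, that encodes three of the columns (total cost, flexibility, and development complexity) and shortlists architectures against constraints you choose. The names `LEVELS`, `Architecture`, `ARCHITECTURES`, and `shortlist`, and the numeric ordering of the ratings (including placing "Medium to High" halfway between Medium and High), are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed ordinal scale for the table's qualitative ratings (illustrative only).
LEVELS = {"Low": 1.0, "Medium": 2.0, "Medium to High": 2.5, "High": 3.0}


@dataclass(frozen=True)
class Architecture:
    name: str
    cost: str         # "Total cost of solution" column
    flexibility: str  # "Flexibility of scenarios" column
    complexity: str   # "Complexity of development" column


# Ratings copied from the table above.
ARCHITECTURES = [
    Architecture("Cloud data warehouse", "High", "Low", "Low"),
    Architecture("Modern data warehouse", "Medium", "Medium", "Medium"),
    Architecture("Data lakehouse", "Low", "High", "Medium to High"),
    Architecture("Data mesh", "Medium", "High", "High"),
]


def shortlist(max_cost: str, min_flexibility: str, max_complexity: str) -> list[str]:
    """Return the names of architectures whose ratings satisfy all three limits."""
    return [
        a.name
        for a in ARCHITECTURES
        if LEVELS[a.cost] <= LEVELS[max_cost]
        and LEVELS[a.flexibility] >= LEVELS[min_flexibility]
        and LEVELS[a.complexity] <= LEVELS[max_complexity]
    ]


if __name__ == "__main__":
    # A team that needs high flexibility and can tolerate medium cost
    # and high development complexity.
    print(shortlist(max_cost="Medium", min_flexibility="High", max_complexity="High"))
```

A shortlist like this is only a starting point: the ecosystem maturity and organizational maturity columns carry mixed ratings that do not reduce cleanly to a single number, so they are better weighed by hand.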