Lab: SCD in Lakehouse
Implement slowly changing dimensions (SCD) in a data lake using AWS Glue and Delta Lake
- Upload the initial JSON data to the S3 raw layer
- Convert the raw JSON data to Delta format using a Glue ETL job
- Load the processed data locally and read it into a pandas DataFrame
- Change the raw JSON by deleting, updating, and creating some records
- Process the changed raw JSON into Delta format again using the Glue ETL job
- Load the processed data locally again, read it into a pandas DataFrame, compare the changes, and run SQL queries
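
To make the effect of the delete, update, and create steps concrete, here is a minimal sketch of how a dimension table could track those changes. It is not the Glue job from this lab: it uses plain Python lists of dictionaries instead of Spark or Delta, and it assumes SCD Type 2 (the lab does not say which SCD type is used). The function name apply_scd2, the tracking columns valid_from, valid_to, and is_current, and the sample records are all hypothetical.

```python
from datetime import date

# Hypothetical tracking columns added to every dimension row.
TRACKING_COLUMNS = ("valid_from", "valid_to", "is_current")


def apply_scd2(dimension, snapshot, key, load_date):
    """Apply a full snapshot of source records to a dimension (SCD Type 2 sketch).

    dimension: list of dicts that already carry the tracking columns
    snapshot:  list of dicts with the latest source records (no tracking columns)
    key:       name of the business key column
    load_date: date stamped on rows that are closed or opened by this load
    """
    incoming = {row[key]: row for row in snapshot}
    result = []
    seen_current = set()

    for row in dimension:
        if not row["is_current"]:
            result.append(dict(row))  # history rows are kept untouched
            continue
        seen_current.add(row[key])
        new = incoming.get(row[key])
        attrs = {c: v for c, v in row.items() if c not in TRACKING_COLUMNS}
        if new is not None and new == attrs:
            result.append(dict(row))  # unchanged record stays current
            continue
        # Updated or deleted in the source: close the current version.
        result.append({**row, "valid_to": load_date, "is_current": False})
        if new is not None:
            # Updated: open a new current version with the new attributes.
            result.append({**new, "valid_from": load_date,
                           "valid_to": None, "is_current": True})

    # Keys with no current version in the dimension are newly created records.
    for k, new in incoming.items():
        if k not in seen_current:
            result.append({**new, "valid_from": load_date,
                           "valid_to": None, "is_current": True})
    return result


# Initial load (made-up sample data).
initial = [
    {"id": 1, "name": "Alice", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 3, "name": "Cara", "city": "Rome"},
]
dim = apply_scd2([], initial, key="id", load_date=date(2024, 1, 1))

# Changed source: id 2 updated, id 3 deleted, id 4 created.
changed = [
    {"id": 1, "name": "Alice", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Munich"},
    {"id": 4, "name": "Dan", "city": "Oslo"},
]
dim = apply_scd2(dim, changed, key="id", load_date=date(2024, 2, 1))

for row in sorted(dim, key=lambda r: (r["id"], r["valid_from"])):
    print(row)
```

After the second load, the updated record has two versions (one closed, one current), the deleted record keeps only a closed version, and the created record appears as a new current row. Comparing the two processed outputs in the last step of the lab should show the same three kinds of change, although the exact columns depend on how the Glue job is written.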