
Lab: SCD in Lakehouse

Implement slowly changing dimensions (SCD) in a data lake using AWS Glue and Delta Lake.
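The sketch below illustrates what slowly changing dimension bookkeeping involves. It applies SCD Type 2 logic in plain Python: when a record changes or is deleted, its current version is closed, and a new version is opened for records that still exist.

The lab does not say which SCD type it implements, so several details here are assumptions for illustration only:

- the Type 2 choice itself;
- the `apply_scd2` helper;
- the column names `valid_from`, `valid_to`, and `is_current`.

In the lab itself this work is done by the Glue job against Delta tables, not by code like this.

```python
from datetime import date


def apply_scd2(dimension, snapshot, as_of):
    """Hypothetical helper: return a new SCD Type 2 dimension table.

    dimension: list of dicts with keys id, attrs, valid_from, valid_to, is_current
    snapshot:  dict mapping business key -> attribute dict (latest raw data)
    as_of:     date the snapshot was taken
    """
    result = []
    current = {}
    for row in dimension:
        if row["is_current"]:
            current[row["id"]] = row
        else:
            result.append(row)  # historical versions are never modified

    for key, row in current.items():
        new_attrs = snapshot.get(key)
        if new_attrs == row["attrs"]:
            result.append(row)  # unchanged: keep the current version as-is
            continue
        # Updated or deleted in the source: close the current version.
        result.append({**row, "valid_to": as_of, "is_current": False})
        if new_attrs is not None:
            # Updated: open a new current version with the new attributes.
            result.append({"id": key, "attrs": new_attrs, "valid_from": as_of,
                           "valid_to": None, "is_current": True})

    for key, new_attrs in snapshot.items():
        if key not in current:
            # New (or re-created) business key: insert a fresh current version.
            result.append({"id": key, "attrs": new_attrs, "valid_from": as_of,
                           "valid_to": None, "is_current": True})
    return result


# Hypothetical data: an initial load, then c1 updated, c2 deleted, c3 created.
day1, day2 = date(2024, 1, 1), date(2024, 2, 1)
dim = apply_scd2([], {"c1": {"city": "Paris"}, "c2": {"city": "Rome"}}, day1)
dim = apply_scd2(dim, {"c1": {"city": "Lyon"}, "c3": {"city": "Oslo"}}, day2)
for row in dim:
    print(row["id"], row["attrs"]["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

A Type 1 dimension would instead overwrite `attrs` in place and keep no history. Delta Lake's `MERGE INTO` can be used to express either pattern against a Delta table.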

Figure: Process flow diagram

  1. Upload the initial JSON data to the S3 raw layer.
  2. Process the data from raw JSON to Delta format using an AWS Glue ETL job.
  3. Load the processed data locally and read it into a pandas DataFrame.
  4. Change the raw JSON by deleting, updating, and creating some records.
  5. Process the changed raw JSON to Delta format again using the Glue ETL job.
  6. Load the processed data locally, read it into a pandas DataFrame, compare the changes, and run SQL queries.
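Steps 4 and 6 amount to two things: editing a snapshot of the raw records, then detecting what changed between the two versions. The sketch below does both with the standard library only.

The lab's actual dataset is not shown, so the following are hypothetical:

- the record fields `id`, `name`, and `city`;
- the `diff_snapshots` helper;
- the assumption that the raw file is a single JSON array.

```python
import json


def diff_snapshots(old_records, new_records, key="id"):
    """Hypothetical helper: classify keys as created, updated, or deleted."""
    old = {r[key]: r for r in old_records}
    new = {r[key]: r for r in new_records}
    created = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    return {"created": created, "updated": updated, "deleted": deleted}


# Hypothetical initial raw JSON (step 1).
raw_v1 = json.dumps([
    {"id": 1, "name": "Alice", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cara", "city": "Oslo"},
])
records_v1 = json.loads(raw_v1)

# Step 4 (hypothetical edits): delete id 2, update id 1, create id 4.
records_v2 = [dict(r) for r in records_v1 if r["id"] != 2]  # copies, so v1 is untouched
for r in records_v2:
    if r["id"] == 1:
        r["city"] = "Lyon"
records_v2.append({"id": 4, "name": "Dan", "city": "Kyiv"})
raw_v2 = json.dumps(records_v2)  # the changed file that would be re-uploaded

# Step 6: compare the two snapshots.
changes = diff_snapshots(records_v1, json.loads(raw_v2))
print(changes)  # {'created': [4], 'updated': [1], 'deleted': [2]}
```

The lab itself works in pandas. There, a similar comparison can be made with `DataFrame.merge(..., how="outer", indicator=True)`, whose `_merge` column marks each row as `left_only`, `right_only`, or `both`.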