Lab: Databricks Clickstream Analysis
Objective
Databricks AWS Integration and Clickstream Analysis
Introduction
Components and steps we will follow in this lab:
- AWS Glue as our Central Metastore
- We will launch one Kinesis stream, i.e. the user clickstream
- We will join the clickstream with an existing user profile Delta table registered in our Glue metastore
- We will run a crawler job to pull an S3 dataset into our AWS Glue metastore
- The pipeline follows a Data Lake medallion approach (bronze, silver, gold)
- We will demonstrate the full DML support of Delta Lake (UPDATE, DELETE, MERGE) while curating the Data Lake
- The curated GOLD dataset will be available to Athena and pushed to Redshift for later consumption
- Finally, we will build a QuickSight dashboard
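
Before touching any AWS service, it may help to see the shape of the data flow listed above. The following is a minimal plain-Python sketch, not Spark or Delta Lake code: in-memory lists and dictionaries stand in for the Kinesis stream and the Delta tables, and every name in it (bronze_clicks, user_profiles, upsert_profile, to_silver, to_gold, and the user_id, page and country fields) is hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical raw click events, standing in for records read from the Kinesis stream.
bronze_clicks = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/cart"},
    {"user_id": "u1", "page": "/cart"},
    {"user_id": None, "page": "/home"},  # malformed record, dropped in the silver step
]

# Hypothetical user profile table, standing in for the Delta table in the Glue metastore.
user_profiles = {
    "u1": {"country": "US"},
    "u2": {"country": "FR"},
}


def upsert_profile(profiles, user_id, attributes):
    """Update the profile if the key exists, insert it otherwise.

    A dictionary upsert used here as a stand-in for a MERGE statement.
    """
    profiles[user_id] = {**profiles.get(user_id, {}), **attributes}


def to_silver(clicks, profiles):
    """Drop events without a matching profile and enrich the rest (inner join)."""
    silver = []
    for event in clicks:
        profile = profiles.get(event["user_id"])
        if profile is None:
            continue  # no matching profile: the event does not reach silver
        silver.append({**event, **profile})
    return silver


def to_gold(silver):
    """Aggregate the enriched events into click counts per country."""
    return dict(Counter(row["country"] for row in silver))


# Correct an existing profile before curating, as a MERGE-style change would.
upsert_profile(user_profiles, "u2", {"country": "DE"})

silver_clicks = to_silver(bronze_clicks, user_profiles)
gold_clicks_by_country = to_gold(silver_clicks)
print(gold_clicks_by_country)  # {'US': 2, 'DE': 1}
```

In the lab itself, each of these steps is expected to run against real tables rather than Python objects; the sketch only illustrates how raw events are filtered and joined into a silver set and then aggregated into the gold dataset that Athena, Redshift and QuickSight consume.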