
Lab: Databricks Clickstream Analysis

Objective

Databricks AWS Integration and Clickstream Analysis

Introduction

Components and steps we will follow in this lab:

  1. We will use AWS Glue as our central metastore
  2. We will launch one Kinesis stream, i.e. the user clickstream
  3. We will join the clickstream with an existing user profile Delta table registered in our Glue metastore
  4. We will run a crawler job to pull S3 data into our AWS Glue metastore
  5. The pipeline follows a Data Lake medallion approach
  6. We will demonstrate the full DML support of Delta Lake while curating the Data Lake
  7. The curated GOLD dataset will be available to Athena and pushed to Redshift for later consumption
  8. Finally, we will present the results in a QuickSight dashboard
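
To make the medallion flow in the steps above concrete, here is a minimal pure-Python sketch of how raw click events might move from a bronze layer, through a silver layer enriched with user profiles, to an aggregated gold layer. It is only an illustration of the idea: the field names (`user_id`, `page`, `country`), the sample records, and the helpers `to_silver` and `to_gold` are all hypothetical, and the lab itself works with Kinesis and Delta tables rather than in-memory lists.

```python
from collections import Counter

# Hypothetical raw click events, standing in for records read from the Kinesis stream (bronze).
bronze = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/cart"},
    {"user_id": None, "page": "/home"},  # malformed record, dropped in the silver step
    {"user_id": "u1", "page": "/cart"},
]

# Hypothetical user profile lookup, standing in for the existing user profile Delta table.
profiles = {"u1": {"country": "US"}, "u2": {"country": "DE"}}


def to_silver(events, profiles):
    """Drop events without a known user and enrich the rest with profile fields."""
    return [
        {**event, **profiles[event["user_id"]]}
        for event in events
        if event["user_id"] in profiles
    ]


def to_gold(silver_rows):
    """Aggregate enriched clicks into click counts per (country, page)."""
    return Counter((row["country"], row["page"]) for row in silver_rows)


silver = to_silver(bronze, profiles)
gold = to_gold(silver)
```

In this sketch the gold layer is a small aggregate, which plays the role of the curated dataset that downstream consumers would query.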

Architecture