Project: Athena Federated

Building a Federated Query System Using Amazon Athena

Activity 1: Athena Lab Environment Setup

In this activity, we will run an AWS CloudFormation stack to create the lab environment. The stack creates a sample TPC-H database running on Amazon RDS, an Amazon EMR cluster with HBase, Amazon ElastiCache for Redis, Amazon DynamoDB, an AWS Glue database and tables, an S3 bucket, S3 and Glue VPC endpoints, Athena named queries, an AWS Cloud9 IDE, an Amazon SageMaker notebook instance, and the supporting IAM resources.
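If you prefer to launch the stack from code instead of the CloudFormation console, the following is a minimal boto3 sketch. It assumes the template is `cfn/athena_federated_stack.yml` (the repo ships two templates; which one this activity uses is an assumption) and uses a hypothetical stack name. Because the stack creates IAM resources, CloudFormation requires an explicit IAM capability acknowledgement; `CAPABILITY_NAMED_IAM` is assumed here because it also covers unnamed IAM resources.

```python
from pathlib import Path


def build_create_stack_args(template_body: str, stack_name: str = "athena-lab") -> dict:
    """Build keyword arguments for CloudFormation's create_stack call.

    The default stack name is hypothetical; any name unique in the account/region works.
    """
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # The lab stack creates IAM roles/policies, so the capability must be acknowledged.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def launch_lab_stack(template_path: str = "cfn/athena_federated_stack.yml") -> str:
    """Create the stack and wait for it to finish (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3 installed

    cfn = boto3.client("cloudformation")
    args = build_create_stack_args(Path(template_path).read_text())
    stack_id = cfn.create_stack(**args)["StackId"]
    # Blocks until CREATE_COMPLETE; raises botocore's WaiterError on failure or rollback.
    cfn.get_waiter("stack_create_complete").wait(StackName=args["StackName"])
    return stack_id
```

Keeping argument construction separate from the API call makes it easy to inspect the request before anything is created in your account.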

Activity 2: Athena Basics

In this activity, we will cover the following:

Enable CloudWatch metrics for Athena
Athena Interface - Create tables and run queries
Create tables with Glue
Create Views
Query results and history
ETL with Athena CTAS
Athena Workgroups
Visualize Athena query results with QuickSight
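The CTAS step listed above creates a new table from a query result in a single statement, which is a common way to convert raw data to a columnar format. As a sketch, the helper below assembles an Athena CTAS statement that writes Parquet to S3 and optionally partitions the output; the table names, columns, and S3 location are hypothetical. Athena expects partition columns to come last in the SELECT list, which the helper enforces, and the `external_location` must be empty.

```python
def build_ctas(
    target_table: str,
    source_table: str,
    columns: list[str],
    partition_cols: list[str],
    s3_location: str,
) -> str:
    """Return an Athena CREATE TABLE AS SELECT statement that writes Parquet."""
    # Move partition columns to the end of the SELECT list, as Athena requires.
    data_cols = [c for c in columns if c not in partition_cols]
    select_list = ", ".join(data_cols + partition_cols)

    props = ["format = 'PARQUET'", f"external_location = '{s3_location}'"]
    if partition_cols:  # omit partitioned_by entirely for unpartitioned output
        quoted = ", ".join(f"'{c}'" for c in partition_cols)
        props.append(f"partitioned_by = ARRAY[{quoted}]")
    with_clause = ",\n  ".join(props)

    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n  {with_clause}\n)\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )
```

For example, `build_ctas("lineitem_parquet", "lineitem_csv", ["l_shipmode", "l_orderkey", "l_quantity"], ["l_shipmode"], "s3://example-bucket/ctas/lineitem/")` produces a statement whose SELECT list ends with `l_shipmode`.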

Activity 3: Athena Federation

To demonstrate Athena's federated query capabilities, this activity uses a sample dataset whose tables are spread across several data sources.

TPC-H is a publicly available decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database were chosen for their broad industry-wide relevance. The benchmark models decision support systems that examine large volumes of data, execute highly complex queries, and answer critical business questions. TPC-H defines eight separate base tables.

Imagine a hypothetical e-commerce company whose architecture uses:

HBase on Amazon EMR for line item processing records
Redis to store nations and active orders so that the processing engine can access them quickly
Aurora with the MySQL engine for orders, customer, and supplier account data such as email addresses and shipping addresses
DynamoDB to host part and partsupp data for high performance

[Figure: tpch_athena_federated]
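The architecture above assigns each TPC-H table to a backing store. A small lookup table, sketched below, makes that assignment explicit so you can see which connector a query needs, and it reports which of the eight base tables no store holds (the description does not place `region`). The short store names are hypothetical labels, not the lab's actual connector or catalog names; orders appear twice because active orders live in Redis while the full order data lives in Aurora MySQL.

```python
# The eight TPC-H base tables, in specification order.
TPCH_TABLES = ["part", "supplier", "partsupp", "customer", "orders", "lineitem", "nation", "region"]

# Where each table lives in the hypothetical e-commerce architecture.
# Store names are illustrative placeholders.
TABLE_SOURCES: dict[str, list[str]] = {
    "lineitem": ["hbase"],
    "nation": ["redis"],
    "orders": ["mysql", "redis"],  # full order data in Aurora MySQL, active orders in Redis
    "customer": ["mysql"],
    "supplier": ["mysql"],
    "part": ["dynamodb"],
    "partsupp": ["dynamodb"],
}


def unassigned_tables() -> list[str]:
    """Return base tables that no store in the architecture holds."""
    return [t for t in TPCH_TABLES if t not in TABLE_SOURCES]


def tables_in(source: str) -> list[str]:
    """Return the tables a given store holds, in TPC-H order."""
    return [t for t in TPCH_TABLES if source in TABLE_SOURCES.get(t, [])]
```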

In this activity, we will learn:

Install data source connectors for MySQL, HBase, DynamoDB, and Redis
Run federated queries
Visualize federated query results with QuickSight
Run queries from QuickSight
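Once the connectors are deployed, a single Athena query can join tables that live in different stores. The sketch below builds such a query in Python, joining customers and orders from Aurora MySQL with line items from HBase. It assumes the connectors are referenced by Lambda function name with Athena's `"lambda:<function_name>"` catalog syntax; the function names `mysql` and `hbase` and the schema `tpch` are hypothetical, and connectors registered as named data sources would be referenced by their catalog names instead. Column names follow the standard TPC-H schema, but how each connector exposes columns (HBase column families in particular) may differ in the lab.

```python
def qualified(catalog_fn: str, schema: str, table: str) -> str:
    """Fully qualify a table served by an unregistered Lambda connector."""
    # Double quotes are required because the catalog name contains a colon.
    return f'"lambda:{catalog_fn}".{schema}.{table}'


def top_customers_by_revenue(
    mysql_fn: str = "mysql", hbase_fn: str = "hbase", schema: str = "tpch", limit: int = 10
) -> str:
    """Federated query: customers/orders in MySQL joined with line items in HBase."""
    customer = qualified(mysql_fn, schema, "customer")
    orders = qualified(mysql_fn, schema, "orders")
    lineitem = qualified(hbase_fn, schema, "lineitem")
    return (
        "SELECT c.c_name, SUM(l.l_extendedprice * (1 - l.l_discount)) AS revenue\n"
        f"FROM {customer} c\n"
        f"JOIN {orders} o ON c.c_custkey = o.o_custkey\n"
        f"JOIN {lineitem} l ON o.o_orderkey = l.l_orderkey\n"
        "GROUP BY c.c_name\n"
        "ORDER BY revenue DESC\n"
        f"LIMIT {int(limit)}"
    )
```

The resulting string can be submitted from the Athena console or with boto3's Athena `start_query_execution(QueryString=...)`.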

Activity 4: ACID transactions with Iceberg

Create an Iceberg table and insert data into it
Update the data lake using Athena and Iceberg tables
Delete data from the data lake using Athena and Iceberg tables
Run time travel and version travel queries
Evolve the table schema
Optimize Iceberg tables
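Each Iceberg step listed above corresponds to an Athena SQL statement (Athena engine version 3 syntax). The sketch below collects one statement per step as Python strings so they can be submitted one at a time, for example from the lab notebook. The database, table, source table `tpch.customer`, S3 location, added column, and snapshot ID are all hypothetical placeholders; a real snapshot ID would come from the table's `$history` metadata table.

```python
def iceberg_lab_statements(
    db: str = "iceberg_db",
    table: str = "customer_iceberg",
    location: str = "s3://example-bucket/iceberg/customer_iceberg/",
    snapshot_id: int = 1234567890123456789,
) -> dict[str, str]:
    """Return Athena SQL for each Iceberg lab step, in execution order."""
    t = f"{db}.{table}"
    return {
        # table_type = 'ICEBERG' is what makes Athena create an Iceberg table.
        "create": (
            f"CREATE TABLE {t} (c_custkey bigint, c_name string, "
            "c_mktsegment string, c_acctbal decimal(12,2)) "
            f"LOCATION '{location}' "
            "TBLPROPERTIES ('table_type' = 'ICEBERG')"
        ),
        "insert": (
            f"INSERT INTO {t} "
            "SELECT c_custkey, c_name, c_mktsegment, c_acctbal FROM tpch.customer"
        ),
        "update": f"UPDATE {t} SET c_mktsegment = 'AUTOMOBILE' WHERE c_custkey = 1",
        "delete": f"DELETE FROM {t} WHERE c_custkey = 2",
        # Snapshot history, used to pick IDs for version travel.
        "history": f'SELECT * FROM "{db}"."{table}$history"',
        "time_travel": (
            f"SELECT * FROM {t} "
            "FOR TIMESTAMP AS OF (current_timestamp - interval '10' minute)"
        ),
        "version_travel": f"SELECT * FROM {t} FOR VERSION AS OF {snapshot_id}",
        # Schema evolution: Iceberg adds columns without rewriting data files.
        "add_column": f"ALTER TABLE {t} ADD COLUMNS (c_comment string)",
        # Compact small files, then expire old snapshots and remove orphan files.
        "optimize": f"OPTIMIZE {t} REWRITE DATA USING BIN_PACK",
        "vacuum": f"VACUUM {t}",
    }
```

Because each step is a separate statement, a step can be rerun on its own, for instance repeating the time travel query after an update to compare before and after.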

Project Structure

├── [ 60K]  01-sa-main.ipynb
├── [2.8K]  README.md
└── [ 47K]  cfn
    ├── [ 33K]  athena_federated_stack.yml
    └── [ 14K]  athena_stack.yml

 110K used in 1 directory, 4 files
