Capstones

📄️ Serverless Streaming Data on AWS

https://youtu.be/abaVSFFdU-c

📄️ Citi Bike Trip Histories

The goal of this capstone is to build an end-to-end data pipeline:

📄️ Airflow - City Traffic Drone

City vehicle trajectory data extraction and warehousing for traffic analysis

📄️ Global Historical Climatology Network Daily Data Pipeline

Objective

📄️ Building End to end data pipeline in AWS

Architecture Diagram

📄️ Funflix

You are working as a data engineer at Funflix, an Australian media company. You have been given the following requirements and tasks to solve.

📄️ Log Analytics and Processing in Real-Time

Lab 1: Apache Flink on Amazon Kinesis Data Analytics

📄️ Streaming ETL pipeline with Apache Flink and Amazon Kinesis Data Analytics

We will create an Amazon Kinesis Data Analytics for Apache Flink application with Amazon Kinesis Data Streams as the source and an Amazon S3 bucket as the sink. Random data is ingested using Amazon Kinesis Data Generator. The Apache Flink application code performs a word count on the streaming random data using a tumbling window of 5 minutes. The generated word counts are then stored in the specified Amazon S3 bucket. Amazon Athena is used to query the data in the Amazon S3 bucket to validate the end results.
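
To make the windowing idea concrete, here is a minimal plain-Python sketch of a word count over 5-minute tumbling windows. It is not the Flink application from this capstone and does not use the Flink or Kinesis APIs; the function name `tumbling_word_count`, the `(timestamp_seconds, text)` event shape, and the sample events are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Window size taken from the description above: 5 minutes.
WINDOW_SECONDS = 5 * 60


def tumbling_word_count(events, window_seconds=WINDOW_SECONDS):
    """Count words per fixed, non-overlapping (tumbling) window.

    `events` is an iterable of (timestamp_seconds, text) pairs.
    This event shape is hypothetical, chosen only for the sketch.
    Returns a dict mapping each window start time to a Counter of words.
    """
    windows = defaultdict(Counter)
    for ts, text in events:
        # Each event belongs to exactly one window: the one whose
        # start is the timestamp rounded down to a window boundary.
        window_start = int(ts // window_seconds) * window_seconds
        windows[window_start].update(text.lower().split())
    return dict(windows)


# Made-up sample events standing in for the generated random data.
events = [
    (0, "error warn"),
    (120, "error"),
    (299, "info"),
    (300, "error"),
    (650, "warn warn"),
]

result = tumbling_word_count(events)
for start in sorted(result):
    print(start, dict(result[start]))
```

Because tumbling windows do not overlap, the event at second 299 lands in the first window and the event at second 300 opens the next one. A streaming engine such as Flink additionally handles out-of-order events and emits results as each window closes, which this batch-style sketch does not attempt.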

📄️ Kortex

Objectives

📄️ Lufthansa API

Objective

📄️ Movie Review Sentiment Analysis Pipeline

Build a pipeline that derives artist review sentiment and film review sentiment facts from the data provided by IMDb and TMDb.

📄️ Multi-touch Attribution

Abstract

📄️ Building Recommender System from Scratch

Overview

📄️ Reddit Submissions, Authors and Subreddits analysis

Problem Statement

📄️ AWS Kafka and DynamoDB for real time fraud detection

Problem Statement

📄️ Data Pipeline with dbt, Airflow and Great Expectations

In this project, we will learn how to combine three open source tools (Airflow, dbt and Great Expectations) to build, test, validate, document, and orchestrate an entire pipeline, end to end, from scratch. We will load the NYC Taxi data into a Redshift warehouse, then transform and validate it using dbt and Great Expectations.

📄️ Serverless Streaming Data on AWS

📄️ ACLED

📄️ Citi Bike Trip Histories

📄️ Air Pollution Pipeline

📄️ Airflow - City Traffic Drone

📄️ Global Historical Climatology Network Daily Data Pipeline

📄️ Building End to end data pipeline in AWS

📄️ README

📄️ DigitalSkola

📄️ Disaster Response Pipeline

📄️ Funflix

📄️ Datalake Schema Correction

📄️ Log Analytics and Processing in Real-Time

📄️ Streaming ETL pipeline with Apache Flink and Amazon Kinesis Data Analytics

📄️ Kortex

📄️ Lufthansa API

📄️ Movie Review Sentiment Analysis Pipeline

📄️ Multi-touch Attribution

📄️ Building Recommender System from Scratch

📄️ Reddit Submissions, Authors and Subreddits analysis

📄️ AWS Kafka and DynamoDB for real time fraud detection

📄️ Data Pipeline with dbt, Airflow and Great Expectations

📄️ Smartcity

📄️ Sparkify

📄️ Spotify

📄️ Twitter data Topic Analysis and Realtime Sentiment Analysis

📄️ US Immigration analysis and data pipeline